Phantom
1 slot

Data Scraping Crawler

Tutorial

  1. Setup summary

    Here's a tutorial to help you set up the Data Scraping Crawler:

  2. Give the URLs of the web pages you're interested in

    You have two options:

    1. Process a single website
    Enter your URL directly into the Phantom's setup.

    2. Process multiple websites
    Create a spreadsheet with Google Sheets. Copy the website URLs and paste them into your spreadsheet - one URL per row, all in column A.

    list of urls

    Make this spreadsheet public so PhantomBuster can access it.

    Screenshot_from_2019 11 20_19 01 17

    Copy the spreadsheet URL and paste it into your Phantom's setup.

  3. Specify which contact and social media data you want to scrape

    Select which of the data below you would like to scrape from each website by checking the boxes of each one you want:

    • Email addresses

    • Phone numbers

    • Facebook page URLs

    • Instagram profile URLs

    • Twitter profile URLs

    • LinkedIn company page URLs

    • YouTube channel URLs

  4. Pick the condition under which the Phantom should exit a website

    When you really start digging into a website, you'll realize you can dig very far and even get lost in the vast depths of some. To stop your Phantom continuously digging and getting stuck on one site, you'll need to choose a condition under which it will stop its scrape and exit the site. This will save your execution time and ensure all your websites get processed.

    Depending on what you want, you can choose from the following options:

    • Website depth
      This refers to the number of layers of a site the Phantom will visit before exiting, collecting any relevant data it finds along the way.
      For example, setting a depth of 0 means that the Phantom will visit the URL you've given and go no further, while a depth of 1 means it will also visit every link on that page, and a depth of 2 every link on those pages, and so on.
      Take note: Setting a depth of 2 or more is not recommended, as it means your Phantom will likely take a very long time to run.

    • When an email address is found
      The Phantom will exit after having found an email address.

    • When a phone number is found
      The Phantom will exit after having found a phone number.

    • When a social network is found
      The Phantom will exit after having found any social media page.

    • After having opened X pages
      The Phantom will exit after having opened the defined number of subpages on the site.

  5. Define advanced settings

    Scrape multiple results per website
    Check this box if you want to extract not only the first but every available instance of the data you're scraping on all the pages you're browsing.

    Only visit web pages that start with a particular root URL
    This option is useful if you only want to visit specific pages within a website.
    For example, if scraping the PhantomBuster website, you could use https://phantombuster.com/phantombuster?category=linkedin
    as a root URL so that the Phantom will only visit pages starting with this, i.e. LinkedIn Phantom pages such as https://phantombuster.com/automations/linkedin/3149/linkedin-search-export

  6. Set the Phantom on repeat

    Automation always produces better results in the long run. Set the Phantom to launch repeatedly and get results while you're away!

    This Phantom runs from the cloud, which means you don't even need to have your browser open or computer on for these launches to happen.

    Screenshot_2025 03 31_at_14.59.32

    For more automatic launch options, click on the three little dots in the top right and "Show advanced settings."