Data Scraping Crawler

Website scraper and data extraction crawler to extract emails, social media addresses and much more.

Scrape data and extract email addresses, phone numbers and social media information with Phantombuster's Web Crawler.

This web crawler has been designed for marketers, salespeople, growth-hackers and recruiters. Why? Because browsing the web for basic data such as emails, phone numbers, and Instagram, Twitter, Facebook or LinkedIn accounts is a big part of the lead generation process.

Extracting data with a web crawler consists of letting a bot browse the web for you. Specify what information you need and watch it scrape the data you're looking for.

This web crawler can also be used for deep crawling. Set the depth to 2 or above and the crawler will follow links further into your target website to find and extract the data you're looking for.
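To make the idea of depth concrete, here is a minimal Python sketch of a depth-limited crawl. It is an illustration only, not Phantombuster's implementation: the `crawl` function and the `SITE` link map are hypothetical, the "site" is an in-memory dictionary rather than real pages, and it assumes that depth 0 means "visit only the start page".

```python
from collections import deque


def crawl(start, links, max_depth):
    """Return the pages reachable from `start` within `max_depth` link hops."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site structure, used only for illustration.
SITE = {
    "/": ["/about", "/team"],
    "/about": ["/contact"],
    "/team": ["/team/alice"],
    "/contact": [],
}

print(crawl("/", SITE, 0))
print(crawl("/", SITE, 2))
```

In this sketch, each extra level of depth adds the pages linked from the previous level, which is why higher depths take longer to run.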

This Data Scraping Crawler is a great first step to many of our other automations such as Instagram Auto Liker, Twitter Auto Follow, LinkedIn Company Employees and many more!


Tutorial 🚀

1. Create your account.

Free trial for 14 days, no credit card required. Then switch to a Free plan or become a customer.

2. Add this automation to your account.

3. Click on Configure me!

You'll now see the 3 configuration dots blinking. Click on them.

4. Specify which domains you want to crawl

In the target field, enter the websites you want to scrape data from. This can be either a single domain or a Google Spreadsheet URL containing many domains.

Your spreadsheet should contain a list of URLs (one link per row). You can specify the name of the column that contains the links. Simply enter the column name in the field below.

Don't forget to make that spreadsheet publicly accessible!
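If you want to check your spreadsheet before launching, the following Python sketch shows one way to read a named column of links from a CSV export of the sheet. It is only an illustration under assumptions: `urls_from_csv`, the sample data and the `website` column name are hypothetical, and the crawler itself may read the sheet differently.

```python
import csv
import io


def urls_from_csv(csv_text, column):
    """Return the non-empty values of `column`, one per spreadsheet row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            urls.append(value)
    return urls


# Hypothetical sheet content: one link per row, in a column named "website".
SAMPLE_SHEET = (
    "company,website\n"
    "Acme,https://acme.example\n"
    "Empty,\n"
    "Globex,https://globex.example\n"
)

print(urls_from_csv(SAMPLE_SHEET, "website"))
```

Rows with an empty cell in the chosen column are skipped, so blank lines in the sheet do not end up as targets.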

5. Specify the data you want to scrape with the web crawler

Tick the box for each type of data you want the web crawler to scrape. For the moment the following are available:

  • Email addresses
  • Phone Numbers
  • Facebook Page URLs
  • Instagram Profile URLs
  • Twitter account URLs
  • LinkedIn company URLs
  • YouTube channel URLs
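As a rough picture of what extracting these items from a page can look like, here is a small Python sketch based on regular expressions. The patterns are deliberately simplified assumptions, not the rules the crawler actually applies, and phone numbers are left out because their formats vary too much for a short example. The dictionary keys reuse the output field names listed in the Output section.

```python
import re

# Simplified, illustrative patterns (assumptions, not the crawler's own rules).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "facebookUrl": re.compile(r"https?://(?:www\.)?facebook\.com/[A-Za-z0-9_.\-]+"),
    "instagramUrl": re.compile(r"https?://(?:www\.)?instagram\.com/[A-Za-z0-9_.]+"),
    "twitterUrl": re.compile(r"https?://(?:www\.)?twitter\.com/[A-Za-z0-9_]+"),
    "linkedinUrl": re.compile(r"https?://(?:www\.)?linkedin\.com/company/[A-Za-z0-9_\-]+"),
    "youtubeUrl": re.compile(r"https?://(?:www\.)?youtube\.com/(?:channel|c|user)/[A-Za-z0-9_\-]+"),
}


def extract(text, wanted):
    """Return every match for each requested data type found in `text`."""
    return {name: PATTERNS[name].findall(text) for name in wanted}


# Hypothetical page content.
PAGE = (
    '<a href="mailto:hello@acme.example">Mail</a> '
    '<a href="https://twitter.com/acme">Twitter</a> '
    '<a href="https://www.linkedin.com/company/acme-inc">LinkedIn</a>'
)

print(extract(PAGE, ["email", "twitterUrl", "linkedinUrl"]))
```

Only the data types passed in `wanted` are searched for, which mirrors ticking a subset of the boxes above.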

6. Specify the condition for the web crawler to exit

To move on to the next website, your web crawler needs an exit condition. Without one, a crawler set to a high depth (2 or above) might stay on a single website for a long time and use up all your execution time.
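The effect of an exit condition can be sketched in a few lines of Python. The two conditions used here (stop once every requested item has been found, or once a page budget is spent) are assumptions chosen for illustration, and `crawl_until_done` is a hypothetical helper; check the configuration screen for the conditions that are actually offered.

```python
def crawl_until_done(pages, wanted, max_pages):
    """Visit pages in order and stop as soon as an exit condition is met.

    `pages` is a list of (url, data) pairs, where `data` maps a field name
    to the value found on that page.
    """
    found = {}
    visited = 0
    for url, data in pages:
        if visited >= max_pages:
            break  # exit condition 1: page budget exhausted
        visited += 1
        for key in wanted:
            if key not in found and data.get(key):
                found[key] = data[key]
        if all(key in found for key in wanted):
            break  # exit condition 2: everything requested was found
    return found, visited


# Hypothetical crawl results for one website, in visit order.
PAGES = [
    ("/", {}),
    ("/about", {"email": "hi@acme.example"}),
    ("/team", {"twitterUrl": "https://twitter.com/acme"}),
    ("/blog", {}),
]

print(crawl_until_done(PAGES, ["email"], 10))
```

With only `email` requested, the sketch stops after the second page instead of visiting all four, which is the kind of saving an exit condition is meant to provide.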

7. Other information

Scrape multiple results per website: Tick this box if you want to collect every available occurrence of the items you're after, on all the pages you browse, rather than only the first one.

Visits only websites that start with URL: This option is useful if you want to go deep only on specific pages, for instance only those under a search or companies section.

Depth: The crawler's default depth is 0.
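The first two options above can be pictured with a short Python sketch. Both helpers are hypothetical and simplified: `allowed_links` assumes the URL option is a plain prefix match, and `pick_results` assumes that leaving the multiple-results box unticked keeps only the first match.

```python
def allowed_links(links, prefix):
    """Keep only the links that start with the configured URL prefix."""
    return [link for link in links if link.startswith(prefix)]


def pick_results(matches, multiple):
    """Return every match when `multiple` is set, otherwise only the first."""
    return list(matches) if multiple else list(matches[:1])


# Hypothetical links found on a page.
LINKS = [
    "https://acme.example/companies/globex",
    "https://acme.example/blog/news",
    "https://acme.example/companies/initech",
]

print(allowed_links(LINKS, "https://acme.example/companies"))
print(pick_results(["a@acme.example", "b@acme.example"], multiple=False))
```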

Start your automation!

You're all set. Just click "launch" to get your automation started!

Set this automation on repeat

Once your automation's configuration is ready, you can schedule repeated launches. This helps you avoid rate limits, scrape more data and spread your automated workflows over days, weeks or even months.

To do so, go to your dashboard and look for your automation's “Settings” button.

Then select a frequency and save the new settings at the bottom of the page.

Output

This API will output CSV and/or JSON containing the following fields:

  • email
  • facebookUrl
  • instagramUrl
  • linkedinUrl
  • phoneNumber
  • twitterUrl
  • youtubeUrl
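Once you have downloaded a result file, the fields above are easy to work with in a script. The Python sketch below assumes the JSON output is a list of objects keyed by these field names, with missing values simply absent; that layout and the sample record are assumptions for illustration, so adjust the code to the file you actually get.

```python
import csv
import io
import json

FIELDS = [
    "email", "facebookUrl", "instagramUrl", "linkedinUrl",
    "phoneNumber", "twitterUrl", "youtubeUrl",
]


def to_csv(records):
    """Write records to CSV text with one column per documented field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # fields missing from a record are left empty
    return buffer.getvalue()


# Hypothetical JSON result containing a single record.
SAMPLE_JSON = '[{"email": "hi@acme.example", "twitterUrl": "https://twitter.com/acme"}]'

records = json.loads(SAMPLE_JSON)
print(to_csv(records))
```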
