

A lot of people come to PhantomBuster asking us for the phone numbers and emails of companies listed in some directory. How infuriating is it to see the data you need right in front of you, but to know that accessing it would take several hours of mindless copy/pasting?
So here are the strategies we apply when someone asks us for this kind of data. There's no magic here: in most cases it'll be easy, sometimes a bit more advanced, and sometimes... impossible (without code).
But at least you'll quickly know which category you're dealing with, and whether you can hope to access the emails and phone numbers of potentially thousands of leads.
Case 1: The website displays an index of all exhibitors on one single page

Example: R&R2019 Trade show
Situation: This is the easiest situation: we have a list of all the exhibitors gathered on one single page. You simply have to click on a company name to see a full page displaying that company's phone number and email.
Strategy: Use a Data Scraping Crawler on the index page, which will extract every link there. Set its depth setting to "1" so that it visits all those pages and extracts their phone numbers and emails.
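If you're curious about what such a crawl boils down to, here's a rough sketch in Python. This is not PhantomBuster's actual code: the index URL, the "/exhibitors/" link filter and the regex-based extraction are all assumptions, just there to show the depth-1 idea.

```python
# Minimal sketch of a depth-1 crawl: grab links from an index page,
# then visit each linked page and pull out emails and phone numbers.
import re
import requests
from urllib.parse import urljoin

INDEX_URL = "https://example-tradeshow.com/exhibitors"  # placeholder index page

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

index_html = requests.get(INDEX_URL, timeout=30).text

# Depth 1: collect every exhibitor link found on the index page...
links = {urljoin(INDEX_URL, href)
         for href in re.findall(r'href="([^"]+)"', index_html)
         if "/exhibitors/" in href}

# ...then visit each linked page and extract its contact details.
for link in sorted(links):
    page = requests.get(link, timeout=30).text
    print(link, EMAIL_RE.findall(page), PHONE_RE.findall(page))
```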
Case 2: The trade-show website you're targeting displays all results over multiple pages

Example: All4Pack
Situation: Loads of exhibitions and lists have thousands of participants. A common way to display such a large number of results is to divide them up over multiple pages. So how do we get a list of all these companies if they are scattered all over the place?
Strategy: Let's use a scary-but-for-real-it's-not-that-scary technical word: we'll forge URLs:
Notice the URL of one page. It might be paginated by page number, like exhibition.com/exhibitors/page-1, or some variation of this. Pagination by the first letter of the company names (A, B, C, D...) is also a classic.
Paste that URL in a Google Spreadsheet.
And generate all the pages' URLs as follows:
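Whether you do this with a spreadsheet formula or with a few lines of code, the idea is the same: repeat the pattern for every page. Here's a minimal sketch in Python; the URL pattern and the number of pages are made-up placeholders to adapt to what you observed on the site.

```python
# Forge one URL per results page and write them to a CSV you can
# feed to the crawler. Pattern and page count are placeholders.
PATTERN = "https://exhibition.com/exhibitors/page-{n}"  # placeholder pattern
LAST_PAGE = 50                                          # placeholder page count

with open("pages.csv", "w") as f:
    f.write("url\n")
    for n in range(1, LAST_PAGE + 1):
        f.write(PATTERN.format(n=n) + "\n")
```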

Case 3: The results are displayed on a page with an infinite scroll

The semi-automated strategy
Example: Global Industry Exhibition
Challenge: In this case, we can't see more than a few results without scrolling, and we couldn't find a sitemap either. How do we do it?
Solution: In this case, we'll first scroll to the bottom of the page (it might take a few minutes and a finger cramp) to load all the links on the page. Then we'll use a link-clipping browser extension, Link Klipper for instance, which will extract all the links on the page into a spreadsheet.
You'll then need to clean the file before using it as input to the Data Scraping Crawler.
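You can do that cleaning by hand in the spreadsheet, or with a few lines of code. Here's a minimal sketch, assuming the extension exported a one-column links.csv and that exhibitor pages share a recognizable path fragment; the "/exhibitor/" filter below is an assumption to adapt to the site you're targeting.

```python
# Keep only exhibitor links, drop duplicates, and write a clean CSV.
import csv

seen = set()
with open("links.csv", newline="") as src, open("cleaned.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow(["url"])
    for row in csv.reader(src):
        url = row[0].strip() if row else ""
        if "/exhibitor/" in url and url not in seen:  # placeholder path filter
            seen.add(url)
            writer.writerow([url])
```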
The Sitemap strategy
Example: Interpack.de
Challenge: The issue here is that using the Data Scraping Crawler on this page would extract the first links displayed but would miss the hundreds (or thousands) of others. We need to find a way to build that list of URLs. Let's see how.
Solution: Most websites have a sitemap. Sitemaps help Google make sure it sees and indexes every page. They are also useful to us because they gather all of a site's pages in one place, sometimes including our exhibitors' pages. So how do we find it?
In some instances, typing https://www.website.com/sitemap.xml is enough. You can try it with yours truly: https://phantombuster.com/sitemap.xml gathers all of our pages indexed by search engines.
But in Interpack's case, it does not work. The site's developers must have opted for another location for the sitemap.xml. In that case, you can use a tool such as SEO Site Checkup's Site Map extractor, which will find all the probable sitemaps of a website. In Interpack's case, it found two .xml files. Open each file and check how many URLs match the typical path of the exhibitors' pages. In this case, we hit the jackpot :)

Sexy, huh? Try not to freak out and notice instead how it gathers the URL of every exhibitor! Now save the file (CMD/CTRL + S), then convert it from XML to CSV. And here you go:

Just use this as input to the Data Scraping Crawler and you'll get the phone numbers and emails of every packaging company listed at this German event.
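If you'd rather skip the manual save-and-convert step, the same sitemap-to-CSV conversion can be scripted. Here's a minimal sketch; the sitemap URL and the "/exhibitor/" path fragment are assumptions to replace with what the sitemap extractor actually found.

```python
# Fetch a sitemap, keep only the exhibitor pages, and write them to a CSV.
import csv
import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example-tradeshow.com/sitemap.xml"  # placeholder sitemap
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

root = ET.fromstring(requests.get(SITEMAP_URL, timeout=30).content)
urls = [loc.text for loc in root.findall(".//sm:loc", NS)
        if loc.text and "/exhibitor/" in loc.text]  # placeholder path filter

with open("exhibitors.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url"])
    for url in urls:
        writer.writerow([url])
```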

Case 4: Once on a company page, you need to click a button to access the data

Challenge: The data is not visible unless you click a button or open a pop-up. This is where crawlers show their limits.
Solution #1: Visual programming
Some extensions allow you to do visual programming: you show the tool which sequence of actions you want to perform, then let it repeat them for you. Data-Miner is a nice solution if you feel like getting your hands dirty.
Solution #2: Custom code
If the emails and phone numbers you're after are displayed on the website but you cannot scrape them with any of the above techniques, it might be time to call a pro. Look for a freelancer with experience in scraping (the following keywords might come in handy: "puppeteer", "python", "scraping") or send us an email at support@phantombuster.com and we'll hook you up with a good freelance scraper.
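To give you an idea of what such custom code can look like, here's a minimal sketch using Playwright for Python. The URL and CSS selectors are made up; a real scraper would adapt them to the target site's markup and loop over a whole list of pages.

```python
# Open a company page, click the button hiding the contact details,
# then read the revealed text. Selectors and URL are placeholders.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example-tradeshow.com/exhibitor/acme")  # placeholder URL
    page.click("button.show-contact")        # placeholder button selector
    contact = page.inner_text(".contact-details")  # placeholder details selector
    print(contact)
    browser.close()
```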
Solution #3: Share your ideas!
Before you go
Scraping ranges from pretty easy to very hard. This is why solutions such as PhantomBuster are thriving: they handle security, rate limits, rotating IPs, changing websites, multi-version & progressive roll-outs, country differences, and so on for you.
But when custom code is not an option, a little bit of thinking with the help of these strategies can help you accomplish a lot.
If you have a challenging use-case or a cool solution to add to this guide, please share it in the comments below. And follow us on Facebook and Twitter to get our latest ideas to help you grow faster!