
How do we crawl the web?

What is Dataprovider’s crawler?

What does the crawler do?

At Dataprovider.com we use our own in-house search engine crawler to index websites and structure the data. Every month it crawls over 270 million domains, and several times as many hostnames, from more than 50 countries. We also update our index every month so that we can provide comprehensive and up-to-date data to governments, statistical agencies, companies and web developers.

Our crawlers do their best to respect customary robots.txt rules and robots meta tags. To avoid overwhelming your server’s bandwidth or slowing down your page load times, we spread the crawling of your website across multiple crawlers and different crawling times.
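
If you want to check how a robots.txt rule would apply to a given crawler before publishing it, you can test the rule offline. The following is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URLs and the use of "Dataprovider.com" as the user-agent token in robots.txt are assumptions made for illustration, not confirmed values.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The "Dataprovider.com" token is an
# assumption for illustration; it is not a documented robots.txt token.
ROBOTS_TXT = """\
User-agent: Dataprovider.com
Disallow: /private/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/private/report.html", "/products/"):
        url = "https://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Dataprovider.com", url))
```

This only shows how Python's parser reads the rules. How a specific crawler matches its own name against robots.txt may differ.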

Why does the crawler index my website?

Our crawler crawls and indexes websites and structures the data so that governments, statistical agencies, the open-source intelligence community and companies can use it for analytical, business and research purposes.

What can’t the crawler do?

Our crawler only requests data that is rendered on the server. It doesn’t extract data that is rendered on the client side, in the browser. That means the crawler extracts data from the static HTML code and can’t execute JavaScript to render website pages.
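
To illustrate the difference, here is a minimal sketch of what reading only static HTML looks like. It uses Python's standard `html.parser` and is not our crawler's actual implementation. The sample page and the `StaticTextExtractor` and `extract_static_text` names are hypothetical.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collects text present in the HTML source, without running any scripts."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def extract_static_text(html: str) -> list[str]:
    parser = StaticTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical page: the phone number only appears after JavaScript runs.
PAGE = """\
<html><body>
<h1>Opening hours</h1>
<div id="app"></div>
<script>
document.getElementById("app").textContent = "Call us at 555-0100";
</script>
</body></html>
"""

print(extract_static_text(PAGE))  # ['Opening hours']
```

The heading is found because it is part of the HTML source. The phone number is not, because it would only exist after a browser executed the script.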

How many variants of the crawler are there?

The crawler has one variant, Dataprovider. 

User agent

When a software agent communicates over a network protocol, it identifies itself to its peer by submitting a characteristic identification string, which can include its application type, operating system, software vendor and version. That string is the user agent.

Our crawler identifies itself with a user agent containing “Dataprovider.com”, which makes it visible in your web logs and in any site statistics programs you might use:

"Mozilla/5.0 (compatible; Dataprovider.com)"

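If you want to find these requests in your own logs, a short script can filter on the user agent. The sketch below assumes your server writes the common "combined" access log format, where the user agent is the last quoted field. The sample log lines, IP addresses and function names are made up for illustration.

```python
import re

UA_MARKER = "dataprovider.com"

# Assumption: "combined" log format, with the user agent as the last quoted field.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .* "(?P<user_agent>[^"]*)"$')


def claims_to_be_dataprovider(user_agent: str) -> bool:
    """True if the user agent string contains the Dataprovider.com marker."""
    return UA_MARKER in user_agent.lower()


def dataprovider_request_ips(log_lines):
    """Return the source IPs of log lines whose user agent names Dataprovider.com."""
    ips = []
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if match and claims_to_be_dataprovider(match.group("user_agent")):
            ips.append(match.group("ip"))
    return ips


# Hypothetical log lines using documentation-only IP addresses.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (compatible; Dataprovider.com)"',
    '198.51.100.4 - - [10/Oct/2023:13:56:01 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

print(dataprovider_request_ips(SAMPLE_LOG))  # ['203.0.113.7']
```

A matching user agent only shows what a request claims to be, since any client can send this string.
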
How can you verify Dataprovider’s crawler?

You can verify whether a web crawler that is accessing your website is actually Dataprovider’s crawler. This is useful if you are concerned that spammers or other troublemakers are accessing your website while claiming to be Dataprovider’s crawler.

To verify whether a problematic request actually comes from our crawler, run a reverse DNS lookup on the source IP address of that request.

There are a number of free online tools with which you can perform an immediate reverse DNS lookup: once you enter the IP address, the tool returns the associated hostname. If the returned hostname doesn’t contain Dataprovider or Lipperhey, our crawler is not the one sending those requests to your website.
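
If you prefer to script this check instead of using an online tool, the following minimal sketch does the same lookup with Python's standard `socket` module. The function names are hypothetical, and the only rule it applies is the one described above: the hostname should contain "dataprovider" or "lipperhey".

```python
import socket
from typing import Callable, Optional

# Hostname fragments named in this article.
EXPECTED_MARKERS = ("dataprovider", "lipperhey")


def reverse_lookup(ip: str) -> Optional[str]:
    """Return the hostname for an IP address, or None if there is no PTR record."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return hostname


def looks_like_dataprovider(
    ip: str, resolver: Callable[[str], Optional[str]] = reverse_lookup
) -> bool:
    """True if the reverse DNS hostname of ip contains one of the expected markers."""
    hostname = resolver(ip)
    if hostname is None:
        return False
    hostname = hostname.lower()
    return any(marker in hostname for marker in EXPECTED_MARKERS)
```

Call `looks_like_dataprovider("<source IP from your logs>")` to run a live lookup. The `resolver` parameter exists so the check can be tested without network access.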