Linkbot is Register's web crawler (also known as a "spider"). It crawls the Web to extract linked data, primarily media mentions of companies and related persons. Crawling is the process by which Linkbot discovers new and updated pages. We use a cluster of computers to crawl millions of pages related to Estonian companies and decision makers. The crawled and processed data is made available to a wider audience as linked data via the Register Graph API (see https://developers.ir.ee for further details). It is also used at https://www.inforegister.ee, in the Inforegister NOW! mobile app (see https://play.google.com/store/apps/details?id=ee.ir.anow&hl=et for further details), and in other applications of Register OÜ.

Linkbot's crawling begins with a seed list of webpage URLs generated from previous crawls. As Linkbot visits each of these pages, it detects the links on the page and adds them to its list of pages to crawl.
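
The seed-and-follow process described above can be illustrated with a minimal Python sketch. This is not Linkbot's actual implementation: the names `crawl`, `LinkExtractor`, `fetch` and `PAGES` are hypothetical, and a small in-memory dictionary stands in for real HTTP requests so that the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl starting from the seed list.

    `fetch` is a hypothetical callable that returns the HTML of a URL,
    or None if the page cannot be retrieved.
    """
    frontier = deque(seeds)   # pages waiting to be crawled
    seen = set(seeds)         # every URL ever queued, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop the #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return visited


# Tiny in-memory "web" standing in for real HTTP requests.
PAGES = {
    "https://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "https://example.com/a": '<a href="/">home</a> <a href="/b">B</a>',
    "https://example.com/b": "no links here",
}

order = crawl(["https://example.com/"], PAGES.get)
print(order)
```

The `seen` set is what keeps the crawler from queuing the same page twice when several pages link to it.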

How Linkbot accesses your site

Linkbot should not access your site more than once per second on average. However, due to network delays, the rate may vary over short periods of time. To request a lower crawl rate, please send an e-mail to support@ir.ee with the subject "Request crawl rate change".
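
As an illustration of a one-request-per-second limit, the sketch below enforces a minimum interval between requests to the same host. It is a simplified assumption about how such a limit could work, not a description of Linkbot's scheduler: it applies a strict minimum gap rather than an average, and the `Throttle` class and its parameters are hypothetical. The clock and sleep functions are injectable so that the demo runs instantly with a fake clock.

```python
import time


class Throttle:
    """Enforces a minimum interval between requests to one host."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = {}  # host -> earliest time of the next request

    def wait(self, host):
        """Block until `host` may be requested again; return the delay used."""
        now = self.clock()
        ready_at = self.next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self.sleep(delay)
        self.next_allowed[host] = max(now, ready_at) + self.min_interval
        return delay


# Demo with a fake clock, so no real waiting takes place.
fake_now = [0.0]


def fake_clock():
    return fake_now[0]


def fake_sleep(seconds):
    fake_now[0] += seconds


throttle = Throttle(min_interval=1.0, clock=fake_clock, sleep=fake_sleep)
delays = [throttle.wait("example.com") for _ in range(3)]
print(delays)  # [0.0, 1.0, 1.0]
```

With the default arguments, `Throttle()` uses the real monotonic clock and `time.sleep`, so each call to `wait` for the same host is spaced at least one second apart.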

Blocking Linkbot from your content

Linkbot honors robots.txt when crawling content on your site. The simplest way to block Linkbot from certain parts of your website is therefore to list the relevant files and folders in robots.txt.
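
The following sketch shows what such robots.txt rules might look like and how to check them locally with Python's standard `urllib.robotparser` module. The user-agent token `Linkbot` and the example paths are assumptions made for illustration; check your server logs for the exact token the crawler sends.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content. The "Linkbot" token and the paths
# are assumptions for illustration only.
ROBOTS_TXT = """\
User-agent: Linkbot
Disallow: /private/
Disallow: /drafts/report.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

results = {}
for path in ("/private/data.html", "/drafts/report.html", "/news/index.html"):
    allowed = parser.can_fetch("Linkbot", "https://example.com" + path)
    results[path] = allowed
    print(path, "allowed" if allowed else "blocked")
```

Here the folder `/private/` and the single file `/drafts/report.html` are blocked for the `Linkbot` group, while everything else remains crawlable.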

After you create or change a robots.txt file, there may be a small delay before Linkbot discovers the change. If Linkbot is still crawling content you have blocked in robots.txt, check that the robots.txt file is in the top-level directory of your web server.
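
To make the location requirement concrete, this small sketch derives the address at which a crawler would look for robots.txt, given any page URL on the site. The helper name `robots_url` is hypothetical; the rule that the file sits at the root of the host follows the usual robots.txt convention.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the top-level robots.txt URL for the host of `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/news/2020/article.html?id=7"))
# https://www.example.com/robots.txt
```

A file placed in a subdirectory, such as `/news/robots.txt`, would not be found at this address and so would have no effect.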

