🕸 Domains

Domain-level parameters can be reached from the Administration interface by clicking Domain settings.

_images/domain.png

Domain settings are created automatically during crawling, but can also be created or updated manually.

Browse mode

When the collection’s Default browse mode is set to Detect, the domain’s Browse mode option defines which browsing method to use. When the domain’s Browse mode is also set to Detect, the system detects the optimal browsing mode the next time the domain is accessed and updates this option to Chromium, Firefox or Python Requests.
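The precedence described above can be sketched as a small function. This is only an illustration: the `BrowseMode` enum, `resolve_browse_mode` and the `detect` callback are hypothetical names, and how the actual detection works is not described here. The sketch also assumes that a collection default other than Detect is used directly, without consulting the domain option.

```python
from enum import Enum


class BrowseMode(Enum):
    # Hypothetical names mirroring the options listed in the text.
    DETECT = "detect"
    CHROMIUM = "chromium"
    FIREFOX = "firefox"
    REQUESTS = "requests"


def resolve_browse_mode(collection_default, domain_mode, detect):
    """Return (mode to use now, value to store on the domain).

    `detect` is a caller-supplied callback standing in for the
    automatic detection step; it must return a non-DETECT mode.
    """
    if collection_default is not BrowseMode.DETECT:
        # Assumption: a fixed collection default applies as-is and the
        # domain option is left unchanged.
        return collection_default, domain_mode
    if domain_mode is not BrowseMode.DETECT:
        # The domain already has a concrete browsing method.
        return domain_mode, domain_mode
    # Both are set to Detect: run detection and persist the result.
    detected = detect()
    return detected, detected
```

With both settings on Detect, the second element of the returned tuple is the value that would replace Detect in the domain settings.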

Ignore robots.txt

By default, the crawler honors the robots.txt 🤖 of the domain and follows the rules that apply to its User Agent. When this option is enabled, the crawler ignores all robots.txt rules and crawls the pages of the domain unconditionally.
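The effect of this option can be illustrated with Python's standard `urllib.robotparser` module. This is a minimal sketch, not the crawler's own code: `may_crawl`, its `ignore_robots` parameter and the `ExampleBot` User Agent are hypothetical names chosen for the example.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(url, robots_txt, user_agent, ignore_robots=False):
    """Decide whether `url` may be crawled.

    `ignore_robots` stands in for the "Ignore robots.txt" option:
    when True, the rules are not consulted at all.
    """
    if ignore_robots:
        return True
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


robots = "User-agent: *\nDisallow: /private/\n"
print(may_crawl("https://example.com/private/page", robots, "ExampleBot"))
print(may_crawl("https://example.com/private/page", robots, "ExampleBot",
                ignore_robots=True))
```

The first call is refused by the `Disallow` rule, while the second bypasses the rules entirely.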

Robots.txt status

One of:

  • Unknown: the file has not been processed yet

  • Empty: there is no robots.txt or it’s empty

  • Loaded: the file has been successfully loaded

Robots.txt allow/disallow rules

This contains the rules relevant to the crawler’s User Agent.
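As an illustration of what "relevant to the User Agent" means, the sketch below extracts the allow/disallow rules of the group that matches a given User Agent, falling back to the `*` group. It is a deliberately simplified, hand-written parser (case-insensitive substring matching, first match wins) and is not claimed to reproduce the crawler's actual selection logic; `rules_for_agent` and `ExampleBot` are hypothetical names.

```python
def rules_for_agent(robots_txt, user_agent):
    """Return the (directive, path) rules that apply to `user_agent`.

    Simplified: selects the first group whose User-agent token is a
    case-insensitive substring of `user_agent`, else the first "*" group.
    """
    groups = []              # list of (agent tokens, rules)
    agents, rules = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if rules:
                # A User-agent line after rules starts a new group.
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif key in ("allow", "disallow") and agents:
            rules.append((key, value))
    if agents:
        groups.append((agents, rules))

    ua = user_agent.lower()
    fallback = None
    for group_agents, group_rules in groups:
        if any(a and a != "*" and a in ua for a in group_agents):
            return group_rules
        if "*" in group_agents and fallback is None:
            fallback = group_rules
    return fallback or []


robots = """User-agent: *
Disallow: /private/

User-agent: ExampleBot
Allow: /private/public/
Disallow: /private/
"""
print(rules_for_agent(robots, "ExampleBot/1.0"))
print(rules_for_agent(robots, "OtherBot"))
```

The first call returns the rules of the `ExampleBot` group; the second falls back to the `*` group.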