Domain Settings#

Domain-level parameters can be reached from the Administration interface by clicking on Domain settings.

_images/domain_setting.png

Domain settings are automatically created during crawling, but they can also be created or updated manually.

Browse mode#

When the policy’s Default browse mode is set to Detect, the domain’s Browse mode option defines which browsing method to use. While its value is Detect, the browsing mode is detected the next time a page of the domain is accessed, and the option is switched to either Chromium, Firefox or Python Requests.
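The detection itself happens inside the crawler; as a rough illustration of the idea only (a hypothetical heuristic, not the actual detection logic), a detector might first fetch the page over plain HTTP and fall back to a browser engine when the content appears to require JavaScript:

```python
import requests


def detect_browse_mode(url: str, user_agent: str) -> str:
    """Illustrative sketch: pick 'requests' when the page renders without
    JavaScript, otherwise fall back to a browser such as Chromium."""
    response = requests.get(url, headers={"User-Agent": user_agent}, timeout=10)
    html = response.text.lower()
    # Hypothetical heuristic: pages dominated by <script> tags or carrying a
    # <noscript> fallback are assumed to need a real browser to render.
    needs_js = "<noscript>" in html or html.count("<script") > html.count("<p")
    return "chromium" if needs_js else "requests"
```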

Ignore robots.txt#

By default, the crawler honors the domain’s robots.txt 🤖 and follows the rules that apply to its User Agent. When this option is enabled, the crawler ignores all robots.txt rules and crawls pages of the domain unconditionally.
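As a point of reference, the default behavior mirrors what Python’s standard `urllib.robotparser` does when matching rules against a User Agent. The sketch below (with a hypothetical User Agent and URLs) is not the crawler’s own code, just an illustration of how robots.txt rules are honored:

```python
from urllib import robotparser

# Load the domain's robots.txt and evaluate it the way a well-behaved
# crawler would (standard-library sketch, not the crawler's implementation).
parser = robotparser.RobotFileParser()
parser.set_url("https://example.com/robots.txt")
parser.read()

# Rules are matched against the crawler's User Agent; a URL disallowed for
# one agent may still be allowed for another.
if parser.can_fetch("MyCrawler/1.0", "https://example.com/private/page.html"):
    print("robots.txt allows crawling this page")
else:
    print("robots.txt forbids crawling this page (unless 'Ignore robots.txt' is set)")
```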

Robots.txt status#

One of:

  • Unknown: the file has not been processed yet

  • Empty: there is no robots.txt or it’s empty

  • Loaded: the file has been successfully loaded

Robots.txt allow/disallow rules#

This contains the rules relevant to the crawler’s User Agent.
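For illustration, assume the domain publishes a hypothetical robots.txt such as the one below; only the group matching the crawler’s User Agent (here `MyCrawler`), or the `*` group when no specific match exists, would be reflected in this field:

```
User-agent: *
Disallow: /admin/

User-agent: MyCrawler
Allow: /public/
Disallow: /private/
```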