🕸 Domains

Domain-level parameters can be reached from the Administration interface by clicking on Domain settings.

Domain settings are created automatically during crawling, but they can also be created or updated manually.

Browse mode

When the collection’s Default browse mode is set to Detect, the domain’s Browse mode option defines which browsing method to use. When the domain’s browse mode is also set to Detect, the system automatically detects the optimal browsing mode the next time the domain is accessed, and updates this option to one of Chromium, Firefox, or Python Requests.
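The resolution order described above can be sketched as follows. This is a minimal illustration, not the crawler's actual code: the `BrowseMode` enum, the `resolve_browse_mode` function, and the `detect` callback are hypothetical names, and the sketch assumes that a collection default other than Detect is used as-is.

```python
from enum import Enum
from typing import Callable, Tuple


class BrowseMode(Enum):
    # Hypothetical names mirroring the options listed in the text.
    DETECT = "Detect"
    CHROMIUM = "Chromium"
    FIREFOX = "Firefox"
    REQUESTS = "Python Requests"


def resolve_browse_mode(
    collection_default: BrowseMode,
    domain_mode: BrowseMode,
    detect: Callable[[], BrowseMode],
) -> Tuple[BrowseMode, BrowseMode]:
    """Return (mode to use now, value to store in the domain setting)."""
    # Assumption: a concrete collection default applies directly,
    # and the domain setting is left unchanged.
    if collection_default is not BrowseMode.DETECT:
        return collection_default, domain_mode
    # Collection is on Detect: the domain's option decides.
    if domain_mode is BrowseMode.DETECT:
        # Domain is also on Detect: run detection and persist the result.
        domain_mode = detect()
    return domain_mode, domain_mode
```

With both settings on Detect, the detected mode is returned and stored, so later accesses to the domain reuse it without detecting again.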

Ignore robots.txt

By default, the crawler honors the robots.txt 🤖 of the domain and follows the rules that apply to its User Agent. When this option is enabled, the crawler ignores all robots.txt rules and crawls pages of the domain unconditionally.
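To illustrate the effect of this option, here is a small sketch built on Python's standard `urllib.robotparser` module. It does not claim to reproduce how the crawler evaluates rules internally: the `may_crawl` helper, the `ignore_robots` flag, the `ExampleBot` agent, and the sample file are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for this example.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def may_crawl(url: str, user_agent: str, robots_txt: str,
              ignore_robots: bool = False) -> bool:
    """Decide whether a URL may be crawled for the given User Agent."""
    if ignore_robots:
        # Option enabled: robots.txt is not consulted at all.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `may_crawl("https://example.com/private/a", "ExampleBot", ROBOTS_TXT)` is refused by the first group, while the same call with `ignore_robots=True` is allowed.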

Robots.txt status

One of:

Unknown: the file has not been processed yet
Empty: there is no robots.txt or it’s empty
Loaded: the file has been successfully loaded
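The three states can be modelled as a simple enumeration. The sketch below is only an illustration of the definitions above: `RobotsStatus` and `robots_status` are hypothetical names, and treating a whitespace-only file as empty is an assumption.

```python
from enum import Enum
from typing import Optional


class RobotsStatus(Enum):
    UNKNOWN = "Unknown"  # the file has not been processed yet
    EMPTY = "Empty"      # no robots.txt, or an empty one
    LOADED = "Loaded"    # the file was loaded successfully


def robots_status(processed: bool, content: Optional[str]) -> RobotsStatus:
    """Map a fetch result to a status; content is None when no file exists."""
    if not processed:
        return RobotsStatus.UNKNOWN
    # Assumption: a file containing only whitespace counts as empty.
    if content is None or not content.strip():
        return RobotsStatus.EMPTY
    return RobotsStatus.LOADED
```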

Robots.txt allow/disallow rules

This contains the rules relevant to the crawler’s User Agent.
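As a rough idea of what selecting the relevant rules involves, the sketch below extracts the allow/disallow lines of the groups that match a given User Agent, falling back to the `*` group when no specific group matches. It is a simplified parser written for illustration; `rules_for_agent` is a hypothetical helper and the matching logic is an assumption, not the crawler's implementation.

```python
from typing import List, Tuple

Rule = Tuple[str, str]  # ("allow" | "disallow", path)


def rules_for_agent(robots_txt: str, user_agent: str) -> List[Rule]:
    """Return the allow/disallow rules that apply to user_agent."""
    groups: List[Tuple[List[str], List[Rule]]] = []
    agents: List[str] = []
    rules: List[Rule] = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:
                # A User-agent line after rules starts a new group.
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))

    ua = user_agent.lower()
    # Prefer groups naming this agent; otherwise use the wildcard group.
    specific = [rule for names, group_rules in groups
                if any(name != "*" and name in ua for name in names)
                for rule in group_rules]
    if specific:
        return specific
    return [rule for names, group_rules in groups
            if "*" in names for rule in group_rules]
```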