Documents#
The list of all indexed documents can be reached from the Administration interface by clicking on Documents. Regular expressions can be used in the search bar to match URLs or page titles.
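As an illustration of the kind of pattern that can match URLs or titles, here is a minimal sketch using Python's `re` module. The sample documents and the `filter_documents` helper are hypothetical, and the regular expression dialect accepted by the search bar may differ from Python's.

```python
import re

# Hypothetical sample data standing in for indexed documents.
documents = [
    {"url": "https://example.com/blog/2023/intro", "title": "Introduction"},
    {"url": "https://example.com/docs/install", "title": "Installation guide"},
    {"url": "https://example.org/about", "title": "About us"},
]


def filter_documents(pattern, docs):
    """Return the documents whose URL or title matches the pattern."""
    regex = re.compile(pattern)
    return [d for d in docs if regex.search(d["url"]) or regex.search(d["title"])]


# Match every document under /blog/ or /docs/ on example.com.
matches = filter_documents(r"^https://example\.com/(blog|docs)/", documents)
for doc in matches:
    print(doc["url"], "-", doc["title"])
```

Anchoring the pattern with `^` and escaping the dot in the host name keeps it from matching unrelated URLs that merely contain similar text.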
The document page contains fields about the crawl status of the page:
Status#
Shows if the document triggered an error during its last crawl.
Error#
The error that was triggered during the last crawl, if any.
Crawl DT#
The interval before the next recrawl of the document.
Recursion remaining#
The number of recursion levels remaining, when the matching policy crawls Depending on depth.
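One way to picture a remaining recursion counter is a depth-limited crawl in which each newly discovered page receives one level less than the page that linked to it. The sketch below is an assumption about the general technique, not the application's actual code; the `links` graph and the `crawl` function are hypothetical.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": [],
    "d": ["e"],
    "e": [],
}


def crawl(start, max_depth):
    """Breadth-first crawl that records the recursion level left for each page."""
    remaining = {start: max_depth}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if remaining[page] == 0:
            continue  # No recursion left: links on this page are not followed.
        for target in links[page]:
            if target not in remaining:
                remaining[target] = remaining[page] - 1
                queue.append(target)
    return remaining


levels = crawl("a", 2)
print(levels)
```

In this sketch, page `d` is crawled with no recursion remaining, so its link to `e` is never followed.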
Rejected by robots.txt#
Indicates whether the URL was not crawled due to a robots.txt rule. If necessary, robots.txt can be ignored in the Domain settings.
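To illustrate how a robots.txt rule can cause a URL to be rejected, here is a small sketch using Python's standard `urllib.robotparser`. The rules, the `ExampleBot` user agent, and the `rejected_by_robots` helper are hypothetical, and the crawler's own robots.txt handling may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a site might serve it.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def rejected_by_robots(url, user_agent="ExampleBot"):
    """Return True when the parsed rules forbid fetching the URL."""
    return not parser.can_fetch(user_agent, url)


print(rejected_by_robots("https://example.com/private/page"))
print(rejected_by_robots("https://example.com/public"))
```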
Too many redirects#
Indicates whether the page was not crawled due to too many redirections. The limit can be set in the configuration file.
Show on homepage#
When the browsable home option is enabled, this parameter controls whether the document is available from the homepage. (See Archiving)