Monitor Websites for Specific Keywords
Sosse can be used to receive updates when a new page containing a specific keyword is published on a website. This functionality can be applied to a variety of use cases, such as monitoring merchant websites for promotional offers, or watching for event announcements.
For this use case, we'll monitor a website for common functional errors, like missing pages, server crashes, forbidden access, and database issues, and generate an Atom feed of faulty pages.
Creating the Collections
Collections are essential for controlling how Sosse accesses and logs content from websites. For more details, see the Collections documentation.
Create a collection for the website that we want to monitor, with the parameters:
In the Crawl tab, set Unlimited depth URL regex to ^https://my.broken-website.com/.* to target the website.
In the Archive tab, disable Archive content (as we don't need to archive the original pages).
In the Recurrence tab, set Crawl frequency to Constant time and clear the Recrawl dt max field.
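To check which URLs the pattern covers before crawling, it can be tried out in Python. This is only a sketch: Python's `re` module stands in for Sosse's own matching, and the sample URLs are made up for illustration.

```python
import re

# Pattern from the collection's "Unlimited depth URL regex" field.
URL_REGEX = r"^https://my.broken-website.com/.*"

# Hypothetical URLs, used only to illustrate what the pattern covers.
samples = [
    "https://my.broken-website.com/",
    "https://my.broken-website.com/shop/cart?id=3",
    "http://my.broken-website.com/",   # plain HTTP: not matched
    "https://other-website.com/",      # different host: not matched
]

# Keep the URLs that the pattern accepts.
matched = [url for url in samples if re.match(URL_REGEX, url)]

for url in samples:
    print("match" if url in matched else "skip ", url)
```

The dots in the pattern are unescaped, so each one matches any character. Writing the host as `my\.broken-website\.com` is stricter if that matters for your site.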
Start Crawling
To start crawling, go to the Crawl a new URL page and
enter the URL of the homepage:
https://my.broken-website.com/.
Check the parameters, then click Add to Crawl Queue. Once confirmed, Sosse will crawl the website every day and log any pages that match the regular expression from the collection.
Generate Atom Feed
To get notified of errors, create a search with the following parameters:
Sort: Last modified descending. This ordering causes the feed to generate new entries for previously known pages whenever they are modified.
Search options:
Action: Keep
Field: Document
Operator: Matching Regex
Value: (Database Connection Failed|Internal Server Error|Not Found|Forbidden|Bad Gateway|Service Unavailable|Gateway Timeout|Request Timeout)
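To get a feel for which pages such a filter would keep, the error phrases can be tried against some page text in Python. This is a minimal sketch under stated assumptions: the page texts are invented, and Python's `re` module (case-sensitive by default) stands in for Sosse's Matching Regex operator, whose exact matching rules may differ.

```python
import re

# Error phrases taken from the Value field of the search.
ERROR_PHRASES = [
    "Database Connection Failed",
    "Internal Server Error",
    "Not Found",
    "Forbidden",
    "Bad Gateway",
    "Service Unavailable",
    "Gateway Timeout",
    "Request Timeout",
]
ERROR_REGEX = re.compile("(" + "|".join(ERROR_PHRASES) + ")")

# Hypothetical crawled pages: URL -> extracted text.
pages = {
    "https://my.broken-website.com/": "Welcome to our shop",
    "https://my.broken-website.com/old": "404 Not Found",
    "https://my.broken-website.com/admin": "403 Forbidden",
}

# Keep the pages whose text contains at least one error phrase.
faulty = [url for url, text in pages.items() if ERROR_REGEX.search(text)]
print(faulty)
```

Generic phrases such as "Not Found" can also appear in ordinary content (for example, a search page reporting "No results. Not Found"), so the list may need tuning for a given website.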
The pages in error can then be followed by subscribing to the
Atom results feed (see Atom feeds).
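If you prefer a script over a feed reader, the feed can be read with Python's standard library. The sketch below parses an inline sample so that it runs offline. The sample XML is hypothetical and only follows the generic Atom format, so the exact elements Sosse emits may differ. Fetching the real feed URL, which is the one exposed by the search results page, is left out.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content, standing in for the downloaded feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Faulty pages</title>
  <entry>
    <id>https://my.broken-website.com/old</id>
    <title>404 Not Found</title>
    <link href="https://my.broken-website.com/old"/>
    <updated>2024-01-02T00:00:00Z</updated>
  </entry>
  <entry>
    <id>https://my.broken-website.com/admin</id>
    <title>403 Forbidden</title>
    <link href="https://my.broken-website.com/admin"/>
    <updated>2024-01-01T00:00:00Z</updated>
  </entry>
</feed>"""


def list_entries(feed_xml):
    """Return (title, link, updated) tuples for each entry in an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href", "") if link is not None else ""
        entries.append((title, href, updated))
    return entries


for title, href, updated in list_entries(SAMPLE_FEED):
    print(updated, href, title)
```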
Additional Options
You may need to update the collection to use a browser if the site relies on JavaScript or requires authentication to access private areas. It can also be useful to configure the Atom feed so that it keeps working while anonymous searches are disabled. Once configured, you can integrate the feed with services like Zapier or IFTTT to trigger notifications whenever a new error is detected.