Monitor Websites for Specific Keywords

Sosse can be used to receive updates when a new page containing a specific keyword is published on a website. This functionality can be applied to a variety of use cases, such as monitoring merchant websites for promotional offers, or watching for event announcements.

For this use case, we’ll monitor a website for common functional errors, like missing pages, server crashes, forbidden access, and database issues, and generate an Atom feed of faulty pages.

Creating the Collections

Collections are essential for controlling how Sosse accesses and logs content from websites. For more details, see the Collections documentation.

Create a collection for the website we want to monitor, with the following parameters:

  • In the ⚑ Crawl tab, set Unlimited depth URL regex to ^https://my.broken-website.com/.* to target the website.

  • In the 🔖 Archive tab, disable Archive content (we only need to detect errors, not keep copies of the monitored pages).

  • In the 🕑 Recurrence tab, set Crawl frequency to Constant time and clear the Recrawl dt max field.

../_images/guide_feed_website_monitor_collections.png
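Before starting the crawl, it can help to check which URLs the Unlimited depth URL regex actually covers. The sketch below uses Python's re module as a stand-in for Sosse's own matching (an assumption; Sosse's exact matching rules are not described here), and the probe URLs are made up. It also illustrates that the unescaped dots in ^https://my.broken-website.com/.* match any character; STRICT_SCOPE is a hypothetical stricter variant that escapes them.

```python
import re

# Crawl scope exactly as entered in the Crawl tab. The dots are unescaped,
# so each "." matches any single character, not only a literal dot.
CRAWL_SCOPE = re.compile(r"^https://my.broken-website.com/.*")

# Stricter alternative with escaped dots (hypothetical, not required).
STRICT_SCOPE = re.compile(r"^https://my\.broken-website\.com/.*")


def in_scope(pattern: re.Pattern, url: str) -> bool:
    """Return True if `url` matches the crawl scope pattern."""
    return pattern.search(url) is not None


# Made-up URLs to probe the patterns with.
for url in [
    "https://my.broken-website.com/",
    "https://my.broken-website.com/shop/cart",
    "https://other-site.com/page",
    "https://myXbroken-website.com/",
]:
    print(url, in_scope(CRAWL_SCOPE, url), in_scope(STRICT_SCOPE, url))
```

Both patterns accept every page of the monitored site and reject other hosts; only the escaped version also rejects look-alike hosts such as myXbroken-website.com.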

Start Crawling

To start crawling, go to the Crawl a new URL page and enter the URL of the homepage: https://my.broken-website.com/.

Check the parameters, then click Add to Crawl Queue. Once confirmed, Sosse starts crawling the site, indexes every page that matches the collection's URL regex, and recrawls these pages every day.

Generate Atom Feed

To get notified of errors, create a search with the following parameters:

  • Sort: Last modified descending. This ordering causes the feed to generate new entries for previously known pages whenever they are modified.

  • Search options:

    • Action: Keep

    • Field: Document

    • Operator: Matching Regex

    • Value:

      (Database Connection Failed|Internal Server Error|Not Found|Forbidden|Bad Gateway|Service Unavailable|Gateway Timeout|Request Timeout)
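To get a feel for what the Keep action with the Matching Regex operator retains, the following sketch applies the same pattern to a few made-up page texts using Python's re module. It assumes a plain, case-sensitive search of the page text; Sosse's own search may behave differently (for example regarding case), so treat this only as a quick way to test the pattern. The keep helper and the sample pages are hypothetical.

```python
import re

# The "Value" of the Matching Regex search option, on a single logical line.
ERROR_PATTERN = re.compile(
    r"(Database Connection Failed|Internal Server Error|Not Found|Forbidden|"
    r"Bad Gateway|Service Unavailable|Gateway Timeout|Request Timeout)"
)


def keep(document_text: str) -> bool:
    """Approximate the Keep action: retain a page if its text matches."""
    return ERROR_PATTERN.search(document_text) is not None


# Made-up page texts keyed by URL.
pages = {
    "https://my.broken-website.com/": "Welcome to our shop",
    "https://my.broken-website.com/old": "404 Not Found",
    "https://my.broken-website.com/db": "Error: Database Connection Failed",
}

faulty = [url for url, text in pages.items() if keep(text)]
print(faulty)
```

Short phrases such as Not Found can also appear in legitimate content (for instance a store's "Product Not Found" message), so expect some false positives and tighten the pattern if they become noisy.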
      

You can then follow the pages in error by subscribing to the Atom feed of the search results (see Atom feeds).

../_images/guide_feed_website_monitor_error_search.png
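If you prefer to consume the feed from a script rather than a feed reader, a minimal sketch using only the Python standard library might look like this. It reads the standard Atom entry elements (id, title, link, updated); which values Sosse puts in these fields is not covered here, the sample feed is invented, and the URL you would pass to fetch_feed is a placeholder for the feed address of your own search.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Namespace used by all elements of an Atom 1.0 feed (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def parse_entries(feed_xml: bytes) -> list[dict]:
    """Return id, title, link and updated for every <entry> in the feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "id": entry.findtext("atom:id", default="", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "link": link.get("href", "") if link is not None else "",
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
        })
    return entries


def fetch_feed(url: str) -> bytes:
    """Download the raw feed; `url` is a placeholder for your search's feed URL."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


# Invented sample feed with a single faulty page, for illustration only.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Error search</title>
  <entry>
    <id>https://my.broken-website.com/old</id>
    <title>404 Not Found</title>
    <link href="https://my.broken-website.com/old"/>
    <updated>2024-01-01T00:00:00Z</updated>
  </entry>
</feed>"""

for item in parse_entries(SAMPLE):
    print(item["updated"], item["title"], item["link"])
```

Parsing bytes rather than a decoded string lets the XML parser honor the encoding declared in the feed itself.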

Additional Options

If the site relies on JavaScript, or requires authentication to access private areas, you may need to configure the Collection to use a browser. It can also be useful to configure the Atom feed so that it remains accessible when anonymous searches are disabled. Once this is set up, you can connect the feed to services like Zapier or IFTTT to trigger a notification whenever a new error is detected.
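Whether notifications go through Zapier, IFTTT or a small script of your own, the notifier has to avoid re-sending entries it has already reported. The sketch below is one hypothetical approach, not a Sosse feature: it remembers each entry by its id and updated values, so a known page that is modified and reappears in the feed (as described for the Last modified descending sort) triggers a new alert. It assumes each entry is a dictionary with id, title, link and updated keys; the notify helper posts JSON to a webhook URL, which is a placeholder for whatever endpoint your automation service provides.

```python
import json
import urllib.request


def new_entries(entries: list[dict], seen: set) -> list[dict]:
    """Return entries not reported yet and record them in `seen`.

    The key combines the entry id with its updated timestamp (an assumption
    about the feed), so a known page that changes is reported again.
    """
    fresh = []
    for entry in entries:
        key = (entry["id"], entry["updated"])
        if key not in seen:
            seen.add(key)
            fresh.append(entry)
    return fresh


def notify(webhook_url: str, entry: dict) -> None:
    """POST one faulty page as JSON to a webhook (placeholder URL)."""
    body = json.dumps({"title": entry["title"], "link": entry["link"]}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30):
        pass  # An HTTP error status raises urllib.error.HTTPError.


# Usage with made-up entries: the second poll reports nothing new.
seen: set = set()
batch = [{
    "id": "https://my.broken-website.com/old",
    "title": "404 Not Found",
    "link": "https://my.broken-website.com/old",
    "updated": "2024-01-01T00:00:00Z",
}]
print(len(new_entries(batch, seen)))  # 1
print(len(new_entries(batch, seen)))  # 0
```

In a real setup you would persist the seen set between runs (for example in a small file) and call notify for each entry returned by new_entries.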