Automatically Tagging Promotions and Deals with AI and Webhooks¶
This guide shows how to configure Sosse to automatically generate ⭐ Tags for web pages — such as promotions, discounts, or other valuable offers — using AI. It uses 📡 Webhooks and ChatGPT (or other AI services) to analyze page content and produce relevant tags. For demonstration purposes, this guide uses Spree, a popular open-source eCommerce platform.
Note
While this demo uses OpenAI’s ChatGPT API, other AI providers such as Claude (Anthropic), PaLM (Google), or Cohere can be used. Webhooks are fully customizable.
Warning
To use ChatGPT, you’ll need an OpenAI API key.
Set Up a Crawl Policy¶
First, define a crawl policy to handle the web pages you want to analyze. See the ⚡ Crawl Policies for more details.
Navigate to
⚡ Crawl Policiesin the admin panel.Create a New Policy:
Use an URL pattern that matches URLs of interest, ensuring that only relevant pages are processed for tag generation:
^https://(www\.)?(example-shop|forum|blog)\.com/.*/(promo|deal|product).*
Set
RecursiontoNever Crawlto limit recursion, we’ll process only manually queued URLs.Under the
🌍 Browsertab, chooseFirefox, orChromiumas the browser.In the
Scriptfield, add a script to extract the HTML description of the product and save it in the document’smetadatafield. For example, for a Spree product page, you might use:const detailsText = document.getElementById('product-details-page'); const text = detailsText.innerHTML; return {metadata: {productDetails: text}};
Under the
🕑 Recurrence, select wanted crawl frequency (e.g.,Daily).
Note
While sending plain text to ChatGPT is more efficient, we opt to send the full HTML in this case. This allows ChatGPT to interpret visual elements, such as a strikethrough price, ensuring better contextual understanding of the data.
Page Crawling and Webhook Results¶
Navigate to the Crawl a new URL page and paste the product pages you want to index.
Click Confirm to queue the crawl jobs.
After the crawl jobs are completed, review the results on the 🔤 Documents page.
Access the full webhook response under the
📡 Webhookstab.
View the metadata generated by the webhook under the
📊 Metadatatab, which includes details like the product name and price.