Dealing with Captchas
User agent
By default, the crawlers send HTTP requests with a SOSSE User-Agent HTTP header.
This can sometimes lead websites to flag the crawler as a robot and display a Captcha.
To mitigate this, SOSSE can use the Fake user-agent library to simulate a real
browser user agent. This can be achieved with the following options in the configuration file:
user_agent: uncomment the option and make it empty
fake_user_agent_browser, fake_user_agent_os, fake_user_agent_platform: these control how the user agent is generated. It is probably best to set
fake_user_agent_platform to pc, as some websites may change their rendering on mobile platforms.
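
As a rough sketch, the relevant part of the configuration file might look like the fragment below. The [crawler] section name and the key=value syntax are assumptions about the file layout and should be checked against your own configuration file. The fake_user_agent_browser and fake_user_agent_os options are left at their defaults here.

```ini
# Assumed section name; check where these options live in your configuration file
[crawler]
# Uncommented and left empty, so that a generated user agent is used instead of SOSSE
user_agent=
# Restrict generated user agents to desktop browsers
fake_user_agent_platform=pc
```

The crawlers most likely need to be restarted before a change to the configuration file takes effect.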