Dealing with Captchas
User agent
By default, the crawlers send HTTP requests with a SOSSE
User-Agent HTTP header. This can sometimes lead websites to flag the
crawler as a robot and display a Captcha. To mitigate this, SOSSE can use the
Fake user-agent library to simulate the user agent of a real browser.
This can be achieved with the following options in the configuration file:
user_agent: uncomment the option and make it empty
fake_user_agent_browser, fake_user_agent_os, fake_user_agent_platform: these control how the user agent is generated. It's probably best to set
fake_user_agent_platform to pc, as some websites may change their rendering on mobile platforms.
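As a rough sketch, the options above might look as follows in the configuration file. The `[crawler]` section name and the `chrome` and `windows` values are assumptions made for illustration; only the empty `user_agent` and the `pc` platform come from the text above, so check the configuration reference for the accepted values.

```ini
[crawler]
# Uncommented and left empty, so that a generated user agent is used instead
user_agent=
# Assumed example values, not confirmed defaults
fake_user_agent_browser=chrome
fake_user_agent_os=windows
# "pc" avoids mobile-specific rendering on some websites
fake_user_agent_platform=pc
```

A change to the configuration file presumably takes effect only once the crawlers are restarted.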