SOSSE 🦦

SOSSE (Selenium Open Source Search Engine) is web archiving software, a crawler and a search engine written in Python, distributed under the GNU AGPLv3 license. It is hosted on both GitLab and GitHub; please use either site to open feature requests, bug reports or merge requests, or to start a discussion.

SOSSE's main features are:

  • 🌍 Browser-based crawling: SOSSE uses Mozilla Firefox or Google Chromium, driven by Selenium, to index pages that use JavaScript. Requests can also be used for faster crawling

  • 📚 Offline browsing: SOSSE can save an HTML copy of crawled pages, or take screenshots of them, to create archives suitable for offline browsing

  • 📉 Low resource requirements: SOSSE is entirely written in Python and uses PostgreSQL for data storage

  • 🔓 Authentication: the crawlers can submit authentication forms with provided credentials

  • 🔗 Search engine shortcuts: shortcut search queries can be used to redirect to external search engines (sometimes called “bang” searches)

  • 🔖 Search history: users can authenticate to log their search query history privately
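
To illustrate the search engine shortcuts feature, here is a minimal sketch of how a "bang" query could be turned into a redirect URL. It is not SOSSE's actual implementation: the `SHORTCUTS` table, the `!` prefix and the `resolve_shortcut` function are hypothetical names chosen for this example.

```python
from urllib.parse import quote_plus

# Hypothetical shortcut table: the names and URL templates below are
# illustrative only, not SOSSE's real configuration.
SHORTCUTS = {
    "w": "https://en.wikipedia.org/w/index.php?search={}",
    "ddg": "https://duckduckgo.com/?q={}",
}


def resolve_shortcut(query, prefix="!"):
    """Return an external search URL if the query contains a known
    shortcut (e.g. "!w"), otherwise None."""
    words = query.split()
    for i, word in enumerate(words):
        name = word[len(prefix):]
        if word.startswith(prefix) and name in SHORTCUTS:
            # Everything except the shortcut itself is the search terms.
            terms = " ".join(words[:i] + words[i + 1:])
            return SHORTCUTS[name].format(quote_plus(terms))
    return None


if __name__ == "__main__":
    print(resolve_shortcut("!w sea otter"))
    print(resolve_shortcut("sea otter"))
```

A query without a known shortcut returns `None`, in which case a search engine would fall back to its own index.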

See the documentation and screenshots.

Try it out

You can try the latest version with Docker:

docker run -p 8005:80 biolds/sosse:latest

Open http://127.0.0.1:8005/, and log in with user admin, password admin.
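
If the page does not load right away, the container may still be starting. The sketch below polls the URL until the server answers. It uses only the Python standard library; the `wait_for_http` helper and its default timings are assumptions for this example and not part of SOSSE.

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url, timeout=60.0, interval=1.0):
    """Poll `url` until an HTTP server answers or `timeout` seconds pass.

    Returns True as soon as any HTTP response is received, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True
        except urllib.error.HTTPError:
            # The server replied with an error status, so it is reachable.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or reset: the server is not ready yet.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Port 8005 matches the `-p 8005:80` mapping of the docker command.
    if wait_for_http("http://127.0.0.1:8005/"):
        print("SOSSE is reachable")
    else:
        print("No answer; check the container logs")
```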

To persist Docker data, or find alternative installation methods, please check the install pages.

Keep in touch

Join the Discord server to get help and share ideas!