Pip install

Dependencies

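Before installing SOSSE, you’ll need to manually install its software dependencies. The steps below rely at least on Python with virtualenv, PostgreSQL, and a web server such as Nginx.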

Package install

The installation can be done with the commands:

virtualenv /opt/sosse-venv/
/opt/sosse-venv/bin/pip install sosse
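
To confirm that the package actually landed in the virtualenv, a small check along the following lines may help. It is only a sketch: the helper name venv_has_package is made up for this example, and it relies on nothing more than pip show returning a non-zero status when a package is missing.

```shell
# Return 0 if the virtualenv at $1 has the pip package $2 installed.
# venv_has_package is a hypothetical helper, not part of SOSSE.
venv_has_package() {
    venv=$1
    package=$2
    # `pip show` exits non-zero when the package is not installed.
    [ -x "$venv/bin/pip" ] && "$venv/bin/pip" show "$package" >/dev/null 2>&1
}

# Example (run after the two commands above):
# venv_has_package /opt/sosse-venv sosse && echo "sosse is installed"
```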

Default Configuration

The default configuration and directories can be created with the commands:

mkdir -p /run/sosse /var/log/sosse /var/www/.cache /var/www/.mozilla /var/lib/sosse/downloads /var/lib/sosse/screenshots /var/lib/sosse/html
touch /var/log/sosse/crawler.log /var/log/sosse/debug.log /var/log/sosse/main.log /var/log/sosse/webserver.log
chown -R www-data:www-data /run/sosse /var/lib/sosse /var/www/.cache /var/www/.mozilla /var/log/sosse
mkdir /etc/sosse
/opt/sosse-venv/bin/sosse-admin default_conf > /etc/sosse/sosse.conf
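
Since the services run as www-data, a wrong owner on one of these directories is an easy mistake to miss. The sketch below lists any path that is not owned by the expected user. The helper name not_owned_by is an assumption of this example, and it only uses the standard find -user test.

```shell
# Print every path under the given directories that is NOT owned by the
# given user; empty output means the ownership is as expected.
# not_owned_by is a hypothetical helper, not part of SOSSE.
not_owned_by() {
    owner=$1
    shift
    find "$@" ! -user "$owner" -print
}

# Example, using some of the directories created above:
# not_owned_by www-data /run/sosse /var/lib/sosse /var/log/sosse
```

If the command prints nothing, the chown step above was applied to everything under the listed directories.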

Static files

Static files will be copied to their target location with the following command:

/opt/sosse-venv/bin/sosse-admin collectstatic --noinput --clear

Database setup

Database connection parameters can be changed in the /etc/sosse/sosse.conf file. You can find more information about each variable in the Configuration file reference.

Database creation

The PostgreSQL database can be created with the commands:

su - postgres -c "psql --command=\"CREATE USER sosse WITH PASSWORD 'CHANGE ME';\""
su - postgres -c "psql --command=\"CREATE DATABASE sosse OWNER sosse;\""

Replace the sosse username and the CHANGE ME password with appropriate values, and set them in the /etc/sosse/sosse.conf configuration file.
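
One possible way to avoid a hand-typed password is to generate a random one. The following is a minimal sketch, assuming /dev/urandom and the standard od and tr tools are available. The gen_password name is made up for this example, and hexadecimal output is chosen only because it needs no escaping inside the SQL statement.

```shell
# Print a random hexadecimal string built from N bytes of /dev/urandom
# (default 16 bytes, i.e. 32 characters).
# gen_password is a hypothetical helper, not part of SOSSE.
gen_password() {
    nbytes=${1:-16}
    # od dumps the bytes as hex pairs; tr strips the separators.
    od -An -N"$nbytes" -tx1 /dev/urandom | tr -d ' \t\n'
}

# Example: create the role with a generated password instead of 'CHANGE ME'.
# db_password=$(gen_password)
# su - postgres -c "psql --command=\"CREATE USER sosse WITH PASSWORD '$db_password';\""
# echo "$db_password"   # copy this value into /etc/sosse/sosse.conf
```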

Database schema

The database schema and initial data can be created with the following commands:

/opt/sosse-venv/bin/sosse-admin migrate
/opt/sosse-venv/bin/sosse-admin update_se

A default admin user with password admin can be created with:

/opt/sosse-venv/bin/sosse-admin default_admin

WSGI server

You can install a WSGI server of your choice. If you wish to install uWSGI, you can run:

/opt/sosse-venv/bin/pip install uwsgi

Then write the following configuration files:

nano /etc/sosse/uwsgi.ini

[uwsgi]

# Django-related settings
# the base directory (full path)
chdir           = /
# Django's wsgi file
module          = sosse.wsgi

# process-related settings
# master
master          = true
# maximum number of worker processes
processes       = 10
# the socket (use the full path to be safe)
socket          = /run/sosse/uwsgi.sock
# ... with appropriate permissions - may be needed
# chmod-socket    = 664
# clear environment on exit
vacuum          = true

nano /etc/sosse/uwsgi.params

uwsgi_param  QUERY_STRING       $query_string;
uwsgi_param  REQUEST_METHOD     $request_method;
uwsgi_param  CONTENT_TYPE       $content_type;
uwsgi_param  CONTENT_LENGTH     $content_length;

uwsgi_param  REQUEST_URI        $request_uri;
uwsgi_param  PATH_INFO          $document_uri;
uwsgi_param  DOCUMENT_ROOT      $document_root;
uwsgi_param  SERVER_PROTOCOL    $server_protocol;
uwsgi_param  REQUEST_SCHEME     $scheme;
uwsgi_param  HTTPS              $https if_not_empty;

uwsgi_param  REMOTE_ADDR        $remote_addr;
uwsgi_param  REMOTE_PORT        $remote_port;
uwsgi_param  SERVER_PORT        $server_port;
uwsgi_param  SERVER_NAME        $server_name;

After that, the server can be run in the background with:

mkdir /var/log/uwsgi
chown www-data:www-data /var/log/uwsgi
/opt/sosse-venv/bin/uwsgi --uid www-data --gid www-data --ini /etc/sosse/uwsgi.ini --logto /var/log/uwsgi/sosse.log &
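
Because uWSGI is started in the background, it can be useful to wait until its socket exists before moving on. The sketch below polls for the socket file once per second. The wait_for_socket name and the 10-second default are assumptions of this example, not something SOSSE or uWSGI provides.

```shell
# Wait up to $2 seconds (default 10) for a Unix socket to appear at $1.
# Returns 0 as soon as the socket exists, non-zero on timeout.
# wait_for_socket is a hypothetical helper, not part of SOSSE or uWSGI.
wait_for_socket() {
    sock=$1
    remaining=${2:-10}
    while [ "$remaining" -gt 0 ]; do
        [ -S "$sock" ] && return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    # Final check after the last sleep.
    [ -S "$sock" ]
}

# Example, using the socket path from /etc/sosse/uwsgi.ini:
# wait_for_socket /run/sosse/uwsgi.sock 15 || tail /var/log/uwsgi/sosse.log
```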

File permissions

It’s advised to restrict the permissions of the configuration files:

chown -R root:www-data /etc/sosse
chmod 750 /etc/sosse/
chmod 640 /etc/sosse/*
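
The resulting permissions can be double-checked with a small helper such as the one below. This is a sketch that assumes the stat -c option found in GNU coreutils (as on Debian-like systems), and has_mode is a name invented for this example.

```shell
# Return 0 if the octal permission bits of path $1 equal $2 (e.g. 640).
# Assumes `stat -c`, as provided by GNU coreutils.
# has_mode is a hypothetical helper, not part of SOSSE.
has_mode() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# Example, matching the chmod commands above:
# has_mode /etc/sosse 750 && has_mode /etc/sosse/sosse.conf 640 && echo "permissions OK"
```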

Web server

A web server like Nginx is required to relay requests to the WSGI server. Its configuration should be done as follows:

nano /etc/nginx/sites-available/sosse.conf

##
# You should look at the following URL's in order to grasp a solid understanding
# of Nginx configuration files in order to fully unleash the power of Nginx.
# https://www.nginx.com/resources/wiki/start/
# https://www.nginx.com/resources/wiki/start/topics/tutorials/config_pitfalls/
# https://wiki.debian.org/Nginx/DirectoryStructure
#
# In most cases, administrators will remove this file from sites-enabled/ and
# leave it as reference inside of sites-available where it will continue to be
# updated by the nginx packaging team.
#
# This file will automatically load configuration files provided by other
# applications, such as Drupal or Wordpress. These applications will be made
# available underneath a path with that package name, such as /drupal8.
#
# Please see /usr/share/doc/nginx-doc/examples/ for more detailed examples.
##

upstream django {
    server unix:///run/sosse/uwsgi.sock; # for a file socket
    #server 127.0.0.1:8001; # for a web port socket (we'll use this first)
}

# Default server configuration
#
server {
	listen 80 default_server;
	listen [::]:80 default_server;

	# SSL configuration
	#
	# listen 443 ssl default_server;
	# listen [::]:443 ssl default_server;
	#
	# Note: You should disable gzip for SSL traffic.
	# See: https://bugs.debian.org/773332
	#
	# Read up on ssl_ciphers to ensure a secure configuration.
	# See: https://bugs.debian.org/765782
	#
	# Self signed certs generated by the ssl-cert package
	# Don't use them in a production server!
	#
	# include snippets/snakeoil.conf;

	root /var/www/html;

	# Add index.php to the list if you are using PHP
	index index.html index.htm index.nginx-debian.html;

	server_name _;

    charset utf-8;

    # max upload size
    client_max_body_size 75M;   # adjust to taste

    # Django staticfiles
    location /static {
        alias /var/lib/sosse/static;
    }

    # Screenshots
    location /screenshots {
        alias /var/lib/sosse/screenshots;
    }

    # HTML snapshots
    location /snap {
        alias /var/lib/sosse/html/;
    }

    # Finally, send all non-media requests to the Django server.
    location / {
        uwsgi_pass  unix:/run/sosse/uwsgi.sock;
        include     /etc/sosse/uwsgi.params; # the uwsgi_params file you installed
    }
}

Then it should be enabled, and Nginx started:

rm -f /etc/nginx/sites-enabled/default
ln -s /etc/nginx/sites-available/sosse.conf /etc/nginx/sites-enabled/
nginx -g 'daemon on; master_process on;'
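
A typo in the configuration file would make Nginx fail at startup, so it may be worth validating the file first. The sketch below wraps the start command behind nginx -t, which checks the configuration without starting the server. The start_nginx_if_valid name is made up for this example.

```shell
# Start Nginx only if `nginx -t` (configuration syntax check) succeeds;
# otherwise leave the server stopped and return a non-zero status.
# start_nginx_if_valid is a hypothetical helper, not part of Nginx.
start_nginx_if_valid() {
    nginx -t && nginx -g 'daemon on; master_process on;'
}

# Example:
# start_nginx_if_valid || echo "fix /etc/nginx/sites-available/sosse.conf first"
```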

Crawlers

Crawlers can now be started in the background with the command:

sudo -u www-data /opt/sosse-venv/bin/sosse-admin crawl &

Next steps

Congrats! The installation is done. You can now point your browser to the Nginx server and log in with the user admin and the password admin. For more information about the configuration, you can follow the Administration pages.