Pip install#
Dependencies#
Before installing SOSSE, you’ll need to manually install the following software:
Package install#
The installation can be done with the commands:
virtualenv /opt/sosse-venv/
/opt/sosse-venv/bin/pip install sosse
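A quick sanity check after the install can catch a broken virtual environment early. The sketch below is not part of SOSSE: the function name `check_sosse_venv` is hypothetical, and it only tests that the `pip` and `sosse-admin` executables used in the rest of this guide are present under the given virtualenv directory.

```shell
# Hypothetical helper (not shipped with SOSSE): report whether the
# executables used later in this guide exist in a virtualenv.
check_sosse_venv() {
    venv_dir="$1"
    status=0
    for tool in pip sosse-admin; do
        if [ -x "$venv_dir/bin/$tool" ]; then
            echo "ok: $venv_dir/bin/$tool"
        else
            echo "missing: $venv_dir/bin/$tool"
            status=1
        fi
    done
    return "$status"
}
```

With the paths used above, it would be called as `check_sosse_venv /opt/sosse-venv`; a non-zero exit status means at least one executable is missing.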
Default Configuration#
The default configuration and directories can be created with the commands:
mkdir -p /run/sosse /var/log/sosse /var/www/.cache /var/www/.mozilla /var/lib/sosse/downloads /var/lib/sosse/screenshots /var/lib/sosse/html
touch /var/log/sosse/crawler.log /var/log/sosse/debug.log /var/log/sosse/main.log /var/log/sosse/webserver.log
chown -R www-data:www-data /run/sosse /var/lib/sosse /var/www/.cache /var/www/.mozilla /var/log/sosse
mkdir /etc/sosse
/opt/sosse-venv/bin/sosse-admin default_conf > /etc/sosse/sosse.conf
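Since later steps fail in confusing ways when a directory or log file is missing, it can help to verify the result of the commands above. The following is a minimal sketch, assuming the exact paths listed in this section; the function name `list_missing_sosse_paths` and its optional root-prefix argument are hypothetical additions that make a dry run possible outside of `/`.

```shell
# Hypothetical helper: print every directory or file created by the
# commands above that does not exist yet. The optional first argument is a
# root prefix (empty on a real system), an assumption added for dry runs.
list_missing_sosse_paths() {
    root="${1:-}"
    for d in /run/sosse /var/log/sosse /var/www/.cache /var/www/.mozilla \
             /var/lib/sosse/downloads /var/lib/sosse/screenshots \
             /var/lib/sosse/html /etc/sosse; do
        [ -d "$root$d" ] || echo "missing directory: $d"
    done
    for f in crawler.log debug.log main.log webserver.log; do
        [ -f "$root/var/log/sosse/$f" ] || echo "missing file: /var/log/sosse/$f"
    done
    # The generated configuration file should exist and be non-empty.
    [ -s "$root/etc/sosse/sosse.conf" ] || echo "missing or empty: /etc/sosse/sosse.conf"
    return 0
}
```

Calling `list_missing_sosse_paths` without arguments checks the real system; no output means every path is in place. Ownership is not checked by this sketch.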
Static files#
Static files can be copied to their target location with the following command:
/opt/sosse-venv/bin/sosse-admin collectstatic --noinput --clear
Database setup#
Database connection parameters can be changed in the /etc/sosse/sosse.conf
file. More information about each variable is available in the Configuration file reference.
Database creation#
The PostgreSQL database can be created with the commands:
su - postgres -c "psql --command=\"CREATE USER sosse WITH PASSWORD 'CHANGE ME';\""
su - postgres -c "psql --command=\"CREATE DATABASE sosse OWNER sosse;\""
Replace sosse and CHANGE ME with an appropriate username and password, and set them in the /etc/sosse/sosse.conf configuration file.
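Rather than typing a password by hand, one can be generated. The sketch below is only an illustration: the function names `gen_db_password` and `create_user_sql` are hypothetical, the password is taken from `/dev/urandom` and hex-encoded, and the SQL statement is printed rather than executed so it can be reviewed first.

```shell
# Hypothetical helper: generate a random 32-character hexadecimal password
# from /dev/urandom, to use instead of the 'CHANGE ME' placeholder.
gen_db_password() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
    echo
}

# Hypothetical helper: print (without executing) the CREATE USER statement
# for a user name ($1) and password ($2). Hexadecimal passwords contain no
# quote characters, so no SQL escaping is attempted here.
create_user_sql() {
    printf "CREATE USER %s WITH PASSWORD '%s';\n" "$1" "$2"
}
```

For example, `pw="$(gen_db_password)"` followed by `create_user_sql sosse "$pw"` prints the statement to pass to psql; the same user name and password then need to be set in /etc/sosse/sosse.conf.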
Database schema#
The initial database data can be injected with the following commands:
/opt/sosse-venv/bin/sosse-admin migrate
/opt/sosse-venv/bin/sosse-admin update_se
A default admin user with password admin can be created with:
/opt/sosse-venv/bin/sosse-admin default_admin
WSGI server#
You can install a WSGI server of your choice. If you wish to install uWSGI, you can run:
/opt/sosse-venv/bin/pip install uwsgi
And write the following config files:
nano /etc/sosse/uwsgi.ini
[uwsgi]
# Django-related settings
# the base directory (full path)
chdir = /
# Django's wsgi file
module = sosse.wsgi
# process-related settings
# master
master = true
# maximum number of worker processes
processes = 10
# the socket (use the full path to be safe)
socket = /run/sosse/uwsgi.sock
# ... with appropriate permissions - may be needed
# chmod-socket = 664
# clear environment on exit
vacuum = true
nano /etc/sosse/uwsgi.params
uwsgi_param QUERY_STRING $query_string;
uwsgi_param REQUEST_METHOD $request_method;
uwsgi_param CONTENT_TYPE $content_type;
uwsgi_param CONTENT_LENGTH $content_length;
uwsgi_param REQUEST_URI $request_uri;
uwsgi_param PATH_INFO $document_uri;
uwsgi_param DOCUMENT_ROOT $document_root;
uwsgi_param SERVER_PROTOCOL $server_protocol;
uwsgi_param REQUEST_SCHEME $scheme;
uwsgi_param HTTPS $https if_not_empty;
uwsgi_param REMOTE_ADDR $remote_addr;
uwsgi_param REMOTE_PORT $remote_port;
uwsgi_param SERVER_PORT $server_port;
uwsgi_param SERVER_NAME $server_name;
After that, the server can be run in the background with:
mkdir /var/log/uwsgi
chown www-data:www-data /var/log/uwsgi
/opt/sosse-venv/bin/uwsgi --uid www-data --gid www-data --ini /etc/sosse/uwsgi.ini --logto /var/log/uwsgi/sosse.log &
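Because uWSGI is started in the background, a failure to start is easy to miss. One way to notice it is to wait for the socket configured in uwsgi.ini to appear. This is a minimal sketch under that assumption; the function name `wait_for_socket` and its default 10-second timeout are hypothetical.

```shell
# Hypothetical helper: wait up to <timeout> seconds ($2, default 10) for a
# Unix socket to appear at <path> ($1). Returns 0 once the socket exists,
# non-zero if the timeout expires first.
wait_for_socket() {
    sock_path="$1"
    remaining="${2:-10}"
    while [ "$remaining" -gt 0 ]; do
        [ -S "$sock_path" ] && return 0
        sleep 1
        remaining=$((remaining - 1))
    done
    [ -S "$sock_path" ]
}
```

With the configuration above, `wait_for_socket /run/sosse/uwsgi.sock 15` would be a reasonable call; if it fails, /var/log/uwsgi/sosse.log is the first place to look.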
File permissions#
It’s advised to restrict the permissions of the configuration files:
chown -R root:www-data /etc/sosse
chmod 750 /etc/sosse/
chmod 640 /etc/sosse/*
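The resulting permissions can be checked afterwards. The sketch below compares the permission string printed by `ls -ld` with an expected value; the function name `has_mode` is hypothetical, and the expected strings are simply the symbolic forms of the modes set above (750 on a directory is `drwxr-x---`, 640 on a file is `-rw-r-----`).

```shell
# Hypothetical helper: succeed only if the permission string reported by
# `ls -ld` for <path> ($1), e.g. "drwxr-x---", equals <expected> ($2).
has_mode() {
    actual="$(ls -ld "$1" | cut -c1-10)"
    if [ "$actual" = "$2" ]; then
        echo "ok: $1 is $actual"
    else
        echo "unexpected: $1 is $actual, expected $2"
        return 1
    fi
}
```

For instance, `has_mode /etc/sosse drwxr-x---` and `has_mode /etc/sosse/sosse.conf -rw-r-----` should both succeed after the commands above. Ownership is not checked by this sketch.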
Web server#
A web server like Nginx is required to relay requests to the WSGI server. Its configuration should be done as follows:
nano /etc/nginx/sites-available/sosse.conf
##
# You should look at the following URL's in order to grasp a solid understanding
# of Nginx configuration files in order to fully unleash the power of Nginx.
# https://www.nginx.com/resources/wiki/start/
# https://www.nginx.com/resources/wiki/start/topics/tutorials/config_pitfalls/
# https://wiki.debian.org/Nginx/DirectoryStructure
#
# In most cases, administrators will remove this file from sites-enabled/ and
# leave it as reference inside of sites-available where it will continue to be
# updated by the nginx packaging team.
#
# This file will automatically load configuration files provided by other
# applications, such as Drupal or Wordpress. These applications will be made
# available underneath a path with that package name, such as /drupal8.
#
# Please see /usr/share/doc/nginx-doc/examples/ for more detailed examples.
##
upstream django {
    server unix:///run/sosse/uwsgi.sock; # for a file socket
    #server 127.0.0.1:8001; # for a web port socket (we'll use this first)
}
# Default server configuration
#
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    # SSL configuration
    #
    # listen 443 ssl default_server;
    # listen [::]:443 ssl default_server;
    #
    # Note: You should disable gzip for SSL traffic.
    # See: https://bugs.debian.org/773332
    #
    # Read up on ssl_ciphers to ensure a secure configuration.
    # See: https://bugs.debian.org/765782
    #
    # Self signed certs generated by the ssl-cert package
    # Don't use them in a production server!
    #
    # include snippets/snakeoil.conf;
    root /var/www/html;
    # Add index.php to the list if you are using PHP
    index index.html index.htm index.nginx-debian.html;
    server_name _;
    charset utf-8;
    # max upload size
    client_max_body_size 75M; # adjust to taste
    # Django staticfiles
    location /static {
        alias /var/lib/sosse/static;
    }
    # Screenshots
    location /screenshots {
        alias /var/lib/sosse/screenshots;
    }
    # HTML snapshots
    location /snap {
        alias /var/lib/sosse/html/;
    }
    # Finally, send all non-media requests to the Django server.
    location / {
        uwsgi_pass unix:/run/sosse/uwsgi.sock;
        include /etc/sosse/uwsgi.params; # the uwsgi_params file you installed
    }
}
The site should then be enabled and Nginx started:
rm -f /etc/nginx/sites-enabled/default
ln -s /etc/nginx/sites-available/sosse.conf /etc/nginx/sites-enabled/
nginx -g 'daemon on; master_process on;'
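If Nginx does not serve SOSSE after these commands, a dangling or missing symlink in sites-enabled is one possible cause. The following sketch checks for that; the function name `site_is_enabled` is hypothetical and the check is limited to the symlink itself, not the contents of the configuration.

```shell
# Hypothetical helper: succeed if <enabled_dir>/<name> ($1/$2) is a symlink
# whose target exists, which is what the `ln -s` command above should
# produce. A dangling or missing link makes it fail.
site_is_enabled() {
    link="$1/$2"
    [ -L "$link" ] && [ -e "$link" ]
}
```

With the paths above, the call would be `site_is_enabled /etc/nginx/sites-enabled sosse.conf`. Running `nginx -t` before starting Nginx additionally validates the configuration syntax.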
Crawlers#
Crawlers can now be started in the background with the command:
sudo -u www-data /opt/sosse-venv/bin/sosse-admin crawl &
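A process started with a plain `&` is tied to the current shell and leaves no record of its PID. As an optional variation, and not something SOSSE prescribes, the sketch below starts a command with `nohup` and writes its PID to a file so it can be stopped later. The function name `start_background` and the PID file location are hypothetical, and the sketch discards the command's standard output and error.

```shell
# Hypothetical helper: start a command in the background, detached from
# the terminal with nohup, and record its PID in <pidfile> ($1). The
# remaining arguments are the command to run; its stdout and stderr are
# discarded.
start_background() {
    pidfile="$1"
    shift
    nohup "$@" >/dev/null 2>&1 &
    echo "$!" > "$pidfile"
}
```

A possible call, run as root, would be `start_background /run/sosse/crawler.pid sudo -u www-data /opt/sosse-venv/bin/sosse-admin crawl`. The recorded PID is then that of the `sudo` process, and `kill "$(cat /run/sosse/crawler.pid)"` would ask it to stop.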
Next steps#
Congrats! The installation is done. You can now point your browser to the Nginx server and log in with the user admin and the password admin.
For more information about the configuration, see the Administration pages.