Utilities

Paperless has three main utilities: the webserver, the consumer, and, if needed, the exporter. They’re all detailed here, along with the importer and the re-tagger.

The Webserver

At the heart of it, Paperless is a simple Django webservice, and the entire interface is based on Django’s standard admin interface. Once running, visiting the URL for your service delivers the admin, through which you can get a detailed listing of all available documents, search for specific files, and download whatever it is you’re looking for.

How to Use It

The webserver is started via the manage.py script:

$ /path/to/paperless/src/manage.py runserver

By default, the server runs on localhost, port 8000, but you can change this with a few arguments. Run manage.py runserver --help for more information.

Add the option --noreload to reduce resource usage. Otherwise, the server continuously polls all source files for changes to auto-reload them.

Note that the webserver stops as soon as you exit this command. If you want to run it full-time (which is kind of the point) you’ll need to have it start in the background, which is something you’ll need to figure out for your own system. To get you started though, there are systemd service files in the scripts directory.

The Consumer

The consumer script runs in an infinite loop, constantly looking at a directory for documents to parse and index. The process is pretty straightforward:

  1. Look in CONSUMPTION_DIR for a document. If one is found, go to #2. If not, wait 10 seconds and try again. On Linux, new documents are detected instantly via inotify, so there’s no waiting involved.
  2. Parse the document with Tesseract
  3. Create a new record in the database with the OCR’d text
  4. Attempt to automatically assign document attributes by doing some guesswork. Read up on the guesswork documentation for more information about this process.
  5. Encrypt the document (if you have a passphrase set) and store it in the media directory under documents/originals.
  6. Go to #1.
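
The loop above can be sketched in a few lines of Python. This is only an illustration of the polling pattern, not Paperless’s actual consumer code: the names consume_once and run_consumer, the handle callback, and the max_cycles parameter are all hypothetical, and steps 2 to 5 (OCR, database record, guesswork, storage) are reduced to a single callback that is assumed to move the file out of the directory when it is done.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def consume_once(consumption_dir: Path, handle: Callable[[Path], None]) -> int:
    """Hand every regular file currently in consumption_dir to handle.

    Returns the number of files processed. handle is a hypothetical
    stand-in for steps 2-5 and must remove or move the file itself.
    """
    processed = 0
    # sorted() materialises the listing, so handle may delete files safely.
    for path in sorted(consumption_dir.iterdir()):
        if not path.is_file():
            continue
        handle(path)
        processed += 1
    return processed


def run_consumer(
    consumption_dir: Path,
    handle: Callable[[Path], None],
    delay: float = 10.0,
    max_cycles: Optional[int] = None,
) -> None:
    """Poll consumption_dir until max_cycles passes are done (forever if None)."""
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        if consume_once(consumption_dir, handle) == 0:
            # Step 1: nothing found, so wait before looking again.
            time.sleep(delay)
        cycles += 1
```

The max_cycles argument exists only so the sketch can be exercised without running forever; a real long-running service would leave it unset.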

How to Use It

The consumer is started via the manage.py script:

$ /path/to/paperless/src/manage.py document_consumer

This starts the service that will consume documents as they appear in CONSUMPTION_DIR.

Note that this command runs continuously, so exiting it means the consumer stops. If you want to run it full-time (which is kind of the point) you’ll need to have it start in the background, which is something you’ll need to figure out for your own system. To get you started though, there are systemd service files in the scripts directory.

Some command line arguments are available to customize the behavior of the consumer. By default it will use /etc/paperless.conf values. Display the help with:

$ /path/to/paperless/src/manage.py document_consumer --help

The Exporter

Tired of fiddling with Paperless, or just want to do something stupid and are afraid of accidentally damaging your files? You can export all of your documents into neatly named, dated, and unencrypted files.

How to Use It

This too is done via the manage.py script:

$ /path/to/paperless/src/manage.py document_exporter /path/to/somewhere/

This will dump all of your documents, unencrypted, into /path/to/somewhere for you to do with as you please. The files are accompanied by a special file, manifest.json, which can be used to import the files at a later date if you wish.
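
If you want to sanity-check an export before relying on it, something like the following may help. It is a rough sketch that assumes manifest.json holds a JSON array with one object per exported item; the layout is not described here, so treat that as an assumption and inspect a real export first. The function name summarize_manifest is hypothetical.

```python
import json
from pathlib import Path


def summarize_manifest(export_dir: str) -> dict:
    """Return the entry count and the top-level keys seen in manifest.json.

    Assumes the manifest is a JSON array of objects (an assumption, not a
    documented guarantee).
    """
    manifest_path = Path(export_dir) / "manifest.json"
    with manifest_path.open(encoding="utf-8") as fh:
        entries = json.load(fh)
    if not isinstance(entries, list):
        raise ValueError("expected manifest.json to contain a JSON array")
    keys = set()
    for entry in entries:
        if isinstance(entry, dict):
            keys.update(entry.keys())
    return {"count": len(entries), "keys": sorted(keys)}
```

Calling summarize_manifest("/path/to/somewhere") on an export directory returns a small dictionary you can print or compare against the number of files you expected.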

Docker

If you are using Docker, running the exporter is almost as easy. To mount a volume for exports, follow the instructions in the docker-compose.yml.example file for the /export volume (making the changes in your own docker-compose.yml file, of course). Once you have the volume mounted, the command to run an export is:

$ docker-compose run --rm consumer document_exporter /export

If you prefer to use docker run directly, supply the necessary command-line options yourself:

$ # Identify your containers
$ docker-compose ps
        Name                       Command                State     Ports
-------------------------------------------------------------------------
paperless_consumer_1    /sbin/docker-entrypoint.sh ...   Exit 0
paperless_webserver_1   /sbin/docker-entrypoint.sh ...   Exit 0

$ # Make sure to replace your passphrase and remove or adapt the id mapping
$ docker run --rm \
    --volumes-from paperless_data_1 \
    --volume /path/to/arbitrary/place:/export \
    -e PAPERLESS_PASSPHRASE=YOUR_PASSPHRASE \
    -e USERMAP_UID=1000 -e USERMAP_GID=1000 \
    paperless document_exporter /export

The Importer

Looking to transfer Paperless data from one instance to another, or just want to restore from a backup? This is your go-to toy.

How to Use It

The importer works just like the exporter. You point it at a directory, and the script does the rest of the work:

$ /path/to/paperless/src/manage.py document_importer /path/to/somewhere/

Docker

Assuming that you’ve already gone through the steps above in the export section, the easiest thing to do is just re-use the /export path you already set up:

$ docker-compose run --rm consumer document_importer /export

Similarly, if you’re not using docker-compose, you can adjust the export instructions above to do the import.

The Re-tagger

Say you’ve imported a few hundred documents and now want to introduce a tag and apply its matching to all of the currently-imported docs. This problem is common enough that there’s a tool for it.

How to Use It

This too is done via the manage.py script:

$ /path/to/paperless/src/manage.py document_retagger

That’s it. It’ll loop over all of the documents in your database and attempt to match all of your tags to them. If one matches, it’ll be applied. And don’t worry, you can run this as often as you like; it won’t double-tag a document.
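
To see why repeated runs are harmless, here is a minimal sketch of an idempotent tagging pass. It is not the re-tagger’s real code: the retag function and its arguments are hypothetical, and the matching rule (a case-insensitive substring test) is a placeholder for whatever matching your tags actually use.

```python
from typing import Dict, Set


def retag(
    documents: Dict[str, str],
    tags: Dict[str, str],
    applied: Dict[str, Set[str]],
) -> int:
    """Apply every matching tag to every document; return how many were newly added.

    documents maps a document name to its text, tags maps a tag name to the
    phrase it matches, and applied maps a document name to its current tags.
    """
    added = 0
    for name, text in documents.items():
        current = applied.setdefault(name, set())
        for tag, phrase in tags.items():
            # Placeholder matching rule: case-insensitive substring.
            if phrase.lower() in text.lower() and tag not in current:
                current.add(tag)
                added += 1
    return added
```

Because each document’s tags are kept in a set and a tag is only added when it is missing, a second pass over the same data adds nothing.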