Paperless-Ngx v2.0.0 (github.com/paperless-ngx)
182 points by rhim on Nov 29, 2023 | 72 comments



There was a pretty big discussion about paperless-ngx a couple of months ago:

https://news.ycombinator.com/item?id=37800951 (183 comments)

I tested it out then and am considering migrating from my current system (Google Drive) to using a self-hosted approach. Paperless seems to have a good approach for minimizing the mental overhead of ingesting and categorizing new documents - which is what ultimately leads me to stacking documents up for months before processing them. My initial pilot run was promising, but I haven't gotten around to switching yet.

From the changelog, it's not really clear to me what's notable about this release, especially as a new/potential user.

This page is a better introduction to the product, although it doesn't mention the v2 release yet:

https://docs.paperless-ngx.com/


I've been using Paperless for several years now very happily and can recommend it over my previous system, which was also Google Drive. During the transition I found it helpful to set up a cron job which (A) made an export of Paperless and (B) uploaded that export to a Google Drive folder.

One feature which seems to be quite a nice improvement (speculating as I haven't upgraded yet) is consumption templates [0]. My workflow involves an ADF scanner with an Android application, sharing the scanned PDF with Paperless Share [1] and then it's uploaded to the server via API. It seems that consumption templates will enable adjusting tags/sharing settings/permissions of a document at ingestion time based on where it's ingested from.

[0] https://github.com/paperless-ngx/paperless-ngx/pull/4196

[1] https://github.com/qcasey/paperless_share
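
For anyone curious what "uploaded to the server via API" can look like, here is a minimal sketch using only the Python standard library. It assumes the `/api/documents/post_document/` endpoint and `Token` authentication as I understand them from the Paperless-ngx REST API docs; the host name, token and file name are hypothetical, and the function only builds the request without sending it.

```python
import urllib.request
import uuid
from typing import Optional


def build_upload_request(base_url: str, token: str, filename: str,
                         pdf_bytes: bytes,
                         title: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a multipart POST carrying one PDF."""
    boundary = uuid.uuid4().hex
    parts = []
    if title is not None:
        # Optional form field; the title value here is purely illustrative.
        parts.append(
            (f'--{boundary}\r\n'
             'Content-Disposition: form-data; name="title"\r\n\r\n'
             f'{title}\r\n').encode()
        )
    # The file itself goes into the form field named "document".
    parts.append(
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
         'Content-Type: application/pdf\r\n\r\n').encode()
        + pdf_bytes + b'\r\n'
    )
    parts.append(f'--{boundary}--\r\n'.encode())
    return urllib.request.Request(
        url=base_url.rstrip('/') + '/api/documents/post_document/',
        data=b''.join(parts),
        method='POST',
        headers={
            'Authorization': f'Token {token}',
            'Content-Type': f'multipart/form-data; boundary={boundary}',
        },
    )
```

Sending it would be a matter of passing the returned object to `urllib.request.urlopen`. Check the API docs of your own version before relying on the field names.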


I use Syncthing to sync from the Paperless data folder; Paperless itself runs on Kubernetes (k3s).

It's a one-way sync. Paperless is the authoritative location. The only reason I back up to Google drive is so that my phone has easy access to the documents I may need on the go.


There are two dedicated mobile apps for Paperless:

- https://github.com/paulgessinger/swift-paperless (iOS only, nicer interface)
- https://github.com/bauerj/paperless_app (iOS and Android, built-in scanner)

I use them in combination with Tailscale; both can be used to rename documents and edit tags.


Could you specify how it improves over using Google Drive or similar? Is that "just" because you control the hosting, or is the experience better?


Personally, I think it isn't really an improvement over Google Drive. Drive offers so many more features, an office suite, integration with many other Google services, etc.

That said, I don't think Paperless is supposed to fill all those gaps. For me, its sole job is to make scanned documents searchable (from anywhere with Tailscale) and durable (with encrypted off-site backups). Having this isolated from a Google account with already too-far-reaching access is a benefit in my opinion.

Edit: rereading the context, I think you were referring to how I used Google Drive before Paperless. In that case I just stored scans in Google Drive. I struggled to organize consistently and search was lackluster. Paperless improves on these, but also is much more hackable. It's easy to set up post ingestion scripts, backups, email ingestion, etc.


One feature I had been looking for isn't mentioned in this release's notes, but it was actually added in RC1 for 2.0.0:

https://github.com/paperless-ngx/paperless-ngx/releases/tag/...

    * Feature: Implement custom fields for documents @stumpylog (#4502)


If anyone is looking to kick the tires of Paperless NGX quickly, check out my little pet project [1] for running it with Podman. I use it every week to scan papers from my Brother ADS2800w which will SFTP the PDFs into a directory for Paperless NGX to consume.

I just updated my install to v2.0.0 with a simple podman pull and a systemctl restart of my paperless pod and everything looks great. Hats off to the contributors of the project. Every update, even a major one like this, has been really smooth.

1: https://github.com/jdoss/ppngx


How do you like this setup?

I've been thinking of moving from docker-compose to podman, specifically using the [podman-play-kube](https://docs.podman.io/en/v4.2/markdown/podman-play-kube.1.h...) but haven't gotten around to it.

I think Podman has a lot to offer for self-hosters, but it isn't popular (yet?)


I like it a lot. I moved totally away from docker-compose and Docker a couple of years ago to using only Podman and I haven't looked back. Using Podman Pods lets me isolate my workloads in their own namespaces and I can prototype a multi-service workload very quickly.

If you check out the bash script on my ppngx project you can get an idea of how you could write your own script for your workloads. I can run ./start.sh over and over again and it will replace the running containers with my changes which is a very fast DX.

The README on ppngx talks about using the podman generate systemd command to create units from the pod so you can run them via systemd, but this command is being deprecated in favor of using Quadlet [1] (a systemd generator) to create the units on the fly. I haven't gotten around to using it since I like to have more control over my systemd units. I could see Quadlet being very good for users that don't know the inner workings of systemd and podman.

1: https://docs.podman.io/en/latest/markdown/podman-systemd.uni...


I love paperless-ngx but I wish it had a rotate button. Some of my document scans are upside down.


I don't think I'd be comfortable with it having elaborate editing functionality. PDF editing in a browser is finicky, and an enormous bug fest.

I do PDF editing offline, on the desktop, then re-upload to paperless. Not the most integrated flow, but much more bulletproof. I want the PDFs themselves to be immutable once on paperless. Only metadata should be editable.


It keeps an “original” PDF and presents a working copy for modifications like OCR and metadata. Rotation is important for OCR, so rotate-and-redo is a worthwhile feature.


There is an issue about this, basically it's not going to happen because it is editing functionality. They suggest using another solution before import (build a pipeline).


It does have rotate clockwise/anticlockwise


Where? I'm pretty sure my instance doesn't.


In Settings I have "Use PDF viewer provided by the browser" checked and then see the screenshot:

https://imgur.com/TIOv1kK


The renderer does, maybe. But the rotation isn't saved and isn't used for OCR.

So paperless doesn't have rotating functionality.


You can use an open-source tool for scanning, like NAPS2, which will let you rotate before you mail it to paperless-ngx.


I wish paperless-ngx included native advertising to printers for the "Sent to PC" feature.

Last I checked it doesn't, and I had to run a separate service to advertise the paperless endpoint to the printer.


What service do you run for this?


This one worked for my HP scanner https://github.com/manuc66/node-hp-scan-to

Maybe it's not something standard and every company has their own implementation :shrug:


I haven't been using it too much yet but I am really impressed by paperless-ngx so far. It just works(TM) and the auto-tagging functionality is surprisingly good, even with just a few documents in it.

Does anyone have a good scanner recommendation though? I am eyeing the Brother ADS-1700W since it seems to be recommended often, but I would really like to use the "scan to webhook" feature (it's 2023 after all) instead of SMTP or whatever else are the options I would have with the Brother.


Recommendation: https://www.quickscanapp.com/

I am using an iPhone as a scanner and it automatically scans, OCRs, uploads and ingests to the paperless-ngx instance, even remotely using Tailscale.

The iPhone camera is more than good enough for scanning documents.


I don't have an iPhone, but on Android there is the "Paperless Mobile" app (https://github.com/astubenbord/paperless-mobile), which can be used to scan as well. There are just some documents that I would prefer to have in proper and consistent "document scanner"-quality; I am always having a hard time with lighting using those phone scanners (although Paperless Mobile is one of the better ones I have used).


Would a document capture camera with a [ring] light also work?


Those still have the speed disadvantage of a phone camera and need more space than a compact document scanner, I'd imagine. I guess a ring light for my phone would be an improvement; using the builtin flash usually leads to very uneven lighting in the scan.


IIRC there are certain models of sheet-fed duplex scanners that work with Linux (with e.g. a lower-wattage Pi)


I had used this app prior to the addition of the Paperless-ngx integration and it worked well, but with that functionality added it's just so easy to scan and be done. I have a Brother scanner as well that I'll still use to import longer documents or anything I want in the best quality, but for 95% of things importing from this app works perfectly.


Thank you also for the tip on this one. It took a bit of work to get it working with my setup, but I now have it working flawlessly.


Thank you for this! This reduces the friction for scanning documents a _ton_.

I love that it integrates with Paperless so well!


I am scanning from my Brother multi-function device to an SMB share, which paperless monitors for changes. Works like a charm. You can even bulk move files there using your local file manager.
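
The "bulk move files there" step can also be scripted. Below is a minimal sketch, assuming the watched share is simply the directory Paperless consumes from (whatever your consumption directory setting points at); the function name and any paths you pass in are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def move_scans(inbox: Path, consume_dir: Path) -> List[Path]:
    """Move every PDF from inbox into consume_dir and return the new paths."""
    consume_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(inbox.glob("*.pdf")):
        target = consume_dir / pdf.name
        if target.exists():
            # Leave name clashes in the inbox instead of overwriting.
            continue
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved
```

Non-PDF files are left where they are, so the inbox can double as a general scratch folder.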


Which type of brother printer do you use? And do you use it under Linux?


I'm using a Brother MFC-L3770CDW (colour laser, with a duplexing scanner). Very reasonable price and super capable device, works fine with Linux.


There is a long list of supported scanners directly from Paperless: https://github.com/paperless-ngx/paperless-ngx/wiki/Scanner-...

Personally I go with Brother ADS-1700W. I don't use it under any operating system since it is Scanner > SMB share.


Brother DCP-L2550DW here. One of the cheapest b/w multifunction devices with automatic document feeder and reasonable print and scan performance. Works like a charm on Linux, Windows, Android, and iOS.

I am using it with [NAPS2](https://www.naps2.com/), which is brilliantly simple, multi-platform, free, and open-source.


Just one of their color laser scanner/printer combos. Works like a charm for iOS, Linux, macOS, Windows.


I wish I could selectively subscribe to comments on HN, but I have to comment to do that. So this is my subscription comment. #metoo


exactly the same setup here, but i also have paperless pointing to a mailbox that i use exclusively for sending documents to.

all works perfectly.


I am using a paperless@<domain> address for this as well. Handy to archive stuff coming in via email.


I'll start with Paperless NGX soon. After looking around at lots of document scanners with autofeed (that are quite expensive), I found that my office was getting rid of a big multifunction HP printer that had been sitting unused since COVID and remote work, and I got that for free.

I'll clean all the rollers and stuff next week and test it :P


I've had great luck with an Epson Workforce scanner. Originally I got it to scan ~10k family photos -- took about 1 hour and entirely smooth.

In that case I scanned to a USB drive attached to the scanner (since each photo was a separate file). For Paperless I use the Epson Smart app, scan the document with whatever settings, remove/rotate pages as needed, and then share it to Paperless with Paperless Share [0].

Many network attached scanners can scan to SMB, no device needed, but I kind of like the human-in-the-loop aspect. Since my Paperless server runs on an HDD next to the scanner I can actually hear once the file lands which is quite satisfying.

[0] https://github.com/qcasey/paperless_share


Paperless-ngx + ScanSnap iX1600. Works with a samba share that is very easy to set up in Linux these days. Fast, easy, and you can have different scan profiles to set the destination folder. Push a button for the type and a button to scan. Paperless-ngx automatically files and tags reliably. It is saving me hours per week in filing. Can't recommend it enough. This is a personal system -- not sure how it would scale to 100k - 1M+.


Almost 600 Canadian for that scanner. Is it mainly that it's incredibly fast and can go through a stack of pages?


I've had an iX500 for a few years. You're right, and it's also a complete tank. I deployed several of them to a doctor's office that had to scan lots of paperwork every day, and they all worked perfectly all the time, every time.

They're the Brother laser printer of scanners.


Fast, stack of pages, but also compatibility with different destination types, ease of setup, and no cloud account required (looking at you Raven) and your PC doesn't have to be on for network scans (direct, not passthrough).

But yes, the scanner is pricey. It was definitely an investment.


I’ve got an ix500 and I’m suffering for no SMB support.

The only options that come to mind are either to do a convoluted SnapScan Online -> Google Drive -> rclone -> Paperless chain, or to bite the bullet and figure out how to scan directly into the local box via USB.


For the ix1600 they updated the Firmware to support SMB.

Source (German): https://www.synology-forum.de/threads/dokumentenscanner-dire...


Paperless is one of my favorite pieces of software. A few years ago I got fed up with my filing cabinet full of folders & tons of documents that didn't quite fit into any of the categories.

I installed Paperless on my home server & spent a night digitizing everything. After being comfortable with it for a few months I went back & shredded all my paper copies. Today my process is similar - when I get a document I would normally toss in that filing cabinet I just scan, upload to Paperless, and shred it. It's also really nice for storing large purchase receipts - I've previously had the writing on thermal paper receipts go invisible after a period of time, no longer an issue.

Searching for something specific is so easy now! Huge QOL improvement. Just make sure you have a solid backup strategy, losing my Paperless database & filestore would be devastating.
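
On the backup point, the export-and-upload cron job mentioned earlier in the thread can be sketched roughly like this. It assumes the project's docker compose setup, where (as far as I know from the docs) the service is called `webserver` and `document_exporter ../export` writes into the mounted export folder. The rclone remote name is hypothetical, and `dry_run` defaults to true so nothing is executed unless you ask for it.

```python
import subprocess
from typing import List


def export_commands(export_dir: str, remote: str) -> List[List[str]]:
    """Return the two commands: export from Paperless, then copy off-site."""
    return [
        # Assumes the official compose file; adjust the service name if yours differs.
        ["docker", "compose", "exec", "-T", "webserver",
         "document_exporter", "../export"],
        # `remote` is a hypothetical rclone remote such as "gdrive:paperless-backup".
        ["rclone", "copy", export_dir, remote],
    ]


def run_backup(export_dir: str, remote: str, dry_run: bool = True) -> List[List[str]]:
    """Run the commands in order, or only return them when dry_run is set."""
    commands = export_commands(export_dir, remote)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return commands
```

Calling `run_backup(..., dry_run=False)` from cron gives a nightly export. It's still worth testing a restore from that export at least once.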


Just curiosity... What does "ngx" mean in this context?

To me it means Angular (the web framework). So, I was surprised to learn this wasn't an Angular plugin. Angular is often referred to as ng for short and as such their plugins tend to have ngx as a prefix. For example, the angular wrapper for ChartJS is ngx-chartjs.


Paperless started as "paperless" but the dev stopped work so another dev forked it to "paperless-ng" (for "next generation" I think). That dev, too, stopped work, so "paperless-ngx" was created.

The paperless-ngx core team focused on gathering a group of people to support it to avoid any burnout problems and keep the project sustainable.


The x was rather the transition away from a single maintainer to the org. Iirc that guy still sticks around


I don't know if it has a specific meaning. There have been multiple forks:

paperless (https://github.com/the-paperless-project/paperless) -> paperless-ng (https://github.com/jonaswinkler/paperless-ng/) -> paperless-ngx (https://github.com/paperless-ngx/paperless-ngx/)


As others said I'm not sure if the name relates to Angular but it's worth saying that the frontend is in fact Angular

https://github.com/jonaswinkler/paperless-ng/tree/master/src...


Paperless was a project and then it died, so it got forked to Paperless NG (Next Generation). Paperless NG died off and it got forked again to Paperless NGX.

At least that is my understanding following the Paperless project over the years.


So what will be next? Paperless NGX++?


Paperleast


I set up paperless-ngx w/ a scanner attached to my nas and a bit of scripting to get the scan button working a while back, but then forgot about it.

For me, as someone who wants my docs on my own server but, well, doesn't care enough to want to constantly keep up with forks/changes/migration/updates, I've been looking for just something stable I can use for years (or maybe decades? e.g. part of the appeal of something like Obsidian is that it just falls back to .md text files).

Curious if there are any long-term active users of this (or other systems) for handling all their paper and what they think about maintainability/longevity?


I had the same concern as you when I started, and after roughly two years of use I’ve been impressed with how minimal the maintenance overhead has been.

So far I’ve probably updated the software ~5 times across various releases. Each time I’ve updated, it has been because there was a new feature I wanted rather than needing to pull in fixes (the software has been bug-free for me). The update process is well documented and very straightforward if you are using their docker compose setup to run the application.


I have been using paperless for years now. There was the one issue a while back when the original maintainer stopped and they had to fork it. But otherwise it's super stable. They keep to semver religiously and all your documents are neatly organised in original format on disk if you ever need them.


I am in the process of getting this running on a Kubernetes cluster in my home. That’s where I throw all self-hosted containerised applications these days. But there’s a bit of friction.

Their entrypoint script makes a lot of assumptions and in their docker-compose example they use a single container running supervisord instead of multiple containers, each with a dedicated purpose (ingestion, consuming, web server). The setup is almost insistent on logging to a file instead of stdout. It also checks and tries to modify permissions of some folders(!!). This requires quite a bit of unpicking.

Getting it to follow what I consider “best practices” is doable, but not frictionless. I understand that it’s probably a mix of “easy for someone whose day job is not to be an infrastructure engineer” and “we were using supervisord for baremetal anyway”. Maybe a lot of it is personal preference but I do feel like the project is not taking containerisation fully to heart. Maybe being more user-friendly in their eyes is more important than being a containerisation purist.

Either way, I’ve got it nearly working with my Brother ADS-1700W, which has shortcuts for me, my wife, and “joint”, which uploads documents to different directories via SFTP which then automatically have their paperless-ngx owner set appropriately.
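
The routing idea (one upload directory per scanner shortcut, owner derived from the directory) can be sketched in a few lines. This is only an illustration of the rule, not how Paperless-ngx implements it internally, and the folder and user names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional


def owner_for(path: str, consume_root: str, owners: Dict[str, str]) -> Optional[str]:
    """Map a file under consume_root to an owner via its first subfolder."""
    try:
        relative = PurePosixPath(path).relative_to(consume_root)
    except ValueError:
        return None  # file is not under the consume root at all
    if len(relative.parts) < 2:
        return None  # file sits directly in the root, no shortcut folder
    return owners.get(relative.parts[0])
```

With `owners = {"me": "alice", "wife": "beth", "joint": "household"}`, a scan landing in `/consume/wife/` resolves to `beth`, and anything outside the known folders resolves to no owner.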


I finally switched from my ancient Mayan EDMS running an outdated version on an Ubuntu 16.04 VM that I couldn’t upgrade because the Mayan docs for that version are not available anymore. I’m not a huge user but I shred everything I can and have around 1000 documents.

I have zero regrets so far. Paperless-ngx is so much more user-friendly: the automatic date extraction from OCR, the auto-tagging and document type classification, and the ease of backup and restore sold me. I highly recommend it.


> running an outdated version on an Ubuntu 16.04 VM that I couldn’t upgrade because the Mayan docs for that version are not available anymore

For years I was eyeing Mayan as one of the variants I could use. Not anymore.


Mayan is also doing a good job but I think geared more towards businesses/enterprises and at least for the older versions long term support is an issue. It’s also just one guy, or at least it was when I last checked.

For home I’d go with paperless-ngx no contest, especially if you can run it in a docker container.


Does paperless have the same support for workflows and indexes as Mayan? I use these two features heavily to automatically place documents into a hierarchy (e.g. payslips into their financial year). That and the ability to add arbitrary fields like 'parent', which then means I can create a linked-list-style association between documents, for example a series of correspondence. It's been a while since I looked at paperless and its forks, but I understand it's not quite built to have such extensibility/flexibility?


I never used these things in Mayan, but paperless lets you specify a correspondent, tags, document type, and date. It also has pre- and post-consumption hooks, so you can write custom scripts that are called at those points to do whatever you need. There is also a setting to organize your files in subfolders based on your criteria.

It will also eventually automatically recognize the document type and tags by learning from previous documents.

The docs have a section covering this and it explains it much better than I can, have a look to see if it would fit your needs?
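
To make the post-consumption hook a bit more concrete, here is a minimal sketch of such a script in Python. It assumes the hook passes document details through environment variables named `DOCUMENT_ID`, `DOCUMENT_FILE_NAME`, `DOCUMENT_CORRESPONDENT` and `DOCUMENT_TAGS` (comma-separated), which is how I remember the docs describing it; please verify the names against the docs for your version. The function name and output format are made up for illustration.

```python
import os
from typing import Mapping


def describe_document(env: Mapping[str, str]) -> str:
    """Summarise the consumed document from the hook's environment variables."""
    doc_id = env.get("DOCUMENT_ID", "?")
    name = env.get("DOCUMENT_FILE_NAME", "?")
    correspondent = env.get("DOCUMENT_CORRESPONDENT") or "unknown"
    # Assumed to be a comma-separated list; empty entries are dropped.
    tags = [t for t in env.get("DOCUMENT_TAGS", "").split(",") if t]
    return f"consumed #{doc_id} {name} from {correspondent} tags={tags}"


if __name__ == "__main__":
    # When run as the hook, read the real environment.
    print(describe_document(os.environ))
```

From here the script could do anything with that information, e.g. append to a log or trigger a backup.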


I recently migrated from another (more "enterprisey") open-source EDMS that shall remain unnamed to paperless-ngx. Can't praise this highly enough. Where the other system needed multiple clicks for the easiest things and had a bunch of UI antifeatures, paperless has a very intuitive and well thought-out UI and handles ~30k documents without issues.


Has any paperless user found a good way to "deskew" scanned pages? Sometimes, when scanning from my Brother printer through the ADF, the pages are skewed/rotated and it can be pretty jarring.


There's this:

https://github.com/the-paperless-project/paperless/issues/20

I don't know if it made its way into this fork.


Deskew is on by default unless you disabled it?


I'd love for this to be able to use something like S3 as a backend and to support (tax) audit-proof archiving.


There are various FUSE-based file systems that use S3 under the hood.



