Paperwork: A Personal Document Manager for Scanned Documents (github.com)
191 points by ekianjo on Feb 19, 2016 | 35 comments

I'm working on something similar, on and off, but I'm trying to remove the usability limitations (e.g. unpack and run).

Features that I for myself want / nearly have:

- My mom should be able to install it.

- Scan from phone, upload to central server / nas

- Everything AES Encrypted when not logged in, decrypted on use (including sqlite database on shutdown)

- Documents can be decrypted with plain openssl, without having the original program.

- Full-text search through invoices (by extracting the PDF text and putting it into SQLite full-text indexes)

- categorisation

- built-in PDF reader / printer
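The full-text search item in the list above could be sketched roughly as follows. This is only a minimal illustration, not the commenter's implementation: the table name `docs`, the function names, and the sample documents are all made up here, and it assumes the PDF text has already been extracted by some other tool. It tries SQLite's FTS5 module first and falls back to FTS4 if the local SQLite build lacks it.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a full-text index of extracted document text."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(filename, body)"
        )
    except sqlite3.OperationalError:
        # FTS5 is not compiled into this SQLite build; fall back to FTS4.
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts4(filename, body)"
        )
    return conn


def add_document(conn, filename, text):
    """Index one document; `text` is assumed to be already-extracted PDF text."""
    conn.execute(
        "INSERT INTO docs (filename, body) VALUES (?, ?)", (filename, text)
    )
    conn.commit()


def search(conn, query):
    """Return the filenames of documents matching a full-text query."""
    rows = conn.execute(
        "SELECT filename FROM docs WHERE docs MATCH ? ORDER BY filename",
        (query,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    index = open_index()
    add_document(index, "acme.pdf", "Invoice 1001 for office chairs")
    add_document(index, "power.pdf", "Electricity bill for January")
    print(search(index, "invoice"))
```

Keeping the index in a single SQLite file would also fit the "encrypt the database on shutdown" item, since there is only one file to protect.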

I've actually been working on something similar, on and off for a few months. I work for an accounting firm, and we often need to scan and search thousands of invoices. I have a variety of ad hoc solutions slapped together, but an integrated, locally hosted document management solution that isn't an uber-expensive ediscovery service would be amazing. I just haven't had the time to do it myself. So what you're developing could be useful in an enterprise context as well as for personal finance.

This looks fantastic.

I've been working on a similar tool, but command line based. The idea is to create predefined profiles (e.g. bills go into a specific folder, with specified tags and a specified resolution) and then just issue "$ scan bill" on the command line. Never finished that one so far though. If Paperwork works for me, I probably won't :)
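A rough sketch of how such profiles might look, purely as an illustration of the idea above. The profile names, folders, tags, and the `plan_scan` helper are hypothetical. It only builds a `scanimage` argument list and an output path without running anything, and it assumes the scanner backend accepts `--resolution`, which varies by device.

```python
from datetime import date

# Hypothetical profiles: names, folders, tags, and resolutions are
# illustrative only.
PROFILES = {
    "bill": {"folder": "documents/bills", "tags": ["bill", "unpaid"], "dpi": 300},
    "photo": {"folder": "pictures/scans", "tags": ["photo"], "dpi": 600},
}


def plan_scan(profile_name, today=None):
    """Return (scanimage arguments, output path, tags) for a profile."""
    profile = PROFILES[profile_name]
    today = today or date.today()
    output = f"{profile['folder']}/{today.isoformat()}-{profile_name}.tiff"
    # scanimage writes the image to stdout, so a caller would redirect
    # its output into `output`.
    command = ["scanimage", "--format=tiff", "--resolution", str(profile["dpi"])]
    return command, output, profile["tags"]


if __name__ == "__main__":
    command, output, tags = plan_scan("bill")
    print(" ".join(command), ">", output, "tags:", ", ".join(tags))
```

A "$ scan bill" wrapper would then just look up the profile, run the command, and record the tags wherever the archive keeps them.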

Edit: By the way, OCR-ed PDF archives in a directory structure can be searched nicely using pdfgrep (https://pdfgrep.org/).

I'll definitely have to check this out; a personal document manager is something I've been meaning to write for years. I'm incredibly tired of hard copies but I'm also a packrat: documents for my cars, healthcare, family affairs, etc. It would be nice to be able to hang on to scanned copies of everything that doesn't need to be a hard copy (great for receipts, still doesn't work for birth certificates or vehicle titles).

I built something like this myself using a Raspberry Pi: http://www.splitbrain.org/blog/2014-08/23-paper_backup_1_sca...

That's interesting - I was considering something like this myself. Now I'm wondering if I can connect up my old Android phone to do the same.

Is the OCR async? How good are the results? Are you using Tesseract? What kind of scanning speed can you handle?

Does it support scan to FTP?

I find the Linux/SANE scanner drivers unbearable. I use a network scanner that uploads the PDFs to an FTP server, which syncs to Dropbox. Works like a charm. I just need a tool to annotate and tag those PDFs.

The screenshot in https://github.com/jflesch/paperwork/wiki/File-import suggests that paperwork can import already scanned files.

Taking the essential parts of TesseractOCR and inotify{watch,wait} is not hard. I've written hacky little OCR scripts in Python and Bash in afternoons before.
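For a sense of scale, a hacky watcher along those lines might look like the sketch below. It is not anyone's actual script: it swaps inotify for plain directory polling to stay within the Python standard library, the function names are made up, and it assumes the `tesseract` command line tool is installed and that its `pdf` output config is available.

```python
import os
import subprocess
import time

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".tif", ".tiff")


def new_scans(directory, seen):
    """Return image files in `directory` not yet in `seen`, and record them."""
    fresh = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.lower().endswith(IMAGE_SUFFIXES) and path not in seen:
            seen.add(path)
            fresh.append(path)
    return fresh


def ocr_command(image_path):
    """Build the tesseract call that writes a searchable PDF next to the image."""
    output_base = os.path.splitext(image_path)[0]
    # tesseract appends ".pdf" to the output base when given the "pdf" config.
    return ["tesseract", image_path, output_base, "pdf"]


def watch(directory, interval=2.0):
    """Poll `directory` forever and OCR every new image that appears."""
    seen = set()
    while True:
        for path in new_scans(directory, seen):
            subprocess.run(ocr_command(path), check=True)
        time.sleep(interval)
```

One caveat with polling like this: a file can be picked up while the scanner is still writing it, so a real script would probably wait for the size to settle or watch for a close event instead.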

My wife will love this. But is it possible to persist everything to a postgresql database? We have one at home for bills and personal accounting, so it would be great if we could utilize it for this too!

Interesting approach! What do you use as an interface for inputting everything?

We made the bill/accounting software ourselves in C++/Qt and it's not all that polished. But it works really well for our use :)

Out of interest, why not a "Web ui"? Do you use Qt for a desktop client or an app? I'm writing a sort of personal "life tracker/history application" and decided to go with a Web UI (with an offline manifest) because I didn't want to maintain both a mobile and desktop app.

Personally I can't stand HTML and JavaScript. In addition, writing Qt style C++ is really fun and productive. As a bonus, our software works on both Windows and Linux (and probably Mac). Personal finance work is not so nice to do on mobile so the app route would not suit us either.

Not saying that other solutions would not be better, but it all comes down to what one prefers to use :)

Electron is a good solution if you need to access certain features of a desktop application (e.g. access to files), but still want something that is easily web based.

I'm going to use it to build an iPhoto replacement, what I've got so far is web based but I'll package it up with Electron.

This is as good a place as any to ask this: my wife has a lot of magazines that she wants to keep the content of but not necessarily the paper. I've been looking for quite a while for a tool that will automate the computer side of things but so far nothing seems just right. This is pretty close but (and being on Windows I can't test the functionality right now) doesn't seem to handle multi-page files? Does anyone have any input on this functionality for this or any other similar tools?

Depending on the magazines, I'd recommend Texture (https://www.texture.com/, formerly Next Issue). It's a subscription service, but you get access to dozens of magazines and all the old issues for $10 per month.

Could you get digital copies from the manufacturer?

Not to be confused with http://paperwork.rocks/, an open source web-based notes app.

And not to be confused with Paperless, upvoted a few days ago on HN as well (similar to Paperwork, but with fewer features).

My current (half-baked) system is using ScanBot for scanning / OCR. This saves to an "Incoming Scans" folder in Dropbox.

I then have various Hazel (OSX) rules to automatically organize things out into various folders w/ date organization: receipts, bank statements, bills, etc.

Works somewhat well! You can do a search in Finder, which will also search contents, etc.
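For anyone without Hazel, the same kind of rule-based filing described above can be approximated with a small script. This is a loose sketch only: the keywords, folder names, and the `destination` helper are hypothetical, and it matches on the filename alone, whereas the setup described here may also look at file contents.

```python
from datetime import date

# Hypothetical keyword rules; the first matching keyword wins.
RULES = [
    ("receipt", "Receipts"),
    ("statement", "Bank Statements"),
    ("bill", "Bills"),
]


def destination(filename, scanned_on):
    """Return the dated folder path a scan should be filed under."""
    lowered = filename.lower()
    for keyword, folder in RULES:
        if keyword in lowered:
            return f"{folder}/{scanned_on:%Y}/{scanned_on:%m}/{filename}"
    # Anything that matches no rule stays in a catch-all folder.
    return f"Unsorted/{filename}"


if __name__ == "__main__":
    print(destination("Electric Bill.pdf", date.today()))
```

Moving the file is then a single `shutil.move` from the "Incoming Scans" folder to the returned path.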

This would be a great start to an Evernote replacement. I think the features already developed are the most difficult pieces.

This looks pretty nice. I wonder how well the OCR works.

I would totally use this, but right now I have a hard time justifying actually having a scanner in my house. I just don't have enough paper coming in (well I do but I throw it away because I know I won't look at it)

It says they use Tesseract as the OCR engine. My experience using Tesseract in other places has been that it works pretty well and if you tune it a bit it can work very well. It struggles a bit with formatted text like tables or diagrams, but does pretty well with blocks of text.

The best (most accurate/flexible) OCR software I have ever used is ABBYY, which works so well I can only infer that it is powered by magic. Unfortunately that magic is proprietary, somewhat expensive (though not so bad really) and Windows only. I used it to help my mother digitize hundreds of pages of salary data for a consulting job she was doing where the text was formatted oddly and even with all that we only had a handful of errors in about 800 pages.

It doesn't have the awesome organizational stuff that Paperwork does, however, which is what I've really been wanting for a while. This reminds me of an awesome app I used to use when I had a Mac, DEVONthink. Basically a personal document database, Mac only and extremely useful; it was one thing I definitely missed. I use [the excellent, highly recommended for academics] Mendeley to organize PDF journal articles and such, but it's not so great for scans.

ABBYY's OCR engine has been available for Linux for quite some time. http://www.ocr4linux.com/en:start

Really? I haven't tried it, but also a big part of what makes ABBYY work well is its GUI.

Evernote built this app, Scannable, which, amazingly, works. Unfortunately only iOS.

And closed-source.

Any good Windows equivalents to this? (The OCR + Search functionality, most importantly)

Evernote? Yes, it has a lot of flaws, but the OCR seems to work really well with my handwriting. I scan my notes in and Evernote makes them searchable.

I doubt if Evernote wrote their own OCR engine. Any idea what it is that they use?

I use a system called PaperPort. Works fairly well.

Would love to see this working on OSX.

I wonder if it just might. It's using pyGTK and that can be installed using homebrew. But, honestly, I've only taken a literal moment to check so maybe there's another dependency that prevents it.
