
Paperwork: A Personal Document Manager for Scanned Documents - ekianjo
https://github.com/jflesch/paperwork/#readme
======
SchizoDuckie
I'm working on something similar, on and off, but I'm trying to remove the
usability limitations (e.g. it should be unpack and run)

Features that _I_ for myself want / nearly have:

\- My mom should be able to install it.

\- Scan from phone, upload to central server / nas

\- Everything AES-encrypted when not logged in, decrypted on use (including
the SQLite database, which is re-encrypted on shutdown)

\- Documents can be decrypted with plain openssl, without having the original
program.

\- Full-text search through invoices (by extracting the PDF text and putting it
in SQLite full-text indexes)

\- categorisation

\- built-in PDF reader / printer
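
A minimal sketch of the full-text search item above, using Python's built-in `sqlite3` module. It assumes the PDF text has already been extracted by some other tool; the table name `docs`, the column names, and the helper functions are made up for illustration and are not taken from any of the projects mentioned in this thread. It tries FTS5 first and falls back to FTS4 if the local SQLite build lacks it.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) a full-text index. Table and column names are hypothetical."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(filename, body)"
        )
    except sqlite3.OperationalError:
        # FTS5 is not compiled into every SQLite build; FTS4 is an older fallback.
        conn.execute(
            "CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts4(filename, body)"
        )
    return conn


def add_document(conn, filename, text):
    """Index the already-extracted text of one document."""
    conn.execute("INSERT INTO docs (filename, body) VALUES (?, ?)", (filename, text))
    conn.commit()


def search(conn, query):
    """Return the filenames of documents matching a simple word query."""
    rows = conn.execute(
        "SELECT filename FROM docs WHERE docs MATCH ? ORDER BY filename", (query,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = open_index()
    add_document(conn, "invoice-001.pdf", "Electricity bill for March")
    add_document(conn, "invoice-002.pdf", "Car insurance renewal notice")
    print(search(conn, "insurance"))
```

Keeping the index in SQLite means it lives in the same file as the rest of the metadata, so encrypting that one file on shutdown would cover the index too.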

~~~
stult
I've actually been working on something similar, on and off for a few months.
I work for an accounting firm, and we often need to scan and search thousands
of invoices. I have a variety of ad hoc solutions slapped together, but an
integrated, locally hosted document management solution that isn't an
uber-expensive ediscovery service would be amazing. I just haven't had the
time to do it myself. That means what you're developing could be useful in an
enterprise context as well as for personal finance.

------
dbrgn
This looks fantastic.

I've been working on a similar tool, but command-line based. The idea is to
create predefined profiles (e.g. bills go into a specific folder, with
specified tags and a specified resolution) and then just issue "$ scan bill"
on the command line. I never finished that one, though. If Paperwork works for
me, I probably won't :)

Edit: By the way, OCR-ed PDF archives in a directory structure can be searched
nicely using pdfgrep ([https://pdfgrep.org/](https://pdfgrep.org/)).
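
The profile idea above could look roughly like this in Python. This is only a sketch of the concept, not dbrgn's tool: the profile names, folders, tags, and the `build_scan_command` helper are all hypothetical. It assumes SANE's `scanimage` is the scanning backend and that the scanner's driver exposes a `--resolution` option, which is device-dependent. The command is only built, not run.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class Profile:
    folder: str        # where scans of this kind are stored
    tags: tuple        # tags to attach to the scanned document
    resolution: int    # scan resolution in dpi


# Hypothetical profiles; a real tool would probably load these from a config file.
PROFILES = {
    "bill": Profile("~/documents/bills", ("bill", "finance"), 300),
    "letter": Profile("~/documents/letters", ("letter",), 150),
}


def build_scan_command(profile_name, today=None):
    """Return (command, target_path, tags) for a profile without running anything."""
    profile = PROFILES[profile_name]
    today = today or date.today()
    filename = f"{today.isoformat()}_{profile_name}.png"
    target = Path(profile.folder).expanduser() / filename
    # scanimage writes the image to stdout, so the caller would redirect
    # stdout to `target` when actually running the command.
    command = ["scanimage", "--resolution", str(profile.resolution), "--format=png"]
    return command, target, profile.tags


if __name__ == "__main__":
    command, target, tags = build_scan_command("bill")
    print(" ".join(command), ">", target, "tags:", ", ".join(tags))
```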

------
doughj3
I'll definitely have to check this out, a personal document manager is
something I've been meaning to write for years. I'm incredibly tired of hard
copies, but I'm also a packrat: documents for my cars, healthcare, family
affairs, etc. It would be nice to be able to hang on to scanned copies of
everything that doesn't need to be a hard copy (great for receipts, still
doesn't work for birth certificates or vehicle titles).

------
splitbrain
I built something like this myself using a Raspberry Pi:
[http://www.splitbrain.org/blog/2014-08/23-paper_backup_1_sca...](http://www.splitbrain.org/blog/2014-08/23-paper_backup_1_scanner_setup)

~~~
fiftyacorn
That's interesting - I was considering something like this myself. Now I'm
wondering if I can connect up my old Android phone to do the same.

------
heinrichhartman
Does it support scan to FTP?

I find the Linux/SANE scanner drivers unbearable. I use a network scanner that
uploads the PDFs to an FTP server, which syncs to Dropbox. Works like a charm.
I just need a tool to annotate and tag those PDFs.

~~~
sarnu
The screenshot in
[https://github.com/jflesch/paperwork/wiki/File-import](https://github.com/jflesch/paperwork/wiki/File-import)
suggests that Paperwork can import already scanned files.

------
tyfon
My wife will love this. But is it possible to persist everything to a
PostgreSQL database? We have one at home for bills and personal accounting, so
it would be great if we could utilize it for this too!

~~~
th0br0
Interesting approach! What do you use as an interface for inputting
everything?

~~~
tyfon
We made the bill/accounting software ourselves in C++/Qt and it's not all
that polished. But it works really well for our use :)

~~~
ross-life
Out of interest, why not a "Web ui"? Do you use Qt for a desktop client or an
app? I'm writing a sort of personal "life tracker/history application" and
decided to go with a Web UI (with an offline manifest) because I didn't want
to maintain both a mobile and desktop app.

~~~
tyfon
Personally I can't stand HTML and JavaScript. In addition, writing Qt-style
C++ is really fun and productive. As a bonus, our software works on both
Windows and Linux (and probably Mac). Personal finance work is not so nice to
do on mobile so the app route would not suit us either.

Not saying that other solutions would not be better, but it all comes down to
what one prefers to use :)

------
lanaius
This is as good a place as any to ask this: my wife has a lot of magazines
that she wants to keep the content of but not necessarily the paper. I've been
looking for quite a while for a tool that will automate the computer side of
things but so far nothing seems just right. This is pretty close but (and
being on Windows I can't test the functionality right now) doesn't seem to
handle multi-page files? Does anyone have any input on this functionality for
this or any other similar tools?

~~~
doughj3
Depending on the magazines, I'd recommend Texture
([https://www.texture.com/](https://www.texture.com/), formerly Next Issue).
It's a subscription service, but you get access to dozens of magazines and all
the old issues for $10 per month.

------
mintplant
Not to be confused with [http://paperwork.rocks/](http://paperwork.rocks/), an
open source web-based notes app.

~~~
ekianjo
And not to be confused with Paperless, upvoted a few days ago on HN as well
(similar to Paperwork, but with fewer features).

~~~
heinrichf
Link:
[https://news.ycombinator.com/item?id=11063642](https://news.ycombinator.com/item?id=11063642)

------
mwarkentin
My current (half-baked) system is using ScanBot for scanning / OCR. This saves
to an "Incoming Scans" folder in Dropbox.

I then have various Hazel (OSX) rules to automatically organize things out
into various folders w/ date organization: receipts, bank statements, bills,
etc.

Works somewhat well! You can do a search in Finder, which will also search
contents, etc.
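
For anyone not on OS X, the rule-based sorting described above can be approximated in a few lines of Python. This is a rough sketch and not how Hazel works internally: the rules, folder names, and the `destination` helper are invented for illustration, and it matches on the filename only rather than on the OCR'd contents.

```python
from datetime import date
from pathlib import Path

# Hypothetical (keyword, folder) rules; the first matching keyword wins.
RULES = [
    ("receipt", "Receipts"),
    ("statement", "Bank Statements"),
    ("bill", "Bills"),
]


def destination(filename, scanned_on, root=Path("Documents")):
    """Pick a dated folder for a scan based on keywords in its filename."""
    lowered = filename.lower()
    for keyword, folder in RULES:
        if keyword in lowered:
            # Organize as <root>/<folder>/<year>/<month>/<filename>.
            return root / folder / f"{scanned_on:%Y}" / f"{scanned_on:%m}" / filename
    # Anything that matches no rule stays in a catch-all folder for manual sorting.
    return root / "Unsorted" / filename


if __name__ == "__main__":
    print(destination("Receipt grocery.pdf", date(2016, 2, 3)))
```

A script like this could run periodically over the "Incoming Scans" folder and move each file to the path it returns.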

------
o_____________o
This would be a great start to an Evernote replacement. I think the features
already developed are the most difficult pieces.

------
rtpg
This looks pretty nice. I wonder how well the OCR works.

I would totally use this, but right now I have a hard time justifying actually
having a scanner in my house. I just don't have enough paper coming in (well I
do but I throw it away because I know I won't look at it)

~~~
gh02t
It says they use Tesseract as the OCR engine. My experience using Tesseract in
other places has been that it works pretty well and if you tune it a bit it
can work very well. It struggles a bit with formatted text like tables or
diagrams, but does pretty well with blocks of text.

The best (most accurate/flexible) OCR software I have ever used is ABBYY,
which works so well I can only infer that it is powered by magic.
Unfortunately that magic is proprietary, somewhat expensive (though not so bad
really) and Windows only. I used it to help my mother digitize hundreds of
pages of salary data for a consulting job she was doing where the text was
formatted oddly and even with all that we only had a handful of errors in
about 800 pages.

It doesn't have the awesome organizational stuff that Paperwork does however,
which is what I've really been wanting for a while. This is reminding me of an
awesome app I used to use when I had a Mac: DEVONthink. It's basically a
personal document database, Mac-only and extremely useful; it was one thing I
definitely missed. I use [the excellent, highly recommended for academics]
Mendeley to organize PDF journal articles and such, but it's not so great for
scans.

~~~
eliaspro
ABBYY's OCR engine has been available for Linux for quite some time.
[http://www.ocr4linux.com/en:start](http://www.ocr4linux.com/en:start)

~~~
gh02t
Really? I haven't tried it, but also a big part of what makes ABBYY work well
is its GUI.

------
neil_s
Any good Windows equivalents to this? (The OCR + Search functionality, most
importantly)

~~~
criddell
Evernote? Yes, it has a lot of flaws, but the OCR seems to work really well
with my handwriting. I scan my notes in and Evernote makes them searchable.

I doubt if Evernote wrote their own OCR engine. Any idea what it is that they
use?

------
anentropic
Would love to see this working on OSX.

~~~
JustSomeNobody
I wonder if it just might. It's using PyGTK and that can be installed using
homebrew. But, honestly, I've only taken a literal moment to check so maybe
there's another dependency that prevents it.

