
Automating receipt processing with deep learning - ole_gooner
https://nanonets.com/blog/receipt-ocr/
======
axiosgunnar
What is the point of spamming [1] HN with this low-effort content marketing
for Nanonets that barely scratches the surface of the topic but instead is
full of obnoxious calls to action?

[1]: https://i.imgur.com/eIhwtjo.png

~~~
amelius
> Need to digitize documents, receipts or invoices but too lazy to code? Head
> over to Nanonets and build OCR models for free!

I think that once you go down the path of OCR'ing your own invoices, you'd
better start a company around it.

~~~
TrackerFF
Gonna be honest, I think it's a dead end. The solution IMO will come from the
supply side: it's easier to get receipts digitally from the store / vendor /
seller than to spend all this effort on converting physical receipts to
digital.

Sure, there's good retroactive use, but I'd rather push for sellers to offer
you digital receipts.

In fact, I already get that from my local supermarket. My bank card is
registered to their app, so every time I swipe my card or scan my app (when
paying with cash), I get a receipt on the app, which I can export.

I remember back in college, over 10 years ago, this was a very hot topic.
Receipt management was one of those entrepreneur ideas that would always pop
up.

~~~
choward
> My bank card is registered to their app, so every time I swipe my card or
> scan my app (when paying with cash), I get a receipt on the app

That's great but why does this need to be an app? Why can't it be sent via
email or have a website I can log into? I'm not downloading an app for every
company I want to do business with.

------
tastyminerals
It's a nice overview article for anyone interested in the topic of IE
(information extraction) from financial documents. However, for
industrial-level solutions Tesseract does not cut it. ABBYY is the best OCR
engine on the market currently. Receipt IE works just fine with rules
supported by a small BiLSTM as a fallback, just because receipts do not
contain a lot of text. With invoices this approach is suboptimal. On a general
note, no DL approach will give you results that are fast enough and good
enough, just because any advanced network would be too slow and too generic.
If extraction takes more than 5 seconds, it is hard to sell such a system.
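
A minimal sketch of the rules-first, model-as-fallback idea, for a single
field (the receipt total). Everything here is an assumption for illustration:
the function names, the regex, and the `fallback` callable, which merely
stands in for a learned model such as the small BiLSTM mentioned above. The
rule only handles simple amounts like 12.50 or 12,50.

```python
import re
from typing import Callable, Optional

# Hypothetical rule: a simple amount with two decimals, e.g. "12.50" or "12,50".
AMOUNT = re.compile(r"(\d+[.,]\d{2})\b")


def extract_total_by_rules(text: str) -> Optional[float]:
    """Return the amount on the first 'total' line that is not a subtotal."""
    for line in text.splitlines():
        lowered = line.lower()
        if "total" not in lowered or "sub" in lowered:
            continue
        match = AMOUNT.search(line)
        if match:
            return float(match.group(1).replace(",", "."))
    return None


def extract_total(
    text: str,
    fallback: Optional[Callable[[str], Optional[float]]] = None,
) -> Optional[float]:
    """Try the cheap rule first; call the (slower) fallback only if it fails."""
    total = extract_total_by_rules(text)
    if total is None and fallback is not None:
        total = fallback(text)
    return total


receipt = "ACME STORE\nMilk 2.49\nSubtotal 2.49\nTOTAL 2.69\n"
print(extract_total(receipt))
```

The point of the ordering is that the expensive model is only invoked on the
receipts the rules cannot handle, which keeps the typical extraction time low.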

~~~
mywacaday
I had ABBYY a long time ago and had some uses for it, so I went to the website
to check out the cost and was met with this monstrosity: "Protect your shopping
cart downloads with Download Insurance Service. For only £11.00 you will be
able to download your files for 24 months, in case you need to reinstall the
products. ABBYY Screenshot Reader". Ripoff!! I've never seen anything like it
elsewhere. Has anybody else?

~~~
tastyminerals
It is the market leader and a monopolist; this is what you get.

------
maxnoe
Probably relevant:
https://youtu.be/c0O6UXrOZJo

------
gatestone
So what is the state of the art? Is there decent software available that
actually works? An app that a small company could use easily, say, for
employee expense reports with reliable data parsing and automatic expense/tax
calculations?

~~~
feistypharit
I've been using waveapps.com. Their receipt processor lets me forward any
receipt I get by email as a PDF. They have an app to upload a pic as well. I
have yet to have an issue, even with the Costco ones that have a line through
them. It's been great. And it's "free".

~~~
waterside81
I use waveapps too and I'm _pretty_ sure they're Mechanical Turking their
receipt transcription. Sometimes it takes too long for it to be an automated
process.

------
bhanhfo
Is this dedicated receipt OCR significantly better than the out-of-the-box OCR
you get from Google Cloud Vision, ABBYY or OCR.space?

As standard OCR gets better and better, the room for specialized OCR solutions
is becoming smaller.

~~~
tastyminerals
OCR is not information extraction. OCR is just the first step in a pipeline.
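
To make the distinction concrete, here is a rough sketch of the step that
comes after OCR. It assumes the OCR engine (any of them) has already returned
plain text, and turns that text into structured line items. The `LineItem`
type, the regex, and the sample text are all made up for illustration; a real
extractor has to cope with far messier input.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class LineItem:
    description: str
    price: float


# Hypothetical pattern: some description, whitespace, then a price at line end.
ITEM = re.compile(r"^(?P<desc>.+?)\s+(?P<price>\d+\.\d{2})$")


def extract_line_items(ocr_text: str) -> List[LineItem]:
    """Extraction stage: turn raw OCR text into structured records."""
    items = []
    for line in ocr_text.splitlines():
        match = ITEM.match(line.strip())
        if match:
            items.append(LineItem(match.group("desc"), float(match.group("price"))))
    return items


# Stand-in for the output of the OCR stage: just characters, no structure.
ocr_text = "CORNER SHOP\nMilk 1L   2.49\nBread 1.99\nThank you!\n"
for item in extract_line_items(ocr_text):
    print(item)
```

The OCR output is only a string. Deciding which parts of it are items,
prices, dates, or totals is the information extraction problem.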

------
icedata
Sensibill (a former boss designed their system) has done pretty well at
interpreting receipts in various formats.

------
knolax
Why does the article title on HN omit the mention of OCR that is present in
the actual title?

~~~
maxden
Looks like the submitter is trying some A/B testing on getting clicks to their
site.

------
BrandiATMuhkuh
Does anyone know where I can find a receipt dataset?

