
OCR “intelligent” software - adblu
I need help with automatic data extraction from invoice scans and images. 
I would like to develop an automatic system that knows where the invoice number, price, date, etc. are.
The problem is that each invoice has a different layout, and I cannot work out any rule for such a system.
Do you have any ideas, or could you point me in the right direction?
Thank you
======
bausshf
An ugly illustration:
[https://i.imgur.com/uZWLAaX.png](https://i.imgur.com/uZWLAaX.png)

1\. First you want to detect the location of the invoice number. There must
be something that all invoices share that can help you detect it. If they all
display it differently, simply check whether you can detect any of the
possible ways it is displayed.

2\. After detecting where the invoice number is, remove all noise and crop the
image down to just the invoice number.

3\. Now you have only the invoice number and can run OCR on it to read the
digits.

It's a little more complex than that, but it should help you a tiny bit.
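
To make step 1 a bit more concrete, here is a rough sketch of the "detect any of the possible ways it's displayed" idea. Note that it flips the order of the steps above: it assumes some OCR engine has already turned the whole page into plain text, and then searches that text for known labels instead of locating the number in the image first. The label list, `LABEL_PATTERNS`, and `find_invoice_number` are all made up for illustration; real invoices will need their own variants.

```python
import re

# Hypothetical label variants (an assumption, not an exhaustive list).
# Extend this with whatever wording your own invoices actually use.
LABEL_PATTERNS = [
    r"invoice\s*(?:number|no\b\.?|#)",
    r"inv\.?\s*(?:no\b\.?|#)",
    r"bill\s*(?:number|no\b\.?)",
]

# One combined pattern: a known label, an optional separator, then the value.
INVOICE_RE = re.compile(
    r"(?:" + "|".join(LABEL_PATTERNS) + r")\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-/]*)",
    re.IGNORECASE,
)


def find_invoice_number(ocr_text):
    """Return the first token that follows a known label, or None."""
    match = INVOICE_RE.search(ocr_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Stand-in for text produced by an OCR engine.
    page = "ACME Ltd\nInvoice No: INV-2024-001\nDate: 2024-01-05"
    print(find_invoice_number(page))
```

The same pattern-list approach could be repeated for the date and the total. It will still miss layouts where the value sits below the label or in a separate table column, which is part of the "more complex than that" caveat.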

