
OCR Software Dev Exposes 200k Customer Documents - wglb
https://www.bleepingcomputer.com/news/security/ocr-software-dev-exposes-200-000-customer-documents/
======
p1necone
'The Cloud' is just 'Someone Else's Computer'. Would you process confidential
documents on 'Someone Else's Computer'?

The bigger mistake was made before this data breach even happened.

~~~
tdb7893
Even if they were Amazon and actually owned the hardware it was running on, it
wouldn't have changed the outcome at all, so I don't understand how it being
someone else's computer is relevant.

~~~
zAy0LfpBZLC8mAC
It's not about property rights, it's about who is in control of and has
insight into what is going on on the machine.

~~~
LeifCarrotson
Right. In this case, the OCR software dev had control of the cloud machines.

However, the customers that uploaded sensitive documents to this cloud OCR
service did not have control of the computers, the code, or the configuration.

Yes, if you don't trust anyone, you can't get anything done, but this feels
like the kind of task where you should be a little bit nervous each time you
do it.

~~~
sixothree
This is assuming the customer would have better data protection.

~~~
mirimir
Customers would presumably not store their documents on publicly available
servers.

~~~
j88439h84
The OCR company presumably wouldn't either, though.

~~~
mirimir
Well, except that said OCR company _needs_ to be using Internet-facing
servers, and random customers don't.

------
uses
Another MongoDB misconfiguration, wow. How is it still so easy to configure
Mongo with no creds and an open port? It feels like these alone cause a large
share of data leaks.

If I were running a cloud platform at such a massive scale, I'd probably scan
my own ports to identify glaring problems like this one. It's kind of
surprising that isn't happening, considering how bad it is to have the brand
associated with a report like this.
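A minimal sketch of that kind of self-scan, assuming you only probe hosts you own and that MongoDB sits on its usual default port, 27017. It does a plain TCP connect with Python's standard `socket` module; the names `port_is_open` and `scan_hosts` are hypothetical, and a real audit would cover more ports and also check whether the service answers without credentials.

```python
import socket
from typing import Iterable, List

# Assumption: MongoDB is listening on its usual default port.
MONGODB_DEFAULT_PORT = 27017


def port_is_open(host: str, port: int = MONGODB_DEFAULT_PORT,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise,
        # so refused or timed-out connections don't raise.
        return sock.connect_ex((host, port)) == 0


def scan_hosts(hosts: Iterable[str],
               port: int = MONGODB_DEFAULT_PORT) -> List[str]:
    """Return the hosts (IP literals) that accept connections on port."""
    return [host for host in hosts if port_is_open(host, port)]
```

For example, `scan_hosts(["10.0.0.5", "10.0.0.6"])` (hypothetical addresses) would list which of those machines accept connections on 27017. An open port is only a signal, not proof of a leak.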

~~~
ReverseCold
If you spin up a server and install MongoDB (while forgetting to turn on a
firewall that blocks all traffic except ports 443/80), everyone has root access
to your database.

Easy mistake to make. I've probably done it at least once on publicly
accessible test instances.

~~~
BLanen
This just says to me that MongoDB's defaults should be changed.

It's happening too much.

A somewhat random password would at least provide some protection at the cost
of minimal inconvenience for devs.
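A minimal sketch of that idea, assuming a hypothetical installer step (not something MongoDB actually ships) that generates the initial admin password with Python's `secrets` module, which draws from a cryptographically secure source. The function name, the alphanumeric alphabet and the 16-character floor are illustrative choices.

```python
import secrets
import string

# Letters and digits only, so the password survives shell quoting and URIs.
ALPHABET = string.ascii_letters + string.digits


def generate_initial_password(length: int = 24) -> str:
    """Return a random password built with the CSPRNG-backed secrets module."""
    if length < 16:
        raise ValueError("length should be at least 16 characters")
    # secrets.choice picks each character independently and uniformly.
    return "".join(secrets.choice(ALPHABET) for _ in range(length))
```

With 62 possible symbols, 24 characters give roughly 143 bits of entropy, which is far beyond what online guessing can reach, while the developer only has to copy one string from the install output.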

~~~
achillean
By default, MongoDB doesn't listen on the public interface, so it won't be
exposed to the Internet; it only listens on localhost. Old versions of
MongoDB had bad defaults, but that hasn't been the case in years:

[https://blog.shodan.io/its-still-the-data-stupid/](https://blog.shodan.io/its-still-the-data-stupid/)
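To audit the setting this comment describes, a sketch like the following could flag non-loopback entries in a `bindIp`-style value. It assumes the value is the comma-separated address list that mongod's `net.bindIp` option takes; the helper name is hypothetical, and any hostname other than `localhost` is conservatively treated as exposed because it can't be classified without a DNS lookup.

```python
import ipaddress
from typing import List


def exposed_addresses(bind_ip: str) -> List[str]:
    """Return entries of a comma-separated bind list that are not loopback."""
    exposed: List[str] = []
    for raw in bind_ip.split(","):
        addr = raw.strip()
        if not addr or addr == "localhost":
            continue  # empty entries and "localhost" are treated as safe
        try:
            if ipaddress.ip_address(addr).is_loopback:
                continue  # 127.0.0.0/8 or ::1
        except ValueError:
            pass  # a hostname we can't classify without DNS: treat as exposed
        exposed.append(addr)
    return exposed
```

For example, `exposed_addresses("127.0.0.1,::1")` returns `[]`, while `exposed_addresses("0.0.0.0")` returns `["0.0.0.0"]`.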

------
tjoff
As a user: just don't store (or even process) data in random clouds.

As a developer: really try to minimize data collection at all costs.

I'm waiting for a future where the above is considered common sense.

------
bigcity
There is a severe lack of good LOCAL OCR options for documents. The only ones
I know of are Abbyy (very good and very expensive), OCR.space Local
(affordable but not as good) and, of course, Tesseract. But I feel Tesseract is
increasingly falling behind with regard to OCR quality and speed.
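For the local-processing angle, here is a minimal sketch of calling the Tesseract command-line tool from Python so the document never leaves the machine. It assumes the `tesseract` binary is installed and on `PATH`, and relies on its `stdout` output base and `-l` language flag; the helper names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def tesseract_command(image_path: str, lang: str = "eng") -> List[str]:
    """Build the CLI call; "stdout" as the output base prints the text."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_locally(image_path: str, lang: str = "eng") -> str:
    """Run Tesseract on this machine and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract binary not found on PATH")
    # check=True raises CalledProcessError if tesseract exits non-zero.
    result = subprocess.run(tesseract_command(image_path, lang),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Whatever the quality trade-off, nothing in this path uploads the file anywhere, which is the property the rest of the thread is arguing about.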

~~~
jumelles
And this article is ABOUT Abbyy!

------
snthd
The million dollar question - did this affect EU citizens?

~~~
notimetorelax
Yeah, as an EU resident and a client of one of their clients, I REALLY want to
know if my data was affected. And I have the right under the GDPR.

------
AJRF
Aside: this was the OCR engine included with the DEVON(think/note) applications.

Thankfully, the OCR engine processed documents offline. In a seemingly
prescient move, adherence to the rule of processing and storing as little data
as needed prevented an anxiety-inducing set of events from occurring.

It's a shame that small teams (I think that team is fewer than 10 devs) are
sometimes the only businesses with enough sense to make sure their risk is
mitigated or reduced when it comes to things like this.

[https://blog.devontechnologies.com/2018/08/abbyy-data-leak/](https://blog.devontechnologies.com/2018/08/abbyy-data-leak/)

~~~
syn0byte
I think smaller teams are better at mitigating because there is much less "not
my department".

Dev: I don't gotta worry about setting up Auth on my mongo instance for
dev/testing. Deploy/Admins will handle it in production...

Admins: They delivered a container package from CI so we assumed they had set
up authentication correctly...

Skiddie: Om nom nom...
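One way to close that gap is a check that runs in CI against the config that will actually ship. This sketch assumes the mongod config file has already been parsed into a Python dict (for example by a YAML loader, not shown) and flags settings discussed in this thread: `security.authorization` not set to `enabled`, and binding to every interface via `net.bindIpAll` or a wildcard address in `net.bindIp`. The function name is hypothetical.

```python
from typing import Any, Dict, List

WILDCARD_ADDRESSES = {"0.0.0.0", "::"}


def check_mongod_config(config: Dict[str, Any]) -> List[str]:
    """Return problems found in a parsed mongod config; empty means pass."""
    problems: List[str] = []

    # Authentication must be switched on explicitly in the shipped config.
    security = config.get("security") or {}
    if security.get("authorization") != "enabled":
        problems.append("security.authorization is not 'enabled'")

    # Flag explicit bind-to-everything settings.
    net = config.get("net") or {}
    if net.get("bindIpAll"):
        problems.append("net.bindIpAll is set")
    parts = {p.strip() for p in str(net.get("bindIp", "")).split(",")}
    if parts & WILDCARD_ADDRESSES:
        problems.append("net.bindIp includes a wildcard address")

    return problems
```

A CI job could fail the build whenever the returned list is non-empty, so neither the devs nor the admins have to assume the other side handled it.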

------
Ftuuky
This is why the bank where I work developed its own OCR software (and
translation, entity recognition, etc.). It simply couldn't afford to have
these kinds of leaks.

------
jglazko
If I have the right 'Abbyy', I'm surprised that I don't see any information
about this data release on their website. Or, maybe not. Are there disclosure
laws that one would expect to come into play here?

~~~
bdcravens
I'm not that familiar with it, but I think disclosure rules mean they must
notify the customer, rather than being obligated to disclose on their website.

------
mirimir
Isn't there still OCR software that does all processing locally?

------
zanchey
Anyone remember the articles in Uplink? This feels like it could easily have
been lifted from the game.

------
person_of_color
Wow. Software from 1993 still going strong.

