
Convert paper-based notes to HTML content with Google Vision API - jmfcurti
https://itnext.io/convert-paper-based-notes-to-html-content-with-google-vision-api-e398fdb45cb9
======
artpi
Google Vision API is quite awesome.

I wrote this app to scan the labels and read them out loud so my grandpa can
manage his life.

[https://www.helpmereadthis.com/](https://www.helpmereadthis.com/)

It’s open source, so feel free to steal code.

~~~
te_chris
That’s such a lovely and heartwarming thing to do. Kudos to you and thanks for
sharing.

------
franciscop
Sidenote: around a third to half of the article is on how to deal with Google IAM
Auth. This is one of the things that puts me off the modern Big Cloud the most
and keeps me on Heroku: it's such a behemoth of services and
combinations that you need a complex setup for a very "simple" tool.

I much prefer it when you can simply put up a script, press 3 buttons,
and voila, you have a fully working URL. Even Google used to do that. I've
built two libraries around Google's services, npm's `translate`[1] and
`drive-db`[2]. The former I no longer test on Google Translate even though it's
supposed to be supported, and the latter uses a more obscure plain JSON
API to avoid Google Auth.

[1]
[https://www.npmjs.com/package/translate](https://www.npmjs.com/package/translate)

[2] [https://www.npmjs.com/package/drive-db](https://www.npmjs.com/package/drive-db)

~~~
WrtCdEvrydy
Cloud Authentication and Networking are the modern lock-in.

~~~
anoncareer0212
That 'privacy' stuff everyone rants about is pretty annoying, I guess.

------
radarsat1
This is a great idea and wouldn't be too hard to port to a local solution,
e.g. using Tesseract or similar. The idea of tuning your handwriting and using
a markup language designed to be easily written by hand to make it easier to
do the OCR, instead of trying to be perfectly general, is actually a great
idea.

I can imagine one necessary enhancement (I would find it necessary anyway)
would be to support crossing-through words with a line to "erase" them.

~~~
nojvek
Google Vision API is eons ahead of Tesseract in its OCR capabilities. I was
quite impressed how good it was when trying to read ID numbers from scans.

~~~
jcun4128
Ha, I was going to say, I was trying to read receipts and Tesseract was having
problems, and those are machine-printed, not handwritten. Granted, you're
supposed to clean up the image, increase the contrast, etc., but yeah. When
scanning a screenshot of a website, though, accuracy is almost 100% every time,
e.g. with just black/white text.

~~~
dr_zoidberg
It's usually better to segment the text on the image (and tweak the perspective
if necessary) to help Tesseract extract the text. All the "easy" preprocessing
that can be done beforehand helps a lot. I understand that Google Vision
Services doesn't need these kinds of adjustments (or maybe they do them
automatically?), which makes for an easier time/less code, but setting up the
account sure looks like a lot of work from the article!
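
To make the "easy" preprocessing idea concrete, here is a minimal sketch in plain Python. It assumes the image is already a grayscale grid (a list of rows of 0-255 values); the function names and the fixed threshold of 128 are illustrative choices, not part of Tesseract or any OCR library. It binarizes the image and crops it to the box containing the dark pixels, which is a crude stand-in for real text segmentation.

```python
def binarize(gray, threshold=128):
    """Map each 0-255 pixel to 0 (dark, assumed ink) or 255 (background)."""
    return [[0 if px < threshold else 255 for px in row] for row in gray]


def text_bounding_box(binary):
    """Return (top, left, bottom, right) of the dark pixels, or None if there are none."""
    if not binary:
        return None
    rows = [y for y, row in enumerate(binary) if 0 in row]
    if not rows:
        return None
    cols = [x for x in range(len(binary[0])) if any(row[x] == 0 for row in binary)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(image, box):
    """Cut the image down to the inclusive bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Hypothetical 4x4 "scan": a few dark pixels surrounded by white margin.
gray = [
    [255, 255, 255, 255],
    [255, 30, 40, 255],
    [255, 255, 20, 255],
    [255, 255, 255, 255],
]
binary = binarize(gray)
cropped = crop(binary, text_bounding_box(binary))
```

A real pipeline would more likely use an image library for this, and perspective correction is not covered here; the point is only that the cropped, high-contrast region is what gets handed to the OCR engine.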

~~~
jcun4128
That's a good tip about the segmenting text, will have to keep that in mind
next time.

I like the local aspect. One project I have in mind is converting handwriting
to printed letters for note taking on tablets. Not a new concept, I think
OneDrive has it, but it would be neat to train it on your own writing.

Why not Zoidberg

~~~
dr_zoidberg
If you want to train your own net for handwritten letters, you can start with
EMNIST[0] and then move on from there.

[0] [https://www.nist.gov/itl/products-and-services/emnist-datase...](https://www.nist.gov/itl/products-and-services/emnist-dataset)

~~~
jcun4128
Thanks, yeah it's an ambitious project. It's cool but has no real gain
immediately. Maybe if I was a student/used a tablet everyday.

------
hn_throwaway_99
Nice tutorial, but it seems like it would be much easier (and more user friendly)
to convert to Markdown instead of using his pseudo-tags for HTML. The only
thing perhaps you'd need to handle differently than native Markdown is
significant whitespace, but that seems like it would just need some minimal
changes.

~~~
tobeagram
I completely agree, this would be suited so well for Markdown! Funnily enough
I have been wanting to add this feature into a knowledge management app [1]
I've been building. There have been so many times where our users have written
something on paper and want to save it digitally to work on and organise
later.

[1] [https://supernotes.app](https://supernotes.app)

------
reaperducer
_Convert paper-based notes to HTML content with Google Vision API_

The title is funny to me, because I know people who deliberately take paper
notes to keep them safe from Google.

I guess now they have an option should they change their views on privacy.

------
jbreiding
I used Rocketbook for a time and it worked really well. Just the erasing and
having to use specific pages to take notes on meant I couldn't just grab
anything to take notes on and archive it with search capabilities.

This seems like it could be nice to tinker with building something similar.

I prefer handwritten notes but prefer to be able to recall the notes digitally
from a search.

------
amelius
How long will Google keep supporting this API, though?

And, it's getting old, but why does Google need to know any notes I write
down? My desktop computer and phone seem powerful enough to do any OCR with
the right software.

~~~
enjoiful
I am looking for great OCR software I can run locally instead of using Cloud
Vision. Do you have any suggestions?

~~~
busymom0
Are you on Mac or iOS? If so, I developed apps for them which do OCR
100% locally:

MacOS:

[https://apps.apple.com/us/app/image-text-ocr-photo-pdf-scan/...](https://apps.apple.com/us/app/image-text-ocr-photo-pdf-scan/id1495787023?mt=12)

iOS:

[https://apps.apple.com/us/app/image-text-ocr-photo-scanner/i...](https://apps.apple.com/us/app/image-text-ocr-photo-scanner/id1499292605)

------
42droids
You really made me want to try the API. Also, awesome write-up. Thank you.

------
1024core
I counted 12 steps to just get the demo going. Not for the faint of heart!

------
js4ever
Beware, it's in fact a Medium article ... Does someone have a good trick to skip
their paywall?

~~~
pqb
Outline might help a bit [0] - it preserves all the text but pictures are
really small [1] and all headers are lost.

[0]: [https://outline.com/KZqa5L](https://outline.com/KZqa5L)

Edit:

[1]: Images on Medium can be resized by changing the expected width param, as
in
[https://miro.medium.com/max/{width}/{filename}](https://miro.medium.com/max/{width}/{filename})
URI.

\- Outline-generated link (width = 60 px):
[https://miro.medium.com/max/60/1*rNmb72I6rBUckbJuI6hRFw.png?...](https://miro.medium.com/max/60/1*rNmb72I6rBUckbJuI6hRFw.png?q=20)

\- Resized to 1024px width:
[https://miro.medium.com/max/1024/1*rNmb72I6rBUckbJuI6hRFw.pn...](https://miro.medium.com/max/1024/1*rNmb72I6rBUckbJuI6hRFw.png)
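
The width swap can be sketched in a few lines of Python using only the standard library. This assumes the image URL really has the `/max/{width}/{filename}` path shape shown above; the function name `resize_medium_image` is made up for illustration, and the query string (such as `?q=20`) is simply dropped, as in the resized link above.

```python
from urllib.parse import urlsplit, urlunsplit


def resize_medium_image(url: str, width: int) -> str:
    """Return `url` with the width segment after /max/ replaced by `width`."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    # Expected path shape: /max/{width}/{filename}
    if len(segments) != 4 or segments[1] != "max":
        raise ValueError("not a /max/{width}/{filename} URL")
    segments[2] = str(width)
    # Rebuild the URL without the query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))


small = "https://miro.medium.com/max/60/1*rNmb72I6rBUckbJuI6hRFw.png?q=20"
large = resize_medium_image(small, 1024)
```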

------
einpoklum
They lost me at "Google".

I don't want to have to use a Google account and to share my images and notes
and text with Google and the US government.

Also - I agree with other commenters that it'll probably be a better idea to
obtain the "unparsed" markup (or should I say - markdown?) rather than
generating HTML.

