Convert paper-based notes to HTML content with Google Vision API (itnext.io)
79 points by jmfcurti on July 25, 2020 | 33 comments



Google Vision API is quite awesome.

I wrote this app to scan the labels and read them out loud so my grandpa can manage his life.

https://www.helpmereadthis.com/

It’s open source, so feel free to steal code.


That’s such a lovely and heartwarming thing to do. Kudos to you and thanks for sharing.


Thanks for sharing! Amazing!


Sidenote: between a third and half of the article is about how to deal with Google IAM auth. This is one of the things that puts me off the modern Big Cloud the most and keeps me on Heroku: it's such a behemoth of services and combinations that you need a complex setup for a very "simple" tool.

I much prefer it when you can simply put up a script, press 3 buttons, and voilà, you have a fully working URL. Even Google used to do that. I've built two libraries around Google's services, npm's `translate`[1] and `drive-db`[2]. I no longer test the former on Google Translate, even though it's supposed to be supported, and the latter uses a more obscure plain JSON API to avoid Google Auth.

[1] https://www.npmjs.com/package/translate

[2] https://www.npmjs.com/package/drive-db


Cloud Authentication and Networking are the modern lock-in.


That 'privacy' stuff everyone rants about is pretty annoying, I guess.


This is a great idea and wouldn't be too hard to port to a local solution, e.g. using Tesseract or similar. Tuning your handwriting and using a markup language designed to be easily written by hand, so that the OCR has an easier job instead of having to be perfectly general, is a really smart approach.

I can imagine one necessary enhancement (I would find it necessary anyway) would be to support crossing-through words with a line to "erase" them.


Google Vision API is eons ahead of Tesseract in its OCR capabilities. I was quite impressed by how good it was when trying to read ID numbers from scans.


I guess my point is that this project is interesting because it doesn't depend so strongly on the OCR being perfect, since the author prescribes a handwriting style that helps it along. I just think it's a nice compromise, and something that I hadn't thought of previously.

Obviously the tech can always be improved, but I find this to be a secondary concern. Ideas can still be cool without worrying about implementation details. For instance, if the PoC works well enough, but you want better performance, then you can train a new OCR system specifically for it. You can break the project down into pieces, where each piece is an interesting challenge. Meanwhile getting 80% performance out of an existing solution allows you to build up a dataset to be used for improving it.


Ha, I was going to say: I was trying to read receipts and Tesseract was having problems, and those are machine-printed, not handwritten. Granted, I was supposed to clean up the image, boost the contrast, etc., but yeah. When scanning a screenshot of a website, though, accuracy is almost 100% every time, since it's just black/white text.


It's usually better to segment the text in the image (and tweak the perspective if necessary) to help Tesseract extract the text. All the "easy" preprocessing that can be done beforehand helps a lot. I understand that Google Vision doesn't need these kinds of adjustments (or maybe it does them automatically?), which makes for an easier time and less code, but setting up the account sure looks like a lot of work from the article!
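To make the "easy preprocessing" point concrete, here is a minimal sketch of one such step: binarizing a grayscale image with Otsu's threshold so that ink and paper are cleanly separated before OCR. This is my own illustration, not something from the article, and it assumes the image is already loaded as rows of 0-255 integers. The names `otsu_threshold` and `binarize` are made up for this example. Segmentation and perspective correction would be separate steps.

```python
def otsu_threshold(pixels):
    """Return the 0-255 threshold that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their intensities
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no dark pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break     # everything is already in the dark class
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map each pixel to 0 (ink) or 255 (paper) using Otsu's threshold."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[0 if p <= t else 255 for p in row] for row in image]


# Tiny hypothetical "scan": dark strokes (~30-40) on light paper (~200-220).
scan = [[30, 40, 200],
        [210, 35, 220]]
clean = binarize(scan)
```

In practice you would probably reach for an image library rather than hand-rolling this, but the idea is the same: hand the OCR engine a high-contrast image instead of a noisy photo.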


That's a good tip about segmenting the text; I'll have to keep that in mind next time.

I like the local aspect. One project I have in mind is converting handwriting to printed letters for note taking on tablets. It's not a new concept (I think OneDrive has it), but it would be neat to train it on your own writing.

Why not Zoidberg


If you want to train your own net for handwritten letters, you can start with EMNIST[0] and then move on from there.

[0] https://www.nist.gov/itl/products-and-services/emnist-datase...


Thanks, yeah, it's an ambitious project. It's cool but has no real gain immediately. Maybe if I were a student or used a tablet every day.


Nice tutorial, but it seems like it would be much easier (and more user-friendly) to convert to Markdown instead of using his pseudo-tags for HTML. The only thing you'd perhaps need to handle differently from native Markdown is significant whitespace, but that seems like it would just need some minimal changes.
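As a rough sketch of what the Markdown route could look like, the snippet below turns OCR'd lines written in a tiny Markdown-like subset (`#` headings, `- ` list items, plain paragraphs) into HTML. It is only an illustration under my own assumptions: the function name `markdown_lines_to_html` is hypothetical, the subset is far smaller than real Markdown, and it sidesteps the significant-whitespace problem entirely by stripping each line.

```python
import html


def markdown_lines_to_html(lines):
    """Convert a list of text lines in a small Markdown subset to HTML."""
    out = []
    in_list = False
    for raw in lines:
        line = raw.strip()
        is_item = line.startswith("- ")

        # Close an open list as soon as a non-item line shows up.
        if in_list and not is_item:
            out.append("</ul>")
            in_list = False
        if not line:
            continue

        if line.startswith("#"):
            # Heading level = number of leading '#' characters, capped at 6.
            level = min(len(line) - len(line.lstrip("#")), 6)
            text = line.lstrip("#").strip()
            out.append(f"<h{level}>{html.escape(text)}</h{level}>")
        elif is_item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"<li>{html.escape(line[2:].strip())}</li>")
        else:
            out.append(f"<p>{html.escape(line)}</p>")

    if in_list:
        out.append("</ul>")
    return "\n".join(out)


# Hypothetical OCR output from a handwritten note.
note = ["# Shopping", "- milk", "- eggs & bread", "", "Call grandpa"]
page = markdown_lines_to_html(note)
```

A real pipeline would more likely pass the recognized text to an existing Markdown library. The point is just that the handwritten syntax stays lighter than pseudo-tags.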


I completely agree, this would be suited so well for Markdown! Funnily enough I have been wanting to add this feature into a knowledge management app [1] I've been building. There have been so many times where our users have written something on paper and want to save it digitally to work on and organise later.

[1] https://supernotes.app


>Convert paper-based notes to HTML content with Google Vision API

The title is funny to me, because I know people who deliberately take paper notes to keep them safe from Google.

I guess now they have an option should they change their views on privacy.


I used Rocketbook for a time and it worked really well. But the erasing, and having to use specific pages, meant I couldn't just grab anything to take notes on and archive it with search capabilities.

This seems like it could be nice to tinker with building something similar.

I prefer handwritten notes but want to be able to recall them digitally from a search.


How long will Google keep supporting this API, though?

And, it's getting old, but why does Google need to know any notes I write down? My desktop computer and phone seem powerful enough to do any OCR with the right software.


I am looking for a great OCR software I can run locally instead of using Cloud Vision. Do you have any suggestions?


Are you on Mac or iOS? If so, I developed apps for both that do OCR 100% locally:

macOS:

https://apps.apple.com/us/app/image-text-ocr-photo-pdf-scan/...

iOS:

https://apps.apple.com/us/app/image-text-ocr-photo-scanner/i...


Do you have a recommendation for good OCR software? Tesseract had trouble reading scanned printed text for me.


Don't want to duplicate my comment:

https://news.ycombinator.com/item?id=23953456


Thank you.


>And, it's getting old, but why does Google need to know any notes I write down?

So they can target you with more ads.


You do realise Google doesn't use this data to target ads, right? Nor do you need an account.


You really made me want to try the API. Also, awesome write-up. Thank you.


I counted 12 steps to just get the demo going. Not for the faint of heart!


Beware, it's in fact a Medium article... Does anyone have a good trick to skip their paywall?


Outline might help a bit [0] - it preserves all the text but pictures are really small [1] and all headers are lost.

[0]: https://outline.com/KZqa5L

Edit:

[1]: Images on Medium can be resized by changing the expected width param, as in the URI https://miro.medium.com/max/{width}/{filename}.

- Outline-generated link (width = 60 px): https://miro.medium.com/max/60/1*rNmb72I6rBUckbJuI6hRFw.png?...

- Resized to 1024px width: https://miro.medium.com/max/1024/1*rNmb72I6rBUckbJuI6hRFw.pn...
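If you want to script that width swap, something like the sketch below should do it. It only rewrites the `/max/{width}/` segment described above. The helper name `resize_medium_image` and the `example.png` filename are placeholders of mine, not real Medium assets.

```python
import re


def resize_medium_image(url, width):
    """Replace the /max/<width>/ segment of a miro.medium.com image URL."""
    new_url, count = re.subn(r"/max/\d+/", f"/max/{int(width)}/", url, count=1)
    if count == 0:
        raise ValueError("URL has no /max/<width>/ segment")
    return new_url


# Placeholder URL in the same shape as the Outline-generated links.
small = "https://miro.medium.com/max/60/example.png"
large = resize_medium_image(small, 1024)
```

Anything after the filename (query string and so on) is left untouched, since only the first `/max/<digits>/` match is replaced.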


Both of these work for me:

- incognito tab in Chrome

- delete the Medium (or in this case itnext.io) cookie


I did not get blocked this time, so I cannot confirm it works properly on Medium, but this is a wonderful extension for unblocking many paywalls.

https://github.com/iamadamdev/bypass-paywalls-chrome


They lost me at "Google".

I don't want to have to use a Google account and to share my images and notes and text with Google and the US government.

Also - I agree with other commenters that it'll probably be a better idea to obtain the "unparsed" markup (or should I say - markdown?) rather than generating HTML.



