A fun little hack on Android phones (or at least Google Pixels -- this may be disabled by some OEMs).
Starting last year, the Android Recent Apps screen lets you select text from the carousel of app screenshots. It does this not by actually running the app, but by running OCR on the screenshots, so opening Recent Apps lets you select text that is otherwise unselectable, such as text inside images.
I use this a ton. Sometimes it will transcribe incorrectly, but overall it’s extremely useful, and I’ve found myself accidentally attempting to do it on iPhone too.
Ah, Samsung, forever protecting us from the perils of accidentally being able to do things. (I do appreciate that they left in wi-fi calling, though; my wife's otherwise excellent Moto G4 has it disabled.)
I guess if Google can be inspired by the mathematical term "googol" and then choose a variant spelling, it's okay for Naptha to be inspired by the hydrocarbon product "naphtha" (note the second "H") and choose a variant spelling, too.
But having grown up in the home of a chemistry professor, I'm having a hard time getting comfortable with this one.
I have an even better retroactive name justification! Naphtha is used as fuel, but it's also used as a solvent in chemical extraction and purification procedures.
I honestly thought that was the reasoning when I saw the name & the tool. You purify text from images with naptha!
Naphta, actually, but the allusion to Thomas Mann's Nobel Prize-winning book is brilliant in any case. Now there's a whole pool of literary names that can be attached to future features, product roll-outs, etc.
Project Peeperkorn alone is enough to get Business Insider writing about it all the time.
Only tangentially related, but probably a good thread to ask in:
Is there an OCR engine in the same ballpark as Google's image-recognition API that runs locally? I need to read text from photographed images that I am legally not allowed to upload to Google, and Tesseract is virtually useless for this.
I have a privacy concern. Does this send the image (or the URL to that image) to their server for processing, or does it happen in my browser without their server receiving any information regarding the sites I visit, or the images I load?
"By default, when you begin selecting text, it sends a secure HTTPS request containing the URL of the specific image and literally nothing else (no user tokens, no website information, no cookies or analytics) and the requests are not logged. The server responds with a list of existing translations and OCR languages that have been done. This allows you to recognize text from an image with much more accuracy than otherwise possible. However, this can be disabled simply by checking the "Disable Lookup" item under the Options menu."
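For illustration, the lookup described above can be sketched as a GET request whose query string carries only the image URL, with no cookies, tokens, or page information attached. This is a minimal sketch; the endpoint name and parameter name here are hypothetical, not Naptha's actual API:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical lookup endpoint -- the real Project Naptha URL is not documented here.
LOOKUP_ENDPOINT = "https://example-lookup.invalid/ocr"

def build_lookup_url(image_url: str) -> str:
    """Build a lookup request carrying ONLY the image URL:
    no user tokens, no cookies, no referring-page information."""
    return LOOKUP_ENDPOINT + "?" + urlencode({"url": image_url})

lookup = build_lookup_url("https://example.com/scan.png")
params = parse_qs(urlparse(lookup).query)
# The query string contains exactly one field: the image URL.
```

The point of the sketch is that the request is fully determined by the image URL alone, which is why disabling it ("Disable Lookup") costs only recognition quality, not functionality.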
I wonder why this couldn't be done using an all client-side JavaScript implementation? I wouldn't mind waiting a few extra seconds to select text instead of sending that data to a remote server (and paying for that server through ads or a subscription).
That's exactly what pressing "Disable Lookup" does.
We started a fairly popular Emscripten/WebAssembly port of the Tesseract OCR engine specifically to advance the state of client-side JavaScript OCR engines: https://tesseract.projectnaptha.com
However, there's a limit to the performance you can get out of a client-side JavaScript OCR engine; with a shared server you can get better recognition on certain popular images.
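One way to picture that shared-server advantage is a cache keyed by image URL: once any user's copy of a popular image has been recognized (possibly by a slower, higher-quality engine), later visitors selecting text on the same image get the better result immediately. A minimal sketch, where the class and method names are illustrative and not Naptha's actual server code:

```python
import hashlib

class SharedOCRCache:
    """Toy shared cache: a popular image is recognized once at high
    quality, then served to everyone who selects text on the same URL."""

    def __init__(self):
        self._store = {}  # sha256(image URL) -> recognized text

    def _key(self, image_url: str) -> str:
        return hashlib.sha256(image_url.encode("utf-8")).hexdigest()

    def put(self, image_url: str, text: str) -> None:
        self._store[self._key(image_url)] = text

    def get(self, image_url: str):
        # Returns None on a miss; a client would then fall back to
        # its bundled JavaScript/WebAssembly OCR engine.
        return self._store.get(self._key(image_url))

cache = SharedOCRCache()
cache.put("https://example.com/meme.png", "high-quality server transcription")
```

The design choice this illustrates: popularity amortizes the cost, so the server can afford heavier recognition per image than a browser tab reasonably can.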
Is the source code of their server available anywhere? Assuming I have it, it would be fairly easy to modify the extension to use my server instead of theirs, right?
If you enable the "Disable Lookup" setting from the context menu, the extension doesn't communicate with any servers at all; everything is processed with a JavaScript/WebAssembly OCR engine bundled with the app, though the results may be of slightly lower quality.
This does not answer my question though. Is the source code for the server available? If it is more accurate, and is open sourced, then I would like to run my own and make the extension connect to my server.
Since it's something being given to you for free, perhaps "I would appreciate it if other browsers were supported" is more appropriate than this bare command.
Seems like it is only 1 or 2 people building this. I don't think it's reasonable to expect hobby software projects to always support multiple browsers / OS / whatever. You have to start somewhere.
Something like this could be developed for the WASM backend of Qt. It works really well, but you lose all of those nice accessibility features of the normal web environment.
This stuff is far superior for Chinese character recognition. It does amazingly well with text on background images as long as there is some contrast. Sometimes image capture doesn't work in Firefox, but Chrome derivatives are fine (I'm using it with Vivaldi).
The copy mostly refers to the text detection algorithm being used (a tweaked custom implementation of Microsoft Research's Stroke Width Transform), which was state of the art a few years ago, when I first wrote the words on the website.
Nowadays neural approaches perform a bit better, so I should probably change that.