Project Naptha: Make Text in Browser Images Selectable (projectnaptha.com)
324 points by polm23 on Sept 9, 2019 | 52 comments



A fun little hack on Android phones (or at least Google Pixels -- this may be disabled by some OEMs).

Starting last year, the Android Recent Apps screen lets you select text from the carousel of app screenshots. It does this not by actually running the app but by doing OCR on the screenshots, so by opening Recent Apps you can select text that is otherwise not selectable, like text in images.


I use this a ton. Sometimes it will transcribe incorrectly, but overall it’s extremely useful, and I’ve found myself accidentally attempting to do it on iPhone too.


> little hack

It's actually a feature; you can turn it off in the Pixel Launcher settings/Suggestions/Overview selection.

On Android Pie, it can also be used to share/grab images that you couldn't normally download.


I remember reading about this feature but alas it's disabled / not implemented on Samsung devices (or at least not on my S9)


Ah, Samsung, forever protecting us from the perils of accidentally being able to do things. (I do appreciate that they left in wi-fi calling, though; my wife's otherwise excellent Moto G4 has it disabled.)


I wish I had known this before! So useful!


I guess if Google can be inspired by the mathematical term "googol" and then choose a variant spelling, it's okay for Naptha to be inspired by the hydrocarbon product "naphtha" (note the second "H") and choose a variant spelling, too.

But having grown up in the home of a chemistry professor, I'm having a hard time getting comfortable with this one.


Ah, just noticed this! The typo is not acknowledged in the "What's in a Name?" paragraph at the end -- wonder if they noticed it?


(creator here)

Unfortunately I only noticed the typo after purchasing the domain name and decided to run with it :)


Time to build an elaborate (and false) reasoning to share, and at the end of the story reveal the truth.


This is probably even better -- now it's a unique and memorable word, like "captcha" for instance.


I have an even better retroactive name justification! Naphtha is used as fuel, but it's also used as a solvent in chemical extraction and purification procedures.

I honestly thought that was the reasoning when I saw the name & the tool. You purify text from images with naptha!


Naptha is the name of a character in Thomas Mann's https://en.wikipedia.org/wiki/The_Magic_Mountain


Naphta, actually, but the allusion to Thomas Mann's Nobel Prize winning book is brilliant in any case. Now there's a whole pool of literary names that can be attached to future features, product roll-outs, etc.

Project Peeperkorn alone is enough to get Business Insider writing about it all the time.


> naptha is a type of fuel often used for lighters

from what I gather here, it is an accidental misspelling.

Curiously, if you remove the other 'h' instead, as in 'naphta' (nafta), you end up with the word as actually used in many parts of the world.


Only tangentially related, but probably a good thread to ask in:

Is there an OCR that’s in the same ballpark as Google’s image recognition API, and that runs locally? I need to read text from photographed images that I am legally not allowed to upload to Google, and Tesseract is virtually useless for this.


I have a privacy concern. Does this send the image (or the URL to that image) to their server for processing, or does it happen in my browser without their server receiving any information regarding the sites I visit, or the images I load?


"By default, when you begin selecting text, it sends a secure HTTPS request containing the URL of the specific image and literally nothing else (no user tokens, no website information, no cookies or analytics) and the requests are not logged. The server responds with a list of existing translations and OCR languages that have been done. This allows you to recognize text from an image with much more accuracy than otherwise possible. However, this can be disabled simply by checking the "Disable Lookup" item under the Options menu."
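
As a rough illustration of the quoted behaviour, here is a minimal Python sketch of a lookup request that carries the image URL and nothing else, plus the "Disable Lookup" switch. This is not Project Naptha's code: the endpoint `LOOKUP_ENDPOINT`, the query parameter name `url`, and the function `build_lookup_request` are all hypothetical, made up only to show the shape of the idea.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint; the real lookup URL is not given in the thread.
LOOKUP_ENDPOINT = "https://lookup.example.invalid/v1/lookup"


def build_lookup_request(image_url: str, disable_lookup: bool = False) -> Optional[dict]:
    """Describe the single HTTPS request made when text selection starts.

    Returns None when "Disable Lookup" is checked, meaning nothing is sent
    and recognition has to happen entirely in the browser.
    """
    if disable_lookup:
        return None
    return {
        "method": "GET",
        # The image URL is the only piece of data in the request.
        "url": LOOKUP_ENDPOINT + "?" + urlencode({"url": image_url}),
        # No cookies, user tokens, referrer, or analytics identifiers.
        "headers": {},
    }


request = build_lookup_request("https://example.com/comic.png")
print(request["url"])
print(build_lookup_request("https://example.com/comic.png", disable_lookup=True))
```

Returning `None` for the disabled case keeps the "no network traffic at all" path explicit, so a caller cannot accidentally send a half-empty request.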


I wonder why this couldn't be done entirely with client-side JavaScript? I wouldn't mind waiting a few seconds to select versus sending that data to a remote server (and paying for that server through ads or subscription).


(creator here)

That's exactly what pressing "Disable Lookup" does.

We started a fairly popular Emscripten/WebAssembly port of the Tesseract OCR engine specifically to advance the state of client-side JavaScript OCR engines: https://tesseract.projectnaptha.com

However, there's a limit to the kind of performance that you can get out of client-side JavaScript OCR engines. With a shared server you can get better recognition on certain popular images.


Is the source code of their server available anywhere? Assuming I have it, it would be fairly easy to modify the extension to use my server instead of theirs, right?


(creator here)

If you enable the "Disable Lookup" setting from the context menu, the extension doesn't communicate with any servers at all. Everything is processed with a JavaScript/WebAssembly OCR engine bundled with the app; it just may be of slightly lower quality.


This does not answer my question though. Is the source code for the server available? If it is more accurate, and is open sourced, then I would like to run my own and make the extension connect to my server.


Hey! Is there some reason why it's not working on some photos where the cursor is replaced with a magnifier tool?


They are the same people behind the JavaScript port of the Tesseract OCR engine: https://github.com/naptha/tesseract.js

Here's an online notebook for trying it: https://observablehq.com/@tmcw/tesseract-js-v2-alpha


I haven't used this extension in a while but IIRC there was an option to do all the OCRing locally.



Doesn't seem to support Firefox yet...


Works for me on FF 69

edit: I just saw that the page says my browser is not supported but I could use more than half of the examples


I’m on FF70, I’ll give it a go


> currently only Google Chrome is supported

Stop doing this!


Since it's something being given to you for free, perhaps "I would appreciate it if other browsers were supported" is more appropriate than this bare command.


Seems like it is only 1 or 2 people building this. I don't think it's reasonable to expect hobby software projects to always support multiple browsers / OS / whatever. You have to start somewhere.


Something like this could be developed for the WASM backend of Qt. It works really well, but you lose all of those nice accessibility features of the normal web environment.


I use Firefox, but this is a must-have! Very impressed. I will be keeping up to date and waiting for a Firefox version to be released! Good luck!


I haven't tried it myself, but it might work with Chrome Store Foxified [0].

[0] https://addons.mozilla.org/en-US/firefox/addon/chrome-store-...


https://ocr.space/copyfish

This stuff is far superior for Chinese character recognition. Does amazingly well with text on background images as long as there is some contrast. Sometimes image cap doesn't work in Firefox, but Chrome derivatives are fine (I'm using with Vivaldi).


I remember uninstalling this due to malware reasons. I guess it has been resolved for a long time now?

"Our Copyfish extension was stolen and adware-infested": https://news.ycombinator.com/item?id=14888010


https://chrome.google.com/webstore/detail/cloud-vision/nblmo...

Google Cloud Vision OCR is incredible for Chinese character recognition. I think it's the same backend as Google Translate uses.


This is a really cool idea for simple PDFs and images but I don't think the OCR capability is state of the art.


(creator here)

The copy is mostly referring to the text detection algorithm that was being used (it's built on a tweaked custom implementation of Microsoft Research's Stroke Width Transform algorithm), which was state of the art a few years ago (when I first wrote the words on the website).

Nowadays neural approaches perform a bit better, so I should probably change that.
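
For readers unfamiliar with the idea behind stroke-width-based text detection: text tends to be drawn with strokes of roughly constant thickness, so a per-pixel "stroke width" map is a useful cue. The Python sketch below is a deliberately simplified stand-in, not the Stroke Width Transform itself and not the extension's implementation. Instead of following gradient rays between edges, it takes the shorter of the horizontal and vertical ink runs through each pixel of a binary image. All names (`run_length`, `stroke_widths`, `median_stroke_width`) are hypothetical.

```python
from statistics import median_low
from typing import List

Image = List[List[int]]  # 1 = ink, 0 = background


def run_length(img: Image, r: int, c: int, dr: int, dc: int) -> int:
    """Length of the unbroken run of ink through (r, c) along direction (dr, dc)."""
    rows, cols = len(img), len(img[0])
    length = 1
    for sign in (1, -1):  # walk forwards, then backwards, from the pixel
        rr, cc = r + sign * dr, c + sign * dc
        while 0 <= rr < rows and 0 <= cc < cols and img[rr][cc]:
            length += 1
            rr += sign * dr
            cc += sign * dc
    return length


def stroke_widths(img: Image) -> Image:
    """Per-pixel estimate: shorter of the horizontal and vertical runs (0 for background)."""
    return [
        [
            min(run_length(img, r, c, 0, 1), run_length(img, r, c, 1, 0)) if px else 0
            for c, px in enumerate(row)
        ]
        for r, row in enumerate(img)
    ]


def median_stroke_width(img: Image) -> int:
    """Median of the per-pixel estimates over ink pixels only."""
    return median_low([w for row in stroke_widths(img) for w in row if w])


# A crude letter "L" drawn with a 2-pixel-thick stroke.
letter_l = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1],
]

print(median_stroke_width(letter_l))  # 2
```

In this sketch the corner pixels of the "L" come out inflated (6 instead of 2) because both runs are long there, which is why it reports a median rather than a mean.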


A similar tool for translating text on images is https://www.imagetranslate.com, which gives more control over editing the images and recreating them.


How about just making text in web pages selectable? Looking at you, Facebook on Android.


Why do I get this?

```
“Where Did

You Go?”

<[ TEXT RECOGNITION IN PROGRESS / MORE INFO: http://projectnaptha.com/process/ (IDX:1:4&!&!&!&!:XDI) / ELAPSED 23.52SEC / DATE Mon, 09 Sep 2019 16:56:50 GMT / TEXT RECOGNITION IN PROGRESS ]>

"Nothing.
```


The "more info" link explains that the text recognition is slow.


This will put that one reddit user that manually transcribes memes for blind people out of a job!


Having this superpower is more useful than I realized it would be. Thanks for making it.


Doesn't work on Kanji :-(

Reading untranslated manga would be so much easier if it did!


Try https://chrome.google.com/webstore/detail/cloud-vision/nblmo...

Contact me if you want to chat more about language learning and comics!


I don't see a privacy policy or licence?


It works well, I've been using it for years :)


Useful



