Show HN: Screenshot Hero - iPhone app to make all photos text searchable (asadmemon.com)
429 points by asadlionpk on Jan 16, 2020 | 118 comments

Based on the video it works amazingly well. I’ve tried finding a usable way for open source text recognition on desktop, and haven’t found anything even close to this.

Like, I've got scans of letters, machine-written, 300dpi, perfect black/white 1-bit depth, no marks or scratches, perfect quality — and OCRmyPDF (using tesseract) absolutely fails at this task, returning only bullshit. Even if I set the language correctly, or even set the wordlist to a manual transcription of the PDF.

I also tried using OCR on screenshots, with the same miserable result.

How does Apple’s Vision API do so much better at this kind of task? Is there some trick I'm missing?

Like, the images I supply are of such high quality that you can literally just split into characters and search for the nearest match in a database of latin characters, and even that would return better results than tesseract.
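A rough sketch of that "split into characters and find the nearest match" idea, in Python. The 3x5 glyph templates, the fixed character pitch, and the function names are all made up for illustration; real scans would need proper segmentation and a far larger template set.

```python
# Hypothetical 3x5 1-bit templates ("1" = ink). A real database would hold
# every Latin glyph at a much higher resolution.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def split_columns(rows, width=3, gap=1):
    """Cut a fixed-pitch line image into glyph-sized cells (assumed layout)."""
    step = width + gap
    return [tuple(r[x:x + width] for r in rows)
            for x in range(0, len(rows[0]), step)]


def recognize(rows):
    """Label each cell with the template that has the fewest differing pixels."""
    return "".join(
        min(TEMPLATES, key=lambda ch: hamming(TEMPLATES[ch], cell))
        for cell in split_columns(rows)
    )
```

Because the match is by nearest distance rather than exact equality, a stray pixel or two in a glyph does not change the result as long as the templates are far enough apart.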

I think you have to fiddle with the internal settings of the OCR package you are using to get good results. For tesseract and pytesseract, there is a whole article on improving the quality: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuali...

My guess is that Apple trained on its own massive dataset with better architectures, which makes its system better than off-the-shelf OCR.

I am working in this area, and one way I see good improvement is training an object detector to first detect words in an image; then you can pass those through Tesseract or other OCR software. Besides that, fine-tuning Tesseract on the kind of data you want strong performance on would be the next best alternative.

Here is a cool article from Dropbox on how they engineered their OCR system: https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr...

This link below is a really cool project that showcases what building your own OCR engine would look like!


Reading all these comments about others having bad experiences with Tesseract makes me think there is a market for a robust, better alternative to Tesseract.

In my limited comparison, Apple’s API performed way better than Tesseract. You can check your images using this app or their sample Xcode project for Vision; I think the same API works on macOS too.

I would also vouch for Google’s OCR API, but I opted for Apple’s for privacy reasons and as they were easier to implement.

> but I opted for Apple’s for privacy reasons and as they were easier to implement.

Thank you for respecting user privacy and making it a first class citizen.

I use Tesseract to extract text from my scanned documents in order to be able to grep through them. What surprised me was that the lower-quality (lower-resolution) scans actually had much better OCR results. So it seems Tesseract works best with some specific font-size-DPI combination.

This is the script I wrote for that purpose (maybe it helps): https://nopaste.xyz/?ea1e6242a659ce5c#vAMyPpajo0OIeCGQe9Wkdq...

The results aren't perfect, but good enough for my purpose.

You get what you pay for with Tesseract; don't judge OCR performance based on it. Would highly recommend checking out FineReader if you're on a supported platform. You can feed it crumpled beer receipts and the results are still decent.

If it makes you feel any better, I had a similarly terrible experience with Tesseract. I spent days studying the docs and I failed to find a setup where it would produce anything useful even with the most perfect input.

I was also left with a feeling that I must have been missing some trick, because people on the web seem to be using it with good results.

It requires a lot of tuning. When I first tried it out I rapidly abandoned it when I couldn't get useful stuff out of a screenshot of text I typed in.

I don't know if this is how it works or not, but it is worth noting that searching for text in images is a much easier problem than transcribing images, and if you do the latter as "step 1" you destroy your ability to do the former. When you search for some piece of text, you want to find all images that, if you squint a bit, could maybe contain that piece of text. That lets you deal with cases where recognition is forced to make a bad guess (the moral equivalent of rebracketing symbols), which is difficult to work around in a later text-only search pass. If you can at all avoid it, avoid transforming your input information until you know your query, lest you destroy information. (This is what always made Microsoft Journal so epic for me, though I should be clear that I haven't used it in 15 years, so for all I know they forgot this.)
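To make the "don't transcribe before you know the query" point concrete, here is a minimal Python sketch. It assumes the recognizer can hand back a set of plausible characters per glyph position instead of a single best guess; that lattice format and the function name are hypothetical, not how any particular OCR engine exposes its results. It also only covers per-character ambiguity; segmentation ambiguity such as "rn" versus "m" would need a richer structure.

```python
def lattice_contains(lattice, query):
    """Return True if `query` fits the candidate sets at some offset.

    lattice: list of sets, one set of plausible characters per glyph position.
    """
    n, m = len(lattice), len(query)
    return any(
        all(query[j] in lattice[i + j] for j in range(m))
        for i in range(n - m + 1)
    )


# Hypothetical recognizer output for an image that really says "Invoice".
lattice = [{"I", "l", "1"}, {"n"}, {"v"}, {"o", "0"}, {"i", "l"}, {"c", "e"}, {"e"}]

# A forced single-best transcription might come out like this:
top1 = "lnv0ice"

print("Invoice" in top1)                     # plain text search on the transcription
print(lattice_contains(lattice, "Invoice"))  # search against the kept alternatives
```

The plain substring search fails because the bad guesses were frozen into the transcription, while the lattice search still matches because the alternatives were kept until the query was known.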

I had a similar experience working with OCR for a project similar to OP's ( https://www.askgoose.com ). What we realized was that Tesseract's training data was significantly different from what we were trying to transcribe (screenshots of conversations). The sparseness of text leads to bad transcriptions. One workaround is to do some image processing beforehand so you can break the image down into data that looks similar to the training data used for Tesseract.

I've had success with first using ImageMagick to clean up the image and do some thresholding, and then using the latest version (4.1) of Tesseract. With Tesseract you can also experiment with different --psm values for better results.
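For anyone who wants to script that, a small Python sketch of the two-step pipeline follows. The 50% threshold and `--psm 6` are placeholder values I picked, not the commenter's settings, and the file names are made up. On ImageMagick 7 the binary is typically `magick` rather than `convert`.

```python
import subprocess


def build_commands(src, cleaned, out_base, threshold="50%", psm=6):
    """Build argv lists for a grayscale + threshold pass, then a Tesseract run."""
    clean = ["convert", src, "-colorspace", "Gray", "-threshold", threshold, cleaned]
    # Tesseract writes its result to <out_base>.txt
    ocr = ["tesseract", cleaned, out_base, "--psm", str(psm)]
    return clean, ocr


def run(src, cleaned="cleaned.png", out_base="out", **options):
    """Run both steps; requires ImageMagick and Tesseract on the PATH."""
    for cmd in build_commands(src, cleaned, out_base, **options):
        subprocess.run(cmd, check=True)
```

Looping `run("scan.png", psm=p)` over a few `--psm` values and comparing the output files is a cheap way to find which page segmentation mode suits a given kind of image.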

It might be worth trying this for desktop: https://github.com/the-paperless-project/paperless

It handles OCR, indexing and search

It appears to also use Tesseract

The paperless project has some tweaks in the config that can get decent results out of the box. For general use with decent quality documents it will work pretty well right now versus spending hours tweaking Tesseract.

Tesseract needs text on a white background to work well out of the box. For "text on images" try for example https://ocr.space/ocrapi

You might like Screenotate by Omar Rizwan - https://screenotate.com/

Is there a way to port Apple's Vision API to JavaScript for a web app, or to use their models without an Apple device, the way Apple Maps is used on DuckDuckGo?

I don't think that's possible. The Vision VNRecognizeTextRequest is only available for:

iOS 13.0+, macOS 10.15+, Mac Catalyst 13.0+, tvOS 13.0+


This sounds like an app I’ve wanted forever! But as with all apps I’m particularly sensitive to giving bulk access to my photos (and other data).

The app description says “images are all processed on device and nothing is sent to any server” but the app’s privacy policy talks about first- and third-party collection of potentially PII.

> The app does use third party services that may collect information used to identify you.

> I want to inform you that whenever you use my Service, in a case of an error in the app I collect data [...]. This [data] may include information such as your device Internet Protocol (“IP”) address, [...], the configuration of the app when utilizing my Service, [..], and other statistics.

Could an error log dump the strings it detected in my images and send them off to this third-party for instance?

I assume everything here exists with the best intentions, but I worry about using an app like this to scan and analyze all my images when there are a host of exceptions in the privacy policy.

> But as with all apps I’m particularly sensitive to giving bulk access to my photos (and other data).

I don't understand why iOS doesn't allow blocking internet access for apps. You can only disallow mobile data access, but not WiFi access.

The frustrating part is that in China they have the functionality [0], so it's there, but we are not allowed to use it for our privacy.

[0] https://old.reddit.com/r/apple/comments/69k1j8/the_chinese_i...

Advertising on iOS relies on internet access so toggling it off would mean no ads.

It doesn’t need to. The OS could download ads, show them in-app, and report back impressions through the OS. And Apple doesn’t make much money from iOS ads.

Apple doesn't have any ad frameworks that are open to third-party developers, so they would have to build one to support this. And that didn't work out so well last time they tried.

But developers do, and they make free apps on the App Store.

The data collected is, as mentioned, used to analyze crashes. In case of an error, some information is collected, and each item can be justified: the unique device ID shows whether an error happened on one device multiple times; the IP address is collected in web server access logs; and "configuration of the app" could mean a lot of things, e.g. the resolution of the viewport.

If the Privacy Policy contains no mention of sending images to the server and storing the images and/or text contents, then ... it's not happening.

edit: This comment was posted before you edited yours.

> If the Privacy Policy contains no mention of sending images to the server and storing the images and/or text contents, then ... it's not happening.

A Privacy Policy is just words in most jurisdictions. Short of a Wireshark analysis, there's no way to know for sure.

I'd rather the program asked if I want to send the crash log the next time I opened it. I have several programs that do that, and I wish it was more widespread.

> A Privacy Policy is just words in most jurisdictions.

Not for long. At least in the EU.

Well, it's already illegal, it just takes time to prosecute enough data processors until others start being afraid of the fines and start ACTUALLY implementing GDPR.

I don’t feel so concerned about images being sent to the server, but rather the information derived and collected about the images.

If it makes you comfortable, I am not sending text (or any other metadata for images including location, camera fingerprint etc). I plan to add basic analytics to know which views/features are being used and also crash detections.

Again, if it's not mentioned in the Privacy Policy, a sane thing would be to assume it's not being done.

You can be paranoid, but then you shouldn't have a cell phone at all, let alone an always-connected smartphone.

No, it's actually insane to take every written statement at its word.

I'm in the same boat. I wish Apple gave us a way to block internet access for specific apps. Currently, there is a way to block cellular data for a given app, but Wi-Fi is always unrestricted.

Try Google Photos.

That's the opposite of what the parent wants. Google Photos uploads everything to their cloud.

I've been using Memos on iOS (https://memos.org) for searching. Memos additionally lets you copy the text, and it runs on device.

This looks very promising, and the price is reasonable.

But the web site is one page, without even an "About" section. All it has is an animation, some text, and an e-mail address. No information about the company at all. Even the whois for the domain is hidden.

For the sort of person for whom "on device" processing is important, information about the company has value.

Just downloaded and ran it.

Took a few minutes to go through my images, but the search was working from the very first indexed photo.

The search is fast and word recognition is freaking impressive!

Well done!

Thank you! Glad it worked fine for you :)


One thought - get this plugged into the Siri search system so people can search their photos when just doing a general search.

Great idea!

This works really well, great work! One thought-- if you first let it scan all screenshots and let it finish, but then change it to scan all images, it seems that it is recomputing all the text for every image. Perhaps it might make sense to compute the hash of the individual images, and associate the parsed text with this image hash in a table. Then if you have to scan the image again, and the hashes match, these can simply be retrieved from the table rather than computed again. Either way, what a nice free tool.
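A minimal Python sketch of that hash-keyed cache. The class name and the `recognize` callback are hypothetical stand-ins for whatever OCR call the app makes; this is not how Screenshot Hero is implemented, and a real app would persist the table (for example in SQLite) rather than keep it in memory.

```python
import hashlib


class OcrCache:
    """Remember recognized text by content hash so identical images are parsed once."""

    def __init__(self, recognize):
        self._recognize = recognize   # callable: image bytes -> recognized text
        self._text_by_hash = {}       # hex digest -> recognized text
        self.misses = 0               # how many times OCR actually ran

    def text_for(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._text_by_hash:
            self.misses += 1
            self._text_by_hash[key] = self._recognize(image_bytes)
        return self._text_by_hash[key]
```

Switching from "screenshots only" to "all images" would then only pay the OCR cost for images whose hash is not already in the table.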

I tried writing a native module a while back (2+ years ago) to work with a RN app and failed miserably. Has the process/documentation improved since then? Are there known go-to resources for native module development with clear examples? Wish expo.io (which I have used and is otherwise great) was more robust when it comes to support custom native modules, too.

Honestly, it hasn’t changed much. But Swift made it easier to make a native module (compared to Obj-C, which was the de facto method a few years back).

Hi - I work on Expo.

One of the big focuses for us this year is to make that experience much better.

We've already taken some steps (we now just give you a regular React Native project if you need one), but we plan to keep working on this stuff until you can get all the power you need to build something like Screenshot Hero with the ease of use of just making a website.

That's awesome to hear. For what it's worth, I found all of your guys' work simply incredible, and have used it several times both for personal and professional projects with overwhelmingly fantastic results and a far better experience than anything I was ever able to accomplish natively. The "complaint" about problems really stems from the fact that the bulk of the experience is so fantastic that things like this, which are still a bit rough around the edges, end up really standing out.

Did you try the iOS Calendar tutorial[1]? Besides this, I find it useful to look into other third-party modules, specifically ones similar to what you are building: whether or not they have UI, whether or not they use CocoaPods ...

Also a bit of a tangent, but I'll take the opportunity to say that I wish third-party React Native modules had some sort of standard documentation like the Flutter website, especially versioned documentation.

[1] https://facebook.github.io/react-native/docs/native-modules-...

I have learned it in the last few months. It's been really intuitive as a native dev switching to work with RN more. A really simple example I made is this native module for showing the iOS 13 Context Menu when RN views are long pressed, if you are looking for superrrr simple examples: https://github.com/mpiannucci/react-native-uimenu

For those on Android, Google Photos has been doing this for years.

I've been using an app called "Memos"[0] with a similar value proposition for several months now, and it's become crucial for me. If someone gives me a paper, appointment reminder, or note from school, I just take a picture of it and I can find it again any time I want. It's also really nice for finding pictures of buildings or signs (or photos of people taken near buildings or signs) since it can recognize text on those as well.

It's really cool to see how something like this can be made without a huge investment! I expect that future iOS versions will probably just include this natively. There's already some object recognition but not at this level.

[0]: https://memos.org/

I typed "green" into the search box of Google Photos and it seemed to work. Does anyone have more details on this tech?

Object recognition using machine learning. Google scans all photos in Photos by default, but it can be turned off.

Try searching "credit card, [city], september 12th 2019", or "yellow cat" or "screenshot".

It can't transcribe text though.

Google Photos can both search text: https://9to5google.com/2019/08/22/google-photos-text-search/

and transcribe text with Google Lens.

If you click the Lens-button in the mobile app, you can select and copy the text just fine

Thanks for taking the initiative.

I would love to see more developers follow this approach.

Agreed: Cool app and certainly heroic and worth implementing.

As an alternative, Google Photos from the App Store also supports OCR for still images on its web version. I don't have an iDevice on me to know if it also does OCR in-app. Give GP a run for its money, Screenshot Hero!

Thanks! Yes, Google Photos is awesome and does way more (like searching by things in the photo). I just wanted an on-device solution without having to upload anything to Google servers.

I wouldn’t. Swift is far more performant than using these React-type pseudo-native systems and the developer mentioned that he “fell back” to React because of an apparent lack of SwiftUI documentation — so why not use this project as a chance to actually learn it and create some documentation? He wrote up about how he built the app, adding yet another article to the boring canon of “I just want to use JavaScript and React and call myself an iOS developer.”

And he said that he used React just for the views. SwiftUI isn’t that hard at all, and there are exact tutorials about listing things (Ray Wenderlich, for example, and the WWDC videos). There is high-quality information out there, but it seems like Swift and SwiftUI weren’t actually a real consideration.

To be clear, I am not criticizing the app, what I am criticizing is the parent comment hoping more developers follow that approach. I happen to think that approach is lazy and aspires to the lowest common device denominator rather than building apps that are platform specific and optimized for each.

> so why not use this project as a chance to actually learn it and create some documentation

Sometimes you just want to get something done rather than do free work for one of the richest corporations in the world.

The latter should be enough fulfillment for anyone, but sometimes you're just too greedy!

This project was supposed to be that, a chance to learn SwiftUI. I didn't want to reverse engineer their API + document it. Frankly, I didn't have that kind of free time :)

It helped that React Native performed great for this use case.

I think that's unnecessarily harsh. RN is powerful and the dev still uses Native modules and views. The performance was probably good enough for this concept and he made a cool app with it.

Oh. This is great. Just tried it out a couple times on my 439 screenshots, and it works. I love it.

I wonder how well it'll scale to 10s of thousands of screenshots (especially the search).

Still, I'm wondering whether or not I should keep Notion around (only used it for web-clipping). Especially since Notion still can't search within pages and iOS now can do full-page screenshots.

Edit: except that on iOS, Safari's full-page screenshots are PDFs. Darn.

I have 6k photos on my phone and this works just fine. The search speed is mostly the same.

Scanner Pro, by Readdle, also does on-device OCR for documents you scan (along with many other things, including deformation/perspective correction and PDF export).

Not affiliated in any way, just a satisfied user. https://readdle.com/scannerpro

I really wish iOS had a way to selectively block all internet access for specific apps (like it kinda does for cellular data), and to block the Hidden photos album from all apps at the system/API level.

That would make it more comfortable to try apps like this in terms of privacy concerns.

How does it work with iCloud Photo Library, where most of the photos are not stored on the device? Does it have to download the full library to search? I have over 35k photos, but this app only shows about 4.5k of them, and it processes one image every 2-3 seconds.

By default, my app was only scanning screenshots. In the menu the 4th option should switch it to scan all photos, I think.

I have the same setup but fewer photos (around 6k, many on iCloud). Does it not detect all 35k photos?

I will look into this.

Is it going to download the entire library to my device if I run processing for all of my photos? That is over 500 GB of network activity.

Unfortunately, it will have to when scanning them. But it won't be storing a copy of the image within the app, if that helps.

What I really want is an app that goes through all my photos/chat history and deletes anything that might contain sensitive information after 30 days, such as from screenshots. I already use the iMessage feature that lets you delete history after 30 days, but I wish I could pretty much do this for everything. There are some things I might want to keep (like photos, for example), but I'd prefer to just have anything private deleted automatically. If it's information I really want to save, I'll make sure to save it somewhere safe.

How would you define sensitive content? The real challenge here is to train a classifier.

That, and the app (or most likely a cloud service) would have an insane level of access to all sensitive data, which can be scary.

yeah, sounds like a good way to hand over 'only' the important stuff to our internet overlords.

I leave that as an exercise to the reader. However for something like this I'd rather have false positives.

With photos on iOS you can delete sensitive content on creation i.e. take photo, delete photo. The trash empties after 30 days, so you can access it there if you need to.

Is there a way of knowing for sure that it doesn't send the photos anywhere? It doesn't appear to be open-source

Take my word for it? :) I wish Apple had a per-app internet-blocking feature, but you can turn off internet and try the app; it will work just fine.

I don't plan to open source this particular project as I already have enough of open source action going for me.

Awesome, I've been wanting something like this for a while now - any plans to get it working on Android?

Unfortunately not, I had to write most of it in Swift so cannot cross-compile.

This is awesome, I’ve been meaning to try building something like this myself after watching the WWDC talks of all the built-in ML features of iOS.

I wonder how difficult it would be to hook up the index of text to the spotlight search API so it can search via the built-in iOS search?

I think this shouldn’t be hard. I will give this a shot.

Works great for English searching. If it could support multiple languages, it would be perfect.

Understandable, I will look into this.

Would happily pay £2-3 for this to index all the memes, did not expect it to be free.

It does index all the memes just fine. I should add an in-app purchase for the extra dank ones, jk.

I did not find this mentioned anywhere, but OneNote makes snips text-searchable too.

Evernote had it even earlier. The founder, Stepan Pachikov, has been working on OCR as far back as the '80s.

Evernote works surprisingly well. I'm always surprised when I search for something and it finds random photos of white boards that contain the word or phrase I'm looking for.

Good Notes on the iPad is also shockingly good at OCR for handwriting. At least for the way I write.

Does this show me where the text is in the photo?

I’d like to be able to take a photo of my book shelf and have it show me where the book I want is... or to generate an index of my books, with links I can click that show me where they are.

Not yet. But I assume that is going to be a useful feature.

I wouldn't worry about this obscure use case.

Unless you're in a (big) library, isn't it faster just to.. look?

Usually yes. But imagine you have a screenshot of a text conversation and search for "method" (but stop at "meth" for whatever reason) and see an unexpected screenshot. Finding the fact that "something" contains the text "meth" in said conversation may be incredibly difficult.

You are right, that will be helpful. I will find some time and add this feature.


Yes, that would be neat. With a proper text-based search you usually get the matched phrase marked in search result snippets to check the context.
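A small Python sketch of that kind of marked snippet, using the "meth"/"method" case from upthread. The function name, the `**` marker, and the context width are arbitrary choices of mine. It assumes lowercasing does not change the string length, which holds for plain ASCII but not for every Unicode character.

```python
def snippet(text, query, context=20, mark="**"):
    """Return the first case-insensitive match with surrounding context, or None."""
    pos = text.lower().find(query.lower())
    if pos < 0:
        return None
    end = pos + len(query)
    start = max(0, pos - context)
    stop = min(len(text), end + context)
    prefix = "..." if start > 0 else ""        # text was cut on the left
    suffix = "..." if stop < len(text) else ""  # text was cut on the right
    return f"{prefix}{text[start:pos]}{mark}{text[pos:end]}{mark}{text[end:stop]}{suffix}"


print(snippet("We discussed the method yesterday", "meth", context=5))
# prints: ... the **meth**od ye...
```

Seeing the match in context makes it obvious at a glance why an unexpected screenshot showed up for a partial query.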

If you make it to easily search all of the text you read on any screen (desktop + mobile) over the past day, you're gonna have a huge win there

It’s unlikely there’s any way to do that on iOS, but in theory you could do that on a desktop OS. Pretty sure there would be such a massive amount of text throughout the day, you couldn’t just do a simple filter search, you would need to somehow sort for relevance as well.
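On the "sort for relevance" part, one common baseline is TF-IDF scoring. The Python sketch below is only an illustration under simple assumptions (whitespace tokenization, a hypothetical `docs` mapping from capture name to recognized text); a real index over a full day of screen text would more likely use a proper search engine.

```python
import math
from collections import Counter


def rank(docs, query):
    """Return the names of matching docs, best TF-IDF score first.

    docs: dict mapping a capture name to its recognized text.
    """
    tokenized = {name: Counter(text.lower().split()) for name, text in docs.items()}
    n = len(docs)
    scores = {}
    for term in query.lower().split():
        # Document frequency: how many captures mention the term at all.
        df = sum(1 for counts in tokenized.values() if term in counts)
        if df == 0:
            continue
        idf = math.log(1 + n / df)  # rarer terms weigh more
        for name, counts in tokenized.items():
            if counts[term]:
                tf = counts[term] / sum(counts.values())
                scores[name] = scores.get(name, 0.0) + tf * idf
    return sorted(scores, key=scores.get, reverse=True)
```

Captures where the query terms make up a larger share of the text float to the top, and captures that match nothing are left out entirely.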

What software was used to make that video?

I recorded it in simulator using the following shell command (while simulator was running):

  xcrun simctl io booted recordVideo <filename>.<extension>


looks like a screen recording on device, standard iOS feature.

Not seeing this just now on the US iOS App Store. Did it get pulled?

Ironically, Google Photos does this on iPhone.

Nice! Published only to the US App Store? Can't be found in other countries :(

That's odd. It should be available on all stores. Try searching directly?

Very impressed, works extraordinarily well. Congratulations and thank you.

Thank you!

I would love if it could also let you search by the App or website that was active when the screenshot was taken.

Probably a bit harder but maybe scraping app store screenshots and the top 5000 Alexa websites and training a classifier on them might be viable?

Why is the "Genre" filter not a dropdown menu?

Bummer news about SwiftUI, but it parallels what I have been hearing about it. Apple could do well to put more effort into documentation and examples.

I think it will take some time, just like Swift did. It has potential!

Is there an app like this (local only) for macOS?

I have a bunch of XKCDs in my screenshots and somehow this app was even able to index those! I’m very impressed with the quality of the OCR tech. Great work!

ikr! I did test XKCD comics when making this. I am primarily using this to index my memes :D

I just typed the app name in the App Store and I don't see it. Am I doing something wrong?
