
Show HN: Screenshot Hero - iPhone app to make all photos text searchable - asadlionpk
https://asadmemon.com/projects/screenshothero/
======
kuschku
Based on the video it works amazingly well. I’ve tried finding a usable way
for open source text recognition on desktop, and haven’t found anything even
close to this.

Like, I've got scans of letters, machine-written, 300dpi, perfect black/white
1-bit depth, no marks or scratches, perfect quality — and OCRmyPDF (using
tesseract) absolutely fails at this task, returning only bullshit. Even if I
set the language correctly, or even set the wordlist to a manual transcription
of the PDF.

I also tried using OCR on screenshots, with the same miserable result.

How does Apple’s Vision API do so much better at this kind of task? Is there
some trick I'm missing?

Like, the images I supply are of such high quality that you can literally just
split into characters and search for the nearest match in a database of latin
characters, and even that would return better results than tesseract.
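For illustration, the "split into characters and find the nearest match" idea can be sketched in a few lines. This is a minimal toy, not how tesseract or Apple's Vision works: the 3x5 glyph templates, the `TEMPLATES` table, and the function names are all made up for the example, and it assumes clean 1-bit input where glyphs are separated by at least one blank column.

```python
# Hypothetical 3x5 one-bit glyph templates. A real system would render
# templates from the typeface actually used in the scans.
TEMPLATES = {
    "I": ("111", "010", "010", "010", "111"),
    "L": ("100", "100", "100", "100", "111"),
    "O": ("111", "101", "101", "101", "111"),
    "T": ("111", "010", "010", "010", "010"),
}


def hamming(a, b):
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def split_columns(rows):
    """Split a one-bit text line into glyphs at fully blank columns."""
    width = len(rows[0])
    glyphs, start = [], None
    for x in range(width + 1):
        # The extra iteration at x == width closes a glyph touching the edge.
        inked = x < width and any(row[x] == "1" for row in rows)
        if inked and start is None:
            start = x
        elif not inked and start is not None:
            glyphs.append(tuple(row[start:x] for row in rows))
            start = None
    return glyphs


def recognize(rows):
    """Label each glyph with the template at the smallest Hamming distance."""
    return "".join(
        min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))
        for glyph in split_columns(rows)
    )


# "LOT" with one flipped pixel inside the "O" (third row).
line = (
    "10001110111",
    "10001010010",
    "10001110010",
    "10001010010",
    "11101110010",
)
print(recognize(line))
```

Nearest-match lookup like this tolerates a little pixel noise, but it breaks as soon as glyphs touch, fonts vary, or the scan is skewed, which is presumably part of why real engines are so much more involved.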

~~~
mendeza
I think you have to fiddle with the internal settings of the OCR package you
are using to get good results. For tesseract and pytesseract, there is a whole
article on improving the quality:
https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality

My guess is Apple trained on their own massive dataset and better
architectures that make their systems better than off the shelf ocr.

I am working in this area and one way I see good improvement is training an
object detector to first detect words in an image, then you can pass that
through tesseract/OCR software. Besides that, finetuning tesseract on data you
want strong performance would be the next best alternative.

Here is a cool article from Dropbox on how they engineered their OCR system:
https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning/

This link below is a really cool project that showcases what building your own
OCR engine would look like!

https://github.com/awslabs/handwritten-text-recognition-for-apache-mxnet

~~~
mendeza
Reading all these comments about others having bad experiences with Tesseract
makes me think there is a market for a robust, better alternative to
Tesseract.

------
reikonomusha
This sounds like an app I’ve wanted forever! But as with all apps I’m
particularly sensitive to giving bulk access to my photos (and other data).

The app description says “images are all processed on device and nothing is
sent to any server” but the app’s privacy policy talks about first- and third-
party collection of potentially PII.

> The app does use third party services that may collect information used to
> identify you.

> I want to inform you that whenever you use my Service, in a case of an error
> in the app I collect data [...]. This [data] may include information such as
> your device Internet Protocol (“IP”) address, [...], the configuration of
> the app when utilizing my Service, [..], and other statistics.

Could an error log dump the strings it detected in my images and send them off
to this third-party for instance?

I assume everything here exists with the best intentions, but I worry about
using an app like this to scan and analyze all my images when there are a host
of exceptions in the privacy policy.

~~~
milankragujevic
The data collected is, as mentioned, used to analyze crashes. In case of an
error, some information is collected, that can be explained as being
justified: device unique ID, to know if an error happened on one device
multiple times, IP address is collected in web server access logs,
configuration of the app could mean a lot of things, i.e. resolution of the
viewport, etc.

If the Privacy Policy contains no mention of sending images to the server and
storing the images and/or text contents, then ... it's not happening.

edit: This comment was posted before you edited yours.

~~~
reikonomusha
I don’t feel so concerned about images being sent to the server, but rather
the information derived and collected about the images.

~~~
milankragujevic
Again, if it's not mentioned in the Privacy Policy, a sane thing would be to
assume it's not being done.

You can be paranoid, but then you shouldn't have a cell phone at all, let
alone an always-connected smartphone.

~~~
artfulhippo
No, it's actually insane to take every written statement at its word.

------
saucow
I've been using Memos on iOS (https://memos.org) for searching. Memos
additionally lets you copy the text and is on device.

~~~
reaperducer
This looks very promising, and the price is reasonable.

But the web site is one page, without even an "About" section. All it has is
an animation, some text, and an e-mail address. No information about the
company at all. Even the whois for the domain is hidden.

For the sort of person for whom "on device" processing is important,
information about the company has value.

------
MR4D
Just downloaded and ran it.

Took a few minutes to go through my images, but the search was working from
the very first indexed photo.

The search is fast and word recognition is freaking impressive!

Well done!

~~~
asadlionpk
Thank you! Glad it worked fine for you :)

~~~
MR4D
Yes!

One thought - get this plugged into the Siri search system so people can
search their photos when just doing a general search.

~~~
asadlionpk
Great idea!

------
eigenvalue
This works really well, great work! One thought-- if you first let it scan all
screenshots and let it finish, but then change it to scan all images, it seems
that it is recomputing all the text for every image. Perhaps it might make
sense to compute the hash of the individual images, and associate the parsed
text with this image hash in a table. Then if you have to scan the image
again, and the hashes match, these can simply be retrieved from the table
rather than computed again. Either way, what a nice free tool.
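A minimal sketch of that hash-table idea, in Python for brevity. The `OcrCache` class and the `recognize` callable are hypothetical stand-ins (the app's real storage and OCR calls are not shown anywhere in this thread), and a plain dict plays the role of the on-disk table.

```python
import hashlib


class OcrCache:
    """Memoize recognized text, keyed by a hash of the image bytes."""

    def __init__(self, recognize):
        self._recognize = recognize  # hypothetical callable: bytes -> str
        self._text_by_hash = {}      # stand-in for a persistent table
        self.misses = 0              # how many times OCR actually ran

    def text_for(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._text_by_hash:
            self.misses += 1
            self._text_by_hash[key] = self._recognize(image_bytes)
        return self._text_by_hash[key]


# Fake recognizer so the example runs without an OCR engine.
cache = OcrCache(lambda data: data.decode("ascii").upper())

first_pass = [cache.text_for(b"boarding pass"), cache.text_for(b"receipt")]
second_pass = [cache.text_for(b"boarding pass"), cache.text_for(b"receipt")]
print(first_pass == second_pass, cache.misses)
```

One caveat: hashing still means reading every image's bytes, so this only saves the OCR step, not the I/O.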

------
ChicagoBoy11
I tried writing a native module a while back (2+ years ago) to work with a RN
app and failed miserably. Has the process/documentation improved since then?
Are there known go-to resources for native module development with clear
examples? Wish expo.io (which I have used and is otherwise great) was more
robust when it comes to supporting custom native modules, too.

~~~
ccheever
Hi - I work on Expo.

One of the big focuses for us this year is to make that experience much
better.

We've already taken some steps. (We now just give you a regular React Native
project if you need one), but we plan to keep working on this stuff until you
can get all the power you need to build something like Screenshot Hero with
the ease of use of just making a website.

~~~
ChicagoBoy11
that's awesome to hear. For what it's worth, I found all of your guys' work
simply incredible, and have used it several times both for personal and
professional projects with overwhelmingly fantastic results and a far better
experience than anything I was ever able to accomplish natively. The
"complaint" about problems really stems from the fact that the bulk of the
experience is so fantastic that things like this which are still a bit rough
around the edges end up really standing out.

------
jaimex2
For those on Android, Google photos has been doing this for years.

------
evan_
I've been using an app called "Memos"[0] with a similar value proposition for
several months now; it's become crucial for me. If someone gives me a paper,
appointment reminder, note from school- I just take a picture of it and I can
find it again any time I want. It's also really nice for finding pictures of
buildings or signs (or photos of people taken near buildings or signs) since
it can recognize text on those as well.

It's really cool to see how something like this can be made without a huge
investment! I expect that future iOS versions will probably just include this
natively. There's already some object recognition but not at this level.

[0]: https://memos.org/

------
josefresco
I typed "green" into the search box of Google Photos and it _seemed_ to work.
Does anyone have more details on this tech?

~~~
milankragujevic
Object recognition using machine learning. Google scans all photos in Photos
by default, but it can be turned off.

Try searching "credit card, [city], september 12th 2019", or "yellow cat" or
"screenshot".

It can't transcribe text though.

~~~
zurtex
Google Photos can both search text:
https://9to5google.com/2019/08/22/google-photos-text-search/

And transcribe text with Google Lens

------
kburman
Thanks for taking the initiative.

I would love to see more developers follow this approach.

~~~
briandear
I wouldn’t. Swift is far more performant than using these React-type pseudo-
native systems and the developer mentioned that he “fell back” to React
because of an apparent lack of SwiftUI documentation — so why not use this
project as a chance to actually learn it and create some documentation? He
wrote up about how he built the app, adding yet another article to the boring
canon of “I just want to use JavaScript and React and call myself an iOS
developer.”

And he said that he used React just for the views — SwiftUI isn’t that hard at
all and there are exact tutorials about listing things. Ray Wenderlich for
example, the WWDC videos — there is high quality information out there but it
seems like Swift and SwiftUI wasn’t actually a real consideration.

To be clear, I am not criticizing the app, what I am criticizing is the parent
comment hoping more developers follow that approach. I happen to think that
approach is lazy and aspires to the lowest common device denominator rather
than building apps that are platform specific and optimized for each.

~~~
hombre_fatal
> so why not use this project as a chance to actually learn it and create some
> documentation

Sometimes you just want to get something done rather than do free work for one
of the richest corporations in the world.

The latter should be enough fulfillment for anyone, but sometimes you're just
too greedy!

------
ElFitz
Oh. This is great. Just tried it out a couple times on my 439 screenshots, and
it works. I love it.

I wonder how well it'll scale to 10s of thousands of screenshots (especially
the search).

Still, I'm wondering whether or not I should keep Notion around (only used it
for web-clipping). Especially since Notion still can't search within pages and
iOS now can do full-page screenshots.

Edit: except that on iOS, Safari's full-page screenshots are PDFs. Darn.

~~~
asadlionpk
I have 6k photos on my phone and this works just fine. The search speed is
mostly the same.

------
ronjouch
Scanner Pro, by Readdle, also does on-device OCR for documents you scan (along
with many other things, including deformation/perspective correction and PDF
export).

Not affiliated in any way, just a satisfied user.
https://readdle.com/scannerpro

------
Razengan
I really wish iOS had a way to selectively block all internet access for
specific apps (like it kinda does for cellular data), and to block the Hidden
photos album from all apps at the system/API level.

That would make it more comfortable to try apps like this in terms of privacy
concerns.

------
css
How does it work with iCloud Photo Library, where most of the photos are not
stored on the device? Does it have to download the full library to search? I
have over 35k photos, but this app only shows about 4.5k of them, and it
processes one image every 2-3 seconds.

~~~
asadlionpk
I have the same setup but fewer photos (around 6k, many on iCloud). Does it not
detect all 35k photos?

I will look into this.

~~~
css
Is it going to download the entire library to my device if I run processing
for all of my photos? That is over 500gb of network activity.

~~~
asadlionpk
Unfortunately it will have to when scanning them. But it won't be storing a
copy of the image within app, if that helps.

------
brenden2
What I really want is an app that goes through all my photos/chat history and
deletes anything that might contain sensitive information after 30 days, such
as from screenshots. I already use the iMessage feature that lets you delete
history after 30 days, but I wish I could pretty much do this for everything.
There are some things I might want to keep (like photos, for example), but I'd
prefer to just have anything private deleted automatically. If it's
information I really want to save, I'll make sure to save it somewhere safe.

~~~
nnd
How would you define sensitive content? The real challenge here is to train a
classifier.

~~~
asadlionpk
That and the app (or most likely a cloud service) will have insane access
level to all sensitive data. Which can be scary.

~~~
fzil
yeah, sounds like a good way to hand over 'only' the important stuff to our
internet overlords.

------
_bxg1
Is there a way of knowing for sure that it doesn't send the photos anywhere?
It doesn't appear to be open-source

~~~
asadlionpk
Take my word for it? :) I wish Apple had per-app-internet blocking feature but
you can turn off internet and try the app, it will work just fine.

I don't plan to open source this particular project as I already have enough
of open source action going for me.

------
JKirchartz
Awesome, I've been wanting something like this for a while now - any plans to
get it working on Android?

~~~
asadlionpk
Unfortunately not, I had to write most of it in Swift so cannot cross-compile.

------
OkGoDoIt
This is awesome, I’ve been meaning to try building something like this myself
after watching the WWDC talks of all the built-in ML features of iOS.

I wonder how difficult it would be to hook up the index of text to the
spotlight search API so it can search via the built-in iOS search?

~~~
asadlionpk
I think this shouldn’t be hard. I will give this a shot.

------
rodneyzeng
Works great for English searching. If it could support multiple languages, it
would be perfect.

~~~
asadlionpk
Understandable, I will look into this.

------
pacifika
Would happily pay £2-3 for this to index all the memes, did not expect it to
be free.

~~~
asadlionpk
It does index all the memes just fine. I should add an in-app purchase for the
extra dank ones, jk.

------
rajesh-s
I did not find this mentioned anywhere, but OneNote makes snips text-searchable
too.

~~~
itsangaris
Evernote had it even earlier. The founder, Stepan Pachikov, has been working
on OCR as far back as the '80s.

~~~
criddell
Evernote works surprisingly well. I'm always surprised when I search for
something and it finds random photos of white boards that contain the word or
phrase I'm looking for.

Good Notes on the iPad is also shockingly good at OCR for handwriting. At
least for the way I write.

------
ada1981
Does this show me where the text is in the photo?

I’d like to be able to take a photo of my book shelf and have it show me where
the book I want is... or to generate an index of my books, with links I can
click that show me where they are.

~~~
asadlionpk
Not yet. But I assume that is going to be a useful feature.

~~~
vincentmarle
I wouldn't worry about this obscure use case.

------
saadalem
If you make it easy to search all of the text you read on any screen
(desktop + mobile) over the past day, you're gonna have a huge win there.

~~~
OkGoDoIt
It’s unlikely there’s any way to do that on iOS, but in theory you could do
that on a desktop OS. Pretty sure there would be such a massive amount of text
throughout the day, you couldn’t just do a simple filter search, you would
need to somehow sort for relevance as well.
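To make the "sort for relevance" point concrete, here is a rough sketch using plain TF-IDF scoring. TF-IDF is just one common choice picked for the example, nothing from the app; the `rank` and `tokenize` names and the sample texts are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(query, documents):
    """Return indices of matching documents, best TF-IDF score first."""
    counts_per_doc = [Counter(tokenize(doc)) for doc in documents]
    n = len(documents)
    scored = []
    for i, counts in enumerate(counts_per_doc):
        total = sum(counts.values()) or 1
        score = 0.0
        for term in set(tokenize(query)):
            df = sum(1 for c in counts_per_doc if term in c)
            if df and counts[term]:
                # Smoothed inverse document frequency: rare terms weigh more.
                idf = math.log((1 + n) / (1 + df)) + 1
                score += (counts[term] / total) * idf
        if score > 0:
            scored.append((score, i))
    # Highest score first; ties fall back to the original order.
    return [i for score, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


captured = [
    "invoice total due friday",
    "meeting notes friday friday lunch",
    "cat photo",
]
print(rank("friday", captured))
print(rank("invoice friday", captured))
```

A simple filter would return both "friday" screens in capture order; scoring lets the one that matches more of the query float to the top.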

------
misiti3780
What software is being used to make that video ?

~~~
asadlionpk
I recorded it in simulator using the following shell command (while simulator
was running):

      xcrun simctl io booted recordVideo <filename>.<extension>

~~~
misiti3780
thx.

------
moioci
Not seeing this just now on the US iOS App Store. Did it get pulled?

------
mobattah
Ironically, Google Photos does this on iPhone.

------
vizzah
Nice! Published only to the US App store? Can't be found in other countries:(

~~~
asadlionpk
That's odd. It should be available on all stores. Try searching directly?

------
qubex
Very impressed, works extraordinarily well. Congratulations and thank you.

~~~
asadlionpk
Thank you!

------
yeldarb
I would love if it could also let you search by the App or website that was
active when the screenshot was taken.

Probably a bit harder but maybe scraping app store screenshots and the top
5000 Alexa websites and training a classifier on them might be viable?

------
trenchgun
Why is the "Genre" filter not a dropdown menu?

------
gigatexal
bummer news about SwiftUI but it parallels what I have been hearing about it.
Apple could do well to put more effort into documentation and examples.

~~~
asadlionpk
I think it will take some time, just like Swift did. It has potential!

------
philfreo
Is there an app like this (local only) for macOS?

------
snazz
I have a bunch of XKCDs in my screenshots and somehow this app was even able
to index those! I’m very impressed with the quality of the OCR tech. Great
work!

~~~
asadlionpk
ikr! I did test XKCD comics when making this. I am primarily using this to
index my memes :D

------
vira28
I just typed the app name in the App Store and I don't see it. Am I doing
something wrong?

~~~
asadlionpk
Does this link not work either?
https://apps.apple.com/us/app/screenshot-hero/id1493170794?ls=1

