Show HN: I wrote a free Mac app to OCR any text on screen (github.com/schappim)
321 points by schappim 75 days ago | 80 comments

I’m frustrated that there is no easy way to capture text from pictures. Thank you for building this.

This type of functionality should be integrated or readily available in MacOS.

I would love to have a way to do basic math operations, unit conversions, etc. without resorting to typing them into Spotlight. For example, hovering over a price should convert it to a different currency. Or it could compare similar types of information from different sources.

Your solution goes in this direction, thanks!



We had this fully client-side in the browser all the way back in 2013.

The problem with Naptha is that the OCR quality of Tesseract (used by Project Naptha) is not very good for screenshots. Historically Tesseract has been optimized for text documents. So the OCR results of this new macOCR app are significantly better.

Another alternative is Copyfish. It is cross-platform and uses cloud ocr:


There is 0 reason to use the cloud, unless exfil and selling data is your thing.

The cloud == someone else's computer. Never forget that.

My personal reason for using "the cloud" in this specific case is that the OCR results are significantly better.

> The cloud == someone else's computer.

I think everyone around here knows this and can make their own decisions, if and for what data they use "the cloud".

Be careful speaking about "everyone". You never know who the 10000 are.


Not everyone’s in the US

Sure but it's fairly assured that 80% at least are from the USA, possibly higher. Also it's fairly easy to extrapolate the premise of "the 10000" to include the rest of the world

There is an app called Text Sniper which works really well and is available on SetApp. I use it constantly and love it.

So there actually is a very quiet functionality hiding in MacOS Notes.app I noticed recently.

If you take any screenshot to clipboard that includes some text, then paste it into Notes, it will silently name the resulting image file using the text pictured.

Not terribly useful but I did find it helpful once while taking screenshots for documentation.

You’re welcome.

It should be pretty trivial for someone to hook this into https://insect.sh/.

Very nice! I'm currently using https://github.com/amebalabs/TRex for the same purpose.

This looks like exactly what I was after when making this!

Hmm does this support non-English languages?

Outstanding. I have a question, and it's not a complaint or a request for anything except a bit of data: What kinds of text does it support, as in, would it work for handwritten English, or German rotated 45 degrees, or Japanese...?

I dug into the code; it looks like it uses VNRecognizeTextRequest [1] from Apple's Vision framework here [2].

And the docs say:

> By default, a text recognition request first locates all possible glyphs or characters in the input image, then analyzes each string.

Since the code doesn't specify any preferred languages, I think it would try to detect any languages supported by the framework.

From some quick googling, I found this thread [3]. It looks like the supported languages depend on the macOS version, and it only supports en, fr, it, de, es, pt, zh on Big Sur.

Not sure about the rotation though.

[1] https://developer.apple.com/documentation/vision/vnrecognize...

[2] https://github.com/schappim/macOCR/blob/master/ocr/main.swif...

[3] https://developer.apple.com/forums/thread/121048

For those on Windows, I find Capture2Text[0] to be a pretty great FOSS Screen OCR application: you press Win Q and select an area like the snipping tool would, simple as that.

It also has some other snipping modes, supports more than English, and has the option to, after the OCR, immediately show a popup window where you can fix what the OCR inevitably failed to properly recognize.

I recommend using it alongside a clipboard manager.

[0] http://capture2text.sourceforge.net

P.S. I recommend changing/disabling its Win E shortcut as that conflicts with Windows built-in shortcut for file explorer

It is, however, based on Tesseract which works well on clean or scanned documents, but isn’t very good for text “in the wild” like street signs and not-parallel-to-view images.

The Mac program uses Apple's Vision framework, which does handle these cases better (not as well as Google Cloud Vision, from my outdated experience, but way better than Tesseract).

Nice! Although I'm a Linux user, I got inspired by the idea. I often combine imagemagick import with xclip to quickly snap parts of the screen to clipboard (import png:- | xclip -sel clip). This can easily be extended with tesseract for OCR support (just add 'tesseract - -' between import and xclip in the pipeline)!

Forgot a crucial parameter in the first xclip call: -t image/png. Without this, your system doesn't know it's PNG data and will paste the raw data when you hit ctrl-v.
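
Putting the two comments above together, the pipelines might look like the sketch below. It assumes an X11 session with ImageMagick's `import`, `xclip` and `tesseract` installed; the function names `snap_image` and `snap_text` are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes an X11 session with ImageMagick (import), xclip and
# tesseract installed. The function names are hypothetical.

# Select a screen region and put the PNG itself on the clipboard.
# "-t image/png" tells the clipboard what kind of data it holds.
snap_image() {
    import png:- | xclip -selection clipboard -t image/png
}

# Select a screen region, OCR it, and put the recognized text on the clipboard.
# "tesseract - -" reads the image from stdin and writes the text to stdout.
snap_text() {
    import png:- | tesseract - - | xclip -selection clipboard
}
```

Either function can then be bound to a hotkey in the window manager of your choice.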

People here have recommended a lot of different solutions that do the same thing.

I’m using (and I’d like to recommend) https://www.keyboardmaestro.com/ for this.

It requires a self-written macro, but it can do much more than that, including parsing & formatting OCRed text. For one job I went directly Image->OCR->File so I could copy OCRs into text for some non-elegant hardcodes ;-)

Is your macro published anywhere?

Personally, I use a combination of tesseract and MacOS' screen capture to achieve the same thing. Very handy to have if you use an app like BetterTouchTool to run it with a quick hotkey.
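
A setup like this could be sketched roughly as follows. This is not the poster's script: it assumes macOS with `tesseract` installed (for example via Homebrew), and the function name `ocr_selection` is hypothetical.

```shell
#!/bin/sh
# Sketch only (macOS): assumes tesseract is installed, for example via Homebrew.
# The function name is hypothetical.

# Let the user drag a selection, OCR it with tesseract, and copy the text.
ocr_selection() {
    tmp=$(mktemp) || return 1
    img="$tmp.png"
    # -i starts an interactive selection; pressing Escape leaves no file behind.
    screencapture -i "$img"
    if [ -s "$img" ]; then
        # "stdout" as the output base makes tesseract print the text.
        tesseract "$img" stdout | pbcopy
    fi
    rm -f "$tmp" "$img"
}
```

Saved as a script that calls `ocr_selection`, it can be triggered from any hotkey tool that runs shell commands.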


nice, thanks for sharing! for anyone having trouble configuring the shell execution: use /bin/sh for launch path and "-c" (without quotes) for parameter. Otherwise BTT will autofill "(null)" and then it won't work and it won't tell you why

nice script!

Reminds me of PowerSnap on the Amiga, which did a very simple version of this. It did not run a proper OCR engine but relied on a very close match with one of the installed fonts. So it worked great for the UI of apps that didn't support cut and paste, and not so great for images unless they were created using the same font and not scaled. Still very useful.

For folks looking for something like this in an app store, I think I found TextSniper here on HN last August, and it's also in the great alt app store "SetApp" collection:

“TextSniper is an easy-to-use desktop Mac OCR app that can extract and recognize any non-searchable and non-editable text on your Mac's screen. As an extra feature, it can turn OCR text into speech. It is a super convenient alternative to complicated optical character recognition tools.”


“Meet lightning-fast text recognition on Mac. TextSniper is an app that can extract text from a selected portion of your screen. Forget taking notes — get TextSniper to capture and save what’s important.”


Consolidating into this note, OwlOCR is mentioned elsewhere in this post:

“Capture any text on your Mac's screen. Digitize images and PDFs to searchable PDFs using OCR right on your Mac.”


“OwlOCR allows grabbing a part of the screen and having any text in that area be instantaneously recognized and copied to clipboard. Additionally, the application supports recognizing text from PDF files, images and converting the contents to plain text. All conversion is done securely on-device - none of your images or files are sent to third-party services in the cloud.”


// Both process on device, OwlOCR mentions Apple's algo. Users of both in this thread are happy.

If you want a launchable app, you can also use Automator (select new App, drag and drop “Utilities/Run Shell Script”, insert the path "/usr/local/bin/ocr" as the script content, save) and put the resulting app in the Dock. (No terminal or screen space required to launch, and it's always at your fingertips.)

This is cool! I’ve been a user of TextSniper to do this for a long time, but will happily try this out.

Hey this is very useful! I wrote a Raycast (https://raycast.com/) script to invoke it here https://gist.github.com/cheeaun/1405816e5ceb397cbc9028204f82...

I tweeted a GIF of what it looks like https://twitter.com/cheeaun/status/1395973544983425025.

Wow, this is very nice. If this can be made really solid, you are very close to a KYC-as-a-service product: compare the issuing date with the official formats for the time frame, check that the fonts and checksums are correct, and offer an API to send the results to customers. Most platforms dealing with fiat are waiting for something like that; they are suffering from doing this manually or semi-manually, with prohibitive employee costs. If this can be done in a platform-agnostic and fraud-proof way, companies will throw themselves at the product. Are you planning anything in that direction?

Seems to be a tiny wrapper around VNRecognizeTextRequest which is available on iOS, macOS etc. It's a dozen lines of code to do this - I think someone should build this into a menu item app, perhaps?

Very nice. Does this work with screen fonts / pixel perfect fonts too? That'd be my biggest gripe with all the OCR tools: what look like the simplest fonts of them all to OCR are usually the ones detected least well.

For example, using tesseract on Linux and trying to OCR the "terminus" font, I get better results by first resizing the screenshot to something bigger (and blurry), and even then it's far from perfect OCR'ing. When in the first place it's a pixel perfect font...

(and, yes, there are cases where OCR'ing screen fonts makes sense)
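
The resize-then-OCR workaround could be sketched like this. It assumes ImageMagick's `convert` and `tesseract` are installed; the function name `ocr_upscaled`, the 300% default and the choice of the Triangle (bilinear) filter are arbitrary picks for illustration, and whether they help will depend on the font.

```shell
#!/bin/sh
# Sketch only: assumes ImageMagick (convert) and tesseract are installed.
# The function name and the 300% default are arbitrary choices for illustration.

# Enlarge a screenshot with a smoothing filter, then OCR the result.
# Usage: ocr_upscaled screenshot.png [scale]
ocr_upscaled() {
    scale=${2:-300%}
    # Write the enlarged PNG to stdout and let tesseract read it from stdin.
    convert "$1" -filter Triangle -resize "$scale" png:- | tesseract - -
}
```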

I’ve always wondered whether you couldn’t build a not-fully-ML app for screen-font recognition, that just takes a bunch of fonts, renders out every individual glyph at every size, trims them, converts them to an alpha mask, and then generates an indexible image fingerprint of said alpha mask.

The OCR software would then just need to be smart enough to recognize “things that look like glyphs”, and put bounding boxes around them; and everything from there could be implemented in logic, rather than a model. (Just apply the same transforms to the thing in the bounding box, and then search the fingerprint DB.)

I actually use another version with Google Vision (it is in a whole other class compared to on-device OCR), and it is quite challenging to make Google Vision not work. For example, it even works for handwritten text.

Oh interesting, I should check that out.

Very nice. I've been using https://screenotate.com/ for years in part because of the nice OCR support

yeah, screenotate has been nice for this, have also used a lesser known feature in Prizmo (https://creaceed.com/prizmo) too for this.

I didn't try the app yet but I use an app called "Yomiwa" to OCR Japanese. The biggest issue I find is that it seems to be trained on black on white images so if I try to OCR a sign with different colors, or some text on a product or menu, it often fails. For example I just tried it on the ingredients of the label of the soda I'm drinking which is white on black and it's completely failing.

Masking the background and making text black would solve this. I have used OpenCV for that before.

OCRmyPDF does that. Given the name I assume it's only for PDFs, but theoretically it should work for any image without the need to extract it from the PDF.


This can be a valuable accessibility tool, especially on inaccessible apps. Have you looked into checking its compatibility with Voice Over?

You could just pipe the output to the say command eg:

/usr/local/bin/ocr | say

I can, but not everybody else can. 90% of the population does not use terminal commands as a part of their everyday lives.

Here's another one: https://screenotate.com/

Sorry if this is off topic, but as a Windows user, is there a good one around for that which people would recommend?

Pardon the late reply.

I've been successfully using Mathpix Snip [1] to do general OCR for quite some time.

It's not as well communicated as its initial purpose of applying OCR to LaTeX equations, but it currently supports much more than that, such as mixed text/math and tables.

On a personal note, I'm actually surprised it wasn't mentioned thus far in this thread.

See more discussion here on HN in [2], [3].

[1] https://mathpix.com

[2] https://news.ycombinator.com/item?id=16535358

[3] https://news.ycombinator.com/item?id=21871780

Copyfish is cross-platform: https://ocr.space/copyfish

Microsoft One Note can do it.

One note

It's barfing due to missing libraries on my machine, anyone know what OS version this requires?

Should be Catalina and above, I’ll add it to the docs. I’ve tested on an i9 running Catalina and M1 forced to be on Big Sur.

Very nice and it is super wonderful to use. Now my workflow is so easy. Most of the time, translation on GIF and images don't work and I have to get them and then use a translation tool. Now it is just snap and my shortcut to translation all in one place.

Yandex supports translation on images, and you can paste it straight from the clipboard. I'm assuming that macOS has something similar to Windows' Win-Shift-S to take a screenshot of a section of the screen, which makes it really fast and easy.


Flameshot could do with this feature https://github.com/flameshot-org/flameshot/issues/702

Great project! This is one of those tools that solves a problem that's right in front of you. I can see this being built into screenshot software, just like mark-up tools are these days.

Also, does anyone know of a similar project for Linux?

No project I'm aware of, but I did cobble together a script to do something similar. In my case I also wanted it to work with copyq so there's some noise related to it.

  maim -s -u | tesseract - "$tmp"
  # Remove empty lines
  sed -ir '/^\s*$/d' "$tmp".txt

  copyq add "$(cat "$tmp".txt)"
  rm "$tmp".txt
  rm "$tmp".txtr

tesseract insists on adding a txt extension and what I assume is some intermediary file txtr, making it awkward to use with mktemp. Probably explained in the manual which I skipped.

But like others have said tesseract is not very reliable, at least with default settings -- it's common for it to add extra spaces or various single quotes, or omit spaces.

I wasn't familiar with that "-r", but FWIW the gsed man page says "(for portability use POSIX -E)"

Abutting that "r" option against the "i" option is likely why you ended up with a file named .txtr and therefore implies that it did not actually hear the "-r" you intended

I've had the best luck picking an actual backup suffix such as "-i.bak" or "-i~" to keep BSD sed and GNU sed on the same page, although I've also seen scripts that go as far as "--version" sniffing and changing the actual invocation as "${SED_I} -E" type stuff

You are absolutely correct! I wasn't aware that "-i" takes an optional suffix, so it didn't even occur to me to look at the sed line as possible cause for this extension weirdness.

Yeah, but "optional" in the _worst possible way_ since, due to the getopt library difference, the GNU version wants any empty suffix value abutted, and the BSD version wants it separated away from the "-i", burning thousands of hours of humanity over the years :-(

Woe unto those who write scripts as "sed -i -e /whatever/" since for half(?) of their users they'll end up with "somefile-e"
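
For reference, a version of the script above with the sed call adjusted along these lines might look like the sketch below. It is not the original poster's script: it assumes `maim`, `tesseract` and `copyq` are installed, attaches a `.bak` suffix directly to `-i` so GNU and BSD sed parse it the same way, swaps the GNU-only `\s` for the POSIX class `[[:space:]]`, and uses invented function names.

```shell
#!/bin/sh
# Sketch only: assumes maim, tesseract and copyq are installed.
# Function names are hypothetical.

# Delete blank (empty or whitespace-only) lines from a file in place.
# "-i.bak" with the suffix attached is accepted by both GNU and BSD sed;
# the backup file it leaves behind is removed afterwards.
strip_blank_lines() {
    sed -i.bak -e '/^[[:space:]]*$/d' "$1" && rm -f "$1.bak"
}

# Select a screen region, OCR it, and add the text to CopyQ.
ocr_to_copyq() {
    tmp=$(mktemp) || return 1
    # tesseract appends ".txt" to the output base name it is given.
    maim -s -u | tesseract - "$tmp" || return 1
    strip_blank_lines "$tmp.txt"
    copyq add "$(cat "$tmp.txt")"
    rm -f "$tmp" "$tmp.txt"
}
```

With the suffix spelled out, the only extra file is the `.bak` backup, which the helper deletes itself.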

Really nice! Would love if it supported tables. Have struggled to find any OCR that can take a photo of a table and turn it into a spreadsheet. It’s usually an inconsistently tab-separated mess requiring a ton of cleanup.

Have you tried FineReader? I used it a couple of years ago for table recognition

That is _exactly_ what I've been looking for the past 5 years! Unfortunately I am still on Mojave, so of course I get an error:

dyld: Symbol not found: _OBJC_CLASS_$_VNRecognizeTextRequest

Any chance to get it to work on poor macOS 10.14?

I had no idea that macOS/iOS now had built in OCR!

Any idea how this compares to Tesseract (or other local OCR)?

I currently have an Alfred workflow that invokes Tesseract and it works decently well but the accuracy could be better.

I’ve tried with Tesseract, the Vision API, Google Vision and Azure’s equivalent (also kicked off using an Alfred workflow).

By far the best for text was Google Vision and then Azure. Whilst Google Cloud and Azure both also do handwriting recognition, Azure did better at this.

The cloud platforms performed better than pure on device with Apple’s vision API outperforming Tesseract.

Good to know! I thought most cloud OCR services were paid only but it turns out google vision has 1000 free invocations a month with Azure at 5000. That should be plenty for most people (certainly for me).

Do you have the source for those workflows/would you be willing to share them?

I tried a bunch of different solutions but found OwlOCR to be the one with the best result.


I'll keep an eye on this one too though!

This uses the same OCR as this ShowHN, so the OCR results should be identical.

Is the OCR part readily available using this `VNRecognizedTextObservation`? So it suggests possible options and the best one needs to be chosen? Would love to see how it's implemented.

This looks super useful. Is there some equivalent for Windows? A quick search didn't yield anything useful.

that's really cool, congrats! it would be nice to have a UI and a package for installation.

Very useful - thanks!

Not a Mac user or developer, poking at the source code because I might be interested in building a Linux equivalent. Why is code so damn complicated these days? What does all this crap do? Why isn't the source code of things these days 100% human-readable?

  /* Begin PBXBuildFile section */
  0425D1C16E9B7E34F8EBCCFB229F6BCF /* Pods-ocr-umbrella.h in Headers */ = {isa = PBXBuildFile; fileRef = E52F12A9CD9DA185DB6C7CFAF9971233 /* Pods-ocr-umbrella.h */; settings = {ATTRIBUTES = (Project, ); }; };
  69F017594F16B64B4E70E96B863F38D1 /* Pods-ocr-dummy.m in Sources */ = {isa = PBXBuildFile; fileRef = 812D67335813B22DFC54237ACEB07CC8 /* Pods-ocr-dummy.m */; };
  9D8F5FD727B32865EE80BA6ACDA12AF4 /* ScreenCapture.swift in Sources */ = {isa = PBXBuildFile; fileRef = CAE82544998B753F1708876308FF330D /* ScreenCapture.swift */; };
  AC8C4224C366FAD03EFFDC427D793373 /* ScreenCapture-dummy.m in Sources */ = {isa = PBXBuildFile; fileRef = 2BFFD24873C787E751AFC41D8C497ECB /* ScreenCapture-dummy.m */; };
  BE8E791706F107976678CAA1DE681FA6 /* ScreenCapture-umbrella.h in Headers */ = {isa = PBXBuildFile; fileRef = B9D6CB7E3F7CD4599F66F1F010D4CADD /* ScreenCapture-umbrella.h */; settings = {ATTRIBUTES = (Project, ); }; };
  CA9117D8B1C22828347BFE8326E2F7D2 /* ScreenRecorder.swift in Sources */ = {isa = PBXBuildFile; fileRef = FC34AC3B539E1EFA3B0D1E086E1BA1D9 /* ScreenRecorder.swift */; };
  /* End PBXBuildFile section */

Why are you complaining about code that is (hopefully) obviously auto-generated and not "written" by the author, but still necessary nonetheless? It's because you are "Not a Mac user or developer", so we should be a bit more considerate toward you.

Maybe the issue your complaint unexpectedly tries to surface is that many awesome, highly useful projects like this one depend on code that isn't human-readable. I think this is a noteworthy point and should be discussed more often.

But then again, having the code in some form of source control /at all/ is far, far better than the alternative, which is depending on some instructions in a README.md or just hoping the user will know how to use XCode properly such that the real contribution of the project is used

Maybe your post also somewhat points out the fact that to newcomers or people looking at XCode code (auto-generated or otherwise) for the first time, it's /not/ obvious which files you should be looking at, and so we should give the parent poster some slack. Is this a problem that projects should worry about or take into consideration when auto-generated code starts to mix with non-generated code in source control?

N.B.: the parent post was talking about ./Pods/Pods.xcodeproj/project.pbxproj https://github.com/schappim/macOCR/blob/ca9a6379e07a8e1a5eaa...

My tiny contribution is within main.swift. The heavy lifting is done by the CoreImage/Vision API, and a library that interfaces with Screen Capture.

You can achieve the same thing using Tesseract on Linux, or even better quality using Google Vision.

I'm a noob using Xcode. Loaded the workspace, removed the signing team, everything builds for all schemes but where is the binary?

Does this screenshot[1] answer your question?

[1] https://files.littlebird.com.au/Shared-Image-2021-05-22-17-1...

This looks like a .xcodeproj file. This isn't source code, it's auto generated metadata for the Xcode IDE. You have to check it into source control because Xcode is dumb and sometimes breaks your project if it can't find its own metadata.

The actual code here that isn't just Xcode boilerplate is in this very simple to read file: https://github.com/schappim/macOCR/blob/master/ocr/main.swif...

That is not the source code, it is an Xcode IDE project file, you don't need it.

This application calls the native macOS libraries to do the OCR so I don't think you'd find anything useful here to do a Linux port - you can certainly use the idea and combine it with a linux compatible OCR library though.
