

Microsoft OCR Library for Windows Runtime - maouida
http://blogs.windows.com/buildingapps/2014/09/18/microsoft-ocr-library-for-windows-runtime/

======
steeve
We had great results using tesseract-ocr[1] with SWT (state of the art text
detection algorithm, via libccv[2]) on Linux.

You can use our Python bindings for both [3,4], although they might be
slightly outdated:

[1] [https://code.google.com/p/tesseract-ocr/](https://code.google.com/p/tesseract-ocr/)

[2] [http://libccv.org/doc/doc-swt/](http://libccv.org/doc/doc-swt/)

[3] [https://github.com/veezio/pytesseract](https://github.com/veezio/pytesseract)

[4] [https://github.com/veezio/pyccv](https://github.com/veezio/pyccv)

~~~
danbruc
Be aware that SWT is patented [1] if you want to use it commercially.

[1] [http://www.google.com/patents/US20090285482](http://www.google.com/patents/US20090285482)

~~~
discjockeydom
This link shows the claims of the published application. The recently allowed
claims are much narrower and less problematic. Still worth reviewing, though,
if you are worried about infringing:

[http://www.scribd.com/doc/240266916/12122729](http://www.scribd.com/doc/240266916/12122729)

------
swalsh
This is very cool! I've been working on a receipt scanning tool in C# for
keeping track of kitchen inventory (tired of calling my wife asking if we have
sesame oil or some oddball thing).

I found a few libraries, but they only worked with relatively perfect scans
(my goal is to be able to just use a phone). When I get home I'm definitely
going to give this a go.

------
jamessantiago
Off topic, but this made me think that it would be neat if libraries on places
like GitHub and NuGet could somehow include "cited by" data: something that
references open source (maybe closed source too) projects that depend on the
library, similar to Google Scholar or CiteSeerX.
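
As a rough illustration of the "cited by" idea, the sketch below inverts a map of projects to their dependencies, so each library lists the projects that depend on it. The project names and the `manifests` structure are hypothetical; a real service would have to read this from package manifests, which this sketch does not attempt.

```python
from collections import defaultdict


def build_cited_by(manifests):
    """Invert {project: [dependencies]} into {library: sorted dependents}."""
    cited_by = defaultdict(set)
    for project, dependencies in manifests.items():
        for library in dependencies:
            cited_by[library].add(project)
    # Sort the dependents so the output is stable and easy to read.
    return {library: sorted(projects) for library, projects in cited_by.items()}


# Hypothetical project names and dependency lists, for illustration only.
manifests = {
    "receipt-scanner": ["ocr-lib", "image-utils"],
    "pdf-indexer": ["ocr-lib"],
    "photo-tool": ["image-utils"],
}

cited_by = build_cited_by(manifests)
print(cited_by["ocr-lib"])
```

This only covers direct dependencies; counting transitive ones would need a graph traversal on top of it.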

~~~
asuidyasiud
You can get a DOI for github.

~~~
afandian
Then what? There's nothing magical about DOIs. You need someone to store the
citation metadata. And generate / deposit citation metadata. And maintain the
persistence of the DOI. What precisely does the DOI represent? A codebase? A
fork of it? A file? A file at a particular revision? A changeset?

------
rikkus
It doesn't appear that you can use this in a 'normal' .NET app. Any ideas why?

~~~
NetMonkey
This is really one of my big frustrations with Microsoft.

On one hand, they really try to push everybody to upgrade to their newest and
shiniest, by making a lot of stuff (like this) only available on Windows 8+.

On the other hand, they don't even bother to put in a box with "What operating
systems will this work on", so you don't have to do trial/error, research
WinRT, and then be disappointed when you realize this will apparently never
work on Windows 7. And maybe only in Metro apps? What is Windows Runtime and
am I just supposed to know this?

I really enjoy coding C# and working in .NET. Microsoft has some really great,
stable technologies that work well for years and years, but increasingly, if
you want anything new and shiny from them, you have to run the newest OS. And
if you work with anything related to enterprise, good luck targeting only
Windows 8.

And honestly, despite working almost exclusively with MS tech, I just don't
really trust any platform from them that doesn't have significant traction and
track record as they all too often just give up and try something new - and
sometimes without real replacements available.

~~~
danbruc
The MSDN documentation for the classes [1] clearly states the supported
platforms. Admittedly the restriction to store apps is missing on the page for
the namespace [2].

    
    
      Minimum supported client  Windows 8.1 [Windows Store apps only]
      Minimum supported server  Windows Server 2012 R2 [Windows Store apps only]
      Minimum supported phone   Windows Phone 8
    

[1] [http://msdn.microsoft.com/en-us/library/windows/apps/xaml/wi...](http://msdn.microsoft.com/en-us/library/windows/apps/xaml/windowspreview.media.ocr.ocrengine.aspx)

[2] [http://msdn.microsoft.com/en-us/library/windows/apps/xaml/wi...](http://msdn.microsoft.com/en-us/library/windows/apps/xaml/windowspreview.media.ocr.aspx)

~~~
NetMonkey
Ah, don't know how I missed that. Thanks.

Crazy that it's limited to 8.1, and not even working on 8.0.

I really wonder if there is a valid technical reason, or they just use it to
push upgrades.

~~~
silon3
It's crazy that it's Windows Store only. On my only Windows machine the
Windows Store won't even open because I have UAC disabled.

~~~
bkeroack
Who on Earth uses Windows Store apps on Windows Server? We would totally
consider using this if not for that.

------
mdaniel
On
[http://msdn.microsoft.com/en-us/library/windows/apps/windows...](http://msdn.microsoft.com/en-us/library/windows/apps/windowspreview.media.ocr.aspx)
they mention the supported languages and their statuses, but Korean is only
"Good".

I freely admit that I do not speak Korean, but if one compares "Chinese
Simplified" characters (listed as "Very good") with those in the Korean
alphabet, I am surprised those two entries aren't transposed.

Is there something that makes recognizing Korean harder than Chinese
Simplified, or was that just a product management decision?

------
cipher0
"demonstrated in code snippets below". The code snippets are actually images
and even worse, they're JPEGs which is the reason why the text looks horrible.

~~~
drblast
If only there were some automated way to convert those images to text.

~~~
allegory
Now that is possibly the cruellest irony I've seen for a while. Well spotted
:)

------
reallycurious
Is this better than Tesseract OCR?

~~~
josteink
I think that's a sort of apples and pears type of comparison.

Tesseract can be used everywhere, and is used predominantly on open platforms.
This is an offering from Microsoft to be used on their platform only.

They may both be good, but they have widely different platform targets.

~~~
RobAley
My guess is he meant better at actually OCR'ing text, not better for
implementation.

------
jccodez
Tesseract is really looking great, with Google adding searchable PDF as output
in the latest release candidate.

------
Norm--
So from reading the list of reasons for inaccurate results, it sounds like
this library is totally useless for images taken with mobile phones, yet it is
only allowed to run on mobile ;)

Now I would be more interested in an image correction library.

".... Blurry images; Handwritten or cursive text; Artistic font styles; Small
text size (less than 15 pixels for Western languages, or less than 20 pixels
for East Asian languages); Complex backgrounds; Shadows or glare over text;
Perspective distortion; Oversized or dropped capital letters at the beginnings
of words; Subscript, superscript, or strikethrough text"
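
The text-size limits in that list are the one item that can be checked with simple arithmetic before running OCR. A minimal sketch, assuming you already have an estimate of the text height in pixels (how to measure it is not covered here); the function name, the script labels, and the idea of upscaling to reach the minimum are assumptions, not part of the Microsoft library:

```python
# Minimum text heights quoted in the list above; the grouping into
# "western" and "east_asian" keys is an assumption made for this sketch.
MIN_TEXT_HEIGHT_PX = {"western": 15, "east_asian": 20}


def upscale_factor(text_height_px, script="western"):
    """Return the scale factor needed for text to reach the minimum height.

    Returns 1.0 when the text is already at or above the quoted minimum.
    """
    if text_height_px <= 0:
        raise ValueError("text height must be positive")
    minimum = MIN_TEXT_HEIGHT_PX[script]
    return max(1.0, minimum / text_height_px)


print(upscale_factor(10, "east_asian"))
```

Whether enlarging the image actually improves recognition is untested here, and it does nothing for the other items on the list such as blur, glare, or perspective distortion.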

