Amazon is the master of "good enough". Their service works well enough that you can check the box that it exists and then point it at all the data you already have in AWS. And that's all most people need.
If you are using AI and your competitors aren't, it doesn't really matter all that much how good the AI is -- you're gonna do better and be more efficient.
It's only after everyone is using AI that it will start to matter how good your particular implementation is. Right now we're at the stage where any implementation is better than none.
You don't have to disagree. The OP said:
> "it doesn't really matter all that much how good the AI is"
I think you're both correct and you bring up an interesting point. As long as your AI is "good enough" to replace what would've taken more resources to do otherwise, it's a win. I'm not sure if that's "half" as you stated, but I bet it depends on the task. If the task is saving a few seconds to query something, then I'd agree, half of the time wrong isn't a savings. But, if you have less than half a chance at saving thousands of dollars or hundreds of hours if it works correctly, then that may be chalked up as a win.
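That tradeoff is just expected value. A minimal sketch, with made-up numbers (the probabilities and dollar figures here are illustrative, not from anyone's actual data):

```python
def expected_savings(p_success, value_if_right, cost_if_wrong):
    """Expected net savings per attempt of an unreliable automation."""
    return p_success * value_if_right - (1 - p_success) * cost_if_wrong

# A tool that's right only 40% of the time, but saves $5,000 when right
# and wastes $200 of cleanup when wrong, is still a win on average:
print(expected_savings(0.4, 5000, 200))  # 0.4*5000 - 0.6*200 = 1880.0

# Saving a few seconds per query at 50% accuracy is roughly a wash:
print(expected_savings(0.5, 3, 3))  # 0.0
```

So "less than half a chance" can still pay off, as long as the upside dwarfs the cost of being wrong.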
I like your framing/sentiment though: it's not about small differences in "betterness", it's about the difference in kind in going from "no ML/AI" => "whoa, it works!".
Disclosure: I work on Google Cloud (but not in ML).
Since you work there, hopefully you can see this feedback and filter it up: I love your tools, and they are the best. But: (1) they are hard to use, despite the fact that I have a pretty solid understanding of how to use them; (2) I would like to use them more, but getting support is hard (partly because the docs aren't great, partly because there is no community -- see #1).
I don't know how to fix this, but it would be great if Google spent some time building a community around your tools, like AWS did. Early on, AWS had a lot of employees hanging out on their own forums and elsewhere, answering questions, building a community of users, and especially helping third parties who built libraries for their tools (like boto for Python). It would be great if Google did that too.
Thanks for listening!
I want to know which AI platform is the strongest. In my anecdotal tests, Microsoft's Vision API performed better, but that was in early-to-mid 2016.
But generally speaking, when I talk to AI experts they all agree that for the services Google actually offers, Google's are better, with MS being #2.
Actually, Google has fewer public AI APIs than Microsoft/AWS (e.g. face recognition). Maybe they just don't release stuff unless it's much better than the competition's.
I was wondering why there's a unicode char in the middle:
In : c = '\u011f'
In : c
Out: 'ğ'
The test they did is similar to providing MICROSOET and expecting the APIs to find MICROSOFT.
This is like speech recognition using context to fill in the gaps. Without this it would be unusable.
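The real APIs presumably use statistical language models for this, but the basic idea can be sketched as snapping each noisy token to its nearest lexicon entry by edit distance (everything below is my own illustration, not any vendor's API):

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(word, lexicon):
    """Snap a noisy OCR token to the closest lexicon entry."""
    return min(lexicon, key=lambda w: edit_distance(word, w))

lexicon = ["MICROSOFT", "GOOGLE", "AMAZON"]
print(correct("MICROSOET", lexicon))  # MICROSOFT (one substitution away)
```

A raw character recognizer without some correction layer like this (or a learned language model) produces exactly the near-miss outputs seen in the test.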
Disclosure: I work on Google Cloud (but not these APIs).
Payloads -> speohed
OCR.space is a "good enough" option for many projects. It has a very generous free tier of 25,000 free conversions/month per IP address (Google: only 1,000/month per account). In my tests it did not perform as well as Google, but well enough for many applications (and much better than Tesseract).
In fact, the state of the art of OCR "in the wild" (using images off the street, for example) is far from 100%. Google Cloud Vision does pretty well.
The ICDAR 2017 challenges (especially Robust Reading Challenges) should give you an idea of where we are now:
As an example, see the ICDAR 2015 results, where the Google Vision API is at 59.60% (Hmean) while the best entries are over 80%. Note that this test is about localization, i.e. finding the text's location without recognizing the actual content, though on a more challenging dataset.
As for recognition, see the table on page 6 of this paper. The "IIIT5K None" column should be pretty close to what was done in the OP, using the same dataset, with recognition accuracies around 80%, while the Google Vision API is at 322/500 = 64.4%. Note that since this paper is only about recognition, there is no prior localization step; such a step would otherwise act as a filter and decrease accuracy a bit, by failing to localize some text that the recognition step could have recognized.
For example, the Robust Reading challenge on COCO-Text:
Guess what: zoom in on the "PRINCE" image and you'll see it says, top right, "A MIKE NEWELL FILM". So both Google and AWS did a nice job.
It's not reasonable to expect PRINCE as the outcome.
The point another person makes below about "payloads" and Microsoft is valid too... as is the one about the accented g (not recognized, presumably because the UTF codes weren't processed).
Makes ya wonder.
This isn't quite true -- the Rekognition API will also accept base64-encoded bytes (5 MB max): http://boto3.readthedocs.io/en/latest/reference/services/rek...
In our experience, the Cloud Vision API is a killer option compared to both AWS and MSFT, though it's pricier and slower than AWS. MSFT is terrible on both price and speed.
Here are some current state-of-the-art papers + code where available about detection:
Fused Text Segmentation Networks for Multi-oriented Scene Text Detection
EAST: An Efficient and Accurate Scene Text Detector
Detecting Oriented Text in Natural Images by Linking Segments
Arbitrary-Oriented Scene Text Detection via Rotation Proposals
And for recognition:
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
Robust Scene Text Recognition with Automatic Rectification
If you want to get a (slightly out of date but what can you do, the field is moving very fast) overview see this survey from 2016:
Scene Text Detection and Recognition: Recent Advances and Future Trends
edit: I would attempt my own test right now, but it's been a while since I tried to use Google Cloud. Right now I'm getting constant "Server Error" popups until Chrome crashes and dies, just from checking my account and billing page. The Cloud Console's wonkiness is probably one of the reasons I stopped using GC in favor of AWS :/
You can always test out the Vision API via the landing page (https://cloud.google.com/vision/). The full text results seem to be under the little document tab. I took a screenshot of the text above it, and it seemed to work as expected (breaking it into two paragraphs).
Disclosure: I work on Google Cloud (but not on ML APIs).
Kairos is doing better here. ;-)
MTurk is hit and miss when it comes to workers: some will just click buttons to see if they can get paid, while others completely knock it out of the park.
I have another project that I've put about $100 into so far and had decent results (incredibly quickly, too!)
Getting my work to "good enough" and then tossing the rest on MTurk is much, much cheaper in the long run.
Microsoft was by far the best at this. Google wasn't even close.
Some of the companies that do this work have been around for many decades and have tons of photographs / scanned images, so I'm investigating ways to ingest images into a search engine to help locate old projects.
> What’s the use case?
Churches, for one.
I tried again with English text. I wanted a word list from a book that helps people learn English, so I took photos of the index. The format is word....page #, in two columns.
The results were just as bad.
I've given up on OCR, and decided I have to transcribe everything by hand. I only do it in my free time, and it's been taking months.
Is there any tool that can take a photo of a book where the pages curl towards the middle, and "flatten" it so that OCR will work better?
The one time I needed to turn a scanned PDF (a 600+ page book) into searchable text, I used this Ruby script https://github.com/gkovacs/pdfocr/ , which pulls out individual pages using pdftk, turns them into images to feed into an OCR engine of your choice (Tesseract seems to be the gold standard), and then puts them back together. It can blow up the file size tremendously, but it worked well enough for my use case. (I did write a very special-purpose PDF compressor to shrink the file back down, but that was more for fun.)
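For reference, the per-page pipeline that kind of script runs can be sketched like this. Tool names and flags below are illustrative (check your local pdftk/pdftoppm/tesseract versions), and this helper only builds the commands rather than executing them:

```python
import shlex

def page_commands(pdf, page, dpi=300):
    """Build the per-page commands: extract one page with pdftk,
    rasterize it to an image, then OCR the image with tesseract."""
    burst = f"pdftk {pdf} cat {page} output page_{page}.pdf"
    raster = f"pdftoppm -r {dpi} -png page_{page}.pdf page_{page}"
    ocr = f"tesseract page_{page}-1.png page_{page} pdf"
    return [shlex.split(c) for c in (burst, raster, ocr)]

for cmd in page_commands("book.pdf", 42):
    print(" ".join(cmd))
```

Running each page independently like this also makes it trivial to parallelize a long book, which matters when Tesseract takes a few seconds per page.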
"We will email you the dataset and code."
What's wrong with a github link in the article?