
How Google Cracked House Number Identification in Street View - programd
http://www.technologyreview.com/view/523326/how-google-cracked-house-number-identification-in-street-view/
======
sjtgraham
Anyone else getting a lot of obvious door numbers in reCAPTCHA captchas
lately? I guess that's how Google trained these neural nets.

~~~
amjaeger
Was just thinking that Google can now fool captchas.

~~~
clhodapp
Google is also the one serving them, which does allow them to cheat.

~~~
somesay
reCaptcha was always a free captcha service based on one simple idea: one part
is the classical one, hopefully only readable by humans, while the other one
actually helps Google with OCR-like jobs. Previously these were words from
Google Books scans that OCR couldn't read; now they are mostly house numbers.

More interesting is the other change: reCaptcha now tries to detect real
users and then generates only a simple number captcha for the classical part.
Likely they are using your Google Account cookie or Google Analytics for that.

------
datawander
Like the "YouTube cats" paper that had only a 16% success rate (which
represented a huge improvement), this doesn't appear to be anything beyond
what has already been done for text recognition, which was held up twenty
years ago as one of the first examples of a task machine learning does
exceedingly well.

This line gives it away. If they can remove that assumption I will be
impressed; otherwise I would say they reinvented the wheel and probably could
have used something off the shelf and gotten similar results.

"To start off with, Goodfellow and co place some limits on the task at hand to
keep it as simple as possible. For example, they assume that the building
number has already been spotted and the image cropped so that the number is at
least one third the width of the resulting frame. They also assume that the
number is no more than 5 digits long, a reasonable assumption in most parts of
the world."

~~~
DannyBee
1\. They are doing multi-digit recognition at once, which, as the paper says,
"To our knowledge, all previously published work cropped individual digits and
tried to recognize those". So I don't understand your issue with the 5-digit
limit, when everyone else is sticking to "1 digit at a time".

2\. Spotting the building number, as the paper says, is taken care of by a
different algorithm.

I'm not sure why this is also a big deal, since spotting the building number
is "not the hard part" in most cases.

3\. Your assertion that they could have used something off the shelf seems
directly contradicted by the fact that the paper says nobody has ever
published a multi-digit simultaneous recognition paper.

So I'm very curious what this "off the shelf" thing would be. Could you
elaborate?
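For what it's worth, the sequence-level idea from the paper (one softmax over
the sequence length plus one softmax per digit position, combined into a joint
prediction) can be sketched like this; the probabilities below are made up
purely for illustration, not taken from the paper:

```python
# Toy sketch of whole-sequence prediction: the network emits one
# distribution over the sequence length and one distribution per digit
# position; the final answer maximizes the joint probability.

def predict_number(length_probs, digit_probs):
    """length_probs[n-1] = P(sequence has n digits);
    digit_probs[i][d] = P(digit at position i is d)."""
    best_seq, best_p = None, 0.0
    for n, p_len in enumerate(length_probs, start=1):
        p = p_len
        digits = []
        for i in range(n):
            d = max(range(10), key=lambda k: digit_probs[i][k])
            p *= digit_probs[i][d]
            digits.append(str(d))
        if p > best_p:
            best_seq, best_p = "".join(digits), p
    return best_seq, best_p

# Made-up outputs: most mass on length 2, confident "4" then "2".
length_probs = [0.1, 0.8, 0.1, 0.0, 0.0]   # lengths 1..5
digit_probs = [[0.01] * 10 for _ in range(5)]
digit_probs[0][4] = 0.9
digit_probs[1][2] = 0.9

seq, p = predict_number(length_probs, digit_probs)
print(seq)  # "42"
```

The point is that length and digits are decided jointly, not by stitching
together independent single-digit crops.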

~~~
darklajid
Disclaimer: I work in the OCR industry, which .. doesn't make me an
expert on state-of-the-art recognition algorithms, nor do I know enough
about neural networks to be dangerous.

That said: The GP has a point, imo. I don't doubt that the paper describes
something new and interesting (and all engines I work with during the day do
segmentation/recognize character by character), but localizing a region of
interest is usually the hard job for me. When I identify the right region and
crop it/scale it/rotate it .. my job's "easy" and I can run a multitude of
generally good OCR engines (off the shelf, if you will) and get decent results
(maybe vote a bit, use engine A to segment and engine B and C to recognize the
characters etc.)

So .. ignoring the 'we trained a neural network' part (which makes me nod
thoughtfully and mumble 'whatever they did there..'), and which I _understand_
is the interesting thing here, they did more or less what I do all the time.
The preceding algorithm is the one I deal with far less often, and in my
environment it is the more interesting and often more challenging part.

Then again, it can always be labeled as PEBKAC I assume :)

~~~
beagle3
So ... as someone in the OCR industry, perhaps you could answer:

I'm trying to OCR license plates from random videos (a stable, but essentially
random, camera location that sees cars going by and stopping, and I would like
to read their plates). I've tried every commercial offering under
$4000/license, and even with manual localization and a single selected photo,
I'm getting less than 95% on a single letter/digit, which translates
to ~70% for full plates. (Humans get >99% for full plates on those photos.)
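The gap between per-character and full-plate accuracy is just error
compounding; assuming 7 independent characters per plate (an assumption for
illustration), 0.95 per character works out to roughly the quoted ~70%:

```python
# Independent per-character errors compound over a full plate.
# Assuming a 7-character plate, 95% per character gives:
per_char = 0.95
full_plate = per_char ** 7
print(round(full_plate, 3))  # 0.698, i.e. roughly 70%
```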

Where would you recommend I look next for a solution?

~~~
darklajid
Hard to tell. First: I'm a lowly developer here, right? I can't and won't sell
you stuff. In addition: While I care a lot about my craft/developing, I don't
care about the industry - I'm blind to most competition.

Regarding your particular problem: No idea about international license plates
(or the ones you are interested in). It helps a lot to restrict engines to a
limited character set. German license plates roughly follow [1]
([A-Z]{1,3})-([A-Z]{1,2})(\d{1,4}).
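As a sketch, a rough pattern like the one above can be used directly to filter
OCR candidates (the pattern is from memory per the footnote, so treat it as
illustrative, not a complete validator):

```python
import re

# Rough German-plate shape: 1-3 letters (city), hyphen, 1-2 letters,
# 1-4 digits. Anchored so stray OCR characters cause a rejection.
GERMAN_PLATE = re.compile(r"^([A-Z]{1,3})-([A-Z]{1,2})(\d{1,4})$")

def looks_like_german_plate(text):
    return GERMAN_PLATE.match(text) is not None

print(looks_like_german_plate("B-AB123"))   # True
print(looks_like_german_plate("12-XY99"))   # False
```

Rejecting candidates that don't fit the template is one cheap way to rule out
misreads before any fuzzy matching against a dataset.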

Usually you try to combine 'dumb' OCR with datasets/fuzzy matches to rule out
errors. Only you know if that is possible for your dataset.

Depending on whether I understood your problem correctly, we might again have
the localization issue (which I complained about above): find the license
plate, crop and rotate it. Bonus points if your images might contain multiple
license plates and a human operator would 'obviously' see the right one..

Recognition itself should be okay: a limited character set, a limited number
of fonts (here: one only) and hopefully decent binarization opportunities
(here: black on white, background reflective).
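The binarization step mentioned above can be shown with a minimal sketch: a
fixed global threshold, purely for illustration (real OCR preprocessing
typically uses adaptive methods such as Otsu's, but on high-contrast plates
even this can work):

```python
# Map grayscale pixel values (0..255) to binary black/white with a
# fixed threshold. Dark glyph pixels become 0, bright background 1.

def binarize(pixels, threshold=128):
    """pixels: iterable of grayscale values 0..255 -> list of 0/1."""
    return [0 if p < threshold else 1 for p in pixels]

row = [12, 30, 200, 220, 25, 240]   # made-up scanline
print(binarize(row))  # [0, 0, 1, 1, 0, 1]
```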

Feel free to shoot me a mail, details in my profile.

1: From memory, might be slightly inaccurate, sample only

~~~
beagle3
Thanks! It's actually Israeli license plates I'm working on right now (because
that's the video data set I got for training so far, though this project is
going to be deployed mostly around Europe): only digits, a standard xx-xxx-xx
template, nice standard retro-reflective black-on-yellow. It's supposed to be
the easiest possible case. And it is, for humans. But all the commercial OCRs
I managed to find have abysmal performance.
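A minimal check for the xx-xxx-xx all-digit template would look like this
(illustrative only; a real validator would need to cover any other plate
formats in the data set as well):

```python
import re

# The 7-digit xx-xxx-xx template mentioned above, anchored so OCR
# output with missing or extra characters is rejected.
PLATE = re.compile(r"^\d{2}-\d{3}-\d{2}$")

print(bool(PLATE.match("12-345-67")))   # True
print(bool(PLATE.match("1-2345-67")))   # False
```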

When I get some breathing time, I'm going to try the latest Tesseract again
(when I last tried, it was v2, and its performance wasn't good).

~~~
darklajid
If you look at my profile: I have a bit of experience with Israel. The OCR
company I work for? Sits in Israel.. ;-)

I'm reasonably sure that you're not in the market for the things we do, but if
you'd like to chat or talk about the project you're doing: again, feel free
to drop me a line. I .. really wouldn't even think about selling you anything;
I'm mostly curious about fellow HN users' projects and have an
affinity for IL.

Good luck!

------
magicalist
I think the actual paper (linked at the end) is much more informative and
interesting than this coverage:
[http://arxiv.org/abs/1312.6082](http://arxiv.org/abs/1312.6082)

(I wouldn't call it blogspam, because it looks like they interviewed the
researchers, but the summary leaves something to be desired)

~~~
aaronsnoswell
The MIT guys generally do a pretty good job with their articles. Definitely
not blog spam.

------
marc0
I must say I find this fascinating, especially two aspects. First, it's a
great idea to train on number sequences instead of single characters; that's
certainly a lesson applicable in many other situations, too. Second, 11 levels
of deep learning is a bold approach, since (from what I know) the conventional
wisdom has been that very deep networks generally do not train well. They even
found that performance improves with the number of levels (they just stopped
at 11, probably because of resource limitations). As mentioned in the paper,
this is probably because the network is trained on a huge dataset, so the size
of the dataset really is the relevant factor.

------
danielweber
This is unusual: an article about neural nets that actually uses neural nets.

