
Google acquires reCAPTCHA - smikhanov
http://googleblog.blogspot.com/2009/09/teaching-computers-to-read-google.html
======
seldo
Good move. Google wants to digitize all the books in the world, and reCaptcha
does just that. Google could have tried just becoming a client of reCaptcha,
but their scale would have crushed reCaptcha, to say nothing of the business
risk of outsourcing a key stage of your signup process to a third party.

The logical solution was to buy them and Google-size their infrastructure.

From a personal perspective, I don't think this really changes how I feel
about reCaptcha either. They'll still have the same mission -- unless Google
stops them reading out-of-copyright classics and puts them onto scanning in
Dan Brown novels, which I think is unlikely.

~~~
amanfredi
Google already has a great captcha for their signup process. The primary
motivation must have been the OCR benefits.

~~~
seldo
Isn't that what I said?

~~~
amanfredi
"to say nothing of the business risk of outsourcing a key stage of your signup
process to a third party."

I thought you meant that Google would have actually considered ditching their
own captcha in favor of recaptcha for google sign-up.

------
byrneseyeview
I can't believe I didn't see this coming. Given Google's size and mission
statement, any company whose business generates new data must partner with
Google, sell out to Google, or kill Google if it wants to survive.

~~~
antiismist
That's an interesting insight but it goes too far. For example,
Mint.com/Intuit and Facebook can survive independent of Google.

~~~
maggie
who?

------
Keyframe
I hadn't even realized that recaptcha was a company.

~~~
henning
I thought reCAPTCHA was a hip university research project.

~~~
patio11
They learned from the Google school of public relations. ("What, make money?
Hah, hah, we're just a bunch of geeks who like interesting problems, no crass
lucre here, no siree!")

~~~
yan
Wow, googling for 'crass lucre' brought this post up as result #2. Definition
please? :)

~~~
mhartl
N.B. The official term is "filthy lucre".

------
gehant
Since reCaptcha tries to differentiate humans from computers/bots, I find it
ironic that the same reCaptcha technology is being used to improve the
accuracy of how computers scan text...

~~~
ludwig
Only because every time it gets stuck, it "asks" a human.

~~~
elemenohpee
Yeah I thought the article was a little misleading on that point. They made it
seem like the human input was improving the OCR algorithms.

~~~
asnyder
I would think that the human input is improving the OCR algorithms. I assume
that every human correction trains the OCR in some way. With enough
corrections the OCR should at some point learn to distinguish between the
characters it couldn't before.

~~~
rsingel
Which means the technology will eventually make an algorithm so good that it
can solve a CAPTCHA as well as a human, thus making itself obsolete. What
other technology does that?

~~~
mildweed
viruses (the biological type)

~~~
jacquesm
Only the ones that are too aggressive. Most viruses shoot for a very nice
balance between killing the host and keeping enough of them around for the
next batch.

The weirdest effect of this is that the most dangerous viruses tend to burn
out. If one of those ever came along with a really long incubation time for
the disease but a much longer time for contagion that might be a problem.

~~~
slyn
HIV/AIDS?

~~~
jacquesm
No, think Zaire Ebola with a month or two of transmission before the first
symptoms. That would seriously suck.

There are horror movies that scare me much less than something like that.

------
mixmax
Not long ago, some users from 4chan managed to hack recaptcha and seriously
embarrass Time magazine in the process.

Basically they stuffed the ballots on the Time online voting page where users
could vote for most influential person of the year. Most influential person
ended up being Moot, founder of 4chan. Just to rub it in they managed to spell
"marblecake also the game" out of the first letters of the top 21 entries.

Note: This hack shows more about Time's incompetence than recaptcha's.

In-depth story here: [http://musicmachinery.com/2009/04/27/moot-wins-time-
inc-lose...](http://musicmachinery.com/2009/04/27/moot-wins-time-inc-loses/)

~~~
brown9-2
No, they didn't come close to "hacking recaptcha":

 _Update – Just to be perfectly clear, anon didn’t hack reCAPTCHA. It did
exactly what it was supposed to do. It shut down the auto voters instantly and
effectively. The only option left after Time added reCAPTCHA to the poll was a
brute force attack. Ben Maurer, (chief engineer on reCAPTCHA) comments on the
hack: “reCAPTCHA put up a hard to break barrier that forced the attackers to
spend hundreds of hours to obtain a relatively small number of votes.
reCAPTCHA prevented numerous would-be attackers from engaging in an attack. In
any high-profile system, it’s important to implement reCAPTCHA as part of a
larger defense-in-depth strategy”. As Dr. von Ahn points out “had Time used
reCAPTCHA from the beginning, this would have never happened — anon submitted
tens of millions of votes before Time added reCAPTCHA, but they were only able
to submit ~200k afterwards. And to do this, they had to resort to typing the
CAPTCHAs by hand!” One thing that Time inc. did that made it much easier for
the anonymous hack was to allow leave the door open for cross-site request
forgeries which allowed anon to create a streamlined poll that never had to
fetch data from Time.com._

~~~
mixmax
That's why I wrote _This hack shows more about Time's incompetence than
recaptcha's._

Sorry if I didn't make it clear enough that the fault lay with Time and not
recaptcha.

------
apowell
I preferred thinking that the CAPTCHAs on my websites were doing a bit of good
for the world - now I'm just facilitating free labor for Google's benefit.

~~~
lsb
You're still getting free bot-prevention. If you don't like it, use something
like Damien Katz's Negative CAPTCHA (hide a field with CSS, call it email, and
wait for a bot to fill it out, and check that that field is empty server-
side).
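
A minimal sketch of the server-side half of that check, assuming the submitted form arrives as a plain dict and the CSS-hidden field is named `email` as described above. `looks_like_bot` is a hypothetical helper name, not part of any library, and the real form would need a differently named field for the visitor's actual address.

```python
def looks_like_bot(form: dict, honeypot_field: str = "email") -> bool:
    """Return True when the hidden honeypot field was filled in.

    A human never sees the field (it is hidden with CSS), so it should
    come back empty or missing. Anything else suggests an automated
    submitter that fills every input it finds.
    """
    value = form.get(honeypot_field, "")
    return str(value).strip() != ""


# Hypothetical usage: reject the request before doing any real work.
submission = {"name": "Alice", "comment": "Nice post", "email": ""}
if looks_like_bot(submission):
    print("rejected")
else:
    print("accepted")
```

This only stops bots that blindly fill every field; it does nothing against one that checks visibility first.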

~~~
secret
That's clever!

~~~
lucumo
And completely unsustainable. It only takes a parser that checks if the field
is visible. It's just a simple matter of adapting. reCAPTCHA is much harder to
work around for a spammer.

~~~
natrius
Distinguishing humans from computers is an inherently unsustainable problem.

~~~
lucumo
Yes. But it would be nice to have something that at least requires somewhat of
a break-through instead of another afternoon of coding.

------
biohacker42
I read this as Google basically giving up on CAPTCHA. Let me explain.

For a while Google had one of the hardest CAPTCHAs, but bots just kept getting
better, and they were cracking it more and more often. But you can't just make
the CAPTCHA even more difficult; it was already fooling a lot of the humans.

Note that a bot does not need to successfully solve it 100% or even 90% of the
time. I'm not sure of what the exact figure is but at some point the bot
reaches parity with humans even if it only succeeds say 25% of the time.
That's because on average now it only takes 4 guesses to guess correctly.
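
The "4 guesses" figure can be sketched as follows, assuming each attempt is independent and succeeds with the same probability (a simplification; real solvers and rate limits will differ). The function names are made up for illustration.

```python
def expected_attempts(success_rate: float) -> float:
    """Mean number of tries until the first success (geometric distribution)."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return 1 / success_rate


def chance_within(success_rate: float, tries: int) -> float:
    """Probability of at least one success in the given number of tries."""
    return 1 - (1 - success_rate) ** tries


print(expected_attempts(0.25))   # average tries for a 25% solver
print(chance_within(0.25, 4))    # odds that 4 tries are enough
```

Under these assumptions a 25% solver needs 4 attempts on average, although any particular run of 4 attempts still fails roughly a third of the time.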

And I don't think reCAPTCHA is stronger than the multicolored CAPTCHA google
had (still has?)

I think google is realizing that they just cannot stop the best CAPTCHA
cracking bots, maybe they can stop a lot of them, or a lot of the not so smart
bots.

But they also can't just give up on CAPTCHA, and let anyone and any bot in
without even trying.

Thus reCAPTCHA because you might as well do a little public good while you're
at it. If nothing else, you've at least made some old texts more readable.

------
sam_in_nyc
It's my guess Google is in this for the cookies and analytics that can be
gathered by (however many) websites that use ReCAPTCHA.

As a trivial side-note: When I encounter a ReCAPTCHA, I'll fill out one
of the words and put in gibberish (or other text) for the other. For some
reason I find it satisfying to "pull a fast one" on any captcha service.

~~~
mseebach
> For some reason I find it satisfying to "pull a fast one" on any captcha
> service.

Why? What do you achieve except being told you're wrong every once in a while?
I mean, you're trying to bother a machine. Isn't that a bit like a reverse
Turing test?

I suspect the amount of noise that ReCAPTCHA filters out from automatic
attempts is several orders of magnitude larger than anything any group of
actual humans can generate.

~~~
sam_in_nyc
It takes no extra effort to enter in the wrong item, so the cost of doing so
is about zero.

As far as what satisfaction I gain from it... I suppose I find it to be a sort
of rebellious act. Also, I did not get accepted into Carnegie Mellon.

------
stingraycharles
Wait. I must be missing something:

 _Since computers have trouble reading squiggly words like these, CAPTCHAs are
designed to allow humans in but prevent malicious programs from scalping
tickets or obtain millions of email accounts for spamming. But there’s a twist
— the words in many of the CAPTCHAs provided by reCAPTCHA come from scanned
archival newspapers and old books. Computers find it hard to recognize these
words because the ink and paper have degraded over time, but by typing them in
as a CAPTCHA, crowds teach computers to read the scanned text._

The way I understand it, the user is presented with letters from archival
newspapers and must type in the text he sees, and recaptcha uses that text to
improve OCR. But doesn't that imply that recaptcha was unable to interpret the
scanned text before? If so, how can it then verify the correctness of the
text the user types in? If not, how exactly is this helping OCR?

~~~
fizx

      > But if a computer can't read such a CAPTCHA, how does the 
      > system know the correct answer to the puzzle? Here's how: 
      > Each new word that cannot be read correctly by OCR is 
      > given to a user in conjunction with another word for which 
      > the answer is already known. The user is then asked to 
      > read both words. If they solve the one for which the 
      > answer is known, the system assumes their answer is 
      > correct for the new one. The system then gives the new 
      > image to a number of other people to determine, with 
      > higher confidence, whether the original answer was 
      > correct.
    

<http://recaptcha.net/learnmore.html>
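
The scheme quoted above could be sketched roughly like this. It is an illustration under stated assumptions, not reCAPTCHA's actual code: the function names, the case-insensitive comparison, and the `min_votes` / `min_share` thresholds are all invented for the example.

```python
from collections import Counter
from typing import Optional


def _norm(word: str) -> str:
    # Assumption: answers are compared ignoring case and outer whitespace.
    return word.strip().lower()


def record_vote(votes: Counter, control_answer: str,
                typed_control: str, typed_unknown: str) -> bool:
    """Count the guess for the unknown word only if the known word was solved."""
    if _norm(typed_control) != _norm(control_answer):
        return False  # failed the known word: treat as a bot or careless user
    votes[_norm(typed_unknown)] += 1
    return True


def consensus(votes: Counter, min_votes: int = 3,
              min_share: float = 0.75) -> Optional[str]:
    """Return the agreed reading once enough users agree, else None."""
    total = sum(votes.values())
    if total < min_votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count / total >= min_share else None


votes = Counter()
record_vote(votes, "morning", "Morning", "upon")   # passes the control word
record_vote(votes, "morning", "mourning", "spam")  # fails, guess is ignored
record_vote(votes, "morning", "morning", "upon")
record_vote(votes, "morning", "morning ", "Upon")
print(consensus(votes))
```

Here the unknown word is only accepted after several people who passed the control word agree on it, which is the "higher confidence" step in the quote.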

------
gengstrand
It's a brilliant move in using collective intelligence. You get better quality
CAPTCHAs. You get more accurate digitization on the cheap. I go into more
detail on this at [http://it.toolbox.com/blogs/future-of-work/google-
acquires-r...](http://it.toolbox.com/blogs/future-of-work/google-acquires-
recaptcha-34202)

------
DarrenMills
Writing your own CAPTCHA seems like a small task for Google. reCAPTCHA wasn't
exactly encroaching on Google's market share... so what exactly did this get
Google? I think they have a bigger plan, as always.

Ideas?

------
fizx
I find it funny that Luis also posted:

<http://vonahn.blogspot.com/2009/07/hottest-people-in-cs.html>

------
yarapavan
A list of about 50 scientific articles on "captcha" available at
<http://www.citeulike.org/tag/captcha>

------
beeker
Is Luis von Ahn going to work for Google as part of the deal?

~~~
jchonphoenix
I'm currently doing research for Luis as part of Gwap, and as far as I can
say, he's going to remain on the faculty at CMU but also work for Google out
of the Google PGH offices. He's no longer teaching 251, however, which is what
he's famous for around here (at least to students). What this means for me,
however,...

------
dunk010
Think of all the sites which use recaptcha for their signups - google will get
some interesting stats through this.

------
joshu
Heh. Now there are three TR35s from the same year at Google.

------
mercury888
yep - they are going to monetize captchas. How typical of google :)

