
The No CAPTCHA problem - homakov
http://homakov.blogspot.com/2014/12/the-no-captcha-problem.html?
======
geerlingguy
This is why I still recommend using other form spam prevention techniques
before sacrificing usability for a CAPTCHA. One of the most effective
combinations for 80%+ of the sites I've ever dealt with is having a honeypot
field in the form, plus some amount of time required to pass before the form
can be submitted successfully. There are other ways to mitigate bots as well,
but these two alone have sufficed for quite some time for anything but
dedicated human-based attacks.

Granted, I think the checkbox CAPTCHA is much better than the UX disaster that
is the 'type some hard-to-read letters' CAPTCHA, but it's still adding a
burden on the user, rather than a burden on the bot.

(Source: I maintain the Drupal Honeypot module[1], and have used it in a ton
of different situations where CAPTCHA/reCAPTCHAs would normally be
recommended).

[1]
[https://www.drupal.org/project/honeypot](https://www.drupal.org/project/honeypot)
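
A minimal sketch of the honeypot-plus-timer check described above (the field name and threshold here are illustrative, not taken from the Drupal module):

```python
import time

HONEYPOT_FIELD = "website"  # rendered but hidden via CSS; humans leave it empty
MIN_SECONDS = 5             # humans rarely complete a form faster than this

def looks_like_bot(form, rendered_at, now=None):
    """Return True if the submission trips the honeypot or the time gate."""
    now = time.time() if now is None else now
    if form.get(HONEYPOT_FIELD, ""):      # a bot dutifully filled the hidden field
        return True
    if now - rendered_at < MIN_SECONDS:   # submitted implausibly fast
        return True
    return False
```

Both signals are invisible to humans, which is the whole point: the burden lands on the bot instead of the user.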

~~~
austenallred
The weird thing about this entire No Captcha solution, in my opinion, is that
it assumes that a captcha is the most efficient method for defeating spam.

In most blackhat circles, captchas are an afterthought. You figure out
everything else (IPs, original content), then plug in a service like
deathbycaptcha that solves the captcha for what looks like $1.39 per 1,000
(thanks to ultramancool for the correction).
([http://deathbycaptcha.com](http://deathbycaptcha.com)). What nocaptcha does
is only show that captcha (which is already defeated) to the subset of users
who haven't been deemed trustworthy. So the big bot builders will take a day
or two and beat the system, and we're right back to where we started.

Honeypots, however, are brutal - especially if you throw a couple in there.
When building a bot you build it for efficiency. If your site does anything
abnormal (whether it's 'what's n+n?' or 'what popular figure comes through
your chimney in December?') a bot is hopeless.

That being said, however, a bot is only hopeless so long as a solution isn't
implemented widely enough to be worth breaking through for spammers. If, for
example, Wordpress came up with 1000 questions like that, someone somewhere
would come up with and sell 1000 solutions.

In some sense it may be the case that Google is one of the worst companies to
create a simple anti-spam API. I'm sure there's something they could do that
would be more effective than this, but this won't really move the needle.

~~~
corobo
With both of your examples (and many others I've come across), those
question-type captchas can be defeated with a quick ping to Google and a
sanity check on the answer.

"what popular figure comes through your chimney in December" -> "Santa Claus -
Wikipedia, the free encyclopedia"

"what's 1+1" -> "2"

They only really work if the question is specific to the site you're
registering for: "What's <some popular guy on site>'s last name?", etc.
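
A sketch of that trick: compute trivial arithmetic directly, otherwise hand the question to a search engine and sanity-check the top result's title. `search_top_result` is a stand-in for whatever search API you would actually call, not a real client:

```python
import re

def solve_question_captcha(question, search_top_result):
    """Answer a question-style captcha the way described above."""
    # arithmetic questions ("what's 1+1?") can be computed directly
    m = re.search(r"(\d+)\s*([+\-*])\s*(\d+)", question)
    if m:
        a, op, b = int(m.group(1)), m.group(2), int(m.group(3))
        return str({"+": a + b, "-": a - b, "*": a * b}[op])
    # otherwise: search, then trim the source suffix from the result title
    title = search_top_result(question)  # e.g. "Santa Claus - Wikipedia, ..."
    answer = title.split(" - ")[0].strip()
    return answer or None
```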

~~~
benzoate
Well the Santa Claus question will defeat bots for now at least...
[http://imgur.com/qX8pWrQ](http://imgur.com/qX8pWrQ)

~~~
RyanMcGreal
That didn't take very long.

------
compbio
I do not get the problem of hiring a clickfarm for $1 an hour to click on cat
pics.

Suppose reputation, IP, and cookie must all be in order to pass, and we want
to spam 1,000 forms today.

Scenario 1: The clickfarm itself fills in the captcha. Result: their IPs will
soon be blacklisted, and the reputation of a third-world account will be
inherently low.

Scenario 2: We let the clickfarm send the answer to our own bot, which selects
the right pictures. Result: Google will see a single IP and cookie trying out
1000s of captchas a day, and ban you.

Scenario 3: We let the clickfarm send the answer to our own bot, and this bot
uses a list of proxies that haven't yet been banned. Result: Google will see a
single account cookie trying out 1000s of captchas a day from different IPs,
and ban you.

Can anyone come up with a scenario involving reputation, IP, and cookie that
does not end with Google detecting and banning your efforts? Cookie swapping?
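
The detection logic in these scenarios boils down to per-cookie and per-IP counting; a toy sketch (the threshold is made up):

```python
from collections import Counter

DAILY_LIMIT = 50  # made-up threshold: more solves than a human plausibly does

class SolveTracker:
    """Flag any single cookie or IP that redeems far too many challenges,
    which is what trips up scenarios 2 and 3 above."""

    def __init__(self):
        self.by_cookie = Counter()
        self.by_ip = Counter()

    def record(self, cookie, ip):
        """Record one solve attempt; return True if the actor should be banned."""
        self.by_cookie[cookie] += 1
        self.by_ip[ip] += 1
        return (self.by_cookie[cookie] > DAILY_LIMIT
                or self.by_ip[ip] > DAILY_LIMIT)
```

Rotating proxies defeats the IP counter but not the cookie counter, and vice versa, which is why the scenarios above all end in a ban.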

~~~
probably_wrong
Here's a scenario: a dissident living in a third world country with pervasive
surveillance. He accesses the net using TOR, and disables cookies.

Now his IP is blacklisted, because there are lots of people using the same
exit node; his reputation is low for the same reason, and the cookie is
rejected. There's a good chance that this one person will be blocked, even
though he didn't do anything wrong.

For a simpler case, private browser sessions over a VPN would suffer from the
same issue.

~~~
unreal37
I would argue that the problem of spam and hackers is a greater burden on
society as a whole than someone in Iran not being able to get past a captcha.

~~~
probably_wrong
I see where you are coming from, especially considering that spam makes up a
significant volume of all internet traffic. However, I'd think it wiser for
one spammer to go free than for one person to be denied access to legitimate
content.

I'm often denied access to free content because I'm accessing it from the
"wrong" countries, and that's infuriating. If I start being locked out of free
content due to my privacy measures, I'm probably going to start setting
buildings on fire.

~~~
jobposter1234
Do you think content creators should not be able to control access based on
their own criteria? Are you somehow "owed" access rights to free content?

~~~
barsonme
I'm not probably_wrong, so I can't speak for him/her.

But while _I_ believe content creators should be able to control access, I
think it's ridiculous to ban certain countries from access to certain content
-- I don't quite see the point.

Also, being "owed" free content != being blocked from free content you would
otherwise be able to access, just because you use a service like TOR or a VPN
(e.g. escaping the Chinese firewall with a VPN service whose IP is banned from
a website, versus wanting to watch a movie but living in Germany instead of
the U.S.).

------
googoodolls
I have a hunch that it is Google's attempt to be on every form and know more
about a Google user and their accounts on other websites. At least what
websites they signed up for. I can stop Analytics but this is now out of my
control. This is what a website owner required me to do to access their
website.

~~~
homakov
Oh, it's already too late. Google already has enough data about you. I can
imagine the future - people train bots like kids, make them visit different
websites, google things and pretend to be humans. Your search history will be
like your credit score.

~~~
hdgb
I wouldn't be surprised if approval for a Visa depended on your search
history.

~~~
homakov
Oh yes [http://newsfeed.time.com/2012/01/31/british-tourists-
tweets-...](http://newsfeed.time.com/2012/01/31/british-tourists-tweets-get-
them-denied-entry-to-the-u-s/)

~~~
Kequc
It's a story about american airports it is to be expected. You should probably
give your nose hairs a good trim someone might think you're going to pluck the
longest one out and strangle the country to death.

------
fredley
What is the No CAPTCHA problem? What's being described here are problems that
apply to all CAPTCHAs. Whatever 'human' detection system you put in place,
humans can always be hired to solve them. The point of No CAPTCHA is not to
fix these problems, it's to make it easier for 90% of people who don't care
too much about cookie privacy etc. (or most likely have no idea it's even a
thing).

~~~
homakov
The problem itself is described in the end: it's about using clickjacking to
get a valid token on behalf of "good guys". And this problem has nothing to do
with existing systems.

Google could have made it so much easier and more secure: a POST request to
google.com/verify_me would have an Origin header in it to prevent CSRF (only
wordpress.com scripts would be able to get a token). There would also be no
need to make a click. No CAPTCHA looks fancy but the real No CAPTCHA should
always have visibility:none!
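
A sketch of the server-side check being proposed here (the endpoint and the allowed origin are illustrative; this is not Google's actual API):

```python
ALLOWED_ORIGINS = {"https://wordpress.com"}  # sites registered for this API key

def issue_token_allowed(request_headers):
    """Only hand out a verification token when the browser-supplied Origin
    header matches a registered site, so a third-party page (or a
    clickjacking frame) can't request tokens on its behalf."""
    return request_headers.get("Origin") in ALLOWED_ORIGINS
```

Because browsers attach the Origin header automatically and scripts can't forge it, this check needs no click and no visible widget at all.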

~~~
raverbashing
"No CAPTCHA looks fancy but the real No CAPTCHA should always have
visibility:none!"

I agree, but I suppose they want something that acts as a placeholder in case
the user needs to type a captcha.

~~~
homakov
Why? If there's no need to type any captcha, do the verification in the
background and don't show me anything unless you think I'm a bot.

~~~
raverbashing
Because of page layout. Having a fixed-size element is better than having
something (that is not yours) that might or might not be there.

~~~
kissickas
There's still no need for a click.

~~~
eridal
IMHO the need for a click is just for lazy loading, thus reducing server
demand.

~~~
kissickas
Couldn't they just trigger that on form submission, then? "Please wait while
we confirm you are human" is better than clicking and then waiting, and then
submitting upon completion.

------
egsec
How many photos are in the universe of possible photos? How long would it take
to outsource the process of tagging them all, so a script could then do the
matching?

Is the whole point of this to encourage hackers to get working on this AI
challenge of identifying similar photos?

Either they need to hire a lot of people to sit around making these sets, or
they have an automated way of creating these sets which can be reversed. It
would seem to be an arms race where Google is paying people, but attackers can
have people break it at a cost less than creating them (it takes less time to
match them up than to find good photos, clean them up, tag them, etc.).

An attacker could also just target the database where this is all stored. With
the text recaptcha, it would seem that they have all of these photos and
scanned books and you have 8+ character strings of [a-zA-Z0-9]; random
guessing would not be good enough, so the attacker needed to solve the OCR
problem.

However, given the option to select x of 9 images, if you assume that the
extremes (1/9, 2/9, 8/9, 9/9) are less likely, then I can hope to get lucky by
picking 4 or 5 each time; the order does not matter. If you distribute the
attack to get around rate limits, etc., perhaps just picking the first through
fifth images gives you a sufficiently high success rate.
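
Under that assumption the success rate of one fixed guess is easy to work out; a sketch, with a made-up distribution of answer-set sizes:

```python
from math import comb

def fixed_guess_success(size_probs, guess_size):
    """Probability that one fixed guess of `guess_size` of the 9 images
    exactly matches the answer, assuming the answer is a uniformly random
    subset whose size follows `size_probs` (an illustrative assumption)."""
    return size_probs.get(guess_size, 0.0) / comb(9, guess_size)

# e.g. if answers were always exactly 4 images, a fixed 4-image guess would
# win 1 in C(9,4) = 126 tries; distributed widely, that rate can add up
```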

~~~
mikejarema
I think a good chunk of the images are captured by way of Google's Streetview
vehicles [1]. I'm seeing blurry images of house and apartment numbers all the
time. So I'd imagine there are always new images popping up that Google can
feed into the recaptcha system that haven't been seen before.

[1] [http://www.google.com/recaptcha/intro/#creation-of-
value](http://www.google.com/recaptcha/intro/#creation-of-value)

~~~
egsec
Correct, I am referencing the new nocaptcha system. Those image sets would go
stale, as opposed to the scanned books, street signs, and house numbers in the
traditional recaptcha.

------
ChrisArchitect
I see this totally from the user-experience side. NoCAPTCHA isn't about
defeating spam (you're right, the spammers are going to solve it / hire
someone to finish the job, etc.) - it's about making a better experience for
the humans while slowing down the spammers a bit. (Contrary to the current
recaptcha system, which slows down spammers a bit or not at all, and makes
life mostly more crappy for humans.)

------
xlayn
People love not to think... Google is a business, and the primary objective of
any business is to make money (the vision/mission and the rest is for people
who love a free lunch). Why a captcha? To provide a service in trade for
"free" human recognition capabilities.

Q// But Google is now better at recognizing those numbers... A// Right...
that's why they now request the next "way too expensive to implement" "free"
service from you: your recognition... and association capabilities.

~~~
yhlasx
>>People love not to think

O_O

judging by the comment you wrote right after that, I would assume you are one
of the people who likes not to think.

They are making people click checkboxes and deviating from the old model of
recognition. Your comment makes no sense.

~~~
xlayn
Yeah every once in a while... a little bit of heuristics, a little bit of
laziness.

------
FunCaptcha_Jim
Interesting perspective on the changes! Our lead designer actually had similar
concerns (can read them here: [https://www.funcaptcha.co/2014/12/04/killing-
the-captcha-wit...](https://www.funcaptcha.co/2014/12/04/killing-the-captcha-
with-black-boxes-and-false-positives/)). You both look to be drawing the same
conclusions. What are your thoughts on the metaphorical 'black box' being
implemented into the new reCAPTCHA?

------
johnvschmitt
The picture recognition test is particularly annoying. Even in their example
of "match this" (a cat), are we to assume we're matching all cats, or just
cats of that color?

If they have to make very careful sets of photos to avoid confusion, then the
sets of photos will be small enough to build lookup libraries for bots.

------
sarciszewski
I'm reading this page: [http://homakov.blogspot.com/2013/05/the-recaptcha-
problem.ht...](http://homakov.blogspot.com/2013/05/the-recaptcha-problem.html)

Why don't they just invalidate the current challenge when a new one is
requested? :S

~~~
homakov
There's no session ID for the current user. They can try to use the IP as an
identifier. Admins can send remoteip to Google to prevent spoofing, but that
parameter is optional and I suppose they don't rely on it.
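
For reference, server-side verification is a POST of the token (and optionally remoteip) to Google's siteverify endpoint; a sketch of just the payload construction:

```python
from urllib.parse import urlencode

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def siteverify_payload(secret, token, remoteip=None):
    """Build the POST body for reCAPTCHA server-side verification. remoteip
    is the optional parameter discussed above; omitting it means Google can't
    tie the token to the visitor who (supposedly) solved the widget."""
    data = {"secret": secret, "response": token}
    if remoteip:
        data["remoteip"] = remoteip
    return urlencode(data)
```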

~~~
sarciszewski
... Okay, why not establish a session then?

~~~
homakov
Would require an extra roundtrip... Problem is that you get challenges with
client side and solve it with server side. It's _website_ who should go, get a
challenge for you, put it in your session cookie and make sure you don't go
and get another one. Which complicates it a lot

------
pearjuice
Trigger warning: passing the CAPTCHA on homakov's demo page
([https://homakov.github.io/nocaptcha.html](https://homakov.github.io/nocaptcha.html))
registers an account (blog?) at wordpress.org.

------
hippich
Another shameless plug - [https://hashcash.io/](https://hashcash.io/) :)

~~~
homakov
In your demo I'd be more careful with user input

>$url = 'https://hashcash.io/api/checkwork/' . $_REQUEST['hashcashid'] .
'?apikey=[YOUR-PRIVATE-KEY]';

hashcashid can change URL completely to something like
../../newpath?newparams#
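
A safer version of that snippet's logic (sketched in Python rather than PHP): validate the id against a whitelist pattern and percent-encode it before splicing it into the URL:

```python
import re
from urllib.parse import quote

def checkwork_url(hashcashid, apikey):
    """Build the checkwork URL without letting a value like
    '../../newpath?newparams#' rewrite the path, query, or fragment."""
    if not re.fullmatch(r"[A-Za-z0-9_-]+", hashcashid):
        raise ValueError("invalid hashcashid")
    return ("https://hashcash.io/api/checkwork/"
            + quote(hashcashid, safe="") + "?apikey=" + quote(apikey, safe=""))
```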

~~~
hippich
It is always a battle between making it simple to understand and following
best practices... in this particular case I chose simple to understand :)

------
_cpancake
The goal isn't to make things harder for bots, it's to make things easier for
users.

~~~
homakov
they made it easier for users _and_ for bots :)

~~~
_cpancake
It's no easier for bots. They still have to answer the old OCR challenge or a
computer vision problem.

~~~
homakov
Using clickjacking we can get lots of valid tokens, no need to solve
challenges.
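
The standard mitigation for this class of attack is to stop the widget from being framed by unregistered pages; a sketch of the response header a token-issuing endpoint could send (the origin value is illustrative):

```python
def widget_security_headers(registered_origin):
    """Restrict framing of the CAPTCHA widget to the site that registered the
    API key, which defeats the invisible-overlay trick used in clickjacking."""
    return {
        "Content-Security-Policy": "frame-ancestors %s" % registered_origin,
    }
```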

~~~
_cpancake
You don't think Google will figure something out when a bunch of tokens from
different IP addresses are all being used by one IP?

~~~
homakov
It can be helpful. There's an (optional!) remoteip parameter the server can
use to send Google the IP address of the current user. But as in the wordpress
demo, sometimes we can send the requests with the victim's own browser.

~~~
kuschku
Additionally, it's easy to just create empty Google accounts and then use
them with the bots. Just create a few dozen accounts, use them with a few
hundred bots, and you easily get full verification.

------
astralship
I think there are big issues on the horizon here; it's going to get
increasingly difficult to find simple problems that humans can solve and bots
can't. I'm not sure there is a fundamental answer.

------
CraigKelly9
Google New reCaptcha using PHP - Are you a Robot?

[http://www.9lessons.info/2014/12/google-new-recaptcha-
using-...](http://www.9lessons.info/2014/12/google-new-recaptcha-using-php-
are-you.html)

------
rcyn
That's quite funny

------
finid
No CAPTCHA reCAPTCHA is not all Google is claiming it is. It only works if
you're logged in to the site, so what's the point?

See [http://ur1.ca/iza9d](http://ur1.ca/iza9d)

------
aaron695
Random blog article destroys an entire Google team of highly paid professional
engineers specifically employed to solve this problem, and they did it just
using incognito mode.

Upvote FTW.

~~~
homakov
Lol no, I simply found the original article too promising with minimal
technical details, so I decided to dig. And a weakness is a weakness, not a
vulnerability. Something to think about.

------
yhlasx
Seriously guys? This made it to the top of the front page? First of all, to
all the people saying "HUR DUR GOOGLE WANTS YOUR BROWSING DATA": well, they
already fucking have/had it for a looong time.

Secondly, if you tell me that one dude [the author] ruled the one-plus-year
work of an engineering team at Google a flaw and simplified it as [So what
Google is trying to sell us as a comprehensive bot detecting algorithm is
simply a whitelist based on your previous online behavior, CAPTCHAs you
solved.], and that you believe it, I would question your intelligence.

This is supposed to be tech savvy community at least to some degree, what the
fuck.

Now, in the google's blogpost it reads [Advanced Risk Analysis backend for
reCAPTCHA that actively considers a user’s entire engagement with the
CAPTCHA—before, during, and after—to determine whether that user is a human.]

[However, CAPTCHAs aren't going away just yet. In cases when the risk analysis
engine can't confidently predict whether a user is a human or an abusive
agent, it will prompt a CAPTCHA to elicit more cues, increasing the number of
security checkpoints to confirm the user is valid.]

So my guess would be that they analyze the user's behaviour on the page where
the captcha is located - things like mouse movements, the time it takes to
type out the words, spelling mistakes corrected, and whatever else humans do
differently than bots - and only then combine that with your historical
cookies. Maybe it is much more complicated than that; I, as well as you, don't
know the details.

Do you really think that they would go ahead and implement such a system
without rigorous testing of its effectiveness? I am sure that they tested it
extensively with users AND with bots, decided that it is better than the
current system, and ONLY then deployed it. Rant off.

~~~
homakov
>So my guess would be they analyze users behaviour on the page where captcha
is located, things like mouse movements

If they can track mouse movements, why am I no longer a human for them in
incognito mode? I was expecting the same, but from what I see it's just a
whitelist. And that's OK. The problem (which you probably didn't care to read)
is that it's vulnerable to simple clickjacking, which opens another weakness:
I can use your click on my page to get your reCAPTCHA token and feed it to my
spam bot.

I'm actually happy with No CAPTCHA, because it's making progress. But it's not
good enough (see the rest of comments, it could be a background AJAX request
instead).

~~~
yhlasx
>>which you probably didn't care to read

I did read it. My point is, you, or I, or anyone for that matter does not know
the inner details of how it works.

>>If they can track mouse movements why in incognito mode i'm not a human for
them anymore?

Maybe having a clean cookie history is not good enough during the risk
assessment.

Look, my entire point is, google is not a joke company. I am certain that they
tested it for effectiveness before deploying.

~~~
homakov
> I did read it.

So what do you think about the clickjacking issue? I made an assumption about
their algo, and maybe I'm wrong and they do track your mouse, but there's an
_exploitable_ weakness. My post is 1) your algo seems simple, 2) here's a bug
in it.

~~~
yhlasx
The curious thing is, I could not replicate the clickjacking issue. Every time
I make a click on the original wordpress registration page, I am verified as a
human immediately.

If I do the click on your github page, I get a challenge. My clicks were never
accepted as human on your github page. My clicks were always accepted as human
on wordpress page.

~~~
homakov
No incognito tab? Maybe they fixed it

