

How to Smell Difference Between Humans & Robots - Annoying Captcha Forms - solipsist
http://stackoverflow.com/questions/4683117/annoying-captcha-forms-how-to-smell-difference-between-humans-robots

======
ZoFreX
> I tend to include text fields (later visually hidden or obscured with
> Javascript) with "name" parameters like "email", "url" and "name". Spam bots
> always fill these in. Your users won't, because the fields are hidden. If
> the fields are filled in, your submission came from a spambot. Easy!

Fuck every website that does this. It's 2011, we should all be using password
managers. If using LastPass makes your site flag me as a robot, I'm not going
to use it.
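
For readers who have not seen the trick, here is a minimal sketch of the quoted honeypot check and of the autofill problem with it. The field names come from the quote; the function name `looks_like_bot` and the sample submissions are made up for illustration, and whether a given password manager actually fills hidden fields is an assumption.

```python
# Honeypot field names taken from the quoted suggestion.
HONEYPOT_FIELDS = ("email", "url", "name")


def looks_like_bot(form):
    """Return True if any hidden honeypot field came back non-empty."""
    return any(form.get(field, "").strip() for field in HONEYPOT_FIELDS)


# A naive bot fills in every field it finds.
bot_post = {"comment": "buy pills", "email": "x@example.com", "url": "http://spam.example"}
# A human never sees the hidden fields, so they arrive empty.
human_post = {"comment": "nice article", "email": "", "url": "", "name": ""}
# Assumed case: an autofiller populates the hidden "email" field anyway.
autofilled_post = {"comment": "nice article", "email": "me@example.com", "url": "", "name": ""}

print(looks_like_bot(bot_post))         # True
print(looks_like_bot(human_post))       # False
print(looks_like_bot(autofilled_post))  # True (a real user flagged as a bot)
```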

CAPTCHAs are like crypto: Don't roll your own unless you have good reason to
and know what you're doing. Just throw in reCAPTCHA, or Mollom, or
(preferably) some crowd-sourced solution that leverages your userbase (the
best way that will scale, imo).

~~~
Hoff
Following the crypto analogy, reCAPTCHA is approaching the effectiveness of a
Beaufort cipher. The reCAPTCHA tool has been broken.

Several botnets will sail right past reCAPTCHA.

And yes, the choice here and the options here stink.

Nobody wants a CAPTCHA.

But the alternative is becoming part of the collateral damage of the arms race
seeking to fill Google with cruft.

Or shutting off all comments.

Or escalating the arms race with subnet firewall blocks, content and email
address filtering, reactive security, moderation or (if you have the
appropriate audience for it) crowd-sourcing.

~~~
ZoFreX
This is true, I was struggling to think of an option that doesn't cost money
(Mollom) or require a lot of effort (crowd-sourcing a la HN or Reddit). Is
reCAPTCHA better than nothing, or entirely worthless?

~~~
Hoff
When it comes to CAPTCHA...

Effective, Cheap, or Simple...

Pick any two...

As for your question, I switched from reCAPTCHA over to a different scheme,
though I might test with reCAPTCHA with different settings again in a few
months.

~~~
ZoFreX
I'm about to roll out a site using Mollom for the first time, will be
interesting to see how well it works.

------
drusenko
One big issue that nobody is talking about here: There is a huge increase in
networks using cheap labor to solve CAPTCHAs. A service like decaptcher.com
can have humans solve CAPTCHAs at the rate of $2/1000.

At that kind of price, all of the effort you go to to prevent automated
signups is useless -- it's just being outsourced to humans, where these
counter-measures are ineffective.

~~~
TorKlingberg
Even the cheapest manual CAPTCHA solving is still too expensive for most spam.
The expected return of a spam comment on a random blog is very low. Spamming
relies on posting millions of them for free.

~~~
drusenko
I suppose it depends on the application. For signup spam, the expected return
is definitely higher than the current cost. And unfortunately, it's also on a
website's critical path, so making signing up more annoying is strongly
against the best interests of a company.

In other words, it's a pretty hard problem.

------
cd34
Good ole Stack Overflow - close topics that may bring new ideas for Turing
tests.

The entire thread has a number of good ideas and relates to a thread you
started the other day. <http://news.ycombinator.com/item?id=2107972> Pity that
it didn't get more traction.

~~~
chrisbroadfoot
A lot of them are pretty bad ideas once you think about them for more than a
few minutes.

Mouse movements and clicks are easily emulated - just look at the mass of bots
for online games. There's actually quite a large industry based around selling
bots that can be used for goldfarming.

Keyboard events are similarly easy. It's very easy to write a function that
separates keypresses with human-like pauses.
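
As a rough illustration of how little code that takes, here is a sketch that produces one pause per character. The timing distribution (about 180 ms per key, longer gaps after spaces, a 30 ms floor) is an assumption chosen for the example, not measured data, and `human_like_delays` is a hypothetical name.

```python
import random


def human_like_delays(text, seed=None):
    """Return one inter-key pause (in seconds) per character of `text`."""
    rng = random.Random(seed)
    delays = []
    for ch in text:
        # Assumed distribution: roughly 180 ms per key with some jitter.
        pause = rng.gauss(0.18, 0.05)
        if ch == " ":
            # Assumed: a slightly longer hesitation between words.
            pause += rng.uniform(0.05, 0.25)
        delays.append(max(0.03, pause))  # never faster than 30 ms
    return delays


delays = human_like_delays("hello world", seed=1)
print(len(delays))  # one pause per character
```

A bot would sleep for each pause before dispatching the next key event, which is why timing alone is a weak signal.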

Also - many of these ideas have severe accessibility drawbacks. Touchscreens,
for example, do not emit mouse move events like a mouse. Voice input will not
act like a keyboard and may look like the user has pasted into the input box.

~~~
cd34
At least he's trying to solve a problem he faces. Imagine if he took a heatmap
of all mouse movements during form submissions, ranked them as spam/ham, and
then used that heatmap against new submissions to determine a confidence.
Perhaps there is a pattern. It's difficult to know the answer to a question
that isn't asked. Maybe during data entry, your analysis returns a fact that
in human submissions, they press backspace at least once.

Sure, it doesn't work for mobile browsers/touchscreens, but on low-confidence
submissions you present a harder, second challenge.
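
A minimal sketch of that two-stage idea, assuming a handful of behavioural signals have already been collected for the submission. The signal names, weights and threshold below are invented for illustration; in the scheme described above they would come from ranking past submissions as spam/ham rather than being hand-picked.

```python
# Hypothetical signals and weights; a real system would learn these
# from submissions already labelled as spam or ham.
SIGNAL_WEIGHTS = {
    "pressed_backspace": 0.4,
    "moved_mouse": 0.3,
    "took_over_10s": 0.3,
}
CHALLENGE_THRESHOLD = 0.5  # assumed cutoff


def human_confidence(signals):
    """Sum the weights of every signal that was observed."""
    return sum(w for name, w in SIGNAL_WEIGHTS.items() if signals.get(name))


def next_step(signals):
    """Accept confident submissions, otherwise escalate to a second challenge."""
    if human_confidence(signals) >= CHALLENGE_THRESHOLD:
        return "accept"
    return "second_challenge"


# A touchscreen user with no mouse events can still pass on other signals.
print(next_step({"pressed_backspace": True, "took_over_10s": True}))  # accept
print(next_step({"took_over_10s": True}))  # second_challenge
```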

Most reCAPTCHA forms are solved by humans at $1/1000 from companies. You have
caused them to use a human to solve it, which means your test worked. The fact
that you still got a spam submission is a separate issue.

Ensuring a human, non-spam response is something that needs to be done
statistically, not through a rules-based system. Shared inoculation like
Akismet or Typepad helps, but often misses new trends.

I don't think the real problem is determining if it is a human, I think the
real problem is dealing with the contents of the form submission. I couldn't
care less if the form is automatically submitted, as long as I know whether to
ignore that form. Captcha just erects a small barrier.

One of the best things I ever did to combat spam on inbound mail was
greylisting + dspam + tmda. If dspam has a high confidence that it is spam,
tmda sends a challenge/response. While I've never given out my gmail address,
the amount of spam it receives is ridiculous. With my own solution, even the
spam folder rarely sees more than a few messages a week. My personal blog,
discounting trackback spam, has had 71 spam posts in the last week. I use
RPXnow (requires one of the shared auth providers or someone needs to create
an account) and reCAPTCHA with Akismet. It hasn't stopped the spam that is
received, merely what gets automatically published. I do believe that every
submission received on my blog comes from a human, or has at least had some
human interaction. Captcha doesn't solve the problem, merely erects a small
barrier and just makes sure the spam I get is more likely hand delivered.

------
blahedo
I've written about this before, and wrote a proof-of-concept plugin that I
dogfood on my own blog:

<http://www.blahedo.org/botblock/>

Two core ideas: 1) the truly hard thing for the computer is the language
understanding. It's the _question_ , not the _computation_ , that's the
problem. 2) Any site that becomes popular will be worth it to the spammers to
hard-code, so any successful widely-deployed system needs to let the users add
their own additional question templates.

~~~
solipsist
> the truly hard thing for the computer is the language understanding.

This does not just apply to understanding language. It applies to
understanding and interpreting audio, images, videos, etc. Words and phrases
are just the easiest to create in CAPTCHAs, but they are also getting easier
for computers to solve. Quite a few of the CAPTCHAs on your blog could be
solved by Wolfram Alpha (or at least close to it). If it were used en masse,
it wouldn't take long for hackers to perfect the process of answering them.

> so any successful widely-deployed system needs to let the users add their
> own additional question templates

Couldn't hackers take advantage of entering their own question templates? And
relying on user-generated CAPTCHAs is extremely risky. Quality and
consistency would drop to a level where the system would become almost
useless.

------
hsuma
The problem with this idea is that if a distinctly 'human behavior' can be
calculated, it can be emulated. Also, it would require a lot of JS and other
support that some people don't have. There's also the problem of false
negatives when you consider auto-fillers that a lot of people use.

------
pornel
I notice that most people implementing CAPTCHA don't even fully realize what
they're protecting from.

There are different kinds of spam, each needs a different approach:

1. targeted attack against a site

2. non-targeted attack sending spam for machines

3. non-targeted attack sending spam for humans

To protect against the first case (e.g. someone trying to mass-register webmail
accounts) you have to have a proper CAPTCHA, and JavaScript tricks and hidden
fields are totally useless.

Most sites are only spammed with the second type of spam, but choose protection
against the first type! However, this spam is super-easy to defeat with
content classification, because it _has to_ make spammy keywords/links
available to machines.
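
A very small sketch of what such content classification could look like, assuming a keyword list and a link-count limit. Both are made up for the example; a production filter would more likely use a trained classifier or a shared service than a hand-written list.

```python
import re

# Hypothetical keyword list and link limit, for illustration only.
SPAM_KEYWORDS = {"viagra", "casino", "payday", "replica"}
LINK_RE = re.compile(r"https?://", re.IGNORECASE)
MAX_LINKS = 2


def is_machine_targeted_spam(text):
    """Flag text that pairs spammy keywords with links, or is mostly links."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    keyword_hits = len(words & SPAM_KEYWORDS)
    link_count = len(LINK_RE.findall(text))
    return (keyword_hits >= 1 and link_count >= 1) or link_count > MAX_LINKS


print(is_machine_targeted_spam("Cheap viagra at http://pills.example"))       # True
print(is_machine_targeted_spam("Great post, see http://example.com for more"))  # False
```

The check works on the submitted content only, so it does not care whether a bot or a person pressed the submit button.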

The third one is common in e-mail, rare on websites, but still can be mostly
defeated with blacklists, classification and technical tricks that catch
poorly written bots.

There are people implementing protection against #1, and then trying to prove
it works, because it blocks #2 and #3. In non-targeted attacks, anything that
wasn't expected by the bot writer will work, even if it's a totally
misimplemented CAPTCHA that has the solution in the image's URL, or a
choose-a-kitten test that has a 1 in 9 chance of being guessed, or mouse
movement tracking that can be trivially simulated/replayed.

------
laughinghan
This is a great idea. The point is merely to provide an alternative to an
explicit, interaction-requiring CAPTCHA like reCAPTCHA, one that doesn't cost
less to try to break. That way it's strictly more convenient for the user: if
they use autofill or whatever, they'll just be subject to the ordinary
reCAPTCHA that they would have been anyway.

I'm imagining listening to mousemove on the document and doing something
lightweight like appending the coordinates to an array onmousemove. Then a
function that's setInterval'd every 2 seconds or something would flush the
array to the server via XHR. A server-side script would, transparently to the
user, analyze the mousetrail and if the user appeared to move their mouse
realistically and took at least like a minute to fill out the form and took
realistic amounts of time to type in text inputs and so on, the server will
let them skip the reCAPTCHA.
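
The server-side half of that could be sketched roughly as follows. The one-minute minimum comes from the description above; the minimum point count, the "varied step sizes" heuristic and the name `can_skip_captcha` are assumptions for illustration, not a tested definition of realistic mouse movement.

```python
import math

MIN_POINTS = 20        # assumed minimum number of recorded coordinates
MIN_FORM_SECONDS = 60  # "at least like a minute" to fill out the form


def can_skip_captcha(trail, form_seconds):
    """trail: list of (x, y) tuples flushed from the client via XHR."""
    if form_seconds < MIN_FORM_SECONDS or len(trail) < MIN_POINTS:
        return False
    steps = [
        math.hypot(bx - ax, by - ay)
        for (ax, ay), (bx, by) in zip(trail, trail[1:])
    ]
    # Assumed heuristic: a trail made of near-identical steps (constant
    # speed along a straight line) looks scripted rather than human.
    distinct_steps = {round(s, 1) for s in steps}
    return len(distinct_steps) > len(steps) // 4


# A perfectly regular diagonal sweep is rejected even after 90 seconds.
scripted = [(i * 5, i * 5) for i in range(40)]
print(can_skip_captcha(scripted, 90))  # False
```

Anything that fails the check simply falls through to the ordinary reCAPTCHA.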

~~~
pak
If it's in JavaScript it's trivial for me to hack on the client side, and if
you insist on mouse movements I can just send along a prerecorded array of
mouse moves before my request. Maybe even tweak a few every time, so you don't
catch on right away. It's more trouble than it's worth.

------
jarin
CloudFlare (<https://www.cloudflare.com/>) does something similar to this. It
will present a CAPTCHA challenge page to users who have had suspicious
activity within a certain period of time, according to Project Honeypot. If
you want to be really stringent (at the cost of some false positives), it will
also challenge people based on HTTP header analysis, and with a paid account
they will also protect against XSS and SQL injection in POST requests.

They've done a fantastic job of cutting down the number of fake profiles on my
client Set For Marriage (a dating site where it was a huge problem before),
and the rest of their features are pretty good too (the asset caching has
resulted in about 60% faster page load times).

------
kwamenum86
Interesting ideas, but there are several drawbacks, most notably: a) if you
have some set of heuristics that serve as a proxy for information (in this
case "humanness"), then people can easily optimize on the proxy and win, and
b) the code would be running on the client side and as a result be super
hackable.

------
mung
Jeff Atwood has mentioned in his Coding Horror blog a few times the idea of an
internet license - using Facebook, Google etc. as a credential. Could this not
be extended to anonymous comments, but requiring authentication against a real
ID?

------
16s
Check the user agent. Many bot writers never bother to change it from the
default: PycURL, Java, urllib, etc. If you only want humans on your website,
only accept browsers used by humans. Sure, the botters may clone IE's
user-agent string, but you'll still stop the lazy ones.
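
A minimal sketch of such a filter, assuming the request's User-Agent header is available as a string. The marker list is illustrative and based on the libraries named above; treating an empty header as a bot is an extra assumption.

```python
# Substrings found in the default user agents of common HTTP libraries.
# Illustrative list only; extend it from your own logs.
BOT_UA_MARKERS = ("pycurl", "python-urllib", "java/", "curl/", "libwww-perl")


def is_lazy_bot(user_agent):
    """Return True for a missing user agent or an unchanged library default."""
    ua = (user_agent or "").lower()
    return ua == "" or any(marker in ua for marker in BOT_UA_MARKERS)


print(is_lazy_bot("Python-urllib/3.11"))  # True
print(is_lazy_bot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0"))  # False
```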

------
nixy
Shouldn't using something sensing the movement of a mouse cursor be enough as
a captcha? If no mouse movement is detected, fall back to a text input
captcha.

~~~
tintin
Page opens, I press TAB to enter the form. Now I have to enter that stupid
CAPTCHA again. And what about Tablet PCs? I'm not sure there is a lot of mouse
movement there. But well, maybe it could be enough for a lot of desktop users.

------
TimothyBurgess
Random but... you could take a few pictures of your hand... and go the classic
route of asking how many fingers you're holding up.

