
CAPTCHAs decrease conversion rates - harrybr
http://www.90percentofeverything.com/2011/03/25/fk-captcha/
======
citricsquid
This sounds like it's coming from someone who hasn't had any real experience
with large scale spam problems.

We operate a forum with 250k members, ~800k posts per month, and a new
registration every minute, and we get plenty of spam bots even _with_ captcha
(Mechanical Turk etc.); without captcha it's unworkable. Captcha is a
necessary evil, but it does help.

This seems to be coming from someone dealing with a site where spam wouldn't
be that much of a problem. Who would sign up to Animoto to spam? Very silly
post.

~~~
jcromartie
They are proposing alternative methods that don't put the burden on the user
in the form of explicit action. By using honeypot fields that would only be
filled out by a robot, and timestamp analysis which effectively detects
automatic form submission, they can weed out the bots without asking their
users to do anything.

What's so silly about that?

~~~
Ysx
Honeypot fields and their ilk are easy to bypass with a focused attack. For
smaller sites, that's fine - who's going to make the effort to target you?
Keep out the opportunistic bots rattling your contact form, and life's good.

For juicier targets, something more sophisticated is necessary. Captchas are
one answer.

~~~
darkmethod
Are there alternative yet equally effective measures beyond what has been
discussed? If so, I'd like to hear more.

~~~
mkr-hn
Anti-spam services like Akismet.

------
ejames
Since several commenters have been asking for an explanation of honeypots and
timestamps, here's a link[1] I happened to run across just recently and a
quick explanation.

- Honeypots: Add a field to your form that is styled to be invisible to
normal human users, such as being located off the screen, sized to 1 pixel, or
placed behind/under images on the page. Bots examine a page through HTML
rather than through eyesight and will not distinguish these fields. Reject
submissions which have entered text in the honeypot fields.

- Timestamps: Some spambots operate by 'playback' - a human fills the form
out correctly once, then copy-and-pastes the form output into a script that
replaces the comment text/etc. with desired spam links. Place a hidden field
in your form that contains a timestamp (possibly hashed or combined with
other form output). Reject submissions which contain a timestamp far in the
past, indicating a bot which is 'playing back' an old submission.

The idea with defeating spam is not to be 100% accurate with unbeatable
security, since no matter your system, a bot tailored to your site can defeat
it. However, putting several simple techniques together can defeat general-
purpose bots that shotgun spam across many sites. This reduces spam to levels
that are manageable by hand.

[1]<http://nedbatchelder.com/text/stopbots.html>
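
Both techniques together amount to only a few lines of server-side code.
Here's a minimal Python sketch (the field names, the signing key, and the
one-hour cutoff are all illustrative assumptions, not from the linked
article):

```python
import hashlib
import hmac
import time

SECRET = b"change-me"  # hypothetical server-side signing key
MAX_AGE = 3600         # reject forms rendered more than an hour ago

def signed_timestamp(now=None):
    """Value to embed in a hidden form field when rendering the page."""
    ts = str(int(now if now is not None else time.time()))
    sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"

def is_probably_bot(form, now=None):
    """True if the submission trips the honeypot or the timestamp check."""
    # Honeypot: the 'website' field is hidden via CSS, so humans leave it empty.
    if form.get("website"):
        return True
    # Timestamp: reject missing, forged, or stale values (playback bots).
    try:
        ts, sig = form.get("ts", "").split(":")
    except ValueError:
        return True
    expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return True
    age = (now if now is not None else time.time()) - int(ts)
    return age > MAX_AGE
```

The server renders `signed_timestamp()` into a hidden field next to the
CSS-hidden honeypot; `is_probably_bot` then rejects anything that fills the
honeypot or plays back an old form.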

~~~
jcromartie
Another form of timestamp analysis is to detect submissions that happen too
quickly. A spammer's signup script is likely to fill out the form and submit
it nearly instantly. Of course a spammer could beat this by waiting a small
randomized amount of time, but that makes spam signups more expensive and
might also deter them.
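
With the same kind of hidden timestamp, the too-fast check is just the mirror
image of the too-old check. A sketch (the 3-second floor is an arbitrary
illustration; tune it for your form):

```python
import time

MIN_FILL_SECONDS = 3  # hypothetical floor; humans rarely submit faster

def submitted_too_fast(render_ts, now=None):
    """True if the form came back faster than a human could plausibly type."""
    elapsed = (now if now is not None else time.time()) - render_ts
    return elapsed < MIN_FILL_SECONDS
```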

~~~
falcolas
Many automated form fillers used by normal people, such as LastPass or even
Firefox's built-in form filler, will fill out submission forms and submit them
quickly as well. Perhaps not as quickly as an automated script, but it's worth
watching out for.

~~~
zerd
Well, that's for login. For registration it is usually not that fast.

------
Jabbles
"We left the test running until the results were statistically significant to
a 99% confidence level."

This is _absolutely_ the wrong thing to do - not that I'm doubting the
conclusion, but the data does not support that confidence level.

<http://www.evanmiller.org/how-not-to-run-an-ab-test.html>

~~~
jarin
This is why I'm terrible at statistics. To me, it looks almost like wave
function collapse.

The results are good only if you do a set number of observations (say 500)
instead of waiting for a significant result (say it happens at 623). But what
if you had decided to run 623 tests at the beginning?

~~~
Jabbles
No problem with that. But compare these two experiments:

    
    
      for i in range(623):
        data.add_result()
      s = calculate_significance(data)
      if s > 0.95:
        publish()
    
      for i in range(623):
        data.add_result()
        s = calculate_significance(data)
        if s > 0.95:
          publish()
          break
    

The second one gives you many more chances to succeed, which must result in
your confidence in the answer going down.
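
The effect is easy to demonstrate by simulation: flip a fair coin (so there
is no real effect) and compare how often each procedure declares significance
anyway. A sketch using a simple two-sided z-test; the 623-sample horizon
follows the example above, everything else is illustrative:

```python
import math
import random

def significance(successes, n, p0=0.5):
    """Two-sided confidence that the observed rate differs from p0."""
    if n == 0:
        return 0.0
    z = abs(successes / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    return math.erf(z / math.sqrt(2))  # standard normal, two-sided

def fixed_horizon(rng, n=623):
    """Check significance exactly once, after all n samples."""
    heads = sum(rng.random() < 0.5 for _ in range(n))
    return significance(heads, n) > 0.95

def peeking(rng, n=623):
    """Check significance after every sample; stop at the first 'success'."""
    heads = 0
    for i in range(1, n + 1):
        heads += rng.random() < 0.5
        if significance(heads, i) > 0.95:
            return True
    return False

trials = 500
rng = random.Random(0)
fixed_hits = sum(fixed_horizon(rng) for _ in range(trials))
rng = random.Random(0)
peek_hits = sum(peeking(rng) for _ in range(trials))
# With no real effect, the fixed-horizon test is 'wrong' about 5% of the
# time, while peeking after every sample is wrong far more often.
```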

------
jasonkester
I run a Travel Blog host, and get several hundred spam attempts per day,
accounting for more than 90% of the posts on the site. Still, I refuse to put
CAPTCHAs in between my users and what they want to accomplish. It's just a
terrible user experience.

Instead, I use a combination of human detection scripts, bayesian filtering,
and moderation. Combined, this keeps the site pretty much 100% spam free from
the perspective of our end users, and more importantly, Googlebot.

More details here:

<http://www.expatsoftware.com/articles/2010/03/care-and-feeding-of-happy-spammer.html>

~~~
nbaumann
I agree; no matter how much of a problem you have with spam, you shouldn't put
the burden of combating it on the user.

------
chime
I have been using pseudo-timestamps and honeypot fields for a while now and it
has worked pretty well for me. I get a bit of spam every now and then but it
is usually someone manually copy-pasting. I could safely block those too but
it is infrequent enough that I don't need to bother.

Here's my algorithm:

    
    
        /form-show
    
          fieldhash = hash(ymd(today))
          valuehash = hash(remoteip + ymd(today))
    
          <input type=hidden name=fieldhash value=valuehash>
          <input type=text name=email value="" style=display:none>
    
        /form-validate
    
          field0 = hash(ymd(yesterday))
          value0 = hash(remoteip + ymd(yesterday))
          field1 = hash(ymd(today))
          value1 = hash(remoteip + ymd(today))
    
          if(post[email] != "")
            // reject form
    
          if(post[field0] == value0 || post[field1] == value1)
            // accept form

~~~
jasonkester
Wait until your site gets popular. It'll get fun sooner or later.

The latest thing I'm seeing on my site is a robot that automates real web
browsers, jumps between ip addresses, scrapes real user content off the site,
then posts it back using some form of Markov generator to make the content
look unique. It'll do that on new accounts for weeks before trying to insert
any links.

It's amazing the lengths spammers will go to to get their content onto your
site. In this case, the crawler is clearly written specifically for my site,
even though it's only PR4 and nofollows all its links. It's no wonder 99% of
the content on big sites like Blogger is spam.

~~~
khafra
That's getting dangerously close to xkcd's gold standard of spambots that
actually make well-written, useful contributions to the discussion.

~~~
IgorPartola
Link: <http://xkcd.com/632/>

~~~
corin_
Actually he was referring to <http://xkcd.com/810/>

------
JoachimSchipper
The second half of the article reveals that (in a specific case) removing the
CAPTCHA improved conversion from 48% to 64%. I didn't much like the rest of
the article, but this is interesting.

~~~
showerst
What they failed to mention is what percent of that boost came from
autofillers/spammers.

They say they successfully used timestamp/honeypots to keep out spammers; if
so, how many spammers did they keep out? If it was tons, then say so, that's
useful information. If it wasn't very many, then they didn't need the CAPTCHA
in the first place.

~~~
RyanMcGreal
I'd be interested in knowing whether the application itself is designed to be
immune to autofilled accounts. Assuming people use it to create slideshows
they can then share with their family/friends and not socially/crowdsourced a
la flickr, a bunch of bots with garbage accounts no one has to look at
wouldn't actually harm anyone else's experience of the site.

------
TamDenholm
I've got to say that from a developer's perspective it's worth trying,
whenever possible, not to put a CAPTCHA in a form, for the benefit of your
customers. No one enjoys filling out a CAPTCHA. I'd suggest trying honeypot
fields, timestamps, hashed value matching, etc., which are all invisible to
the end user.

I think putting in extra effort as a developer so your customers don't have
to is a good thing. Only when other methods don't work should you then employ
a CAPTCHA.

~~~
mayank
This just encourages spambots to upgrade their technology. You could upgrade
spambots quite easily by just running them inside a headless browser with full
javascript support, like phantomjs.

There is very little distinction between writing a phantomjs unit test and
writing a spambot.

~~~
TamDenholm
You can apply the same to CAPTCHA as well; it's not hard to automate CAPTCHA
input either. But the reality of the situation is that the vast majority of
spam bots are simple, and for every additional check you put into your form
you increase its effectiveness by an order of magnitude.

~~~
mayank
You can't do the same with captcha because that would require a degree of
brute forcing. And my point was that existing spambots could _trivially_ be
upgraded to handle hidden form values and keystroke timers and other automated
javascript validation.

~~~
roel_v
Sure, they could be, and they could also trivially solve captchas on
Mechanical Turk. What's your point?

~~~
mayank
I'm not sure what you're talking about.

With "invisible" JS form validation of any sort:

1. you run your _existing_ spambot software through phantomjs.

2. your unmodified bot fills in all the visible fields without changing a
single line of code, and the webkit backend transparently computes your hashes
and other automated javascript "human" tests.

3. again, your _existing_ "stupid" spambot code submits your form, and your
site is now overrun by spam.

With Captcha, you get an image and a unique ID that is validated at the
server. Sure, you could run it through mechanical turk, but I'm guessing that
a few CPU cycles to load a webkit backend is still vastly cheaper than farming
work out to MechTurk.

My point is that _you wouldn't even have to change your spambot software_ to
defeat these "new" validations, and they can be trivially overcome, as opposed
to MechTurk+reCaptcha. Add to that the benefits of targeting sites that are
relatively spam-free, and you have a real incentive for spammers to simply
plug-in phantomjs instead of using WWW::Mechanize or what have you.

~~~
roel_v
The point is that all these measures are 'trivial' to break, and so are
captchas. Except with captchas you impose a burden on your user, while with
the other techniques you can offload that burden onto the developer. I'm not
sure what the 'existing' part in 'existing spambot' has to do with it: the
time it would take to add farmed captcha solving is marginal (you don't even
have to Mechanical Turk it; most captchas are broken with OCR software
readily available on the underground market anyway).

captcha = sign of a clueless or lazy developer, or both. I don't put up with
it anymore; I have yet to meet a single registration that I actually _need_
that uses a captcha. I'm not the only one, either.

------
JonoW
Sometimes I think we have it wrong. Instead of trying to determine if someone
IS a spammer, why not try to figure out if they're definitely NOT a spammer?

So start with the pessimistic view that they are, and that they need to be
shown a CAPTCHA. Then do some analysis to try to figure out if they're legit,
e.g. time spent on page, mouse/keyboard interaction, geo-location, referrer
etc.

If they're all good, don't show them the CAPTCHA (perhaps just rely on
honeypot inputs); otherwise show them a CAPTCHA as a next step after posting
content (and apologise in case it's a false positive).

~~~
puredemo
Isn't this the whole point of a reputation system?

~~~
JonoW
Sure, if you have authenticated users. I'm talking about anonymous users.

------
StavrosK
I'd just like to point out that the way they did their A/B testing might be
flawed: you can't run the test until you get a certain confidence; you have
to decide beforehand how long you'll run it. They seem to have run it until
they got 99% confidence, which is probably the wrong way to go about it.

------
gcr
Here's an idea: Force registrants to submit a computationally expensive token
along with their registration form. Perhaps it's computed with javascript.
Users usually spend more than 15 seconds on the form anyways, and spammers
will hate to peg their hardware like that.

Any thoughts?

~~~
jeremyw
Fun, a number of blog plugins have picked up the hashcash ideas, e.g.
<http://wordpress.org/extend/plugins/wp-hashcash/>.

Add 100ms of 2011-average-CPU computation and tie it to the submit button
(avoiding any complications interleaving with user activity). That deals
with first-order dumb bots and makes life a little harder for JavaScript-
executing (but still volume-based) folks. Marry it to a Bayesian system to
handle the third-order Mechanical Turk-style miscreants.

~~~
gcr
I see! Interesting. Thanks for sharing this.

For curious people, Wikipedia also has related information:
<http://en.wikipedia.org/wiki/Proof-of-work_system>
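
For the curious, a hashcash-style proof-of-work is only a few lines. A
minimal sketch (the 16-bit difficulty and the token format are illustrative;
in practice the minting runs as JavaScript on the client and only the check
runs on the server):

```python
import hashlib
import itertools

DIFFICULTY = 16  # leading zero bits required; tune so clients burn ~100 ms

def leading_zero_bits(digest):
    """Count leading zero bits of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
        else:
            bits += 8 - byte.bit_length()
            break
    return bits

def mint(challenge):
    """Client side: grind nonces until the hash meets the difficulty."""
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= DIFFICULTY:
            return nonce

def verify(challenge, nonce):
    """Server side: a single hash to check what took ~2^16 hashes to mint."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= DIFFICULTY
```

The server would issue a unique `challenge` per form render and accept each
(challenge, nonce) pair only once, so a spammer pays the minting cost for
every single submission.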

------
eli
Sure, until you get hit by a 10,000-ip-strong botnet all trying to fill out
your form at once.

~~~
harrybr
The article states that Animoto use "honeypot fields and timestamp analysis"
instead of CAPTCHAs, which they claim has been effective to date. What do you
think of this?

~~~
eli
I use honeypot fields myself and they stop a ton of spam submissions. I'm sure
timestamp analysis can be very effective too. I'm totally a fan. But are there
bots smart enough to defeat it? You bet!

Some of my forms also have a CAPTCHA. I think it's got to be case-by-case. Do
you have something desirable to bad guys (like the signup for a new Yahoo
account, or a high-ranking blog about pharmaceuticals)? Do you have tools in
place to deal with spam submissions effectively when they do occur? Will a
bunch of bots signing up for accounts degrade service for legitimate visitors?

For example, the Contact our Sales team form definitely does not have a
CAPTCHA. The sales team will gladly sort through a pile of junk if it means
one more inbound lead. But the Post a Comment form would be an absolute
disaster without a strong CAPTCHA. A surprising amount of junk gets through
anyway, in fact. (As far as I can tell, it's actual humans in developing
countries copy/pasting into comments by hand. Blocking referrers from Google
that have the phrase "post a comment below" made a dent.)

~~~
larrik
"Blocking referrers from Google that have the phrase "post a comment below"
made a dent)"

Can you elaborate? I haven't heard this technique (I don't personally have a
lot of need for spam fighting), and I'm very curious as to what you mean.

~~~
5l
Think he probably means spammers are searching for the phrase 'post a comment
below' on Google looking for forms they can spam. You'll see this search term
in the HTTP referrer header.

Edit: obviously you could just avoid using this phrase on your site instead.

~~~
larrik
Ah, that makes sense, and is rather clever.

Thanks

------
JeffL
We were getting spam bots on our forum, which uses the same registration info
as our game. We used a CAPTCHA for a bit, but also noticed a big decrease in
conversion rate, so we tweaked the forum software a bit to require that you
have gained at least 1 level in the game before you can post to the forum.
Now: no CAPTCHA and no spam.

~~~
originalgeek
One might argue that gaining a level in the game is a captcha. Though, given
your audience, it is probably not a nuisance like a traditional captcha.

------
hoop
I was surprised to find that when I pressed control-f and typed "duh" that
zero results were found in the comments.

However flawed the experiment might've been, it's obvious that if you add
barriers (e.g., CAPTCHAs) before some end goal and detract from user
experience then you decrease your conversion rate.

------
dazzla
Try Mollom (<http://mollom.com/>). It uses text analysis for the most part and
only uses a CAPTCHA if it's not sure. Even though I don't have a huge site, it
blocks a lot for me.

~~~
jarin
CloudFlare (<http://cloudflare.com>) also works great, since it does a quick
Project Honeypot check on any suspicious visitors (along with a bunch of other
good stuff).

------
dm8
CAPTCHAs were designed to tell computers and humans apart. Initially, they
were simple tests, which required users to identify certain words. However,
computer vision is growing by leaps and bounds, so these tests have become so
complicated that even humans find it difficult to comprehend them. CAPTCHAs
have gone from simple tests to extremely complicated ones over the last 10
years, but the design has never changed. We need an overhaul of CAPTCHA
design. They need to be both usable and secure.

P.S. I'm working on a project to make CAPTCHAs more usable. We will have some
updates soon. :)

------
bugsy
The study he did isn't broadly valid because he only tested using a captcha
system that is quite abysmal, and for which the results were not surprising.

If he wants to increase conversion rates, he should get rid of the irrelevant
fields such as date of birth, zip code, country, gender, and check-to-agree to
legal contract.

Ha, checking the actual site, "sign up" leads to "pricing" and not a sign up
page. So much for their grave concern about losing sign ups at each stage.

On the other hand, his link to an article about including Honeypot fields is
good advice and valuable. Timestamp analysis is not so great since it requires
javascript and cookies. The more stuff you require the more users drop off.
The problem with captchas is bad captchas that are impossible for humans to
decode. Sometimes the reason these are used is because simpler captchas are
implemented in a faulty manner that allows spammers to decode them without
even having to do OCR. So the site developer upgrades to more complex captchas
rather than fix the underlying problem that is breaking the captcha security.

------
wladimir
I think how many people give up depends on the kind of CAPTCHA. Some captchas
are literally easier to read for a machine than for a human. For example,
some use simple rotated text in unreadable grey on grey. Humans can hardly
read it, but an algorithm doesn't care about the contrast at all. Very
stupid. A captcha should be as easy as possible for humans to read.

------
joshfraser
Before arguing that "CAPTCHAs are a necessary evil", it pays to know the
lifetime value of a user/customer for your site. It's likely that the cost of
dealing with the spam would be lower than the amount of revenue lost from your
CAPTCHA-impaired conversion rate.

------
stcredzero
If it's so hard to tell the true humans from the machines (CAPTCHA) shouldn't
it be a lot easier to tell true machines from humans and human/machine
combinations? (Human/machine combination, like a person in a debugger with
some reverse engineering tools.)

Couldn't this be used to increase the security of computer systems? What if
one could extend this to be able to tell _particular_ machines from humans,
human/machine combos, and counterfeit machines. I suspect one can do this. I
have been working on this problem for the past 3 months, and I'm about to
implement it and publish it on the App Store.

------
dansingerman
Anyone able to expand on the timestamp/honeypot techniques mentioned?

~~~
Natsu
The Project Honeypot website can help you with setting up a honeypot as well
as blocking spammers other users have already detected:
<http://www.projecthoneypot.org/>

~~~
jerfelix
I could be mistaken, but I think Project Honeypot is trying to address a
different problem - harvested email addresses.

I believe the Honeypot concept that has been discussed on here is referring to
creation of a honeypot field on a web form, tempting the bot to fill it in.
Many bots will blindly try to submit something into each field, just to make
sure that they get all the required fields on their form submission.

By adding a honeypot field, and adding text that instructs humans to leave it
blank, a very high percentage of bot submissions will be detected, with few
false positives.

Furthermore, you can hide the field from humans, with CSS tricks, as others
mentioned. Make it 1 pixel. Make it hidden. etc.

~~~
Natsu
They catch comment spammers, too. It's kind of buried in the FAQ, though:
<http://www.projecthoneypot.org/faq.php>

"How does a honey pot catch comment spammers?

In addition to including specially tagged spam trap addresses, some honey pots
also include special HTML forms. Comment spammers are identified by watching
what information is posted to these forms."

Here's a list of comment spammers they've caught:

<http://www.projecthoneypot.org/list_of_ips.php?t=p>

You're absolutely right that fake fields like that are a good way to catch
bots, though, and that making your site unique is a great way to avoid being
targeted by mass attacks that go after, say, all MediaWiki sites. Of course
that doesn't help when you're big enough to be worth attacking specifically,
but it makes things a little harder for the spammers.

------
plasma
I'd love to see Google tackle this by identifying this spam and immediately
penalizing the links they are spamming.

I assume the spam is there in the first place to increase search engine
rankings; so why not update the Google ranking algorithms (for example) to
identify this spam and immediately give the targeted site (but not the site
with the spam on it!) a terribly low rating?

Then, hopefully, the incentive to spam in the first place is gone.

------
wordchute
The bottom line is that people don't like CAPTCHAs, and it cannot leave a good
impression to irritate potential customers/users within the first five minutes
of a visit. Most people don't really understand what they're used for, and
they get frustrated when they cannot read them and/or get rejected. I have
definitely been taking steps to limit my use of them or dispense with them
altogether.

------
taylorbuley
I've had similar moments: <http://fuckyoucaptcha.tumblr.com/>

------
bobds
Here's a trick not a lot of people use:

1. Whois IP address of spam accounts.

2. Identify bad blocks of IPs. If it's a datacenter, someone is probably
running spamming software on a dedicated server or VPS. Maybe get your hands
on some of those open proxy lists that are floating around.

3. Use your data to prune bad accounts, throttle or block creation of new
ones, etc.
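
Step 3 can be a simple CIDR membership check once the bad blocks are
collected. A sketch using Python's stdlib (the blocks below are
documentation/example ranges, not real datacenter assignments):

```python
import ipaddress

# Hypothetical blocks flagged after whois lookups on spam accounts.
BAD_BLOCKS = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def from_bad_block(ip):
    """True if the address falls inside any flagged datacenter/proxy block."""
    addr = ipaddress.ip_address(ip)
    return any(addr in block for block in BAD_BLOCKS)
```

At signup you'd check the requester's IP and throttle or queue the account
for review rather than hard-block outright, since IP reputation goes stale.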

~~~
Tangaroa
A word of warning: between forged IPs, compromised systems, and formerly
hostile IP space given to new owners, an IP blacklist will eventually hit
legitimate customers. I speak from experience on this since I had the same
bright idea.

~~~
bobds
You are right about the blacklist, however, it's very unlikely you will have
legitimate users coming from datacenter IPs. I've used this trick to prune
hundreds/thousands of bad accounts in a couple of forums. You need to be
careful with it, but I think it's a worthwhile method.

------
suhail
We've found that required email confirmations can drop conversion rates by
60%. CAPTCHAs I wouldn't worry about unless you have serious spam problems.
Seems better to detect unhuman-like engagement and push CAPTCHAs on that.

------
callmeed
Roughly how would timestamp analysis work? (I'm guessing a honeypot field
would be an empty text field in a hidden div or something along those lines)

~~~
BoppreH
When the client loads the page, the server sends a hash of the timestamp and
asks for the client to store it. When the client submits the form, it also
sends the stored hash.

This exploits the fact that bots don't usually run javascript or load all
resources on a page.

[http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without...](http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without_CAPTCHAs)

------
apedley
I thought this was common sense. Every step needed for a user to signup in
anything will eventually make a difference in conversion rates.

------
bfe
If a popular site uses captcha and you can make it work without, sounds like
an opportunity.

------
btipling
Just create the form with JavaScript. 100% no spam.

------
GrandMasterBirt
I understand the honeypot technique, which is quite cool. However, what is
this timestamp analysis stuff? Does anyone have a link to a decent
explanation, or care to explain it in a few words?

------
ck2
_They then removed the CAPTCHA, and it boosted the conversion rate up to 64%.
In conversion rate lingo, that’s an uplift of 33.3%!_

Pretty sure that 33% was bots, lol.

And they do train the bots to avoid negative fields and timestamp analysis:
all they have to do is look for type=hidden or display:none/visibility:hidden
in the CSS.

I use simple math instead of word captchas, seems easier on people.

------
p09p09p09
Pro tip: You can usually get away with entering invalid similar characters on
recaptcha when the word is really blurry. Substitute 'ri' for 'n', for
example.

I like to do this as a game, to see what I can get away with, adds some fun to
the drudgery of typing in a captcha.

~~~
5l
The really blurry word is the word they're trying to OCR; generally it doesn't
matter what you type as it'll accept it provided the other word is entered
correctly.

Of course your 'game' is hurting reCaptcha's goal of digitizing books.

------
Confusion
Compare: "security labels in clothing are a way of announcing to the world
that you've got a theft problem, that you don't know how to deal with it, and
that you've decided to offload the frustration of the problem onto your user-
base. Security labels suck, because you can't properly try some pieces of
clothing on with those labels in them, which means sales go down."

Such complaining doesn't accomplish a thing, unless you tell them about an
effective alternative. If you don't change anything about the trade-off they
have knowingly made, nothing will change. To have any chance of convincing
anyone, you at least need to explain the alternatives. Everyone that reads
this post just shrugs their shoulders and ignores you, because their captchas
effectively solve a problem they _and their clients_ would suffer from without
those captchas.

In this case, if you open with

    
    
      Using a CAPTCHA is a way of announcing to the world that
      you’ve got a spam problem, that you don’t know how to deal
      with it, and that you’ve decided to offload the
      frustration of the problem onto your user-base.
    

then I think it is very dissatisfying[1] to follow up later with

    
    
      They replaced the CAPTCHA with honeypot fields and
      timestamp analysis, which has apparently proven to be very
      effective at preventing spam while being completely
      invisible to the end user.
    

which indicates that you have no idea about alternatives for fighting spam,
apart from some measures that have 'apparently' helped in one particular case.
It's not better than someone in a bar complaining about stupid government
rules, without any idea or suggestion for how to improve things.

[1] it said 'hypocritical' here. That is not the correct word for it.

~~~
pdx
I wish I could downvote you twice.

That he offered up the word "apparently", even with strong supporting
evidence, shows that he's being an objective reporter and a good scientist.
I'm disheartened that this would earn somebody ridicule here.

~~~
Confusion
The plural of 'anecdote' is not data. Simply reporting an anecdote makes you
neither a scientist nor a journalist, no matter how strongly the anecdote
supports your feelings on some matter. In the end, this is about his feelings
on captchas. He hasn't made the case that a better trade-off between fighting
spam and a higher conversion is possible; he has only suggested something
based on an anecdote. As others immediately questioned: what happened to spam
levels? 'Apparently' is not good enough when dealing with that serious
problem.

