
You probably don’t need ReCAPTCHA - ve55
https://kevv.net/you-probably-dont-need-recaptcha/
======
blakesterz
"Many developers vastly over-estimate the likelihood of customized spam."

I run 100s of small random low traffic low priority sites. Without some form
of form control, they ALL get hit with customized and random other crap spam. I
don't have decent experience with many things in life, but I can say this is
one topic I have YEARS of experience with. I've never over-estimated the
amount of any type of spam any form can get after being on the web for just a
few months. Doesn't matter how big they are or what they do.

I'm not saying ReCAPTCHA is the only thing out there or even the best, but
having an open form is just asking for trouble.

~~~
sbov
Have an input element that can't be seen. If it has something in it, ignore
the submit. Works for all my sites so far.
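
A minimal sketch of the server-side half of that check, assuming the form arrives as a plain dict and the hidden input is named `website` (both are made-up details for illustration, not tied to any framework):

```python
# Hypothetical field name. The input is hidden from humans with CSS,
# so only a script that fills in every field will populate it.
HONEYPOT_FIELD = "website"


def should_ignore_submission(form: dict) -> bool:
    """Return True when the hidden honeypot field has anything in it."""
    value = form.get(HONEYPOT_FIELD, "")
    return bool(value.strip())


# A human leaves the hidden field empty; a naive bot fills it.
print(should_ignore_submission({"comment": "hi", "website": ""}))          # False
print(should_ignore_submission({"comment": "hi", "website": "http://x"}))  # True
```

Silently dropping the submission, rather than returning an error, gives a bot less feedback to adapt to.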

~~~
charrondev
Doesn’t work as soon as you’re big enough to target.

The company I work for makes a SaaS forum product, and while we do have
multiple spam prevention methods (akismet, stopforumspam, honeypot, a hidden
input), there’s enough stuff out there that has targeted our platform that a
Recaptcha on the registration form is needed.

We haven’t needed it on any other forms yet though. After registration it’s all
handled by the other methods and various moderation tools.

~~~
mattigames
Did you try randomizing the 'name' and 'ids' of the inputs? (including the
invisible one)
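
One way this could look, as a rough sketch: derive each input's name from a per-render token with an HMAC, so the names change on every page load but the server can map them back without storing anything. `SERVER_SECRET`, `field_name`, `render_names` and `decode_submission` are hypothetical names, and the token is assumed to travel back in a hidden input.

```python
import hashlib
import hmac
import secrets

# Assumption: a server-side secret that never reaches the client.
SERVER_SECRET = secrets.token_bytes(32)


def field_name(logical_name: str, form_token: str) -> str:
    """Derive an unpredictable but reproducible input name for one render."""
    message = f"{form_token}:{logical_name}".encode()
    digest = hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()
    return "f_" + digest[:12]


def render_names(logical_names: list[str]) -> tuple[str, dict[str, str]]:
    """Pick a fresh token and map each logical field to its randomized name."""
    form_token = secrets.token_hex(8)
    return form_token, {n: field_name(n, form_token) for n in logical_names}


def decode_submission(form_token: str, posted: dict, logical_names: list[str]) -> dict:
    """Map the randomized names in a POST back to the logical ones."""
    return {n: posted.get(field_name(n, form_token), "") for n in logical_names}
```

This only defeats scripts that hard-code field names; a bot that parses the rendered form and fills inputs by position or label is unaffected.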

~~~
mobjack
I really don't know how well that will work against a dedicated attacker.

I am much more confident in ReCAPTCHA stopping bots than in any roll-your-own
solution.

I don't want to hope that an alternative is good enough for my needs. I want
the best when it comes to protecting my site.

Any alternative needs to have a proven track record and support to make me
consider replacing ReCAPTCHA.

~~~
briandear
So you force your users to consent to sharing all of their data with Google?
That’ll teach ‘em.

~~~
shaki-dora
"all their data" is a bit much, isn't it? ReCAPTCHA gives Google exactly one
datum, namely the user's visit to the one page it is on.

And I would even hazard a guess that the TOS specify that Google will not
retain/link that information, considering that's how Analytics is run.

~~~
ComputerGuru
No, that's how it used to be. Now with ReCaptcha v3 they recommend you load it
on all your pages, not just the forms you are trying to protect, so they can
predict friend vs foe more accurately.

~~~
inferiorhuman
So how does one block this?

~~~
sli
Firefox and uMatrix[0], and then never go to those sites again, because you
won't be able to use them anyway. Whether or not you want to contact the owner
of the site and tell them what's up is up to you.

[0]: [https://addons.mozilla.org/en-US/firefox/addon/umatrix/](https://addons.mozilla.org/en-US/firefox/addon/umatrix/)

------
julianlam
Most websites probably don't. If you're one of those people, congratulations!
Stick a honeypot input into your form and call it a day.

However, if you're working on anything with non-insignificant amounts of
traffic, you'll get hit with some customized spam.

I've been dealing with these spammers, and if you do nothing, your forum will
be filled with Korean ads. We implement Akismet, StopForumSpam, Project
Honeypot, and ReCAPTCHA, with the latter being the most effective (sadly). I'm
pretty sure some of these spam agencies have customized tooling to handle
NodeBB (they're using websockets to submit the posts, instead of HTTP POST).

Outside of these strategies, the most effective by far is reputation
restrictions. Post queues if you're new or don't have enough upvotes, etc.
However it does require manual effort, of course.

Would definitely love an alternative.

~~~
ordu
_> We implement Akismet, StopForumSpam, Project Honeypot, and ReCAPTCHA_

Did you try some techniques from the article? Like hidden form fields,
simple javascript checks or simple captcha?

~~~
sroussey
I used to run a forum hosting service. After a while you try everything.
Honeypot fields, incorrect field names, etc. You will notice things like CSRF
tokens generated by one IP and used by another. When stuff fails, they send
100 real people and record what worked, and it fixes their script. It’s all
pretty automated. IP reputation can be helpful for some players, but most
snowshoe.

Spam attempts would grow exponentially. So every time we cut it down by 90%
via one of these tricks, it only gave us a bit of time.

None of this stops the determined troll though.. they can have all day to
manually add offensive content. Shadow banning was good for this (1999) and
group shadow banning was the best (bifurcated forum posts so all the banned
people saw each other, but no one else did). Ah, memories. So good.

~~~
dennisgorelik
> all the banned people saw each other

What did happen with that strategy?

------
cardine
In my experience, the biggest issue I run into is targeted botnet brute force
attacks.

In cases like these, someone loads up a huge botnet, a downloaded list of
hacked usernames and passwords, and tries every single combination hoping to
find a reused username/password combination.

In these cases, it is almost always extremely targeted. Log correlation has
helped quite a bit, but it is still very painful since they alternate IPs with
every request.

Automatically adding a blanket ReCAPTCHA on all login pages during a
distributed brute force attempt is one of the few things that actually stops
an attacker like this with minimal negative consequences.

I'm sure it is frustrating to users, but I think service disruption from what
is effectively a DDoS is a worse user experience.

~~~
chii
and you can easily count the number of failed attempts from a particular IP,
and just show captcha for those over X failures, rather than every login.
Normal users don't fail _that_ many times, and so are none the wiser.
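
A minimal sketch of that counter, with an in-memory store and a sliding time window. The threshold, the window length and the function names are illustrative assumptions; a real deployment would keep the counts somewhere shared, such as a cache.

```python
import time
from collections import defaultdict, deque

MAX_FAILURES = 5       # the "X" above; an arbitrary illustrative value
WINDOW_SECONDS = 900   # only failures from the last 15 minutes count

_failures = defaultdict(deque)  # ip -> timestamps of recent failed logins


def record_failure(ip, now=None):
    """Remember one failed login attempt from this IP."""
    _failures[ip].append(time.time() if now is None else now)


def needs_captcha(ip, now=None):
    """True once an IP has more than MAX_FAILURES recent failed logins."""
    now = time.time() if now is None else now
    recent = _failures[ip]
    # Drop failures that have aged out of the window.
    while recent and now - recent[0] > WINDOW_SECONDS:
        recent.popleft()
    return len(recent) > MAX_FAILURES
```

Keying on IP alone does nothing against an attacker who rotates addresses, so the same counter could also be keyed by the target username.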

~~~
throwawaymath
As the commenter said, they rotate IPs. It is not that easy. I've also been on
the other side of a sophisticated attack like this. The really savvy
adversaries do the following, at least:

1. Rotate through several thousand to several hundred thousand noncontiguous,
geographically distributed, residential IP addresses,

2. Associate each IP address with a single user agent and suite of cookies,

3. Associate each IP address with a particular target username,

4. Only attempt a few incorrect logins at a time, and a somewhat random
(albeit realistic) number at that, within a given time interval,

5. Use random, apparently human delays between successive requests,

6. Issue requests using extremely high fidelity simulacra of web browsers,
customized to the sequence and structure of HTTP requests on the website.

When the stakes are high this is the kind of opposition you'll get. Bank
account takeover, social media account takeover, ticket scalping, automated
sneaker buying, financial research, market research, etc.

Recaptcha introduces unpleasant user friction, but it usually works well. To
invert a popular turn of phrase, it makes stopping simple attackers easy and
hard attackers possible. The most sophisticated attackers will still lease
reputable Google accounts and mechanical turk time to bypass Recaptcha
challenges, but it will be expensive for them.

Technical sophistication is only one dimension of this game. The other is
making adversaries spend more money than they can gain from being successful.

~~~
mirimir
> automated sneaker buying

???

Please ELI5. I mean, why are sales bad, even if automated? Are they using
stolen cards?

~~~
mcpeepants
Most likely related to high-demand, limited run sneaker "drops", which people
then resell on the secondary market. Sneaker-scalpers, if you will. It's a
problem because it prevents legit buyers from getting in on the sale.

~~~
mirimir
Why don't they just sell more of them?

Or as ALittleLight says, auction them?

Edit: OK, I know, limited editions. Like numbered and signed prints. But it's
arguable that people who want them the most will get them. Even if it's just
for resale. Doesn't seem like the seller's responsibility.

~~~
chii
> Doesn't seem like the seller's responsibility.

sounds to me like the seller doesn't want the scalper to sell outside the
official channels imho. It might dilute the brand as well.

~~~
inetknght
Exclusivity agreements are anti-competitive tbqh

The real problem is supply. Popular tickets are scalped because there's only
so many tickets. Then unpopular tickets are scalped because it was so easy to
scalp the popular ones.

There's only so many sneakers that can be made: making more chews up the
supply chain for something which isn't _truly_ being consumed.

------
hawaiian
ReCAPTCHA has crossed into the domain of cattle-corralling users and thus
should be considered harmful. If the system decides it doesn't like you (most
likely because you're "too anonymous," but you don't really know) you will be
presented with slower-loading images to click and more click-all-the-things
rounds. To pretend this is about slowing down bots is disingenuous at best. On
top of that, usage of ReCAPTCHA perpetuates the very problems I just
described.

Why isn't there a solid alternative offering yet?

~~~
bduerst
>Why isn't there a solid alternative offering yet?

The latest version of recaptcha doesn't even prompt users. It loads on the
front-end and uses a scoring system. It's likely you've used it but didn't
even know because it's invisible.

It's the older implementations that have the slow loading images.

~~~
gbear605
On Google Chrome, with adblock on, without my Google account signed in, in a
new incognito tab with no extensions, I have the experience of it being
invisible. When I go back to the same site on Firefox, logged onto my Google
account, no adblock on, no privacy options on, I have to identify dozens of
photos.

As far as I can tell, it just checks to see if your browser is Google Chrome
to give you your score.

~~~
bduerst
Again, that's still not the latest version of recaptcha, which never prompts
users for photos.

------
amelius
I think we have to take a step back, and consider why we want to separate
humans from computers in the first place.

Humans can do a lot of bad things that computers can do. Think of armies of
low-wage people in Asia, that are paid to click on ads, spread spam, or write
reviews.

And also consider that computers can actually do _good_ things, for example,
allowing humans to automate their work on certain websites, or providing
better accessibility for certain users.

Therefore, instead of introducing CAPTCHAs, why not focus on the actual
threats. If you want to protect against spam, then build a spam filter. If you
want to prevent bots from bulk-downloading your data, then build a rate-
limiter, etc.

~~~
Tepix
But that's something captchas are used for. Prevent fake signups.

~~~
aflag
It doesn't do that, though. Humans also create fake accounts. It does make
mass creation of fake accounts impractical, though.

I solved that in the past by actually charging for my service. I think the
internet would benefit from having more paid content and less ad-driven
stuff.

One thing that captchas do protect from is brute force attacks on user
passwords. Although there are other possibilities (like making the connection
slow after a number of attempts).
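
As a rough illustration of "slow after a number of attempts", here is a delay schedule that stays at zero for the first few failures and then doubles up to a cap. All three constants are assumptions chosen for the example.

```python
FREE_ATTEMPTS = 3   # assumption: the first few failures cost nothing
BASE_DELAY = 1.0    # seconds added for the first throttled attempt
MAX_DELAY = 60.0    # cap so a legitimate user is never stalled for long


def login_delay(failed_attempts: int) -> float:
    """Seconds to wait before processing the next login for this account."""
    if failed_attempts <= FREE_ATTEMPTS:
        return 0.0
    extra = failed_attempts - FREE_ATTEMPTS
    # Doubles with each further failure: 1s, 2s, 4s, ... up to MAX_DELAY.
    return min(BASE_DELAY * 2 ** (extra - 1), MAX_DELAY)
```

Counting failures per account rather than per connection means the delay still applies when the attempts arrive from many different addresses.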

~~~
m-p-3
This is why you need a multi-layered solution to fight spam. Captchas reduce
the amount of fake accounts, which can then be taken care of by the additional
layers.

It's easier to fight against a bigger opponent (botnets, etc) if you mitigate
their superiority in number first.

------
Blackthorn
Literally none of those alternative methods listed worked on my moderate
traffic wiki. Recaptcha (and before it went away, identify the dogs or cats
from Microsoft) is literally the only solution that stopped us from getting
spammed. I wonder how much experience the author of this article really has in
this domain.

Recaptcha has saved the internet as far as I'm concerned.

~~~
tszyn
For my small wiki, refusing all submissions with external links eliminated
virtually all spam. Yes, it's drastic, but in my case external links were not
essential. I still use ReCaptcha to cut down on spam account signups.

~~~
Blackthorn
We do that as well, outside of a small whitelist of allowed external links.
Turns out it's not enough and we still need recaptcha for a few reasons.

We have a small set of anonymous edits every day, which go through recaptcha
to be allowed.

------
seancoleman
I fought an interesting implementation of customized spam on a web app
registration form a few months back. Suddenly, every 2-3 seconds, we would get
a sign up from a random email address @qq.com (it really clogged our sign up
Slack channel). I didn’t want to go full CAPTCHA so I dropped in a simple
honeypot and the spam stopped for a good 4-5 hours. Then it picked right back
up like normal. I then implemented a randomized honeypot, e.g.
<input name="lastname478482"> and again, it ceased immediately and picked up a few
hours later. Finally I just blocked all submissions with emails @qq.com and it
stopped completely and hasn’t returned months later.

Sometimes I wonder if there was a real person on the other end writing code to
combat the code I was writing at the same time, and finally gave up on the 3rd
iteration.

~~~
Washuu
There is someone else on the other end. I had one of them previously get on
the forums I worked on and complain about how quick I was to block them. I
wrote an entire system for learning the patterns and automatically blocking
them as they occurred. Eventually I determined that the vast majority of the
IP addresses originated from Bangladesh so I just banned the entire country
from accessing the web sites. That was the only solution, unfortunately for
those legitimate users there, that made the spam stop for good.

~~~
frosted-flakes
I have to ask, why do people do this? What do they get out of it?

~~~
Washuu
I assume SEO targeting and associating phrases with various products. For
example, people on Reddit will name and shame by dropping people's names
accused of crimes next to the thing they are accused of in a complete
sentence. This is an attempt to promote the relevance of those two facts in
Google Search to each other.

~~~
michaelt
Doesn't the nofollow attribute [1] work to remove that incentive? IIRC that
was the intended purpose of nofollow?

[1]
[https://en.wikipedia.org/wiki/Nofollow](https://en.wikipedia.org/wiki/Nofollow)

------
3xblah
"ReCAPTCHA relies extensively on user fingerprinting, putting emphasis on the
question of "Which human is this user?" rather than the ordinary "Is this user
human?". "

Classic example of collecting more information than what is needed.

~~~
comex
Depends. The traditional techniques that automatically establish humanity
without determining identity are more and more vulnerable to AI, so the only
way to keep CAPTCHAs effective is to integrate identity. From that
perspective, ReCAPTCHA isn’t collecting more than “needed”. On the other hand,
the cutting-edge cryptographic technique used by Privacy Pass does supposedly
preserve anonymity by making it impossible for CloudFlare to link “who solved
the CAPTCHA” to “who wants to access X site”, but it still involves
information being collected in some form.

~~~
mcv
> _The traditional techniques that automatically establish humanity without
> determining identity are more and more vulnerable to AI, so the only way to
> keep CAPTCHAs effective is to integrate identity._

Is it? What's stopping AI from developing an identity in the eyes of Google?
An AI that might behave exactly like Google's ideal user. Searches random
stuff on their search, looks at and clicks their ads, logs in to various
Google services, and when it encounters a ReCaptcha, it clicks the "I'm not a
robot" checkbox.

At some point, caring about privacy might turn out to be the distinguishing
feature of humans.

------
partlyFluked
I've noticed a number of major cryptocurrency exchanges using this slide
puzzle as a form of captcha [1]. I don't exactly know how it works but I assume
the jittery nature of a human sliding a mouse is enough to discern the bots
from the nots. Are there any major downsides to this form of captcha that I
may be missing?

[1] [https://www.geetest.com/en/](https://www.geetest.com/en/)

EDIT: I now see the article does actually mention this, though I still do
wonder how far the fingerprinting goes.

------
jknz
Requiring users to solve ReCAPTCHA _after_ may be more sensible than the usual
use case where ReCAPTCHA is required before any interaction.

Let the user create an account, let the user create a first comment/post, save
it somewhere but hide it, then require ReCAPTCHA to make the post visible and
activate the account.

The issue with requiring ReCAPTCHA _before_ is that the website owner will
never know whether a legitimate user was turned away or whether it was spam.

~~~
dorgo
And how does the timing change things? You want to look at first
comments/posts to determine whether it was spam or not?

~~~
jknz
After the post is saved (but hidden until the user solves ReCAPTCHA) you can
look at the database and analyse what's happening. Did ReCAPTCHA turn away an
interesting post by a legit user? What is the ratio of spam to legit users?
Etc.

If you turn away both spammers and annoyed users with ReCAPTCHA before any
interaction, you will never know how many were legit and you won't be able to
manually accept interesting legitimate content written by a user who is
annoyed by ReCAPTCHA.

------
saltyhiker
I had this same thought in 2013 and created this library which uses a few
simple tests to protect from comment spam. It definitely could be
improved/tuned (and oof this code I wrote is baaaad), but for many sites it is
good enough.
[https://github.com/mccarthy/phpFormProtect](https://github.com/mccarthy/phpFormProtect)

------
sb8244
I used to use a script to combat uncustomized spam. A hidden input would get
populated with my birthday in hex and then checked on the server. Every JS
based client would be able to use the forms and we went back to math captcha
for noscript.

Literally thousands of spam comments per day would stop instantly on
deployment. We put this on over 400 sites and never had an issue of customized
spam.

Definitely agree with recaptcha not being necessary. However, it's probably
needed for popular sites which you're more likely to use. Do we see more
recaptcha because of that?

------
beiller
Here is an idea I thought of for a captcha. Render your webpage and form and
include a hidden "password" field. Use a javascript hashing algorithm to hash
the password on the client browser (preferably a very slow one that uses a lot
of CPU). When you submit the form check the calculated hash the client did
with a pre-calculated hash on the server. If they don't match reject the form.
You can pre-generate a list of password/hash combinations to avoid slowing
down your servers. Could it work?
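
A rough sketch of the server side of this idea, using PBKDF2 from the standard library as a stand-in for "a very slow hash". The iteration count, the fixed salt and the function names are all assumptions; the browser would need to run the same computation in JavaScript.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000       # assumption: tuned so one hash costs noticeable CPU
SALT = b"form-challenge"   # fixed, public salt; illustrative only


def slow_hash(password: str) -> str:
    """The deliberately expensive computation the client must repeat."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), SALT, ITERATIONS).hex()


def pregenerate(n: int) -> dict[str, str]:
    """Done offline: build password -> expected-hash pairs ahead of time."""
    return {p: slow_hash(p) for p in (secrets.token_hex(8) for _ in range(n))}


def verify(pairs: dict[str, str], password: str, client_hash: str) -> bool:
    """Cheap at request time: one lookup, and each password is single-use."""
    expected = pairs.pop(password, None)
    if expected is None:
        return False
    return hmac.compare_digest(expected.encode(), client_hash.encode())
```

Removing each pair once it has been checked keeps a bot from computing one hash and replaying it across many submissions.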

~~~
norrius
It could work... If you want to set minimal system requirements to visit your
website.

It will also annoy users of password managers with auto-filling capabilities.
"password" is normally used for actual passwords.

Besides, nothing stops the attacker from replacing your code with a faster
implementation.

~~~
mnw21cam
There are password hashing algorithms out there (like bcrypt) that
specifically take a long time to compute using the fastest method that we can
think of.

------
pornel
I've developed an alternative to captcha for blog comment/forum spam a while
ago.

There are many methods that are easily dismissed with "but not all spam is
like that" and "someone _could_ work around it easily":

• blocking of links (if you don't need them, or in fields that are not for
them).

• blocking of obviously spammy keywords (or bayesian filter)

• invisible fields and syntax to trip up dumb implementations

• requiring JS, properly-functioning cookies

• blocking of IP ranges that belong to VPS providers and a couple of 3rd world
telecoms that allow spam

Each of them is surprisingly effective and combined they block 99.8%.

You really shouldn't flatter yourself thinking that a spammer will even look
at your page. They have literally millions of sites to spam, and they couldn't
care less if they get yours or not. A bot will find a 1000 other sites to spam
quicker than it takes a human to click "View Source".

Most spam is done by amateurs who take shitty off-the-shelf spam software,
seed it with a target list copied off some forum, and run it on a couple of
spam-friendly or incompetent VPS hosts. By volume, this is the vast majority
and it's very easy to block.

------
arbuge
> The comment form of my blog is protected by what I refer to as “naive
> captcha”, where the captcha term is the same every single time. This has to
> be the most ineffective captcha of all time, and yet it stops 99.9% of
> comment spam.

This is what we did on one of our sites. 5 minutes to implement it using a few
lines of code. Same result, couldn't be happier.

------
mac_was
“It’s worth noting how much easier it is to successfully solve ReCAPTCHAs when
the user is logged into their Google account”. Well to me it makes absolute
sense as Google knows that the logged in user is a human. This article is just
following the current trend that all Google is bad.

~~~
lucb1e
Right, someone can't register a google account and make their spambots bypass
CAPTCHAs with it. Just knowing that a human was present once doesn't mean they
are always human. Google probably lets you pass easily once or twice in a
certain amount of time but keeps track and starts giving you more difficult
solves as time goes on. But then, someone might register a thousand google
accounts and rotate through them to allow cool off time, so I'm not sure how
this is handled. It is not a given that rewarding people with a google account
is a good thing.

------
nsajko
I understand when Recaptcha gets used before registration forms, but why, oh
why, does Discord do it before any login?

If I know my email address and password I should be able to login without
being logged in to Google as well ...

~~~
jhabdas
On the plus side you've likely become fairly adept at identifying street
signs, light posts, bicycles, storefronts, bridges and buses in order to
access Discord, especially if you were trying to access via Tor to gain a
little privacy.

~~~
mikro2nd
Can't speak for OP, but personally I simply stopped using Discord.

~~~
jhabdas
Ditto. Don't forget to close your account as it will not self-destruct.

------
mvkel
We did everything to avoid using CAPTCHA because spammers on our platform
could defeat CAPTCHA anyway. Oftentimes, they're real, actual humans.

Hidden inputs, honeypots, etc. None of them worked long-term.

Our solution: focus on the content.

All content now runs through the following filters, and takes care of 99.9% of
spam:

1) Auto-approve any and all content immediately, unless:

a) The content contains any HTML (url included). If it does, send it to a
moderation queue.

a1) If the submitter had submitted content previously and gotten approved,
auto-approve their content, even if it contains HTML.
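
Those rules could be sketched roughly as follows. The regular expression is a crude heuristic for "contains HTML or a URL", not a real parser, and `route_submission` is a hypothetical name.

```python
import re

# Rough heuristic, not a full HTML or URL parser.
HTML_OR_URL = re.compile(r"<[a-zA-Z/][^>]*>|https?://|www\.", re.IGNORECASE)


def route_submission(content: str, author_previously_approved: bool) -> str:
    """Return "approve" or "moderate" following the rules described above."""
    if not HTML_OR_URL.search(content):
        return "approve"   # plain text goes straight through
    if author_previously_approved:
        return "approve"   # trusted authors may post HTML and links
    return "moderate"      # first-time author with HTML or a link
```

The moderation queue only ever sees first-time authors who include markup or links, which keeps the manual workload small.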

------
Wowfunhappy
Aren't hidden form elements a major issue for accessibility?

Admittedly, so is ReCaptcha, so the trade-off may be necessary as much as it
sucks. But, it's probably at least worth a mention?

~~~
dylanz
They are definitely an issue for accessibility. I make sure to put "Hey! Don't
put anything in this field!" as a placeholder.

~~~
Wowfunhappy
That sounds like something bots could easily adapt to if the practice becomes
widespread.

~~~
ShamelessC
It's already widespread. But, it requires customization to overcome for many
spammers. Depending on the popularity of your site, your threat model may
require you to do more. But it is still a useful tool.

------
scarface74
Microsoft’s implementation is the worst. I sometimes have a hard time
deciphering the captcha. Why do they need that in an iOS app? Are robots
emulating people from an iPhone?

~~~
anchpop
The iPhone app probably communicates to some servers over an API of some kind
- there's no reason someone malicious couldn't pretend to be an iphone and
communicate over the same API

~~~
arlk
and no reason someone malicious couldn't click-farm iPhones either.

[https://gizmodo.com/thai-click-fraud-farm-busted-using-wall-...](https://gizmodo.com/thai-click-fraud-farm-busted-using-wall-of-iphones-to-c-1796057368)

~~~
scarface74
Mechanical Turk style processing never ceases to amaze me.

------
progval
> Uncustomized spambots are also so unintelligent that they do not correctly
> answer simple questions such as “What is 2+3?”, or “what color is this
> website?”.

I found that some uncustomized spambots are able to solve simple math
challenges like this one.

And the color question is bad for accessibility.

Many of the suggested alternatives are also terrible for accessibility,
because they require solving a visual puzzle, without an audio alternative.

~~~
mcv
> And the color question is bad for accessibility.

That's a valid concern, but not a reason to use ReCaptcha, because ReCaptcha
is worse for accessibility.

------
nroets
Websites should use ReCAPTCHA, but set a cookie that includes sufficient
(pseudo) random numbers. Keep a copy on your server.

As long as a browser provides a cookie that is present on your server, there
is no need for another ReCAPTCHA.

If the user misbehaves, e.g. too many wrong guesses of password(s), remove the
cookie from your server.

You can even email the user the cookie as a url e.g.
example.com/?cookie=12345678901234567
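
A minimal sketch of the bookkeeping this needs, with an in-memory set standing in for the server-side copy. The function names are made up for the example, and a real site would persist the tokens and give them an expiry.

```python
import secrets

_trusted_tokens = set()  # server-side copy; a real deployment would use a database


def issue_token() -> str:
    """Call after a successful CAPTCHA; the return value goes into the cookie."""
    token = secrets.token_urlsafe(32)
    _trusted_tokens.add(token)
    return token


def can_skip_captcha(cookie_value) -> bool:
    """True only for a cookie value the server itself handed out."""
    return cookie_value is not None and cookie_value in _trusted_tokens


def revoke_token(cookie_value) -> None:
    """Call when the holder misbehaves, e.g. too many wrong passwords."""
    _trusted_tokens.discard(cookie_value)
```

Because the token is random and checked against the server's own copy, a client cannot forge one; it can only reuse a token it was given until that token is revoked.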

------
downandout
One thing you can do to slow down bots is make it computationally expensive to
sign up. Utilize a JavaScript-based proof of work library. Normal users can
breeze right through, but somebody trying to create thousands of accounts will
find their operation grinding to a halt.
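
For illustration, a hashcash-style sketch of the idea in Python. In practice `solve` would run in the browser as JavaScript; it is written here in the same language only to show both halves. `DIFFICULTY_BITS` and the function names are assumptions.

```python
import hashlib
import secrets
from itertools import count

DIFFICULTY_BITS = 16  # assumption: about 65,000 hashes on average; tune per site


def new_challenge() -> str:
    """Server side: a random string embedded in the signup form."""
    return secrets.token_hex(16)


def _meets_target(challenge: str, nonce: int) -> bool:
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    # The top DIFFICULTY_BITS bits of the hash must all be zero.
    return int.from_bytes(digest, "big") >> (256 - DIFFICULTY_BITS) == 0


def solve(challenge: str) -> int:
    """What the client-side script would do: brute-force a nonce."""
    return next(n for n in count() if _meets_target(challenge, n))


def verify(challenge: str, nonce: int) -> bool:
    """Server side: a single hash, so checking is cheap."""
    return _meets_target(challenge, nonce)
```

The server would also need to confirm that the challenge is one it issued and has not been used before, otherwise a single solution could be replayed for every signup.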

~~~
buboard
coin miners would be an option, however adblockers would block them and
antiviruses consider them malware

------
_Codemonkeyism
Biggest pain with ReCAPTCHA is since I've switched to Brave. Sometimes it's 4
to 5 ReCAPTCHA forms coming up until the system is satisfied I'm a user.

------
MagicPropmaker
I wish Hacker News didn't use it. It's a PITA logging in! I don't see it when
I'm in the U.S. but when I'm overseas...

~~~
shakna
From a French IP that usually gets flagged by everybody as potentially a
robot, I see exactly 0 JS on the HN login form.

Are you sure it is HN that uses ReCAPTCHA?

~~~
ve55
I've seen multiple comments suggesting HN uses ReCAPTCHA, but I have never
encountered it myself, and I even have Javascript disabled and login through
'anonymous' IPs such as tor, so I'm unsure what these users could be doing
that is 'worse' to trigger ReCAPTCHAs.

If most users don't even know that ReCAPTCHA is used, that's a good sign that
it is being used as little as possible, though.

~~~
jimktrains2
I wonder if it's from CloudFlare and not HN directly.

~~~
briandear
It is from Cloudflare. They love reCaptcha. It’s ok to give free user data to
Google if it means less work for them.

------
dearrifling
Although the Badge pattern makes it flexible to add functions with different
badges, this one doesn't have the awkward interface:

[https://godbolt.org/z/25k1DG](https://godbolt.org/z/25k1DG)

It just makes use of the non-transitiveness of friend.

------
tjpnz
I don't tend to have too many issues with ReCAPTCHA except when I'm travelling
out of the country - then it makes my life really hellish. Given everything
Google knows about my travel arrangements you would think they might be able
to connect the dots.

------
donatj
In my experience, the vast majority of spam bots don’t run JavaScript, so
simply setting a hidden input to a specific value on key down and checking for
said value server side has prevented 99%.
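
A sketch of what the server-side half might look like, assuming the page's script writes a per-form value into the hidden input on the first key press. The secret, the form identifier and both function names are placeholders for the example.

```python
import hashlib
import hmac

SERVER_SECRET = b"change-me"  # placeholder; load from configuration in practice


def expected_value(form_id: str) -> str:
    """Value the page's script is expected to write into the hidden input."""
    return hmac.new(SERVER_SECRET, form_id.encode(), hashlib.sha256).hexdigest()[:16]


def passes_js_check(form_id: str, hidden_value: str) -> bool:
    """A client that never ran the script submits the input empty or wrong."""
    return hmac.compare_digest(expected_value(form_id).encode(), hidden_value.encode())
```

The expected value has to be embedded in the page's script, so this filters out clients that do not execute JavaScript rather than keeping anything secret from ones that do.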

~~~
aventrix
Seems to me this would interfere with the accessibility of the website for
those that use screen readers.

------
jmpeax
x = exp(-i)

~~~
mnw21cam
x = cos(1) - i sin(1)

------
techslave
i had to give up after the 10 or 12 anti google paragraphs. after reading
that, this website certainly doesn’t need recaptcha because i’m going away!

there were a couple of feints as if he were about to get to the content, then
ha! back to diatribe.

