
Doesn’t work as soon as you’re big enough to target.

The company I work for makes a SaaS forum product, and while we do have multiple spam prevention methods (Akismet, StopForumSpam, honeypot, a hidden input), there’s enough stuff out there that has targeted our platform that a Recaptcha on the registration form is needed.

We haven’t needed it on any other forms yet though. After registration it’s all handled by the other methods and various moderation tools.
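For readers unfamiliar with the honeypot / hidden-input approach mentioned above, here is a minimal sketch of the server-side check. The field name `website` and the function name are hypothetical, and nothing here reflects how the commenter's product actually implements it: the form includes an input that CSS hides from humans, and any submission where that input comes back filled in is treated as automated.

```python
def looks_like_bot(form: dict[str, str], honeypot_field: str = "website") -> bool:
    """Return True when the hidden honeypot field was submitted non-empty.

    Assumption: `honeypot_field` is an input that real users never see
    (hidden via CSS), so only a script filling in every field populates it.
    A missing or blank value is treated as a normal human submission.
    """
    return bool(form.get(honeypot_field, "").strip())


# Example: a drive-by bot fills every input it finds, a human leaves it blank.
bot_post = {"email": "a@example.com", "website": "http://spam.example"}
human_post = {"email": "b@example.com", "website": ""}
print(looks_like_bot(bot_post), looks_like_bot(human_post))
```

As the rest of the thread points out, this only stops scripts that were not written with your form in mind.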




Please just don't use the bouncing ball that Dropbox made me use once. It was the first time my lack of athleticism prevented me from signing in.


I’ve not heard of this one before. Wouldn’t it cause serious problems for accessibility?


Almost certainly.


What is that? This comment is the first Google result for dropbox bouncing ball.


I tried finding it myself, and I've only seen it once. I was logging into Dropbox via Chrome on my Android phone, and I must have done something to trigger captcha spambot hell. I just tried triggering it again and was not able to do so.

The ball had an animal in it and I was asked to bounce the ball, causing it to rotate. I had to bounce the ball with just enough force to get it to land so the animal was positioned upright. After several failed attempts, I gave up.


Did you try randomizing the 'name' and 'ids' of the inputs? (including the invisible one)


I was preparing a response here, but many of the other commenters have covered it.

I recently spent time ensuring our Auth pages’ HTML could be easily cached outside of our application servers. They were a common target of DDOS attacks because we were generating a unique nonce for CSRF protection.

Randomizing form field names does not defeat a targeted attacker (and we have definitely been a target), prevents HTML caching, and will prevent auto filling fields by browsers and password managers.

Additionally it will be terrible from a usability and accessibility standpoint.

It’s trivial to target a form field by the text/label around it so those would need to be randomized as well.


> Randomizing form field names […] will prevent auto filling fields by browsers and password managers.

I wholly agree that this would not help, but for the sake of completeness, I want to point out that <input autocomplete=""> [0] is designed to solve this, by decoupling input field names from their intent.

But Chrome is playing dumb about it [1]. And of course, the spambots will just adapt to parse the autocomplete info…

[0]: https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes...

[1]: https://www.reddit.com/r/programming/comments/ar1qj1/chromiu...


>Additionally it will be terrible from a usability and accessibility standpoint.

ReCaptcha is by definition terrible from a usability and accessibility standpoint too, and it has all the privacy problems on top.


> and will prevent auto filling fields by browsers and password managers.

I would MUCH prefer the recaptcha over this!


Your browser probably isn’t smart enough to autofill the ‘comment’ part of a guestbook or the ‘body’ of an email.


I really don't know how well that will work against a dedicated attacker.

I am much more confident in ReCAPTCHA of stopping bots compared to any roll your own solution.

I don't want to hope that an alternative is good enough for my needs. I want the best when it comes to protecting my site.

Any alternative needs to have a proven track record and support to make me consider replacing ReCAPTCHA.


> I am much more confident in ReCAPTCHA of stopping bots compared to any roll your own solution.

I'm much more afraid of ReCaptcha blocking bona fide users. It's a harmful obstacle that punishes legitimate users for not sharing as much data as possible with Google.

Even if you really need a captcha, there are better solutions out there.


Google really, really does not like people who use Firefox and/or a VPN.


> I am much more confident in ReCAPTCHA of stopping bots compared to any roll your own solution.

I am as well. We enabled Recaptcha on one site and had spam signups drop by 99%. Unfortunately, regular signups also dropped by 20% because people give up when they hit Recaptcha and don't absolutely, seriously need what it's protecting. To us, joining the arms race against the spammers (which, so far, we've easily won) was much more profitable than turning away legitimate customers.


> I really don't know how well that will work against a dedicated attacker.

>>> You probably don’t need ReCAPTCHA

Probably being the keyword, because you probably aren't a big enough site for a dedicated attacker. Or for a dedicated attacker to be an issue.

And really, let's s/attacker/bot/g. Not every bot is a problem. Not every bot is an attacker, i.e. someone doing something malicious.


For $20 you can solve a few million ReCAPTCHAs using Buster and a paid-for STT engine. Atm Buster works about 95% of the time, so you'd see significant amounts of spam even with ReCAPTCHA.


I'd love to pay $20 for a firefox extension that makes this problem go away. I use lots of privacy extensions on Firefox, and those Captchas are annoying as hell. Tor is even worse.

Can you please provide a few ready-to-use links?


Just install Buster; it's free on the Mozilla Addon Site. You can set it to an STT provider other than Google, which I recommend, since they seem to detect use of their own STT engine now.

You can pay for Azure and other STT engines to solve it for you, and the results are usually a bit better.


When you consider what “the best” means, please include the value of not feeding your users into Google’s gaping maw.


Some have either never cared or stopped caring altogether because of the average user's apathy. Not that I agree with it, but I can see why someone would ignore that con in favor of the pros.


Most of the time when I encounter recaptcha I don't even bother filling it out - it's a huge pain in the ass and apparently I look scary because I always have to jump through way too many hoops before I'm allowed into the crappy walled garden that it's probably protecting. I can't be the only one that feels this way; that is something you should consider when picking a captcha solution as well.


How much does ReCAPTCHA's aggressively targeting non-Chrome and/or privacy-enabled browsers and making completing captchas exceedingly difficult factor in your decision?

Do you want your site "protected" from those users, too?


So you force your users to consent to sharing all of their data with Google? That’ll teach ‘em.


What's an alternative that works at scale, though? It's easy to say "this is bad for these reasons, don't use it" while ignoring that there's not really better options once you get targeted.


I have used a bunch of randomized questions with single word answers (case insensitive and typo tolerant) and hidden fields for years now.

You can use common knowledge or simple ambiguity of language. You can use simple arithmetic, written in properly obfuscated HTML and randomly generated on each page load. You can use a custom question about the content of the article (helps with informed answers).

On a small blog of mine just one question with one answer on the contact form prevented all spam for over 5 years already although it would be trivial to exploit in a targeted attack.

Targeted attacks are rare unless your captcha protects a juicy target that is worth a targeted attack at some point.
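A rough sketch of what a "case insensitive and typo tolerant" answer check could look like. This is an illustration, not the commenter's actual code: the function names, the example question, and the choice of Levenshtein edit distance with a threshold of one typo are all assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def answer_ok(given: str, expected: str, max_typos: int = 1) -> bool:
    """Accept the answer if, ignoring case and surrounding spaces,
    it is within `max_typos` single-character edits of the expected word."""
    return edit_distance(given.strip().lower(), expected.strip().lower()) <= max_typos


# Hypothetical question: "What colour is the sky on a clear day?"
print(answer_ok("  BLUE ", "blue"))  # exact match after normalising
print(answer_ok("blu", "blue"))      # one missing letter is tolerated
print(answer_ok("green", "blue"))    # a different word is rejected
```

For very short answers a threshold of one typo is already generous, so the tolerance may need to scale with answer length.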


Yeah but to be fair he did ask for alternatives in case you are targeted. It happened at work here too, someone with a grudge and a botnet waged a multi-month targeted campaign, and reCAPTCHA was the only thing that helped.

Are there alternatives in situations like this?


To clarify, I do think that this post gives good alternatives because most spam is not targeted. However, you must do something like this if you're a big site or a small site who pissed someone off.


The reasonable thing to do would be to initially create challenges with multiple levels/difficulties so you can quickly change the mechanism when you are really targeted.

For my personal blog I managed to be spam free with a simple question/answer pair for 5 years. Took me a minute to implement and leaves my user data where it belongs.


"all their data" is a bit much, isn't it? ReCAPTCHA gives Google exactly one datum, namely the user's visit to the one page it is on.

And I would even hazard a guess that the TOS specify that Google will not retain/link that information, considering that's how Analytics is run.


I am fairly certain that ReCAPTCHA does many things behind the scenes. It probably is using WebGL and many other browser features to "fingerprint" your browser, OS, graphics card, sound card, etc. This is simple to do: for example, draw some polygons in the background and then read back the frame buffer, because different graphics cards / drivers may produce slightly different output. Then it can store that fingerprint to show you fewer ReCAPTCHA challenges in the future if you successfully pass the first one. This will also link that fingerprint with all other websites which use Google Analytics, and now they have your full browsing history. The TOS may specify they are not _sharing_ that information, but they can do whatever they want internally to fully mine that data.


Well on top of that you train the image recognition algorithms of a tech giant. So for them it is a win-win strategy: user data and free labour


ReCAPTCHA basically looks up your google account and checks your browsing history and if your IP looks "spammy" to determine if you are a bot. The actual challenge is just a data mining operation and isn't meant to actually prove if you are a human because if it has determined you are not a human it won't let you through even if you do 10 challenges correctly.


Theoretically it might use signals when you are logged in, but Recaptcha also works when you are not logged into Google. So, not really.


Just because login isn't required doesn't mean it's not recorded.


No, that's how it used to be. Now with ReCaptcha v3 they recommend you load it on all your pages, not just the forms you are trying to protect, so they can predict friend vs foe more accurately.


Or rather keep Google tracking cookies alive forever and updated.


So how does one block this?


Firefox and uMatrix[0], and then never go to those sites again, because you won't be able to use them anyway. Whether or not you want to contact the owner of the site and tell them what's up is up to you.

[0]: https://addons.mozilla.org/en-US/firefox/addon/umatrix/


It's trivial to detect element visibility, this just doesn't work in bigger sites.


You are right that it doesn't work, but it is not trivial at all to detect visibility. There are millions of ways to hide an element using CSS; for example, a rare one (without using "opacity", "display" or "visibility") is: transform: scale(0.00001);


The only way I see this being useful is if you do this for one or more elements as well as encrypt the name of every input element and also randomize the layout enough that they can't easily use CSS selectors or regular expressions to find the relevant inputs by page location.

I can and have defeated forms that tried to do all of those things very easily in the past.

Keep in mind that if you randomize across a few variations (i.e. 4-5 page layouts), that's easily discerned if you pull the page source down 20-30 times, do a complex diff, scrub out obviously random strings, and check the total unique variations you're seeing.

That may seem like a lot of work, but consider that if you don't do it all at once, but instead roll out small change after small change, the person or people using it are not weighing the cost to do everything required to bypass it compared to finding another open mail form, but only the cost to bypass just the new fix you put in place. Also, they might think it's fun doing so...
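To make that concrete, here is a minimal sketch of the "scrub and count" step, assuming the page sources have already been fetched into a list of strings. The heuristic for an "obviously random" string (any run of 8 or more letters and digits that contains at least one digit) and all names below are assumptions chosen for illustration; a real analysis would use a proper diff and a better heuristic.

```python
import hashlib
import re
from collections import Counter

# Assumed heuristic: a token of 8+ alphanumerics containing a digit is "random".
RANDOM_TOKEN = re.compile(r"\b(?=[A-Za-z]*\d)[A-Za-z0-9]{8,}\b")


def layout_signature(html: str) -> str:
    """Replace random-looking tokens with a placeholder, then hash what is left."""
    scrubbed = RANDOM_TOKEN.sub("RAND", html)
    return hashlib.sha256(scrubbed.encode("utf-8")).hexdigest()


def count_variations(pages: list[str]) -> Counter:
    """Map each distinct scrubbed layout to the number of times it was served."""
    return Counter(layout_signature(page) for page in pages)


# Three fetches of a hypothetical form: two share a layout, one differs.
pages = [
    '<form><input name="x9f3k2m1q7"><input name="email"></form>',
    '<form><input name="b7d1z0p4w8"><input name="email"></form>',
    '<div><form><input name="q1w2e3r4t5"></form></div>',
]
print(len(count_variations(pages)))  # number of distinct layouts seen
```

With only a handful of layouts, the count stops growing after a few dozen fetches, which is the point being made above.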

And on the site dev's side, they can just choose to outsource it to a CAPTCHA (not that there aren't services to easily bypass CAPTCHAs at scale at sub-cent per CAPTCHA rates, see https://anti-captcha.com/).

Note: To forestall any assumptions, I wasn't doing any spamming or helping spamming in any way.


You're not trying to make your site absolutely bot-proof. Someone deliberately targeting your site can figure out any such measures. (You want legitimate users to do so.) You're just trying to throw in enough friction that most common drive-by scripts won't succeed.

It's a "don't have to outrun the bear" situation: make yourself just difficult enough that some easier target gets snagged instead.


> It's a "don't have to outrun the bear" situation

If everyone else is incorporating recaptcha, they're all running faster than you. Even with bypass services, cheap is not the same as free, especially at the scale spam runs at. I imagine a mail form that obviously doesn't incorporate a CAPTCHA is going to garner some attention. It might work for weeks or months if it's not being paid attention to, so that's probably worth them spending a few minutes looking at.


I have used a simple English question with a five letter word as an answer successfully for 8 years on a contact form now. The text isn’t obfuscated, the answer is always the same.

This is as primitive as it gets. I didn’t get a single spam mail in all that time.

The idea is not to outrun your competition, it is to become a special target that would demand special work to successfully get into. Bots are dumb as long as the humans behind them don’t give them a hint how to deal with your site.

And if you’re really that valuable of a target, you can step it up a notch or even switch to google’s data collecting solution.


I mostly agree... I used to work for a classic car website, and in that case, we dealt with a LOT of comment spam, and scams that were out there. A lot of it is actually individual people, doing actual work to get past. We also did see a lot of custom bots, etc. It took a few different approaches and even recaptcha wasn't always the best option, but it did help with most of the non-scam traffic.


> Even with bypass services, cheap is not the same as free, especially at the scale spam runs at.

Spam doesn't scale on a small site. Say you can absolutely fill a small site with spam comments to the point that 99% of comments are spam. Very few people visit the site (it's small after all). Fewer still read the comments. Virtually none of those will click on the (usually obvious) spam links. And still fewer will buy, making you money. If you spend 2 hours customizing your spam script to circumvent anti-spam measures on a small site, you might as well flip burgers at McDonald's, you'll make significantly more money.

Spam works at scale only when you're not customizing. I'm involved with quite a few small to medium and a few larger sites (the largest getting around 4m PI/month) and though we use WP we get virtually no spam because of trivial deviations. We get an immense amount of attempts though. The little we do get is obviously manual spam: in the correct language, with content targeted to the individual page/post content (beyond "very interesting article, I wrote about the same" one-size-fits-all).


The spam I see is trying to add little bits of pagerank all over the place.


What about the wikipedia solution for this: rel="nofollow" ?


The spammers don't care; they spam anyway.


That also scales only if you don't customize. A low-value backlink that will usually be removed in the near future isn't worth an hour or two of a developer's time.


do you eliminate all spam that way? i get signup SEO spam even with recaptcha & stopforumspam.


No it does not. Some creeps through the registration system, which is why we have additional trust measures afterwards.

On my own site I see 1 or 2 spam posts a week although I get the feeling it’s real people doing the registration. They sign up, make 1 comment, get reported very quickly, then banned.

We haven’t had to make our signup/registration system that strong in and of itself though, because most of our largest clients end up using some SSO method exclusively and will have their own prevention methods.



