Ask HN: A Good Alternative for ReCaptcha?
368 points by Ayesh on May 31, 2019 | 199 comments
Are there any good alternatives to reCaptcha? It has come to the point that traditional text/audio captcha challenges are trivial to bypass, and the more we distort them, the harder they become for humans.

Google went Sparta with reCaptcha, and nobody in their right mind should add a script that fingerprints users, especially one from an adtech company.

What solutions do you use to thwart bad bots from submitting your forms and automating things they shouldn't?




For bots that aren't specifically targeting your page, I simply add an invisible form element named "url". Bots _LOVE_ to share their viagra URLs. Any request that submits a URL is discarded.

This trick is stupidly simple and shouldn't work, but somehow the simple spam bots have not improved.

This does not work for sophisticated bots (never met one) or the ones programmed specifically for your site (happens very rarely).
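
For anyone who wants the mechanics spelled out, here's a minimal sketch of the server-side half (assuming a Flask backend; the field and route names are just illustrative):

    from flask import Flask, request

    app = Flask(__name__)

    @app.route("/contact", methods=["POST"])
    def contact():
        # "url" is the honeypot: it exists in the markup but is hidden with CSS,
        # so humans never fill it in, while naive bots fill in every field they see.
        if request.form.get("url", "").strip():
            # Discard quietly; a 204 avoids teaching the bot that it was caught.
            return "", 204
        message = request.form.get("message", "")
        # ... hand `message` to your real processing here ...
        return "Thanks!"

The important part is that the rejection path looks identical to success from the outside.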


Be very careful how you do this, unless you want to exclude blind users. I've seen a blind user have an online form silently fail at them because they filled in a field that wasn't visible. Using display:none applied indirectly via CSS is probably reasonably effective against bots and won't interfere with screen-readers.


Are there tools or guides to help test how websites "appear" to people with disabilities? It is difficult to design something if you don't have an idea of what the outcome will be, but I wouldn't know which software is used by (e.g.) a blind user, let alone how I would use it.


There are lots of options for automated accessibility scans, but there are many common accessibility issues that aren't feasible to test for with automation.

My team at Microsoft recently open sourced a tool called Accessibility Insights (https://accessibilityinsights.io). The web version is a chromium extension that includes both automated scans and also a guided assessment option that leads you through how to test for and fix the stuff that has to be found manually. This is the tool Microsoft pushes its own teams to use as part of their release processes.


Simply try to browse the page without using the mouse, just the Tab key. The next step would be to use a screen reader, such as NVDA (Windows) or Apple VoiceOver (macOS). There are automated testing tools, but they don't cover the whole spectrum of problems. Nevertheless, you can try:

* WAVE (Chrome or FF plugin, https://wave.webaim.org/extension/)

* AXE (https://www.deque.com/axe/)

* AChecker (https://achecker.us/checker/index.php)

* Funkify (Chrome plugin, tries to emulate various disabilities)

* Lighthouse in Chrome Dev Tools also checks some accessibility rules

The full list of things that you need to take care of: https://www.w3.org/TR/WCAG21/ (it's huge I know, it takes 5-7 days to test everything from this list)


Please do the tab test. Some of us do this even if we don't need accessibility tools, as it can be faster (those who create forms which cannot be tabbed through are evil).


Everything everyone else has said. However, if you don't have a budget for this, you can do a few basic things that will greatly improve the usability of your site.

First, can you navigate your entire site without using a mouse (including any widgets, forms, embedded stuff)? You should also have a "Skip to main content" button that is the first element you hit when you tab into your page.

Next, download the NVDA screen reader, which is free, turn off your monitor (or close your eyes), and navigate your site using it. I recommend using FireFox for this.

Finally, use a color contrast analyzer plugin for your browser to ensure you have enough contrast between all of your elements.

From there, you can review the WCAG 2.0 spec to get into the fine details. If you have the budget, hire a consultant/contractor. What I described above doesn't make your site pleasant to use for a disabled person, just usable.


Surprised no one has mentioned this yet, but pa11y was what a large enterprisey company I used to work at tried to adhere to. http://pa11y.org/


You can switch any iPhone to accessibility mode. Those with vision impairments love their iPhone because generally "it just works"


I couldn't find anything like it, which surprises me. It should be in the interest of companies selling screenreaders that the web is accessible with them. Creating a service where you can submit a link and it shows you a textual representation of how the screenreader sees the page would be immensely useful.


uh, there are a lot of services that do automated scans. These obviously aren't completely failsafe, but still give an indicator.

just google relevant keywords https://www.google.com/search?q=accessibility+scan+website

and if you want the full experience: just enable the screen reader and try to use your website with it.

/edit: and I almost forgot: Chrome's built-in Audit tool in the developer panel includes some accessibility tests as well


These scans are in my experience always just that: they read the source and match it against a set of common anti-patterns. I was talking about something that would tell me how common commercial screen readers interpret an arbitrary new construct. If you have a specific reference to something else, please send me a link!

Screen readers often cost significant amounts of money, and are not trivial to "just turn on".


I believe y4mi was referring to screen reading software, not a physical device. Like VoiceOver on macOS.



I'm sorry - did I offend someone? Or did limited mobility cause a click on the fiddly and untitled downvote link instead of following the link :-)


We do this, but we've labeled our field very explicitly with something along the lines of "Please leave this field blank, it is for SPAM control.". We also provide a hidden error message if it is filled in to alert the user that it really should be blank.

We also always place this field after the Submit button, with the idea that a user with a screen reader would never make it that far. Bots still see it and add it to the POST request, since I don't think they care about the order of the form fields.


But then what if you disable CSS or use Lynx? (You should also add text next to it saying not to use it, and hide that text as well as the form field. Then, if the form field is visible for any reason, the user will know not to use it.) (The same applies if you use scripts for it and the visitor has scripts disabled, or is using Lynx or something else that does not implement JavaScript.)


Can confirm, formerly used a spin of this technique and had to stop to better support blind users.


I've used this technique with a label that said "Leave Blank", also hidden with CSS. Seemed to work great, but now I wonder.


I wonder if using aria-hidden on the input field would work or if the bots would also ignore it.


On the face of it this looks like an issue with screen-readers reading the page simplistically.

An element that is not displayed should not be 'displayed' by screen-readers either.


It's been a couple years since I've implemented one of these, but I've used this method in various forms on projects for at least a decade. Sometimes I use a hidden "email" field. Sometimes a hidden "subject" field. Hadn't thought of a "url" field, which is a great idea.

It works surprisingly well.


> This does not work for sophisticated bots (never met one)

The basic prerogative of a sophisticated bot is to ensure you believe that.

Having written a variety of sophisticated bots - some from pre-existing libraries, others from scratch; for specific websites and for general purpose - I'm reasonably confident most people who think they've never seen a sophisticated bot are mistaken.

I agree with you that anything more sophisticated or bespoke than a mass-spam bot is rare. But rare things happen often to most websites with nontrivial traffic. The types of bots with the most funding and skill behind them are the ones which don't try to spam anything on a website at all.


But this is good enough in many cases. It's intended to reduce the volume of input from a form in order to prevent overloading human resources at the next level of filtering. If one or two bots per week get through, it isn't a big deal. Not only are the more sophisticated bots less common, they also tend to not send a bunch of messages.

So if you have a contact form, this simple method may reduce the spam content to a low enough level that the effort to implement some type of third-party service is not necessary. There is not enough incentive for someone to target the form directly.

In addition, the captcha service may actually deter real submissions, whereas this is completely invisible to non-bots.

If you have a form that has a greater incentive for bots to abuse, then you need something more sophisticated.


If you work in the Ruby ecosystem at all, there is a gem called invisible_captcha that does just this. It beat back most of the bot signups we were getting, though some still slip through occasionally.

https://github.com/markets/invisible_captcha


Note that that library is not accessible and is hence illegal to use for most websites in the US. I've raised an issue on the project [1].

[1] https://github.com/markets/invisible_captcha/issues/52


Are most websites run by businesses which employ 15 or more full-time employees?

To be illegal the website must be run by a business which employs 15 or more full-time employees. Or the business is some form of public accommodation like a hotel. From what I have read.

Of course it would be better to make sure the website is accessible, but I'm mostly commenting on the statement that it is illegal.

https://www.businessnewsdaily.com/10900-ada-website-requirem...


Whoa, it is actually illegal to make "not accessible" websites?


In the US, it is if you're an incorporated business serving as a "public accommodation" as defined in the Americans with Disabilities Act, which includes hotels, restaurants, theaters, storefronts. If it is just your personal blog, it doesn't apply. It is the same law that requires storefronts to have wheelchair ramps, but not homes. See Gil v. Winn-Dixie: https://scholar.google.com/scholar_case?case=674450226911160...


More than that; as someone else linked [1], if you're in a business employing at least 15 people (more than half the year), you are also required to have your site be accessible. You don't have to be serving as a public accommodation, for that.

[1] https://www.businessnewsdaily.com/10900-ada-website-requirem...


https://www.section508.gov/manage/laws-and-policies

They even have a compliance tester.


Yeah it's been the only way to make sure people with disabilities aren't left behind.


Ontario, Canada has a similar (maybe a bit more relaxed) law: https://www.ontario.ca/page/how-make-websites-accessible


Awesome! Love a good Ruby solution. Please keep these coming :)


A similar tactic I have is to just require JS to submit a form. Sorry noscript users.

That said, we encounter many sophisticated bots and also a decent number of what I'm pretty sure are real people in low-wage countries pasting data into forms. That last one is tough.


This is simple form of a honeypot but it is really ineffective. Any bot with even minimal sophistication will know to leave the hidden field empty.


I agree that it's possible - potentially trivial - for a bot to figure out if a field is hidden from view, but "really ineffective" seems a bit extreme for something I've seen work well multiple times.

It's not a definitive solution, but it's an easy and practically free first line of defense for a young project, and depending on the project, can stand for years.

Overall, it depends on the sophistication of the bots your project attracts.


You hide it with CSS, not type='hidden'


So now you're filtering out bots and people with disabilities?


Nope, screen readers (usually?) ignore elements with "display: none".

To be sure, you could add aria-hidden="true", which I'd guess most bots don't recognize.

http://alistapart.com/article/now-you-see-me/


Many don't just use display:none but position text off-screen or make it tiny or use sizing and overflow:hidden. I've seen a blind user be tripped up by these, so yes, it often also filters out disabled users.


Many people use tools incorrectly; that doesn't mean you shouldn't use those tools. You just have to be aware of the problem, which everyone in this (sub)thread now is.


A decent screen reader will look at the final rendered DOM and obviously cull items that are off-screen, the same colour as the background, or 1px high.

If it doesn't, it's a bug in the screen reader.


There are quite clear standards on how screen readers are expected to interact with the DOM. What you claim is in none of them, and the techniques you mention are explicitly called out as something page authors should avoid or mark up correctly for screen readers.


No such screen reader exists, so you could claim it's a bug in all of them, but it's not a useful position to take. Screen readers are tricky enough to write without having to second-guess people who are trying to hide-but-not-hide text.


Unfortunately I've not given thought to people with disabilities in this case. It's an oversight and that's on me.

I wonder if bots are smart enough to figure out aria-hidden="true". Possibly empty it out using javascript on submit. I would guess a bot not using CSS would also not be using javascript? Unsure, would need testing.


Could we just tell the people with screen readers to ignore it?


As a fallback I imagine so yeah. I'm trying to think about what would be going on at the time, wondering if unexpected instructions inside a form would be confusing.

Disclaimer: I don't know how a screenreader would present this, example only

"Form entry. Input name. Input email. Ignore this field it's for spambots. Input url. Submit" -- In this case does the message more naturally apply to email or url? I'd imagine there'd be a pause after input email (to wait for the input)? I need to set up a screen reader :)


And then hide the warning using CSS too, so it gets picked up by screen readers just like the hidden input!


Should we put the onus on screen readers to not read elements that would not be displayed when the page is rendered in a browser??

Admittedly, I am not familiar with screen reader standards, but my gut feeling is that they are doing their users a disservice if they are not representing what browser users are seeing as similarly as possible.


You can set placeholder with something like "leave it empty".


It really doesn't matter. display:none or position:absolute;left:-1000px makes little difference. Honey pots only catch a small number of bots.


I just replaced Google's reCaptcha with a honeypot solution using additional name and url fields. But there's also a timer which checks how long it takes you to submit the form. If you can fill out everything in under 2 seconds, you're most likely not human. I still have to figure out how people with disabilities can use the page, though; good to get a reminder here. So far it performs way better than reCaptcha: I haven't received any spam, plenty of ham of course, and no human complaints either.
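
In case it helps anyone, here's roughly what the timer half could look like (a sketch assuming Flask sessions; the 2-second threshold is the one mentioned above, tune to taste):

    import time
    from flask import Flask, request, session

    app = Flask(__name__)
    app.secret_key = "change-me"          # needed for the signed session cookie

    FORM_HTML = """<form method="post" action="/submit">
      <textarea name="message"></textarea>
      <input type="submit" value="Send">
    </form>"""

    @app.route("/form")
    def show_form():
        session["shown_at"] = time.time() # remember when the form was served
        return FORM_HTML

    @app.route("/submit", methods=["POST"])
    def submit():
        elapsed = time.time() - session.get("shown_at", 0)
        if elapsed < 2:                   # nobody real fills a form in under two seconds
            return "", 204                # drop it quietly
        # ... handle request.form["message"], plus the honeypot checks ...
        return "Thanks!"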


A timer is a novel idea. I hadn't seen that suggested before.

Do you anticipate any problems with form auto-filling tools?


I only have anecdata of course but so far I've had a 100% success rate. I imagine this will come down eventually but for the past ~year or so it's been working fine to just hide the honeypot field with CSS


I'm glad it's worked for one of us at least.


I don't know if it helps any but I've been using a field that looks real.

Depending on where it is the name= would be surname (where the form submission has a name field rather than a first name surname split), website, url, etc


You can still figure out it's hidden.


Sure but the point is that almost nobody does.


Another interesting approach is to include an input which is not visible: due to `visibility:hidden`, due to being positioned well off-screen (`left: -10000px`), read-only, set unfocusable by keyboard (`tabindex="-1"`), etc.

Ideally it would have a varying id / name and a varying ARIA attribute for blind users, saying something like "human users, please ignore this".

It would not stop a really sophisticated bot that runs an actual browser and uses machine vision to detect page elements. But unless your site is very high-profile, running such a sophisticated bot to defeat its protections likely won't be profitable.


Honeypots are the best. It will filter out at least 90% of the spam (or so I have experienced everywhere I have implemented them).


W3C has published an extensive list of reCAPTCHA alternatives: https://www.w3.org/TR/turingtest/

W3C is requesting feedback for the document, if you'd like to make suggestions, please open an issue: https://github.com/w3c/apa/issues


That's a very informative document. Privacy Pass caught my eye: https://privacypass.github.io/ It's an extension that is currently only supported by Cloudflare's CAPTCHA, that pretty much stores tokens after you complete a CAPTCHA, and the next time instead of requiring you to complete a CAPTCHA again, it will use those tokens. The point is that it does it in an anonymity preserving way. You can fork their server for a custom implementation.


So now the spammers only need to solve one captcha?


And more broadly -- it's kind of funny, but the more we all just roll our own solutions to this, the less standardized the solutions, and the harder it is for spammers to scale.


These types of "blanket captchas" basically destroy browsing via something like Tor, right? I feel like I've seen people complain about Cloudflare's captchas when browsing anonymously.



Akismet is a third party service that works really well. You send data there with an HTTP POST and it will reply with a yes or no: spam or not spam. It is not that hard to implement. You do have to be aware that you are sending user data to that service, which you have to mention in your privacy policy.

Stop Forum Spam is a similar third party service. You send it an ip address and an email address. It will reply on both items if it is spam, together with a confidence level. Quite interesting way to reply :) It is originally intended to fight registration spam, but you can use it for comment spam or contact forms as well.

JavaScript spam filters can be very useful. Most spambots do an HTTP GET for a page with a form. They fill in all the fields and submit it with an HTTP POST. They don't run any JavaScript on that page. You can have honeypot and timeout fields on a form that get manipulated by JavaScript, and spambots will not validate. Works really well, and it's all transparent to the user. The only "risk" is that in the future spammers might start using more sophisticated spambots, e.g. built on Electron or Chromium. I implemented spam filters like this in a WordPress plugin and it works really well for me: https://wordpress.org/plugins/la-sentinelle-antispam/
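
The Stop Forum Spam lookup is a single GET request; here's a rough sketch of what that check might look like (parameter and field names are as I recall them from their API docs, so double-check before relying on them):

    import requests

    def looks_like_spammer(ip, email):
        resp = requests.get(
            "https://api.stopforumspam.org/api",
            params={"ip": ip, "email": email, "json": ""},
            timeout=5,
        )
        data = resp.json()
        ip_conf = data.get("ip", {}).get("confidence", 0)
        email_conf = data.get("email", {}).get("confidence", 0)
        # They return a confidence score per item; treat anything high as spam.
        return max(ip_conf, email_conf) > 50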


> The only "risk" is that in the future spammers might start using more sophisticated spambots

You’re also making your website unusable for people with Javascript blocked or disabled in their browsers.


In the context of the question, that is not relevant. reCAPTCHA requires JavaScript as well. The question is about an alternative to reCAPTCHA. Both methods use JavaScript.

I do understand where you are coming from though. And I also think this alternative is better in this regard. reCAPTCHA loads JavaScript from a third-party domain. With JavaScript spamfilters you are loading them from the first-party domain.


reCaptcha has a noscript alternative with iframes and checkboxes. Once completed you have to manually copy an authorisation string to a field and submit it.


I don't think the noscript alternative works anymore. I get this message "Please enable JavaScript to get a reCAPTCHA challenge."

https://www.google.com/recaptcha/api2/demo


Huh, last time I hit a cloudflare website on Tor it directed me to https://www.google.com/recaptcha/api/fallback?k=...

Perhaps that's just a cloudflare thing.


That I didn't know. I don't remember ever seeing it, I suppose it is not used much?


I can't think of any reason why JS would be disabled in a browser other than by personal choice? In which case you're making the website unusable for yourself.


It's often disabled by people who use screen readers, or by users who care about their privacy.

If your regular website (not a webapp) doesn't work without JS, then you're failing as a dev.


[flagged]


Please don't do this here.


Per Google, "reCAPTCHA is a free service that protects your website from spam and abuse", but one can argue that reCAPTCHA is a service that transfers the spam problem from the provider to its users. In the end one provider will be free of spam (I guess), but all of its users will be spammed, tricked, fingerprinted and abused into constantly working for free for this third-party anti-spam service.


I suppose you expand the usage of spam to also mean the recaptcha mini-games where you tick all the boxes with traffic lights in them. I agree with the sentiment. I'll add that if you don't use anti-fingerprinting or anti-tracking measures, then recaptcha catches on really quickly that you're a person, and it's not much of a bother in that case. The problem with it is that it's made for the ad-peddling web, not for the private web, or whatever the alternative should be called.


I think it is best to design your own captcha around your use case. All you need to do is make the amount of work for spammers too high for targeting your site.

Just recently, I added the idea of a captcha that might actually be enjoyable for users to my list of "things that should exist":

http://www.gibney.de/things_that_should_exist

The idea is to show the user a random image and ask what is on it. If the image is beautiful, that might even be fun. And there are many sites that offer beautiful public domain images. And have tags for everything in them.

There probably are many other funny and enjoyable captcha ideas one could implement.


Rolling your own CAPTCHA is very, very likely to introduce accessibility issues for your site. It is the accessibility equivalent of "rolling your own cryptography" for security. Among surveys of screen reader users, CAPTCHAs are regularly listed as the single most frustrating part of trying to use web sites via assistive technology.

Even if a CAPTCHA does offer a non visual alternative, it is very common for it to be inaccessible for folks with cognitive disabilities (eg, dyslexia) or motor impairments. Another common issue is assuming that users all speak English fluently. In this example, "beauty" is likely to be sufficiently culture specific to cause localization challenges.

https://www.w3.org/TR/turingtest/ is a good resource for learning about the accessibility implications of many common types of CAPTCHA implementation.


PassThePopcorn has a similar CAPTCHA-like implementation to the Pexels one you mentioned in your article. It has a repository of movie posters (which I think are user submitted to the movies themselves), one of which is chosen and the user is asked what movie the poster is for. I've always thought of that as an "enjoyable" CAPTCHA.


GazelleGames does that with game covers. Lichess has one where you need to find a checkmate in one move. It's usually pretty easy, but could in theory be frustrating for someone just starting with chess.


I've always thought this is a terrible CAPTCHA, not because it's hard but because one could reasonably write a bot that simply reverse image searches the movie poster and picks the closest response.


Exactly.

Plus, with your use case there may be other criteria. For instance, if you have an 'apply now' job application form, you can take in other data such as how long it took someone to fill in the form and where their IP address is.

If you are hiring for a job in London and you are not likely to hire the office manager from Timbuktu who spends less than ten seconds uploading their CV and writing some cover letter then you can make your backend form processing not forward that email on to the HR department.

Putting an encoded timestamp in the form as a form field is easy. On the submit side you can decode it and come to some judgement on the matter.
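
One way to do the encoding so the timestamp can't be forged is to HMAC it with a server-side secret; a sketch of that idea (the secret and the 10-second threshold are placeholders):

    import hashlib
    import hmac
    import time

    SECRET = b"server-side secret"        # placeholder; keep this out of the page

    def make_token():
        ts = str(int(time.time()))
        sig = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
        return f"{ts}.{sig}"              # goes into a hidden form field

    def token_ok(token, min_seconds=10):
        try:
            ts, sig = token.split(".")
        except ValueError:
            return False
        expected = hmac.new(SECRET, ts.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, expected):
            return False                  # field was tampered with
        return time.time() - int(ts) >= min_seconds   # submitted too fast => bot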

There is also the hidden checkbox with 'hideit' set to 1, not sure why that works but it does with a form you have written yourself, i.e. not stock Wordpress.

Although we don't like Google doing their deep-stalking of the visitors, fingerprinting them in re-captcha, there is no harm in collecting a little bit about the user. The user agent, screen size and location is useful in a sales/support perspective. If someone has a posh computer that says something about them. If they are using an old copy of a Microsoft browser then that says something about them.

On a general forum there can be standards of English to enforce. If someone is not using capital letters to start sentences, not using punctuation and not spelling so well then that can be flagged before they hit the 'send' button.

I have done a lot of tidying up of email lists created by bots and what surprises me is how easy it is to spot the fakes. It is like the bad guys in movies and games, doing everything possible to make it easy to get 'em. If spammers did real world robbery they would carry a bag labelled 'swag', be wearing 'Groucho glasses' and a stripey jumper.


I had one of those answer a question boxes on my MoinMoin wiki. It was something like, "Enter an element which combines with oxygen to form CO2". The spammers worked their way round that pretty quickly. There must be humans who are working on helping the bots.


I had a serious problem with bots spamming my forum. I implemented all the usual captchas but none of it worked. What I found interesting was that I was able to defeat the bots simply by "tricking" them. I kept the old forum up but basically set this forum to auto delete the content on it. I then setup a brand new forum and for whatever reason bots don't spam it, at all. It is almost like the bot goes to the original forum, spams it, and then moves on thinking it has completed its mission. I even flat out disabled the captcha to see if anything changed and nothing did. The new forum never got spammed. I have no idea why that happened but it strangely did work. When I do get spam, I don't think that spam is from bots. It is from humans posting instead but that is at least manageable to clean up.

It kinda leads me to conclude that each developer has to create "out of the box" solutions instead of some plug and play solution. If a plug and play solution is developed then all the spam bot creators start figuring out ways to crack or simply create a service for human based cracking. If unconventional methods are used on each site then it gets more complicated for the spammers.


I’m a fan of the Chinese-style captchas where you just move a puzzle piece with a slider. I have no idea how defeatable it is vs reCaptcha but it’s far far less painful.


It seems like it is effective.

We were getting a lot of automated requests, and right when we put the waterwall on our page, it did a good job of picking out those users and not impacting others.

After a few days though, those users were able to start getting through again, but based on the timing between requests, it looks like they might have had to start operating the page manually.


Literally anything that is even a bit unusual works against general-purpose spambots. No need to have big complicated games and puzzles.

But if a bot decides to target you specifically, all of these things are trivial to defeat. So once again they are not useful.


What happens here in the background?

I can imagine at the frontend you have some JavaScript, where an input field gets filled in or something. There has to be some server side checking as well, otherwise a simple HTTP POST would submit fine.

I do like the idea, but if you need JavaScript anyway, why not have some invisible inputs. They work for now.


Yeah, it's very widely used in the crypto world :)


How do these work? I assume they try to do some machine learning or other magic on the sliding action?


I don't know these, do you have an example?


Geetest is one of them, I've seen it on crypto projects generally, binance for example.


A picture with a missing piece as a hole is presented to the user. The missing piece floats in front of the picture and can be dragged with the mouse (or via touch). A slider lets the missing piece be dragged left or right, to make it easier than free 2D movement. The user has to drag the missing piece on top of the hole in the picture to prove they're human. The location of the missing piece is randomized for each interaction.


QQ mail have this captcha: https://mail.qq.com/ This is how it looks: https://i.imgur.com/9Am4PWu.png


The best solution I've ever come to that didn't negatively impact my clients was generating a UUID on the server via an AJAX call 100ms after page load. That UUID was stored in a cookie, returned via AJAX, and stuck in a hidden field on the form.

Server checks cookie != null and cookie == hidden field, and returned a 200 OK regardless of if it failed (used the response text for success or failure indication), and deleted the cookie.

Implemented it across a network of sites ~10 years ago, and only a handful of spam had gotten through when I quit that job 4 years ago. They had been getting 10-20 spam comments per day per site.
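
For anyone curious, a rough Flask-flavoured sketch of the server side (endpoint, cookie and field names are made up; the client-side JS just fetches /token shortly after load and copies the response into a hidden field):

    import uuid
    from flask import Flask, request, make_response

    app = Flask(__name__)

    @app.route("/token")                       # called by JS ~100ms after page load
    def token():
        value = str(uuid.uuid4())
        resp = make_response(value)            # JS copies this into a hidden field
        resp.set_cookie("form_token", value)   # the same value rides along in a cookie
        return resp

    @app.route("/submit", methods=["POST"])
    def submit():
        cookie = request.cookies.get("form_token")
        field = request.form.get("form_token")
        genuine = bool(cookie) and cookie == field
        if genuine:
            pass                               # ... process request.form here ...
        # Return 200 either way and signal success/failure in the body,
        # then invalidate the token so it can't be replayed.
        resp = make_response("ok" if genuine else "ignored")
        resp.delete_cookie("form_token")
        return resp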


That will work for low end drive by stuff. Anyone motivated will have a better bot. I was running one 10-12 years ago that was essentially a headless browser. It had a JavaScript runtime and a custom DOM. It could run jquery, prototype, ajax and just about everything else that was popular at the time. I even had a custom flash runtime in there for the jackass sites with the nav in flash.

These days you could just throw together a selenium script and call it a day.

This kind of stuff is fine for stopping comment spam because there are so many other opportunities out there that the spammers move on to an easier target. If you need to protect against a targeted attack then it’s a lot more difficult.



Does any kind of captcha stop a targeted attack? I don't think that's what captchas are for.


Maybe. Maybe not. They are pretty darn difficult. I’ve seen a few that are fairly straightforward to break. For the good ones, you’d need some quality CV tech and if what you have is that good then you’re probably better off using it for something other than breaking captchas in order to post comment spam for penis pills or whatever they are peddling these days.


You can pay teenagers in the Philippines a few dollars a day to solve any kind of captcha for you. Maybe I have a different concept of a targeted attack, but that seems certainly in the realm of what a criminal enterprise or nation state would expend on a high value target.


Some webpages seem to use captchas to stop (targeted) DDoS attacks; having an easily defeated (but unique) captcha might not help in those scenarios.


So this solution would work because most automated spam would not make AJAX calls?

Something like Selenium posing as a real user would bypass this kind of protection, wouldn't it?


It works because most bots do not keep running the javascript after pageload like a real user's browser would.


I was thinking to roll my own cookie solution as well, but this makes the form submissions impossible on multiple tabs.


> It has come to a point that traditional text/sound captcha challenges are trivial to bypass today

I have yet to see a general-purpose tool to which you can throw any text captcha and it’ll solve it.

There are academic papers that have demonstrated it, but there's still a huge barrier to entry in implementing such a solution (which spammers won't bother with as long as it's easier to move on to another target).

There are paid captcha-solving services out there and even those are still powered by humans even though it’s in their commercial interests to automate the process. Them not doing so further suggests that AI is not there yet.



Does it actually work? Speech recognition is often terrible enough in normal conditions, so I don't expect it working well on audio captchas which are often designed to counteract speech recognition.


Buster works great! I use it all the time.


It might be worth considering a honeypot approach. E.g. having a field in the form that isn't visible for users that, when filled in, indicates that it is likely a SPAM submission.

https://www.projecthoneypot.org/


I do this, but I also add a checkbox that is checked by default and JS unchecks it. Not great for NoScript users, but pretty good against any random bot that is using curl or some other scripting language and doesn't check your trap box.


You can make that checkbox visible by default and put label "I'm not a bot" and uncheck + hide it using JS, that way noscript users will still see the checkbox and uncheck it manually.


Bots might be smart enough to uncheck that from the label; maybe text near the top of the form that says “please uncheck the checkbox near the submit button”?


You could use CSS to place label and checkbox visually close but completely unrelated in the DOM. Not great for accessibility, but better than the current situation.


Unfortunately, there aren't many good captcha systems that don't do the equivalent of what ReCaptcha does, because we're at a point where fingerprinting users is a strong signal to help identify contractors doing captchas on behalf of bots.

Even some Silicon Valley products use captcha-breaker services. These services present themselves as sophisticated APIs, but in reality they're just dispatching work to humans who accept pennies an hour at internet cafes; a competitor to Amazon's Mechanical Turk for digital sweatshops. They're common and cheap, and the tech industry feeds them. Undercutting the workforce doing the captcha busting is the only viable way to stop that.

Your real alternative is to do the fingerprinting yourself.


There was a good podcast about this [0] just a couple weeks ago. They interviewed the guy who invented CAPTCHA as well as the head engineer on ReCaptcha v3.

The gist of it was that in a few years, all Captchas will be useless because machine learning is too easy and cheap. The only way to defeat spam will be to use reCaptcha v3 or something like it, because those services will use what they know about you to determine if you're a bot or not, plus their own machine learning of what "normal" behavior is for your website. It sounds like ReCaptcha v3 is basically an app level IDS.

[0] https://www.npr.org/sections/money/2019/04/24/716854013/epis...


> The only way to defeat spam

No, no, no. There are many ways to combat spam, and there is no silver bullet. "Determining whether you're a bot or not" is just one tool among many, and one that lets human spammers through, or gets too intrusive and starts blocking humans.

IMO the best approach still is to focus on the content they post rather than trying to figure out who/what they are.


Google wants you to think reCaptcha is the ONLY tool. That way they can get more user data.


> those services will use what they know about you

This is inherently user-hostile, as it presupposes tracking and identification. I don't want them to know anything about me!


Which is kind of the trick - there may be a point not too far in the future where it is nearly impossible to tell the difference between hostile bots and users who are just really into privacy and not being tracked.


I'm fairly certain that Google doesn't care.


It's not about machine learning, it's about a spray and pray business model.

If your goal is to generate, say, 5,000 spammy backlinks you're going to have the choice of building smarter and smarter bots to bypass CAPTCHAs and filters, or just tossing the same dumb bots at a wider pool of target sites. The latter is always cheaper, if you're focused around your basic blog-spam sort of scenario.

I could see it differently if you had a specific high-value service that was worth bot writers targeting -- think of registering email accounts en masse, or an ecommerce site getting thousands of test charges an hour on stolen cards. But even then it's still just a matter of being "faster than the other guy the lion is chasing" -- you just need to be inconvenient enough that the malicious user finds a more accommodating service. That needs little in the way of an AI arms race; it can be something as simple as rate limiting.


In my personal blog I am using "Riddler" Drupal module, and have had good experience: https://www.drupal.org/project/riddler

You can create your own Captcha questions / answers. I feel like this is the preferred way of handling spam posts, creating your own custom Captcha implementation.


This is what I do on my own sites too.

Of course, given some bots relay the captchas to humans, this arguably works best for a site with a specific niche in mind, since they can ask a significantly more difficult question that only someone interested in the topic would know.

Like say, how a PhD maths related community or blog might ask a university level maths question, a chess forum may ask a chess related one and one about a certain gaming franchise might ask questions about said franchise. Bonus points for keeping out people with no genuine interest in the topic.



Eventually your questions will be answered by humans and added to a database.


If it takes a while, it wouldn't be too painful to update to a new question periodically, i suppose.


Might be good idea to extend the plug-in so adding questions is as easy as sending an SMS, then you can spend less than a minute daily to add a question/answer combo:

    Color of the sky at night? Black


#000000


I have a mail server with a new address generated per post (or per comment, for thread functionality) on a blog I run. People then get to mail their comments. For all reputable mail sites I let things through directly; for everything else I use a spam filter turned up to 11, together with a mail-back link for post verification.

I have had zero spam the last 8 years.

The code is ancient and runs on an even older version of LispWorks with auth details hard-coded all over the place, so the time it would take for me to share it would be longer than to rewrite it in some hip language.

Had I been lazy and not as privacy conscious I would have let Gmail do the spam filtering for me.


Have you considered open-sourcing that and posting it on HN? I'd use it.


I have been asked to many times, but it is part of a largish website written in Common Lisp. The codebase is written by me over 3 years with no consideration for modularity. It would be a considerable effort to break it out.

There is nothing technically novel about it. Heck, Python even includes an SMTP server in the stdlib. You could probably write a PoC in a couple of hours.


> so I the time it would take for me to share it would be longer than to rewrite it in some hip language.

I'm reasonably sure the answer to your questions is yes.


Depends on why you need it.

Captchas work well for telling humans from bots for the purpose of denying automated/scripted access. But here a simple IP-based blacklist works well, because of how many bots now live on Amazon's properties and some such.

You don't need a captcha to filter out bot spam. That's a massive overkill.

stopforumspam.com works well. You can combine it with a simple keyword based filter, have it tag hits with a cookie, temporarily blacklist the IP and then filter them out based on that as well. Auto-submit it to stopforumspam too. Obviously, also have whitelisting in place, e.g. to let through existing customers, previously cleared posters, etc.

For bonus points, first-time posts that look OK may be put into a "shadow ban"-ish mode, whereby they are visible to the posters and mods, but not anyone else. Until they are cleared. This works equally well.

The bottom line is there's no spam that doesn't try to promote something and they aren't likely to target just you, so there's always a keyword/URL you can latch onto, and it also makes sense to participate in a distributed monitoring framework to piggy-back on each other's first hits.


"there's no spam that doesn't try to promote something" is, unfortunately, not true. Or at least, some spam plays the long con. On sites where karma or a social graph confers advantage, bots will harvest one or both through low-effort, high-payoff posts. Various disinformation and distraction campaigns sell only confusion, discord, or volume of content. And if it's the graph itself or specific connections that matter, possibly for phishing, recruitment, or other compromise, you'll see other behaviours.

Not all media manipulation is commercial. Not by a long shot.

You're fighting the last war, if not older.


As a user I found geetest [1] to be really friendly and much easier to use than recaptcha. I have never integrated it myself.

[1] https://www.geetest.com/


Simplest way is to use filtering.

``` (defparameter spam-words '("viagra" "cialis" "v1agra" "c1alis" "tamadol" "hydrocodome" "doxycyline" "prozac" "prozca" "prizac" "doxycyclins" "anx8ety" "amytriptylone" "poker" "laxative" "anatrim" "breast" "penis" "fiorinal" "sexy" "kaspersky" "hoodia" "thyroid" "coupon.com" "vuitton" "coupon" "fetish" "famotidine" "footwear" "sweetwater" "sunglasses" "ninja" "www" "http" "cheap3ddigitalcameras.com" "aquadivingaccessories.com" "tastyarabicacoffee.com" "yourmail@gmail.com" "bit.ly" "cottonsleepingbags.com" "italiancarairbags.com" "newpopularwatches.com" "glasslightbulbs.com" "browndecorationlights.com" "fx-brokers.review" "ceramicsouvenirs.com" "xevil" "senuke" "captcha" "xrumer" "vkontakte" "апрап" "erectile" "spellingscan" "lialda" "lamborghini" "doubles your bitcoin" "pro-expert.online" "specified wallet" "selected wallet" "online casino" "multimillionaire" "win-win lottery" "lottery" "Перезвоните пожалуйста" "yuguhun88@hotmail.com" "meeting-club.online" "from2325214cv" "did you receive my offer" "Domain zone .de" "all your photos" "Pay 1 BTC" "to our bitcoin wallet" "you will be sued" "police will be interested" "hacked")) ```


If HN had this filter, you wouldn't be able to submit this comment.


I had a strange idea about solving this problem: how about a micro-payment, something like $0.01, instead of solving a puzzle? In that case maybe you won't care if many bots log in to your website.

I think that by this time I have the technology to make something like this work; I was wondering if this is a good solution though. What do you think?


Having to make a payment to join a website is maximum friction and makes your website seem dodgy. It seems more likely they are trying to steal cc details by asking for such a small payment.


> How about a micro-payment, something like $0.01, instead of solving a puzzle?

This approach may turn legitimate people away, namely:

1. People from regions where it is uncommon to have means to interface with payment processors.

2. Minors who, for one reason or another, are not able to obtain a debit card/credit card. Similarly, PayPal refuses minors.


You could use JavaScript cryptocurrency mining instead. User clicks a button to activate miner script, it runs in their browser for 10-30 seconds or whatever, then reports back to your server that they are good to go.


You’ll lose people who are blocking this kind of shit. Also you might end up on a Firefox blacklist.


Don't know how the blacklist works. As someone who blocks javascript by default, if I were asked to enable this to submit a login, I would probably be okay doing so if I could be confident a script was only mining and not fingerprinting me. (Big if)


There are torrent sites that do this already (not as spam prevention, just to generate money). It's pretty annoying but I guess for spam prevention you could make the length of time shorter.


I'm against automatically mining on people's computers, but I think it's interesting as an opt-in: click this button to run the miner for a fixed amount of time in return for {posting a comment, attempting a login, etc}.

Since I'm also generally against javascript, ideally the code would open-source and it could be verified that it doesn't do anything malicious...


It also seems like a nice idea to me ¹. It could also be a potential solution to ads and such. Pay a few cents and get a cookie allowing you ad-free browsing on a given website. You could also imagine a half-centralized system where "website collectives" get together and the fee you pay allows you access to any of the websites in the collective…

Couldn't gnu taler allow this kind of stuff?

¹ https://news.ycombinator.com/item?id=13829545


Payment processing is nontrivial to set up, and doesn't make sense in a lot of contexts.


What if we had a mechanism that makes this kind of setup very trivial, and also risk free for the visitor? Then would you consider this kind of solution? EDIT: Same question for tty2300 (:


We already do with cryptocurrency. Many people are incredibly hostile to using their credit card anywhere that isn't a physical location.

My parents don't even trust Amazon. They use Visa prepaid cards that they fill up at CVS whenever they have to.

Arguably they're doing the right thing and we're all doing the wrong thing. Either way, micropayments are one of those "ideal world" scenarios that are unlikely to transpire anytime soon.


What would prevent the bots from using the same system?

The minor cost may be offset by the fact that getting a post through to a site that has almost no spam because of its filter is valuable in itself. If the bot is sophisticated enough that the stuff it posts is hard to distinguish from a human shill or shitposter it may even prefer sites protected with such a system.

Useful for the vintage VIAGRA HERE link dumpers perhaps though, but those can be filtered out with a content filter.


Google will never allow this to happen (as they benefit from the web using ads for monetization).


You have no idea how often I've shouted at my monitor that I would rather pay the site hoster a one-time fee than use google captcha ever again, and I really wish more sites offered this.

Say what you want about the site, but 4chan's pass method of payment, which just removes all the captchas when posting and lowers the post timer, was a fantastic idea, and it just works.


I think that there is a real opportunity here. I just went over all the comments about this idea. It seems like most of the objections relate to issues with the payment medium itself, and not with the actual idea of paying instead of solving a puzzle.

Some examples:

> It seems more likely they are trying to steal cc details by asking for such a small payment.

This relates to being afraid using a credit card online against a non credible seller.

> Payment processing is nontrivial to set up

This one is about the technical difficulty (And possibly also regulation) for setting up payment processing

> regions where it is uncommon to have means to interface with payment processors.

This one is about inaccessibility to payment means, which also relates to a problem with the payment medium.

> You could use JavaScript cryptocurrency mining instead

I assume that bo1024 suggested this because he implicitly believes that setting up a payment processor will be more difficult, and give bad experience to the user.

The only objection I have seen which is more specific to actually recognizing humans is this one (by Freak_NL)

> What would prevent the bots from using the same system?

First, I think that it should make operating bots on a wide scale much more expensive. Second, if you make money out of bots coming into your website, would you really care that they are bots? I assume that the reason someone would block bots from the first place is that bots harm their money making business. If you had a system to collect this money back, shouldn't this considered as a problem solved?

I am working these days on a decentralized payment system that supports micro-payments. (I promise there are no blockchains inside). Solving the captcha thing can be a very useful use case. I will be happy if anyone wants to join forces on this one (My email is real (at) freedomlayer (dot) org).


I, and I think most people, would be happy to pay a small amount to each site we visit, but a payment system that makes the idea practical just doesn't exist. I for sure am not going to hand out my CC number everywhere; it's inconvenient, it's intrusive (full legal name, address, phone number), and I have no idea how securely and for how long they store that information.

I would really love to see a way to make small payments to sites that I visit simply and all managed in one place.


Whatever you use, please remember not everyone has good vision / hearing / dextrous mouse control. Captchas can be a nightmare for accessibility. Most of the 'clever' solutions to this will completely block some subset of keyboard users / blind users / eye gaze users along with the bots.


It's really frustrating talking to client side developers these days about 508 compliance. It feels like only one in 10 understand the concept of accessibility.


For automation I would recommend rate-limiting endpoints. I personally tend to use 5 requests per IP per second along with 100 requests per minute as the default, and then override specific endpoints to e.g. 1 request per IP per hour.

For user input I recommend keeping the first comment submitted by a new account/IP hidden until you/moderators have approved it, after which new comments from that user no longer needs to be approved before they become visible to other users.
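
A toy in-memory version of those limits, just to show the shape of it (a real deployment would do this in the web server or something like Redis; the numbers are the defaults mentioned above):

    import time
    from collections import defaultdict, deque

    # (window length in seconds, max requests per window)
    LIMITS = [(1, 5), (60, 100)]
    hits = defaultdict(deque)

    def allow(ip):
        now = time.time()
        q = hits[ip]
        q.append(now)
        # Forget requests older than the largest window so memory stays bounded.
        while q and now - q[0] > max(w for w, _ in LIMITS):
            q.popleft()
        for window, limit in LIMITS:
            if sum(1 for t in q if now - t <= window) > limit:
                return False              # over the limit; reject with e.g. HTTP 429
        return True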


If it's a problem with spammy blog comments I would recommend to just remove any kind of input on the site and ask people to send you an email with their questions and concerns.

Be sure to use a separate email and give it to readers on your about page via some language like "questions (dash) and (dash) comments (at) (this domain)".

If it's for account signups just send an email confirmation link and possibly include a code in the email that has to be submitted manually as well.


For those who are interested in an alternative CAPTCHA service, we at NetToolKit are putting the finishing touches on a service that we hope to launch at the end of next month (June). The CAPTCHAs are interactive and meant to be fun for the user -- no machine learning training involved. We'd be thrilled to get some early feedback before launch, so if anyone is interested, please reach out via email or via our website (both in profile).


What about PoW (https://en.wikipedia.org/wiki/Proof-of-work_system)?

It requires minimal user interaction, and you will eliminate most spam bots, since spamming loses its cost-effectiveness. You can implement something like Coinhive's proof-of-work, without having to actually mine Monero.
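
The hashcash-style version of this is only a few lines; a sketch (the difficulty and hash choice are arbitrary, and the solver would really run as JavaScript in the visitor's browser):

    import hashlib
    import itertools
    import os

    DIFFICULTY = 4                        # leading hex zeros required (~16^4 hashes on average)

    def make_challenge():
        return os.urandom(8).hex()        # sent to the browser along with the form

    def solve(challenge):
        # What the client-side script would do before submitting the form.
        for nonce in itertools.count():
            digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
            if digest.startswith("0" * DIFFICULTY):
                return nonce

    def verify(challenge, nonce):
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        return digest.startswith("0" * DIFFICULTY)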


Looking at these comments (141 at the time of this post) the answer looks to be: No.

I have small business clients, and Google's reCAPTCHA is our best option. They aren't willing to pay for some obscure and expensive one-off solution that might work. They just want the spam to stop. I fill out reCAPTCHAs every god damned day because I work on the web. Asking "normal" users to fill out a handful each year isn't asking that much.

Maybe for your startup "rolling your own" makes sense, but not for small biz.


There are definitely alternatives. For example, https://hcaptcha.com/ (I have not used or evaluated them).

If any of your small business clients might be interested in our new CAPTCHA service that should launch late next month, please let me know (see profile for contact information). Our pricing is projected to be $10 for 100,000 requests.


I think reCaptcha is terrible. For HTML forms, a simple question could be used (change them sufficiently often when spam is received), or you may require the user to edit the URL manually in order to access something, based perhaps on the client IP address (which would be displayed). I also invented a protocol-independent CAPTCHA, which is also text-based and uses SASL. You should allow users to implement the code themselves if they want to, rather than requiring that they use your code.


I also recommend mailing website owners who use reCaptcha about why it's a nuisance and stating that you and many others won't be using the site anytime soon.


Doesn't work, they don't care. I mailed and tweeted at many, among which were Dropbox/HumbleBundle/Packt etc. Most just ignored me or came back with canned responses like "we value security blah blah improves security blah blah your data..." :(


Unless your website is under targeted attack just putting "2+3" on a image will block 99.9% of all bots. You hardly even have to distort the image or randomize the math but doing so could help against script kiddies. Only drawback vs reCAPTCHA is you have to show the captcha all the time instead of automatically suspecting bots.

If you are under targeted attack by someone more dedicated, captcha is not going to be the only defense in your book.
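
For what it's worth, generating that kind of image is only a few lines with Pillow (a sketch; you'd stash the expected answer in the session and compare it against the user's input on submit):

    import random
    from PIL import Image, ImageDraw

    def make_math_captcha(path="captcha.png"):
        a, b = random.randint(1, 9), random.randint(1, 9)
        img = Image.new("RGB", (120, 40), "white")
        ImageDraw.Draw(img).text((10, 12), f"{a} + {b} = ?", fill="black")
        img.save(path)
        return a + b                      # the answer to check the submission against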


> Unless your website is under targeted attack just putting "2+3" on a image will block 99.9% of all bots.

Absolutely.

I did exactly that with a PHP script which generated images with the GD library.

It definitively worked for me.


For the use case of blocking general web form spam, we've had good results with relying solely on IP reputation crowdsourced via AbuseIPDB:

https://www.abuseipdb.com/about

Occasionally we're an early target of a fresh IP, but we report it back to the database to help later victims. The more people contribute to such a system, the better it gets.


Many IPs are shared by more than one endpoint, and hundreds or thousands of endpoints sometimes share a single IP. Say a home router is compromised on an ISP using CGNAT and you block by IP, you could potentially be blocking an entire neighborhood of innocent users.

Having parts of the Internet blocked for huge numbers of their customers puts pressure on ISPs to monitor and censor users traffic, which is not the direction I want to see the Internet go.

IPs do not have a one-to-one relationship with users, and I feel strongly that they shouldn't be treated as if they do.


Thank you, that's well-put. I also share that value.

Another value that I hold is that the cost of bad actors shouldn't be externalized to innocent people, i.e. that it's unfair for me to pay the cost of completing a captcha just because somebody somewhere else is misbehaving.

I'm not sure how to reconcile those.


> nobody in their right mind should add a script that fingerprints users

Fingerprinting users is no more a problem than using cookies, there are far more legitimate reasons to use these things than illegitimate. The problem is Google and Facebook using these techniques to spy on people at massive scale.

Once again the problem is Google and Facebook not the internet.


The problem with fingerprinting is that it's used to track users across sites. Cloudflare's "super cookies" and ordinary ad-network cookies are both examples of fingerprinting which use cookies and could definitely be argued to be bad for user privacy.

A text-based challenge-response captcha doesn't fingerprint you in any way. Google's reCaptcha does -- not to mention that it uses you for free labour that I would argue should be a violation of minimum wage laws in most countries (Google hires people to do data entry for ML, so why am I being forced to do the same work for free in order to post a comment on a forum or log into a website).


A commenter on HN some years ago claimed a 100% success rate at blocking spam by requiring all web form submissions to be cryptographically signed. This solution struck me as stunningly elegant both by raising the standard for constructive feedback and promoting public awareness of secure communication.


Cryptographically signed where and how? Do you mean to say that you get a full string like "a=1&b=something&c=[1,2,3]" and hash/encrypt that? Or do you encrypt each individual field? Or something else?



I like those math-question captchas. Fun, and I doubt a bot or even a real person spamming will waste time on this. Make the question appear in an image instead of text and the bot will also have to do OCR on top of being Wolfram Alpha to defeat your captcha.


TBH, I had decent results with the most trivial "3+4=" style CAPTCHAs. I know some sites where there's even only a single, hard-coded question/answer.

In the world where it's not about scaling to infinity, you're not getting targeted attacks, you're getting stupid bots that pummel every site thinking it's a WordPress install from 2007 and anything different is enough to scuttle it.

By the time your service is popular enough to justify building a focused bot or tossing low-wage workers at it, hopefully you have the revenue in to finance something more sophisticated.


> nobody in their right mind should add a script that fingerprints users

I helped vendor-select and lead implementation on a fraud solution that was an integration with SiftScience (yc-funded, https://sift.com/), which relies on fingerprinting. This was years ago but I still think about the project and how it plays with user privacy etc. I will say that -- fingerprinting as a component in fraud management is/can be highly effective.

The problem is, once you get into payments fraud through bots, I think the conversation becomes way more nuanced. If you're looking for a solution to bots spamming or throwing bad data into your app, maybe that's a little extreme. But if the choice is between privacy and becoming a front for credit card fraud and chargebacks, you're choosing who the victims of your service are going to be, and how much ill is done.


Have seen these guys, met the founders a while back at AppSecUSA: https://funcaptcha.com/ It’s those puzzles/games as captchas.


> Google went Sparta with reCaptcha, and nobody in their right mind should add a script that fingerprints users, especially one from an adtech company

Elaborate?


Recaptcha is also blocked in China. Users there won't be able to bypass it at all to accomplish a protected task.

Does anyone know of a good alternative that works there?


Yup, this is a huge problem on my company's website. We really like reCaptcha but, alas, have to fall back to our home-rolled version because reCaptcha completely fails for our Chinese users.


"went Sparta"???


recaptcha v3 is invisible.


Just ask a math question?


A bit late, but honorable mention: https://xkcd.com/233/


Just out of curiosity, isn't it feasible today to implement some machine learning to stop spammers? Is there any project trying to come at it from this angle?


That's like saying "why don't you use algorithms and code". Like, sure, but what is it you're proposing? What features would you learn from and match against?

(For those unfamiliar with algorithms and code as a solution, it's a reference to this: https://www.reddit.com/r/ProgrammerHumor/comments/5ylndv/so_... )


Actually, we have implemented something like that for HTTP requests. Features would be: IP (first 3 octets are probably enough), posting time, length, time to solve the captcha, time between clicks, country where the IP is located, whether the post contains certain words (can be learnt from spam posts), whether the post contains a link (y/n).

I think I would start with these, probably looking into what other people are doing.
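
As a toy example of turning those signals into a classifier (the feature names are invented and any simple model would do as a starting point):

    from sklearn.linear_model import LogisticRegression

    def features(post):
        # `post` is a dict holding the raw signals listed above (names are made up)
        return [
            post["seconds_to_submit"],
            post["body_length"],
            int(post["contains_link"]),
            post["spam_word_hits"],          # matches against a learnt word list
            post["posts_from_same_ip_24h"],
        ]

    def train(past_posts, labels):           # labels: 1 = spam, 0 = ham
        clf = LogisticRegression(max_iter=1000)
        clf.fit([features(p) for p in past_posts], labels)
        return clf

    # clf.predict_proba([features(new_post)])[0][1] gives a spam probability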


An ideal machine learning implementation would also need the context, such as the original post itself, parent comment(s), other comments in the thread, etc.

It can be more difficult than one might think. For example, now that we are talking about spam, the word "Viagra" shouldn't block my comment, even though my parent post doesn't mention the word, or in a situation where nobody else mentioned it.


Sure. I know a developer at Distil Networks[1] and that's exactly their core business idea. Apparently it's very hard, though, because they have positioned themselves as an alternative for very high-value targets (like airline tickets) and charge accordingly for their services. It also seems to require some amount of human intervention still.

[1]: https://www.distilnetworks.com/


Good to know, I guess it boils down what sort of features you can use, which might be hard to find. Based on my experience feature hunting can be a lengthy process.


Akismet has been doing this for some time I believe. The downside is that you have to send the form submission and other meta data (IP address, etc) to them. This can be better than the current reCaptcha v3 though.


Good to know! I am going to check this out.


How about put your users first and don’t farm them out to Google ML training because someone told you to. Recaptcha is a cancer on the web.


Isn't that what they're doing, hence this question?



