Captchas are Becoming Ridiculous (andrewmunsell.com)
330 points by JoshTheGeek on June 25, 2014 | 194 comments

The squished-up word is the control word, and the straight one is the unknown one. You only need to get the wavy word right and just guess at all the cut-off examples.

^ This is the way to go. Don't waste any energy on the "unknown" word, just fill out the one that has been smeared out and fill in bogus for the rest.

If you want to post on 4chan and don't have a Pass, you need to solve a captcha for every single post. It becomes easier with practice, I fail maybe 1 in 10 captchas. And the more captchas you solve correctly, the easier the captchas for your IP get.

If enough people colluded to use the same unknown word, say "foobar", then couldn't they train recaptcha to believe that is the true value of the word? If I understand recaptcha correctly, and assuming they don't detect collusion well, then eventually the known word pool would get poisoned with a surplus of foobars.

Some 4channers tried that already. You can guess which 6-letter epithet for black people they tried to use.

Ladies and gentlemen, I give you: "the internet".

>You can guess which 6-letter epithet for black people they tried to use.


Hint: It starts with "N" and ends in "igger"


did it have any letters in between?

An engineer said they have defenses against this

>Checking for offensive words is only one of the filters we have. Even without that filter, it’s essentially impossible to get recaptcha to return a false result.


> it’s essentially impossible to get recaptcha to return a false result.

Famous last words :P

This has happened in Google Image Labeler. Now offline, it was a game where two random people were matched and needed to find new tags for the same image. The scoring was cooperative, so points were given when you both chose the same word in the same game. /b/ of course quickly organised the jesusporn metagame: label the first image as jesus, the second as porn, the third as jesus, etc. This resulted in the highest scores for some time.

ReCaptcha does do some collusion detection, and I assume they do it well, since it has not publicly been stated to be a problem area.

The first one.

Mankind will probably hate me, but I'm the kind of person who answers the control word correctly and writes an incorrect, but similar, word for the unknown one.

This is my way to protest against recaptcha.

That one doesn't even have any kanji, come on.

rnmnthwu cntnfru

You don't need to enter anything at all for the second (non-control) word

I've solved a lot of captchas in my time, and really have never experienced the trouble the author is detailing. Not only am I relieved when I see a reCaptcha since they are some of the easiest and most forgiving challenges, but I don't recall ever having repeated bad/unsolvable challenges presented on the same page.

Sure, maybe sometimes you get a weird one and fail it. But typically the next challenge is easy to pass. Seems the author cherry-picked some of the worst reCaptcha examples for the article, but wrote it in a way that made it seem they were presented back-to-back.

Besides this -- the article makes no attempt to offer a better solution.

Captchas are really the best way we have right now to "prove" someone is not a bot. Hidden form fields, etc., don't work and are easily spoofed. Sure, captchas can be beaten by bots sometimes -- but I trust Google's scale/volume with ReCaptcha to handle that for me (for the most part).

Captchas are not going anywhere anytime soon.

>Besides this -- the article makes no attempt to offer a better solution.

That's completely irrelevant. Criticism is not about solving the problem. It's about pointing out that the current solution is inadequate.

Most movie critics never wrote, directed, or acted in movies. It doesn't invalidate their criticism.

In fact, your criticism of the other poster's criticism doesn't offer a better solution than criticism either. You simply criticize that post. (And that's OK, if ironic.)


I hate with a passion the attitude of "Don't bring me problems. Bring me solutions". Sure, if you've got a solution as well, that's great. But I'd much rather know there's a problem that you don't have a solution to than be completely ignorant of it.

I go even a bit further: don't bring me solutions, just tell me what makes your life difficult.

As a tools developer, I observe that user-provided solutions almost never address anything outside their specific problem, which is potentially only one of many things the feature with the issue is designed to address.

Some of my colleagues tend to point to user feedback as gospel, including the ways suggested to "fix" the issue. But those fixes are often myopic and laden with technical debt.

But I will never disbelieve a claim that something is confusing, or hard to use (almost never, anyway; some people are idiots). Just don't be offended if I don't fix it in the way you came up with.

>That's completely irrelevant. Criticism is not about solving the problem. It's about pointing out that the current solution is inadequate.

Well it's a pointless criticism when everyone knows CAPTCHAs suck. People have spent a lot of time working on them trying to find better ones that work consistently at scale and have failed.

It's similar to someone now writing a criticism of the testing procedures at the Chernobyl nuclear power plant.

"Captchas are not going anywhere anytime soon."

It's turning into the new version of "3rd world outsourced phone support". A strong indicator the user simply doesn't care about the customer experience.

For some industries / companies, this is perfectly OK and BAU. For others it can be a company-killer.

I see a captcha and I know the company doesn't like me, doesn't like what I'm doing, and doesn't care if I know how they feel about me. And for some situations that's perfectly OK. Certainly not all.

In the past month the reCAPTCHA challenges I have encountered have been like the article. I just have to hit reload until I can get something remotely readable or I managed to guess the letters correctly.

This is exactly my experience too. I find myself having to hit reload increasingly to get something readable.

That's been my experience too, I thought I was just getting old.

> Captchas are not going anywhere anytime soon.

Which is truly unfortunate, as they're a fucking abomination, an embarrassment to the IT industry in general.

Try solving these actual examples:



湘悪 rhaval

fgsfds onsupsel

I don't quite see why people assume recaptchas are real words, they haven't been for a really long time. The control is always a made-up word and almost always solvable. If you can't solve the other one, enter whatever.

> Try solving these actual examples:

First one: a rhaval
Second one: a onsupsel

You need to realize that only one word needs to be entered. For the other word just enter any string (or even no string works in some/all cases).

How the hell is anyone "to realize that only one word needs to be entered" when that's not what the instructions say to do?

Same reason 4chan reCaptcha is no longer used to digitize the other word. If they had their way the correct answer to all reCaptchas would eventually become "nigger nigger"[1].

Fun times.

[1] the idea was to always write the correct word and just enter "nigger" as the other word. This eventually led to reCaptcha disregarding 4chan answers from the pool of resolving weird words.

You aren't supposed to realize it or do it. Google uses the human's work on solving the unknown word to either get street numbers from photos or digitize books.

However, the whole point of them having the humans do that work is /they do not know the correct answer/. Since they do not know the correct answer they cannot be basing the test of the CAPTCHA on it. From there it is not a big leap to surmise that you are only actually being tested on the word that actually looks distorted on purpose.

Either you just know because internet or you get it from trial and error.

It's probably written on the reCAPTCHA website or in the Wikipedia article, but basically you get presented with one word that a machine can't read (the distorted letters) and something out of a book that Google has digitized and uses you to OCR. You also get a lot of address numbers for Google Street View.

Just input the distorted word and type anything for the other, and it will work. Of course the instructions won't tell you that or else people would act accordingly and google would lose this free labor source for the tedious work of proofreading digitized version of books.

You don't have to preemptively realize it. You just have to try it. Like he said, literally anything you type for the unrecognizable word will work.

Experience with reCaptcha.

I can totally understand why those would frustrate non-technical users, but on a site like HN I would expect people to know how reCaptcha works.

That said I think reCaptcha has been getting much harder recently due to the arms race with bots. I now sometimes fail 3-4 times in a row.

>I would expect people to know how reCaptcha works.

Why should you even expect that? Recaptcha was interesting in 2006, but anyone not following the news around that time or around its acquisition by Google might not have learned this.

If most other people entered the same thing then the word gets into the database, and your guess may not pass muster. So the question is - what percentage of users should be able to get the captcha before the answer is set?

> Captchas are not going anywhere anytime soon.

You're probably right. But the fact remains that captchas aren't good enough. They can be partially automated; blackhats can use captcha solving farms which will be at least as accurate as the average human (probably more accurate, I imagine).

A better solution might employ heuristics similar to DDoS mitigation techniques. I really don't know, but there is a need for something better here.

"I've solved a lot of captchas in my time, and really have never experienced the trouble the author is detailing."

That might be the problem right here. Try browsing with Tor or passing through an anonymizing proxy. The more you solve correctly, the easier they get. The more unknown you are, the harder.

If you look farther down the page, you'll notice hashcash launched their proof-of-work captcha replacement thing today. I'm not saying it's related, but both of these are on the front page together.
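For anyone who hasn't met the idea, a hashcash-style proof of work can be sketched roughly like this. It is a generic illustration and not hashcash.io's actual protocol: the names `solve`, `verify` and `leading_zero_bits`, the challenge string, and the 12-bit difficulty are all made up for the example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # For a non-zero byte, bit_length() tells us how many leading zeros it has.
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Cheap check done by the server: a single hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: str, difficulty: int) -> int:
    """Expensive search done by the client: try nonces until one verifies."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


# Hypothetical challenge string; a real server would issue a fresh one per form.
nonce = solve("login-form-42", difficulty=12)
print(nonce, verify("login-form-42", nonce, 12))
```

The point of the asymmetry is that the visitor's machine burns a few thousand hashes while the server checks the result with one, so the cost lands on whoever submits in bulk.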

have you tried this recently? zendesk's use of recaptcha was so bad that they are getting rid of it.

I personally experienced this and can't wait.

Here's something interesting. If I go to the ReCaptcha demo page in Chrome that is logged in to Google, I get all house numbers, a lot of which seem like easy OCR. If I go to the same demo page in Incognito mode, I get the two-word version instead, like this blog is complaining about.


Yes: reCAPTCHA now keeps a profile of you and gives humanoid users easier reCAPTCHAs.


In other words, it's not giving you reCaptchas, it's giving you unCaptchas so you can do work for them for Google Street View.

Yep, in incognito mode Google hasn't built up a profile of you yet so they assume you're a spammer.

If they are using your profile, what is the point of recaptcha then? It's circular reasoning.

1. Block people who don't have a profile that may be spammers

2. Free human OCR

You are free labor

It was one thing when they were digitizing books. I refuse to help them improve Google Maps. I have never correctly entered a map number. As long as you're only off by one digit, they accept it.

> I refuse to help them improve Google Maps. I have never correctly entered in a map number.

So why don't you stop using Google products if you hate it so much?

Let's hope this gains some traction


I do, but sites still use captchas that give my work to google.

I assume that too many failures puts your profile back into the "hard captcha" pool.

They probably count total tries per period of time, not failures. Bots can be accurate if the captcha is easy.
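Counting tries per period of time, as opposed to failures, could look something like the sliding-window counter below. This is purely a guess at the general shape; nothing is known here about what Google actually does, and the class name `AttemptTracker` and its limits are hypothetical.

```python
from collections import deque


class AttemptTracker:
    """Counts captcha attempts per client inside a sliding time window."""

    def __init__(self, window_seconds: float, max_attempts: int):
        self.window_seconds = window_seconds
        self.max_attempts = max_attempts
        self.attempts = {}  # client id -> deque of attempt timestamps

    def record(self, client_id: str, now: float) -> bool:
        """Record one attempt; return True if the client is over the limit."""
        times = self.attempts.setdefault(client_id, deque())
        times.append(now)
        # Drop attempts that have fallen out of the window.
        while times and now - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_attempts


# Assumed limits for illustration: at most 3 attempts per 60 seconds.
tracker = AttemptTracker(window_seconds=60, max_attempts=3)
flags = [tracker.record("client-a", t) for t in (0, 1, 2, 3, 100)]
print(flags)
```

Note that solved and failed attempts are treated the same, which is the point of the comment: an accurate bot still trips the limit by sheer volume.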

And if you start entering some numbers that are close to the correct answer, most times it's accepted. But if you do it like ten times you might be switched to the two-word version.

Same here, maybe they trust people who are using their services more and want them to OCR some google street view stuff.

Leaving no captcha would be a hint for spammers to provide requests with valid cookies thus simplifying the task. I assume that's the reason for increased difficulty in incognito.

I've also seen a lot of house numbers in Firefox.

It is based on the google tracking cookies, not being logged in. I don't have a google account, but I still see the house #s on firefox, until I flush their cookies.

The audio captchas are psychotic. They are scarier sounding than anything I've heard in a horror movie lately, and I have never been able to solve one.

Psychotic and almost impossible to solve. I tried an audio captcha recently on a whim and I was bewildered.

I think any site that uses reCAPTCHA must not have any regular vision impaired users.

https://www.youtube.com/watch?v=HhFLC8ZZQeM
https://www.youtube.com/watch?v=KNVcIogEXOo

Analysis of audio captchas has led to a number of exploits for several captcha systems including recaptcha - which is what all that horrible obfuscation is trying to prevent.

Yes, the goals of the reCAPTCHA project are to filter out bots and to digitize the text of the world (not necessarily in that order).

The mission statement of the reCAPTCHA project is "Protect your website from spam and abuse while letting real people pass through with ease."

I don't think anybody is passing through the audio captchas with ease. Nor are they helping to digitize anything.

I think that the use of captchas as a reverse Turing test was always secondary for reCAPTCHA anyways. If someone can write software to solve the visual captchas, that is a great accomplishment. Once that is accomplished, we will need a new type of reverse Turing test. Perhaps we are approaching that point.

I'm interested to hear the perspective of users with a visual impairment on the audio captchas.

It seems to be really good at preventing visually impaired users from using your site. Unless it is true that they really do develop better hearing.

I don't know how you would solve that problem, other than not having one at all, which of course wouldn't be fair. Having the audio clearly describe the solution would make the captcha useless.

Yes, I'd really like to hear their perspective on this. I listened to a screen reader once and found it very difficult to understand as well.

They are especially hard because the numbers overlap with each other-- so the numbers "three", "five", and "seven" become this weird mash of "threeven".

Apparently, if you have good cookies you get an easier captcha

find accessibility: https://www.google.com/recaptcha/intro/index.html

Here's the one I got on linkedin recently


I got one in Hebrew once:


Looks like something you might get with CRAPCHA:


Brilliant. I think I'm going to use this on my site and ask humans to not enter anything. Anyone entering something will be considered a bot.

OP's is obviously


while yours is

rlyeh ryights

I was just thinking this yesterday when I had to recover a Flickr account I hadn't used in ages. I had to solve captchas from Yahoo and Microsoft.

The Yahoo captcha used rotating, bouncing letters on a scrolling background of more letters - ridiculous. Microsoft's was just a typical smeared mess, but no easier to actually solve.

I think I failed each at least 3 times.

It's not just difficult captchas, but use of them everywhere. The site my university recommends for ordering textbooks starts inserting captchas if one searches more often than perhaps twice within a minute. Another I can't recall the details of requires a captcha solve to make any sort of profile change despite being previously authenticated.

The article's largest complaint is not being able to read one (1) of the two (2) words in the captcha challenge.

ReCaptcha only requires one (1) out of two (2) words to be correct in the challenge.

It presents one known-by-the-system-word, and one not-known word. If you get the known word correct (the easier of the two to read) then it passes the challenge.

ReCaptcha then pools the answers for the second not-known word and after pooling thousands (or more) responses, then that word becomes "known" based on the average answers (and then that word is "digitized" and used by google maps, or ebooks, etc).
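As a toy model of the scheme described above (pass or fail on the known word only, pool guesses for the other one), something like the following would do. It is a sketch under assumptions, not reCAPTCHA's implementation: the class name `TwoWordCaptcha`, the vote threshold, and the agreement share are invented for illustration, and simple majority agreement stands in for whatever "average answers" means in practice.

```python
from collections import Counter


class TwoWordCaptcha:
    """Toy model of a control word paired with an unknown word."""

    def __init__(self, control_answer: str, min_votes: int = 3, min_share: float = 0.75):
        self.control_answer = control_answer
        self.min_votes = min_votes  # assumed: votes needed before deciding
        self.min_share = min_share  # assumed: agreement needed among the votes
        self.votes = Counter()      # pooled guesses for the unknown word

    def submit(self, control_guess: str, unknown_guess: str) -> bool:
        """Pass or fail on the control word only; pool the other guess."""
        if control_guess.strip().lower() != self.control_answer.lower():
            return False
        # Only guesses from users who passed the control word are counted.
        self.votes[unknown_guess.strip().lower()] += 1
        return True

    def resolved_word(self):
        """Return the agreed reading of the unknown word, or None if undecided."""
        total = sum(self.votes.values())
        if total < self.min_votes:
            return None
        word, votes = self.votes.most_common(1)[0]
        return word if votes / total >= self.min_share else None


captcha = TwoWordCaptcha(control_answer="rhaval")
for _ in range(3):
    captcha.submit("rhaval", "secretary")
print(captcha.resolved_word())
```

With an agreement threshold like this, a handful of people typing junk for the unknown word only delays the word being resolved; it takes a large coordinated share of the votes to actually flip it.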


And for those wondering, I find it easiest to read captchas by just looking at the letters by shape.

Going down the list in the article:

onightsl secretary.

. phaRega

o ndaaar

proximity rsgrrem

and khseeke

. azedcg

elearsal 5

ination amesye

se ebtyR

Reomi now

ivestshm nwre

Again, it's important to note, you only have to get one of the two words correct to pass the challenge. So.. probably 99% of the above list would pass.

No, the problem the author mentions is explicitly with ReCaptcha. He addresses your edit in the article, which you would know if you actually read it and didn't just skim. The problem, as evidenced by the author's many examples, is that the control word is often distorted beyond reasonable recognition, and the new word is not valid data. So neither of the two words is solvable.

Edit in response to your edits

You've deleted your previous edit. Still, even with your current edit it is clear you are not actually reading the article. You say:

Again, it's important to note, you only have to get one of the two words correct to pass the challenge. So.. probably 99% of the above list would pass.

However, the author of the article explicitly states that he did this:

I decided to just guess the first word and hope “secretary” was the control. It wasn’t.

So the author correctly identified one of the two words (and makes the same identification as you did), but was still rejected because it was not the control word.

You obviously don't have an accurate understanding of ReCaptcha implementation, and you apparently are not reading the article with comprehension, despite claiming several times that you have.

> You obviously don't have an accurate understanding of ReCaptcha implementation

I do (I've implemented them many times), but no point in arguing.

I must be the only person who finds the level of security a captcha provides worth the 1 to 2 seconds it takes to type in a Captcha. And if done properly, you should only have to type a captcha once per site.

Which is easier? Allow form spam on your site, or have a user type a captcha once the first time they visit and decide to post a comment or something? Captchas have provided a tradeoff between inconvenience and protecting your site.

Pardon my confusion, but wasn't your original comment arguing against homebrew Captchas?

Also, you say that you have an accurate understanding of ReCaptcha implementation based on the qualification that you have "implemented them many times". ReCaptcha was created by Google, so unless you work for Google on the team that implemented ReCaptcha, it doesn't seem possible for you to have "implemented them [ReCaptcha] many times".

Minor point, but reCAPTCHA was purchased by Google, not created by them.

I don't think you understand how to implement a captcha on your website. Unless you use a 3rd party CMS where implementing a captcha is just a checkbox and pasting in your api key, then it's a lot more work.

Don't re-invent the wheel? That is, please continue to give useful data to Google for free?

From the article, in the context of ReCaptcha, it seems like Google has stabbed itself in the face with the sword of data.

Google may think the data is telling it something, but what it's really managing to do is irritate legions of humans with terrible (borderline hostile in this case) UI/UX.

Yes, please do. Because Google has and will continue to make way better use of that data than anyone else.

The original article was specifically about reCAPTCHA, not homebrews, and how difficult they now are (something I've also noticed lately). Either give it a (re-)read, or if you're saying you were able to easily read the examples in the article please share the answers! :)

ReCaptcha only requires one (1) out of two (2) words to be correct in the challenge.

It presents one known-by-the-system-word, and one not-known word. If you get the known word correct (the easier of the two to read) then it passes the challenge.

ReCaptcha then pools the answers for the second not-known word and after pooling thousands (or more) responses, then that word becomes "known" based on the average answers (and then that word is "digitized" and used by google maps, or ebooks, etc).

Again, sorry, I've got to point you back to the original article. The author explains the details of reCAPTCHA's known/unknown word-pair clearly, as you have done, but goes on to explain that the impossible-to-read word was actually reCAPTCHA's "known word", so the CAPTCHA was impossible to pass.

Yes, but the author actually only tried once. Every other example he gives, he claims he hit refresh to get a new one instead of attempting it. Also, I have to wonder if he was simply mistaken about that first attempt. Are we sure it didn't just fail because his username/password was wrong and display a new captcha, causing him to assume he had failed the first captcha?

I do have to admit most of those are cases where both words are difficult or impossible. But we can at least assume that the easier of the two (the one not cut in half) is the control in most of those.

You might want to read the link... he's complaining about reCaptcha. The basic complaint is that the ability to recognize letters is no longer good enough to distinguish human intelligence. We need to identify some other trait that's simple for humans and difficult for computers. Those "home grown" captcha solutions are likely better in this regard than reCaptcha since they don't have the possibly contradictory goal of digitizing books.

Please re-read the article

I have.

The article's largest complaint is not being able to read one (1) of the two (2) words in the captcha challenge.

I was pointing out, that this complaint is not valid since reCaptcha (where all of the article's screenshots are from) only requires one of the 2 words to be correct.

You must not have read very carefully then. From the article:

It’s important to note the way reCAPTCHA works. Each user (or bot) is presented with a control word, and a word unrecognized by OCR. This control word is already known to Google (who runs reCAPTCHA). If you get this first word right, it is assumed that you get the second word correct as well. So, in reality, you only need to guess the key word correctly.

The author explicitly addresses your point and if you looked at his examples, most are very difficult for both words. In many of his examples, the control word is distorted beyond reasonable recognition, and the new word is cut in half or worse.

Actually, reCaptcha requires a specific word to be correct. Specifically, the illegible one.

The article does make this point.

You must have missed the part where he tried to guess with the easier of two words and still failed. The control words are rarely any easier to read than the unknown word.

Your original comment said that the biggest problem was home-grown captchas, and that people should use something established like recaptcha - the article was specifically complaining about recaptcha.

After your 'edit' my comment makes no sense.

I don't want to spend more than a second or two working out what a captcha says - if it wasn't something I absolutely needed I'd probably have gone away long before the author's patience ran out.

Google exploits ReCaptcha to recognize street numbers. When typing a numerical ReCaptcha, you are doing OCR for Google Maps for free.

Well, not exactly for free. It's a trade. You're willing to donate your small amount of time to Google in exchange for Google providing security benefits to the website which you are attempting to use.

I'll repost this once again. Why you should never use a CAPTCHA: http://www.onlineaspect.com/2010/07/02/why-you-should-never-...

The proposed alternatives are crap. Why shouldn't an attacker read the CSS...

Many (maybe even most) people who use CAPTCHAs are never going to be targeted with a personalized attack. Instead they're using CAPTCHAs to prevent generic, spray and pray spam. The bots know how to post a comment on a Wordpress blog, but making even a small tweak to your comment form can get rid of 99% of them.
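One common "small tweak" of this kind is a honeypot field: an extra input hidden with CSS that humans leave empty and generic bots tend to fill in. The comment doesn't say which tweak is meant, so treat this as one assumed example; the function name `looks_like_bot` and the field name `website_url` are hypothetical.

```python
def looks_like_bot(form: dict, honeypot_field: str = "website_url") -> bool:
    """Flag a submission if the hidden honeypot field was filled in.

    Humans never see the field (it is hidden with CSS), so they leave it
    empty; generic form-filling bots tend to fill every input they find.
    """
    value = form.get(honeypot_field) or ""
    return bool(value.strip())


# Hypothetical submissions: the second one filled in the hidden field.
print(looks_like_bot({"comment": "Nice post!", "website_url": ""}))
print(looks_like_bot({"comment": "Buy pills", "website_url": "http://spam.example"}))
```

As the reply above points out, an attacker who bothers to read the CSS can skip the field, so this only filters the untargeted, spray-and-pray kind of spam.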

> select three hot people but there's only two on that image

Lately I've noticed 90% of my captchas being a single number. That is it. A number like "1057" with nothing else. What do they honestly expect me to do with this?

Basically I have to fill in the number and then guess whether it was the first or second set of characters and fill out bogus before or after the number and hope I got it right. The numbers weren't even hard for a computer to read. The only thing it does is waste everyone's time.

This happens when they already know there is a fair chance that you are human. So they give you an easy one and get their OCR for free. You are probably already logged in your Google account when it happens. Go incognito, browse with Tor, etc. and you will get the impossible ones.

I've seen guesses that those numbers come from addresses in pictures taken by the Google Street View cars.

What are everyone's thoughts on this type of CAPTCHA?


If I saw that it wouldn't be worth my time to continue with the site. Next we'll be punching the monkey for a prize.

It's interesting, but two things come to mind. Viewing it on this site makes it seem like it's great, but what about in context? If I came across that, I'd be forced to think more than with a captcha, just because it's so different and unexpected (maybe that goes away if it's very widespread). Also, it looks like an annoying banner ad game from Orbitz or something from years back; that might make me avoid it, ignore it, or just not trust it.

in context we get about 40%-60% higher conversion rate depending on the use case and have over a 95% success rate. we have a lightbox mode that makes it really apparent what you are doing and only has you attempt a captcha after you submit and the form is validated.

Looks too much like spam.

Founder of Are You a Human here. Happy to answer any questions.

Nice! I am playing with the idea to put it on my site for fun. :-)

I use it as an alternative captcha for contact forms on some WordPress sites I run.

Met a couple people who work at the company (I'm in Detroit and think that's where it is based) at some startup events a couple years ago. That's how I found out about it.

I can't believe we have not figured out something better than captchas by 2014. I would imagine Google could figure at least how to bake something into Chrome which many would eventually follow. It's asinine that all legit customers have to go through such a silly, completely unrelated hoop.

Google has figured out something "better". Basically they use all the data they collect on you to determine whether you're likely to be a bot or not. If not, you most likely won't even see a captcha. (And if so, you'll get a difficult one.)

It must think I'm a bot, then. I get captchas all the time, and they've been definitely getting harder and harder to solve over time. I suspect their algorithm pushes me further and further into the bot camp the more captchas I fail (I have about a 10-20% success rate now, and it's dropping). At this point, I only stick around for the 5-6 tries it takes if I REALLY want to use whatever service it's guarding. Often, I'll just leave the moment a captcha appears because I don't want to be bothered. If it's a contact form, I'll use google to find a direct email address or phone number instead (Yes, I've had to resort to that approach many times).

Hmm ya, I could certainly see a positive feedback loop developing. Out of curiosity, do you regularly log into a google account and/or use services like gmail? I'm guessing not, since presumably that would give it plenty of data. (Unless you actually are a spammer... ;) ) Also, are you connecting from a location that Google might see as more likely to produce spam? (I would guess any non-"western" country to some extent, with Nigeria likely being at the far end of the scale.)

I have gmail open most of the time, but it rarely asks me to log in. It's only when I try to use other services that use recaptcha, or the rare occasions where I have to relog into google and mistype my password more than once. Last time that happened it locked me out of my account for 24 hours because I couldn't solve the captcha (although I could still access gmail on my phone).

Are you suggesting somehow automating the process of proving you are a human?

No. Just trying to avoid something that is completely stupid on many levels.

I always worry that they're getting harder because I'm getting old, so it's comforting that an arms race against bots is the real cause! :)

I literally had the same thought after I unsuccessfully tried to get through a captcha for 10 minutes.

It was for a contact form on a vendor's website. Ended up going with another vendor who had identical product

Would you wait instead of entering a captcha, with a hashcash.io widget? A widget like this: https://hashcash.io/auth (notice the unlock switch)

absolutely not -- at least not on every login into a service.

the on/off metaphor is not clear either - at least make the "login" button be not enabled until the switch is moved

It's been sitting on the login form for at least a minute now, filling up the switch background btw

We need a browser extension to help us solve captchas with OCR. This is indeed ridiculous.

I thought about this as well. I went and registered instacaptcha.com but I never managed to do it and the domain expired.

Seems to be doable. The user pays 1 usd/month and gets 100 credits. The extension author can outsource the solving to http://antigate.com/ and get the answer in 15 seconds.

There's webvisum and Captcha Monster. Captcha Monster is like $5 per 1000 captchas or something. Not sure about webvisum.

If someone could use [something like this](https://github.com/mekarpeles/captcha-decoder) to make an extension it would be great.

I think it's time to move to the ultimate captcha: "Is this post spam?"

Then we just hope that the spammers create a perfect solver again :)

I realised an interesting thing there. I also get those complex captchas using Firefox. But I also have Opera 12.17 running. With that one, my captchas for the same page are ridiculously easy. Sometimes it's just a house number. One item. I never had one even close to what I get on FF.

I think it depends on a cookie. Once you correctly solve a dozen or so hard captchas they'll give you easy ones from then on.

That would be the logical thing to do. Unfortunately I rarely use Opera. I can't remember solving more than 5 captchas with it. I use FF most of the time, with hundreds of solved captchas there. (FF Private Mode did not help either.)

I have this fear that skynet originally started as a captcha solver algorithm. :p

A paper came out awhile ago showing that neural networks are extremely vulnerable to adversarial examples [1]. They showed even slight perturbations of an image generated with their method could cause NNs to misclassify it, but appear no different at all to a human. I am interested if methods like this could be used to extend the life of CAPTCHA a bit longer, even as computers are starting to beat even humans at object recognition tasks.


I think you have made a good point and possible solution here. But I am keen to see if researchers can quickly address this issue with NNs. Someone might find an easy fix to this problem.

We found that neural networks can solve CAPTCHAS much better than humans, 99.8% on the "hard" ReCAPTCHA instances: http://arxiv.org/pdf/1312.6082.pdf

This is why visual recognition is just one of the signals you need to use to tell humans and computers apart http://googleonlinesecurity.blogspot.com/2014/04/street-view...

Dear website owners, please, do not use reCaptcha. As was noted in other comments, Google discriminates against users who try to protect their privacy by showing them a nearly unsolvable variant. For instance, I have seen the hard version all the time since I started using Privacy Badger for Firefox. It is also not impossible that they discriminate by user-agent.

And generally it is a very bad idea to choose the most popular service among the alternatives, as by doing so you are contributing to the centralization and monopolization of the Internet.

It's almost like, if they already know you're not a bot, they don't have to try very hard to re-prove it, or something.

Think of it in a Bayesian sense.

If 10% of anonymous users end up being bots (the prior), and the "hard" recaptcha has a 1% false-negative (incorrectly identifying someone as a human) rate, then of the anonymous users who succeed in getting past the recaptcha, .1% will be bots (the posterior).

But if 1% of signed-in users are bots (probably less than that), you only need a recaptcha with a 10% false-negative rate to achieve the same bot throughput limit. And those users are less frustrated.
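To make the arithmetic above concrete, here is a rough sketch of the Bayes calculation. The function name is made up for illustration, and it assumes every real human passes the captcha (`human_pass_rate=1.0`), which the comment above does not state and which is certainly optimistic.

```python
def bot_share_after_captcha(prior_bot, bot_pass_rate, human_pass_rate=1.0):
    """Posterior P(bot | passed the captcha), by Bayes' rule.

    prior_bot       -- fraction of incoming users that are bots (the prior)
    bot_pass_rate   -- false-negative rate: chance a bot gets through
    human_pass_rate -- chance a real human gets through (assumed 1.0 here)
    """
    bots_passing = prior_bot * bot_pass_rate
    humans_passing = (1.0 - prior_bot) * human_pass_rate
    return bots_passing / (bots_passing + humans_passing)


# Anonymous users: 10% bots, hard captcha with a 1% false-negative rate.
anonymous = bot_share_after_captcha(prior_bot=0.10, bot_pass_rate=0.01)

# Signed-in users: 1% bots, easy captcha with a 10% false-negative rate.
signed_in = bot_share_after_captcha(prior_bot=0.01, bot_pass_rate=0.10)

print(f"anonymous, hard captcha: {anonymous:.4%}")
print(f"signed in, easy captcha: {signed_in:.4%}")
```

Under that assumption both cases come out at roughly 0.1% bots among the users who get through, which is the point of the comparison. Lowering `human_pass_rate` for the hard captcha pushes the anonymous figure up, since fewer humans dilute the bots that pass.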

While Google is worried about false negatives, we as users measure frustration by the false-positive (failure to identify an actual human) rate. Ideally they would find a system where the two rates are independent or where false positives are rare.

Ideally, yes.

I understand how it is justified technically, but that does not invalidate the fact that reCaptcha is discriminating against the users who care about privacy.


Pretty neatly conveys the feelings on this topic.

It took me about 30 min and >15 captchas before I could register for this site. The audio didn't help either...

They are getting ridiculous.

[shameless blog post promo ahead]

One simple way to minimize junk coming through automated submissions, without using recaptcha at all: http://ademsha.com/notes/simple-proposal-to-stop-spam-going-...

It works only with JS enabled and uses randomization in order to stop bots learning how to avoid it.

I have to type this awful thing every time I log into Envato and I can never get it right. It's so frustrating. Envato refuses to acknowledge it's an issue.

I don't even get the point of it, since you can get past them by just hiring people at places like http://antigate.com/ for as little as 70c per 1000 captchas

If computers get so good at solving captchas, are we also getting better OCR?

Time to switch to next, harder, AI problems as captchas :)

I hate to be so simple but XKCD said this first


Recent discussion, https://news.ycombinator.com/item?id=7419667 (there are others)

Also a couple of examples http://alicious.com/hard-recaptcha-huh/.

Since 2012 there have been some changes that make it easier under "normal" conditions: http://googleonlinesecurity.blogspot.com/2013/10/recaptcha-j...

Is it not obvious in the first case that "secretary" is the unknown word? Clearly OCR wasn't able to read it due to the fading. Likewise, the cut-off words spanning two lines in the later versions are obviously the unknown words. The author states right at the beginning that he understands there is a control and an unknown word; he then proceeds to "hope" that the obvious unknown word is the control in the first case, then skips numerous captchas where the control word is straightforward and the illegible word is obviously the unknown. This certainly sounds like willful ignorance for the sake of a blog post.

Also, '“Onightsl”? “Onighisl”? Are those even words?' No, my understanding is that dictionary words are never used as the control, so as not to be vulnerable to dictionary attacks.

Edit: I'm not suggesting that these captchas are in any way good; they do clearly have issues. I'm just saying that storyline in the blog post seems contrived. To me it would be more convincing if presented in a more genuine manner. However, perhaps he was simply very unlucky.

What a concise description of why captchas as they exist today are just awful and we have to come up with a better solution!

There you are, talking on and on and on about some tiny unimportant but extremely specific implementation detail no one should ever have to care about. People shouldn’t have to read a manual about the inner workings of this captcha implementation (and have some experience with what types of text computer vision is good and bad at recognising!) to have any chance solving it.

In this case the author clearly had no idea how that control/unknown system works in detail (it seems like they, just like me, only know that you do not have to recognise both, but they didn’t really understand the reason for that – nor should they have to) but that doesn’t really matter for their argument even a tiny bit.

Fair enough. I didn't mean to suggest that these captchas are in any way good. Only that the author does appear to have technical knowledge of how they work - otherwise they wouldn't use terms like "control word", so it seems that the difficulty experienced was likely contrived for the purpose of writing the blog post.

For me at least, the point would have come across better if that (seemingly) false ignorance were dropped. (Either that, or frame it in terms of, "Here's what an average user sees when they try to log in," or something along those lines.)

It's also clearly "onightsl" and not "onighisl". A couple of years ago, captchas were just as easy / difficult as they are now.

Except for some untrusted websites / users who can get really difficult captchas sometimes: https://i.imgur.com/6pAatnC.png

I believe that this will eventually become a losing game. Normally there's an arms race between those creating security and those thwarting it. In this case, once the recognition schemes are as good as humans, the game is over for good.

If only we could invent the verbal equivalent of a trapdoor function. A word puzzle that would be extremely easy for computers to generate and humans to solve (since we understand language), but extremely hard for computers to solve.

It's a nice idea, but you have to consider the complexity of the word puzzles compared to the average human's brain power. Most people are quite dumb. If there aren't a sufficient number of problems/answers, or they're simple enough for computers to solve, or they're too complex for a minority of humans to solve, you're boned.

The whole thing is a technology arms race. The best solution would be one where you simply verify fixed private information. We use captchas for verifying a human being is not a bot, right? And we do that because we assume the user is anonymous for a short time.

Instead we could simply provide a secured authentication gateway where one could provide private information that is linked to a human identity. That way it can't be abused unless they have an unlimited supply of stolen identities. Even better would be if everyone signed up for a TOTP service provider and used their token generator and service-account to prove their human-ness without needing to put in sensitive information. But that's probably too much work.
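For anyone wondering what the TOTP check would involve on the server side, here is a minimal sketch of the RFC 6238 algorithm using only the standard library. The function names, the shared secret, and the one-step drift window are illustrative assumptions, not part of any particular provider's API.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238, HMAC-SHA1 variant)."""
    counter = unix_time // step                      # number of elapsed time steps
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F                          # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret: bytes, submitted: str, unix_time: int,
           window: int = 1, step: int = 30) -> bool:
    """Accept the code for the current step or `window` steps either side."""
    return any(
        hmac.compare_digest(totp(secret, unix_time + drift * step, step), submitted)
        for drift in range(-window, window + 1)
    )


# Hypothetical shared secret; a real service would provision one per account.
secret = b"12345678901234567890"
print(totp(secret, 59), verify(secret, totp(secret, 59), 80))
```

The design point is that the site only learns that the visitor holds a valid token, so rate-limiting can hang off the account behind the secret rather than off a picture puzzle.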

> Most people are quite dumb.

I know what you're trying to say here, but consider today's xkcd[0] as a counter-point. I think "most people" are quite capable of solving a lot of puzzles. The issue is that any puzzle that we can solve in a reasonable timeframe is often a good target for a computer-generated solution as well.

[0] http://xkcd.com/1386/

The xkcd is only necessarily true when it is the median average that is considered. However, most people are not necessarily of mean average intelligence.

Except you lose the benefit of anonymity, which is a big draw for many of the places using Captchas. Unless I don't understand your idea, which is possible.

Well anonymity isn't the purpose of captchas. Captchas are intended to provide human-confirmation with the least friction possible, mainly for rate-limiting of services. Having to establish you are a specific individual takes effort, but just typing in a random word is simple. Anonymity is just a by-product of the frictionless [simple] part.

You can still come up with new ways to verify someone is a human for specific uses where you want anonymity, but they will always be part of the tech arms race if you want them frictionless. To avoid them getting more annoying you need a way to authenticate an individual identity, as that allows you to rate-limit access.

You could, of course, do TOTP and totally preserve anonymity. Unless the TOTP service provider is compromised, in which case all bets are off (but perfect-forward secrecy might solve that?)

Anonymity is not just a byproduct of frictionless experience. It used to be a fundamental part of most interaction on the web (on the internet no one knows you're a dog, etc.).

I agree that anonymity is orthogonal to the purpose of captchas, but usually a captcha is only required when you don't have identity. This can be because you haven't established identity, or because the identity is in question, but also because the site does not want to require identity. In fact, outside of first time user sign ups, most captchas are used specifically to allow people to engage without needing an account. So in most cases you use a captcha because you want to allow anonymity.

There already exist several systems like the one you describe: login with your Google account, Facebook, Twitter. There are already several comment systems (Disqus for example) which make using these as simple as using a captcha for sites that don't care about anonymity. We don't need to integrate identity into captchas.

Off topic slightly, but do people with dyslexia have a hard time with captchas?

I have dyslexia, no problems with captchas though. In fact I found the examples from the OP's article to be not all that difficult. Maybe it's because when I am solving the captcha, I just look at one letter at a time, instead of trying to read and comprehend the word. I have found that in many cases the control word isn't actually a word at all, just a string of characters. I usually have an 80-90% success rate nowadays; it used to be 100%, but they are really getting more and more difficult.


Easy... A stereogram "captcha" ... What's the hidden 3D image? More fun too... http://www.brainbashers.com/stereo.asp

NO!!!!!!!!!!!!!! Please NO! I'm stereoblind but also not a bot.


Sure, if you don't mind cutting out 30% of your audience.

This is where Facebook comes in handy! Please add your captchas there: https://www.facebook.com/IHateCaptchas?fref=ts

It is funny how link [1] from my app solving this problem got more upvotes :)

[1] https://news.ycombinator.com/item?id=7944540

There are plenty of tricks around Visual Captchas. What you need is a semantic captcha that's only recognizable as such by a human. Hide a simple question somewhere in a piece of text.

Who writes the question and answer?

Although...maybe you could outsource the question and answering to Mechanical Turk. Turn the whole thing on its head. Have a real person write a question to try to trick the bot into revealing its botness, have the real human grade the answer.

The problem isn't captchas, but users not understanding how to interact with them. So what if a few are bad? Hammer out best guesses, fast as you can, until you're successful. It's not as if you're graded on accuracy. There is no reason to ever resort to the refresh button.

Out of curiosity, I went and opened the demo page (https://www.google.com/recaptcha/demo/ajax) in a new incognito window and timed myself. I can do about 8/minute at maybe 90% accuracy.

Captchas are only a problem if you compulsively refresh in hopes of getting something clear.

After I've filled out 20 different forms, the last thing I want to do is deal with a completely illegible captcha which might go so far as to refresh the entire page or wipe out what I've entered in certain fields (password, ssn, etc) each time I get it wrong. That's one way to push me away from signing up for your service.

That's not really a problem with recaptcha so much as the integration with the rest of the site.

Well then you obviously understand some websites have problems with this, so why would a user risk losing all of their filled out data?

Sorry, the user is never wrong. This is an interface problem, and the interface is terrible.

Eh, I've hit some really frustrating captchas where after five or so attempts I ended up just closing the page (these were reCAPTCHA ones...) The problem was most definitely with the captchas.

> Captchas are only a problem if you compulsively refresh in hopes of getting something clear.

You're ignoring people with visual impairment or cognitive impairments.

I've recently seen a bunch of them with just one number. Just a single 7 or 4 on a white background and nothing else. Kind of scratch my head at those ones.

Do you mean the house numbers/street signs? Like these: http://i.imgur.com/yD1FrlH.jpg

If it's those, I guess google uses recaptcha to get data for streetview.

I've seen that, but I've also seen plain black numbers on white that look like they're right out of MS Word. Got me.

Hmm, I have to say I haven't had a recaptcha that bad yet, but I have had some bad ones.... But uh... on the first bad recaptcha, while trying to guess their password, they thought: "This recaptcha is ridiculous. I will try to solve it, of course, but right now I am also going to screenshot it, because this is naturally the first thing I think to do!"

I agree that those captchas are obscenely bad. :)

I think we really, really need a replacement solution for them that works as reliably vs. bots.

So at what point do we 'switch over', which is to say that the Captcha code realizes that if you solve it you're a robot/script, because humans can't?

The API stuff that solves these captchas is really akin to Amazon's Mechanical Turk and outsourced to places like India.


Scroll to the bottom.

I experienced animated GIF captchas with Yahoo's login process. Not sure if that was better or worse than reCaptcha.

I don't get why they're doing this. Animation adds information and makes CAPTCHA easier to break!

An attacker can choose the frame that's easiest to attack, and they can segment better with the help of motion vectors and differences between frames.
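A toy illustration of the frame-difference part, assuming frames are plain grayscale grids (lists of rows of 0-255 values). The function name, the threshold, and the tiny frames are invented for the example; a real attack on an animated captcha would be considerably more involved.

```python
def motion_mask(frame_a, frame_b, threshold=32):
    """Mark pixels whose brightness changed by more than `threshold`.

    Static background cancels out; pixels touched by the moving glyph
    (where it was and where it went) are flagged True.
    """
    return [
        [abs(a - b) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


# Two 3x4 frames: a single dark "glyph" pixel moves one column to the right
# over a bright, unchanging background.
frame_a = [
    [200, 200, 200, 200],
    [200,  10, 200, 200],
    [200, 200, 200, 200],
]
frame_b = [
    [200, 200, 200, 200],
    [200, 200,  10, 200],
    [200, 200, 200, 200],
]

for row in motion_mask(frame_a, frame_b):
    print("".join("#" if changed else "." for changed in row))
```

Only the two positions the glyph occupied light up, which is why animation hands the attacker a free foreground/background separation.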

It's difficult, but I was able to read all of those captchas (the wavy ones). Maybe it's a special skill?

If I am kicking back and have to enter a difficult captcha to watch a movie, I am not watching that movie.

This made me remember: I once saw a website with a moving captcha.

Can't remember where I saw it. Does anyone know?

Comcast does that on their password reset pages. E.g., go to https://login.comcast.net/myaccount/reset, type in "foo", and click "Next."

Edit: From checking the source, it looks like they're using NuCaptcha (http://www.nucaptcha.com/). Looks like O2, Groupon, and StumbleUpon are also NuCaptcha customers. You can see examples on this page: http://nucaptcha.com/features/security-features

There is a blog post about defeating NuCaptcha here: http://www.elie.im/blog/security/how-we-broke-the-nucaptcha-...

Happened to me yesterday when logging in to flickr via yahoo.

Yesterday I had to go through a moving captcha when trying to log into flickr. I got redirected to the yahoo login webpage, where I copied and pasted those 20-something random characters yahoo had me working on for a cumulative time of an hour (I had to tweak pwgen to get some reaaally random stuff, only to see yahoo reject it as "too easy", and then wait an hour or two before I could try again).

Then they had me confirm I was not a bot by asking me to type the moving letters in a captcha.

I had one read 'drink issue' once... Wasn't sure if it was a normal captcha or advice.

Shoulda stopped at number 5: "and khseeke" seemed pretty clear to me.

But point taken.

I've said it before, I'll say it again: hashcash.
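For the unfamiliar, the idea is proof-of-work: the client burns CPU finding a value whose hash has a given number of leading zero bits, and the server checks it with a single hash. A simplified sketch follows; it does not use the real hashcash stamp format, and the challenge string, function names, and 12-bit difficulty are assumptions picked for illustration.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def check(challenge: str, nonce: int, bits: int) -> bool:
    """Server side: one hash to confirm the work was done."""
    digest = hashlib.sha1(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= bits


def mint(challenge: str, bits: int) -> int:
    """Client side: brute-force a nonce, about 2**bits hashes on average."""
    for nonce in count():
        if check(challenge, nonce, bits):
            return nonce


# Hypothetical per-form challenge issued by the server.
challenge = "comment-form:42"
nonce = mint(challenge, bits=12)
print(nonce, check(challenge, nonce, bits=12))
```

Each extra bit of difficulty doubles the expected cost of posting while verification stays a single hash, so the trade-off is between slowing spammers and annoying people on slow devices.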

Alternatives exist, but the usage is low.

A simple solution is google Authenticator (or similar systems).

The only problem is finding a system that works for all kinds of users and equipment.
