I broke textCAPTCHA with Python

bjonathan · on Nov 12, 2010

Wow very impressive, just tried it:

Q:Which word from list "duckweed, commendations, civilised, receptionists, flours" contains the letter "v"? A: FAILURE

Q: What word from "interdisciplinary, disorientating, decamps, moan" begins with "i"? A: FAILURE

Q: Egg, pink, white, green and pub: the 1st colour is? A: pink

Q: The day of the week in dog, Wednesday, penguin, trousers, hotel or lion is? A: wednesday

Q: Hand, jelly or Jennifer: the person's name is? A: jennifer

Q: What is fifty three thousand seven hundred and one as digits? A: 53701

Q: Cake, rice, snake, head and jelly: how many body parts in the list? A: 1

Q: If yesterday was Tuesday, what day is today? A: wednesday

Q: Which of these is a body part: lion, jelly, church, butter or nose? A: nose

Xk · on Nov 12, 2010

Since some people are asking how it works, I'll explain how some of those work/fail.

Q: Which word from list "duckweed, commendations, civilised, receptionists, flours" contains the letter "v"? A: FAILURE

This one fails because the parser recognizes 'duck' as a word and doesn't know what to do with the trailing characters, so it fails to parse. Allowing a word to be followed by extra characters makes this question pass but makes the parser twice as slow, so I didn't do that.

Q: Hand, jelly or Jennifer: the person's name is? A: jennifer

This is one of the tricky cases, because I convert to lower case when I pass to the parser. So if parsing fails and the word 'name' is in the string, then I return the first capitalized word and hope for the best.
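A minimal sketch of that fallback, not Xk's actual code. The function name `guess_name` is hypothetical, and skipping the sentence-initial word (which is capitalized for a different reason) is an assumption about how 'Hand' gets ignored in the example above.

```python
import re

def guess_name(question):
    """Fallback for 'name' questions: return a capitalized word, lower-cased."""
    if "name" not in question.lower():
        return None
    words = re.findall(r"[A-Za-z]+", question)
    if not words:
        return None
    # Assumption: skip the first word, since it is capitalized only
    # because it starts the sentence.
    for word in words[1:]:
        if word[0].isupper():
            return word.lower()
    # No other capitalized word: hope the sentence-initial word is the name.
    return words[0].lower()

print(guess_name("Hand, jelly or Jennifer: the person's name is?"))
```

This runs on the original string, before the lower-casing step that the parser needs.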

Q: What is fifty three thousand seven hundred and one as digits? A: 53701

This one was an interesting exercise in parsing, as it is grouped ((fifty three) thousand) (seven hundred) and (one). I assume that all numbers are of the correct form, because it will parse 'one one and two' as the number 4 (1+1+2=4).

Q: If yesterday was Tuesday, what day is today? A: wednesday

This was just putting the dates in order and doing a lookup into the list.
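The lookup described here could look roughly like the following. The names `DAYS` and `today_from_yesterday` are made up for illustration.

```python
# Days in order, so "the day after" is just the next index (wrapping around).
DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

def today_from_yesterday(yesterday):
    index = DAYS.index(yesterday.lower())
    return DAYS[(index + 1) % len(DAYS)]

print(today_from_yesterday("Tuesday"))
```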

Xk · on Nov 12, 2010

I had yesterday off and so spent four or so hours writing this.

I haven't made it recognize all of the cases. It still gets a few wrong, but usually it'll just say FAILURE when it can't solve it.

As the page says, it's just a ~300 line grammar file and a parser-generator with syntax directed translation to solve the questions.

jasonlotito · on Nov 12, 2010

Chicken, egg: which came first?

A: FAILURE

*sigh* So close.

amanuel · on Nov 12, 2010

the rooster? FTW!

bl4k · on Nov 12, 2010

care to publish the src?

Xk · on Nov 12, 2010

I replied to another comment, but since this one is getting upvoted I'll reply here too: I do not want to publish the grammar file yet. Later on, yeah, I can do that. But for now I want to leave it up to the people who run this service to improve it. I simply want to demonstrate that their scheme is not effective; I do not want to cause harm to anyone who uses it.

jrockway · on Nov 12, 2010

People using brand-new untested technology for security purposes are doing it for fun, not because it's a good idea. So releasing the source code to break it should not cause any problems.

davidjhall · on Nov 12, 2010

src or it didn't happen. ;-)

oofoe · on Nov 12, 2010

Can I ask what you're using to do the parsing? Or did you write your own?

This is definitely the kind of thing that I could see doing with REBOL's parse operator, but Python doesn't (seem to) come with such a tool built in.

Thanks!

Xk · on Nov 12, 2010

For a class in college I had to write a parser-generator, so I'm using mine.

lisper · on Nov 12, 2010

Very nice. Any chance you'll release the code (or the grammar)?

Xk · on Nov 12, 2010

I'd like to wait a little bit to see if Rob (who it seems runs the site) will make it a little better. There are people who use this service and I don't think it would be right to let spammers attack them.

Granted, it wouldn't be hard for someone else to do the same as me, but I just don't want my work used to do harm to someone else.

jerf · on Nov 12, 2010

You might as well release it. The idea is fundamentally flawed, advantage attacker, and there is nothing they can do about it. You won't be hurting them, you'll sort of be doing them a favor. They've already sunk time into the idea, this is a chance for them to cut their time losses and run.

Though they may take it wrong and try to start an arms race with your code, to which I'd say: An arms race in the CAPTCHA space is already a win for the attackers. In case of tie, spammers win.

jaspero · on Nov 12, 2010

I think you should release your wonderful algorithm not directly as a way to hack text-captchas but in some other useful forms.

mattmanser · on Nov 12, 2010

Good job on the code and kudos for not just releasing the code without giving the author some time.

bockris · on Nov 12, 2010

It's totally hilarious to me that you had to protect your captcha breaker with a captcha.

Xk · on Nov 12, 2010

Haha, yeah. I did it so that people wouldn't use it as a spamming service: I wanted to show that those captchas can be broken, but without any harm to anyone else.

ldite · on Nov 12, 2010

You should have protected it with one of the text captchas - so to use your service to bypass it they'd have to have broken it already :)

Xk · on Nov 12, 2010

Yeah :)

But then it occurred to me that someone could solve one by hand, and then use that answer to solve five more captchas that I asked for, and then recurse, eventually solving thousands of a site's captchas.

bockris · on Nov 12, 2010

It makes perfect sense why you did it, but still funny as hell. (to me anyway)

chaosmachine · on Nov 12, 2010

Great job. Seeing this solve 10/10, it almost feels like artificial intelligence. You should send a resume to the Wolfram Alpha guys (their hit rate was significantly lower: http://news.ycombinator.com/item?id=1891375)

Xk · on Nov 12, 2010

Yeah, but I have the advantage of knowing all the forms of the questions. If you were to change one word then mine would break. Or misspell anything.

gibeson · on Nov 12, 2010

Sorry, but what do you mean by "knowing all the forms of the questions"? It seems like maybe you mean that all the questions are formatted via one of several formulas, like: name <x> of list<a..z>. That would imply your tool simply identifies the format and then applies some specific logic to solve the question. If that is what you mean, I'm unclear on why you would know all the forms of the questions. textCAPTCHA purports to have over 180 million questions, so isn't it likely you have not found all the possible formats? Or perhaps you know because you have a relationship with the textCAPTCHA creator and they provided that info to you? Or, better yet, did your comment mean something entirely different?

Just curious, thanks.

Xk · on Nov 12, 2010

You are correct, I look for a pattern and apply very very basic logic.

I just got 1,000 questions from the demo page and looked how they were similar. For example, there are many questions of the form "[list of something]: the [ordinal number] [type] is?" so I put that as a rule in the grammar file. I did this for about twenty different types of questions, each of which has three or so different phrasings.
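To make the "[list of something]: the [ordinal number] [type] is?" rule concrete, here is a sketch that uses a regular expression instead of a grammar file. It is not Xk's parser: `PATTERN`, `ORDINALS`, `CATEGORIES`, and `solve_ordinal` are hypothetical names, and the word lists are tiny stand-ins.

```python
import re

ORDINALS = {"1st": 0, "2nd": 1, "3rd": 2, "4th": 3, "5th": 4}
CATEGORIES = {
    "colour": {"pink", "white", "green", "red", "blue", "yellow", "black"},
}

# "[list of something]: the [ordinal number] [type] is?"
PATTERN = re.compile(
    r"^(?P<items>.+): the (?P<ordinal>\w+) (?P<category>\w+) is\?$"
)

def solve_ordinal(question):
    match = PATTERN.match(question.lower())
    if match is None:
        return None
    position = ORDINALS.get(match["ordinal"])
    members = CATEGORIES.get(match["category"])
    if position is None or members is None:
        return None
    # Split "a, b, c and d" into items and keep those of the requested type.
    items = re.split(r",\s*|\s+and\s+|\s+or\s+", match["items"])
    matching = [item for item in items if item in members]
    return matching[position] if position < len(matching) else None

print(solve_ordinal("Egg, pink, white, green and pub: the 1st colour is?"))
```

Anything that does not fit the pattern exactly returns `None`, which mirrors the FAILURE behaviour described in the thread.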

And yes, they may have 180 million questions, but in reality they have only a hundred or so modulo the specific words they pick (for the items in the list, or for the day of the week, or for the letter that the word starts with).

pyre · on Nov 13, 2010

  > I put that as a rule in the grammar file

What are you using to process the grammar file? What type of grammar?

Xk · on Nov 13, 2010

I wrote a parser generator, so I'm using that.

It's BNF-like.

techiferous · on Nov 12, 2010

I just typed in similar questions without following the format and it performed poorly. But at least it's a good proof-of-concept! :)

Name the third item in the list: book, fork and spoon.

A: FAILURE

Q: What is two times 8?

A:

Which is a number: yellow, seven, book, flag?

A: FAILURE

Xk · on Nov 12, 2010

They don't give questions formatted in other ways, so I didn't include that in the logic. As someone else pointed out, even deleting the question mark would break it. (However, making the question mark optional would be a two second fix.)

jamesjyu · on Nov 12, 2010

It's basically an arms race. They could inject a few misspellings and random noise here and there to throw your system off.

endtime · on Nov 12, 2010

Fuzzy pattern matching would probably solve that.
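One way this could be done is to snap each word to the closest entry in a known vocabulary before parsing. The sketch below uses the standard library's `difflib.get_close_matches`; the `VOCABULARY` list, the `normalize` helper, and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib

# Stand-in vocabulary; a real one would hold every word the grammar knows.
VOCABULARY = ["what", "which", "colour", "wednesday",
              "yesterday", "today", "digits"]

def normalize(word, cutoff=0.8):
    """Map a possibly misspelled word to its closest vocabulary entry."""
    matches = difflib.get_close_matches(word.lower(), VOCABULARY,
                                        n=1, cutoff=cutoff)
    # Leave the word alone if nothing in the vocabulary is close enough.
    return matches[0] if matches else word.lower()

print(normalize("color"))
print(normalize("wendesday"))
```

A lower cutoff tolerates more noise but risks mapping unrelated words onto vocabulary entries.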

joelvh · on Nov 12, 2010

Great job! I wrote up that post about WolframAlpha and wonder if there's an API so you could integrate WA to make it more robust?

Xk · on Nov 12, 2010

Right now there's no API or anything -- I wrote this just to see if I could. I'm not sure if I'll be putting up an API, because I figure by the time I'd do that I can just release the source for it. But if you really wanted I probably could provide an API in a couple of days.

adamc · on Nov 12, 2010

The world orange has which letter in the penultimate position? A: FAILURE

Aircraft fly through the? A: FAILURE

Q: A beetle has how many legs? A: a

It does well when it recognizes the questions, but there's still plenty of room for writing new ones. It does show that it's an arms race, as someone else noted.

Xk · on Nov 12, 2010

Very true. The person writing the questions has the advantage in this one because they can always throw in more variations and I would have to figure out all of the new ones.

There is, however, an inherent limitation to question-asking, and that is that it must be understandable to a general population. So I would switch from using a parser to just looking for key words and then matching from there. (Which is, for example, how I find names in a list.)

Even if they made it so I only get one fifth the number of correct answers, a 20% success ratio would be high enough for concern. The attackers can just keep trying and trying and if 20% of the time they can register a new bot, then they win. The defender must win a much larger percent of the time.
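A rough illustration of why 20% is enough, assuming each attempt is independent and succeeds with the same probability. Both are simplifying assumptions, not measurements, and the function name is hypothetical.

```python
def chance_of_at_least_one_success(p, attempts):
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts

# With a 20% solve rate, ten tries succeed at least once about 89% of the time.
print(round(chance_of_at_least_one_success(0.2, 10), 4))

# On average the attacker needs 1 / p attempts per registered bot.
print(1 / 0.2)
```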

BoppreH · on Nov 13, 2010

First question could be "air" or "sky". I would guess 6 or 8 to the second one.

I would probably fail both, so I guess this isn't the kind of question that would realistically appear.

harshpotatoes · on Nov 12, 2010

Given the recent wave of reinventing/breaking captchas, I saw the captcha at the end, and thought: "what if this free app is just a very clever way to get a bunch of intelligent people to answer captchas for a bot?"

Then I realized if somebody went through all this trouble to design a new app like this, they deserve to have me answer captchas for a bot somewhere.

Q: What nonsense word do the letters G-B-R-D spell? A: gbrd

Q: Which three letter word starts with the letters TH and ends with the letter E? A: th

Pretty neat.

Xk · on Nov 12, 2010

Haha, no, I don't have this answering questions for any bot anywhere.

Actually, the reason those are solved isn't exactly what you'd expect. When it fails to find an answer, it checks to see if any word is upper-case. If it is, it cuts out all non-letters (to remove the comma at the end, for example) and then returns that word. The reason it does this is that the service will occasionally ask questions such as "Which word IN this sentence is in all caps." So that's why G-B-R-D gets turned into 'gbrd', and why it says 'th'.
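A sketch of that upper-case fallback, under the assumption that "word" means a whitespace-separated token. `uppercase_fallback` is a made-up name, not taken from the real solver.

```python
import re

def uppercase_fallback(question):
    """Return the first all-caps token, stripped of non-letters and lower-cased."""
    for token in question.split():
        # Drop punctuation and hyphens, e.g. "G-B-R-D" -> "GBRD".
        letters = re.sub(r"[^A-Za-z]", "", token)
        if letters.isupper():
            return letters.lower()
    return None

print(uppercase_fallback("What nonsense word do the letters G-B-R-D spell?"))
print(uppercase_fallback("Which three letter word starts with the letters TH and ends with the letter E?"))
```

If the real fallback works like this, it might also explain the answer "a" to "A beetle has how many legs?" reported elsewhere in the thread, since a sentence-initial "A" counts as an all-caps word.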

bl4k · on Nov 12, 2010

As soon as I saw the story about textCAPTCHA I knew somebody would tackle it and solve it.

There have been so many 'captcha alternatives' on HN recently, and all of them forget that there is a very good reason why current captchas are extremely distorted images.

moshezadka · on Nov 12, 2010

This is a variant on Schneier's fundamental theorem: Any fool can design a CAPTCHA for which they cannot program a solver. Just like in crypto, the only true test of CAPTCHAs is which ones survive the test of time after having been attacked again and again, which is why it's very dangerous to jump on the bandwagon of a new CAPTCHA scheme (or worse yet -- design your own).

This is one reason why I'm partial to reCAPTCHA: there is a lot of experience in OCR systems, and we know what the current state of the art is -- and we know what kind of things foil it.

wwortiz · on Nov 12, 2010

The only problem is you get things like google's captcha system that make you question whether or not you are human.

l0nwlf · on Nov 12, 2010

Tried: Q: What does Python prompt look like ?

Desired Answer: >>>

Answer Given: None.

jmatt · on Nov 12, 2010

I like this sort of approach for community oriented captcha. Ask a question that anyone within the community would know or could easily find out but a general purpose spam bot or average person would be unable to solve.

If the community is profitable enough or becomes big enough, some spammer will spend 15 minutes or 50 dollars on Mechanical Turk and find all the answers. As others have mentioned, it is an arms race in the end, with each side going back and forth and upping the challenge.

philfreo · on Nov 12, 2010

Care to give a little more explanation on how it works?

Xk · on Nov 12, 2010

Sure.

I'm assuming you know what parsing is, and what syntax-directed translation is. If not, the wikipedia articles (http://en.wikipedia.org/wiki/Parsing and http://en.wikipedia.org/wiki/Syntax-directed_translation) offer a reasonable explanation.

The grammar file is generally laid out as follows:

Question ::= Phrase1 | Phrase2 | Phrase3 ...

Phrase1 ::= 'if' 'the' Noun 'is' Color 'what' 'color' 'is' 'it' [and now return the word which matched Color]

Noun ::= [sequence of characters]

Color ::= 'red' | 'blue' ...

The trickiest part of it was getting it to correctly interpret numbers like 'twenty one thousand eight hundred and ninety nine'

That part is basically laid out as

Number ::= Number QuantifiedNumber | Number 'and' QuantifiedNumber | QuantifiedNumber

QuantifiedNumber ::= NumberGroup | NumberGroup Quantifier

NumberGroup ::= SingleNumber SingleNumber | SingleNumber

SingleNumber ::= 'one' | 'two' | ... 'twenty' | 'thirty' | ...
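The number rules above can be approximated without a parser generator. The sketch below is not the grammar file: it is a hand-written loop that follows the same grouping (one or two SingleNumbers, an optional Quantifier, groups summed). It ignores 'and' wherever it appears, which is looser than the grammar, and the names `SINGLE`, `QUANTIFIER`, and `words_to_number` are invented for the example.

```python
UNITS = ("zero one two three four five six seven eight nine ten eleven "
         "twelve thirteen fourteen fifteen sixteen seventeen eighteen "
         "nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

SINGLE = {word: value for value, word in enumerate(UNITS)}
SINGLE.update({word: 20 + 10 * n for n, word in enumerate(TENS)})
QUANTIFIER = {"hundred": 100, "thousand": 1000, "million": 1_000_000}

def words_to_number(text):
    tokens = [t for t in text.lower().split() if t != "and"]
    total = 0
    i = 0
    while i < len(tokens):
        # NumberGroup ::= SingleNumber SingleNumber | SingleNumber
        if tokens[i] not in SINGLE:
            raise ValueError("unexpected word: " + tokens[i])
        group = SINGLE[tokens[i]]
        i += 1
        if i < len(tokens) and tokens[i] in SINGLE:
            group += SINGLE[tokens[i]]
            i += 1
        # QuantifiedNumber ::= NumberGroup | NumberGroup Quantifier
        if i < len(tokens) and tokens[i] in QUANTIFIER:
            group *= QUANTIFIER[tokens[i]]
            i += 1
        # Number ::= sum of QuantifiedNumbers
        total += group
    return total

print(words_to_number("fifty three thousand seven hundred and one"))
```

Like the grammar, it trusts the input to be well formed: 'one one and two' evaluates to 4, and stacked quantifiers such as 'three hundred thousand' are not handled.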

finemann · on Nov 12, 2010

These questions didn't work for me

Q: How many thousands is a million?

Q: How many millions is a billion?

Q: There is a dog, cat and a goat: First animal?

Q: There is a dog, cat and a goat: How many humans?

Q: Black, White: Which is dark?

Xk · on Nov 12, 2010

There are a few types of questions that it doesn't solve.

I only had so much time yesterday and so I didn't put in all the different types of questions. Maybe later I'll go back and get all the questions that it can't solve.

Jach · on Nov 12, 2010

Nice little app, but here are two I've used it fails on:

Please join these two "words" together (without spaces): zrtvyoav and ekozuarn A: FAILURE

What starts with "bow" and ends with "ser"? A: FAILURE

binarymax · on Nov 12, 2010

Very impressive! 4/5 for me.

Here's the only one that broke:

          The 6th letter in "aviator" is?
          A: FAILURE

Xk · on Nov 12, 2010

Just letting people know I dropped it from 10 questions down to 5 because right now it's a free google app and I'd rather not pay for CPU-time.

chacha102 · on Nov 12, 2010

Failed on "The blue rainjacket is what color?". You might want to allow for both 'color' and 'colour'.

c4urself · on Nov 12, 2010

I noticed the form matters: not including the question mark breaks it for "two plus fifteen equals?"

amanuel · on Nov 12, 2010

* What is 3 + 12 / equal too? FAIL

9ec4c12949a4f3 · on Nov 12, 2010

Congrats, I suspected a higher cost of defeating this test.

http://news.ycombinator.com/item?id=1891026