Using WolframAlpha to Hack Text CAPTCHA (joelvanhorn.com)
104 points by joelvh 2259 days ago | 46 comments

I was curious about the "text captcha" service. It's a collection of questions paired with MD5 sums of acceptable answers.

They provide an API, but I think this is a case of a project being a "service" to keep the database of questions from being free. There's no technical reason for this to be a service, and it's not a terribly complicated product that would be difficult to scale. It's a static database!
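For reference, verifying against such a database is trivial. Here is a minimal sketch in Python, assuming the service publishes each question alongside MD5 hex digests of the acceptable answers and that answers are lowercased before hashing (the exact normalization is an assumption):

```python
import hashlib

def check_answer(user_answer, answer_hashes):
    """Accept if the MD5 of the normalized answer matches any stored hash."""
    digest = hashlib.md5(user_answer.strip().lower().encode()).hexdigest()
    return digest in answer_hashes

# hypothetical question with two acceptable answers, "4" and "four"
hashes = {hashlib.md5(a.encode()).hexdigest() for a in ("4", "four")}
print(check_answer("Four", hashes))  # → True
```

There's nothing here that needs a remote service; the whole database could ship as a flat file of question/hash pairs.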

Might be neat to create an open-source bank of these CAPTCHA questions. Maybe I'll throw something together this weekend.

Wouldn't an open-source bank of CAPTCHA questions open the door for an open-source bank of answers to these questions?

Yeah, exactly. I think this is why it has to be a service, and I assume it's not a static database for that reason. If you had a fixed list of questions, it would be easy to get the answers to them once and never have to do it again. I think it's a safe assumption that these are in some way generated on the fly.

If you were able to analyze the sentence structure of all 180 million questions, how many different sentence structures would there be? This all points to the fact that you can build algorithms to guess the answers eventually.

Not even just guess them but accurately determine them.

A few years back I was hired by a third party to build a system to break the CAPTCHA on a popular site for various evil deeds. Morals set aside, the money was good and I had a wedding to pay for. A CAPTCHA system becomes quite breakable when it becomes predictable. The system in question used an image based CAPTCHA that used the same (albeit annoying) font for each image, as well as a static distortion overlay and a second set of random distortion. By extracting a thousand sample images I was able to build a system in Perl that could determine the text with an estimated 98% success rate - and when it failed you would just request a new CAPTCHA.
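The approach described (strip the known static overlay, then match each glyph against templates built from the labeled samples) can be sketched as follows. Everything here, including the pure-Python pixel lists and the `solve`/`best_match` helpers, is illustrative rather than the original Perl code:

```python
def best_match(glyph, templates):
    """Pick the character whose template differs least from the glyph,
    comparing pixel by pixel (glyphs are flat lists of pixel values)."""
    def diff(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return min(templates, key=lambda c: diff(glyph, templates[c]))

def solve(glyphs, templates, static_overlay):
    # subtract the known static distortion before classifying each glyph
    cleaned = [[max(p - o, 0) for p, o in zip(g, static_overlay)]
               for g in glyphs]
    return "".join(best_match(g, templates) for g in cleaned)
```

With a fixed font and a fixed overlay, the only remaining noise is the random distortion layer, which nearest-template matching tolerates well; that's why predictability, not OCR difficulty, is what broke the system.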

My solution would be to mix up images with logic, e.g.:

In the following list of images, which image number contains the green animal: {pic of zebra}, {pic of frog}, {pic of giraffe}

This would require image recognition as well as logic.

Interestingly enough, WolframAlpha can generate a CAPTCHA image of each of these text questions, so as to make it harder for a bot to both decode AND answer the question! Check it out: http://www.wolframalpha.com/input/?i=CAPTCHA+What+is+seven+h...

It can't work the way you explained it: I just solved your CAPTCHA with a 33% success rate (waaay too high for a useful CAPTCHA). Perhaps it could work if you asked for "the three pictures of X animal out of those 9", and you had a database of which animal has which property (and you also ran some fuzz over the images so no two images would ever be the same). I'm still skeptical...

That would assume that it is multiple choice - however if it's still free form text input, requiring the input to equal "frog" would solve that issue. Text + images + logic would offer a lot more hurdles than just any single one of those.

No, that's the reason the answers are hashed. You can't get the answer from the hash, since a hash is a one-way function. This is the same reason you never store passwords in your database in plaintext, but rather hash them first.

As elliotcarlson pointed out, the issue isn't that answers can be determined automatically; it's that the cost of determining the answers can be amortized over all uses. It's the same vulnerability as rainbow tables: you spend a lot of (automated) effort computing hashes for password guesses once, and the result is then applicable to every naive use of that hash function.

The amount of effort for a human to go through the list of questions and come up with answers may be non-trivial, but once completed it's applicable to every single use of the plain-text CAPTCHA system. That's bad.

Mechanical Turk

Still, almost all (or all?) of the answers are excerpts of the question. So just test all short excerpts against the hashes, voila, an answer key.
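That excerpt attack is only a few lines. A sketch, assuming lowercased answers hashed with MD5 as the earlier comments suggest:

```python
import hashlib
import re

def crack(question, answer_hashes, max_words=3):
    """Hash every short run of consecutive words from the question
    and compare against the published answer hashes."""
    words = re.findall(r"[a-z0-9]+", question.lower())
    for n in range(1, max_words + 1):
        for i in range(len(words) - n + 1):
            guess = " ".join(words[i:i + n])
            if hashlib.md5(guess.encode()).hexdigest() in answer_hashes:
                return guess
    return None

# hypothetical question whose accepted answer is "zoologist"
hashes = {hashlib.md5(b"zoologist").hexdigest()}
print(crack("Which word contains z: zoologist, midwifery, crimps?", hashes))
# → zoologist
```

For a question of ~10 words there are only a few dozen short excerpts, so one API round trip plus a handful of MD5 computations yields the answer key.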

That's not the issue. As soon as you make a list of questions available to the world, all it takes is one spammer to create a matching list of answers and they can go to town. By providing that list of questions as open source, you are making it easier for someone to create the counterpart answer database.

CAPTCHAs are not restricted to reddit and social news sites, despite what your link claims.

Oh, that was just to make fun of xkcd. I don't think xkcd links are really that good an addition to a discussion here.


Ha! Valid point!


Just tell me if you need the source code ;P

Also think about the way algorithms (like WolframAlpha) interpret the structure of the questions. Like some of the other commenters, switching some words around makes WolframAlpha fail.

It might be interesting to come up with a methodology for question structure that is harder for algorithms to interpret...?

This is a very interesting application of WolframAlpha, but it appears to be pure luck whenever "success" was the result. Phrasings such as "2nd item in a..." or "7th digit in..." work in a lot of cases, but let's talk about a few.

"2nd fruit in bear apple goat orange" would result in apple, because it looks for the second item in a list and neglects the context of fruit.

"7th digit in abc123def456ghi789" would result in d when it should be 7. Again, it doesn't understand context; it merely looks at logical construction.

I immediately wondered the same thing. An example from their questions was:

Q: "Which word contains 'z' from the list: zoologist, midwifery, spiderweb, crimps?" A: "zoologist"

But what if you change up that list a bit?

Q: "Which word contains 'z' from the list: action, jackson, midwifery, zoologist, spiderweb, crimps?" A: "jackson"

Alpha's sentence digestion has always left me going: huh?

> "2nd fruit in bear apple goat orange"[1], "7th digit in abc123def456ghi789"[2]

It barfs on these ones, but not like you predict (it's actually worse). "The 2nd colour in purple, belly, yellow, arm, white and blue"[3] gives back yellow, though, so it's not that stupid.

[1] http://www.wolframalpha.com/input/?i=The+2nd+fruit+in+bear+a...

[2] http://www.wolframalpha.com/input/?i=7th+digit+in+abc123def4...

[3] http://www.wolframalpha.com/input/?i=The+2nd+colour+in+purpl...

Oh, not that stupid, huh?

Query: The 2nd colour in purple, belly, yellow, arm, white and blue

Answer: yellow

Query: The 3rd colour in purple, belly, yellow, arm, white and blue

Answer: yellow

Query: The 7th colour in purple, belly, yellow, arm, white and blue

Answer: yellow

Query: The bluest colour in purple, belly, yellow, arm, white and blue

Answer: yellow


If you change yellow to silver, it gives you "colour silver," but if you look at the assumptions, it assumes silver is an element. Clearly it's not really getting the meaning of the sentence, and the fact that it's picking "yellow" is mostly coincidental, I think, as is also illustrated by your examples.


I wonder what its favorite colour is ;-)

That's according to Python. Ask about the favorite color of Wolfram Alpha:



Right. And as a little experiment, the results I got were surprising. It also implies that it is not hard to structure sentences in a way that makes it harder for WolframAlpha's algorithms to get the right answer. But doing that for 180 million questions? I wonder what the success rate using only WolframAlpha would be on the whole data set.

Great discussion.

To clarify, there is not a precomputed DB of an enormous number of questions, although this would be possible to derive, and it does occur at a lower level for caching/performance purposes. The total count comes from permutation/probability maths based on the question-construction algorithm: when you request a question, it generates one on the fly, which means I can extend the pool quite easily without regenerating a monster cache table.

It is impressive how good Wolfram is at decoding logic. I'll have to think about the question construction, but I can't make the questions too confusing for a real person to solve. As someone mentioned, maybe more abstract questions would be stronger, but the difficulty of course lies in generating them. I certainly think logic questions are weaker than a decently obscured/randomised image CAPTCHA, but they come with other advantages and work in text-only contexts (e.g. IM-type challenges).

Some ALMOSTs could be turned into SUCCESSes with a few postprocessing rules-of-thumb, like:

- the CAPTCHA usually wants a single word or number

- the desired word is usually the rarer or later one
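Those rules of thumb could be sketched as a reranker over Alpha's candidate answers. The `rerank` helper below is hypothetical, and it uses position within the question as a crude stand-in for "rarer or later":

```python
def rerank(candidates, question):
    """Prefer single-word/number candidates; among those, prefer the one
    appearing later in the question (a rough proxy for the rarer word)."""
    def score(ans):
        single = 1 if len(ans.split()) == 1 else 0
        return (single, question.lower().find(ans.lower()))
    return max(candidates, key=score)

q = "The 2nd colour in purple, belly, yellow, arm, white and blue is?"
print(rerank(["purple", "yellow"], q))  # → yellow
```

Even two heuristics like these can convert a lot of near misses, because the question generator's output is so regular.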

Exactly. The whole discussion here points to the major flaws in using text as CAPTCHA. Maybe if questions were more metaphorical they would be harder to guess, but then there is no absolute answer. I think a combination of text, image, and logic would be hardest to break.

It's still a game of guesswork. Generally, if you fail the CAPTCHA you will be offered a new one, and any good system should lock you out after a certain number of failures.

Assume spammers are using botnets. The problem of locking them out is as hard as detecting them in the first place.

At the end of the day, this is why I don't use CAPTCHAs on my site: they can be broken by any bot that is smart enough. The better option is to analyse the contents of the submission to decide what it really is, and there are some tools that are really good at doing this. Heck, I even found for one form that banning "http://" (and notifying users with JavaScript if they typed it) stopped 100% of the spam I was getting.

I am not sure I understand what he is saying. All the "results" he is talking about are just the interpreted inputs for the application to process.

So for "What is seven hundred and forty four as a number?", the interpreted input is a NumberQ function taking the main part, "seven hundred and forty four as a number", and evaluating whether it is a number or not. The real result is True.

The zoologist one has already been talked about. The rest other than the 7th digit question are all false.

There are many different choices for the inputs, for example with the colour question

The 2nd colour in purple, yellow, arm, white and blue is?

There seems to be some popularity ranking going on. The first choice of input is yellow and the second choice is blue. To test further, replacing yellow with black leads to blue as the first choice. Then again, even if you were to use the interpreted inputs, you would have to determine the syntax for Wolfram, which last time I checked is not documented and is basically a guess-the-syntax game.


If someone would care to enlighten me on how this could actually work I would greatly appreciate it, otherwise this method does not seem like it will work. Nice creativity though.

Even a 1/10 hit rate is sufficient. You get 10 questions from the website, return the results and one of them leads to your spam comment being posted. You then repeat that process thousands of times.
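The arithmetic behind that bears out: with independent retries, the chance of at least one success compounds quickly, as a quick Python check shows:

```python
def success_after(hit_rate, attempts):
    """Chance of at least one pass given independent tries."""
    return 1 - (1 - hit_rate) ** attempts

print(round(success_after(0.1, 10), 3))  # → 0.651
```

So a solver that is "only" 10% accurate passes roughly two out of three CAPTCHAs within ten retries, which is why per-attempt accuracy alone is a misleading security metric.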

After testing alternate versions of the three successes (to see if those were based on luck), only two remain successful: changing "The 2nd colour in purple, yellow, arm, white and blue is?" to "The 2nd colour in purple, arm, yellow, house, white and blue is?" causes the question to fail.

Good point. That gives a good indicator as to how the algorithm works. Not necessarily based on colors, but rather words in a list...? Maybe the construction of Text CAPTCHA sentences needs to be chosen carefully when thinking like an algorithm....

If someone would like to try some more questions and can't be bothered to write a screen scraper for the demo page, I scraped a few questions a while back:


Thanks for doing some extraction!

Using your file I've been building up a solver for these questions in Prolog (using DCGs for parsing and simple predicates for the common sense facts and answer calculation). It's nowhere near "done", but it does get 45% of the questions now after only a few hours and 450 lines of code.

The trick to factoring someone else's generative space is to spot the symmetries and build yourself a little domain-specific language for explaining those symmetries to your program.

Here are some snippets:




    ordinal(2) --> lit('2nd').

    question( tomorrow -> Answer) -->
        p(['Tomorrow is ',token(Tomorrow),'. If this is true, what is today?']),
        { tomorrow(Answer,Tomorrow) }.

    question(count_2(P) -> Answer) -->
        p(['The list ',token_list(L),' contains how many ',pred(P),'?']),
        {include(P,L,Goods), length(Goods,Answer)}.

    pred_name('something each person has more than one of',plenty).

The idea: randomly pick one word from the question and submit it as the answer. It seems from the demo page that most answers are already in the question. Suppose 50% of questions contain their answer, and suppose the average question is about 10 words long; then you have a 5% chance of getting it right per attempt, and with a computer that is easy to automate.
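That baseline attack is a couple of lines (a sketch; the 50% and 10-word figures are the parent comment's estimates, not measurements):

```python
import random

def guess(question):
    """Baseline attack: answer with a random word from the question itself."""
    words = [w.strip(",.?!:;") for w in question.split()]
    return random.choice(words)
```

Combined with the retry math discussed above, even this zero-knowledge strategy eventually gets through, which is the core weakness of answers drawn from the question text.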

I'm mostly impressed that someone found a use for WolframAlpha.
