
TextCAPTCHA: 180 million simple logic questions - joshwa
http://textcaptcha.com/demo
======
moxiemk1
Since there are 180 million of them, presumably they were generated by a
computer. Then they probably fit into a finite number of patterns. If these
could be determined, wouldn't this be a rather easy captcha system to crack?

Most captchas depend on how hard it is to reverse the transform applied to an
image, especially when you don't know what the transform is. Here, the forms
seem pretty regular, and the "transform" of inserting words is discrete rather
than continuous, so it's a bit easier to reverse.

~~~
mike-cardwell
Yeah, the 180 million number is irrelevant. What's important is the number and
variety of patterns, not the number of questions.

I don't even need to program something that understands all of the patterns. I
just need to program something that understands _some_ of the patterns and
keeps fetching captchas until it gets one in a format it recognizes, e.g. "What
is X plus Y?"

This captcha system is considerably simpler to break than the typical
image-based ones.
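A minimal Python sketch of the approach described above: a bot that recognizes just one template, "What is X plus Y?", and returns None for everything else so it can simply fetch another question. The regex, the exact template wording and the small `NUMBER_WORDS` table are assumptions for illustration, not TextCAPTCHA's actual question format.

```python
import re
from typing import Optional

# Hypothetical lookup table; a real bot would cover every number word it meets.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

# Assumed template: "What is X plus Y?" (also accepts a literal "+").
PLUS_PATTERN = re.compile(r"^what is (\w+) (?:plus|\+) (\w+)\?$", re.IGNORECASE)


def to_int(token: str) -> Optional[int]:
    """Convert a digit string or a known number word to an int."""
    token = token.lower()
    if token.isdigit():
        return int(token)
    return NUMBER_WORDS.get(token)


def solve_plus(question: str) -> Optional[str]:
    """Return the answer if the question fits the template, else None."""
    match = PLUS_PATTERN.match(question.strip())
    if not match:
        return None  # unknown pattern: skip it and request a new captcha
    a, b = (to_int(t) for t in match.groups())
    if a is None or b is None:
        return None
    return str(a + b)
```

For example, `solve_plus("What is ten + 1?")` returns `"11"`. Covering more templates is just a matter of adding more patterns like this one.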

------
azim
I tried breaking this captcha. Here are some experimental results and
mathematics:

By applying the mathematics from the Birthday Attack
(<http://en.wikipedia.org/wiki/Birthday_attack>), if an attacker is able to
solve 15.8 million of the 180 million captchas, there will be a 50%
probability that the attacker can beat the captcha.

I refreshed the page 10 times, generating a total of 100 captchas. Of those, I
observed 8 arithmetic problems, which I entered into Wolfram Alpha and solved.
That gives roughly the 15.8m/180m ratio necessary to break the captcha with 50%
probability.

At 50% probability, again going back to the Birthday Attack mathematics, an
attacker would need roughly 16.8 thousand tries before expecting a collision
with one they could break.

This probability will increase if an attacker is able to successfully reverse-
engineer more patterns.

Edit: thinking about this more after MichaelGG's comment, I think my math is
incorrect. Either way, point still stands that Wolfram Alpha can successfully
solve 8% of the captchas and other patterns should be solvable by other means
too.

~~~
MichaelGG
Can you explain how the birthday paradox applies here?

Just looking at it simply, if you can solve 15.8 of 180, that means that for
any given test you should have an 8.77% chance of solving it (6 tests for >
50%). What am I doing wrong?
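Treating each captcha as an independent trial, the chance of beating at least one of n captchas is 1 - (1 - p)^n. A small sketch to check the ">50%" figure, taking p = 15.8/180 from the comment above (everything else is plain arithmetic). It suggests about 8 attempts rather than 6 are needed, because success probabilities don't simply add.

```python
import math


def p_at_least_one(p: float, n: int) -> float:
    """Probability of solving at least one of n independent captchas."""
    return 1 - (1 - p) ** n


def attempts_needed(p: float, target: float = 0.5) -> int:
    """Smallest n such that p_at_least_one(p, n) >= target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


p = 15.8 / 180  # per-captcha success rate from the comment above
print(f"{p:.4f}")                     # 0.0878
print(f"{p_at_least_one(p, 6):.3f}")  # 0.424
print(attempts_needed(p))             # 8
```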

Also, it looks like some of the other questions are easy to automate, for
example "how many letters are in the word 'whatever'".

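As a rough illustration of how little effort the letter-count question takes, here is a sketch. The exact phrasing matched by the regex is an assumption based on the example above.

```python
import re
from typing import Optional

# Assumed phrasing; accepts single or double quotes around the word.
LETTERS_PATTERN = re.compile(
    r"how many letters are in the word ['\"](\w+)['\"]", re.IGNORECASE
)


def solve_letter_count(question: str) -> Optional[str]:
    """Answer 'how many letters' questions; return None for anything else."""
    match = LETTERS_PATTERN.search(question)
    if not match:
        return None
    return str(len(match.group(1)))
```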
~~~
baddox
A Birthday Attack relies on the fact that even for some rare events (such as
two random people having the same birthday), when there is a large opportunity
to observe the rare event (such as 30 random people in the same room) it's
actually quite likely to observe the event.
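The classic birthday calculation, assuming 365 equally likely birthdays and ignoring leap years. Note that it measures the chance of *any two* samples colliding, which is why it doesn't map directly onto passing one particular captcha.

```python
def shared_birthday_probability(people: int, days: int = 365) -> float:
    """Chance that at least two of `people` share a birthday (uniform, no leap years)."""
    p_all_distinct = 1.0
    for k in range(people):
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct


print(f"{shared_birthday_probability(23):.3f}")  # 0.507
print(f"{shared_birthday_probability(30):.3f}")  # 0.706
```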

------
amih
Some of the questions don't have a single, globally correct answer, for
example: which day is part of the weekend, Sunday, Friday or Monday? Where I
live (Israel), Friday is part of the weekend, whereas I bet the creator of the
list lives in the USA and, as often happens, believes USA==World, so the
"correct" answer is probably Sunday.

~~~
Einh
Stop bombing Palestinian children and we'll start giving a fuck about what you
call the weekend.

~~~
bmm6o
After lurking for 80 days, this is what you choose for your first comment?

------
patio11
I would be interested to see what the completion rate for this is versus,
e.g., the Yahoo captcha. My intuition is "not that great." (You require
reading on the Internet... uh oh.)

By the way, picking one token from the captcha and returning it beats the
captcha 7% of the time, if the examples are representative. Spammer wins,
since he can generate requests by the hundreds of thousands.
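A sketch of the "pick one token" attack. The example question is made up, and the 7% rate is only the estimate from the comment above, used here as an input to show why volume favours the spammer.

```python
import random
import re


def random_token_guess(question: str, rng: random.Random) -> str:
    """Naive attack: answer with a randomly chosen word or number from the question."""
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    return rng.choice(tokens).lower()


def expected_passes(success_rate: float, requests: int) -> float:
    """Expected number of captchas beaten after `requests` independent attempts."""
    return success_rate * requests


question = "Cat, Sunday, house or pink: the colour is?"  # made-up example
print(random_token_guess(question, random.Random(42)))
print(expected_passes(0.07, 100_000))  # about 7000 at the 7% estimate
```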

~~~
dolinsky
I also noticed that certain questions aren't necessarily 'easy'.

> The 1st number from 25, eight, 6, six and 27 is?

So is the answer 25 or 6?

I've come to the realization that CAPTCHAs aren't the solution, or at least
can't be a standalone solution. Make the CAPTCHA easy enough for a human to
not be blocked (pick the cat from these 3 photos) and the bot still wins 33%
of the time. Make it hard enough that the user has to invest energy to 'solve'
the problem in front of them and you alienate users by treating them like
criminals.

~~~
pitdesi
I don't get how the answer could be 6, though I do agree with you on the
paradox of CAPTCHAs

~~~
buro9
Interpretation 1: It's a string list and pick the first element = 25.

Interpretation 2: It's a list of numbers, and numbers, being ordered by value,
have an implicit sort applied to them; pick the first element in that sorted
sequence = 6.

#2 is a very programmer thing to do ;)

~~~
dolinsky
you get a cookie :)
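Both readings of "The 1st number from 25, eight, 6, six and 27 is?" can be computed from the same parsed list. In this sketch the comma/"and" splitting rule and the tiny `NUMBER_WORDS` table are assumptions that only cover this example.

```python
from typing import List

# Hypothetical lookup covering just the words in the example question.
NUMBER_WORDS = {"six": 6, "eight": 8}


def parse_number(item: str) -> int:
    """Digits or a known number word; raises KeyError for unknown words."""
    item = item.strip().lower()
    return int(item) if item.isdigit() else NUMBER_WORDS[item]


def split_list(text: str) -> List[str]:
    """Split 'a, b, c and d' into ['a', 'b', 'c', 'd']."""
    head, sep, last = text.rpartition(" and ")
    if not sep:
        head, last = text, ""
    parts = [p.strip() for p in head.split(",")]
    return parts + ([last.strip()] if last else [])


values = [parse_number(i) for i in split_list("25, eight, 6, six and 27")]
print(values[0])          # interpretation 1, position in the list: 25
print(sorted(values)[0])  # interpretation 2, smallest value: 6
```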

------
nkohari
I understand the importance of CAPTCHAs, but I wouldn't put anything that
required a reasonable level of thought in between my users and something I
wanted them to do (for example, buy something from me). The more complex
CAPTCHAs get, the less likely users are to try to complete them.

~~~
Vivtek
It strikes me that the specific example of _buying something_ is generally
sufficient confirmation of identity even without a Captcha. Not that your
point isn't valid.

------
binarymax
Great concept, but some of the easier ones are very susceptible to an
automated solve.

For example: "What is ten + 1?" ...in Bing:
[http://www.bing.com/search?setmkt=en-US&q=What+is+ten+%2...](http://www.bing.com/search?setmkt=en-US&q=What+is+ten+%2B+1%3F&cc=gb)
...in Google:
[http://www.google.com/#sclient=psy&hl=en&q=what+is+t...](http://www.google.com/#sclient=psy&hl=en&q=what+is+ten%2B1%3F&aq=f&aqi=&aql=&oq=&gs_rfai=&pbx=1&fp=fc8a743f8bb10773)

------
mitko
Trying all the words in the captcha one by one has a good chance of "hitting"
the correct answer. If it doesn't, a brute-forcer can just request a new
captcha until it works.

That said, they don't seem very spam-proof to me.

For more on how hard CAPTCHAs need to be, read Luis von Ahn's papers:

<http://www.cs.cmu.edu/~biglou/>

~~~
mike-cardwell
I get the impression that about half of the questions are list-based, and the
questions are about 10 words long. If that's the case, then using the
algorithm you mentioned, you have a 1 in 20 chance of getting the captcha
right. So yeah, it's a completely useless captcha system. That number should
be 1 in a million or higher, not 1 in 20.

~~~
mvalle
And it's probably not the first or last word. And if it precedes a ',', it's
more likely to be the answer. There are many ways to increase the hit rate.

~~~
mike-cardwell
Sure. I know my calculation was very rough, but my point is, if I'm not at
least 2 orders of magnitude out, then the captcha system is very very bad.
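The two heuristics from this exchange (skip the first and last word, favour words followed by a comma) can be sketched as a ranking function. The scoring weights and the example question are made up; the point is only that simple rules push the likely answers to the front of the candidate list.

```python
import re
from typing import List


def ranked_candidates(question: str) -> List[str]:
    """Order candidate answers using two simple heuristics (assumed weights)."""
    # Pair each word with the punctuation/whitespace that follows it.
    pairs = re.findall(r"([A-Za-z0-9]+)([^A-Za-z0-9]*)", question)
    middle = pairs[1:-1]  # heuristic 1: skip the first and last word
    scored = []
    for position, (word, trailing) in enumerate(middle):
        # heuristic 2: a word directly followed by a comma is likely a list item
        score = 2 if trailing.strip().startswith(",") else 1
        scored.append((-score, position, word.lower()))
    return [word for _, _, word in sorted(scored)]


print(ranked_candidates("Which of these is a colour: dog, pink, house or ant?"))
```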

------
blahedo
I've thought about this issue before (and proof-of-concepted a similar system,
see <http://www.blahedo.org/botblock/>), and came to similar conclusions, but
there's an important difference:

A crucial part of making this a successful anti-spam system is that it is a
moving target. _Every user of the system must be able to write their own
questions._ If that happens, the spammer's task is intractable. But if there
is a central site serving these, it will be worth the spammers' while to just
hardcode the patterns and write a little bit of logic to parse and answer
them.

Now, there's a fair bit of interesting UI design in the question of "how do I
get a non-programmer to write what is in essence a very small program". My
proof of concept used some cute Perl-isms to basically construct a mini-
language that was restricted enough that an inexperienced programmer could
"script kiddie" their way through it, and I think this is the right general
direction, but you'd need a fair amount of work to really make it accessible
to the masses.

(Other crucial points that he gets right: it must be text based; it must have
questions that hinge on natural language understanding but not be otherwise
difficult; and it must have questions that are really question templates each
of which can generate infinite numbers of question instances.)
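A question template in this sense can be as small as a function that returns a fresh (question, answer) pair on every call. This Python sketch is not blahedo's Perl mini-language; the word lists are hypothetical, and a template shared across many sites is exactly what a bot author would learn, which is the argument above for per-site questions.

```python
import random
from typing import Tuple

# Hypothetical word lists; a site owner would supply their own.
COLOURS = ["red", "blue", "green", "pink", "yellow"]
DISTRACTORS = ["dog", "house", "elbow", "cheese", "Monday", "trousers"]


def colour_question(rng: random.Random) -> Tuple[str, str]:
    """One template -> a fresh (question, answer) instance on every call."""
    answer = rng.choice(COLOURS)
    options = rng.sample(DISTRACTORS, 3) + [answer]
    rng.shuffle(options)
    question = f"Which of {', '.join(options[:-1])} or {options[-1]} is a colour?"
    return question, answer


rng = random.Random()
for _ in range(3):
    print(colour_question(rng))
```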

~~~
lotharbot
I've often wanted my own text-based CAPTCHA for a video game website I run.
I'd ask things like _"What is the name of the purple weapon?"_ or _"How many
shields do you start with?"_ People who actually play the game could nail
questions like that, while bots would be up a creek.

~~~
darinpantley
What if one of your fans created a simple bot specifically designed to answer
your admittedly easy questions?

~~~
lotharbot
Then he would be a douchebag... and probably a huge moron, too.

Who creates a bot specifically to overcome the CAPTCHA on a forum for a 15
year old video game with very little traffic? We're not really a significant
target; I only have to ban about one spambot per week. The ROI from spamming our
forum is tremendously low; I can't imagine it'd be worth anyone's time to even
attempt to incorporate it into their CAPTCHA-breaking bot.

------
megamark16
Very cool, this is one of my favorite types of captchas, because I don't have
to sit and squint at the screen trying to figure out what the heck I'm
supposed to type. Is it an I, or a 1? Is it an S or a 5?

~~~
jerf
The problem is, it turns out computer programs feel _exactly the same way_.

------
vladev
I actually wrote something similar at <http://stopam.com>. I've never been
brave enough to announce it officially.

------
spc476
At one point I was getting spammed through a contact form
(<http://hhgproject.org/contact.cgi>), so I added two text-based checks: a
single question (that anyone visiting that particular page should know) and a
hidden field (via CSS) that should _not_ be changed. I haven't received a spam
since.
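The server-side code isn't shown here, so this is only a guess at the shape of such a two-part check. The field names `answer` and `website`, the empty default and the placeholder answer are all hypothetical.

```python
from typing import Mapping

# Hypothetical field names and answer; not taken from the actual form.
QUESTION_FIELD = "answer"
HONEYPOT_FIELD = "website"  # hidden with CSS, so humans never touch it
HONEYPOT_DEFAULT = ""       # value the hidden field is served with
EXPECTED_ANSWERS = {"paris"}  # placeholder for the page-specific answer


def looks_human(form: Mapping[str, str]) -> bool:
    """Accept only if the hidden field is unchanged and the question is answered."""
    if form.get(HONEYPOT_FIELD) != HONEYPOT_DEFAULT:
        return False  # a bot changed (or dropped) the field no human can see
    answer = form.get(QUESTION_FIELD, "").strip().lower()
    return answer in EXPECTED_ANSWERS
```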

------
TamDenholm
I know some people that would fail a few of these questions...

~~~
daten
I agree. The questions in the example may be easily solved by a technically
minded person, but they could also confuse a large part of your audience. I
would find it very difficult to generate questions that are appropriate no
matter what language, culture, math or literacy background my visitors have.

------
bjonathan
Easy is not enough; captchas also need to be universal.

For non-native English speakers:

"Cheese, cat, mosquito, trousers, elbow and ant: how many body parts in the
list?" or "Soup, dog, trousers, house, mosquito or pink: the colour is?" aren't
as easy as "3+1" or reCAPTCHA. Not everybody speaks English on the interweb...

~~~
mseebach
> Easy is not enough; captchas also need to be universal.

Perhaps in the long term, but solving the captcha problem for the
English-speaking (or any language, for that matter) subset of the internet
population is still a very worthwhile undertaking.

~~~
tropin
Yes, because captchas aren't alienating enough, we should also shut non-English
speakers out of our non-English websites.

------
joshklein
CAPTCHA (n.) - the outsourced laziness of your development team to your
customers, in order to stunt conversion rates and signups so you don't have to
be bothered to sanitize your own user lists.

------
Xk
It seems to me that this wouldn't work. There are not so many different types
of phrasings, so it would be fairly simple to write a parser generator which
would then pass to a very basic interpreter to solve them.

For example, to solve the "Which of these is a T: W, X, Y or Z?" you would
just put in a rule like "BodyPart ::= Foot | Knee | Leg | ..." "DayOfWeek ::=
Monday | Tuesday | ..." "Color ::= Red | Blue | ..." and then have it match
against those.

Maybe the next time I have some free time I'll see if I can go and implement
it.
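A rough sketch of the matching idea, with plain Python sets standing in for the grammar rules (a regex plus lexicon lookup rather than a parser generator). The question shape and the tiny lexicons are assumptions.

```python
import re
from typing import Dict, Optional, Set

# Tiny hypothetical lexicons standing in for BodyPart, DayOfWeek, Color rules.
CATEGORIES: Dict[str, Set[str]] = {
    "body part": {"foot", "knee", "leg", "elbow", "arm"},
    "day of the week": {"monday", "tuesday", "friday", "sunday"},
    "colour": {"red", "blue", "pink", "green"},
}

# Assumed shape: "Which of these is a T: W, X, Y or Z?"
PATTERN = re.compile(r"which of these is an? (.+?): (.+)\?$", re.IGNORECASE)


def solve_category(question: str) -> Optional[str]:
    """Return the single option that belongs to the named category, if any."""
    match = PATTERN.match(question.strip())
    if not match:
        return None
    lexicon = CATEGORIES.get(match.group(1).lower())
    if lexicon is None:
        return None  # category we have no rule for
    words = [w.strip().lower() for w in re.split(r",| or ", match.group(2))]
    hits = [w for w in words if w in lexicon]
    return hits[0] if len(hits) == 1 else None
```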

------
v21
If you want to produce a cheap AI for solving a particular class of problems,
turn the class of problems into a CAPTCHA...

~~~
Devilboy
I bet you can make this work for image tagging

------
dspeyer
180 million isn't all that many. Keeping the answers in a database is trivial.
Extracting the answers by trial and error is feasible. You'll probably want a
large botnet to avoid getting blocked for suspiciously high traffic. If the
servers can take an extra kqps or so, you should be done in about a week.
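The "about a week" estimate depends on how many requests each question costs. This back-of-the-envelope sketch assumes 1,000 requests per second and treats requests-per-question as a free parameter (the values 1 and 3 are assumptions). It also ignores that fetching questions at random keeps revisiting ones already seen.

```python
SECONDS_PER_DAY = 86_400


def days_to_harvest(questions: int, requests_per_question: float, qps: float) -> float:
    """Days needed to issue questions * requests_per_question requests at `qps`."""
    return questions * requests_per_question / qps / SECONDS_PER_DAY


print(round(days_to_harvest(180_000_000, 1, 1_000), 2))  # 2.08
print(round(days_to_harvest(180_000_000, 3, 1_000), 2))  # 6.25
```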

------
joelvh
I played around with the demo page and used WolframAlpha to answer the
questions for me.... With a little massaging, WolframAlpha would get you
pretty far in hacking it.

<http://news.ycombinator.com/item?id=1891375>

------
ComputerGuru
Obligatory XKCD link: <http://xkcd.com/810/>

"Constructive Spam"

------
bbest86
The first letter in the word "titties" is?

Beware if you have users that might be sensitive to such things.

~~~
eru
Yes. And here's a picture of some tits
(<http://www.btinternet.com/~micka.wffps/great_tit.jpeg>).

------
fertel
It seems as though there are very few patterns, just repeated in different
forms.

For example, it would be quite easy to solve the "which word is capitalized"
questions, or any of the math or series questions.

------
flawawa2
"Ten, 33, thirty five, 10 and thirty six: the 5th number is?"

10? Thirty? Thirty Six?

~~~
confuzatron
Your point is that this captcha system may prevent smart-alec pedants from
commenting? Man, that's a _feature_ not a bug.

------
LordLandon
It should probably have random words in all capitals in each question. The way
it is, if a question has a word in all caps, that's the answer.

~~~
rarestblog
...and also has a 1 in 1 chance of automated recognition.
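The all-caps giveaway takes a few lines to exploit. A sketch, assuming the answer is the only multi-letter word written entirely in capitals; the example question is made up.

```python
import re
from typing import Optional


def solve_all_caps(question: str) -> Optional[str]:
    """If exactly one multi-letter word is in all capitals, guess that it is the answer."""
    words = re.findall(r"[A-Za-z]+", question)
    caps = [w for w in words if len(w) > 1 and w.isupper()]
    return caps[0] if len(caps) == 1 else None


print(solve_all_caps("Which word in cat, HOUSE and dog is in capitals?"))  # HOUSE
```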

------
Jencha
This may have issues with non-native speakers. You have to know the language
fairly well to answer those questions.

------
9ec4c12949a4f3
Lovely, but I could spend $500 and have the matching answers in a nice
database I could resell.

