Q: Which word from list "duckweed, commendations, civilised, receptionists, flours" contains the letter "v"?
Q: What word from "interdisciplinary, disorientating, decamps, moan" begins with "i"?
Q: Egg, pink, white, green and pub: the 1st colour is?
Q: The day of the week in dog, Wednesday, penguin, trousers, hotel or lion is?
Q: Hand, jelly or Jennifer: the person's name is?
Q: What is fifty three thousand seven hundred and one as digits?
Q: Cake, rice, snake, head and jelly: how many body parts in the list?
Q: If yesterday was Tuesday, what day is today?
Q: Which of these is a body part: lion, jelly, church, butter or nose?
Q: Which word from list "duckweed, commendations, civilised, receptionists, flours" contains the letter "v"? A: FAILURE
This one fails because the parser recognizes 'duck' as a word and doesn't know what to do with the trailing characters, so the parse fails. Allowing a word to be followed by extra characters makes this one pass, but it also makes the parser twice as slow, so I didn't do that.
Q: Hand, jelly or Jennifer: the person's name is? A: jennifer
This is one of the tricky cases, because I convert to lower case when I pass to the parser. So if parsing fails and the word 'name' is in the string, then I return the first capitalized word and hope for the best.
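A minimal sketch of that fallback in Python, for illustration only. The function name guess_name is made up, and skipping the first word (which is capitalized just because it starts the sentence) is my assumption about how 'Hand' gets ignored; the original may handle that differently.

```python
import re

def guess_name(question):
    """Fallback for 'name' questions: return the first capitalized word.

    Assumption: the sentence-initial word is skipped, since it is
    capitalized regardless of whether it is a name.
    """
    if "name" not in question.lower():
        return None
    words = re.findall(r"[A-Za-z]+", question)
    for word in words[1:]:
        if word[0].isupper():
            return word.lower()
    return None

print(guess_name("Hand, jelly or Jennifer: the person's name is?"))
```

This still guesses wrong whenever the name happens to be the first item in the list, which is the "hope for the best" part.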
Q: What is fifty three thousand seven hundred and one as digits? A: 53701
This one was an interesting exercise in parsing, as it is grouped ((fifty three) thousand) (seven hundred) and (one). I assume that all numbers are well formed, because it will happily parse 'one one and two' as the number 4 (1+1+2=4).
Q: If yesterday was Tuesday, what day is today? A: wednesday
This was just putting the dates in order and doing a lookup into the list.
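Roughly, the lookup could look like the Python sketch below. The names DAYS, OFFSETS and relative_day are hypothetical, and handling 'tomorrow' as well as 'yesterday' is an assumption on my part, not something stated above.

```python
import re

DAYS = ["monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"]

# How far "today" is from the named day, depending on the relative word.
OFFSETS = {"yesterday": 1, "tomorrow": -1}

def relative_day(question):
    """Find the named day and step through the ordered list to get today."""
    words = re.findall(r"[a-z]+", question.lower())
    offset = next((OFFSETS[w] for w in words if w in OFFSETS), None)
    day = next((w for w in words if w in DAYS), None)
    if offset is None or day is None:
        return None
    return DAYS[(DAYS.index(day) + offset) % len(DAYS)]

print(relative_day("If yesterday was Tuesday, what day is today?"))
```

The modulo takes care of wrapping around the end of the week in either direction.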
I haven't made it recognize all of the cases. It still gets a few wrong, but usually it'll just say FAILURE when it can't solve it.
As the page says, it's just a ~300 line grammar file and a parser-generator with syntax directed translation to solve the questions.
sigh So close.
This is definitely the kind of thing that I could see doing with REBOL's parse operator, but Python doesn't (seem to) come with such a tool built in.
Granted, it wouldn't be hard for someone else to do the same as me, but I just don't want my work used to do harm to someone else.
Though they may take it wrong and try to start an arms race with your code, to which I'd say: An arms race in the CAPTCHA space is already a win for the attackers. In case of tie, spammers win.
But then it occurred to me that someone could solve one by hand, and then use that answer to solve five more captchas that I asked for, and then recurse, eventually solving thousands of a site's captchas.
Just curious, thanks.
I just got 1,000 questions from the demo page and looked at how they were similar. For example, there are many questions of the form "[list of something]: the [ordinal number] [type] is?" so I put that as a rule in the grammar file. I did this for about twenty different types of questions, each of which has three or so different phrasings.
And yes, they may have 180 million questions, but in reality they have only a hundred or so modulo the specific words they pick (for the items in the list, or for the day of the week, or for the letter that the word starts with).
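To give a feel for how one such rule covers a whole family of questions, here is a rough Python sketch of the "[list]: the [ordinal number] [type] is?" form. It uses a regular expression instead of the grammar file, and the word lists in CATEGORIES are tiny made-up stand-ins, so treat it as an illustration rather than the actual rule.

```python
import re

# Hypothetical word lists; a real solver would need much larger ones.
CATEGORIES = {
    "colour": {"pink", "white", "green", "red", "blue", "black", "yellow"},
    "animal": {"dog", "cat", "goat", "lion", "penguin", "snake"},
}
CATEGORIES["color"] = CATEGORIES["colour"]

RULE = re.compile(
    r"^(?P<items>.+): the (?P<ord>\d+)(?:st|nd|rd|th) (?P<type>\w+) is\?$"
)

def nth_of_type(question):
    """Answer '[list]: the [ordinal] [type] is?' or return None."""
    m = RULE.match(question.strip().lower())
    if not m or m.group("type") not in CATEGORIES:
        return None
    items = re.split(r"\s*,\s*|\s+and\s+|\s+or\s+", m.group("items"))
    matching = [i for i in items if i in CATEGORIES[m.group("type")]]
    index = int(m.group("ord")) - 1
    return matching[index] if 0 <= index < len(matching) else None

print(nth_of_type("Egg, pink, white, green and pub: the 1st colour is?"))
```

Only the items and the ordinal change from question to question, so one rule like this handles every question of that shape.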
> I put that as a rule in the grammar file
Name the third item in the list: book, fork and spoon.
Q: What is two times 8?
Which is a number: yellow, seven, book, flag?
Aircraft fly through the?
Q: A beetle has how many legs?
It does well when it recognizes the questions, but there's still plenty of room for writing new ones. It does show that it's an arms race, as someone else noted.
There is, however, an inherent limitation to question-asking, which is that the questions must be understandable to a general population. So I would switch from using a parser to just looking for key words and then matching from there. (Which is, for example, how I find names in a list.)
Even if they made it so I only get one fifth the number of correct answers, a 20% success ratio would be high enough for concern. The attackers can just keep trying and trying and if 20% of the time they can register a new bot, then they win. The defender must win a much larger percent of the time.
I would probably fail both, so I guess this isn't the kind of question that would realistically appear.
Then I realized that if somebody went through all this trouble to design a new app like this, they deserve to have me answer captchas for a bot somewhere.
Q: What nonsense word do the letters G-B-R-D spell?
Q: Which three letter word starts with the letters TH and ends with the letter E?
Actually, the reason those are solved isn't exactly what you'd expect. When it fails to find an answer, it checks to see if any word is upper-case. If it is, it cuts out all non-letters (to remove the comma at the end, for example) and then returns that word. It does this because the site will occasionally ask questions such as "Which word IN this sentence is in all caps." So that's why G-B-R-D gets turned into 'gbrd', and why it says 'th'.
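In Python the fallback might look something like this. The name uppercase_fallback is made up, and ignoring single-letter tokens (so that 'Q:', 'A' or 'I' don't count as all-caps words) is an assumption added for the sketch.

```python
import re

def uppercase_fallback(question):
    """Return the first all-caps word, stripped of non-letters, lowercased."""
    for token in question.split():
        letters = re.sub(r"[^A-Za-z]", "", token)
        # Assumption: skip one-letter tokens such as 'Q:' or 'I'.
        if len(letters) > 1 and letters.isupper():
            return letters.lower()
    return None

print(uppercase_fallback("Q: What nonsense word do the letters G-B-R-D spell?"))
print(uppercase_fallback(
    "Q: Which three letter word starts with the letters TH and ends with the letter E?"
))
```

Stripping the hyphens is what turns G-B-R-D into a single word, so the right answer there is a side effect of the cleanup step and not of understanding the question.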
There have been so many 'captcha alternatives' on HN recently, and all of them forget that there is a very good reason why current captchas are extremely distorted images.
This is one reason why I'm partial to reCAPTCHA: there is a lot of experience in OCR systems, and we know what the current state of the art is -- and we know what kinds of things foil it.
Desired Answer: >>>
Answer Given: None.
If the community is profitable enough or becomes big enough, some spammer will spend 15 minutes or 50 dollars on Mechanical Turk and find all the answers. As others have mentioned, it is an arms race in the end. Back and forth, each side upping the challenge.
I'm assuming you know what parsing is, and what syntax-directed translation is. If not, the Wikipedia articles (http://en.wikipedia.org/wiki/Parsing and http://en.wikipedia.org/wiki/Syntax-directed_translation) offer a reasonable explanation.
The grammar file is generally laid out as follows:
Question ::= Phrase1 | Phrase2 | Phrase3 ...
Phrase1 ::= 'if' 'the' Noun 'is' Color 'what' 'color' 'is' 'it' [and now return the word which matched Color]
Noun ::= [sequence of characters]
Color ::= 'red' | 'blue' ...
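For anyone who wants something runnable, the Phrase1 rule above can be approximated in Python with a regular expression. This is only a sketch of the idea, not the grammar file itself: the COLORS list is a made-up sample, and returning the matched group stands in for the "return the word which matched Color" action.

```python
import re

COLORS = ["red", "blue", "green", "yellow"]  # sample list, not the real one

# Phrase1 ::= 'if' 'the' Noun 'is' Color 'what' 'color' 'is' 'it'
PHRASE1 = re.compile(
    r"if the (?P<noun>[a-z]+) is (?P<color>%s) what colou?r is it"
    % "|".join(COLORS)
)

def answer(question):
    # Lower-case and drop punctuation before matching, like the parser input.
    text = re.sub(r"[^a-z\s]", "", question.lower())
    text = " ".join(text.split())
    m = PHRASE1.fullmatch(text)
    return m.group("color") if m else "FAILURE"

print(answer("If the house is red, what color is it?"))
```

Each additional phrasing would be another pattern tried in turn, which mirrors the Question ::= Phrase1 | Phrase2 | ... alternation.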
The trickiest part of it was getting it to correctly interpret numbers like 'twenty one thousand eight hundred and ninety nine'
That part is basically laid out as
Number ::= Number QuantifiedNumber | Number 'and' QuantifiedNumber | QuantifiedNumber
QuantifiedNumber ::= NumberGroup | NumberGroup Quantifier
NumberGroup ::= SingleNumber SingleNumber | SingleNumber
SingleNumber ::= 'one' | 'two' | ... 'twenty' | 'thirty' | ...
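Here is a hand-written Python sketch that follows the same grouping as the rules above, without a parser generator. The names SINGLE, QUANTIFIER and parse_number are invented for the example, the Quantifier words are my guess since that rule isn't listed, and the loop takes two SingleNumbers greedily where the grammar would allow more than one grouping.

```python
UNITS = ("one two three four five six seven eight nine ten eleven twelve "
         "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = "twenty thirty forty fifty sixty seventy eighty ninety".split()

SINGLE = {word: value for value, word in enumerate(UNITS, start=1)}
SINGLE.update({word: 20 + 10 * i for i, word in enumerate(TENS)})
QUANTIFIER = {"hundred": 100, "thousand": 1000, "million": 1000000}

def parse_number(text):
    tokens = text.lower().split()
    total, i = 0, 0
    while i < len(tokens):
        # Number ::= Number 'and' QuantifiedNumber
        if tokens[i] == "and" and i > 0:
            i += 1
        # NumberGroup ::= SingleNumber SingleNumber | SingleNumber
        if i >= len(tokens) or tokens[i] not in SINGLE:
            raise ValueError("expected a number word in %r" % text)
        group = SINGLE[tokens[i]]
        i += 1
        if i < len(tokens) and tokens[i] in SINGLE:
            group += SINGLE[tokens[i]]
            i += 1
        # QuantifiedNumber ::= NumberGroup | NumberGroup Quantifier
        if i < len(tokens) and tokens[i] in QUANTIFIER:
            group *= QUANTIFIER[tokens[i]]
            i += 1
        total += group
    return total

print(parse_number("fifty three thousand seven hundred and one"))
```

Like the grammar, this adds up whatever groups it finds, so malformed input such as 'one one and two' still produces a number instead of an error.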
Q: How many thousands is a million?
Q: How many millions is a billion?
Q: There is a dog, cat and a goat: First animal?
Q: There is a dog, cat and a goat: How many humans?
Q: Black, White: Which is dark?
I only had so much time yesterday and so I didn't put in all the different types of questions. Maybe later I'll go back and get all the questions that it can't solve.
Please join these two "words" together (without spaces): zrtvyoav and ekozuarn
What starts with "bow" and ends with "ser"?
Here's the only one that broke:
The 6th letter in "aviator" is?