
'"Jeopardy!"'s answer snippets are not like normal questions that, say, a tax software would encounter, I think.'

I'm not sure, but are you saying it's easier for a computer to answer Jeopardy questions than tax questions? I wouldn't think so. The tighter you constrain the domain, the better the computer will do.




But in some ways, Jeopardy is much more constrained than tax questions. You know the clue will be a relatively short statement, and the answer will be in the form of a question, with probably no more than five of its words actually relevant; the rest just serve to make it into a question.

Furthermore, you know that breadth of knowledge is generally more significant than depth. Having a database of every nation's capital and a few significant facts about each is most likely more useful than being able to provide an in-depth discussion of quantum electrodynamics.

Tax questions, on the other hand, often have lengthy, detailed statements and require an essay-style answer. Worse, in some cases detailed tax advice may actually require judgement. Of course, tax software takes shortcuts in this respect: it is not designed to handle complicated situations with nuances. It is designed to handle the average consumer, and even there it constrains the problem by being the one that asks the questions and then produces forms, rather than answering ad hoc questions.

(edit: fixed grammar)


Not only that, but you can probably parse a Jeopardy "answer" into a series of roughly independent clauses, and then look for candidates that rank high against those clauses. For example, with the "answer" "This action flick starring Roy Scheider in a high-tech police helicopter was also briefly a TV series", you can get it right just by looking for things that correlate highly with "action flick", "Roy Scheider", "police helicopter", and "TV series".
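
A minimal sketch of that idea, assuming a toy co-occurrence index (the CORPUS entries below are invented stand-ins for a large document collection; a real system would query something web-scale):

    import re
    from collections import Counter

    # Toy "index": each candidate answer maps to a snippet of text
    # about it. Invented for illustration only.
    CORPUS = {
        "Blue Thunder": "1983 action flick starring Roy Scheider flying a "
                        "high-tech police helicopter, briefly a TV series in 1984",
        "Jaws": "1975 thriller starring Roy Scheider as a police chief "
                "hunting a shark",
        "Airwolf": "TV series about a high-tech military helicopter",
    }

    STOPWORDS = {"this", "was", "also", "briefly", "a", "an", "in", "the"}

    def clauses(clue):
        """Split a clue into crude key phrases: drop filler words,
        then pair up the adjacent content words that remain."""
        words = [w for w in re.findall(r"[A-Za-z-]+", clue)
                 if w.lower() not in STOPWORDS]
        return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]

    def rank(clue):
        """Score each candidate by how many clue phrases co-occur
        with it in the corpus (case-insensitive substring match)."""
        scores = Counter()
        for candidate, text in CORPUS.items():
            for phrase in clauses(clue):
                if all(w.lower() in text.lower() for w in phrase.split()):
                    scores[candidate] += 1
        return scores.most_common()

    clue = ("This action flick starring Roy Scheider in a high-tech "
            "police helicopter was also briefly a TV series")
    print(rank(clue))  # "Blue Thunder" comes out on top

Even this crude substring matching ranks "Blue Thunder" (the actual answer) first on the example clue; the hard engineering is all in replacing the toy corpus with real data.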


But they must surely be doing something fancier than this naive-bayes-style model, otherwise they'd have no use for a roomful of supercomputers.


Well, naïve Bayesian inference is supercomputer-level when you use it on a huge universe of data.

As Peter Norvig often points out, these kinds of tasks are highly data-dependent. The supercomputers are probably used more for data access than for raw computation. I can totally imagine Peter writing a forty-line Python app that runs on Google's infrastructure and does about as well.
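
To give a sense of how little of that is computation, here is a hedged sketch of the naive Bayes arithmetic itself, with the corpus reduced to three invented strings. In practice hit_count() would be a query against a web-scale index, and that lookup, not the handful of log-adds, is where the supercomputer goes:

    import math

    # Stand-in corpus; in practice this is the "huge universe of data".
    DOCS = [
        "blue thunder action flick roy scheider police helicopter tv series",
        "jaws roy scheider police chief shark",
        "airwolf tv series helicopter",
    ]

    def hit_count(*terms):
        """Number of documents containing every given term."""
        return sum(all(t in doc for t in terms) for doc in DOCS)

    def score(candidate, clauses):
        """Naive Bayes: log P(candidate) + sum of log P(clause | candidate),
        estimated from co-occurrence counts with add-one smoothing."""
        n = hit_count(candidate) + 1
        total = math.log(n / (len(DOCS) + 1))
        for clause in clauses:
            total += math.log((hit_count(candidate, clause) + 1) / (n + 1))
        return total

    clauses = ["action flick", "roy scheider", "police helicopter", "tv series"]
    print(max(["blue thunder", "jaws", "airwolf"],
              key=lambda c: score(c, clauses)))  # blue thunder

The forty lines are the easy part; filling DOCS with enough data to make the counts meaningful is the part that looks like a roomful of machines.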


You're right, Jeopardy questions are much more constrained than standard, domain-based questions; however, as the article also points out, these constraints may be very hard for the computer to pick up. In fact, the main reason Watson is slow compared to the humans competing against it is precisely this: in most cases it cannot effectively prune the search space. Look at how the answer to the Michael Jackson video clue is generated: the final answer is correct, and the runners-up are also relevant, but in a very weird sense; surely nothing a human would come up with.


But is it something a human brain would come up with and filter out? We don't know because we only know what our minds do, not what our brains do.


Speaking as a US resident, I know I'd much rather memorize general trivia than the US tax code!



