

Computer program to take on 'Jeopardy' - gabrielroth
http://www.nytimes.com/2009/04/27/technology/27jeopardy.html?hp&pagewanted=all

======
dkokelley
Very interesting application, but I think they'll have a hard time attempting
to approximate the amount of information a human competitor would have
(assuming they want to limit the machine's knowledge to better represent human
memory).

I would rather have them give it access to Google's entire cache, since the
real feat is processing and understanding the questions and then giving an
appropriate answer. In theory, if the machine has 100% question recognition
and can research an answer as fast as Google serves its pages, it should wipe
the floor with the human contestants.

I do hope they find a way to keep the contest competitive and interesting.

~~~
abossy
The problem that Google solves and the problem that Watson solves are
completely different. Sure, there is some overlap, because you might be able
to type a Jeopardy question (answer?) into Google, and, _as a human_, quickly
and easily find the correct answer. This is a side effect of very well-tuned
retrieval algorithms that search for just the right words in your query, take
into account user click-throughs for similar queries, and rank information
based on billions of documents and eleven years of experience.

The Jeopardy problem is concerned with natural language processing and knowledge
representation as opposed to a tumult of data. I would wager that Watson's
internal database is structured more like a single ontology than a highly
distributed grid. The challenge here is mapping key parts of the question to
this ontology and returning a single, highly precise result.
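A toy sketch of "mapping key parts of the question to an ontology and returning a single, highly precise result." Everything here is an assumption for illustration: the triple store, the keyword patterns, and `answer_clue` are hypothetical and say nothing about how Watson actually works. The point is that it declines to answer unless exactly one fact matches, trading recall for precision.

```python
# Hypothetical toy ontology: (subject, relation) -> object.
ONTOLOGY = {
    ("Hamlet", "written_by"): "William Shakespeare",
    ("Moby-Dick", "written_by"): "Herman Melville",
    ("Paris", "capital_of"): "France",
}

# Hypothetical hand-written cues mapping clue wording to a relation.
PATTERNS = [
    ("wrote", "written_by"),
    ("capital", "capital_of"),
]


def answer_clue(clue):
    """Map key parts of a clue onto the ontology; return one answer or None."""
    lowered = clue.lower()
    # Pick the first relation whose cue word appears in the clue.
    relation = next((rel for cue, rel in PATTERNS if cue in lowered), None)
    if relation is None:
        return None
    # Collect every fact with that relation whose subject is named in the clue.
    matches = [obj for (subj, rel), obj in ONTOLOGY.items()
               if rel == relation and subj in clue]
    # Answer only when the match is unambiguous: precision over recall.
    return matches[0] if len(matches) == 1 else None
```

For example, `answer_clue("This man wrote Hamlet")` returns `"William Shakespeare"`, while a clue that hits no known cue returns `None`. A real system would need parsing rather than substring matching, which is exactly the hard part abossy points to.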

------
dmix
> The way to deal with such problems, Dr. Ferrucci said, is to improve the
> program’s ability to understand the way “Jeopardy!” clues are offered.

After that statement, I agree with Norvig's comment. This is pretty much a toy
problem for IBM's language processing software: a very elaborate one, but still
a customization for a specific application.

------
ryanwaggoner
I was prompted to log in at NYT, but not when I went there from Google search
results:

[http://www.google.com/search?q=Computer+Program+to+Take+On+%...](http://www.google.com/search?q=Computer+Program+to+Take+On+%E2%80%98Jeopardy!%E2%80%99&);

~~~
nostrademons
That's because of Google's anti-cloaking policy. Webmasters are not allowed to
show a different screen to visitors than they show to the GoogleBot; they get
banned from Google's index if they do. So several sites will waive
registration requirements for visitors coming from Google, because otherwise
they'd never show up in search results.
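A minimal sketch of the registration waiver described above, assuming the site decides based on the HTTP Referer header. The function `requires_login` and its rules are hypothetical; real sites may use other signals. Because the browser supplies the Referer, arriving via a Google results link is enough to skip the wall.

```python
from urllib.parse import urlparse


def requires_login(referer, logged_in):
    """Return True if a (hypothetical) site should show its login wall."""
    if logged_in:
        return False
    if referer:
        host = urlparse(referer).hostname or ""
        # Waive registration for visitors arriving from Google search results,
        # so they see the same page the crawler indexed (no cloaking).
        if host == "google.com" or host.endswith(".google.com"):
            return False
    return True
```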

There's always BugMeNot, which is probably an easier and more long-term
solution.

~~~
RK
_That's because of Google's anti-cloaking policy. Webmasters are not allowed
to show a different screen to visitors than they show to the GoogleBot; they
get banned from Google's index if they do._

Unless of course the site is a scholarly journal. Google lets them go wild
with cloaking.

~~~
nostrademons
Scholarly journal with a useful abstract. There are rules even for them.

------
gojomo
I wonder how they'll handle the race to ring in, which in the human game begins
just after Trebek finishes reading the question. (Apparently there's an
in-studio light indicating to contestants when ringing in is possible.)

The program might want to base its ring-in decision on a snap judgment of
whether the question is likely to be amenable to its analysis, even if the
exact answer isn't yet known; human contestants do this, and then take a
second or two to cogitate before giving their actual answer.

~~~
diego
The computer has the opposite problem from humans. It has all the information
available instantly, but it can't understand the question. Parsing a question
can be very quick; the problem is being sure that it was understood correctly.
If it was, the answer will be known instantly and the computer should ring in.
If it wasn't, it shouldn't.

My guess is that the computer will ring either when it knows the answer beyond
a certain confidence threshold (faster than a human being can hit a button) or
not at all.
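The "ring in above a confidence threshold, or not at all" policy can be sketched in a few lines. This assumes the answering system already produces `(answer, confidence)` pairs with confidence in [0, 1]; the function name and the 0.8 default are hypothetical choices, not anything IBM has described.

```python
def decide_ring_in(candidates, threshold=0.8):
    """Return the top answer if its confidence clears the threshold, else None.

    candidates: list of (answer, confidence) pairs, confidence in [0, 1].
    Returning None means "don't ring in at all".
    """
    if not candidates:
        return None
    answer, confidence = max(candidates, key=lambda pair: pair[1])
    return answer if confidence >= threshold else None
```

Raising the threshold makes the program ring in less often but be wrong less often; since a wrong response in Jeopardy costs the clue's value, that trade-off is the main knob.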

~~~
gojomo
I suggest there may be ways to predict "this is the kind of question I can
usually parse and answer" long before a confident parse-and-settled-answer
could complete. Thus it could make sense to ring in before any potential
answers have been formulated or ranked.
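One hypothetical way to predict "this is the kind of question I can usually answer" before any candidate answers exist is to keep a running hit rate per category and ring in when the expected accuracy is high enough. The class, the use of category as the only feature, and the threshold and prior values are all assumptions for illustration; the smoothing adds one pseudo-observation worth `prior` so rarely seen categories aren't trusted on one lucky answer.

```python
from collections import defaultdict


class RingInPredictor:
    """Hypothetical early ring-in predictor based on past per-category accuracy.

    It decides before any answer has been generated or ranked, using only
    how often clues from the same category were answered correctly before.
    """

    def __init__(self, threshold=0.7, prior=0.5):
        self.threshold = threshold
        self.prior = prior
        self.correct = defaultdict(int)
        self.attempts = defaultdict(int)

    def record(self, category, was_correct):
        """Update the history after a clue has been resolved."""
        self.attempts[category] += 1
        self.correct[category] += int(was_correct)

    def expected_accuracy(self, category):
        # Smooth toward the prior with a single pseudo-observation.
        return (self.correct[category] + self.prior) / (self.attempts[category] + 1)

    def should_ring_in(self, category):
        return self.expected_accuracy(category) >= self.threshold
```

An unseen category starts at the prior (0.5 here) and so stays below the 0.7 threshold until the program has a track record there.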

~~~
abossy
I agree. As an example, if it is using traditional natural language parsing
techniques, then by ring-in time it should know whether there are fewer than ten
syntax trees or fewer than one thousand. In the former case, the question can be
resolved with a much higher degree of confidence. Even short
phrases can have a multitude of possibilities, the canonical example being,
'Time flies like an arrow.' Of course, I have no idea how they're actually
approaching this problem, but the point is, heuristics can be used in the
process of question-answering to indicate enough of a confidence level for
ring-in.
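Counting syntax trees doesn't require building them: a CYK chart can store the number of derivations per span and category. The sketch below assumes a tiny hypothetical grammar in Chomsky normal form (the lexicon and rules are made up for illustration). Under it, "time flies like an arrow" gets 3 parses: time passes the way an arrow does, "time flies" (the insects) are fond of an arrow, and the imperative "time the flies as an arrow would". A program could compare such a count against a cutoff as one ring-in heuristic.

```python
# Hypothetical toy lexicon: each word maps to its possible categories.
LEXICON = {
    "time": {"N", "NP", "V"},
    "flies": {"N", "NP", "V"},
    "like": {"V", "P"},
    "an": {"Det"},
    "arrow": {"N"},
}

# Hypothetical toy grammar in Chomsky normal form: (parent, left, right).
RULES = [
    ("S", "NP", "VP"),
    ("S", "VP", "PP"),   # imperative reading: "[time flies] [like an arrow]"
    ("NP", "N", "N"),    # compound noun: "time flies"
    ("NP", "Det", "N"),
    ("VP", "V", "NP"),
    ("VP", "V", "PP"),
    ("VP", "VP", "PP"),
    ("PP", "P", "NP"),
]


def count_parses(words, start="S"):
    """Count distinct parse trees with a CYK chart that stores counts, not trees."""
    n = len(words)
    # chart[i][j][cat] = number of ways cat derives words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, ()):
            chart[i][i + 1][cat] = 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = chart[i][j]
            # Try every split point and every binary rule.
            for k in range(i + 1, j):
                left, right = chart[i][k], chart[k][j]
                for parent, left_cat, right_cat in RULES:
                    ways = left.get(left_cat, 0) * right.get(right_cat, 0)
                    if ways:
                        cell[parent] = cell.get(parent, 0) + ways
    return chart[0][n].get(start, 0)
```

For example, `count_parses("time flies like an arrow".split())` returns 3, and `count_parses("flies like an arrow".split())` returns 1. Because counts multiply rather than trees being enumerated, the same chart stays cheap even when the number of parses grows into the thousands.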

