
IBM's Watson AI trumps humans in "Jeopardy!" - Jun8
http://www.nytimes.com/2010/06/20/magazine/20Computer-t.html?hp
======
grellas
From reading this fascinating piece, I (as one who is not technically
proficient in any of the relevant areas) would conclude that AI-based
approaches to computing can work well in the following areas:

(1) You can program something in accordance with a strict set of limited
rules and achieve amazing results, such as in computer chess, where today
the advanced machines can basically outdo humans at that particular game,
e.g., Deep Blue.

(2) You can program something to use layered algorithms to find high
degrees of probability that a particular objectively knowable fact is
correct in response to someone inquiring about that fact, e.g., I.B.M.'s
Watson.

(3) You can custom-program something to enable someone to determine other
forms of objectively verifiable relationships (such as mathematical
comparisons or outcomes), e.g., Wolfram Alpha.

(4) You can (at least theoretically) hard-wire a database to yield
objective facts drawn from a vast body of custom knowledge manually
programmed into a machine, albeit at an enormous cost in time and effort,
thus gaining the benefits of mass storage, instant retrieval, and rapid
replication, but without drawing on the iterative power of computing
devices and potential algorithmic solutions to the problem.

To me, all such approaches appear to throw in the towel on the question, "can
AI-based computing ever achieve the equivalent of exercising human judgment?"

For example, in law, all sorts of questions can arise to which there is no
"mechanical" answer. What is the best strategy to employ in a particular
legal fight or litigation? What, among an array of complex alternatives
involving tax, business risk, liability risk, and human factors, is the
optimum way to merge two companies? Or even the simpler but still
judgment-based decision: is it best for me to set up my company as an LLC,
a corporation, or some other form, and what domicile should I choose? Or
personal decisions: what techniques do I use to raise and train my kids in
terms of education, morals, setting life goals, etc.?

Are there AI proponents out there who believe that AI-based computing can ever
handle such issues? To me, it seems evident that no algorithmic approach will
ever be able to rise to the level of addressing such problems but this may
just be based on my ignorance and lack of imagination.

Going back to law, for example, the article suggests that IBM may one day
capitalize upon using a Watson-type machine to help people answer bureaucratic
questions. I wonder about this if the approach is based purely on
probabilities because, no matter what the data set, no one could ever know for
sure that the answer is the correct one. At most, the AI-device could say,
"this likely is a good starting point" and, beyond that, you are on your own
to confirm whether it is accurate or not (which, of course, could make for a
tremendously helpful resource in itself, as long as it is used properly).

I would think any such method would become hopelessly confused, though, in
dealing with some knotty tax question, as for example, "determine my
unrealized built-in gain in my C corp so that I know how much tax I have to
pay on converting to S-corp status" (see here for the methodology on this,
which is mind-numbing for all but CPAs steeped in tax minutiae:
[http://www.taxalmanac.org/index.php/Treasury_Regulations,_Su...](http://www.taxalmanac.org/index.php/Treasury_Regulations,_Subchapter_A,_Sec._1.1374-3)).
I could imagine a custom-programmed approach dealing with such questions but
only one that is very specific to the problem at hand.

Don't mean to go on - I truly found this piece quite intriguing since, before
reading it, I would (out of ignorance) have laughed at anyone who would have
claimed that an AI-based machine could play "Jeopardy" in any meaningful way.

So my question to those who are knowledgeable in the HN community is this: are
there other conceptual approaches that, given sufficient time and resources,
will potentially be capable of rising to the level where they can address the
higher-level (judgment-based) sorts of issues I identify above, or is this
basically it?

For anyone interested, the Hollywood view of this sort of thing appeared in
the 1957 movie "Desk Set," where a "Miss Watson" administered a machine called
EMMANAC, which could give instant answers to questions expressed as normal
people would ask them. This clip highlights the view of AI-computing as
depicted in that movie:
[http://www.youtube.com/watch?v=Rdl9ynODxbk&feature=relat...](http://www.youtube.com/watch?v=Rdl9ynODxbk&feature=related).
The broad theme is that of a group of researchers who resented the idea that
their jobs were about to be eliminated by the impersonal beast, which Miss
Watson, however, endearingly referred to as "Emmy."

~~~
ahk
I think at some point the AI goalpost will move beyond "capable of human
judgement" as well :) (see Asimov's stories of R. Daneel & R. Giskard, for
example).

I'm one of those who believes we'll crack AI (all aspects of it) at some
point. I believe that the algorithm our brains (and those of most animals)
use is a very simple one that just needs tremendous parallel processing. I
don't think anyone, including IBM, is yet thinking at that level (they work
at too high a level for my tastes), and they don't draw enough on biology
in their work (it's too computer-science-y, which limits the problem
solving from the start).

I think it's just a matter of time before the neuroscientists have most of the
details of brain and neuron functioning and we're able to decode the algo and
replicate it on computers. And then it will be up to hardware progress to
match first animal and then human intelligence.

~~~
tocomment
If it's a simple algo, why haven't we come up with anything promising?

~~~
ahk
I'm hoping it's something like cryptographic one-way functions, where
computing the hash is easy but recovering the original message from it is
not.

In our case, the "simple" algo would be trivial to implement but tough to
figure out in the first place.
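To make the analogy concrete, here's a minimal sketch using Python's standard hashlib (this illustrates only the one-way-function analogy, not anything about how brains or Watson work):

```python
# One-way-function analogy: running the function forward is trivial;
# inverting it is not. SHA-256 is the stock example.
import hashlib

message = b"the brain's simple algorithm"
digest = hashlib.sha256(message).hexdigest()

# Computing the digest takes microseconds...
print(digest[:16])
# ...but recovering `message` from `digest` alone is infeasible:
# nothing meaningfully better than brute-force guessing is known.
```

The hope expressed above maps onto this: the "forward" direction (the brain running its algorithm) may be simple, while the "inverse" direction (us deducing that algorithm from its behavior) is the hard part.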

------
Jun8
Watson will go against former top "Jeopardy!" players this fall. However, I
think it's a bit of a reach to label this an application of a natural-language
question-answering system, as the article does. "Jeopardy!"'s answer snippets
are not like the normal questions that, say, tax software would encounter, I
think.

"When one I.B.M. executive suggested taking on “Jeopardy!” he was immediately
pooh-poohed. Deep Blue was able to play chess well because the game is
perfectly logical, with fairly simple rules; it can be reduced easily to math,
which computers handle superbly. But the rules of language are much trickier.
At the time, the very best question-answering systems — some created by
software firms, some by university researchers — could sort through news
articles on their own and answer questions about the content, but they
understood only questions stated in very simple language (“What is the capital
of Russia?”); in government-run competitions, the top systems answered
correctly only about 70 percent of the time, and many were far worse.
“Jeopardy!” with its witty, punning questions, seemed beyond their
capabilities. What’s more, winning on “Jeopardy!” requires finding an answer
in a few seconds. The top question-answering machines often spent longer, even
entire minutes, doing the same thing."

Still, it's a stunning achievement. And IBM will definitely recoup the
millions it put into this project in PR value alone. Other big tech
companies, please take note.

~~~
jerf
'"Jeopardy!"'s answer snippets are not like normal questions that, say, a tax
software would encounter, I think.'

I'm not sure, but are you saying that you think it's _easier_ to answer
Jeopardy questions than tax questions? I wouldn't think so. The tighter you
constrain the domain, the better the computer will do.

~~~
timwiseman
But in some ways, Jeopardy is much more constrained than tax questions. You
know that the clue will be a relatively short statement, and that the answer
will be in the form of a question, with probably no more than 5 words being
actually relevant and the rest used to phrase it as a question.

Furthermore, you know that breadth of knowledge is generally more significant
than depth. Having a database of every nation's capital and a few significant
facts about each is most likely more useful than being able to provide an
in-depth discussion of quantum electrodynamics.

Tax questions, on the other hand, often have lengthy and detailed statements
and require an essay-style answer. Worse, in some cases detailed tax advice
may actually require judgement. Of course, tax software takes shortcuts in
this respect. It is not designed to handle complicated situations with
nuances. It is designed to handle the average consumer, and even there it
constrains the problem by being the one that asks the questions and then
produces forms, rather than answering ad hoc questions.

(edit: fixed grammar)

~~~
alextp
Not only that, but you can probably parse a Jeopardy "answer" into a series of
roughly independent clauses, and then try to predict classes that rank high
across those clauses. For example, for the "answer" "This action flick
starring Roy Scheider in a high-tech police helicopter was also briefly a TV
series," you can get it right just by looking for things that correlate highly
with "action flick", "Roy Scheider", "police helicopter", and "TV series".
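A toy sketch of that idea, for concreteness. All the co-occurrence counts below are invented for illustration; a real system would estimate them from a large corpus:

```python
# Toy "rank by correlated clauses" model: score each candidate answer
# by how strongly it co-occurs with the clue's independent phrases.
# All counts here are made up for illustration.

# co_occurrence[candidate][phrase] = pretend count of documents
# that mention both the candidate and the phrase.
co_occurrence = {
    "Blue Thunder": {"action flick": 50, "Roy Scheider": 40,
                     "police helicopter": 60, "TV series": 30},
    "Jaws":         {"action flick": 30, "Roy Scheider": 80,
                     "police helicopter": 1,  "TV series": 5},
}

def score(candidate, phrases):
    # Multiply per-phrase evidence (adding 1 so one missing phrase
    # doesn't zero everything out), as a naive independence
    # assumption would have it.
    s = 1.0
    for p in phrases:
        s *= co_occurrence[candidate].get(p, 0) + 1
    return s

phrases = ["action flick", "Roy Scheider", "police helicopter", "TV series"]
best = max(co_occurrence, key=lambda c: score(c, phrases))
print(best)  # "Blue Thunder" wins on combined evidence
```

Note how "Jaws" scores higher on "Roy Scheider" alone but loses once all four clauses are combined, which is the point of treating the clauses as roughly independent pieces of evidence.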

~~~
alextp
But they must surely be doing something fancier than this naive-bayes-style
model, otherwise they'd have no use for a roomful of supercomputers.

~~~
zach
Well, naïve Bayesian inference is supercomputer-level when you use it on a
huge universe of data.

As Peter Norvig often points out, these kinds of tasks are highly data
dependent. The supercomputers are probably used as much for data access as
for raw computation. I can totally imagine Peter writing a forty-line Python
app running on Google's infrastructure that does about as well.

------
yanowitz
How is this possible? I was very impressed by the first several
answers/questions listed but could guess at how it worked; this one, though,
blew my mind:

“Classic candy bar that’s a female Supreme Court justice” — “What is Baby Ruth
Ginsburg?” [Of course, Google now knows the answer :) ]

Can someone explain how it can do that? Can it solve cryptics too (way harder
than crosswords)?

Amazing...

~~~
jcl
As albertni suggests, they have a series of heuristics, some of which do word
matching and others of which are specialized to common Jeopardy idioms --
such as the before-and-after clue. Each heuristic produces a list of
candidate answers with probabilities, and Watson replies with the
highest-probability answer, if its certainty is high enough.
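In sketch form, that architecture might look something like this (the heuristic behaviors, probabilities, and threshold are all invented for illustration; nothing here reflects Watson's actual internals):

```python
# Sketch of a heuristic-ensemble answerer: each heuristic proposes
# (candidate, probability) pairs; the system answers only if its best
# candidate clears a confidence threshold. Everything here is a
# hypothetical stand-in.

def word_match_heuristic(clue, category):
    # Pretend raw text matching found a weak candidate.
    return [("baby ruth ginsburg", 0.4)]

def before_and_after_heuristic(clue, category):
    # Fires strongly when the category itself announces the idiom.
    if category == "BEFORE AND AFTER":
        return [("baby ruth ginsburg", 0.9)]
    return []

HEURISTICS = [word_match_heuristic, before_and_after_heuristic]
THRESHOLD = 0.7  # don't buzz in below this certainty

def answer(clue, category):
    candidates = {}
    for h in HEURISTICS:
        for cand, p in h(clue, category):
            # Keep the best probability any heuristic assigns.
            candidates[cand] = max(candidates.get(cand, 0.0), p)
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return best if candidates[best] >= THRESHOLD else None

print(answer("Classic candy bar that's a female Supreme Court justice",
             "BEFORE AND AFTER"))
```

With the category "BEFORE AND AFTER" the idiom heuristic pushes the candidate over the threshold; without it, the same candidate stays below the threshold and the system declines to answer, which mirrors the "buzz only when certain" behavior described above.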

How does Watson "know" with high probability to apply the before-and-after
heuristic in this instance? Because the category is _explicitly_ "Before and
After", assuming they're recycling the clues from the following show:

<http://www.j-archive.com/showgame.php?game_id=3258>

(I see a wag on a discussion board suggested "Pay Day O'Connor" as an
alternate solution, which is awesome. Curiously, Watson had the same solution
as the contestant that day; Alex seemed to be expecting "Baby Ruth _Bader_
Ginsburg".)

So, no, Watson probably wouldn't do too well on cryptics, but a similar
approach with the right set of heuristics would probably work.

~~~
csmeder
"which are specialized to common Jeopardy idioms" Exactly, any new/novel idiom
would probably easily stump Watson.

~~~
zach
That's probably still better than the typical contestant, who simply has
categorical weak spots like sports or opera. Not being able to grok audio or
picture clues on top of that would be a real problem, though.

Every season has more clever, non-traditional categories to make Jeopardy!
more playful. For example, it wouldn't be unusual to have a category like
"Monopoly Colors," where responding to a Daily Double clue of "The $1 Bill"
with "green" could be catastrophic.

This seems really fun, though. I'm very glad to see this kind of high-profile
project that has the potential to rouse the curiosity of potential computer
scientists.

------
Qz
"The only way to program a computer to do this type of mathematical reasoning
might be to do precisely what Ferrucci doesn’t want to do -- sit down and
slowly teach it about the world, one fact at a time."

I'm wondering what the resistance to this is. Each and every one of us has
been through exactly that process. Do we really think that we can create a
knowledgeable mind out of whole cloth?

It seems what we need to do is teach an artificial mind everything we know,
slowly, once, and then it can teach all the other artificial minds everything
it knows in the blink of an eye.

~~~
Jun8
I completely disagree. The classical approach you mention has been tried many
times, most famously by the Cyc project (<http://en.wikipedia.org/wiki/Cyc>),
without great results. AFAIK, the "no-model, pure statistical" approach is the
norm nowadays.

~~~
Qz
I don't think something like that (pure statistical) qualifies as a mind.
Neither would the Cyc project you linked to.

I wasn't trying to say anything about what I think the underlying mechanism of
an artificial mind will be, but rather that I don't think we can just conjure
up a mind that already _knows_ stuff. I think that whatever we end up
creating, it will still be something that has to be _taught_ about the world.

------
jcl
Watson reminds me greatly of Proverb, a crossword-solving program. It works in
a similar way: a number of different heuristics come up with possible clue
solutions, then a ranking algorithm prunes them by probability. Like Watson,
some heuristics just do raw text search, while others are specialized to
common kinds of wordplay.

Proverb is able to solve about 90% of the clues in an average week of New York
Times puzzles. Of course, Proverb has an advantage over Watson: the letters of
intersecting clues are dependent, so the probabilities of different clues
reinforce each other, giving better estimates of likely letters.

<http://www.oneacross.com/proverb/>

(<http://www.oneacross.com/> has a limited version of Proverb online which can
guess at individual crossword clues -- invaluable if you get stuck on one.)
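The intersection constraint is easy to illustrate. In this sketch (words, probabilities, and crossing positions all invented), the down clue's individually best candidate loses because its crossing letter is incompatible with every across candidate:

```python
# Sketch of how crossing letters prune candidate lists: an across
# answer and a down answer must agree at their shared square.
# Candidates and probabilities are invented for illustration.

across = [("RAVEN", 0.5), ("ROBIN", 0.5)]   # crossing letter at index 1
down   = [("EVENT", 0.7), ("OVOID", 0.3)]   # crossing letter at index 0

def joint_scores(across, down, ai, di):
    # Keep only pairs whose crossing letters match; treat the clue
    # probabilities as independent and score by their product.
    return {(a, d): pa * pd
            for a, pa in across
            for d, pd in down
            if a[ai] == d[di]}

pairs = joint_scores(across, down, 1, 0)
best = max(pairs, key=pairs.get)
print(best)
```

Here "EVENT" is the stronger down candidate in isolation (0.7), but no across candidate has an E at the crossing, so the grid forces "ROBIN"/"OVOID". That reinforcement across clues is exactly the estimate-sharpening advantage Proverb has over Watson.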

------
CWuestefeld
_Andrew Hickl, the C.E.O. of Language Computer Corporation, which makes
question-answering systems, among other things, for businesses, was recently
asked by a client to make a "contradiction engine": if you tell it a
statement, it tries to find evidence on the Web that contradicts it. "It’s
like, ‘I believe that Dallas is the most beautiful city in the United States,’
and I want to find all the evidence on the Web that contradicts that."_

I think that's the most useful application suggested.

~~~
gjm11
> I want to find all the evidence on the Web that contradicts that.

"Gordon's great insight was to design a program which allowed you to specify
in advance what decision you wished it to reach, and only then to give it all
the facts. The program's task, which it was able to accomplish with consummate
ease, was simply to construct a plausible series of logical-sounding steps to
connect the premises with the conclusion. [...] The entire project was bought
up, lock, stock and barrel, by the Pentagon." -- Douglas Adams, Dirk Gently's
Holistic Detective Agency.

~~~
CWuestefeld
I was thinking of the scores of emails I get from friends and family, of the
sort that say "OMG! The sky is falling because...". In my experience, these
exclamations are universally wrong in both research and reasoning, and I wish
there were an easier way to find those errors and say "No, because...".

------
Synthetase
As a person who played Quiz Bowl in high school, I found it pretty apparent
that this was a domain AI would master in time.

Answering questions correctly was often a function of how much material you
had studied and were able to recall. I get the feeling that the computer might
have a slight advantage at pure recall but a disadvantage at associative
recall. Even so, the sheer speed of computation from the computer far outpaces
a human's ability to slam down on a buzzer.

On an unrelated note, this is beginning to more closely resemble the prophetic
computers characteristic of Asimov stories.

------
10ren
As a comparison, here's how Google did on the first one (in 0.39 seconds):
[http://www.google.com/search?q=Toured+the+Burj+in+this+U.A.E...](http://www.google.com/search?q=Toured+the+Burj+in+this+U.A.E.+city.+They+say+it%E2%80%99s+the+tallest+tower+in+the+world%3B+looked+over+the+ledge+and+lost+my+lunch)

The first two results are from this article itself, but the next one's snippet
begins "Downtown Dubai".

Yes, Google has access to the internet; but it answers queries by consulting
its internal copy/index of it, just as Watson has an unrestricted internal
database. Google is basically statistical AI.

------
MikeCapone
The NYT article includes a video produced by IBM that shows Watson in action.
You can also see it here:

<http://www.research.ibm.com/deepqa/>

I like how they show probabilities for each possible answer that the computer
came up with. I'm only halfway through the NYT piece, but it's interesting so
far.

------
izendejas
How long before we have a tournament of Wolfram Alpha vs. Google vs. Bing
vs. Watson, etc.?

------
BoppreH
The article is split into 8 pages. Eight. Pages. The worst part is that only
one in every several sentences has meaningful information. And this is the
New York Times.

Did anyone else have the same thoughts, or am I acquiring ADHD?

~~~
Mathnerd314
There's a "Single Page" link in a box near the beginning.

~~~
BoppreH
Thanks, but that's only half the problem.

------
gokhan
Two days ago, for some reason, I was thinking that we would see IBM dead
within the next couple of years. You know: SGI, Sun, IBM...

It appears that I might be clueless. Maybe they will revamp the company into
the default search engine of the internet. Maybe they will run a cloud
service companies use for automated data mining.

~~~
benkant
IBM isn't going anywhere. For a start their consulting arms (Global
Business|Technology Services) make up the bulk of their revenue. And that
business is booming.

IBM has always been at the forefront of research. IIRC they have more patents
than anyone.

Trust me, they're not going anywhere soon. In fact, I'd wager we'll see the
end of Microsoft before we see the end of IBM.

------
DanielBMarkham
I think the coming decade will see the evolution and widespread adoption of
true question-answering machines. If we could combine that with auto-drive
cars, it could mean an incredible leap forward in magnifying the human brain.

But auto-drive is going to take quite a bit longer, I think.

------
s3graham
Awesome! The failure mode still seems pretty brutal, but it sounds amazing
from that description all the same.

The Singularity Is Near. Aunt Edna's going to see it coming on the Tube pretty
soon.

(I think it ought to have to speech recognize Alex for the game though)

------
pavs
You can play against Watson online:
[http://www.nytimes.com/interactive/2010/06/16/magazine/watso...](http://www.nytimes.com/interactive/2010/06/16/magazine/watson-trivia-game.html?hp)

~~~
mikexstudios
The flash game is self-contained and Watson's answers are pre-calculated. So
it isn't querying Watson in real-time.

------
ax0n
In 2006, I was working at a startup that was trying to nail this space. It's
amazing how far this sort of tech has come.

~~~
Estragon
What's the market?

~~~
ax0n
I don't know. I don't think many people do, or the market simply isn't that
big and that's why the startup folded.

------
Herring
Is it actually doing speech recognition, or is it being fed the questions as
text?

~~~
kpich
The latter

