IBM's Watson AI trumps humans in "Jeopardy!" (nytimes.com)
194 points by Jun8 on June 16, 2010 | hide | past | favorite | 53 comments



From reading this fascinating piece, I (as one who is not technically proficient in any of the relevant areas) would conclude that AI-based approaches to computing can work well in the following areas:

(1) you can program something in accordance with a strict set of limited rules and achieve amazing results, such as in computer chess, where today the advanced machines can basically outdo humans at that particular game, e.g., Deep Blue;

(2) you can program something to use layered algorithms to find high degrees of probability that a particular objectively knowable fact is correct in response to someone inquiring about that fact, e.g., I.B.M.'s Watson;

(3) you can custom-program something to enable someone to determine other forms of objectively verifiable relationships (such as mathematical comparisons or outcomes), e.g., Wolfram Alpha; and

(4) you can (at least theoretically) hard-wire a database to yield objective facts drawn from a vast body of custom knowledge manually programmed into a machine, albeit at an enormous cost in time and effort, thus gaining the benefits of mass storage, instant retrieval, and rapid replication, but without drawing on the iterative power of computing devices and potential algorithmic solutions to the problem.

To me, all such approaches appear to throw in the towel on the question, "can AI-based computing ever achieve the equivalent of exercising human judgment?"

For example, in law, all sorts of questions can arise to which there is no "mechanical" answer - what is the best strategy to employ in a particular legal fight or litigation? what, among an array of complex alternatives involving tax, business risk, liability risk, and human factors, is the optimum way to merge two companies? or even the simpler but still judgment-based decision, is it best for me to set up my company in the form of an LLC or a corporation or some other form and what domicile should I choose? or involving personal decisions such as what techniques do I use to raise and train my kids in terms of education, morals, setting life goals, etc.

Are there AI proponents out there who believe that AI-based computing can ever handle such issues? To me, it seems evident that no algorithmic approach will ever rise to the level of addressing such problems, but this may just be based on my ignorance and lack of imagination.

Going back to law, for example, the article suggests that IBM may one day capitalize upon using a Watson-type machine to help people answer bureaucratic questions. I wonder about this if the approach is based purely on probabilities because, no matter what the data set, no one could ever know for sure that the answer is the correct one. At most, the AI-device could say, "this likely is a good starting point" and, beyond that, you are on your own to confirm whether it is accurate or not (which, of course, could make for a tremendously helpful resource in itself, as long as it is used properly).

I would think any such method would become hopelessly confused, though, in dealing with some knotty tax question, as for example, "determine my unrealized built-in gain in my C corp so that I know how much tax I have to pay on converting to S-corp status" (see here for the methodology on this, which is mind-numbing for all but CPAs steeped in tax minutiae: http://www.taxalmanac.org/index.php/Treasury_Regulations,_Su...). I could imagine a custom-programmed approach dealing with such questions but only one that is very specific to the problem at hand.

Don't mean to go on - I truly found this piece quite intriguing since, before reading it, I would (out of ignorance) have laughed at anyone who would have claimed that an AI-based machine could play "Jeopardy" in any meaningful way.

So my question to those who are knowledgeable in the HN community is this: are there other conceptual approaches that, given sufficient time and resources, will potentially be capable of rising to the level where they can address the higher-level (judgment-based) sorts of issues I identify above, or is this basically it?

For anyone interested, the Hollywood view of this sort of thing appeared in the 1957 movie "Desk Set," where a "Miss Watson" administered a machine called EMMANAC, which could give instant answers to questions expressed as normal people would ask them. This clip highlights the view of AI-computing as depicted in that movie: http://www.youtube.com/watch?v=Rdl9ynODxbk&feature=relat.... The broad theme is that of a group of researchers who resented the idea that their jobs were about to be eliminated by the impersonal beast, which Miss Watson, however, endearingly referred to as "Emmy."


I think at some point the AI goalpost will move beyond "capable of human judgement" as well :) (see Asimov's stories of R. Daneel and R. Giskard, for example).

I'm one of those who believes we'll crack AI (all aspects of it) at some point. I believe that the algorithm that our brains (and that of most animals) use is a very simple one that just needs tremendous parallel processing. I don't think anyone including IBM is yet thinking at that level (they work at too high a level for my tastes) and they do not consider enough of biology in their work (too computer-science-y, which prevents problem solving at the beginning).

I think it's just a matter of time before the neuroscientists have most of the details of brain and neuron functioning and we're able to decode the algo and replicate it on computers. And then it will be up to hardware progress to match first animal and then human intelligence.


If it's a simple algo, why haven't we come up with anything promising?


I'm hoping it's something like the crypto algos where computing the hash is easy but getting the original message back (without already knowing the key) is not so easy.

In our case, the "simple" algo would be something trivial to implement but tough to actually figure out what it is in the first place.
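To make the analogy concrete, here's the easy direction in a couple of lines (the input string is just an example):

```python
import hashlib

# Computing the hash is one line and takes microseconds...
digest = hashlib.sha256(b"the simple algo").hexdigest()

# ...but recovering the input from `digest` has no known shortcut
# beyond guessing candidate inputs and hashing each one.
```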


Leibniz's folly was essentially that, the "Calculus Ratiocinator". He supposed a machine that could settle questions of law. The problem is that no one agrees on the meanings of the inputs.

It's a common fallacy to believe that computers can produce "correct" answers to questions which a sample of 1,000 (or even 2) humans don't give the same answers to. If you have nothing unambiguous or even statistically reasonable to compare your answers against, the experiment is unfalsifiable.

Human scientists (if they are such, and not just bottle-washers and button-sorters) pose questions and then experiment against reality, which is and always will be the final appeal. A computer "scientist" cannot avoid that -- they still have to simulate and experiment. Several do, actually. The best seem to be based on "genetic algorithms", i.e., simulation and experiment, just faster.

So there are many things computers can do and will do. But expecting a computer to tell you whether abortion is right or wrong, or to run for president, or be a perfect oracle, is missing the point. We, as humans, are very probably incapable of building something more intelligent than ourselves, because, by definition, we wouldn't recognize it if we did. Eliezer will probably jump in at this point, and I'm curious what he'd say.


The place where this system would be useful would be loading up a company's email archive and then asking what problems they knew about with a new drug X.

The software would then look at all the emails and work through the chains where one says "Let's call Sample 37b, Drug X" and another says "Sample 37b is causing issue Z in 7% of patients."

The software would then say: "It was known that Drug X causes issue Z in 7% of patients. See emails A and B."

Think of it like grep for logic.
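A toy sketch of what that might look like -- the emails, the alias-declaration pattern, and the helper are all invented for illustration:

```python
import re

# Invented corpus: one email declares an alias, another makes a claim.
emails = [
    ("Email A", "Let's call Sample 37b 'Drug X' from now on."),
    ("Email B", "Sample 37b is causing issue Z in 7% of patients."),
]

def claims_about(emails, name):
    # Pass 1: collect alias declarations of the form: call <thing> '<alias>'
    aliases = {}
    for _, body in emails:
        m = re.search(r"call (\S+(?: \S+)*?) '([^']+)'", body)
        if m:
            aliases[m.group(2)] = m.group(1)  # e.g. "Drug X" -> "Sample 37b"

    # Pass 2: grep for the resolved name, skipping the declarations themselves.
    target = aliases.get(name, name)
    return [(label, body) for label, body in emails
            if target in body and "call" not in body]
```

Asking `claims_about(emails, "Drug X")` surfaces Email B even though it never mentions "Drug X" by name -- that alias hop is the "logic" part of the grep.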


Watson will go up against some of the best former "Jeopardy!" players this fall. However, I think it's a bit of a stretch to label this as an application of a natural question-answering system, as the article does. "Jeopardy!"'s answer snippets are not like the normal questions that, say, tax software would encounter, I think.

"When one I.B.M. executive suggested taking on “Jeopardy!” he was immediately pooh-poohed. Deep Blue was able to play chess well because the game is perfectly logical, with fairly simple rules; it can be reduced easily to math, which computers handle superbly. But the rules of language are much trickier. At the time, the very best question-answering systems — some created by software firms, some by university researchers — could sort through news articles on their own and answer questions about the content, but they understood only questions stated in very simple language (“What is the capital of Russia?”); in government-run competitions, the top systems answered correctly only about 70 percent of the time, and many were far worse. “Jeopardy!” with its witty, punning questions, seemed beyond their capabilities. What’s more, winning on “Jeopardy!” requires finding an answer in a few seconds. The top question-answering machines often spent longer, even entire minutes, doing the same thing."

Still, it's a stunning achievement. And IBM will definitely recoup all the millions it put into this project from the PR money it has saved. Other big tech companies, take note please.


'"Jeopardy!"'s answer snippets are not like normal questions that, say, a tax software would encounter, I think.'

I'm not sure, but are you saying that you think it's easier to answer Jeopardy questions than tax questions? I wouldn't think so. The tighter you constrain the domain, the better the computer will do.


But in some ways, Jeopardy is much more constrained than tax questions. You know that the clue will have a relatively short statement, and that the answer will be in the form of a question, with probably no more than 5 words being actually relevant and the rest used to make it into a question.

Furthermore, you know that breadth of knowledge is generally more significant than depth. Having a database of every nation's capital and a few significant facts about it is most likely more useful than being able to provide an in-depth discussion of quantum electrodynamics.

Tax questions, on the other hand, often have lengthy and detailed statements and require an essay-style answer. Worse, in some cases, detailed tax advice may actually require judgement. Of course, tax software takes shortcuts in this respect. It is not designed to handle complicated situations with nuances. It is designed to handle the average consumer, and even there it constrains the problem by being the one that asks the questions and then producing forms, rather than answering ad hoc questions.

(edit: fixed grammar)


Not only that, but you can probably parse a Jeopardy "answer" into a series of roughly independent clauses, and then try to predict classes that rank high across those clauses. For example, in the "answer" "This action flick starring Roy Scheider in a high-tech police helicopter was also briefly a TV series" you can get it right just by looking for things that correlate highly with "action flick", "Roy Scheider", "police helicopter", and "TV series".
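That correlation idea fits in a few lines. The tiny knowledge base below is hand-invented, with associations hard-coded rather than learned from a corpus:

```python
# Invented associations: candidate answer -> phrases it co-occurs with.
knowledge = {
    "Blue Thunder": {"action flick", "Roy Scheider",
                     "police helicopter", "TV series"},
    "Jaws":         {"action flick", "Roy Scheider"},
    "Airwolf":      {"police helicopter", "TV series"},
}

def rank(clue_phrases, knowledge):
    # Score each candidate by how many clue phrases it correlates with,
    # and return the best-scoring one.
    scores = {cand: len(assoc & set(clue_phrases))
              for cand, assoc in knowledge.items()}
    return max(scores, key=scores.get)

phrases = ["action flick", "Roy Scheider", "police helicopter", "TV series"]
```

With all four phrases present, "Blue Thunder" outscores the partial matches.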


But they must surely be doing something fancier than this naive-bayes-style model, otherwise they'd have no use for a roomful of supercomputers.


Well, naïve Bayesian inference is supercomputer-level when you use it on a huge universe of data.

As Peter Norvig often points out, these kinds of tasks are highly data dependent. The supercomputers are probably used more for data access than for raw computation. I can totally imagine Peter writing a forty-line Python app that runs on Google's infrastructure and does about as well.
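For a sense of how small the algorithm itself is (as opposed to the data), here is a minimal naive-Bayes scorer over a made-up two-answer corpus, with add-one smoothing and a uniform prior:

```python
import math
from collections import Counter

# Invented corpus: answer -> bag of words seen near it.
corpus = {
    "Moscow": "capital Russia Kremlin city".split(),
    "Paris":  "capital France Seine city".split(),
}

def score(answer, clue_words, corpus):
    # log P(answer) + sum of log P(word | answer), add-one smoothed.
    words = Counter(corpus[answer])
    total = sum(words.values())
    vocab = {w for ws in corpus.values() for w in ws}
    logp = math.log(1.0 / len(corpus))  # uniform prior over answers
    for w in clue_words:
        logp += math.log((words[w] + 1) / (total + len(vocab)))
    return logp

def best_answer(clue_words, corpus):
    return max(corpus, key=lambda a: score(a, clue_words, corpus))
```

The whole trick is in the corpus: with billions of documents behind the counts, even this scorer gets surprisingly far.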


You're right, Jeopardy questions are much more constrained than standard, domain-based questions; however, as the article also points out, these constraints may be very hard for the computer to pick up. In fact, the main reason that Watson is slow compared to the humans competing against it is precisely this: it cannot effectively prune the search space in most cases. Look at how the answer to the Michael Jackson clue is generated; the final answer is correct. The runners-up are also relevant, but in a very weird sense, surely nothing that a human would come up with.


But is it something a human brain would come up with and filter out? We don't know because we only know what our minds do, not what our brains do.


Speaking as a US resident, I know I'd much prefer to memorize general trivia than the US tax code!


Tax processing software aside, how might this do on the NY times crossword puzzle? On the one hand, the answers are more constrained because they have to fit, but good clues are impossibly cryptic for the average human to parse.


How is this possible? I was very impressed but could guess at how it worked with the first several answers/questions listed, but this one blew my mind:

“Classic candy bar that’s a female Supreme Court justice” — “What is Baby Ruth Ginsburg?” [Of course, Google now knows the answer :) ]

Can someone explain how it can do that? Can it solve cryptics too (way harder than crosswords)?

Amazing...


As albertni suggests, they have a series of heuristics, some of which do word matching and others which are specialized to common Jeopardy idioms -- such as the before-and-after clue. Each heuristic gives a list of candidate answers and probabilities, and Watson replies with the highest-probability answer, if its certainty is high enough.

How does Watson "know" with high probability to apply the before-and-after heuristic in this instance? Because the category is explicitly "Before and After", assuming they're recycling the clues from the following show:

http://www.j-archive.com/showgame.php?game_id=3258

(I see a wag on a discussion board suggested "Pay Day O'Connor" as an alternate solution, which is awesome. Curiously, Watson had the same solution as the contestant that day; Alex seemed to be expecting "Baby Ruth Bader Ginsburg".)

So, no, Watson probably wouldn't do too well on cryptics, but a similar approach with the right set of heuristics would probably work.
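The ensemble scheme can be sketched like so -- the heuristics and probabilities here are made up; each one returns (candidate, score) pairs, the scores are merged, and Watson-style, the system only answers if the best combined score clears a threshold:

```python
# Two hypothetical heuristics with hard-coded candidate lists.
def word_match(clue):
    return [("Baby Ruth Ginsburg", 0.4), ("Snickers", 0.1)]

def before_and_after(clue, category):
    # Only fires when the category explicitly signals the idiom.
    if category == "Before and After":
        return [("Baby Ruth Ginsburg", 0.5)]
    return []

def answer(clue, category, threshold=0.6):
    # Merge candidates from every heuristic by summing their scores.
    scores = {}
    for cand, p in word_match(clue) + before_and_after(clue, category):
        scores[cand] = scores.get(cand, 0.0) + p
    best = max(scores, key=scores.get)
    # Stay silent unless certainty is high enough to risk buzzing in.
    return best if scores[best] >= threshold else None
```

Note how the category acts as a gate: without it, the word-match score alone stays below the threshold and the system declines to answer.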


"which are specialized to common Jeopardy idioms" Exactly, any new/novel idiom would probably easily stump Watson.


That's probably better than the typical contestant who simply has categorical weak spots like sports or opera. Although not being able to grok audio or picture clues on top of that would be a real problem.

Every season has more clever, non-traditional categories to make Jeopardy! more playful. For example, it wouldn't be unusual to have a category like "Monopoly Colors," where responding to a Daily Double clue of "The $1 Bill" with "green" could be catastrophic.

This seems really fun, though. I'm very glad to see this kind of high-profile project that has the potential to rouse the curiosity of potential computer scientists.


It said in the article that the machine analyzes the category. My speculation is that if it correctly deduces that the category is one of those "answer-hybrid" ones, then it can try to break down the question into two parts, answer each, and cross-reference to find an overlap. In this case it seems pretty reasonable that such a sophisticated machine could break the problem down to "classic candy bar" and "female Supreme Court justice", combined with the knowledge that they share a word this would actually become "straightforward".
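That cross-referencing step is easy to sketch once you have candidate lists for each half (hard-coded stand-ins here for real retrieval): splice two answers wherever the end of one overlaps the start of the other.

```python
def splice(a, b):
    """Join a and b where a's trailing words equal b's leading words."""
    aw, bw = a.split(), b.split()
    for k in range(min(len(aw), len(bw)), 0, -1):
        if aw[-k:] == bw[:k]:
            return " ".join(aw + bw[k:])
    return None

# Stand-in candidate lists for each half of the clue.
candy_bars = ["Baby Ruth", "Snickers"]
justices = ["Ruth Bader Ginsburg", "Sonia Sotomayor"]

# Cross-reference: keep only pairs that actually share a word.
hybrids = [h for a in candy_bars for b in justices if (h := splice(a, b))]
```

Only the "Baby Ruth" / "Ruth Bader Ginsburg" pair survives the overlap check, which is exactly the pruning the grandparent comment speculates about.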


"The only way to program a computer to do this type of mathematical reasoning might be to do precisely what Ferrucci doesn’t want to do -- sit down and slowly teach it about the world, one fact at a time."

I'm wondering what the resistance to this is. Each and every one of us has been through exactly that process. Do we really think that we can create a knowledgeable mind out of whole cloth?

It seems what we need to do is teach an artificial-mind everything we know, slowly, once, and then it can teach all the artificial minds everything it knows in the blink of an eye.


I completely disagree. The classical approach you mention has been tried many times, most famously by the Cyc project (http://en.wikipedia.org/wiki/Cyc), without great results. AFAIK, nowadays the "no-model, pure statistical" approach is the norm.


I don't think something like that (pure statistical) qualifies as a mind. Neither would the Cyc project you linked to.

I wasn't trying to say anything qualitative about what I think the underlying mechanism of what an artificial-mind will be, but rather that I don't think we can just conjure up a mind that already knows stuff. I think that whatever we end up creating, it will still be something that has to be taught about the world.


Each and every one of us has been through this, but we as humans are already "Strong" intelligent. We possess the ability to learn and form connections, which is what makes it effective.

The resistance to the approach you mentioned comes when you proceed to build up a giant scaffolding of predicate logic without there being any true "intelligence". This was tried in the 80's with the promise of Strong AI around the corner. We all know how that turned out.

Admittedly, these expert logic systems can make what sometimes seem like surprisingly intelligent inferences based on the rules you give it. However, they only know the facts you give them and they don't seem to possess the ability to "learn". It turns out intelligence isn't just a summation of facts, rules and hierarchies.

I personally think that our best chance of creating a Strong AI entity is through whole brain emulation. If we can mimic the neurons, synaptic connections, chemical interactions and other processes of the mind to a certain level of accuracy I think we could achieve some level of AI. Kurzweil is always talking about brain-scan resolution and how it is advancing at an exponential rate (as is all technological progress).

So, if we were one day able to scan a person's brain and emulate it at a reasonable speed in a detailed enough model, I think we could have a reasonable "copy" of that person's persona. Alternatively, we may just model the processes without directly taking a scan of a human first. In this way, we may truly "teach" the AI as you suggest. It would start out like an infant with no ability to communicate coherently. We would provide it with stimuli and it would advance as a human does (although perhaps not at the same rate).


There's a little company over in the UK called True Knowledge http://www.trueknowledge.com/ which is taking this exact approach to Question Answering.



Watson reminds me greatly of Proverb, a crossword-solving program. It works in a similar way: a number of different heuristics come up with possible clue solutions, then a ranking algorithm prunes them by probability. Like Watson, some heuristics just do raw text search, while others are specialized to common kinds of wordplay.

Proverb is able to solve about 90% of the clues in an average week of New York Times puzzles. Of course, Proverb has an advantage over Watson: the letters of intersecting clues are dependent, so the probabilities of different clues reinforce each other, giving better estimates of likely letters.

http://www.oneacross.com/proverb/

(http://www.oneacross.com/ has a limited version of Proverb online which can guess at individual crossword clues -- invaluable if you get stuck on one.)
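The reinforcement from intersecting clues can be sketched like this -- with invented candidates and scores, only pairs that agree on the shared letter survive, so each clue's probabilities sharpen the other's:

```python
# Invented candidate lists with prior scores for two crossing clues.
across = [("PARIS", 0.5), ("MILAN", 0.4)]   # "European capital" (5)
down   = [("SEINE", 0.6), ("TIBER", 0.3)]   # "River" (5)

def joint(across, down, ai, di):
    # Keep only pairs whose letters agree at the intersection
    # (across word's position ai == down word's position di),
    # scored by the product of the individual scores.
    pairs = [((a, d), pa * pd)
             for a, pa in across for d, pd in down
             if a[ai] == d[di]]
    return max(pairs, key=lambda x: x[1])[0]
```

Here the crossing at the last letter of the across word eliminates every pairing except PARIS/SEINE, even though MILAN was a close runner-up on its own.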


Andrew Hickl, the C.E.O. of Language Computer Corporation, which makes question-answering systems, among other things, for businesses, was recently asked by a client to make a "contradiction engine": if you tell it a statement, it tries to find evidence on the Web that contradicts it. "It’s like, ‘I believe that Dallas is the most beautiful city in the United States,’ and I want to find all the evidence on the Web that contradicts that."

I think that's the most useful application suggested.


> I want to find all the evidence on the Web that contradicts that.

"Gordon's great insight was to design a program which allowed you to specify in advance what decision you wished it to reach, and only then to give it all the facts. The program's task, which it was able to accomplish with consummate ease, was simply to construct a plausible series of logical-sounding steps to connect the premises with the conclusion. [...] The entire project was bought up, lock, stock and barrel, by the Pentagon." -- Douglas Adams, Dirk Gently's Holisitic Detective Agency.


I was thinking of the scores of email I get from friends and family, of the sort saying "OMG! The sky is falling because...". In my experience, these exclamations are universally wrong in both research and reasoning, and I wish there was an easier way to find those errors to say "No, because...".


Andy Hickl is a snake oil salesman. I wouldn't believe him if he said the sun was hot. Let's hope his "client" does due diligence.


As a person who played Quiz Bowl in high school, I found it pretty apparent that this was a domain that would be mastered by AI in time.

Answering questions correctly was often a function of how much material you had studied and were able to recall. I get the feeling that the computer might have a slight advantage at pure recall but a disadvantage at associative recall. Even so, the sheer speed of computation from the computer far outpaces a human's ability to slam down on a buzzer.

On an unrelated note, this is beginning to more closely resemble the prophetic computers characteristic of Asimov stories.


As a comparison, here's how google did on the first one (in .39 seconds): http://www.google.com/search?q=Toured+the+Burj+in+this+U.A.E...

The first two results are from this article itself, but the next one's snippet begins “Downtown Dubai”

Yes, Google has access to the internet; but it answers queries by consulting its internal copy/index of it, just as Watson has an unrestricted internal database. Google is basically statistical AI.


The NYT article includes a video produced by IBM that shows Watson in action. You can also see it here:

http://www.research.ibm.com/deepqa/

I like how they show probabilities for each possible answer that the computer came up with. I'm only halfway through the NYT piece, but it's interesting so far.


How long before we have a tournament with Wolfram Alpha vs. Google vs. Bing vs. Watson, etc.?


The article is split into 8 pages. Eight. Pages. The worst part is that only about one in every several sentences had meaningful information. And that is the New York Times.

Did anyone else have the same thought, or am I acquiring ADHD?


If you use Safari, use the "Reader" function.

If you use any other browser in the world, go to the print view[1] and read from there, or use Readability[2] on the print view.

1. http://www.nytimes.com/2010/06/20/magazine/20Computer-t.html...

2. http://lab.arc90.com/experiments/readability/


There's a "Single Page" link in a box near the beginning.


Thanks, but that's only half the problem.


If you're on Firefox, Auto-pager will make it all better.

https://addons.mozilla.org/en-US/firefox/addon/4925/


Two days ago, for some reason, I was thinking that we would see IBM dead within the next couple of years. You know: SGI, Sun, IBM...

It appears that I might be clueless. Maybe they will revamp the company into the default search engine of the internet. Maybe they will run a cloud service that offers companies automatic data mining.


IBM isn't going anywhere. For a start their consulting arms (Global Business|Technology Services) make up the bulk of their revenue. And that business is booming.

IBM has always been at the forefront of research. IIRC they have more patents than anyone.

Trust me, they're not going anywhere soon. In fact, I'd wager we'll see the end of Microsoft before we see the end of IBM.


I am perplexed why the comment above ("revamp the company into the default search engine") got downvoted.

I saw this research as evidence that IBM is quite interested in search. I'm not so sure they want to go head to head with Google or Bing or the others but if this research isn't about search then I don't know what would be.


I think the coming decade will see the evolution and widespread adoption of true question-answering machines. If we could combine that with auto-drive cars, it could mean an incredible leap forward in magnifying the human brain.

But auto-drive is going to take quite a bit longer, I think.


Awesome! The failure mode still seems pretty brutal, but it sounds amazing from that description all the same.

The Singularity Is Near. Aunt Edna's going to see it coming on the Tube pretty soon.

(I think it ought to have to speech-recognize Alex for the game, though)



The flash game is self-contained and Watson's answers are pre-calculated. So it isn't querying Watson in real-time.


In 2006, I was working at a startup that was trying to nail this space. It's amazing how far this sort of tech has come.


What's the market?


I don't know. I don't think many people do, or the market simply isn't that big and that's why the startup folded.


Is it actually doing speech recognition, or is it being fed the questions as text?


The latter



