Hacker News
Watson crushes the competition in second round of 'Jeopardy' (yahoo.com)
88 points by mjfern on Feb 16, 2011 | 122 comments



David Ferrucci, the manager of the Watson project at IBM, on why he thinks Watson got the Final Jeopardy question wrong:

"First, the category names on Jeopardy! are tricky. The answers often do not exactly fit the category. Watson, in his training phase, learned that categories only weakly suggest the kind of answer that is expected, and, therefore, the machine downgrades their significance. The way the language was parsed provided an advantage for the humans and a disadvantage for Watson, as well. “What US city” wasn’t in the question. If it had been, Watson would have given US cities much more weight as it searched for the answer. Adding to the confusion for Watson, there are cities named Toronto in the United States and the Toronto in Canada has an American League baseball team. It probably picked up those facts from the written material it has digested. Also, the machine didn’t find much evidence to connect either city’s airport to World War II. (Chicago was a very close second on Watson’s list of possible answers.) So this is just one of those situations that’s a snap for a reasonably knowledgeable human but a true brain teaser for the machine."

http://asmarterplanet.com/blog/2011/02/watson-on-jeopardy-da...


Lame excuses! Watson is impressive, but I'm disappointed by the lack of any hint of comprehension behind its answers. When it's wrong, it's often nonsensically out-to-lunch, and its 2nd/3rd best answers are also often batty.

If it's just trained-up on statistical correlations between trigger phrases and likely answers in the constrained Jeopardy domain, then 90 32-core/512GB RAM servers seem like overkill.


Watson also generates confidence estimates and a minimum confidence bar for each question. It may sometimes have batty "answers", but it usually knows they are batty: the rate of incorrect answers that Watson is highly confident in is fairly low.

What's remarkable, and important not to take lightly, is the result that it's possible to generate answers to often vague and indirect clues without understanding. That likely means it will be possible to build useful systems for automating research and the synthesis of large amounts of data without needing to build artificial human-level intelligence.
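That buzz-or-hold decision can be sketched as a simple confidence threshold. This is an illustration only; the threshold value and the candidate format are made up, not IBM's actual logic:

```python
# Illustrative sketch (not IBM's real DeepQA logic): buzz only when the
# top-ranked candidate's estimated confidence clears a minimum bar.

def decide_buzz(candidates, min_confidence=0.5):
    """candidates: list of (answer, confidence) pairs, confidence in [0, 1].
    Returns (should_buzz, best_answer)."""
    if not candidates:
        return False, None
    best_answer, best_conf = max(candidates, key=lambda c: c[1])
    return best_conf >= min_confidence, best_answer

# A confident candidate list triggers a buzz...
print(decide_buzz([("Chicago", 0.83), ("Toronto", 0.11)]))  # (True, 'Chicago')
# ...while a batty, low-confidence list stays quiet.
print(decide_buzz([("Toronto", 0.14), ("Omaha", 0.10)]))    # (False, 'Toronto')
```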


...presuming that human-level intelligence entails any sort of understanding that is fundamentally deeper than what Watson is doing.


I think that's a fair bet. The new wave of "probabilistic everywhere" NLP models, though even the very simplest strictly dominate older grammatical methods, are often not capable of taking advantage of much of the structure of language and topic that humans exploit. It's a cutting-edge accomplishment when NLP algorithms learn to predict long-range word pairs, such as how you will almost certainly see "law" or "marriage" somewhere in a sentence containing the word "annulled", even if that local area of the sentence doesn't seem to call for it. Humans, on the other hand, are more likely to forget that it's possible to annul pretty much anything else.

I don't own a TV and plan on watching the Jeopardy match later online, so I'm just going to guess about Watson's performance. I think that humans abuse discovered patterns and structure in language and meaning to search through possible interpretations very quickly. Watson, on the other hand, uses far less structure and a room full of 200 cores to search through everything it knows much less efficiently. I feel like Watson's "strange" answers probably aren't nearly so strange when you realize it's simply being fairer to any possible answer than a human would be.

What's scary is that this sort of thing, a willingness to consider out-of-context answers, sounds pretty similar to the kind of behavior we humans praise as creative!


I think that humans abuse discovered patterns and structure in language and meaning to search through possible interpretations very quickly.

Right, but does that structure really represent a "deeper" understanding or just vast and meticulous optimizations of statistical algorithms similar to Watson's? Or is there a difference?

We feel like we know how we think, but we can't actually explain it in enough detail to reproduce it. Humans have a bad history of rationalization and tunnel vision. And now we discover that all the "wrong" ways to think deeply are actually the right ways to make a working AI.

If the AI can fool us into believing that it "understands" then maybe we can fool ourselves in the same way.


I honestly don't feel like we know how we think at all. I do think that statistics is a pretty good bet for the "math of learning", in that it's a sensible way to track how information flows through a model. Furthermore, the combinatorial problems involved must be tackled just the same by humans, so we can perhaps say that we're studying phenomena similar to the workings of the brain.

Of course, the implementations we build will always be vastly different from their appearance in the brain since the architectures are so extraordinarily different!


The confidence estimations are most impressive to me, and of course absolutely crucial in a game where you have to risk points to make points.

But the fact that bulk correlation mining can answer even some 'vague and indirect' questions isn't that remarkable. Jeopardy clues are a very constrained domain: short clues in English with some distinctive idioms, and short answers that are drawn from some well-defined and constantly recurring classes.

With true natural language understanding, offline searchable copies of Wikipedia and Wiktionary – 64GB, tops? – could be used to answer almost every question. Instead Watson uses 15TB of RAM and 2880 cores.


If you forced a person to give four answers to a question and rank them by probability of being correct, then at least three and possibly four of them would look 'batty'.


Not at all! 'Batty' means even a person who doesn't know the right answer can tell they're wrong. Even people who are very bad at Jeopardy¹ usually draw candidate answers from a more plausible set than the one shown among Watson's candidates and errors.

The best case I can make for Watson is that perhaps the alternatives shown are each actually the top option under a totally different understanding of the question. So in fact many other plausible answers are folded up just behind its right answer. The shown alternatives are each meant as: here's my answer if the question means something else entirely.

¹ eg Wolf Blitzer: http://www.youtube.com/watch?v=DVC28oemocA


For all the talk of the difficulties of playing Jeopardy! due to the "nuances of natural language" and "puns and double meanings in the clues", that did not really seem to be a factor in the second round -- most of the questions were quite plainly worded with answers easily discoverable just by searching. Accordingly, Watson performed dramatically better today than yesterday, when a larger portion of the questions did have nuance and plays-on-words in the phrasing. Note too how spectacularly badly Watson performed on the Final Jeopardy! question, where nuance _did_ play a much bigger role.

So today, we learned that machines can push buttons faster than people, and search is a great way to find answers for trivia questions. I doubt the former is a surprise to anybody alive in the past 50 years; the latter shouldn't surprise anybody who's ever used Google.


This. A thousand times this. Watson absolutely CRUSHED the human players on pretty much every question that was basic facts. I know Watson can probably generate answers faster than humans on simple search stuff, but it seemed so lopsided at points that I wondered: is Watson not wired in with some sort of delay that mimics the delay humans have between deciding to buzz in and actually buzzing in? The lack of such a system would seem to skew the results somewhat.


I was at a watching party with a couple of IBMers who worked on Watson, and one thing they said is that it's not a question of speed, but of timing. Players time their pressing to an estimate of when Alex will finish the question, and Brad Rutter in particular has been clocked at under 2ms with shocking regularity. The advantages Watson has are consistency and the emotional perturbations in its opponents. You could see them getting frustrated, and that likely only served to harm their ability to hit that window between the end of the question and Watson's button press.


You're right: consistently being 6x faster on the buzzer than the common case for your opponent is going to let you destroy them. Their only hope is that you can't come up with a response before Alex finishes reading the question.

I arrived at the 6x approximation by googling around for average ethernet latencies. I'm consistently seeing numbers of 0.3 to 0.35 ms for an ethernet ping/pong. I think it's fair to assume that, with the money IBM has invested in this, Watson is on at least ethernet-quality connections.


When a question appears on the screen, Trebek reads it. A human decides when he's 'done' reading it and pushes a button, which makes a light appear. Contestants aren't allowed to buzz in until they see the light, and are penalized if they buzz in too quickly. Watson is also notified of this metaphorical gunshot to start the race, and won't try to start buzzing in before that.


My point is that a human nervous system is MUCH slower than Watson's equivalent. It is probable (don't know, but it seems likely) that it is more variable, as well. As such, under the current rules, Watson has an overwhelming advantage. If all the contestants know the answer before the question is "read", then Watson will consistently beat the humans to the buzzer. The only hope for the humans is that Watson hasn't decided an answer by the time the question is finished being "read".

Without some way to account for the fact that the human nervous system CANNOT beat Watson to the buzzer on any sort of consistent basis, the game is far less compelling than it should be.



He seems to be arguing that you shouldn't handicap the computer by removing one of its advantages, but they've already handicapped the humans by removing lots of their advantages: at least in the parts I've seen, there have been no audio or visual clues, and it's my understanding that Watson gets the text of the question over the wire, not even having to OCR the same text that the humans get to read while waiting for Alex to finish reading the question.

I'm an IBMer and I think Watson has been extremely impressive, but as a Jeopardy fan it gets tiresome to see Watson win the race to the buzzer this often.

Finally, I'm hoping that tonight features more of the wordplay-heavy clues that I hear were present on the first night (why oh why was that on Valentine's Day???), because that is the element that excited me most when I heard about the challenge.


http://lesswrong.com/lw/im/hindsight_devalues_science/

That essay focuses on social science, but I think it's still relevant here. Experts in the area did not think they could do this, and even the people involved weren't sure. It's easy to dismiss this as "yeah, it's just a big search engine" once you already know it's been done. Besides not accurately characterizing the approach Watson takes, that sentiment misses the fact that this was an open question.

Paradoxically, people would probably be more impressed if Watson did worse and the game were more competitive. It's like watching an NBA team play a high school team. The NBA team is so good that it looks easy, even though they're only that good because of decades of practice.

(Disclaimer: I work at IBM Research, and have associated biases.)


So today, we learned that machines can push buttons faster than people

I'm not gonna lie... it was pretty entertaining watching Jennings squirm every time Watson beat him to the buzzer.

Also, there wasn't any mention of Watson having adaptive artificial intelligence but I would guess it's safe to say IBM was smart enough to include something like that. That in itself would be crazy hard to implement given the magnitude of what it's already doing... but not impossible. Maybe there are a few corrective algorithms in there somewhere.


It doesn't do any training during the game, except that it uses the correct answers already seen in a category to give weight to a specific interpretation of the category title.


So is there any information on how they actually implemented Watson? My understanding is that it's a Bayesian machine learning system, but I still don't know how it parses answers, or really does its magic.

Also, if there is anyone who thinks silicon valley has the smartest people around, this type of stuff should change your mind. Facebook is short trousers compared to this. And it's just a tech demo.


Some cool stuff here: http://www-943.ibm.com/innovation/us/watson/

The real challenge behind Watson is the natural language parsing. Instead of abstracting information away from its sources (like a graph), sources seem to have been left intact as sentences in Watson's memory. Watson reads through this information much the way it interprets a question, and it tries to create links and possible answers based on connections between sentences from many sources (which suggests why pun questions are difficult for Watson). I can't speak to the mathematical implementation of the answer choices, but this is, at a high level, how Watson finds answers. Those videos cover the cool stuff behind Watson's algorithmic challenges.


You aren't joking. I took a few minutes and wrote up a Bayesian engine in Mathematica. I've got a pretty good start on it already, and as the IBM material notes, the problem is embarrassingly parallel. It seems to me the entire problem is parsing. If you can parse well and feed a well-formed input to your data layer (and you've fed it enough data), you're golden.
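A toy version of such a Bayesian engine (in Python rather than Mathematica, with invented co-occurrence counts; nothing like Watson's real DeepQA pipeline) might look like:

```python
import math

# Toy Bayesian answer ranker: score each candidate answer by
# P(answer) * prod P(clue_word | answer), estimated from co-occurrence
# counts in a tiny hand-made "corpus". All counts below are invented.

corpus = {  # answer -> words seen near it in the "digested" text
    "Toronto": {"canada": 5, "baseball": 2, "airport": 1},
    "Chicago": {"airport": 4, "war": 3, "hero": 2, "baseball": 1},
}

def score(answer, clue_words, vocab_size=1000):
    counts = corpus[answer]
    total = sum(counts.values())
    log_p = math.log(1.0 / len(corpus))  # uniform prior over answers
    for w in clue_words:
        # Laplace smoothing so unseen words don't zero out the score
        log_p += math.log((counts.get(w, 0) + 1) / (total + vocab_size))
    return log_p

clue = ["airport", "war", "hero"]
ranked = sorted(corpus, key=lambda a: score(a, clue), reverse=True)
print(ranked)  # ['Chicago', 'Toronto']
```

With richer evidence the same machinery happily produces "Toronto" for a clue that leans on "baseball" and "canada", which is roughly the failure mode the Final Jeopardy discussion above describes.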

So who wants to build a real Q/A site based on this? Call it hal-18000.


> So who wants to build a real Q/A site based on this? Call it hal-18000.

You'd have to teach it to deal with thick accents like this one: http://www.youtube.com/watch?v=5FFRoYhTJQQ . Honestly, I don't know if that's possible, no matter how much training you put into the machine.


Natural Language Processing != Voice Recognition.


> Natural Language Processing != Voice Recognition

This is what I don't get: why should "language processing" be tied to written text? Part of the answer I know (it's easier for computers to parse), but other than that it doesn't make sense.


Speech recognition is speech recognition. Different problem entirely.


There's a high-level overview in this paper from AI Magazine, Fall 2010: http://www.stanford.edu/class/cs124/AIMagzine-DeepQA.pdf


Also, if there is anyone who thinks silicon valley has the smartest people around, this type of stuff should change your mind.

Watson is an impressive achievement, but there are quite a few companies in Silicon Valley whose engineers could pull this off. It's more a matter of how much money management feels like throwing at it. It's great publicity for IBM, which has to put in a lot more effort than most Silicon Valley companies in order to look cool, but can afford it.


Entirely true. But the point I was trying to make is that the current batch of dotcoms isn't even in the same league as the big boys: IBM, HP, SGI(?), Google, parts of Oracle.

There's driven and then there's Smart.


Also, does anyone else really love the architecture at the Watson research center? That just looks like a place I'd want to work. Lots of wood, stone and glass. Love it. Wired had some details on the building, and the cafeteria is right out of Mad Men. So much better than "open plan" (we're too cheap to give code monkeys space).

Here are the pictures: http://www.wired.com/epicenter/2011/02/watson-jeopardy/?pid=...


As you approach it, it looks like an airport. (Dulles, specifically.) I work at the Hawthorne location, so I don't see the imposing architecture every day. But we do have a similar board of IBM Fellows, and I often pause at it on my way out at the end of the day. It is humbling and inspiring.


Interesting match.

Seems like Watson was able to ring in (clicker) much quicker than Ken or Brad. Any unfair advantage?


For anyone who understands the true dynamics of Jeopardy!, Watson simply isn't as impressive a Jeopardy! contestant as it's being made out to be. I'm not minimizing the awesomeness of Watson's language processing, because it's great, but Watson's performance on Jeopardy! is only as impressive as a hypothetical competition where Watson and two humans fill out a worksheet and compare who got the most answers right.

During tournaments of champions, Jeopardy! is not about how many correct responses you can come up with, as all competitors will know the vast majority of them. It's all about timing. I don't know the exact statistics, but it seemed like Watson knew about 75% of the correct responses. I strongly suspect the two human contestants knew a greater percentage. On day 2, it just came down to timing. Watson was only beaten to the buzzer three times when it knew the correct response.


Watson correctly knew 84% of the answers (26 of 31). It actually answered 74% (23 of 31).

It didn’t know the correct response or was not confident enough to answer 16% of the time (5 of 31). Nobody at all could give an answer to 6% of all questions (2 of 31).

If the two questions we know nobody could answer are excluded, Watson knew the right answer 90% of the time (26 of 29).

Do individual Jeopardy contestants correctly know more than 84% of the answers? Since two questions were left unanswered by everyone, 93% (29 of 31) is already the upper bound for human performance in this round, and that's not very far from 84%.

Looking at the six rounds of Jeopardy the J! Archive has of games with Ken Jennings and Brad Rutter [0], we see that the average upper bound across those six rounds was 87%, or 26 questions (of 30). The minimum was 23 and the maximum 28 (of 30). That's, again, very much an upper bound.
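The percentages quoted above check out. Computing them to one decimal place (the figures in the comment are rounded to whole percents):

```python
# Sanity-checking the figures quoted above, to one decimal place.
def pct(n, d):
    return round(100 * n / d, 1)

print(pct(26, 31))  # 83.9 -> the "84%" correct-knowledge figure
print(pct(23, 31))  # 74.2 -> the "74%" actually-answered figure
print(pct(5, 31))   # 16.1 -> the "16%" no-confident-answer figure
print(pct(26, 29))  # 89.7 -> the "90%" figure excluding triple stumpers
print(pct(29, 31))  # 93.5 -> the "93%" human upper bound this round
print(pct(26, 30))  # 86.7 -> the "87%" average upper bound in archived games
```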

Looking at all this, I'm pretty confident that Watson would be doing well even if it had human reaction times. (This is how I would change the game: within 200ms or so of the buzzers opening (whatever the human reaction time on a Jeopardy buzzer is), the player who gets to answer is randomly selected from everyone who managed to buzz in; after those 200ms everything stays the same. Players are also not punished for buzzing in too early. This is a minimal change to the game, not a completely different game, which I think is important.)

[0] http://www.j-archive.com/showplayer.php?player_id=7206
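The proposed rule could be sketched like this (the 200ms window, the buzz times, and the tie-break behavior are assumptions for illustration):

```python
import random

def pick_answerer(buzz_times, window_ms=200):
    """Proposed rule (sketch): every player who buzzes within `window_ms`
    of the buzzers opening is pooled, and one is chosen at random.
    After the window, the earliest buzz wins as usual.
    buzz_times: dict of player -> ms after buzzers open (None = no buzz)."""
    buzzed = {p: t for p, t in buzz_times.items() if t is not None}
    if not buzzed:
        return None
    in_window = [p for p, t in buzzed.items() if t <= window_ms]
    if in_window:
        return random.choice(in_window)  # tie-break the human-reaction window
    return min(buzzed, key=buzzed.get)   # otherwise fastest buzz wins

# Watson buzzes in ~10ms, the humans around 150-250ms (made-up numbers):
times = {"Watson": 10, "Ken": 180, "Brad": 230}
print(pick_answerer(times))  # randomly Watson or Ken; Brad missed the window
```

Under this rule Watson's electromechanical speed only gets it into the pool; it no longer wins the buzz outright on every clue.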


Maybe simpler version of your proposed rule change: Buzzing in before the buzzers are activated is just treated as buzzing in at the exact moment the buzzers are activated. (And, as you say, break ties randomly.)


With humans, you don't know how many correct responses they knew. You only know how many triple stumpers there were and how many incorrect responses a human gave. When two or three human champions are playing, it's unlikely for one of them to buzz in first with the consistency that Watson was able to. Hypothetically, if two humans A and B both knew the same 95% of correct responses, you would probably see something like A buzzing in 30% of the time and B 70%. You couldn't possibly determine how many correct responses either A or B knew.


I didn't try to find that out; as you say, it's impossible to find out. Triple stumpers set an upper bound for the humans, the theoretical maximum. That's what I calculated. About three triple stumpers per round seems to be the norm, even among the best, which puts the upper bound, the theoretical maximum, at 90%. The humans might still be worse than 90%, but they are definitely not better (when there are three triple stumpers).


Isn't the real problem that Jeopardy just isn't hard enough?

And making it harder (more obscure trivia) would only help Watson.


It is certainly unfair and has pretty much ruined the competition for me. Watson can instantly detect when he's allowed to ring in. This is no different from a car "racing" a human. In theory, a human contestant can try to time his trigger to coincide with the end of Trebek speaking the question and catch the light just as the timing person releases it, but in practice this is fraught with peril (if you're off, you can't ring in for several tenths of a second) and it requires much more of a human's energy, which he can no longer devote to figuring out the question. You could see Jennings attempt this on the first day, with mixed results. These episodes are commercials for IBM, and it's clear who is supposed to win.


Not certain what's unfair about it - if we accept that the human contestants would have scored comparably to Watson (if they had the opportunity), then the competition really does boil down to who has the fastest thumb. I can't imagine it would be any more entertaining to watch IBM throw the match by intentionally crippling their player.


Instead, the game has been crippled. Watson has to press the button, naturally, but they're not going to require Watson to write out the answer with a pen, of course. And forget about audio clues, much less a picture-based Final Jeopardy clue.

I mean, if they had changed the ring-in system to work in a manner befitting a man-versus-machine match, you're telling me that wouldn't be entertaining? I would be fascinated by that match. This one was a letdown.


Watson would likely lose if, instead of Jeopardy, this were a simple trivia test where contestants were scored on the most correct answers. Thinking about that really calls the contest into question for me. We let the machine off the hook for things people are good at but it is not (understanding speech, reading, and vision), but we make no allowance for a machine's ability to press a button faster than humanly possible. If you instead phrased the contest as "machine presses button faster than humans", it doesn't seem impressive.


Your implicit assertion that natural language processing is something "a machine is good at" really calls the rest of your post into question for me. In particular, your demand that the researchers solve half-a-dozen Hard Problems instead of only one seems mildly mean-spirited.


It's not unfair exactly, but it certainly ruins the impressiveness of Watson's ability to come up with the right answers. I highly doubt Watson's accuracy is even close to Brad or Ken's, but the buzzer seals the deal.

It's less impressive, just like a computer sorting 1000 integers faster than a human is less impressive.

edit: added "not" after the first word


From the descriptions I've read, Watson only buzzed in after it believed it had a solid answer. If it was significantly less accurate than its competitors, it wouldn't have spent the entire match buzzing in before them.

I'm afraid I don't really understand the decision to trivialize the fact that we now have a computer that can answer general-knowledge natural-language queries quickly and about as accurately as a clever person. That's a Big Deal.


For every single clue, there is an overlay showing Watson's top 3 answers, and it shows whether Watson was confident enough to buzz in. In day 2, I counted only 3 times when Watson had a confident response but was beaten to the buzzer.

None of this really minimizes IBM's accomplishment, but it absolutely means this specific presentation (Jeopardy!) lacks weight for those of us who understand what the game dynamics of Jeopardy! are. This is nowhere near as impressive a "man vs. machine" victory as was Deep Blue vs. Kasparov.


Even considering that, I don’t know what exactly makes it less impressive than Deep Blue. Watson correctly knew 84% of the answers in the second round. Looking at past games, that’s very much competitive with humans. To me, this is much more impressive than Deep Blue, even considering the fast reaction times.


Here's the thing, a tiny microchip sorting numbers at a speed that a human can barely comprehend IS impressive.

Just like flying through the sky in a metal tube, talking to someone on a different continent in realtime, or converting lightning into high-energy photons and cooking your food with it.


Well said. I must say, I don't really relate to the "man vs. machine" narrative here. Maybe it's just because I'm not taking the long view (the one in which the machines inevitably rise up and enslave us? or in which "we" become machines?), but all I see is wonderful positives: as you say, the fact that we can now fly in metal tubes doesn't diminish us as humans for not having wings and jet engines. Look at us humans: we can fly now!

I really enjoyed this article from Garry Kasparov in the New York Review of Books. Spoiler: it's [partially] about how the best chess player in the world is a really good human paired with a really good computer. http://www.nybooks.com/articles/archives/2010/feb/11/the-che...

So let's all join hands with the machines and sing Kumbayah ... all watched over by machines of loving grace... of course, yes, this is a medium-term view. The long-term, I suppose, probably belongs to the machines.


You're completely right, and I'm sick of people equating minimizing Watson's Jeopardy! success with minimizing the software/hardware accomplishment behind Watson. IBM has done something great here, but the buzzer dynamic renders Watson's Jeopardy! prowess unimpressive.

A good analogy, similar to your car vs. human analogy, is this: hold a competition where two contestants must first pass some sort of Turing-like test. Upon passing that test, both contestants must sort one million 32-bit integers. The first contestant to finish both tasks wins.

In that hypothetical competition, it would certainly be impressive for a computer contestant to pass the Turing-like test. (The human contestant would probably have no trouble doing so.) But the sorting task would seal the door, and the computer would win every time it is able to pass the Turing-like test. The sorting task is known to be dominated by computers, just like buzzer reflexes are known to be dominated by computers.


I don’t know what exactly makes Watson’s performance unimpressive. Being able to answer 84% of all questions correctly is impressive, reaction times or not.


As I've said many times, Watson being able to come up with correct responses that often is impressive. Watson winning Jeopardy! is what's not impressive.


That seems to be a contradiction. Being able to answer 84% of the questions seems to be about champion level.


It might be close to champion level, but I strongly suspect that it's not as good as Watson's two human opponents.


It would be interesting to know how the human contestants would play on their own. We only see them in competitive matches, so we don't really know how many of the questions they could answer. (I heard that contestants tend to push their buzzer as soon as they hear a competitor pushing theirs, regardless of whether they know the answer, so you can't even tell from buzzer presses alone whether they would answer if there were no competitors.)

84% is, to my mind, very high. I’m skeptical of claims that even the best humans are better than that.


Of course. This isn't a match of equals, this is one of man versus machine. As fans know, most responses in a game of Jeopardy! are known by multiple players. That's especially so in a game of this caliber, so the knowledge aspect is really quite minimal compared to the ring-in factor. Both these guys slaughtered their opponents by being quick on the buzzer.

Watson is being granted first crack at the questions 90% of the time because of its electromechanical advantage. IBM may not have the mean brainpower that Google has, but they can clearly build a computer that can press a button quicker than Ken Jennings.

Knowing that a computer can consistently beat even the best to ever play the game to the buzzer, the IBM team could be pretty well assured of success once they got Watson performing well enough.


the comparison to google seems, um, awkward?

"IBM holds more patents than any other U.S.-based technology company and has nine research laboratories worldwide. Its employees have garnered five Nobel Prizes, four Turing Awards, nine National Medals of Technology, and five National Medals of Science."

admittedly, they've been around longer, but they're not exactly playing with crayons over there.


Come on, outside of these academic exercises, IBM is now just a giant consulting firm.


umm yeah giant as in $100 billion in revenue and about $15 billion in net income. Not a bad business at all. I should hope to be so boring.


With over $10 billion a year in hardware sales.


$30M to build Watson to buy all this PR is brilliant. As was Deep Blue.

I just hope the next challenge isn't "defeat a standing army"


academic exercise? you call a Turing Award or a Nobel Prize an academic exercise?


I'm just cracking wise. Watson is a very impressive project. Nonetheless, throw four to seven Google researchers and engineers at this project and you'd achieve comparable results. They've been cornering the market in this area.

But instant button pressing is not terribly impressive, and yet it is Watson's critical advantage. This match is framed as a battle of knowledge, when it is really a struggle for humans to overcome the massive disadvantage of their meat-based nervous system.

This is largely glossed over. Watson would lose horribly if one of its engineers was pressing the button. If its electronic advantage on the button was taken away, it would be the kind of fascinating match I and a lot of people were expecting.


Google actually does do open-domain fact-based question answering for queries like:

"what is the capital of china"

which are probably closest to Jeopardy questions (technically speaking.)

The team that does this is a lot more than seven Google engineers. Yet it is not one of Google's top-selling features, because it's not that easy.

There is a fine line between "seems easy" and "nearly impossible". A Google engineer does not equate to infinite skills.


Yes, Watson can play Jeopardy. Jennings and Rutter can also do every other thing a human being can do. The meat-based system is still, on the whole, much more impressive.


> the IBM team could be pretty well assured of success once they got Watson performing well enough.

That shouldn't minimize the accomplishment of actually performing well enough. If Watson were fast on the buzzer but couldn't answer accurately, he'd be pretty far in the red about now.


Sure, but I believe the average Jeopardy contestant could probably win this match handily if they had Watson buzzing in for them.

Therefore they've made a machine that plays Jeopardy about as well as an average Jeopardy contestant.

That's still an impressive feat but not quite one I would have thought was out of reach five years ago.


Once they got Watson performing well enough... understatement of the year.


Is it heretical to say that for a project that researchers have been working on for six years, Watson performs as well as I would expect?

They have a ton of priors, a ton of specific optimizations, and a lot of resources.

It's pretty impressive but in a they-clearly-worked-a-lot-on-this way, not a what's-their-secret kind of way, you know?


It's pretty impressive but in a they-clearly-worked-a-lot-on-this way, not a what's-their-secret kind of way, you know?

If this isn't a "what's their secret" kind of way, then what is, in your opinion? Or are you just more of the pragmatic "if it's been done, then obviously it could be done, duh" type?


It just seems like what I would expect, although I give them full credit for tackling this challenge in the first place. I'm not the only one:

http://www.madpickles.org/rokjoo/2011/02/14/ibm-watson-vs-go...


Well, this guy is a Google employee, so his bias is obvious. But I'm not optimistic about Google if most of their employees can't differentiate between the difficulty of doing database queries and that of constructing an answer from a question. It's a FAR bigger jump than when we went to Google from Alta Vista -- especially since, coincidentally, IBM had virtually the same technology that Google had at the time, but they weren't focused on search -- they just thought it was interesting technology.

And let me add, if this is so incremental, I'd love to see Google just turn this feature on. Let me ask it any question and have it return the answer at the top of the results page without me having to click a link. I don't think we'll be seeing that from Google any time soon.


It is an unfair advantage, and it's frustrating to watch. For most of the answers, both Ken and Brad were trying to buzz in, but Watson always had better timing and buzzed in first. I'm sure that Jeopardy's buzzing system didn't take robots into account when it was designed, so it technically isn't against the rules. But it does give Watson a huge tactical advantage.


I've been trying to think of a way to redesign the buzzer system to make it more fair, but I haven't been able to come up with anything. Tricky problem.


They could revert to the old rule where you could buzz in as soon as you thought you knew the answer. It's hard to tell how long it takes Watson to come up with answers right now, but switching to the old rule would introduce an interesting dynamic where Watson would have to decide whether to spend more time chugging data or buzz in.

Also, this would be pretty artificial (no pun intended), but they could analyze previous all-human Jeopardy! episodes and figure out average buzzer response time, and perhaps incorporate that into Watson.


How to make the buzzers fair: If two people buzz in within time T of the question being read, randomly select who gets to answer. The value of T would be somewhere on the order of 200 ms, to distinguish between the two cases of "waiting until the question is finished before buzzing in" and "racing to figure out the answer the fastest".
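A quick sketch of that tie-break rule (the function name, the timestamp format, and the example values are all hypothetical; the only part taken from the proposal above is the ~200 ms window with a random pick among the tied buzzers):

```python
import random

T = 200  # tie window in ms, per the proposal above

def pick_responder(buzzes, t_window=T):
    """Given {player: buzz_time_ms} for everyone who buzzed after the
    question finished being read, treat all buzzes within t_window of
    the earliest one as simultaneous and pick the responder at random."""
    if not buzzes:
        return None
    earliest = min(buzzes.values())
    tied = [p for p, t in buzzes.items() if t - earliest <= t_window]
    return random.choice(tied)

# A fast machine buzz no longer automatically wins: anyone within 200 ms
# of the first buzz has an equal chance of being selected.
who = pick_responder({"Watson": 5, "Ken": 150, "Brad": 450})
```

In this example Brad (450 ms) falls outside the window, so the answerer is chosen at random between Watson and Ken, which is exactly the intended effect: the window separates "waited for the light" from "raced to the button".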


It would probably be better to just rotate through the contestants in order to decide who gets to answer.


That wouldn’t be Jeopardy anymore. Timing matters. I like the 200ms solution.


I wonder when Watson gets the textual clue: the instant the clue appears on screen? The moment Alex finishes reading it? A midpoint? Is it trickled word-by-word at about Alex's speaking or average human sight-reading rates?

If Watson has confidence at the moment ringing-in is allowed, it seems it will always win. So how much time it has to achieve that confidence, as a function of how the text is fed, may be more important than buzzer mechanics.

Based on the same general idea of a common starting point that motivates waiting until after the entire question has been read before allowing any ringing in, I could see there being a tiny 'common period' where all buzzes are considered as coming in simultaneously, with the person chosen to answer then being chosen in round-robin fashion (or for maximal drama, favoring whoever is behind). After the tiny common period, it would be strictly based on first-to-buzz.

It would take away a twitch-timing factor that has been important for human champions, too, but offer more fairness with regard to computers and even people with slight tics or timing problems.


The clue is given to Watson electronically the moment the clue appears on the screen.

While watching, I was actually hoping for really short questions which would cut down on Watson's time to process and possibly put Watson on more even footing with Jennings and Rutter.


You could just change the game completely, and allow multiple people to answer simultaneously if they buzz in within a couple of seconds. It looked to me like Watson would be pretty competitive in that game, too.


You could just give Watson a hard 200ms handicap.


So the game turns from "the absurdly smart computer stomps everyone" to "the absurdly smart (but deliberately crippled) computer may or may not stomp everyone, depending mostly on random chance."

That doesn't sound like a great improvement, imho.


What about just removing the timing element altogether, effectively making every round into (possibly iterated) Final Jeopardy?


That would be more "fair," by which I mean that it would remove the obviously computer-favored dynamic of buzzer reflexes. Of course, it wouldn't make good TV, it wouldn't really be Jeopardy!, and I suspect Watson would lose. Against normal season contestants Watson would probably fare well (in a buzzerless competition), but against human champions Watson wouldn't stand a chance.


Do you know that? How? Stats?


There's no way to know what portion of the clues the human contestants knew the correct responses to. I've watched Ken and Brad play extensively though, so I very strongly suspect that they easily knew at least as many as Watson.


I would like some stats about that. As I said, Watson could have correctly answered 84% of the questions, and nobody could answer 6% of them. That leaves a very thin margin for the humans; if past games are any indication, one that is probably not large enough.


That would give a more precise idea of just how many of the questions Watson was able to answer correctly, but at that point the discussion starts to feel like people proposing changes to the rules of chess in order to screw Deep Blue.


You're glossing over the fact that Watson knew the correct answer most of the time. That's what this is about, and that's why this is a significant result.


It looked like Watson had the correct response about 75% of the time. That's probably better than most normal season contestants and home viewers, but I am highly confident that Ken and Brad knew significantly more than 75% of the correct responses.


Doesn't matter, you're discussing a different set of goals. Which might make an interesting game, but ignores the purpose of this exercise. The question on everyone's mind was "can a machine beat humans at Jeopardy". The answer appears to be: yes, very much so.

Whether or not we can make a more interesting or "fair" game of Jeopardy involving such a machine is an entirely separate, and to my mind far less interesting, question.

Where do we go from here? It would seem silly to say "we make a more interesting Humans vs. machines Jeopardy game." Rather, it seems more prudent to figure out ways to expand on this research and use the underlying technology to solve more interesting, and practical, problems.


84% of the time.


Today, I learned that there are 7 cities in America named Toronto.


I wonder how many of them partially fit the mold of having an airport named after WWII battles/heroes?


I have 31% confidence there is something there...


And, as I predicted, it all came down to buzzer reflex, which computers unsurprisingly excel at. On day 2 (today), Watson was only beaten to the buzzer three times when it had the correct response above its confidence threshold.


[deleted]


I don't think so. I would hope the clue writers were purposefully made unaware which clues were being written for the Watson episodes. I don't think they seemed unfair, either. They seemed like normal modern Jeopardy! clues.

Also, I am highly confident that if you simply had Ken, Brad, and Watson fill out a worksheet with all the clues written on it (thus bypassing the central game dynamic of buzzer reflexes), Watson would get the lowest score.


I seem to be the only one missing this, so just wondering how it's been concluded that it was strictly due to timing? Watson didn't miss many, and even on the ones it didn't buzz in for, it had the correct answer most of the time. Just wondering how that means it would have gotten the lowest score yet managed to beat everyone to the clicker. Am I missing something?


Watson has no clicker advantage. There is a human backstage that decides when Alex Trebek has finished speaking, and he lights up a board so the players know when they're allowed to click in. Watson gets the same signal and has to also press a physical button. There is a 300 ms delay penalty if you click in too early, and there was a question in tonight's show that seemed very much like Watson knew the answer but couldn't click in. He may have been estimating the clicker and getting it wrong.

While his mechanical finger might be a few ms faster than a human hand, it does not provide any significant advantage to him.


Watson's advantage is clear, significant, and game-deciding. If you were familiar with the two human contestants you would know this. Barring an unlikely malfunction, Watson can't possibly incur the delay penalty from buzzing in early. Watson has no real-time audio or visual input, so it is certainly not anticipating the "clue finished" signal.

The instant before the "clue finished" signal is sent, Watson already knows whether or not it will attempt to buzz in. When the signal is sent, you've got a few dozen nanoseconds of wire propagation delay, plus the propagation delay Watson's "clue finished" signal-to-solenoid logic incurs, plus the delay of the solenoid itself. The final two steps almost certainly take less than 10 ms each.

Even with a very forgiving estimate of a 50 ms "clue finished" signal-to-buzz latency, it would be extremely unlikely for even the most anticipatory human to beat that. Granted, it does happen. In day 2 of the IBM challenge I counted 3 times when a confident and correct Watson was beaten to the buzzer by a human contestant.


These aren't two dim guys it's playing against. For virtually every question asked, at least two contestants knew the answer. So no matter how good Watson is at knowing the answer (namely really, really good), this isn't an advantage against a panel that is also really, really good at knowing the answer. Watson's Jeopardy advantage must be elsewhere.

To quote Ken Jennings: "As Jeopardy devotees know, if you're trying to win on the show, the buzzer is all. On any given night, nearly all the contestants know nearly all the answers, so it's just a matter of who masters buzzer rhythm the best.

"Watson does have a big advantage in this regard, since it can knock out a microsecond-precise buzz every single time with little or no variation. Human reflexes can't compete with computer circuits in this regard. "

He goes on to say that the game should not be changed to account for this.


Human reaction time to visual stimuli is approximately 150 to 300 ms [1]. I don't know how Watson was configured, but even with the same visual cue I would guess it could press the button in a few milliseconds. That's a massive advantage, enough to win the buzz-in every time both a human and the machine know the answer by the time the light goes on.

[1] http://en.wikipedia.org/wiki/Reflex


Only if the contestant waits until the light goes on (what Watson has to do). If the human anticipates the moment the light will go on he can be _much_ faster. Or he can be too fast and get the delay - well, that's the risk.

The whole discussion looks fishy. Yesterday, the consensus seemed to be that Watson was at a disadvantage because of its built-in buzzer delay and the fact that it has to wait for the visual cue. Today, when Watson performed better, the consensus (at the moment) seems to be the opposite. So the system hasn't changed, but the better Watson performs, the more opinions are brought forward about why this is an "unfair" contest.


There was no such consensus. I predicted (based on the practice videos that were released) that Watson would excel due to its buzzer reflex advantage. In fact, I never saw any predictions that Watson would have a disadvantage because of its buzzer system.


Ken Jennings recently said this, which sounds very much like the argument I've been making:

As Jeopardy devotees know, if you're trying to win on the show, the buzzer is all. On any given night, nearly all the contestants know nearly all the answers, so it's just a matter of who masters buzzer rhythm the best.

Watson does have a big advantage in this regard, since it can knock out a microsecond-precise buzz every single time with little or no variation. Human reflexes can't compete with computer circuits in this regard. But I wouldn't call this unfair...precise timing just happens to be one thing computers are better at than we humans. It's not like I think Watson should try buzzing in more erratically just to give homo sapiens a chance.

http://live.washingtonpost.com/jeopardy-ken-jennings.html?hp...


To make it a true test of brains, and remove the mechanics of button-pressing speed from the question...

Place all three contestants in isolation from each other.

All three hear the question read, and buzz-in just as they do now.

Allow ALL contestants who buzz in to answer the question, but do not allow them to know about their opponents' performances.

Record all contestants' buzz-in reaction times.

At the end of the game, compare only the accuracy of answers to determine the winner.

At the end of the game, compare buzz-in reaction times to see how thumbs fare against relays.
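The proposal above amounts to scoring accuracy and reaction time as two separate contests. A rough sketch of the accuracy side, using standard Jeopardy scoring (correct adds the clue value, incorrect subtracts it); all names and data structures here are made up for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Contestant:
    name: str
    # clue_id -> (answer, buzz_time_ms); buzz time is recorded but,
    # per the proposal, only used for the separate thumbs-vs-relays tally
    answers: dict = field(default_factory=dict)

def score_game(contestants, correct_answers, clue_values):
    """Score a buzzerless game: everyone who buzzed in answers in
    isolation, and only accuracy determines the winner."""
    scores = {}
    for c in contestants:
        total = 0
        for clue_id, (answer, _buzz_ms) in c.answers.items():
            value = clue_values[clue_id]
            total += value if answer == correct_answers[clue_id] else -value
        scores[c.name] = total
    return scores
```

Under this scheme a machine that buzzes in on everything but is wrong often would bleed points, while a slower human who only answers what they actually know would not be penalized for losing the buzzer race.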



I received a few complaints when I posted the results of round #1 and it hit the homepage. You might want to change the title to something ambiguous about who won.


Part 1 of the second round on YouTube here: http://www.youtube.com/watch?v=PHhDLUVAtqU


Thanks. Part 2 available anywhere?



I had to leave the article to search where the actual building was located because all they gave was, "suburban New York". I'm still not sure where it is.


Is there a replay of Jeopardy anywhere?


IBM will have it up in a couple days: http://twitter.com/#!/IBMWatson/status/37223337453158400


They are out there. Be industrious, you'll find them.


Normally, Jeopardy airs new shows on weeknights.

Popular shows are reshown on weekends. Also, around holidays and other non-normal weeks of broadcasting, older shows are re-aired.

I don't believe episodes are available legally online.


The fact that Watson has good NLP isn't nearly as impressive as the fact that it has a huge knowledge base. How did it get all that knowledge? If it's just from browsing the internet by itself, that makes me afraid... very afraid.


Why the spoiler? Do you think everyone watches it live? We are in the age of the DVR.


<SPOILER ALERT>

Mubarak's not president of Egypt anymore!



