> Google has been working on question answering for years now, but only exposes it when it is sure of the answers.
I find this only works for very particular, seemingly whitelisted, cases. For example, it does dates of death well, but try asking it the date of any other sort of historical event. It doesn't do the dates of surrenders or famous battles, or the day Columbus discovered America (it won't even tell me that date if I straight up search for "when is Columbus Day"). "When was the declaration of independence signed" gets me nada, and it can't tell me what year the Magna Carta was issued. It can tell me the day that Hitler died, but can't tell me when victory in Europe was declared. It can't even tell me when Enron went bankrupt, despite the first words in the snippet under the first result for "when did enron go bankrupt" being "Before its bankruptcy on December 2, 2001".
I'm not picking and choosing examples here; everything I tried besides birth and death dates failed for me.
What Google is doing really seems like nothing but a parlor trick compared to what Watson can do. Maybe they really are doing all the same fancy math behind the scenes, but what is ultimately surfaced to me, the end user, is not impressive.
(Note that although I can't get coherent responses from Wolfram Alpha with natural language questions, Wolfram Alpha can answer most of these things if you don't ask it questions that way. "Battle of Hastings date"? Wolfram Alpha can do that, Google can't.)
I'm also continually surprised at how good Google is at queries of the type "what was that movie where the guy did that thing" and "what was that music group that has that gimmick".
I have another example. I live in Mexico and I watch movies in Spanish/English depending on the channel. The titles are always in Spanish, and a lot of the time they don't translate clearly to English or make any reference to the English title. If I am watching a movie I usually try a few Google queries to find out about the movie, to see if I like it or just to know the ending. Even without knowing the title I am usually able to quickly find out about the movie, even if it is not a blockbuster.
horror thriller where woman goes to russia meets brother
I think it is ludicrous to think that Google can be displaced from its position as the global leader of web search in the short or mid term. To displace Google, a new search engine wouldn't just have to be better; it would have to make Google look like last-century tech. With that said, I have to make clear that I am no Google fan or Google hater, but I am glad there are other search engines that can at least compete with Google in their national markets (Yandex, Baidu, Naver, Seznam & Yahoo Japan).
In my particular case I watched a movie late one night and didn't know the name of it. I typed the following into Google: "movie where a guy plays a vampire and house sits and (spoiler)", and the first result in my case, Body Double, was correct.
This is one of the most ridiculous stories I have ever read. How do they so quickly dismiss Google's search algorithm as something to fall back to? Microsoft have been chasing their tails for years trying to be as relevant as Google without ever achieving it.
It sounds like the author is suggesting dynamic web pages built by Watson that would answer questions (summarise court cases etc). The underlying problem is: where does all this data come from? It sounds like Watson would interpret multiple data points from around the web to compile the information. How can they say this information is correct? Google at least points you in the direction of the information, and then you make an informed decision yourself about whether it is correct.
This sounds like something we already have, we have Google and Wolfram Alpha. Problem sorted.
This is certainly not ridiculous. It is very legitimate to wonder whether systems involving more AI such as Watson are going to take over standard search techniques or not. Regarding Bing, it is as you said: Microsoft has been chasing Google but Watson is a totally different approach so the question is still open.
The concept of the AI is not what I am disputing, it's this particular story. It doesn't expand on how Watson would replace search, and it dismisses Google's algorithm as something simple to replicate.
I don't see it as replacing search either; people search for websites. What is the end result, that Watson replaces all sites on the internet with its own dynamic page filled with its own information?
How would Watson choose how to display the information it returns to me? Am I getting the full story?
This article breezes over so many specifics I cannot take it seriously.
Neither Google nor WA is good at understanding the semantics behind a natural language question - something that Watson has been designed to do. Also, showing the percentage of certainty, as they do for medical queries, could be useful for many fields, particularly for academic research, provided that they include citations, something that should be easy to add. I'm seeing Watson rather as a tool that could make a dent in specialist fields and trickle down from there, as the hardware to do that becomes cheaper.
And I completely agree with this, there is definitely a niche. Very much in the same way that Wolfram has a niche. It's never going to replace search but it definitely plays an important part in the way we look for information.
If anything, IBM could replace / merge with Wolfram
I completely agree. Google has a huge, possibly even insurmountable advantage over any competitor - the sheer volume of user queries and clicks that helps it continuously refine the context of the information gathered by its spiders.
Information <--> Actual User Questions <--> Actual User Clicks
It is the continuous feedback loop between the three that makes Google what it is. Without the latter two, Watson is seriously disadvantaged.
Without diminishing the props that need to be given to the team at IBM Research which developed Watson, the author of this article has significantly overestimated the general AI capabilities of Watson, while downplaying what is involved with Google Search.
On the Google Search side, its algorithms are a lot more than just "the PageRank algorithm". The knowledge graph and its ability to remember and take advantage of context from previous searches are examples of this.
On the Watson side, a huge amount of what it was doing involved identifying keywords and finding relevant information from the clue words. It did not really reason about questions or have a deep understanding of the semantic content of the query. It was hand optimized for the sort of questions that tend to be asked on Jeopardy, which was an impressive feat, but in terms of being able to create new knowledge, as the author suggests? The state of the art in AI is a long way away from that.
The reason Google Search works well is that through many of its iterations it actually uses crowdsourced data from real humans to answer questions. Whether it is the simple initial vote of confidence provided by links, Google's ever-changing internal data about clickthrough rates, personalized results, search volumes and general user behavior, or simply the many hardcoded questions in certain formats ("What is the capital of Spain?" or "distance from the sun to jupiter"), Google is continuously getting better at judging intent and answering questions. Similarly for a lot of the other technologies, such as translation, the real work is being done by consumers and users who generate the gigantic corpus of data.
The question is -- does IBM have access to the same amount of data?
If Watson were ever to dethrone Google, I am almost willing to bet it would be because it was programmed to consume data via Google. It all comes down to the data in the end; unless IBM have managed to gain access to a trove of data the same size as or bigger than what Google currently has (which is a lot), there is no way anyone can beat Google. And let's face it, people don't just use Google because they have the biggest database of knowledge and data; people use it because they trust it, they know it works, and Google as a company has established a rapport with people that has taken years. IBM to me will always be a company focused on corporate and enterprise offerings, not offerings for the general consumer.
Actually, I needed the answer, since I was reaching into my desk drawer for some wedding gifts and needed to buy some gift boxes (Google helped with that, too).
Watson may have a head start on the AI stuff, but Google is a quick study.
[What years have Pixies toured England?] is a nice example of a search that doesn't work. You don't get a historical list; you get a great many sites selling you tickets or telling you about a tour this year or an upcoming one.
Given past experience with other MPI-based software, I'm not sure that Watson would scale without extensive retooling. MPI (Message Passing Interface) tends to be extremely chatty; we ran ours off a hypercube-topology switched InfiniBand setup. MPI depends on broadcast/scatter/gather semantics, as in the sketch below. This squarely lands in the 'what if' category for now.
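To make the chattiness concrete, here is a minimal sketch of the scatter/gather pattern I mean. The chunk size and the doubling loop are placeholders, not anything from Watson's actual workload:

    /* Minimal scatter/gather sketch (illustrative only; the chunk size and
       the doubling loop are made up, not Watson's workload).
       Build with: mpicc scatter_gather.c -o scatter_gather */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int chunk = 1024;              /* elements handed to each process */
        double *all = NULL;
        double *mine = malloc(chunk * sizeof(double));
        if (rank == 0) {
            all = malloc((size_t)size * chunk * sizeof(double));
            for (int i = 0; i < size * chunk; i++) all[i] = i;
        }

        /* Root scatters one chunk to every process... */
        MPI_Scatter(all, chunk, MPI_DOUBLE, mine, chunk, MPI_DOUBLE,
                    0, MPI_COMM_WORLD);

        /* ...each process does its local work... */
        for (int i = 0; i < chunk; i++) mine[i] *= 2.0;

        /* ...and the root gathers the results back. Every such collective is
           a synchronization point, which is why interconnect latency starts
           to dominate as the node count grows. */
        MPI_Gather(mine, chunk, MPI_DOUBLE, all, chunk, MPI_DOUBLE,
                   0, MPI_COMM_WORLD);

        free(mine);
        if (rank == 0) free(all);
        MPI_Finalize();
        return 0;
    }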
Infiniband sucks. IBM's or Cray's proprietary interconnect will scale your code because it removes the chatter. On a Cray our MPI latency is 20x better and has almost no discrepancy between PEs.
Interesting, I went to a supercomputer conference in Germany last week and was curious to see many papers using 10GbE and vendors talking about InfiniBand. Can you point me to any resources? Also, have you any insight as to what the current thinking is about Hadoop clusters - are people making the move to 40GbE to try to get good throughput, or is it pointless? Our tiny 8-node cluster has recently got in a fluster due to having a 1GbE switch; an obvious fix is to get a 10GbE one - but will this help?
MPI programs and Hadoop will scale until the MPI latency becomes too high to effectively perform a data swap. When you look at the profiler on a good machine you see that 25% of the time is spent in MPI_Recv. When the time spent in MPI_Recv goes up you are done - the task simply can't scale. Vendors like Cray and IBM sell machines that have low MPI latency. This happens in both hardware and software. The interconnect is fast (I think Intel bought it) and the MPI layer performs optimization for all the traffic. OpenMPI doesn't even come close to optimizing the traffic. Ethernet doesn't do direct DMA like the proprietary interconnects do - this adds latency and jitter. I don't believe the Ethernet strategy can scale for such applications, primarily because the time to complete a step is set by the highest latency on the network: if one node jitters and takes 100 microseconds, it doesn't matter much that the rest of the guys took 20. Anyways, benchmarks would help - something like the rough measurement sketched below.
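For what it's worth, the kind of measurement I mean looks roughly like this: time the compute phase and the communication phase of each step separately and watch the communication fraction. The fake_work() loop, the ring exchange, and the message size are all placeholders, not Watson's actual workload:

    /* Rough sketch: measure what fraction of each step is spent waiting on
       communication. Once that fraction climbs, adding nodes stops helping. */
    #include <mpi.h>
    #include <stdio.h>

    /* Stand-in for a real compute phase. */
    static double fake_work(long n) {
        double s = 0.0;
        for (long i = 0; i < n; i++) s += i * 0.5;
        return s;
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double sendbuf[4096] = {0}, recvbuf[4096];
        double t_comp = 0.0, t_comm = 0.0, sink = 0.0;

        for (int step = 0; step < 100; step++) {
            double t0 = MPI_Wtime();
            sink += fake_work(1000000);          /* compute phase */
            double t1 = MPI_Wtime();

            /* Exchange with neighbours in a ring; the receive side is where
               interconnect latency and jitter show up. */
            int next = (rank + 1) % size;
            int prev = (rank + size - 1) % size;
            MPI_Sendrecv(sendbuf, 4096, MPI_DOUBLE, next, 0,
                         recvbuf, 4096, MPI_DOUBLE, prev, 0,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            double t2 = MPI_Wtime();

            t_comp += t1 - t0;
            t_comm += t2 - t1;
        }

        if (rank == 0)
            printf("communication fraction: %.1f%% (sink=%g)\n",
                   100.0 * t_comm / (t_comp + t_comm), sink);

        MPI_Finalize();
        return 0;
    }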
The smallest machines Cray sells cost about $500,000. If you want scalability you gotta pay. 8 nodes isn't a real machine.
Every few years the idea of a natural language / semantic / question answering search engine crops up again.
Natural language understanding is quite relevant for the crawling and indexing part of information retrieval systems and Google is very good at that. Just look at their quite formidable automatic translation software, which is a by-product of their ability to correctly map natural language concepts to strings.
The thing is: People just don't want to converse with a search engine as if it was a human being. Some library / scientific information retrieval systems tried to go in that direction, which resulted in retrieval systems that were just cumbersome to use.
Google nailed the search engine user interface quite some time ago and Peter Norvig is absolutely right when he says that users simply don't want to ask questions when searching. They're much faster at entering keywords relevant to their search intent because they've learned how to efficiently converse with search engines in their 'native' keyword language.
Hence, passing the Turing test is completely irrelevant for search and information retrieval in general. Even in mobile environments where due to the device's constraints a natural language user interface makes a lot more sense than on the desktop, software like Siri more or less is just some gimmick that in most cases is easily outperformed by more traditional input methods. Sure, asking Siri to 'Show me the way to the next whisky bar' might be fun at first but simply entering 'pub' and
the name of the town you're in right now is still a lot more efficient. Again, I think Google nailed the user interface part with Google Now for mobile information retrieval as well. I don't want intelligent machines to pretend they're human. I want them to take a back seat and present me with the right information once it becomes relevant.
> Google nailed the search engine user interface quite some time ago...
If there is one area of search technology where Google has contributed almost nothing, it is the search interface. The subjective user experience is virtually unchanged since the days of Alta Vista circa 1996.
Google nailed the search engine interface given current tech. However, the interface is precisely why I changed my default search engine to DDG (since I can always fall back to Google with !g, and DDG gives me better control over the search). So I can think of a lot of situations where a Watson-like search would be more appropriate than a Google-based one. For example, many Wikipedia searches: most of the time I am not looking for an in-depth article, but only for an answer to a specific question. But I know that skimming Wikipedia is probably faster than looking for the specific answer on the open web.
The biggest obstacle to Google's progress in search is Google. They make virtually all of their money from advertising, including the same spam that pollutes their search results. Targeted advertising can only help so much, because in the end, advertisers don't just want to reach people already interested in their product, they want to attract people who aren't. Google's dominant position in the online advertising market is an irreconcilable conflict of interest with their search users. They continue to dominate search not because they are technologically unbeatable, but because no one has figured out an alternative business model that makes search pay without corrupting it with advertising.
We should stop thinking in monolithic terms - as if one company will be able to do it all at global scale. Google seems to have mastered data centres and spiders, and will probably form a base layer for part of the next step towards humanity-AI. Maybe Watson will consume Google data, but maybe a thousand individually tuned AI-like services will arrive - for booking flights, for negotiating mining contracts, and ...
I think we shall soon see the end of the idea that one organisation (based in SV) will be able to service all the world's needs on the march towards AI - I suspect Google as an organisation is already straining. Why not let markets flourish?
I agree. Certain search terms relate to a single concept with a single authoritative source of information. A fireman searching for the plans of a building on fire only really wants a single answer (contrived example). Google is bad at domain-specific searches where you are looking for one answer. It is so easy to get trapped in obscure business listings, local news and press releases.
Google works well when there is no clear source of authority and the algorithm can make a judgement. Some searches have a definitive thing that you are looking for and the algorithm would need to know that source to give the right answers. We need more human input which can make search engines that give results that are shamelessly partial to particular sources of information (like patent searches).
Quality of search results has very little to do with market share. Had that been the case, Bing should have had a near-equal share with Google and Ask should have had zero.
Google owns Chrome and Android, which together are basically 50% of web traffic.
When I first heard about Watson, I expected IBM to put a version of it online for people to experiment with. If it's as good as they claim, it could be a useful tool and create good PR for them. I know the searches are much more resource intensive than Google's, so I wouldn't have minded if they needed to limit it to something like 1 question/IP/hr.
My knowledge of the actual workings of Watson is sketchy, but doesn't it devote a massive amount of resources to understanding natural language queries? Can it scale its ability to handle one question at a time to handling a billion queries?
And Google is getting much better at understanding syntax and will continue to improve.
Google search itself is becoming Watson-like, with all the additions to the knowledge graph, and I think with Google Now, Google search will also have a lot more context. Watson can of course be a classic disruptor in search, but Google isn't exactly inactive in this arena.
The problem with search is that nowadays, people judge results quality by their similarity to Google results. So anything that doesn't look like Google output looks "off".
At this point, I judge search quality by whether the words I'm searching for appear on the page the search link sends me to. I am starting to pine for the days of Alta Vista, so at least I could include and exclude words with some thought that it might work.
"What if" a marketing strategy. Its a weasel way of stating something while giving outs to yourself without allowing the target to reply with a quick "no".
This sounds like it was written in 2007 or something.
Google has been working on question answering for years now, but only exposes it when it is sure of the answers.
https://www.google.com.au/search?q=when+did+jfk+die
(I get a "one box" answer saying "November 22, 1963 John F. Kennedy, Date of death"). You'll note that Google had to derive that I meant "John F Kennedy" by JFK, that I wanted a date, and then retrieve the answer.
https://www.google.com.au/search?q=how+old+was+jfk+when+he+d...
"46 (1917–1963) John F. Kennedy, Age at death"
https://www.google.com.au/search?q=how+did+jfk+die
"Assassination John F. Kennedy, Cause of death"
It's worth clicking the "More info" button under one of these "One boxes". Google will tell you what it regards as "facts" and where it is deriving them from.
(Also, http://www.wolframalpha.com/input/?i=how+old+was+jfk+when+he... is just as impressive)