Things Not Strings -- Google Launches Knowledge Graph (googleblog.blogspot.com)
284 points by rryan on May 16, 2012 | 139 comments



As someone who has done research in the field, I can tell you - this is the future of search, and the key to true AI. Once we have a knowledge graph of the world, you will be able to ask questions and get answers, not documents. This will allow so much more automation - personal assistants like Siri will be able to "book me a room in San Francisco with a view of the ocean" or "give me a list of the schools all the children of American presidents went to" and so on.

There is an arms race here between Bing and Google - we can thank Microsoft for pressuring Google into making this happen sooner than they otherwise would have.


Isn't this exactly what people thought in the 80s? That building knowledge of the relationships between entities in the world into a computer would let you write expert systems that work like humans?


I read at some point that a lot of interesting theoretical AI research was abandoned in the 80s, because the results (like expert systems) got good enough that they had commercial applications, and the focus changed from research to perfecting existing products. No idea if that's true, but it suggests that there might be room to pick up where people in the 80s left off. Especially given that computing resources and data archives are somewhat improved since then.


Research funding was also cut heavily in late 80s: http://en.wikipedia.org/wiki/AI_winter


Might be true, but it sounds unlikely for two reasons:

(1) If new theories of AI proved to be commercially viable, you would expect that people would continue to develop new theories in the pursuit of money and fame, not "abandon" them because the results are "good enough."

(2) There are still lots of smart people working on AI in academia today that were around in the 80s. (And the 70s. And the 60s.) They remain keenly aware of the breakthroughs that occurred then, because they were the ones making them. And they're familiar with the technology of today, because they're using it in their new research. So if any big rocks were left unturned prematurely, I'm sure they're being examined.


I think the idea is, it's like the difference between searching for new cancer treatments and bringing a promising treatment to market. If it takes the same kind of expertise to do both, then spending more time on one means less on the other.

Still no idea if that's true of AI, but I found a couple of interesting cites:

Patrick Winston, director of MIT's Artificial Intelligence Laboratory from 1972 to 1997, echoed Minsky. "Many people would protest the view that there's been no progress, but I don't think anyone would protest that there could have been more progress in the past 20 years. What went wrong went wrong in the '80s."

Winston blamed the stagnation in part on the decline in funding after the end of the Cold War and on early attempts to commercialize AI. But the biggest culprit, he said, was the "mechanistic balkanization" of the field, with research focusing on ever-narrower specialties such as neural networks or genetic algorithms. "When you dedicate your conferences to mechanisms, there's a tendency to not work on fundamental problems, but rather [just] those problems that the mechanisms can deal with," said Winston.

http://www.technologyreview.com/computing/37525/

They don't go into detail about how the early attempts to commercialize contributed to the problem. This site does, but seems less trustworthy:

In the early 1980s, dark clouds also settled over the MIT Artificial Intelligence Lab as it split into factions by initial attempts to commercialize Artificial Intelligence (AI). In fact, some of MIT's best White Hats left the AI Lab for high-paying jobs at start-up companies.

http://computer.yourdictionary.com/golden-age-era

So it sounds like some smart people in academia in the 80s think that some stones were left unturned, or turned too slowly, and that part of the problem was a refocus on making money on existing discoveries. According to that AI winter link, the tech mostly wasn't ready for primetime yet, presumably making it even harder to raise funds for new research.


Winston is still teaching and doing research at MIT. In fact I took two of his classes a couple years ago, and he's exactly who I had in mind when I mentioned researchers with experience and knowledge from decades past continuing to work :) Even if some previous research wasn't fully fleshed out, we can be confident that it hasn't been forgotten.


Yes, and I - for one - am not convinced that that approach has been proven wrong so much as it just hasn't been realized yet, for whatever reason. If I had to guess, I'd say that "whatever reason" is some combination of not having sufficient hardware and not yet understanding knowledge representation and reasoning algorithms well enough.

Of course I could be wrong, but I think there's a good chance that a lot of "Good Old AI" stuff is still valid, but that it was just too early for it then. Maybe it still is, time will tell.


It's not wrong. Past failure is no guarantee of future failure.

Elon Musk and the electric car industry are one example.


"past failure ... the electric car industry is one example."

Actually, electric cars were popular before gas cars were.

http://en.wikipedia.org/wiki/History_of_the_electric_vehicle...


Interesting, thanks!

But I think your argument falls down when we inspect the merit of the word "popular".

Electric cars have never been mainstream.


Yes, but the "web of knowledge" folk were planning (more or less) on hand-added metadata to allow knowledge extraction by AI programs, with no real incentives or network effects to get people to do that. Google got microdata (which is semantic annotation, if a lot less complicated than the old schemes) added to websites by saying they would include it in PageRank calculations.


When did google say that they would factor microdata into SERP ranking?


After doing some hunting, apparently I imagined them saying that. What I did not imagine was that it impacts score in results for their "recipe search" product - and it caused some consternation among the recipe blogging community at the time.


The Semantic Web truly is all of those things -- but this is not that. My guess is that this will just end up as a sharpened version of the data section of Wikipedia results that we have now.

Why? Because they are selling it without mentioning why they won't fail where every other attempt has. There are huge difficulties in this. Have they turned a corner in the research that changes something? I don't see it in what they've hinted at so far.


I think that there are two things in their favor here:

First, they are Google, and therefore possess huge quantities of data and the ability, courtesy of their uber map reduce prowess and ultra-fast custom hardware, to make sense of it.

Second, they bought Metaweb (makers of Freebase) and with it some of the best semantic expertise out there. Toby Segaran is a brilliant dude. His O'Reilly book "Programming the Semantic Web" explains in 20 pages what most books take 150 pages to do: the concept of a URI based graph database and how it enables data to be merged from multiple sources and reasoned over with applications.
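
To make that concrete, here is a minimal sketch of the URI-based merge idea using rdflib (my choice of library, not Metaweb's actual stack; the namespace and facts are invented for illustration). Because both sources describe the same URI, unioning the graphs merges everything known about the entity:

  # Two independent sources describe the same URI; unioning them into
  # one graph merges everything known about that entity.
  from rdflib import Graph, Literal, Namespace, URIRef

  EX = Namespace("http://example.org/")  # hypothetical vocabulary
  curie = URIRef("http://example.org/Marie_Curie")

  source_a = Graph()
  source_a.add((curie, EX.bornIn, Literal("Warsaw")))

  source_b = Graph()
  source_b.add((curie, EX.field, Literal("Physics")))

  merged = source_a + source_b  # graph addition is a set union of triples
  for s, p, o in merged:
      print(s, p, o)  # both facts now hang off the same URI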

I only hope Google open-sources some of their research here for the rest of us.


Thanks so much! Glad you enjoyed the book. I wanted to point out that Colin Evans and Jamie Taylor were also authors (and still work on the Knowledge Team at Google) and should get some credit.


Haha! Thanks for pointing that out. I didn't mean to leave them off. I met you guys in 2009 at a semantic tech conference, back before Metaweb was purchased. So glad to see your work being pushed to the most popular website in the world.


There's little need. It's a direct application of concepts that are well treated academically in all the various datalog papers.


Taking academic research to production ready code is far from trivial.


I think that generalization is false as often as it is true.

But also, have you read any of the papers involved? Datalog is pretty simple. It's a restricted, forward-chaining Prolog. Once you know that, you can recreate most of it from that description alone.
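
To illustrate that claim, here is a toy sketch of naive bottom-up (forward-chaining) evaluation -- my own illustration with invented names, not anything from the papers. Rules are applied to the known facts until nothing new is derived:

  def is_var(t):
      # Variables are capitalized strings, Prolog-style.
      return isinstance(t, str) and t[:1].isupper()

  def match(atom, fact, env):
      # Match one body atom against a ground fact, extending env; None on failure.
      if atom[0] != fact[0] or len(atom) != len(fact):
          return None
      env = dict(env)
      for t, v in zip(atom[1:], fact[1:]):
          if is_var(t):
              if env.setdefault(t, v) != v:
                  return None
          elif t != v:
              return None
      return env

  def solve(body, facts, env):
      # Enumerate all bindings that satisfy every atom in the body.
      if not body:
          yield env
          return
      for fact in facts:
          e = match(body[0], fact, env)
          if e is not None:
              yield from solve(body[1:], facts, e)

  def run(rules, facts):
      # Naive fixpoint: fire every rule until no new facts appear.
      facts = set(facts)
      while True:
          new = set()
          for head, body in rules:
              for env in solve(body, facts, {}):
                  derived = tuple(env.get(t, t) for t in head)
                  if derived not in facts:
                      new.add(derived)
          if not new:
              return facts
          facts |= new

  # Transitive closure of an edge relation:
  rules = [(("path", "X", "Y"), [("edge", "X", "Y")]),
           (("path", "X", "Z"), [("path", "X", "Y"), ("edge", "Y", "Z")])]
  print(run(rules, {("edge", "a", "b"), ("edge", "b", "c")}))
  # derives ("path","a","b"), ("path","b","c"), ("path","a","c")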


Google is sitting on an enormous pile of semantic data that dwarfs anything that any AI project could possibly use 20 or even 10 years ago. Of course it does not guarantee that they won't fail as well, but it sure gives them an edge.


It also appears that SEO is finally the thing that gets people to actually publish semantically-marked data, so they'll get some of this more easily in the near future.

http://bergie.iki.fi/blog/google-s_rich_snippets_will_lead_u...


Yes, like the HTML web this web of data grows by network effects. The "Semantic Web" was only ever a data model and exchange formats - a standard to guide independent efforts. Now, of course, a lot of that data will be protected by the internet giants for their competitive reasons, but the standards still provide the interoperation at the edges.


This is Google's search team we're talking about here. They might make the mistake of overestimating how much people will like something like this. They might make the mistake of underestimating how many resources something like this will take. However, they simply don't make mistakes of the kind you're talking about. I'd be willing to bet money that there's a bullet-proof theoretical model behind what they're doing. And they probably aren't ready to talk about it yet.


You are falling for the fallacy of infallibility. You assume they are successful because they have some magic secret, not just a lot of hard work.


Except that I named two areas where they can make mistakes. And Google's success is indeed because of the hard work they've put into building such a smart team of search engineers.


I agree. This looks to me like just Wikipedia presented next to the search term. Much the same as DuckDuckGo: http://duckduckgo.com/?q=taj+mahal Congrats Gabriel, you got to them :)

I thought that, while flawed, cpedia (from one of the Cuil founders) was a much more interesting push on this idea than Google's currently is.


While I share your enthusiasm for the bright future ahead, there is a missing link between this and "book me a room in San Francisco with a view of the ocean", namely that the data is behind closed doors.

There is a bunch of data that google can use[1] because it is made explicitly available. But many sources don't want that.

As an example, consider "book me flights for the cheapest route between lisbon and kiev". It is a trivial thing to do, provided you can get airline data.

But you can't scrape Ryanair's website, because they deliberately put countermeasures in place (e.g. captchas).

[1] e.g. http://richard.cyganiak.de/2007/10/lod/
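
For the route-finding half, the claim of triviality holds up: given fare data it's a textbook shortest-path search. A toy sketch (the fares and cities below are invented):

  # Cheapest route between Lisbon and Kiev, assuming you somehow have
  # fare data. Dijkstra over ticket prices.
  import heapq

  fares = {  # directed edges: city -> {city: price}
      "Lisbon": {"Madrid": 40, "Paris": 90},
      "Madrid": {"Kiev": 210},
      "Paris": {"Warsaw": 60},
      "Warsaw": {"Kiev": 70},
  }

  def cheapest(src, dst):
      # Returns (total price, route) or None if unreachable.
      heap = [(0, src, [src])]
      seen = set()
      while heap:
          cost, city, route = heapq.heappop(heap)
          if city == dst:
              return cost, route
          if city in seen:
              continue
          seen.add(city)
          for nxt, price in fares.get(city, {}).items():
              heapq.heappush(heap, (cost + price, nxt, route + [nxt]))
      return None

  print(cheapest("Lisbon", "Kiev"))  # (220, ['Lisbon', 'Paris', 'Warsaw', 'Kiev'])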


Could the Google bot take a captcha it finds and use it in one of its own reCAPTCHAs? Essentially passing the burden of deciphering the text onto some unsuspecting human, it would be able to beat all captchas with ease!


My stack level got too deep reading your comment.


"I'm on it, riffraff." "I've found the cheapest flight between lisbon and kiev. Booking is possible thru ryanair.com . I've filled in all necessary details for you, however there is a captcha I can't wrap my cpu around. Also, there are some privacy agreements I'm not authorized to make for you. Could you take over from here?"

Imagine an integrated Siri with those kinds of capabilities. It doesn't have to be fully automatic. Letting a secretary do stuff also isn't fully automatic; (s)he's there to optimize your time so you make only the important decisions (signing agreements, clicking confirm after having seen the price...).


Once bots get smart enough there will be no captchas that can stop them. The only way will be to ban the IP the bot is coming from, assuming you know which IP belongs to which bot. They could easily disguise themselves by using proxies.

I guess my point is that if we get to the point where a bot can do generic requests without the aid of a human, captchas will probably not be able to stop it.


You're assuming Google would do that, and/or that it will be allowed to do that once sites find out. There are a lot of businesses who have zero incentive to give their data to Google.

This "AI" is just scraping and replacing Wikipedia while serving ads.


I think Google bought a company related to this technology 2-3 years ago. It's not like they saw Bing's Facebook integration last week and then decided to do this.



I'm not sure if I would want to say "a knowledge graph of the world" is the "key to true AI".

I think the situation is closer to "true AI is the key to a usable, if situationally dependent, knowledge graph", because the world doesn't have a single knowledge graph that you can learn and use in all situations. Certainly, you can find a lot of common instances where the average works, but once you're past that, you need the kind of understanding of language that present-day systems are far from having.


"Because the world doesn't have single knowledge graph that you can learn and use in all situations."

The logical way to overcome this via a "data first" brute force approach is to build personalized knowledge graphs of every potential customer. Which is in effect what every statistically sophisticated large business is attempting.


Just to be nitpicky, don't you mean the key to true, expert-system-based AI? Or are you going the "human intelligence is a clustering system; therefore, true AI would have a similar method" route?

Both are valid, I'm just curious.


Bing? I don't use it much - have they done something in that direction?

I like WolframAlpha.


They bought Powerset a few years back, and they did that.


>As someone who has done research in the field, I can tell you - this is the future of search, and the key to true AI.

Having done research in this field doesn't qualify you to have authoritative opinions on search or AI. I don't necessarily disagree with you on search, but I don't see why a dataset would be the key to AI. AI is a function, not a dataset.


Actually Cyc is panned pretty hard as a complete waste of a perfectly intelligent researcher's time.


Unless the hotel booking system has a captcha...

I always try to mention this when people get overexcited about the semantic web. For most of the important stuff that we would like to automate, we also insist that a bot not be allowed to do it. It's pretty schizo in my opinion.


Presumably the booking would be done through an API.


I'd say thank Metaweb as it seems to be building directly upon their ideas. I'm not sure how much Bing has pressured Google to innovate in the search space (though I think Bing has got Google thinking more about design).


I hope you enjoy your very pleasant, if initially surprising, stay in San Francisco, Cebu. Watch out for sea snakes!

http://en.wikipedia.org/wiki/San_Francisco,_Cebu

As someone who has done research in the field, did you read the Gizmodo review of Siri?

http://gizmodo.com/5864293/siri-is-apples-broken-promise

The set of "knowledge graph" problems which is not, in fact, AI-complete, strikes me as much smaller than most doing research in the field would like us to think.

I hope you won't try to argue that "book me a room" etc. isn't AI-complete. There's an uncanny valley there, and it's deep. You can use Siri for lots of trivial tasks in which bizarre failures are hilarious rather than disastrous, but booking hotel rooms isn't in that set. She can be the best secretary in the world 9 times out of 10, or even 99 out of 100, but the other times she's an insane robot who wouldn't at all mind sending you to Cebu to get kidnapped by the MILF...


These are still early days. Siri is slow and inaccurate. However consider it a preview of things to come.


These are certainly the early days. SHRDLU was slow and inaccurate. In many ways, it was an even more impressive example of "things to come" (it could seem to understand some fairly complex figures of speech, for example). The problem is that we're not sure exactly when these "things to come" will actually appear.

http://en.wikipedia.org/wiki/SHRDLU


We keep hearing rumblings of the coming semantic web:

  2001: Tim Berners-Lee publishes "The Semantic Web" in Scientific American Magazine
  2006: Microsoft acquires Powerset
  2009: Microsoft touts Bing as a "decision engine"
  2010: Google acquires Metaweb
  2012: Google introduces "Knowledge Graph"
Is this the big one?


  2007+: work on machine operable Wikipedia (DBpedia, Semantic MediaWiki, Wikidata)
  2010: Apple acquires Siri
  2010: Facebook introduces Open Graph
  2011: Google, Bing, Yahoo (schema.org) agree to index embedded data


you realize that the semantic web is not about the semantics of the data, but instead about the semantics of the schema, right?


> you realize that the semantic web is not about the semantics of the data, but instead about the semantics of the schema, right?

I'm having trouble getting your point here. A schema can be thought of as metadata about the data to which it is applied; so "semantics of the schema" are also implicitly "semantics of the data" in a sense. I guess I'm missing some nuance about the point you're trying to make... would you care to elaborate?


I'd give good odds that joshu already knows this, but I'd like to point out that a lot of work has gone into making freebase's schema machine readable. The schema is stored and constructed in the same way as all other data in freebase (as are metaschema, such as http://www.freebase.com/view/type/object/type and so on).
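
For instance (a hedged sketch -- the query shape is from my memory of the MQL docs, so treat the details as assumptions), reading a type's own property list uses the same query language as any ordinary fact lookup:

  # Schema introspection with the same MQL used for ordinary data:
  # ask for the properties of the /people/person type. This dict would
  # be sent to the standard mqlread endpoint like any other query.
  schema_query = {
      "id": "/people/person",
      "type": "/type/type",
      "properties": [{"id": None, "name": None}],
  }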


Any linking provides context thus semantics. Linking to a schema makes this explicit. So it's both, right? Or are you making another point?


The semantic web allows computers to deal with the "content" of pages instead of just displaying them. http://www.w3.org/TR/owl-features/


> you realize that the semantic web is not about the semantics of the data, but instead about the semantics of the schema, right?

The meaning of semantics (semantics of semantics?) can get philosophical real quick. Here I'm just rolling with the OP's notion of things, not strings.


Slight correction: Microsoft acquired Powerset in 2008.


You're right. Thanks.


I note that none of those screenshots contain ads. Perhaps a Googler can enlighten us: do Google employees not get served ads? Or is the author of this blog just using adblock? Or are these images just fake mockups where they left out the ads to make them more aesthetically pleasing and clean looking for PR purposes?


I just did every search (Taj Mahal, Marie Curie, Matt Groening) they did on my unaltered Firefox/Windows 7 install while logged in, am certainly not a Google employee, and got no ads. Also didn't get the Knowledge Graph yet, if that is a factor. Famous location/people searches don't seem to be stuffed with ads typically. I'd imagine ads on more research-style searches would be a bad experience for both the user and the advertiser, compared to stuff like "cheap flights", which does have an amazing clutter of ads. And of course, picking research-style searches presents the idea better. Lack of ads is a nice bonus.


Google search ads are really not in the same business as most traditional advertising. Traditional advertising wants to create desire and inform people. You take a lot of people, show them things they might be interested in, and hope some of them spend money. What Google search ads do is traffic in intent. Think queries like "cheap hotels in Florida", which clearly communicate that the customer is willing to spend money.

They find customers who want something specific, find businesses who will sell them that, and make them meet. And then they give the sellers very good analytics that show exactly how much value they are getting, so they can bid each other up for the ads until Google captures almost all the value. During the housing boom, some mortgage-related keywords climbed into the mid three figures. That is, each and every time someone searched the matching keywords and clicked on an ad, several hundred dollars changed hands.


Many results pages contain no ads. For instance, I just searched for both "Taj Mahal" and "Matt Groening" and received ads for neither query.


Google employees get ads. But these new panels do in fact cover them.


Yes, we still get ads. It wouldn't really be dogfooding if we didn't. I guess devs working in search could turn them off if they wanted, or as you say maybe they just have adblock installed :)


I think it depends on the topic. Searching for Matt Groening, like in the blog post, shows no ads for me.


I guess they remove the ads - there's no reason to give free ad views.


This reminds me of zero-click info from Duck Duck Go, but deeper. They're showing a summary-like view of information related to your search, but instead of being generated by a bunch of hand-curated rules, they come from more generalized machine learning algorithms. This is real competition for one of my favorite DDG features.

I see this as a positive example of search innovation and competition. More globally relevant information, not just a tighter filter bubble.


"Marie Curie... had two children, one of whom also won a Nobel Prize, as well as a husband, Pierre Curie, who claimed a third Nobel Prize for the family."

Marie got two, one in physics and one in chemistry, the former jointly with Pierre. This is notable because it makes her one of the very few people to receive Nobels in multiple categories.


FYI, they're using an open API you can use at:

http://wiki.freebase.com/wiki/Freebase_API

Now let's hope they'll also have an API for all the extra semantic info they collect with search.
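
As a concrete example, an MQL read looked roughly like this around the time of this launch (a sketch -- treat the exact endpoint and response shape as assumptions from the era's docs):

  import json
  import urllib.parse
  import urllib.request

  # Ask Freebase for the id of the person named "Marie Curie".
  query = [{"id": None, "name": "Marie Curie", "type": "/people/person"}]
  url = ("https://www.googleapis.com/freebase/v1/mqlread?query="
         + urllib.parse.quote(json.dumps(query)))
  with urllib.request.urlopen(url) as resp:
      print(json.load(resp)["result"])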


"I believe AI has an opportunity to achieve a true breakthrough over the coming decade by at last solving the problem of reading natural language text to extract its factual content. In fact, I hereby offer to bet anyone a lobster dinner that by 2015 we will have a computer program capable of automatically reading at least 80% of the factual content across the entire English-speaking web, and placing those facts in a structured knowledge base." --http://www.ai.rutgers.edu/aaai25/mitchell.htm


I saw an ongoing project recently that was attempting just that. They were using texts from the web to infer relationships from recognizable patterns. Like (the head of X, Y) implies X is an organization, Y is a person, and Y is the leader of X.

It would also try to learn new patterns from facts it already knew, like if it notices Jobs and Apple in a sentence it can hypothesize that sentence pattern is about a leader of a company.
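
A toy sketch of that bootstrapping idea (my own illustration -- the real system is statistical and vastly larger): one known surface pattern applied to raw text to propose new typed facts.

  import re

  # One hand-seeded pattern: "<person>, the head of <organization>".
  PATTERN = re.compile(r"(\w[\w ]*), the head of (\w[\w ]*)")

  def extract(text):
      # Propose (leader_of, person, organization) facts from free text.
      return [("leader_of", person.strip(), org.strip())
              for person, org in PATTERN.findall(text)]

  print(extract("Tim Cook, the head of Apple, spoke today."))
  # [('leader_of', 'Tim Cook', 'Apple')]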

For the life of me I cannot remember the name of the project or even the university, hopefully someone else has heard of this.


Tom Mitchell was talking about his own research. NELL http://rtw.ml.cmu.edu/rtw/


How does this impact site traffic? They are taking popular queries and providing the answer to them on a google.com hosted page. Does that mean fewer click-throughs to actual result pages? Does that bother anyone?


Since Google makes money on the ad clicks both on SERPs (AdWords) and content (AdSense) it's safe to assume they won't be answering everyone's questions.


Google gets 100% of Google.com ad money vs 30% or so on Adsense. A major difference I'd say. Plus, some sectors are worth a lot more and you can bet Google is aware of that.


I doubt Google is making much off these terms anyway; the ROI would be pretty bad because there is no specific intent.


"Things, not strings" does not seem like an effective tagline for marketing this to the public. Are non-programmers used to the term "string" in the sense of "text"? And even for those who are, I still think the tagline is too abstract. I would prefer something like "Understanding the concepts behind the words you search" (or "Understanding the things").

IMO the linked video does a better job at introducing the feature than the article.


From a marketing perspective, it's probably a case of tailoring messages to audiences.

Non-programmers will likely not be poking around on Google's engineering blogs, and will pick up the Press Release that will have been pushed to the likes of CNN and the BBC.


> Non-programmers will likely not be poking around on Google's engineering blogs

This is the central "Official Google Blog", not an engineering blog. From the other articles on the blog (most of which are written by marketing and other non-engineering disciplines -- see the last lines), it's pretty clear that the intended audience is anyone who's interested in Google, not just programmers. I know lots of non-programmers who like to read these, especially since a lot of news sites link to and quote these articles.


Personally, I would be more concerned with explaining the term "Knowledge Graph" than I would be concerned with explaining the term "string".


people who don't understand what a string is don't read blog posts like this. they'll see the extra info on the side in their search results and say "oh, that's cool", or "change is scary".


Interestingly there is no mention of G+ or Circles or personalized queries in the post.


Quick, everybody download a copy of the freebase dataset before they kill it!

On the other hand, I'm glad something is coming out of the metaweb acquisition.


Freebase is not going anywhere but please help yourself to our complete data dumps: http://wiki.freebase.com/wiki/Data_dumps


Since the dataset has an open-source (CC-BY) license, I am at a loss to imagine how someone could kill it.


The data that has already been released will be legal to redistribute forever (if there are still copies floating around). But google could decide at any point to stop making new dumps, and to stop hosting the existing ones. That's what I meant by "kill it".


I know you guys usually get all hot and bothered over anything labelled innovative, but wasn't search good enough by 2002? Seriously, when is the last time you struggled to find something using google search?


My Google searches actually fail very often. Outside of a few topics (programming and music, mainly), and excluding queries that lead me directly to Wikipedia, I'd say the success rate for "exploratory questions" (not "facebook home page") is around 50%. But since other search engines generally do worse, I can't say if the information doesn't exist, or if Google fails to find it.

Search felt good enough at some point. It no longer does, at least for me. I don't know if my expectations have become too high, or if search engines have become worse, though.


What sort of queries are failing for you? My e-mail address is in my profile - I'm doing some research (for Google) that's along the lines of "exploratory questions", and additional use cases obviously helps.


I think a lot of it is that most topics outside of programming/tech have information locked up in books or other non-free things. For instance, I can't find a decent, in-depth article on how to make my own leather from deer hides. There are lots of general howtos, but many of them hint that the difficult and special knowledge is available in these books.

Maybe it's just that other knowledge areas don't just put every little thing on the web to be indexed and easily searched, or maybe I just grok programming well enough to read between the lines and follow implications, I don't know.


Just today there were two things I couldn't find with Google:

Finding how/why the lighting switch on a gas stove could cause a (mild) electric shock. I read the first 3 or 4 pages of results, got one or two very low-quality forum postings, and that is all. Somebody in the world must have written about this problem, e.g. in a manual for gas repair engineers. Google didn't find it.

Where wild ducklings sleep at night. I found lots of articles about how to look after pet ducklings, which aren't relevant. I found one article about how wild adult ducks sleep at night, although it was, in my view, of dubious provenance. Nothing about wild ducklings. I spent only a few minutes searching as I was using my mobile phone (in a park), so I had a greater need to find the answer in the first few results than in the cases above.

I suspect that people get different experiences of google according to: a) what kinds of knowledge they tend to search for, b) their skill at using it.

I increasingly find myself using Quora, and asking new questions on it, for the kinds of queries above.


It keeps you on Google's site, and thus provides more exposure to their ads. Let's take the example Google gives, a search for 'Matt Groening'. Without this, you'd need to click through to the first result (a Wikipedia entry) to find out his date of birth. But with the Knowledge Graph, that's right on the results page. You don't need to leave Google at all.


Sorry, but how does a single search for Matt Groening keep you on Google more than a search for Matt Groening followed by a click on Wikipedia?

Also, I think these types of queries (people, places, things) rarely trigger ads. Based on the example queries from the post (Taj Mahal, Marie Curie, Matt Groening) there are no ads at all.


Have you used google lately? Thanks to SEO, the answer to your second question is "about 5 minutes ago."


If you have the query handy, I'd love to check it out to debug.


Please don't associate SEO with spam.


It's too late for that.

Fortunately, white-hat SEO is very easy to describe without mentioning search engines at all: copy editing, fact checking, designing for accessibility, and so on are valuable skills regardless of what algorithms search engines happen to use for ranking today. Write content for your users and you don't have to worry about optimizing for search engines.


SEO is like spam in that the world would be a better place if there were less of it.


How can Google answer this question now:

Find all blogs of motorcycle journeys within 50 km of my current position.


I get poor search results daily, and half of them get "incorrected". "Did you mean UIView?" No, Google, I am working on OSX. I actually did mean NSView, that's an actual thing.


Apparently Google is chasing after a knowledge-based web now; a semantic web of things. Google may end up sucking up Wikipedia (albeit categorized) and also Wolfram Alpha.


Wikipedia is already available in a categorized and semantic format: http://dbpedia.org/About
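
For example, you can pull structured facts straight from DBpedia's public SPARQL endpoint (a small sketch; the property is one of DBpedia's ontology terms):

  import json
  import urllib.parse
  import urllib.request

  # One fact about one resource: Marie Curie's birth date.
  sparql = """
  SELECT ?birthDate WHERE {
    <http://dbpedia.org/resource/Marie_Curie>
        <http://dbpedia.org/ontology/birthDate> ?birthDate .
  }
  """
  url = ("http://dbpedia.org/sparql?"
         + urllib.parse.urlencode({"query": sparql,
                                   "format": "application/sparql-results+json"}))
  with urllib.request.urlopen(url) as resp:
      for row in json.load(resp)["results"]["bindings"]:
          print(row["birthDate"]["value"])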


All due respect to Google, but I give primary credit for this to Danny Hillis. He was gathering and processing the data for this project years before going public with it and before merging with Google. It's yet another Google acquisition that is probably going to be viewed by many Google users as another amazing Google innovation.

"Standing on the shoulders of giants"

Do they still use that slogan?


Google is more about "burying the giants in piles of money, and standing on that."


I wonder if [http://googledocs.blogspot.com/2012/05/find-facts-and-do-res...] is related. It looks similar, but I can't verify as docs is blocked at my workplace.


I can't try the search, but having tried the one in gdocs: yes it appears to be the same, to the point of showing search results when no factual data is available.

(I am baffled that someone would block google docs, you have my sympathy)


Sidenote: interesting how the guys in the video don't look directly at you, but do that lookaway thing, which I associate with Apple videos ("we're launching this incredibly awesome piece of technology, now available to all of mankind; enjoy, you're welcome").

Google, I expected them to look right at the camera and talk to me (for example, like Matt Cutts here: http://www.youtube.com/watch?v=ofhwPC-5Ub4).


Part of the embarrassment of a number of "Google killers" has been the claim to supply something better than search. It will be interesting to see how Google itself holds up under the kind of scrutiny that sort of claim inevitably brings.



Google has the power to encourage site owners to add semantic markup to their pages, by making it lead to higher positions in search results. But Google doesn't do it. Why?


Google does encourage marking up pages with Microdata, Microformats and RDFa. http://support.google.com/webmasters/bin/answer.py?hl=en&... Google publishes tools to help with these. Additionally, better informing Google about what your page contains makes searches for your content more likely to reach you.
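
To show why that helps machines, here's a toy sketch: a schema.org-style microdata snippet (embedded as a string) and a minimal extractor that reads it back as facts rather than prose. The extractor is my own illustration, not Google's parser:

  from html.parser import HTMLParser

  PAGE = """
  <div itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Marie Curie</span> was born in
    <span itemprop="birthPlace">Warsaw</span>.
  </div>
  """

  class Microdata(HTMLParser):
      # Collect itemprop/value pairs from the markup.
      def __init__(self):
          super().__init__()
          self.prop = None
          self.facts = {}

      def handle_starttag(self, tag, attrs):
          self.prop = dict(attrs).get("itemprop")

      def handle_data(self, data):
          if self.prop and data.strip():
              self.facts[self.prop] = data.strip()
              self.prop = None

  p = Microdata()
  p.feed(PAGE)
  print(p.facts)  # {'name': 'Marie Curie', 'birthPlace': 'Warsaw'}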


Theoretically, yes. However, the vast majority of site owners don't do it, even if they are interested in higher positions in the SERPs. There must be a reason for this.


More clutter.


I have to agree and even add it's more dangerous than that.

Never mind the fact that what attracted me to Google was its sober interface, its minimalist approach to results on the web (that's also why I like Hacker News). Never mind that Google tries to fit more information per square inch for no good reason and sacrifices the readability of the "normal" results; after all, they are still a search engine.

But more worrying is that they are going to present answers without citing sources[1]. It's a very Orwellian approach to answers, and that's something we should grow out of. There's a reason why we need sources: what's written is just one of the ways to see an event or somebody.

And Google, really your new Blogspot is awful, useless eye-candy.

[1] At least from what's shown on the blog.


None of this seems to be accurate. Their 'good reason' is that you searched for this information. That's why they're displaying it, in the most integrated way they can.

Your use of 'Orwellian' indicates you are just using buzzwords.


>Your use of 'Orwellian' indicates you are just using buzzwords.

It's legitimate. The point is that showing answers without sources makes Google the de facto "arbiter of truth". When Google's database updates its version of truth, that now becomes what is (or what has always been). There is no indication of different perspectives, or any analysis for how its version of "truth" was derived. That is very Orwellian.


I don't believe it is. Google is not the arbiter of truth, they are not dictatorially selecting the truth for the public. They are /searching/ for information and displaying the results of the search. They remain a neutral party in the middle.

It's hard to use 'Orwellian' when the entity you're accusing is entirely dependent upon other sources and exercises no editorial control.


> It's hard to use 'Orwellian' when the entity you're accusing is entirely dependent upon other sources and exercises no editorial control.

There's a lot of trust in Google with that statement. If this changes, how would you know? That's what's Orwellian about it.

It's not hard to imagine the results being silently tweaked by Google - not to say that they will do this, but it's a real danger, because it'd be very bad and hard to detect if they did do this at some point in the future, after we'd all gotten complacent and learned to implicitly trust the results.


That isn't Orwellian. If it were, the term would describe any resource which isn't instantaneously transparent. Let's say your clock retrieves its time via radio broadcast. By your logic this is Orwellian, because without checking external sources you wouldn't know if they changed the time!

Of course Google could use this for political gain or some other nefarious purpose, but they rely absolutely on user trust and so it would be an incredibly risky move.

Not to mention that looking at your watch or using bing or ddg or similar tools would show you the deception. It's just silly invoking Orwell over this I think.


The bold, black fact headings (Born, Died, Spouse, Children, etc) are actually links that you can click on to do a search for sources for that fact.


> after all they are still a search engine.

No, they are not.

"Google’s mission is to organize the world’s information and make it universally accessible and useful."


More relevant results.


And, if I understand this correctly, we can express more accurate and/or complex questions for which we will get those relevant results.


Or more useful information.

So far, Google has done an awfully good job of hiding everything not immediately relevant, so I'm going to give them the benefit of the doubt on this one.


Clutter?! How could you call that clutter! Just because they're shoving more information down our throats that we don't want doesn't mean it's clutter!

Sure, you still have to find the source of the "summary" to understand it and determine how they got that information so it will make sense to you. But it's better than just giving you the search results which already have the summary! Right?

And yeah, the search results already have links to all the different kinds of [taj mahal]. Now you can filter down the top pages to a specific type instead of clicking on the link that matches what you want! It's so easy it takes an extra click!

You just don't understand, man. Google knows what you want, even if it isn't what you want. And you'll take it and like it. Psht... clutter.


Relationships between facts are also facts. Having more facts enables you to do a slightly better job but I suspect that is all there is to it.


Where is this knowledge graph, how do I try it out?


I can't see this yet, but it looks like Google pulls most of the structured info from Freebase (which they acquired in 2010). You can play with that at freebase.com.


How does this relate to OpenCyc?


Google's own DuckDuckHack.


Cool. Maybe Google could acquire Novamente or Numenta next.


This is just Wikipedia in-line in results...


Abandon social. Pursue this.


Come on... the magic is, they simply scraped Wikipedia!!! :(


I wish people would integrate the punchline into programming languages.


didn't they try to launch knowledge graph some ten times after buying metaweb or something already?


As someone with a brain, I was wondering probably at least as far back as a decade ago why Google wasn't already doing this. The twenty questions approach, in general, is a powerful and simple way to refine a human query in an interactive way.


And that's where Wolfram Alpha became useless. It was just a matter of time after all.



