Google acquires Metaweb (Freebase) (googleblog.blogspot.com)
154 points by aschobel on July 16, 2010 | hide | past | favorite | 40 comments



This is huge for many reasons, but chiefly because it could finally lead to the "semantic web." Metaweb's video, linked in the article, explains part of the "how".

The problem with the semantic web is that it needs broad adoption. Many people need to tag text with these "bar codes" (uniquely identified entities). That takes a big effort, and there has to be an ROI for such an undertaking. The other problem is that there is no standard. Well, Google just solved both. With a dominant market share, you don't need everyone to agree on a standard; you just force them to--or else they lose out to the competition. And as for the ROI of tagging web pages? Well, what's the ROI on SEO? This will bring about a new form of SEO, except that Google can now undercut many of the search results and answer many of the queries directly--so that'll get interesting... and I'm sure Wolfram Alpha certainly agrees.

Google was also smart to buy Metaweb in order to give web app developers a good reason to use Google's entities and not just FB's Open Graph entities.

Congrats to the Metaweb team! Freebase + Wikipedia are two of the best gifts to humanity.


The problem with the semantic web is that you need a universal ontology, i.e. you need everyone to agree on the same thing. Cory Doctorow's Metacrap explains this in more detail: http://www.well.com/~doctorow/metacrap.htm


That's a strawman. You in fact don't need a "universal ontology," you just need people to agree on first principles (e.g. URLs are unique, there are things called triples, etc.)
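To illustrate (a minimal sketch with made-up URLs, not tied to any real vocabulary or RDF library): two parties who agree only that URLs identify things and that facts are (subject, predicate, object) triples can already merge and query each other's data.

```python
# A minimal sketch of those "first principles": entities are identified by
# unique URLs, and every fact is a (subject, predicate, object) triple.
# All URLs and predicate names here are hypothetical.
triples = [
    ("http://example.org/y_combinator", "foundedBy", "http://example.org/paul_graham"),
    ("http://example.org/paul_graham", "homepage", "http://paulgraham.com"),
]

def objects_of(subject, predicate):
    """All objects asserted for a given subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of("http://example.org/y_combinator", "foundedBy"))
```

Nothing about this requires a shared ontology; only the triple structure and the uniqueness of identifiers are agreed upon.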


Yes, absolutely correct. I worked at a semweb company for years; it's all about making a small, functional ontology for a particular purpose.

Of course, that leads to the question: is that ontology actually relevant, or is it just important that the data is structured?


I think you do need a universal ontology if you want to make the kind of progress the semantic web people talk about. If you just have a bunch of small, separately created ontologies, the situation can indeed seem great until each expands. Then the intersections and ambiguities become huge.

Sure, if you weren't concerned with exactness and lack of ambiguity, you could expand the world of triples into a giant, poorly organized collection of information. It would be kind of like the web. The approach "works" but we, uh, already have the web.

Also, the Doctorow document is excellent. Anyone who expects naive metadata to be extensible should have a reply to it.


If you just have a bunch of small, separately created ontologies, the situation can indeed seem great until each expands. Then the intersections and ambiguities become huge.

Inferencing solves this problem.


I think even small, agreed-upon ontologies in different areas would be a big step up from the virtually none that we have. Services like the experimental Google Squared currently have nothing to work with. Even a complicated mesh of partially interlinking ontologies would be better.

What about Facebook's new metadata? What are people in the semantic web area saying about that?


That's a reason why the Semantic Web won't wake up and become conscious, as in many bad sci-fi stories.

However, if you think of it as "vocabularies for people to tag their own stuff in a structured way," which Google can then index and traverse, then it's more realistic.

That said, the Semantic Web people are indeed guilty of hyping this technology as being able to "reason" on its own.


Many people need to tag text with these "bar codes" (uniquely identified entities).

This sort of thinking seems to be common in people who like the idea of the semantic Web but who are pessimistic about its implementation. I'm not sure it's going to be the case.

As we've seen happen with other technologies, I suspect we'll see a MetaWeb style approach of "deriving the barcodes" from existing and unformatted content. This will not be a 100% accurate process, but will be "good enough" to make the semantic Web a realistic and large scale underpinning to the next generation of search systems.


Deriving the ontology potentially has more value because it can help avoid spammers. If anyone can just assert that their content is of a specific type without any kind of verification, we end up with the meta tag keywords attribute all over again, where none of that data is trust-able.


Agreed. It won't be an entirely manual process. There are services out there like Zemanta who do a good job of automatically tagging text.


The good thing is that several different unique ids can be interlinked to the same entity. I was using Wikipedia page_ids and IMDB titles when playing around with it previously. It's a little bit messy, but it works.
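A sketch of that interlinking (the entity, namespaces, and id values below are all made up for illustration):

```python
# Hypothetical example: one entity keyed to ids from several namespaces,
# e.g. a Wikipedia page_id and an IMDB title id pointing at the same thing.
entity_ids = {
    "blade_runner": {
        "wikipedia_page_id": 39984,      # made-up value
        "imdb_title": "tt0083658",       # made-up value
    },
}

def entity_for(namespace, value):
    """Reverse lookup: which entity carries this external id?"""
    for entity, ids in entity_ids.items():
        if ids.get(namespace) == value:
            return entity
    return None

print(entity_for("imdb_title", "tt0083658"))
```

The messiness the commenter mentions shows up when two sources disagree on which external id an entity should carry; the mapping layer has to absorb that.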


Exactly! And Freebase has a lot of this data mapped. The data is so rich it holds huge potential.

For example (for those that are unfamiliar with the richness of this data), visit: http://www.freebase.com/view/en/y_combinator

And there you'll note that Y Combinator is mapped to its official site and its Wikipedia page, and it can tell you who the founders are, etc. If you follow the link to Paul Graham, you can then find out what his personal site is, and so on.


There's even more--you can do really cool queries with the data. E.g., ask for all people linked by less than three degrees of separation from PG and whose ages are between 20 and 40 (as a rather lame example).
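A toy sketch of that kind of query (the graph, names, and ages are all invented; a real query would go through Freebase's API): a breadth-first search bounded by degree of separation, followed by a filter on age.

```python
from collections import deque

# Made-up toy graph of "knows" links and ages, for illustration only.
links = {
    "pg": ["alice", "bob"],
    "alice": ["carol"],
    "bob": [],
    "carol": ["dave"],
    "dave": [],
}
ages = {"pg": 45, "alice": 25, "bob": 52, "carol": 33, "dave": 29}

def within_degrees(start, max_deg):
    """Everyone reachable from start in at most max_deg hops (BFS)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] < max_deg:
            for other in links.get(person, []):
                if other not in seen:
                    seen[other] = seen[person] + 1
                    queue.append(other)
    del seen[start]
    return seen

# "Less than three degrees from pg, aged 20 to 40":
nearby = within_degrees("pg", 2)
print(sorted(p for p in nearby if 20 <= ages[p] <= 40))
```

The point is that once the data is entities and links rather than pages, queries like this become a graph traversal instead of a scraping exercise.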

I do hope Google will do something cool with this acquisition.


Congrats to Metaweb and Applied Minds. Those guys are uniformly brilliant and it's great to see Google sharing the deep interest Metaweb has in curating a great, accessible repository of semantic data.


Imagine if we could do this with all of Google's data (A very cool search engine built on top of Freebase):

http://news.ycombinator.com/item?id=1497100


That is so cool (I think). I have been waist deep in Freebase for a few weeks for a work task.

There is a lot of cruft in Freebase, but with some manual effort and some automation, it is a good source for a wide variety of information. Depending on the application, DBpedia and GeoNames are other good sources of structured data.


The name didn't immediately ring a bell until I Googled it: Metaweb == Freebase.


I updated the title, thx


When I saw Metaweb in the headline, what immediately sprang to mind was the excellent wiki they maintained for Neal Stephenson's "The Baroque Cycle". It was called the Quicksilver wiki.

I learned about it from Stephenson himself during a presentation he did for the book at the now-defunct Cody's Books in Berkeley. After 2-3 years of active growth, the Quicksilver wiki disappeared from Metaweb's site. I wonder if Google will restore the wiki too?

Metaweb is briefly mentioned at the end of this Wikipedia article: http://en.wikipedia.org/wiki/Quicksilver_(novel)


So we are going to see a lot of zero click info on Google. Like DuckDuckGo. Good.


This seems very significant to me. So far, there has always been this opposition between algorithmic extraction of meaning and modeled, structured data. It never made sense to me, because using both leads to so much better query results. I've been doing it for years, and I was starting to wonder why the idea wasn't catching on. It's been a really tough sell. I hope this is going to be a real breakthrough. Managing spam will be difficult, though.


How much?


If they unlock the Freebase data and associate UPCs/EANs/other bar codes and structured data with it (which Freebase has done in a haphazard fashion), they could really do some pretty awesome stuff.


For those of us who have worked closely with Metaweb's products and had a professional relationship with them (Powerset, my previous employer pre-acquisition, was fairly close to them back when we started out) over the past few years, this is great to see. Metaweb has been providing several invaluable services to everyone interested in NLP, Search, and smarter software in general.

Glad you got your payout, guys. Hopefully now the full power of Google's infrastructure can make Freebase fast and enormous.


Err, interesting note: Freebase powers some part of Bing search.


Definitely a big congratulations to these folks. They've done great work and it's definitely a great day for the promise of the Semantic Web.


This is great news - Metaweb always had pretty good APIs that made it a friendly kid on the block. I can see this being immediately useful across Google's offerings. Man, this metadata would be friggin awesome for, say, music videos on YouTube, or a few hundred more layers for Google Earth. Sweet!


Is there anyone here who has downloaded and played with their data set (the PG dump)? I have fiddled with one of their tables to process their Wikipedia data. I think Google bought it for their supposedly wicked-fast GraphDB.


I haven't downloaded the data set because it doesn't look like the accompanying MQL server is available for download. Am I correct on this?

MQL is sweet (e.g. give me all of Tom Cruise's movies since 1995 that cost over $10 million to make), but what good is the data dump if all I can do is resort to simplistic SQL queries? MQL support is just as important as the data itself.
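For reference, MQL queries are written as JSON in a query-by-example style; here is a rough sketch of the Tom Cruise query as a Python dict. The property names (/film/film, initial_release_date, estimated_budget, starring) are approximations from memory and may not match Freebase's actual film schema.

```python
import json

# Rough sketch of the Tom Cruise query in MQL's query-by-example style.
# Property names here are approximations, not verified Freebase schema.
query = [{
    "type": "/film/film",
    "name": None,                        # null means "fill this in for me"
    "initial_release_date>=": "1995",
    "estimated_budget": {"amount>": 10000000, "currency": None},
    "starring": [{"actor": "Tom Cruise"}],
}]

print(json.dumps(query, indent=2))
```

The shape of the query mirrors the shape of the answer, which is what makes MQL so much pleasanter than hand-writing joins in SQL over a dump.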

I'm sure that I'm overlooking something simple as usual (shameless self-deprecation reference in an attempt to hold myself less accountable in the event that there's an easy solution available but I was just too lazy to find it).


AFAIK you need to process that data - put it into a graph db or an RDF store. SQL, MQL, and SPARQL are all just query languages that can query the data. BTW there is no open source MQL implementation.


Freebase's blog post on graphd (their speedy, non-open-source tuple store):

http://blog.freebase.com/2008/04/09/a-brief-tour-of-graphd/


The place where I work now (http://cascaad.com) uses the Freebase dumps as the primary source for our entity database (though more are in the works). It is pretty usable, even if some interesting data is not available. The dev list and IRC channel are also very useful resources.

We hope to be able to provide rdf/rdfa links at some point, because, hey, the semantic web is a darn cool thing :)


wow I was interested in joining Freebase, now I need to join google!


Maybe a smart pre-emptive strike against Twitter's Annotations?


I have no idea what you mean by this. Twitter is not trying to become a database or Wikipedia. Tweets are short communications and signalling, which they want to enrich with more data for specialized applications.


That is an interesting idea!



Another attempt to compete with Wikipedia? This time it might be successful. Wikipedia lacks structure and an API to fetch different content types.


Search. Monopoly.



