This is huge for many reasons, but chiefly because it could finally lead to the "semantic web." Metaweb's video, which is linked to in the article, explains part of the "how".
The problem with the semantic web is that many need to embrace it. Many people need to tag text with these "bar codes" (uniquely identified entities). That can take a big effort, and there has to be an ROI for this big undertaking. The other problem is that there is no standard. Well, Google just solved both. With a dominant market share, you don't need anyone to agree on a standard, you just force them to--or else they lose out to the competition. And as far as the ROI in tagging web pages? Well, what's the ROI on SEO? This will bring about a new form of SEO, except that Google can now undercut many of the search results and answer many of the queries directly--so that'll get interesting... and I'm sure Wolfram Alpha agrees.
Google was also smart to buy Metaweb in order to give web app developers a good reason to use their entities and not just FB's Open Graph entities.
Congrats to the Metaweb team! Freebase + Wikipedia are two of the best gifts to humanity.
I think you do need a universal ontology if you want to make the kind of progress the semantic web people talk about. If you just have a bunch of small, separately created ontologies, the situation can indeed seem great until each expands. Then the intersections and ambiguities become huge.
Sure, if you weren't concerned with exactness and lack of ambiguity, you could expand the world of triples into a giant, poorly organized collection of information. It would be kind of like the web. The approach "works" but we, uh, already have the web.
Also, the Doctorow document is excellent. Anyone expecting naive metadata to be extensible should have a reply to it.
I think even small, agreed-upon ontologies in different areas would be a big step up from the virtually none that we have. Services like the experimental Google Squared currently have nothing to work with. Even a complicated mesh of partially interlinking ontologies would be better.
What about Facebook's new metadata? What are people in the semantic web area saying about that?
Many people need to tag text with these "bar codes" (uniquely identified entities).
This sort of thinking seems to be common in people who like the idea of the semantic Web but who are pessimistic about its implementation. I'm not sure it's going to be the case.
As we've seen happen with other technologies, I suspect we'll see a Metaweb-style approach of "deriving the barcodes" from existing and unformatted content. This will not be a 100% accurate process, but will be "good enough" to make the semantic Web a realistic and large-scale underpinning to the next generation of search systems.
Deriving the ontology potentially has more value because it can help avoid spammers. If anyone can just assert that their content is of a specific type without any kind of verification, we end up with the meta tag keywords attribute all over again, where none of that data is trustworthy.
Good thing is that several different unique ids can be interlinked to the same entity. I was using Wikipedia page_ids and IMDB's titles when playing around with it previously. It's a little bit messy, but it works.
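To make the interlinking idea concrete, here is a minimal Python sketch of several external keys resolving to one internal entity. Everything in it is an assumption for illustration: the `EntityIndex` class, the namespace names, and the IDs are made-up placeholders, not Freebase's actual schema or API.

```python
class EntityIndex:
    """Toy index mapping (namespace, external_id) pairs to one internal entity id."""

    def __init__(self):
        self._by_key = {}   # (namespace, external_id) -> entity id
        self._keys_of = {}  # entity id -> set of (namespace, external_id)

    def link(self, entity_id, namespace, external_id):
        """Attach one more external identifier to an entity."""
        key = (namespace, str(external_id))
        existing = self._by_key.get(key)
        if existing is not None and existing != entity_id:
            # The "messy" case: the same external id claimed by two entities.
            raise ValueError(f"{key} is already linked to {existing}")
        self._by_key[key] = entity_id
        self._keys_of.setdefault(entity_id, set()).add(key)

    def resolve(self, namespace, external_id):
        """Return the entity id for an external identifier, or None if unknown."""
        return self._by_key.get((namespace, str(external_id)))

    def aliases(self, entity_id):
        """Return every external identifier known for an entity, sorted."""
        return sorted(self._keys_of.get(entity_id, set()))


index = EntityIndex()
# Placeholder ids, not real Wikipedia or IMDb identifiers.
index.link("entity:film-1", "wikipedia_page_id", 12345)
index.link("entity:film-1", "imdb_title", "tt0000000")

print(index.resolve("imdb_title", "tt0000000"))
print(index.aliases("entity:film-1"))
```

Whichever ID you start from, you land on the same entity, which is what makes joining data from different sources workable.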
And there you'll note that Y Combinator is mapped to the official site and its Wikipedia page, it can tell you who the founders are, etc. If you link to Paul Graham, you can then find out what his personal site is, and so on.
It's even more -- you can do really cool queries with the data. E.g., ask for all people linked less than three degrees of separation from PG and whose ages are between 20 and 40 (as a rather lame example).
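For what it's worth, that kind of query boils down to a breadth-first search plus a filter. Below is a rough Python sketch over a toy in-memory graph; the names, links, and ages are invented placeholders, and this says nothing about how Metaweb's graph store actually evaluates such queries. "Less than three degrees" is read here as at most two hops.

```python
from collections import deque


def within_degrees(graph, start, max_hops):
    """Return {person: hops} for everyone reachable from start in at most max_hops links."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(person, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[person] + 1
                queue.append(neighbor)
    del hops[start]  # the starting person is not part of the answer
    return hops


def people_near(graph, ages, start, max_hops, min_age, max_age):
    """People within max_hops of start whose known age lies in [min_age, max_age]."""
    reachable = within_degrees(graph, start, max_hops)
    return sorted(p for p in reachable if p in ages and min_age <= ages[p] <= max_age)


# Toy data: adjacency lists of "linked to" relations and made-up ages.
graph = {
    "pg": ["a", "b"],
    "a": ["pg", "c"],
    "b": ["pg"],
    "c": ["a", "d"],
    "d": ["c"],
}
ages = {"a": 25, "b": 50, "c": 35, "d": 30}

print(people_near(graph, ages, "pg", 2, 20, 40))
```

Here "d" has a qualifying age but sits three hops away, and "b" is close enough but too old, so neither is returned.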
I do hope Google will do something cool with this acquisition.
Congrats to Metaweb and Applied Minds. Those guys are uniformly brilliant and it's great to see Google sharing the deep interest Metaweb has in curating a great, accessible repository of semantic data.
This seems very significant to me. So far, there has always been this opposition between algorithmic extraction of meaning and modeled structured data. It never made sense to me, because using both leads to so much better query results. I've been doing it for years and I was starting to wonder why the idea isn't catching on. It's been a really tough sell. I hope this is going to be a real breakthrough. Managing spam will be difficult though.
That is so cool (I think). I have been waist deep in Freebase for a few weeks for a work task.
There is a lot of cruft in Freebase, but with some manual effort and some automation, it is a good source of a wide variety of information. Depending on the application, DBpedia and GeoNames are other good resources for structured data.
If they unlock the Freebase data and associate UPCs/EANs/other bar codes and structured data around it (which Freebase has done in a haphazard fashion), they could really do some pretty awesome stuff.
This is great news - Metaweb always had pretty good APIs that made it a friendly kid on the block. I can see this being immediately useful everywhere for Google's offerings. Man, YouTube would be friggin awesome with metadata for, say, music videos, or a few hundred more layers for Google Earth. Sweet!
For those of us who have worked closely with Metaweb's products and had a professional relationship with them (Powerset, my previous employer pre-acquisition, was fairly close to them back when we started out) over the past few years, this is great to see. Metaweb has been providing several invaluable services to everyone interested in NLP, Search, and smarter software in general.
Glad you got your payout, guys. Hopefully now the full power of Google's infrastructure can make Freebase fast and enormous.
When I saw Metaweb in the headline, what immediately sprang to mind was the excellent wiki they maintained for Neal Stephenson's "The Baroque Cycle". It was called the Quicksilver wiki.
I myself learned about it from Stephenson himself during a presentation he did for the book at the now-defunct Cody's Books in Berkeley. After 2-3 years of active growth the Quicksilver wiki disappeared from Metaweb's site. I wonder if Google will restore the wiki too?
I have no idea what you mean by this. Twitter is not trying to become a database or Wikipedia. Tweets are short communications and signalling, which they want to enrich with more data for specialized applications.
Has anyone here downloaded and played with their data set (PG dump)? I have fiddled with one of their tables to process their Wikipedia data. I think Google bought it for their supposedly wicked-fast GraphDB.
I haven't downloaded the data set because it doesn't look like the accompanying MQL server is available for download. Am I correct on this?
MQL is sweet (e.g. give me all of Tom Cruise's movies since 1995 that have cost over 10 million dollars to make), but what good is the data dump if all I can do is resort to simplistic SQL queries? MQL support is just as important as the data itself.
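To illustrate the shape of that kind of question without an MQL server, here is a small Python sketch of a query-by-example matcher over plain records. This is not MQL syntax and not the Freebase schema; the field names, the comparison-suffix convention, and the film records are all made up for the example.

```python
import operator

# Comparison suffixes a pattern key may end with (a convention invented for this sketch).
OPERATORS = [(">=", operator.ge), ("<=", operator.le), (">", operator.gt), ("<", operator.lt)]


def matches(record, pattern):
    """True if the record satisfies every constraint in the pattern."""
    for key, wanted in pattern.items():
        for suffix, compare in OPERATORS:
            if key.endswith(suffix):
                field = key[: -len(suffix)]
                if field not in record or not compare(record[field], wanted):
                    return False
                break
        else:
            # No comparison suffix: require plain equality.
            if record.get(key) != wanted:
                return False
    return True


def find(records, pattern):
    """Return the titles of all records matching the pattern."""
    return [r["title"] for r in records if matches(r, pattern)]


# Placeholder records with invented values, not real film data.
films = [
    {"title": "Film A", "star": "Actor X", "year": 1992, "budget": 50_000_000},
    {"title": "Film B", "star": "Actor X", "year": 1996, "budget": 80_000_000},
    {"title": "Film C", "star": "Actor X", "year": 2001, "budget": 5_000_000},
    {"title": "Film D", "star": "Actor Y", "year": 2004, "budget": 90_000_000},
]

# "All of Actor X's movies since 1995 that cost over 10 million to make."
print(find(films, {"star": "Actor X", "year>=": 1995, "budget>": 10_000_000}))
```

The pattern reads like the answer you want with blanks and constraints filled in, which is a big part of why this style of query is pleasant compared to hand-written joins.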
I'm sure that I'm overlooking something simple as usual (shameless self-deprecation reference in an attempt to hold myself less accountable in the event that there's an easy solution available but I was just too lazy to find it).
The place where I am now (http://cascaad.com) does use the Freebase dumps as a primary source for our entity database (though more are in the works). It is pretty usable even if some interesting data is not available. The dev list and IRC channel are also very useful sources.
We hope to be able to provide RDF/RDFa links at some point, because, hey, the semantic web is a darn cool thing :)