I'm happy some Semantic Web proponents understand that blind crawling won't work. But TimBL disagrees:
> I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers... The ‘intelligent agents’ people have touted for ages will finally materialize.
The debunking explicitly agrees with Shirky's conclusion, and should have given more serious scrutiny to his premise. The RDF format deals with "triples" precisely to enable inferences ("syllogisms"). Syllogisms are the only thing the SemWeb brings to the table that wasn't there before. If we have to pick sources and massage data by hand as you say, then I'll go with CSV files.
Where does TimBL say that "intelligent agents" will be blindly crawling? Certainly agents have to follow links they haven't seen before (there wouldn't be much point if they didn't), but following links provided by trusted sources is vastly different from what Google does.
> The RDF format deals with "triples" precisely to enable inferences ("syllogisms").
As far as I know, this is not and has never been true.
RDF deals with triples because they're a small unit of data, which makes it easy to take the chunks you want from one dataset and graft them onto another set.
I suppose you can call matching URIs to graft one triple onto another a syllogism, but it would be a stretch; if that's a syllogism then so is joining two tables in a relational database. It has nothing in common with the ridiculous examples Shirky uses.
> If we have to pick sources and massage data by hand as you say, then I'll go with CSV files.
Have fun merging data from multiple sources. RDF can't make this completely painless, but it can make it easier than CSV files.
Your third article doesn't make much sense to me. How is RDF "semantically committed"? An individual RDF vocabulary is "semantically committed", but so is an individual XML schema or a documented use of JSON. RDF (like XML and JSON, and the generic tools for all three) doesn't care what you put in it.
> sometimes it is less than evident why one should bother to map an application in RDF. The answer is that we expect this data, while limited and simple within an application, to be combined, later, with data from other applications into a Web. Applications which run over the whole web must be able to use a common framework for combining information from all these applications. For example, access control logic may use a combination of privacy and group membership and data type information to actually allow or deny access. Queries may later allow powerful logical expressions referring to data from domains in which, individually, the data representation language is not very expressive.
I'm not sure if this quote supports my point of view or yours, or even if there's any factual difference between our views.
When I talk about merging data, I'm talking about taking two independent documents:
<brian> parentOf <bct>
<brian> name 'Brian'
<bct> name 'Brendan'
and being able to join those graphs on the <bct> node, to say that a person named Brendan has a parent named Brian. This is what TimBL means by combining data from multiple applications (IMO).
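The join described above can be sketched in a few lines of Python. This is only an illustration, not a real RDF toolkit: triples are plain tuples, a graph is a set of them, and the helper names (`merge`, `names_of`, `parent_child_names`) are made up for the example. Which triple lives in which document is also an assumption here (the `<brian>` statements in one, the `<bct>` statement in the other).

```python
# Minimal sketch: each "document" is a set of (subject, predicate, object) triples.
# The split between the two documents is assumed for illustration.
doc_a = {
    ("<brian>", "parentOf", "<bct>"),
    ("<brian>", "name", "Brian"),
}
doc_b = {
    ("<bct>", "name", "Brendan"),
}


def merge(*graphs):
    """Merge graphs by set union; identical node identifiers line up for free."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def names_of(graph, node):
    """Return every 'name' value attached to a node."""
    return {o for s, p, o in graph if s == node and p == "name"}


def parent_child_names(graph):
    """Return (parent name, child name) pairs by following 'parentOf' edges."""
    pairs = set()
    for s, p, o in graph:
        if p == "parentOf":
            for parent_name in names_of(graph, s):
                for child_name in names_of(graph, o):
                    pairs.add((parent_name, child_name))
    return pairs


if __name__ == "__main__":
    print(parent_child_names(merge(doc_a, doc_b)))
```

Neither document alone can answer "what is the name of Brian's child?"; the answer only appears once the two sets are unioned, and the union needs no per-source mapping code because both documents use the same identifier for `<bct>`.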
This is trivial for software to do and takes a lot of the effort out of merging datasets. It's what makes the semantic web a web; you're linking different datasets together. I don't see how Shirky's arguments apply here.
Good point. I think you've changed my mind about the utility of inferencing :).
The difference between querying and inferencing isn't what I was trying to emphasise, though. My point was the difference between being designed for making queries/inferences within a dataset, and being designed for joining distinct datasets.
Querying within a dataset is easy: SQL, XPath, XQuery, LINQ, etc. You can write rules for transforming any data model that you can query.
RDF isn't anything special in these areas (though I do think that SPARQL is an awfully nice query language). What it gives you is a way to link and merge datasets.
Something like this is really what we need, and it's what would be truly revolutionary about the web. Otherwise, for all the talk about the information singularity and such, the proportion of genuinely useful information stays low. Too much useless or mediocre information is worse than useless because it makes everyone more stupid.
But the solution isn't only technological. People have to choose for themselves to filter and promote good information.