
Why is RDF so old, complicated, unpopular and still not discarded? - cosmohh
http://www.semanticoverflow.com/questions/2918/why-is-rdf-so-old-complicated-unpopular-and-still-not-discarded
======
jashkenas
As a preamble, when RDF was conceived, databases drove many sites on the web,
but their data tended to be exposed only as HTML, rather than in a more
machine-friendly format.

Now, there are two perspectives on what RDF is.

To an idealist, RDF is the universal data format. There are no semantics
baked-in, and you can write arbitrary subject -> predicate -> object triplets
to express any possible relationship. To an idealist, it's the perfect format
for exposing all the structured data on the web in a machine readable form.
The dream has always been for automatic agents to crawl the semantic web for
you, understanding the meanings of the RDF triplets, and using them to reason
out the solution to your query.
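The triple model itself is tiny. As a rough sketch in Python, with made-up example URIs, "everything is a triple" means the whole data model is just a set of three-part statements:

```python
# The entire RDF data model, conceptually: a set of
# (subject, predicate, object) triples. All URIs are invented examples.
triples = {
    ("http://example.org/alice", "http://example.org/knows", "http://example.org/bob"),
    ("http://example.org/alice", "http://example.org/name", "Alice"),
}

# A "query" is just a pattern match over the set: whom does alice know?
knows = [o for (s, p, o) in triples
         if s == "http://example.org/alice" and p == "http://example.org/knows"]
```

Everything else (schemas, inference, the serializations) is layered on top of that one shape.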

To a pragmatist, that dream has always sounded like a bunch of bull. Absent
strong AI, it's a complete pipe dream that a piece of software will ever be
able to infer the "semantic meaning" of interlinked RDF just because it
happens to be defined by triples. At the end of the day, you're going to have
a programmer writing rules against specific terms in RDF, and if that's the
case, then RDF is nothing more than an _extremely_ awkward API.

Fortunately for the web, the pragmatists won. APIs are everywhere, and RDF is
nowhere.

Unless strong AI happens to be right around the corner, the web dodged a real
bullet there. Personally, I'm of the opinion that any web agent that could
possibly puzzle through RDF triplets should have no problem understanding our
APIs, in any case.

~~~
lukev
Not quite. While RDF certainly isn't all that the idealists claim, you can
still get some benefit from it without strong AI.

Mainly, it provides a consistent model for handling the notion of a "field".
Non-RDF APIs typically return fielded JSON or XML, whose structure is
specified only in the documentation. In order to integrate two services
not originally designed to inter-operate, you have to write lots of custom
glue code.

RDF is at least amenable to writing generic "rules" to govern field mapping
and inference, rather than one-off glue code (which usually ends up being a
hacky script). So sure, if you're integrating one service, a hacky script is
probably easier. But if you want a coherent _system_ for integrating large
numbers of services not originally designed to inter-operate, RDF makes things
a lot easier.
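A minimal sketch of that idea, with invented vocabulary URIs: one declarative mapping rule applied uniformly to every source, instead of per-service glue code:

```python
# Two services expose "the same" field under different predicates.
# All URIs here are invented for the example.
service_a = [("http://a.example/item/1",
              "http://a.example/vocab/title", "Dune")]
service_b = [("http://b.example/rec/9",
              "http://purl.org/dc/elements/1.1/title", "Foundation")]

# One declarative mapping table, instead of one-off glue per service:
CANONICAL = {
    "http://a.example/vocab/title": "http://purl.org/dc/elements/1.1/title",
}

def normalize(triples):
    """Generic rule: rewrite predicates into the canonical vocabulary."""
    return [(s, CANONICAL.get(p, p), o) for (s, p, o) in triples]

merged = normalize(service_a) + normalize(service_b)
titles = [o for (s, p, o) in merged
          if p == "http://purl.org/dc/elements/1.1/title"]
```

Adding a third service means adding one line to the mapping table, not another script.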

So there's some benefit, even if it isn't as dramatic as its proponents claim.

Plus, there's the fact that while strong AI isn't yet on the horizon, RDF is a
lot easier for weak AI (inference engines, data mining, etc) to ingest, and
weak AI is getting better all the time.

Actually, one of the biggest problems with RDF, to my mind, is that its
structure makes it very difficult to get good performance with truly large
numbers of subjects and attributes - and unfortunately that's just the area
where it'd be most useful.

~~~
jashkenas
My claim is that your "generic rules" to govern mapping fields are _actually_
equivalent to hardcoding the names of JSON fields. Instead of seeing "title",
and deciding what to do with the data, you see:

    
    
    <!DOCTYPE rdf:RDF PUBLIC "-//DUBLIN CORE//DCMES DTD 2002/07/31//EN"
      "http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-xml-dtd.dtd">
    
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://example.org/doc">
        <dc:title>...</dc:title>
      </rdf:Description>
    </rdf:RDF>
    

... and decide what to do with the data.

In both cases, instead of having a machine understand the (semantic) structure
of the data, you have a programmer writing a rule.

Can you provide a concrete example of where this isn't the case, and your RDF-
reader is simpler than reading the equivalent (well-designed) JSON of the same
data?

~~~
mindcrime
It's not just consuming the data, though... it's when you go beyond that and
start doing inference and combining multiple databases that the RDF approach
really shows its value.

If you established a standard for that kind of field-name exposure using
JSON, then sure, you could achieve the same effect. But in the end, you'd
probably just wind up with a JSON encoding of RDF anyway. Defining things as
subject/predicate/object is all RDF really is... the RDF/XML encoding is just
one way of expressing it.
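That JSON encoding exists today, roughly, as JSON-LD. A rough sketch of the core idea (grouping triples by subject into JSON objects), using an invented subject URI and real Dublin Core predicates:

```python
import json

# Example triples (invented subject URI, Dublin Core predicates).
triples = [
    ("http://example.org/book/1",
     "http://purl.org/dc/elements/1.1/title", "Dune"),
    ("http://example.org/book/1",
     "http://purl.org/dc/elements/1.1/creator", "Frank Herbert"),
]

# Group triples by subject: each subject becomes one JSON object,
# which is essentially what JSON-LD does with "@id".
doc = {}
for s, p, o in triples:
    node = doc.setdefault(s, {"@id": s})
    node[p] = o

as_json = json.dumps(list(doc.values()), indent=2)
```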

~~~
jashkenas
Hey, I really wish that "subject/predicate/object" was all that RDF was, but
I'm afraid it's a good deal more:

RDF Syntax: <http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/>

RDF Schema: <http://www.w3.org/TR/2004/REC-rdf-schema-20040210/>

RDF Semantics: <http://www.w3.org/TR/2004/REC-rdf-mt-20040210/>

(Those are all current W3C standards)

~~~
ekidd
I agree that a lot of the standards surrounding RDF are ugly. I've always
particularly disliked the RDF-as-XML serialization, which took two fairly
simple ideas (triples and XML) and combined them into a complex mess. This is
why I always hate parsing RSS 1.0. Also, the full generality of OWL just
confuses me: It seems to be Prolog done badly.

But just as with XML, it's possible to ignore the cruft (XQuery, XLink, XML
Schema, the current SOAP flavor-of-the-month), and just use the useful bits. A
similar argument could be made about HTML: For every HTML 5, there's an XHTML
2.0.

~~~
lukev
Agreed. You can get all the benefits of RDF while eschewing the stupid parts.
Just because something has a spec doesn't mean you have to use it.

The full XML spec, for example, is _insanely_ complicated. But people still
derive value from it by utilizing a more or less sane subset.

------
ekidd
[I have a client who sells RDF tools. Here's the latest version of what I've
been saying to them.]

Let's look at RDF like a startup: The old RDF marketing from, say, 2003 was
hopelessly out-of-touch with reality. Users were never going to publish their
metadata as RDF, and even if they did, you'd need strong AI to use it. Here
are two classic articles spelling out why classic RDF wouldn't work:

<http://www.well.com/~doctorow/metacrap.htm>
<http://www.shirky.com/writings/semantic_syllogism.html>

But things have been looking up in the RDF market lately. The complicated RDF
XML serialization is mostly ignored in favor of simple n-triples. Google is
making heavy use of RDFa metadata when searching for products, and something
like 3.5% of web pages now contain RDFa. The RDF conferences are booming.
There are cool projects like dbpedia that are organizing publicly-available
information as RDF.
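For reference, N-Triples really is as simple as one line per triple. A toy serializer in Python (invented example URI, no literal escaping):

```python
def to_ntriples(subject, predicate, obj):
    """One triple per line: <s> <p> "o" .
    Toy version: URI subject and predicate, plain string literal,
    no escaping or datatype handling."""
    return f'<{subject}> <{predicate}> "{obj}" .'

line = to_ntriples(
    "http://example.org/book/1",               # invented example URI
    "http://purl.org/dc/elements/1.1/title",   # Dublin Core "title"
    "A Semantic Web Primer",
)
```

Compare that one line to the RDF/XML needed to say the same thing, and it's easy to see why the simpler serializations won out.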

So if the RDF tool vendors are going to succeed, they need to pivot (and many
of them are). They need to drop the AI hype, and focus on what their early
users are telling them. Some possible sales pitches:

1) RDF is useful as a distributed, schema-free graph database. Competition:
Neo4J and other NoSQL databases. There are a couple of very good sales pitches
here, including the fact that RDF databases are available from multiple
vendors, and that RDF inference can be used to normalize schemas between
different data sources.

2) RDF is useful for embedding small amounts of data in web pages.
Competition: Microformats.
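The "inference to normalize schemas" part of pitch (1) reduces, in the simplest case, to rules like rdfs:subPropertyOf: if p is declared a sub-property of q, then every (s, p, o) also yields (s, q, o). A sketch with an invented vendor vocabulary:

```python
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

data = [
    # Schema triple: a vendor's "headline" is a sub-property of dc:title.
    # The vendor URIs are invented for the example.
    ("http://a.example/headline", RDFS_SUBPROP,
     "http://purl.org/dc/elements/1.1/title"),
    # Instance triple from source A:
    ("http://a.example/story/7", "http://a.example/headline", "RDF lives"),
]

def infer_subproperties(triples):
    """One pass of the rdfs:subPropertyOf rule:
    (p subPropertyOf q) and (s p o)  =>  (s q o)."""
    sub = {(s, o) for (s, p, o) in triples if p == RDFS_SUBPROP}
    derived = [(s2, q, o2)
               for (p, q) in sub
               for (s2, p2, o2) in triples if p2 == p]
    return triples + derived

closed = infer_subproperties(data)
```

After the pass, queries against the canonical dc:title predicate see the vendor's data too, with no per-source glue code.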

------
antoniogarrote
RDF/XML serialization of RDF graphs can be painful. I completely agree that
other serializations like Turtle should be used in recommendations (e.g.
R2RML).

But the RDF model is a wonderful thing. The use of URIs as identifiers for
objects and properties makes it possible, for the first time, to reuse
knowledge and share data, linking APIs the same way we already link web pages.

The formal RDF semantics are admittedly harder, but most people can start
using RDF without caring about things like entailment.

I really think RDF has a future, especially given the steady growth of the LOD
initiative. The revision of the standard is also a good opportunity to polish
some aspects of RDF, for example the use of named graphs.

------
Ixiaus
I see a lot of RDF bashing, particularly from the "Web 2.0" crowd. It has
warts, no doubt (is anything borne of the human mind without warts?), but it
also has its strengths. A lot of people call it a failed technology, but that
is less a failure of the technology and more a failure of those people to
properly understand what it can and does do.

Is that a failure of the technology? Because most people don't understand
_what_ they would use it for or _how_ they would apply it? I don't think so. I
think the claims that it would change the web were high-flown. I also think
its creators did a bad job of explaining it. However, you'll find that the
people who do understand it, and who have a domain in which it is _clearly_
applicable, love it.

One of my friends works for the library at UCSD, and they use RDF, RDFS, and
OWL-DL _extensively_ - I couldn't even imagine doing what she does with the
library's book ontology using JSON (which some have proposed as a replacement
for XML and its vocabularies, even with a JSON "schema" language). I have
another friend working for a biotech company, and he uses it there,
extensively. These are only two examples, and they exclude the other web
projects and companies out there that also use it and its higher-level
vocabularies/ontologies (UMBEL, etc...).

Is it a failure for the web? (a topic in another thread a few days ago) I
don't think it is; I think it is a failure on the part of developers to
understand it and apply it (Drupal has applied it).

------
david927
It was the right idea and almost the right implementation, but almost is a big
word -- like a rocket that _almost_ has escape velocity.

There's work being done in this space that is really exciting and I think
we'll soon see how much potential the semantic web really has.

------
_mql
A major problem with RDF is that it is almost impossible to build applications
on top of it. You'd need another layer of abstraction to handle the
complexity.

I really like the approach Freebase takes. It’s a proprietary format,
basically. They use JSON rather than XML and they have their own query
language MQL (also expressed using JSON). However their graph of entities maps
to RDF as well, so they use RDF (along with common ontologies) as an export
format. I think while their system is still complex under the hood, it’s less
verbose, less scientific, and more user-friendly w.r.t. the public interface.
And most importantly, thanks to MQL it's super easy to build applications on top
of it.
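For context, an MQL query is query-by-example JSON: you fill in what you know and leave empty slots for what you want back. Roughly, the shape of the classic Freebase tutorial example (quoted from memory, so treat the exact property names as illustrative):

```python
# MQL is query-by-example JSON: known values filter the graph, and
# empty slots ask the engine to fill them in. Property names below
# follow the classic Freebase example, from memory.
query = {
    "type": "/music/artist",
    "name": "The Beatles",
    "album": [],   # empty list: "return every album name"
}
```

The same graph could be exported as RDF triples, but the query itself stays in plain JSON, which is a big part of why it's so approachable.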

---

For me, modeling data as a graph (as RDF proposes) is a really great idea!
What I've always wanted to do is build client applications (single-page web
apps) that can operate on a graph of data directly (instead of talking to a
REST service). That's why I'm putting effort into Data.js, which features a
Data.Graph that can be manipulated in JavaScript environments (like the
browser or Node.js). Such Data.Graphs can be persisted (synced) at any time.
There's support for CouchDB as a backend.

Well, in the README I also pointed out why I decided not to use RDF and
instead took inspiration from the Metaweb Object Model (that Freebase uses).

<https://github.com/michael/data>

I'd enjoy some feedback btw. The lib is actually working, but the examples are
out of date. Have a look at the source or ping me if you want to try it out.

------
donohoe
Hands up anyone who uses RDF (when they could choose to do otherwise)? Anyone?
_Bueller?_

~~~
stevenbedrick
Plenty of people (including myself) in biomedical fields (especially in
bioinformatics, although I know of numerous medically-oriented users) use RDF
for a variety of purposes - it's a fabulously flexible way to express and
encode complex data models. With a little bit of coordination regarding
semantics (OWL, etc.), it's also a great way for people in related but
distinct fields to share their data with one another.

That said, RDF has historically had three main problems, IMHO. The first is
that its proponents have, historically speaking, done a crappy job of
explaining what it actually is, and a worse job of demonstrating what it can
do.

The second problem is that it's not really the most human-readable format
around, especially in its XML serialization. This is largely because it
wasn't designed for human readability, but in the age of the Web (where
human-readable formats have been shown to have major advantages over
non-human-readable ones), this is a liability. They've added some more
readable serialization formats over the years, but a lot of the documentation
and tutorials assume that you'll be working in XML.

When I first tried to learn about RDF, years ago, all I found were tutorials
full of really gnarly-looking XML with minimal explanation of what was going
on. I spent so much time getting bogged down by the syntax that I missed the
point entirely.

The third, and biggest, problem with RDF is that it's a little hard to "grok"
if you don't have a good grounding in predicate logic, description logics,
and some of its other theoretical underpinnings. Actually, wait, maybe I said
that wrong - I should say instead that most of the old-school RDF people _do_
have solid theoretical backgrounds in description and predicate logic, and
have a hard time talking about RDF and other semweb technologies with people
who _don't_ have that background. So they use lots of jargon that, while
useful, isn't very helpful to somebody just trying to get their feet wet.

~~~
szany
Have you found any good introductions?

~~~
stevenbedrick
Personally, I found Antoniou & van Harmelen's "A Semantic Web Primer" to be
very useful. I felt like it covered a lot of ground at just the right level-
enough to explain what was going on and why, but not so much that it got
bogged down in pointless detail. However, YMMV- I know some people who didn't
care for it.

It's a little bit dated at this point, but Shelley Powers' "Practical RDF" was
also helpful- but, again, I got a lot more out of it once I'd read Antoniou
and had internalized the whole RDF thing at a "30,000 foot" level.

BTW, I just Googled for the Antoniou book to make sure I had remembered its
authors correctly, and it turns out that one of the first results is a PDF
version of the entire text. I don't know what it's doing up online, but grab
it while it's hot. If it's helpful, Powell's has a couple of used copies:

<http://www.powells.com/biblio/1-9780262012423-2>

------
jerf
Many people here are answering the question of why RDF sucks, but that was not
the question asked. The question is why this suckage has still not managed to
bury the technology.

There is a very frequent problem people suffer from, which is mistaking
_goals_ for results. When you start looking for it, you'll see it a lot. A new
open source NoSQL database will pop up, post a long list of goals ("Fastest
performance, maintain some integrity, transactions, trivial sharding,
consistent, available, _and_ partition tolerant, and able to run on a TI-83 at
web scale!"), put up a benchmark that shows that if you have no features and
have no code written to ensure you don't fall down under real load you can put
up _way_ bigger numbers than the products with features, and suddenly you have
some set of very excited people. Why are they excited? It's not the code; the
code is worthless, the _only_ thing it can do is run that benchmark. It's the
promises.

You can see it in graphical programming. Graphical programming has a few
modest successes, but the _promises_ are about changing how everybody programs
and how even Granny will be able to program. The fact that it has never
happened despite immense effort from tons of smart people doesn't stop a
certain segment of people from still being True Believers.

And, today, we talk about RDF. It promises to organize the web, it promises
glorious wonderful search engines, it promises the world. It can't deliver,
because merely sticking URIs on some graph nodes is only the beginning of the
solution, not even remotely the end; you still have issues of agreement and
accuracy and all kinds of other things. But the promise is _so beguiling_ that
some people just can't give it up, if we just try harder it'll happen, it's
just that nobody has done it right yet, I'm smart enough to see what the
previous hundreds of smart people haven't and I'll get it right, oh, it'll be
_glorious_ when everybody gets their heads out of their ass and listen to me
and just start doing it _right_.

But RDF can't get us there. It's so general it's nothing at all.

There are all kinds of places where people become excessively bedazzled by
promises and never notice the concrete reality before them. Another
interesting example is Object Orientation. This has proved useful, IMHO, if
not the be-all end-all of development methodologies, but it is interesting to
contrast the _promises_ made by OO back in, say, the late 1980s, with OO
reality today. The promises were about how objects can represent things in the
real world and you can model the real world with them. This turned out to be
bunk. The real world has some place in OO but only carefully layered and
wrapped and mixed in with a lot of other not-real-world things, iterators and
factoryfactories and facades and data structs and ORMs and so on. The old
promises were interesting and wrong, but also so beguiling that even today you
will still hear this nonsense spouted about how this is the purpose of OO,
even though it is now well understood that writing your programs with an
excessively-strong tie to physical reality is asking for problems. Even as the
reality is actually useful the old beguiling promises are still around
screwing young developers up to this day.

(It is a tricky balance maintaining the proper level of skepticism because
conditions change and sometimes wild promises become practical, and sometimes
someone really does manage to pull off one of these things. The latest example
would be the commercial success of the iPad, because for a long time smart
money was on there being no market for tablets after numerous and repeated
failures in creating the market. But in general, "show me the code" or
appropriate manifestation is still the best way to avoid being trapped in one
of these marketing traps, you will miss out on a few hits but pass on dozens
of losers.)

(Also, I am aware there are still some True Believers using RDF. My point here
is not disproved by a couple people using it, even using it in a big way. My
point will only be disproved if someone brings about RDF Utopia, the actual
promises. Of course you can bash RDF into submission, but that doesn't prove
it was the _best solution_ for your problem.)

~~~
haberman
Great point. I think another example is XML. It was going to free us from
proprietary, binary data formats by putting everything in nice, understandable
pointy brackets. Everyone understands things written in pointy brackets,
right?

The reality is that we got OOXML and ODF, which are so enormously complex that
no piece of software except Microsoft Word and OpenOffice could hope to
_fully_ implement them. If there were an ACID3 test for either, it would be
abundantly clear how bad and non-interoperable the situation is. We got XHTML,
which turned out to be a horrible idea
(<http://diveintomark.org/archives/2004/01/14/thought_experiment>) and is now
abandoned. We got XML Schema, because everyone realized that when
you have structured data you want data types, instead of everything being just
text.

And because XML seduces you into thinking it's much simpler than it actually
is, people write their own parsers and generators all the time that have no
hope of knowing what to do with a CDATA section, and you get parsers that will
fail if the whitespace isn't just exactly right. The whole thing is a giant
farce.

Just like you say, people got excited about the _promise_ of XML. This is the
perennial problem when people try to create standards in an area that doesn't
actually have any compelling implementations yet. Good standards refine and
codify existing practice. Bad standards try to invent something and
standardize it at the same time.

~~~
jerf
I would agree. XML is like OO, it's actually still useful in some cases, but
if you don't have marked-up textual content you're probably doing it wrong. I
wonder what fraction of XML in the world is actually marked-up textual
content, rather than data masquerading as marked up text?

