Ask HN: why isn't RDF more popular on HN?
15 points by sktrdie on Mar 29, 2014 | hide | past | favorite | 23 comments
I learned about the idea of storing data in the form of triples not long ago. This is essentially what RDF allows, and since I started using it to model my data, I've never really looked back to less interoperable data models. I know RDF has a long history of misconceptions, but so do JSON, HTTP, and many other standards out there. I'm wondering why RDF isn't more popular within the startup scene, because it's definitely popular and has shown its power in academia and the life sciences; just look at the Linked Data cloud.

I personally stopped waiting for RDF to become relevant after years of fighting with impenetrable, mutually incompatible specs (page A refers to draft B, which has subsequently mutated or 404ed; years later nobody has updated anything, and all of the examples predate both) and horrible tools — nothing release quality, examples that don't validate, bugs unfixed for years in the issue trackers, etc. Vendors promise that everything will be sunshine and rainbows if you buy their commercial offerings, but the costs are high enough to make any conceivable benefit dubious.

It seems like the single most useful thing that could happen for adoption would be cleaning that up: a clear use case which isn't already well solved, and a good tutorial outlining the benefits with non-trivial examples which follow the current standards, validate, and introduce production-quality tools. If any of that exists, it's managed to stay completely off the mainstream computing radar, which is a pity given the number of smart people who've poured considerable amounts of time into it.

As an example: a lot of the RDF community hated schema.org when it was announced because it used HTML5 microdata instead. If you were a web developer, the case for microdata was easy: add a couple of simple HTML attributes, use one of multiple high-quality validators to test your markup, and Google/Bing/etc. would return better search results for your data. At the same time, it was daunting to write an RDFa equivalent because there were no complete, current, non-contradictory docs and examples, the W3C validator had been broken for over a year (it was at least up, unlike all of the other online validators) and nothing actually supported it so you'd be investing hours or days instead of minutes in the hope that at some point in the future it would become useful. Most people with jobs to do are just going to wait until the demand side of that equation is more favorable.

The best example of linked data in widespread usage is Facebook's Open Graph markup but it appears that almost everyone simply stops as soon as they get the desired results in Facebook and, predictably, almost none of the examples actually validate because of tooling and cultural issues.

But how do you explain the large amount of research going into these topics? They must be linked to industry, and therefore used, otherwise there wouldn't be so many publications on things like RDF. Perhaps HN and startup culture like to concentrate on less academic things, but you can't just ignore an entire field of research simply because the tools weren't mature enough or minimal enough when you first used them.

Perhaps the Semantic Web community is doing a bad job at making these topics exciting, but the concepts still stand, or else there wouldn't be an entire branch of research dedicated to them.

So in my opinion we should be more positive about the very valuable knowledge being generated by these communities, and perhaps transform it, or create tools around it, to make it more user friendly and more exciting.

I guess overall what I'm saying is that the concepts of the Semantic web are valid no matter the cumbersome culture around them. We should judge the ideas not the tools.

There's a lot of literature on Turing machines, too, but nobody builds them in silicon, either.

RDF is for databases what Turing machines are for computation: the minimal model that abstracts away as much as possible without sacrificing capabilities, while ignoring anything performance-related (memory usage, execution time).

There are two separate thoughts here:

1. I'm not saying the sem web concept is useless, but that poor quality and unnecessarily high costs make it a worse deal than it needs to be. More practical focus on specs and solid software engineering would avoid worsening that trade-off unnecessarily.

2. Many topics which are of interest to academics aren't generally useful, and too many academics are dismissive of mundane concerns like software engineering. My first point falls solidly into that latter case. The former case isn't proven – it's entirely possible that we'll eventually see something interesting develop with more time, although as another commenter pointed out, it's likely that we'll see something different which uses the good parts but doesn't give up as much for theoretical purity. Of course, it's also possible that they'll spend more time having academic debates over httpRange-14 and what actually gets adopted will come from outside the current semweb community entirely – after all, the "Web" in Semantic Web refers to a technology which saw rapid growth due to ease of adoption, even though the academic hypertext community wasn't impressed because it was too simple.

In every case, more attention to building useful tools would massively increase the number of people working with semweb technology and lower the cost for everyone doing research. Everyone who wants that class of technologies to succeed should be focused on lowering that barrier to entry.

Could not have said it any better. I kind of want to rm my posts now. This expresses my exact frustrations.

RDF isn't just XML! Look for this tool that does JSON serialization!

* looks for such a tool, finds a broken mess

> I'm wondering why isn't RDF more popular within the startup scene, because it's definitely popular and has shown its power in academia and life sciences; just look at the Linked Data cloud.

It may be more popular than you realize. Perhaps people are using it but just don't make a big deal of it?

It could also be that RDF and the associated tech stack (OWL, SPARQL, reasoners, etc.) are a bit specialized and niche in terms of application, and have a learning curve that's just steep enough to put people off until they really understand why they need them.

Anyway, I can't speak for anybody else, but the semantic web stack, including RDF, OWL, SPARQL, FOAF, SIOC, etc., are a big part of what we're doing. In fact, you could say that our whole initiative is largely rooted in bringing Semantic Web tech into the enterprise.

Indeed. I'm in fact surprised by all the negative comments regarding RDF and the rest of the Semantic Web here on HN, not to mention the huge amount of research pouring into these topics from academia. I guess you're right; they may be adopted but just not talked about? If you search HN for anything about the Semantic Web there's still very little content. It could be that academia is not very good at making these topics exciting enough. Nonetheless, there must be a reason why so much research goes into things like RDF; otherwise it wouldn't be taught so extensively in universities.

In simple terms:

1) Tools suck. Even today. Unmaintained tools from 2003 are still used as examples, which is a shame.

2) Having multiple vocabularies is a good thing in principle, but it clashes with the "simple integration" sales pitch. Have you tried to integrate data using different vocabs? (Assuming they use the vocabularies correctly, which is far from realistic in many cases.)

3) One important selling point is the reasoning and inference part. Sadly, it is not available by default in most tools I've used. In fact, there are really very few tools that deal with it, and they are usually a pain in the ass.

4) Despite all the academic hype regarding RDF-backed CMSes, there are very few frameworks that actually work. Drupal 6 and 7 claim to use RDF but honestly, RDF doesn't add value at all.

Just to make things clear, I've developed tools and been an advocate of the Semantic Web for years, but I'm also a critical person. We have to acknowledge that in academia, tools reach a buggy, prototype level that makes them unusable in real-world environments. I know there are a few companies that do great work building semantic web tools, but I'm afraid they are the exception to the rule.

This is exactly it. In my experience, using RDF has been underwhelming and a time sink. Reality doesn't come close to the lofty vision.

What problems does RDF solve that:

1. Exist in the real world and aren't just theoretical research problems?

2. Aren't already solved by more mature and widely accepted tools?

It solves the really interesting issue of data integration. When you need to combine and analyze data from various sources, having things expressed as triples really makes a difference. You can merge two datasets that never knew about each other's model with ease.

How would you solve data integration issues at a large scale (the web)? The web is full of triples: in webpages (RDFa), in APIs (json-ld), and as large data dumps (freebase n-triples dump).

So it does solve a quite realistic issue, one that is currently driving thousands of different people around the world insane: people trying to combine and analyze large quantities of data.

If data providers exposed triples instead of their own data model, it would make life easier for everyone.

Ps: remember that RDF is not a format but a data model. You can express RDF using JSON, XML and even CSV.
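The merge described above can be sketched with plain Python sets of (subject, predicate, object) tuples. This is a toy model, not a real RDF library, and the URIs and vocabulary terms are made up for illustration:

```python
# Toy triple model: each dataset is a set of
# (subject, predicate, object) tuples. Merging two datasets
# that never coordinated on a schema is just set union;
# there are no table layouts to reconcile up front.
people = {
    ("http://ex.org/alice", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://ex.org/alice", "http://ex.org/worksAt", "http://ex.org/acme"),
}
companies = {
    ("http://ex.org/acme", "http://xmlns.com/foaf/0.1/name", "Acme Corp"),
}

merged = people | companies  # the "merge" is literally set union

# Query across both sources: where does Alice work,
# and what is that employer called?
employer = next(o for s, p, o in merged
                if s == "http://ex.org/alice" and p == "http://ex.org/worksAt")
name = next(o for s, p, o in merged
            if s == employer and p.endswith("name"))
print(name)  # Acme Corp
```

The point of the sketch is that the two sources combine without any prior agreement beyond "everything is a triple"; the shared URI is what links them.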

> You can merge two datasets that never knew about each other's model with ease.

Using triples helps you toss all of that data together but that's like saying CSV helps you combine data from different sources because you can keep appending columns. Once you actually want to reason about or link that data you have to have compatible models or invest the time into translating between them — sometimes that's easy but in many cases it requires intellectual reasoning to map between the way different groups view the world. That's the challenge, not the wrapper.

The difference is in the details. You could combine several CSVs and you could probably come up with a bunch of rules that would make data integration easier. But then you'd be reinventing exactly the triples model.

And it's different from combining CSVs because you can't directly combine a CSV that talks about cars with another that talks about houses. You could achieve this if you decided on specific column rules allowing you to express various types of data as CSV, but again you'd be reinventing the triples model, and you can already express triples within CSV: http://jenit.github.io/linked-csv/.

It's true that RDF doesn't magically solve data interoperability, and it obviously requires you to reason about the data you want to integrate before you can make sense of it.

But the important part is that RDF does actually make the process easier, probably easier than any other model out there.

I'm not saying that namespaces aren't useful but simply that the value of having a standard namespace mechanism in a triple is severely undercut by the complexity of RDF and the poor quality of the tools.

Consider: I want to use data from two sources. Do I spend months not working on the hard part of the problem — reconciling the different data models — because I'm learning how to use RDF, configuring, fixing and tuning a bunch of niche tools or do I just pick one of many database options which have much higher performance, are well tested and highly durable, have great documentation and language support, and simply JOIN two tables (classic SQL) or add a namespace in a document (NoSQL or hybrids like Postgres)? Unless you're in one of the few semweb shops, you need to have a HUGE amount of disparate data for that not to be a grossly uneven trade-off, which should not be the case.
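The conventional route this comment describes really is cheap. Here's a minimal sketch using Python's built-in sqlite3 module; the table and column names are invented for illustration:

```python
import sqlite3

# Two "disparate" sources loaded into ordinary tables,
# then reconciled with a classic SQL JOIN.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (id INTEGER, name TEXT, company_id INTEGER)")
con.execute("CREATE TABLE companies (id INTEGER, name TEXT)")
con.execute("INSERT INTO people VALUES (1, 'Alice', 10)")
con.execute("INSERT INTO companies VALUES (10, 'Acme Corp')")

# One JOIN answers "who works where" across both tables.
row = con.execute(
    "SELECT p.name, c.name FROM people p "
    "JOIN companies c ON p.company_id = c.id"
).fetchone()
print(row)  # ('Alice', 'Acme Corp')
```

The trade-off being argued here: this works out of the box with decades-old, well-documented tooling, which is the bar any triple-store workflow has to clear.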

The RDF scene is much more mature nowadays. You could simply get an RDF dataset and start analyzing using a large variety of tools. No need to configure/fix/tune anything. It would be as simple as doing the tables JOIN of your example.

I guess if you're more comfortable with that, then perhaps yes, a CSV dataset would be better for you and it would make no difference for your use case. But for me, I could say the same: that with CSV I'd have to learn how to import it into an SQL database and figure out exactly what to JOIN, because it's not a graph but a relational db made of tables.

So it really depends on what you're used to. But RDF does bring more to the table: things like URIs for identifying things, which can be directly dereferenceable (if they're HTTP URIs). So you know what your columns actually mean; they're not simply strings of letters. And you'd know what's in your cells and exactly what type of data it is.

These are important features that make data integration a lot simpler imho. But if you're used to CSVs and don't care to make your workflow more efficient with the features RDF offers, then I guess you're better off?

Also please check out this interesting answer by Jarven: http://answers.semanticweb.com/questions/19183/advantages-of...

So that's not really solving any specific problem and it's kind of hand waving to say RDF solves data integration problems for data on the web.

Let's break it down a bit.

We learned early on that people put information on the web in different ways. Sometimes it's encapsulated within the UI, sometimes people come up with their own formats for things like contact info and comments.

So how do we make a system where everyone can understand everyone else's data across the web?

Instead of rigidly defined relational models, where you would need a schema for every type of data, we will attempt to create a universal model consisting of subjects, predicates and objects (triples) as the 'schema' for all data. This is essentially RDF.

Like most stuff coming from the W3C, they took a pretty good idea and made it needlessly complex, with a terrible initial XML syntax and confusing specs. The first implementations were horribly slow for storage and querying; SQL people were laughing at this joke. There is no reason why this entire concept and a simple implementation can't fit on the same sized sheet of paper as the brilliant JSON spec.

I've been involved with RDF in a major way before, around 2008. My experiences back then were pretty horrible, but I would be willing to give it another look.

Primarily, I despise the Java and XML ecosystems and am far more interested in document, graph and columnar datastores these days.

That's a huge misconception. RDF is not XML; XML is merely one serialization of it. Please take a look at JSON-LD, another serialization of RDF. Or perhaps Turtle, which is more human-readable. Or HDT, a binary format of RDF that makes it extremely compact.
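To make the point concrete, here is one and the same triple in two serializations, Turtle and JSON-LD. The latter is ordinary JSON, so any JSON tooling can read it; the example URI is made up:

```python
import json

# The same single triple, first as Turtle...
turtle = """
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://ex.org/alice> foaf:name "Alice" .
"""

# ...and then as (expanded-form) JSON-LD, which is plain JSON.
jsonld = """
{
  "@id": "http://ex.org/alice",
  "http://xmlns.com/foaf/0.1/name": "Alice"
}
"""
doc = json.loads(jsonld)  # no RDF library needed to get this far
print(doc["http://xmlns.com/foaf/0.1/name"])  # Alice
```

Same data model underneath, two very different surface syntaxes; neither of them is "what RDF is".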

I've met lots of people who thought RDF was something you did with XML, and along with that came all the disdain they already had for XML. But it's not; RDF is essentially about expressing and modelling data as statements of triples.

And this allows for all kinds of interesting interoperability scenarios such as data integration.

I know all of those things. But it is still the case that the initial serialization format was XML and most of the initial tools were Java.

Indeed, the initial tools were a mess, it's true. But they've grown a lot, and there's a lot to choose from now: from interactive tools to performance improvements with really interesting things like HDT. The whole situation is much better.

I understand your grief with RDF's past, but please consider judging the actual things RDF currently brings to the table.

Otherwise it's like saying that HTML was initially very cumbersome and, simply because of that, you're willing to dismiss all the cool things happening with HTML5 nowadays.

> Otherwise it's like saying that HTML was initially very cumbersome and, simply because of that, you're willing to dismiss all the cool things happening with HTML5 nowadays.

But HTML wasn't cumbersome. It wasn't (isn't) the most elegant format either, but it was simple and fun to use, people were excited and enthusiastic about it and published their pages. IMO, these early adopters were critical for making the critical mass that eventually led to the WWW explosion.

If you draw the parallel: that critical mass has never happened with the Semantic Web / Linked Data. How many Linked Data people are really eating their own dog food and using RDF today? Way too few, and that's what matters.

RDF is unpopular because it is generally misunderstood. This problem arises (primarily) from how RDF has been presented to the market in general.

To understand RDF you first have to understand what data actually is; once you cross that hurdle, two things become obvious:

1. RDF is extremely useful in regards to all issues relating to Data

2. RDF has been poorly promoted.


[1] http://slidesha.re/1epEyZ1 -- Understanding Data

[2] http://bit.ly/1fluti1 -- What is RDF, Really?

[3] http://bit.ly/1cqm7Hs -- RDF Relation (RDF should really stand for: Relations Description Framework) .

I haven't heard of RDF. From other replies in this thread, I'm guessing it's some kind of database schema thing. Or maybe a replacement for JSON / XML. I'm not really sure.

Can you point to a good, comprehensive tutorial that explains the problems RDF is meant to solve, and gives source code of a simple application using best practices?

If the answer is "no," that's the problem right there. If you love RDF, and that tutorial doesn't exist, why don't you write it?
