SPARQL Protocol for RDF (w3.org)
43 points by mindcrime on June 20, 2017 | 52 comments



SPARQL is what happens when you try to define an API for doing everything, from abstract principles, and end up with an API that does nothing.

It wraps your database in a worse database with a worse query language. It makes simple queries into kilobyte-long GET query strings that get morasses of XML-namespace nonsense as a response, or more likely, a server timeout.

While the W3C was screwing around with SPARQL, everyone else came up with JSON-based REST APIs. They work well. Not even the core W3C people use SPARQL anymore.


> It wraps your database in a worse database with a worse query language.

SPARQL doesn't necessarily wrap a database with a different underlying query language. And it's a much better query language than SQL for RDF data.

> It makes simple queries into kilobyte-long GET query strings

Don't do that. I mean, seriously, the protocol supports queries with GET, but it also supports two different mechanisms for querying over POST.
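
For instance, the direct-POST mechanism looks roughly like this (the other one sends the query URL-encoded in a form body; example.org is a stand-in endpoint):

    POST /sparql HTTP/1.1
    Host: example.org
    Content-Type: application/sparql-query
    Accept: application/sparql-results+json

    SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10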

> that get morasses of XML-namespace nonsense as a response

SPARQL does not mandate XML responses (for graph responses, it doesn't specify a non-XML serialization, but even there it only requires RDF/XML or a semantically equivalent structure). For other responses, there are defined JSON, XML, and CSV/TSV formats.
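
For the record, the JSON results format for a SELECT looks something like this (values are made up):

    {
      "head": { "vars": [ "name", "age" ] },
      "results": {
        "bindings": [
          {
            "name": { "type": "literal", "value": "Alice" },
            "age": {
              "type": "literal",
              "datatype": "http://www.w3.org/2001/XMLSchema#integer",
              "value": "42"
            }
          }
        ]
      }
    }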

> While the W3C was screwing around with SPARQL, everyone else came up with JSON-based REST APIs.

SPARQL's defined API is more REST than most of those.

> They work well.

Not for the purpose SPARQL is aimed at, which is—admittedly—not a particularly common use case, and certainly not what your typical web service needs.


You just keep thinking that until you see the god-awful mess of REST APIs that happens when you need to make an endpoint for every single slightly different query.


Yeah REST... it's for the truly patient and perseverant. Not to mention the pedantic. "REST principles"—heh. Also: JSON :(.

Thank God GraphQL is changing this paradigm and has usable and even powerful clients already. The only bad thing about GraphQL is that it targets JSON, but at least it's typed and schematic.

But I guess we all benefited from REST, in the sense that it killed the whole 900-page XML book scene which was maybe the worst thing ever.


Yeah, that's kind of how APIs have to work. It turns out it's not feasible or desirable to just let arbitrary people make arbitrary queries. SPARQL doesn't make it any more feasible, it just ignores feasibility.


> It turns out it's not feasible or desirable to just let arbitrary people make arbitrary queries.

GraphQL/Relay and Falcor are trying to make it feasible. And I think it's definitely desirable from a consumer point of view, at least.


> It turns out it's not feasible or desirable to just let arbitrary people make arbitrary queries.

It turns out where and why exactly?


Denial of service.


You can just 408-response queries after they execute for a given number of CPU cycles. And then rate-limit per-account using evercookies + browser fingerprinting + IP hashing. You can often disallow anonymous queries altogether, while you're at it; if your data is valuable, people are usually willing to register at least a pseudonymous account to get at it.


I'm confused. Isn't SPARQL a query language for RDF? I've used SPARQL extensively against triple stores. It's quite handy and pretty easy to pick up once you grok how RDF is structured. I wouldn't even compare it with the concept of RESTful APIs--they're attacking two different problems.


What were your triple stores running on? A database, right?

Do your SPARQL queries turn into anything within orders of magnitude of an efficiently indexed SQL query when they get to the database?


Triple-stores can, but don't necessarily, layer on top of SQL databases. Jena has (had?) a triple-store engine that used JDBC to store the triples in a traditional relational database, but it's been deprecated for years in favor of a specialized engine designed specifically for storing RDF.

I can't speak for other triple-stores as Jena is what we use, but I can say that comparing SPARQL queries against Jena TDB to SQL queries on MySQL or Postgres is like comparing apples and kangaroos.


> comparing SPARQL queries against Jena TDB to SQL queries on MySQL or Postgres is like comparing apples and kangaroos.

In the sense that apples rot and kangaroos are amazing animals which can do just about anything?

Because otherwise the comparison seems completely apt.

Jena works for RDF data. But the OP is correct in their broader point that RDF is rarely a good choice and SPARQL is a pretty horrible solution for querying it.

Note that the reply someone is about to write ("But RDF is a generalized self-descriptive data model") means it is intended to solve the exact problem that an RDBMS+SQL solves. And if you add the "standardized fields" thing, it also matches the REST interface+RDBMS+SQL comparison the OP made.


> And if you add the "standardized fields" thing it also matches the REST interface+RDBMS+SQL comparison the OP made.

The standardization in REST is basically "read stuff" and "write stuff". For everything else you're supposed to look up the API docs and write a specialised client.

I can query a SPARQL endpoint for a list of people and their friends, sorted by age - without knowing anything about that endpoint. I can also merge results from different endpoints without worrying whether they use slightly different data formats.
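
Sketching that query with the FOAF vocabulary (assuming the endpoint exposes foaf:knows and foaf:age):

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    SELECT ?person ?friend ?age
    WHERE {
      ?person foaf:knows ?friend .
      ?friend foaf:age ?age .
    }
    ORDER BY ?age

The same query runs unchanged against any endpoint that uses FOAF.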


I've actually built this exact thing (it was for a question-answering over linked data thing).

It worked, sort of. But only after I mapped the many different representations of "age" used by the different endpoints.

I don't remember the specifics, but even in DBPedia alone you have to deal with the properties and the ontology namespace. Then YAGO uses that but brings in other sources and puts them in their own fields. Freebase does (did) its own things.. etc etc.

It was a long, long way from the "you don't need to know anything" utopia you describe.

In summary, there really is no advantage over mapping from a REST endpoint.

Plus, the database endpoints are slow (and even worse, have high variance in performance). I ended up downloading the dumps and hosting them all myself because the servers were so slow and unreliable.


"Standardized fields" here meant columns in a relational database, things you can index and look up quickly. "Birth date" would be such a field that you'd hope would be standardized, for example.

> I can query a SPARQL endpoint for a list of people and their friends, sorted by age - without knowing anything about that endpoint.

No you can't. I am certain that you can't.

To do this, you would need:

- A social networking service that uses SPARQL

- People to actually use that service

- Knowing the schema that would represent things like "friend" and "age"

- A model of permissions that indicates that somehow you're allowed to know the age of people's friends (seriously, how are you allowed to know this, that's creepy)

- A way to express that permission alongside your SPARQL query, which probably means you need to expand the query to include a representation of your identity and permissions

- A way for the SPARQL endpoint to authenticate that you have that permission (you will definitely need to look up API docs for this, as it will involve sending some sort of crypto token out-of-band)

- A container format for the RDF responses you get that can express things like "you don't have permission for that query"


> "Standardized fields" here meant columns in a relational database, things you can index and look up quickly. "Age" would be such a field that you'd hope would be standardized, for example.

Then please point me to that standard. In RDF, that would be FOAF, for example.

> - A social networking service that uses SPARQL

> - People to actually use that service

Yes, for querying an endpoint, I need an endpoint. No way.

My point is that even if I have such an endpoint as a REST API, I can't directly go on to query it, because I'll first have to write a specific client tailored to its API and data model, then think about how I convert that into my own. If I want to match up accounts from Facebook, Twitter and Mom-and-Pop-BBS, I'll have to deal with three different APIs and three different data models. If those sites provided SPARQL endpoints, I'd only have one of each.

> Knowing the schema that would represent things like "friend" and "age"

Defined by FOAF, see above.

> A model of permissions that indicates that somehow you're allowed to know the age of people's friends (seriously, how are you allowed to know this)

That's the responsibility of the endpoint, not mine. I don't see why that would be a hard problem (I figure you'd define permissions on different RDF properties and types) but I admit I don't know much about it.

> A way to express that permission in your SPARQL query

I send my (authenticated) query and if I don't have sufficient permissions, the server will hopefully return "nope". Why would I need to send more?

Yes, some sort of authentication is obviously needed, but there are enough standards to use for that (any sort of HTTP auth method, OAuth, OpenID, etc.).

Note my point wasn't that I can query endpoint X out of the blue and expect to get all the data - but that I don't have to write specific code to deal with endpoint X. Obviously I have to get permission somehow, but ideally, the only endpoint-specific thing I have to do is to fill out a registration form.

Depending on the use-case, you might not even need auth at all if your endpoint is restricted. We also have authless, restricted REST endpoints today that seem to work well: They're called web pages.


> But the OP is correct in their broader point that RDF is rarely a good choice and SPARQL is a pretty horrible solution for querying it.

SPARQL is an excellent choice for querying RDF data (SQL is usable but awkward for querying EAV structured data).
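
To illustrate the awkwardness: in the sketch below, each triple pattern would become a separate self-join on a triples(subject, predicate, object) table in SQL, while SPARQL expresses the join implicitly through the shared ?person variable (ex: is a stand-in vocabulary):

    PREFIX ex: <http://example.org/>

    # Two patterns sharing ?person = one self-join in SQL;
    # every additional attribute adds another.
    SELECT ?name ?email
    WHERE {
      ?person ex:name  ?name .
      ?person ex:email ?email .
    }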

> Note that the reply someone is about to write ("But RDF is a generalized self descriptive data model") means it is intended to solve the exact problem that a RDBMS+SQL solves.

RDF/EAV is more graph than relational structured. It doesn't solve the exact same problem.


> SPARQL is an excellent choice for querying RDF data

It's really not!

There's a reason why Tinkerpop is what most graph databases standardize on, and why things like Neo4j, Dgraph, Cayley, TitanDB etc. (i.e., all the graph DBs people use when they want to build something and not "do the semantic web") don't use SPARQL.


> There's a reason why Tinkerpop is what most graph databases standardize on

Tinkerpop is a Java API, not an independent query language, and RDF data (while it is a way of modeling a graph) is not the model of most graph databases (it's a lower-level model than most graph databases expose, and is about as far from them as it is from the table model of SQL databases.)

An optimal Java API for graph databases with a more typical model is not an optimal query language for RDF data, for pretty much the same reason SQL isn't.

Now, what you describe is probably a sign that RDF isn't the right exposed data model for many use cases (EAV-style representations are often used for deep internals, but there's probably a reason that, outside of RDF, most systems which use the model internally expose something more similar to the conventional relational or graph model to application developers), not that SPARQL isn't the right language for querying RDF data.


Yes, Tinkerpop is a Java API, which is unfortunate. But the Tinkerpop set of technologies isn't Java-specific. There are Python and JavaScript versions of Gremlin (the rough equivalent of SPARQL).

Gremlin is very widely supported across graph databases.

> what you describe is probably a sign that RDF isn't the right exposed data model for many use cases

Well, that's exactly what my claim is, so that's good!

> not that SPARQL isn't the right language for querying RDF data.

Have you ever tried one of the alternatives? Try GraphQL on Dgraph (or Gizmo/Gremlin on Cayley) against a Freebase or DBPedia import. That's exactly the equivalent of SPARQL against RDF, and it's so much better.


> RDF/EAV is more graph than relational structured. It doesn't solve the exact same problem.

Yeah, 100% this. Comparing a graph data model and a relational data model, while obviously possible, isn't really all that fruitful so long as each is being used to solve the problem it's the best fit for.


Two points:

Firstly: RDF is an inadequate expression of most graphs, and SPARQL is a bad way to query graphs. See my comment here on this: https://news.ycombinator.com/item?id=14603090

Secondly: graph storage is something which is very tempting in theory but very hard to get right in practice. I'm not going to say it is never appropriate (that is clearly untrue), but for most production applications it isn't the right choice.

I'd note, for example, that most social applications use an RDBMS to store a single layer of friends (and then perhaps have a second graph DB for batch/stream processing of graph functions).


I think we're thinking of different use cases and therefore talking past one another. My SPARQL usage was not using an rdbms under the covers and was explicitly using RDF data. Querying triple data in SQL would be quite clumsy except in the simplest usages, hence the existence of SPARQL. I have seen the libraries that can run SPARQL against an rdbms (for example the 3 or 4 column indexed table that would represent triples or quads). I would imagine performance would be an issue there--but for me rdf has mainly been around aggregating and exploring data, if I need to scale to massive queries per second I'd use something else.


> Do your SPARQL queries turn into anything within orders of magnitude of an efficiently indexed SQL query when they get to the database?

Probably more efficient, if it's an RDF- (or EAV) oriented datastore underneath, and merely similarly efficient if it's actually layered on top of SQL.


We use SPARQL heavily. YMMV. shrug


Comparing SPARQL to REST doesn't make sense. RDF is a different beast.


The linked article is describing a world where you encode SPARQL in HTTP query parameters and get back RDF+XML. It really is attempting to be an API for querying everything, but querying nothing well.

You can write reasonable REST APIs for RDF. You can even formalize it with JSON-LD if you care to.


> You can write reasonable REST APIs for RDF. You can even formalize it with JSON-LD if you care to.

True. It's unfortunate that many people have such a strong mental association between RDF and RDF+XML. The Semantic Web community largely moved on from emphasizing RDF+XML something like 10+ years ago, in favor of N3 or Turtle, and, more recently, JSON-LD.

But people still hear "Semantic Web" and think "oh, that's that heavyweight XML bloated thing..."


When I hear "Semantic Web", I think:

That's the heavyweight XML bloated thing that has, something like 10+ years ago, moved towards N3 or Turtle, more recently to JSON-LD, and will, in due time, hopefully arrive at Prolog syntax as a very natural and convenient representation that is at the same time amenable to direct processing in a programming language that is eminently well suited to the domain of knowledge representation.

Which, I admit, may raise the question why Prolog syntax was not chosen all along, given that its syntax was invented and even standardized decades ago. By the way: A few members of the consortium were well aware that Prolog syntax would have been an excellent choice for RDF. Only a few though.


> Which, I admit, may raise the question why Prolog syntax was not chosen all along, given that its syntax was invented and even standardized decades ago. By the way: A few members of the consortium were well aware that Prolog syntax would have been an excellent choice for RDF. Only a few though.

Good point.


> You can write reasonable REST APIs for RDF.

The specification for SPARQL over HTTP is a reasonable REST API for RDF. (Though it highlights the need for a safe HTTP method like GET but with a request body, and overloads POST for queries for that purpose in addition to the god-awful query-string GET encoding.)

It's not JSON-for-all-the-things, but JSON is a stupid way of encoding a general-purpose query language. It does support JSON (among other encodings) for responses, though.


I love SPARQL, although I often say it causes problems for developers but solves problems for organizations.

I know I am not in the average developer sphere, but working as a data provider to the wider public (ok, the scientific public), SPARQL is fantastic. First, classic REST+JSON gives you data but does not allow analytics. Second, SPARQL has federation. While it is inefficient CPU-time-wise, it saves scientists months of trying to get a local copy of the DB set up.

SPARQL implementations are reasonably efficient and certainly do not need to be less efficient than SQL stores (even if they often are). However, as SPARQL is a query language, not an implementation, different implementations can behave completely differently: one can have great K/V performance with relatively poor analytics (e.g. MarkLogic or Oracle NoSQL SPARQL), another poor K/V performance but good analytics (e.g. Virtuoso or Oracle Semnet on its RDBMS). You can switch implementations and get completely different performance characteristics without needing to redo your data model etc.

As lead developer for the public-facing REST and SPARQL endpoints of the UniProt consortium, I know which is cheaper to run: SPARQL, by an order of magnitude, even if it uses more hardware.

Even inside the consortium, developers who should have easy access to our SQL databases use our public SPARQL endpoint because it is easier to do so. That includes developers who have been writing SQL for decades and are not afraid of an explain plan. Opening a webpage is just quicker than getting SQL Developer started, and requires a whole lot less tunneling.

Also, a 1.5 TB on-disk database such as sparql.uniprot.org is not going to be fun in all cases. The equivalent data spread over the different production SQL databases is no fun either, and no smaller (in this case actually federated until recently with Oracle links; now some are Postgres).


Nothing makes me happier than seeing the semantic web hit the front page.

Too bad most people don't understand it.


Same here. Honestly though, I'm surprised. I submitted this even though it's kinda old, because I was reading up on federated SPARQL queries, and had a random thought that somebody might find it interesting. I didn't actually expect any significant number of upvotes, or for this to make the front-page.

My feeling is, the Semantic Web vision is still growing and still permeating its way through things. I think it's taken longer than probably anybody expected, but it's an ambitious-as-fuck-all idea too, so that should hardly be surprising.

The thing is, in this fad-driven industry, once something slips off the hype wagon, and isn't "cool" or "sexy" anymore, it's something of a struggle to get people interested. Everybody is all "ooooooh, shiny!" and off to chase the javascript framework du jour.


Count me as one of them, though I'm more conversant about it than some. Let me ask then: what would you have people understand?


Can anybody shed some light on companies that use semantic web technologies? I'm currently taking a semantic web class at university, but I am finding it difficult to see where the use cases are for creating value for a business or startup.


Facebook Open Graph crawlers use RDF in HTML head-meta elements conveying data about authors of web pages.

Similarly, Google Corporate Contacts extracts RDF from web pages (but can also use JSON microdata).

W3C's newly published open social standard is based on RDF (as JSON-LD) I believe.

RDF and SPARQL have lots of uses in libraries and bibliographic apps such as citation managers, as well as in publication of open data, in particular medical and bioinformatics data.

Dublin Core (as RDF) is frequently used for standard document metadata, for example in PDF.


Does Facebook now actually do RDFa-compliant processing for OGP? Back when OGP was introduced they didn't. https://lists.w3.org/Archives/Public/www-archive/2011Apr/006...


We use quite a bit of SPARQL to reason over medical ontologies. It's not exactly "semantic web", but ontological knowledge (such as the Gene Database, UMLS, etc.) is naturally serialized in graphs, which makes it an excellent fit for RDF triplestores, and hence SPARQL. We regularly query millions of concepts with billions of relations in milliseconds. True, RDF under the hood is just an EAV model, but thinking about the identity between triples, graphs, and logic programming makes a lot of sense for us.
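
For a flavor of why it fits: SPARQL 1.1 property paths make transitive ontology queries one-liners. A sketch, with a made-up class IRI:

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # All direct and indirect subclasses of a hypothetical class.
    SELECT ?subclass
    WHERE {
      ?subclass rdfs:subClassOf* <http://example.org/ontology/Neoplasm> .
    }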

https://joelkuiper.eu/semantic-web


Yes, for example at Wallix we created awless.io, a CLI for AWS, that uses RDF (https://github.com/wallix/triplestore) to model cloud data. So far this linked data has helped us, since it is loosely coupled and has graph-like properties.


I believe Schema.org uses (/can use?) linked data that e.g. Google can again use to provide more relevant information about your content: https://developers.google.com/schemas/


One of the benefits of SPARQL is that you can extract a subset of a given dataset as RDF, and inject it into another RDF graph.

Eventually, you can build up your dataset by picking subsets of other datasets, and link your own data graph with all of that.

(The top of the top is to use SPARQL federated query, so you can do all that at query time, as sketched below.)
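
A rough sketch of both at once, extracting a subgraph from a remote endpoint at query time (the endpoint URL is a placeholder):

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    # CONSTRUCT emits RDF you can load straight into your own graph;
    # SERVICE evaluates the inner pattern on the remote endpoint.
    CONSTRUCT { ?person foaf:name ?name }
    WHERE {
      SERVICE <http://example.org/sparql> {
        ?person foaf:name ?name .
      }
    }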

If you consider SPARQL as a way to implement a REST API, it is the most flexible API a data producer can provide: you can run whatever query you like on my dataset; I force nothing.

What I see as a failure in the SPARQL ecosystem is the lack of SPARQL query repositories for all public datasets.

I tried to develop a tool for that (cf datao.net) but it never took off.


Are there queries that SPARQL can perform over a triplestore that cannot be done with SQL over normalized data? Perhaps not.

But data normalization to that end is a moving target, while a bag of subject-predicate-object statements is quite doable. This, I believe, is a uniquely powerful characteristic of linked data / graph query languages and protocols.

To that end, I agree with the comment above that GraphQL is mighty exciting.


+1 Insightful. In fact, there's research toward showing the two are equivalent in the possibility space of what can be represented/queried (https://arxiv.org/abs/1102.1889)

But yes, linked data and graphs are super powerful once the data is triplified. Suddenly you have an abstraction above the contents of your data into the 'shape' of your data.

SPARQL and RDF aren't going away, but they're the academic thing that I and others are trying to make useful. GraphQL is scratching the surface, but it's super exciting that it's scratching at all, imo.

(Disclosure: Founded CayleyGraph, supporting the open source https://github.com/cayleygraph/cayley, which I maintain and mostly wrote)


GraphQL, though, is a bit of a lie nomenclature-wise. As I've experienced it, it's got nothing much to do with graphs, at least not in the sense that SPARQL deals with triples that form a graph. In this department I am really interested in TinkerPop [0].

I would love, some day, to spend some more time with triple stores, RDF and semantic technologies.

[0] http://tinkerpop.apache.org/docs/current/reference/


You might really enjoy Datomic (www.datomic.com). Everything is stored as entity-attribute-value-time and you query with a dialect of Datalog. You can check out www.learndatalogtoday.org to get a flavor.


Datomic's query language, though, isn't Datalog syntax at all.

I've got nothing against Datomic, but I can't help but think learndatalogtoday is outright false advertising, trying to capture "Datalog" as an SEO term for a proprietary graph database which has nothing to do with Datalog/Prolog.

The point of Datalog is that it's a subset of Prolog syntax, implying that engines can reasonably be exchanged for one another. But this is only possible with real Datalog, or SPARQL for that matter.


Prolog and Datalog are really high on my list. Thanks for the reminder!

Datomic, though... wish there was an OSS version or CE or something.


For those who'd like to know a bit more about RDF/SPARQL conceptually: https://joelkuiper.eu/semantic-web

Bonus: my SPARQL Clojure wrapper https://github.com/joelkuiper/yesparql


SPARQL is not an API framework as much as it is a query language. Many powerful graph databases are queried via SPARQL.


As a feature to put on a checklist, not as the way they expect you to use them. Powerful graph databases generally have to come up with their own query languages.



