Publishing JSON-LD for Developers (datalanguage.com)
84 points by anthonyhughes on May 8, 2018 | 63 comments



Try leading with an explanation of why developers should care about JSON-LD. Its value is not obvious to me.

According to the Wikipedia article [0] this is used by the Google Knowledge Graph, but there are no other examples. Are there open source RDF processors? What are they used for?

Previously I've used json-schema for documenting, validating, and testing a REST API. The tooling wasn't very good, but it worked alright. As I understand it, JSON-LD wouldn't help at all with this.

The fact that JSON-LD is a W3C standard while json-schema isn't doesn't constitute an argument against json-schema. Standards regularly show up and die without ever gaining adoption. Maybe instead of a vacuous remark about JSON-LD offering more than json-schema with respect to linked data, you could actually muster some concrete examples which highlight these alleged benefits?

I'm not an expert on this subject, but isn't it worrying that third-party links can change or go down without notice? Ideally if I'm hosting a context I'll want to replicate my whole dependency graph and provide a mirror in order to reduce dependencies on external systems. However, just because I want to host my own mirror doesn't mean I want it to be treated as a different kind of entity. It seems like something such as content-addressable storage [1] could be relevant here.

[0] https://en.wikipedia.org/wiki/JSON-LD

[1] https://en.wikipedia.org/wiki/Content-addressable_storage


The main benefit in my eyes is data integration. If you have multiple systems with data, and you want to merge the data together into a unified whole for some reason, then RDF/JSON-LD makes it easy and mostly automatic.

For example: 1. How do you distinguish IDs from one system from those of another? RDF solves this. 2. How do you distinguish property names across systems? RDF solves this. 3. What if an entity originates in system A, which you don't control, but you want to add a property to it in your system? RDF solves this.

JSON-LD is simply one way to serialize RDF data that makes it look like what you would typically expect from an API that returns JSON. I.e., it's developer-friendly.
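To make that concrete, here's a minimal sketch (the example.com IRIs are invented): an entity minted by system A, carrying a property added by system B. Because every ID and property name is a globally unique IRI, nothing collides when you merge:

  {
    "@id": "https://system-a.example.com/users/42",
    "http://schema.org/name": "Jane Doe",
    "https://system-b.example.com/vocab/accountTier": "gold"
  }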

In addition to the above, RDF is also an extraordinarily flexible/schemaless data model if you store your data as triples rather than as documents. There is definitely a steep learning curve to linked data, and I've had a hard time explaining its value over the years, but I think it's worth the time to learn, understand, and leverage.

All of these benefits exist before you get into things like ontologies, which allow you to add additional value on top of what I outlined above.


Outstanding explanation. I would like to emphasize one point:

> If you have multiple systems with data, and you want to merge the data together into a unified whole for some reason

This isn’t just about “big data” or “data warehousing” or such OLAPy concerns. It’s really about the bread and butter of modern information systems.

“Get some ID from service A, use it to query service B.” Does that sound familiar? That’s data integration for you. That’s what RDF / Linked Data is about.


ActivityPub [1] (used by Mastodon [2]) also uses JSON-LD.

1 - https://www.w3.org/TR/activitypub

2 - https://joinmastodon.org


Here: https://json-ld.org/

JSON-LD is JSON for Linking Data.

When you're passing semantic web data around, or maybe storing it temporarily in a text file (rather than in a triple-store), one of the formats you can use is JSON-LD. So that covers when you use it; the next question is why you'd use it over one of the other interchange or at-rest formats. And generally the answer is that you're using Javascript as your language and you hate XML :)

Anyway, less facetiously here is the blurb from the website!

“Linked Data empowers people that publish and use information on the Web. It is a way to create a network of standards-based, machine-readable data across Web sites. It allows an application to start at one piece of Linked Data, and follow embedded links to other pieces of Linked Data that are hosted on different sites across the Web.”

“JSON-LD is a lightweight Linked Data format. It is easy for humans to read and write. It is based on the already successful JSON format and provides a way to help JSON data interoperate at Web-scale. JSON-LD is an ideal data format for programming environments, REST Web services, and unstructured databases such as CouchDB and MongoDB.”


> why developers should care about JSON-LD. Its value is not obvious to me.

This is at a lower level of abstraction, but... I care simply because it's Google's preferred encoding for web publishers to communicate metadata to them:

https://developers.google.com/search/docs/guides/intro-struc...


For an argument against using JSON-LD, see https://hsivonen.fi/no-json-ns/


This article does not have a valid argument against using JSON-LD.

It may be read as an argument against making namespaces a ubiquitous part of JSON, in the same way as they are ubiquitous in XML (even though not part of the base XML spec). It’s a valid argument against getting to the point where namespaces pop up in every JSON tutorial, and every JSON serializer takes a namespace map, and every user of JSON has to deal with them.

But JSON-LD (as RDF) uses namespaces to solve a particular problem (merging data from disparate sources). Solutions involve costs; that's normal. You might weigh this cost against the expected payoff and decide it's not worth it for you, but the article doesn't do that.


See also this recent Mozilla discussion on the proposed charter for the W3C working group:

https://groups.google.com/forum/m/#!topic/mozilla.dev.platfo...


Love this. Thanks, will share it :)


As someone completely out of the loop on this one but deeply entrenched in modern web development, I have some questions.

1. What benefit does this format give that other schema systems do not?

2. The `@context` just seems like a way of defining variables. As best as I can tell, the use case leads to one only using each of these "variables" once. What's the benefit?

3. What benefit does RDF in any format provide to the average web developer?

Sorry if these questions are asinine. It's just that the article bemoans the fact that this hasn't hit mainstream web development yet but, to my mind, does nothing to encourage its use.


> 3. What benefit does RDF in any format provide to the average web developer?

Probably (almost) none, otherwise in its 20 years of existence it would have been used somewhere. Heck, even Semantic Web scholars actually don't use it for their own personal websites...


It provides a lot of benefits, see my other comment on this thread for a few specific benefits.


I read your post and you raised valid points.

I think one problem, though, is that RDF is a solution searching for a problem to solve. It's clear it was designed from some abstract space instead of evolving from real use cases. And that's why most people have a hard time understanding it, and even more putting it into practice.

For example, even basic RDF concepts such as a resource are hard to grasp. You have an IRI, great, but should it be dereferenceable? If yes, what is expected to be returned: a document in various formats, or more RDF triples? If not (the URN case), what do you do with it? None of these questions are addressed in the spec itself, yet they must be understood before even starting to implement a hello world.

There are interesting projects such as DBpedia. Another problem of RDF is social and pertains to its advocates: trying to impose the technology everywhere without taking into account the audience's needs or the limitations of the technology itself is not going to work great. See the suggestions made during this workshop to improve the technology: https://www.w3.org/2009/12/rdf-ws/ In 9 years (half of its life), nothing was addressed.


I think it solves a real problem. I work at a large company, with many data silos. I've had to hire armies of interns to match and merge data over the years because it wasn't possible to automate the merging of data from different databases.

I agree with your other points though, people have a hard time understanding it, and the community hasn’t really shown the value. My hobby project is an effort to show the value, but it’s not ready to share yet.

And yes, there’s a lot of confusion things (URI vs URL, dereference or not) that the community hasn’t done a good job articulating a happy path. There’s another you didn’t mention, there’s an impedance mismatch between triples and resources. One is property oriented, and the other entity oriented, and they dont’ always play nice together.


Of course I don't know every issue, and surely I forgot some; the difficulty of reification, for example. Anyway, I will be glad to hear about your project when it's published.


I have always thought RDF and the whole Semantic Web were the technical building blocks for a decentralized knowledge graph, à la Google KG. I still strongly believe that you cannot go in any other direction if this is your goal. (And honestly, I am still puzzled by the mess of companies' internal search engines and their lack of investment in a knowledge graph.)


> Sorry if these questions are asinine. It's just that the article bemoans the fact that this hasn't hit mainstream web development yet but, to my mind, does nothing to encourage its use.

They are not; those are good questions. The fact that you've been able to make and deliver successful products for years without knowing or worrying about RDF or JSON-LD should make you suspicious of JSON-LD and its value proposition.

I've been hearing about it for years and yet the development world is pretty happy moving ahead without it. I looked at RDF and JSON-LD some years back when designing a web API, shrugged, and ignored them.

Now sure, it's nice. It's like having a car with chrome-plated rims instead of regular rims. It's better, shinier, prettier, but not enough for me to bother spending time (money) to get it.


1. This is not a schema language.

2. Context is for shortcuts: all properties can be expanded to full names, but those are qualified and long (see the sketch below).

3. Benefit over what? RDF is for describing things in a machine-readable language.
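To illustrate point 2 with a made-up sketch: these two documents carry exactly the same data; the @context just lets you write the short form.

  {
    "@context": {"name": "http://schema.org/name"},
    "@id": "https://example.org/people/alice",
    "name": "Alice"
  }
Without the context, you'd spell the property out in full:

  {
    "@id": "https://example.org/people/alice",
    "http://schema.org/name": "Alice"
  }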


1. The article states that the `@context` is basically a schema and JSON-LD provides a consistent way of formatting JSON data. Sounds a lot like a schema for JSON.

2. Okay, so you either put the long data in the `@context` or in the original data. I suppose it may improve readability but for a data interchange format (which I believe this is), it seems wasteful.

3. Benefit in general. Why would I use RDF at all? What is any of this for? I can describe my data just fine with normal JSON. Why should I design my JSON in this way?


I think it's useful to put this into perspective: if you're designing a format for your backend and frontend, you don't need semantic data. If you're one party providing an API to others, you don't need it either. But if there is an ecosystem of independent parties providing and consuming data, there is a need for a way of describing the meaning of your fields. By meaning I don't mean "field X is ISO time" (that would be JSON Schema) but rather "field X is date of birth".
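For example (a made-up sketch): two producers can use different field names, and as long as their contexts map those names to the same IRI, a consumer knows the fields mean the same thing.

  {"@context": {"dob": "http://schema.org/birthDate"},
   "dob": "1980-01-02"}

  {"@context": {"birth_date": "http://schema.org/birthDate"},
   "birth_date": "1980-01-02"}
Both expand to the same schema.org/birthDate statement.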

As for practical uses: JSON-LD is used by Google to scrape things from the internet and check whether they've got a Place here, a Person, or maybe a Business. ActivityPub, which powers Mastodon, is also JSON-LD.


Blah. Some people just won't be happy until JSON turns into the debacle XML turned into. It's a special kind of brain problem to see a simple straightforward thing and to only consider how one might bolt on arbitrary complexity.


You must have misunderstood something. JSON is a data format. JSON-LD is a use of that data format.

Your complaint is as if you looked at the Sublime Text configuration file, which is in JSON, and said "I liked JSON better when you didn't have to configure anything".

JSON-LD isn't designed to replace JSON, it's designed to replace all the awkward formats for RDF.


I recently wrote about the use of JSON-LD in ConceptNet: http://blog.conceptnet.io/posts/2018/conceptnet-and-json-ld/

Seeing some knee-jerk negativity in other replies, I should emphasize: nobody is under any obligation to use JSON-LD just because you use JSON! It's not an upgrade to JSON, it's an upgrade to RDF.


Please forgive me if this is off topic. I feel EDN deserves some love. Unsure if it's lack of awareness or what. Defrecord with type hints is enough for most projects, especially the data models. The collection types are complete, and there are also comments, namespaces, and tags. The grammar is also short, simple, and easy.

Spec: https://github.com/edn-format/edn/blob/master/README.md

Grammar: https://github.com/antlr/grammars-v4/blob/master/clojure/Clo... (EDN is a subset of Clojure).

Sorry if this is too fanboy, I just need to assume people aren’t aware when I see this big pile of mud that is JSON.


No worries, it is slightly off-topic though :) JSON-LD is a JSON format for the serialization of linked data, not a general data interchange format like JSON itself. Presumably an EDN fanboy could come up with EDN-LD without too much trouble if they so desired. I linked this elsewhere, but here are nine other RDF serialization formats that you could model EDN-LD on, if it ever comes into being: https://help.poolparty.biz/doc/developer-guide/general-infor...


Only slightly related, but does anyone have a good current take on how to get going with using RDF for something of value? I recently got interested through a colleague showing off his pet RDF setup, but the more I read, the more I am just confused by the plethora of dead/abandoned and renamed projects, different standards, and tons of different slimmed-down versions like this one. It really feels like everything on the topic is written both by and for people who have been active in the space continually since 2005, so for someone new it's difficult to see where to start and what to avoid.


RDF is a zombie format. It isn't entirely "dead", but after some wildly ambitious promises, it hasn't lived up to any of them, and it seems only a few "true believers" are still working with it. This being a world of seven billion people, a few "true believers" can still be a few thousand people, but it's just not a technology I can say is vibrant, alive, or worth spending much time with. The core problem is that it solves the easy problems, and not even terribly well (I more-or-less endorse titanix2's comment elsewhere in the thread), while leaving the hard problems mostly untouched. The excitement it generated 10-15 years ago mostly comes from mistaking solving the easy problem for solving the hard problem. Every project I've personally seen use RDF has ripped it back out again at some point. Whether that's the fate in store for the current users named in the article I can't say, but it certainly wouldn't be a bad guess.

If the space intrigues you, I would suggest starting with a graph database instead.


All the questions that the Semantic Web asked itself are reemerging with GraphQL (exposing the DB schema, querying over the web, returning results based on a structure). The JS script kiddies are just scratching the surface today, but they will eventually have exactly the same questions (data delivery, data reuse, data semantics, data exposure). They still worship plain JSON. But they will (have to) evolve. The Semantic Web has solved all those questions already, but in the most obscure manner.

PS: and yes, a graph database is a good start to open your mind.


RDF solves very real problems, see my other comment in this thread for a list of a few of them. A graph database does not solve those problems in any way.

I also find it strange to call it dead when companies like Google and Facebook are using it for metadata on content to power their core products. Again, a graph database does not solve that.


I've been developing a strongly-typed Ruby library for generating JSON-LD that might be interesting or useful: https://github.com/public-law/schema-dot-org


Awesome, I love messing with Ruby and semantic web stuff. Thanks for posting this. Could you explain more? How do you implement the strongly-typed feature? How does it interoperate with the Ruby-RDF library? https://github.com/ruby-rdf


Here's a quick example. Put this in a Rails template:

  <%= WebSite.new(
        name: 'Texas Public Law',
        url:  'https://texas.public.law',
      )%>
...and you'll see this in the HTML:

  <script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "WebSite",
    "name": "Texas Public Law",
    "url": "https://texas.public.law"
  }
  </script>
This DRYs up my code - I just encode the right way of forming JSON-LD and the correct objects and attributes in one place: the library. (That JSON-LD is from my live site, https://texas.public.law)

Details:

First I created a small library which wraps Rails ActiveModel Validations so they can be used in any Ruby object. And I supply a new validator, TypeValidator.

https://github.com/public-law/validated_object

With this building block, I can create classes for each Schema type I care about. And they strictly enforce their attributes:

  class Organization < SchemaType
    attr_accessor :founding_date
    validates :founding_date, type: Date
  end
That's a little redundant, but to a Rails developer, it's immediately clear what's going on. The full Organization code: https://github.com/public-law/schema-dot-org/blob/master/lib...

I'm not using Ruby-RDF at all. I looked into the current Ruby meta-data and strong typing work, and found it all too complex for my simpler use case:

I'm creating websites, and I want to embed correct JSON-LD-formatted schema.org in a rock-solid way: (1) Correct syntax, and (2) correct semantics. I've got the information as Rails models and objects, and I need to convert this to HTML/JSON-LD.

So all of these objects, like Organization above, recursively create proper markup in response to #to_s:

  <%= @an_organization %>
...instantiated from that class above, will result in this in the generated HTML page:

  <script type="application/ld+json">
  {
    "@context": "http://schema.org",
    "@type": "Organization",
etc.

https://github.com/public-law/schema-dot-org#usage


Thanks for the in-depth explanation. Sounds really great. I'll add it to my Ruby RDF toolkit.

What do you think of Spira (https://github.com/ruby-rdf/spira)? Does it complement what you are doing in any way?


That's pretty cool! But yeah, I don't have any RDF data sources. I parse lots of plain text and PDFs, and import some databases directly.

And for generating the website metadata, that info is coming from the Rails models.


I guess I don't understand the purpose. If a high degree of formalization, typing and namespacing is required, why not use full-blown XML with namespaces and schemas?


Everyone is slowly reinventing SOAP :) (Note that RDF can model graphs natively, has native support for types, IDs, and labels, and has a JavaScript parsing library a bit more powerful than JSON.parse().)


Because XML is a much worse match for everyone's data model than JSON is.

Data structures aren't marked-up text. And that one time that XML was supposed to be used for marked-up text (XHTML), it wasn't even good at that either.


A json-based format still has some advantages depending on context. E.g. JavaScript code can manipulate JSON data "natively" whereas XML support is far harder to achieve.


Awesome stuff. I'm working with JSON-LD in both C# and JavaScript and loving the freedom that the linked data concepts (constraints) bring to developing APIs.


I'm working on language resources and graphs, and in my opinion RDF is so flawed you shouldn't even bother with it. I will address the topic at a conference in June (email me -- info in my profile -- for follow-up or to get my articles if you're interested in the topic). JSON-LD is trying to make RDF sexier (more precisely, RDF is trying to pimp itself by using JSON-LD), but as it is just RDF with a new serialization syntax, it shares all of its issues.

Main issues are:

- a literal node (strings which carry the useful information for a human) cannot be the source of an edge, so you end up creating "resource" nodes and pushing content to the leaves of the graph. Access to it is thus done through a lot of indirection.

- no annotations, which means you cannot represent simple graph constructs such as a weighted edge.

- blank nodes, needed to join more than two pieces of data, are anonymous, so they are difficult to reuse. That's sad, because a graph is supposed to link data together. Coupled with issue 1, it means most linguistic resource graphs are actually closer to trees than real graphs.

- reliance on the web infrastructure (IRI, DNS, web servers, ...), which is a big dependency with a lot of gotchas and is not fitted for every use case (self-contained data, for example)

- performance is abysmal. I tried one of the dictionaries built on RDF, and some queries took up to 40 seconds (!) to return no results. Even bloated web dictionaries respond in under 2 seconds.

Also, the second sentence of the article is totally false. I read the history of JSON-LD by one of its main authors [1], and it did not originate in the RDF community. It originated elsewhere, and then RDF people were included in the loop.

[1] http://manu.sporny.org/2014/json-ld-origins-2/


RDF was not designed for arbitrary graphs. It was specifically designed for resource descriptions on the Semantic Web.

> reliance on the web infrastructure (IRI, DNS, web servers, ...)

This is the whole point of RDF. It is what enables Linked Data.

> I read the history of JSON-LD by one of its main authors [1], and it did not originate in the RDF community.

Not sure what you mean exactly, but here's a quote from http://manu.sporny.org/2014/json-ld-origins/:

> JSON-LD started around late 2008 as the work on RDFa 1.0 was wrapping up. We were under pressure [...] to come up with a good way of programming against RDFa data.

JSON-LD definitely was always an RDF serialization, created by people intimately familiar with RDF and the Semantic Web.


> RDF was not designed for arbitrary graphs.

Exactly. That's my exact point. So now my problem is other scholars asking "why don't you use our model X or Y based on RDF" when the only thing I care about is the graph part. And I don't use RDF because of the points I laid out earlier. I should have emphasized that I'm working in a given field (lexicography), and that the RDF issues I face may be irrelevant for some other use cases.

As for the JSON-LD origin, I stand corrected.


Is there a reasonable alternative? I've worked with RDF and LD and can agree with most of your points but would like to see if there is something better on the horizon.


Currently, no. I am concerned with making dictionaries better (especially mobile applications), so I don't have a general-purpose solution, but I propose something for that use case. I basically started from scratch and came up with a very simple type system which aggregates a few values (name, id, type, ...). This type system is not hierarchical, so no inheritance; you could implement it with a few classes in an OO language.

Also, nodes can host complex structure but do not expose it to the graph (edges are made only between nodes), and edges can be n-ary. The system is regular enough to implement simple and complex lexicographic constructs easily, as well as to use the graph in an actual application.

The issue with RDF et al. is that before you even start working on your real data, you have to come up with an ontology, that is, describe the world in categories, which is not easy. Another problem is that a lot of projects want to define standards that work with everything. That's not realistic. So I took an approach where you can learn from your domain by working directly on your data, make something useful with it, and then expand it as you need. KISS.


Ontologies first is idiocy! Whenever you deal with a real data environment, you discover that your use cases drive your data model, not the other way around. For example, the DBpedia ontology is plainly wrong (or outdated, I don't know). If you want the real data model, you really should infer it from the data itself. (Guess what, you CAN do that, because proper RDF auto-describes its data model. But it is a lot of work.) The more I deal with data, the more I realize that the exposed data models do not match the data. And that is a SHAME for whoever calls himself/herself a data engineer.


There are plenty of ontologies already defined that someone could use. I'm currently involved in work to map STEP [1] onto RDF/OWL/SKOS and it seems to be working out well.

I know that some of my colleagues are using JSON-LD as the interchange format for this kind of thing.

[1] https://en.wikipedia.org/wiki/ISO_10303


It totally depends on what you are doing. You need to take into account whether using an ontology will solve any problem at hand. Also, you may have an ontological system not built on RDF. At least in what I'm doing, RDF is creating more problems than it solves. Finally, there is the whole political layer that may go against publishing your data in an open and (web-)accessible format, and then you need another solution anyway.


To counter the argument about opening data to the outside world: Linked Open Data has been studied inside companies, i.e., opening only inside the walled garden. And guess what? It failed. Simply because companies are not organized so that data can "travel" flawlessly. Companies are organized as silos, with boundaries and (implicit) contracts between people, teams, and hierarchies. Of course, everyone knows the information is somewhere here or there, but eventually it is a really hard job to actually get it.


All this sort of sounds like Topic Maps, though they never gained much attention and very few people know of or remember them, and besides Java there are virtually no libraries or tooling for them.


Prolog + Datalog? Prolog should even parse JSON without 3rd-party libs with minimal fuss using just its integrated parser-construction facilities (e.g. the `op` built-in predicate).

Prolog won't come with reasoners for the description logic profiles supported by OWL 2 (EL, RL), but can be used to implement such reasoners if needed.


I don't know what MP means, but if you publish an article I'd be interested in checking it out.


I thought Hacker News had private messages, but no (and MP is French, so my mistake, it should have been PM), so send me a mail at the address found in my profile https://news.ycombinator.com/user?id=titanix2 so I can contact you back.


I'd also be interested in the article if you publish one, but I don't see any email in your profile. I believe only the `about` section is displayed publicly.


Ah yes, the email field is not displayed. Fixed.


The point is that all those problems are strictly worse with plain JSON.


OK, to be fair, I would say that JSON-LD could solve the problem of sending typed data over the wire once we have typed contracts between server and client (for example, when everybody switches from JS+REST to TS+Swagger).

Imho, JSON-LD is a piece of crap (because it is based on JSON, which is flawed by design). So I would definitely promote the usage of RDF/N3 (and its N3.parse()) as the way to go when you need to transmit typed graphs from server to client.

Note: N3 is a dialect of RDF that is super easy to serialize and deserialize, builds graphs of objects in memory (which JSON cannot do, because it can only describe trees), and (almost always) follows a normalized vocabulary for types, IDs, and labels.


Here is a small primer that i wrote on HN about N3: https://news.ycombinator.com/item?id=14476070

(as you can see, nothing very complex :)


Oh, and for your information, N3 has always had a reification syntax. For example:

  { [ x:firstname "Ora" ] dc:wrote [ dc:title "Moby Dick" ] } a n3:falsehood .

PS: yep, there are blank nodes here. PPS: very few parsers take this into account, I agree. But I am pretty sure N3.parse() does.


If I were you I would not deliver that paper. Especially if these are the points you are going to be making.

> JSON-LD is trying to make RDF sexier (more precisely, RDF is trying to pimp itself by using JSON-LD), but as it is just RDF with a new serialization syntax, it shares all of its issues.

JSON-LD is _only_ a serialization format. Look, here are _nine_ others: https://help.poolparty.biz/doc/developer-guide/general-infor... Choose the one that best suits your needs. And as for using language like "sexier" and "pimp", words fail me.

> a literal node (strings which carry the useful information for a human) cannot be the source of an edge, so you end up creating "resource" nodes and pushing content to the leaves of the graph. Access to it is thus done through a lot of indirection.

What you mean here is that in an [S, P, O] triple, where S is the subject, P is the predicate, and O is the object, one can't make the subject a string literal. No, you can't. Big deal. Whatever language and framework you're using should have an abstraction for unpacking string literals, and if it doesn't, you can roll your own. The semantics of RDF are that literal values are objects. This is because in natural language we tend to say things like [User5000, is called, "titanix2"] and not ["titanix2", is the name of, User5000]. Sorry if you don't like it; that's the convention: https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal

> no annotations, which means you cannot represent simple graph constructs such as a weighted edge.

No idea what this means

> blank nodes, needed to join more than two pieces of data, are anonymous, so they are difficult to reuse. That's sad, because a graph is supposed to link data together. Coupled with issue 1, it means most linguistic resource graphs are actually closer to trees than real graphs.

I didn't get how blank nodes worked originally and made myself look like a fool when I questioned how they worked on a mailing list. Such is life. They actually make sense for what they do once you wrap your head around them. Rather than me explaining them to you here, I urge you to consider how they work again: https://www.w3.org/TR/rdf11-concepts/#section-blank-nodes

> reliance on the web infrastructure […]

Well, it's not called the Semantic Web for nothing! Also, it's been pointed out to me that one can use URNs instead of URLs if you don't want the semantics of the HTTP scheme. But you know what, it seems to be working for pretty much everybody else, and how expensive is it to buy a domain these days?

> performance is abysmal. I tried one of the dictionaries built on RDF, and some queries took up to 40 seconds (!) to return no results. Even bloated web dictionaries respond in under 2 seconds.

Without further details I'd have to presume you're talking about SPARQL? Yes, some queries are slow, but if you're dealing with millions or billions of triples then that's inevitable. I've discovered that you can optimize SPARQL once you start digging into it. No, it's not easy, but it's not exactly simple tech either. Without details about the triple-store you were querying against and the query you used, there's nothing more I can help you with. Anyway, your complaint is like having a slow query on some particular relational database and blaming the entirety of SQL, wow.


I'll reply to you without quoting everything to make it more readable. First, the paper is already accepted and is not about RDF per se. I know about the serialization formats, and none of them will fit my needs, because the problem lies in the data model, not a concrete syntax. I don't really understand the point you're trying to make about literals. Mine is that because literals cannot be the origin of an edge, they form sinks. And that's very annoying when building a graph (of strings), because you then need to create proxy nodes for data all the way up.

For annotations, you can read [1]. What I meant, however, is a really simple graph construct: information on edges. In Neo4j lingo, that is called a property [2]. Their absence in RDF makes it difficult to represent situations such as "Paris"---distance:54---"Rennes". I'm clearly not the only one bothered by this [5].

Yes, you can use URNs, but some projects don't support them. Check out the linguistic data cloud [3] for an example with such a requirement. Moreover, renting a domain (you can't buy a domain name) may be cheap, but why put this burden on yourself if you can avoid it? Even Microsoft messes up with them [4] from time to time. I read some papers only two years old and all the endpoints and links were dead. So yes, if you're not doing a web project, there's no need for a web format. Sadly, this is what is done in my field.

Anyway, I'd be curious to see something you've built with RDF, if anything. I failed to find your work besides two papers that aren't related to the subject.

[1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.352...

[2] https://graphaware.com/neo4j/2013/10/24/neo4j-qualifying-rel...

[3] http://linguistic-lod.org/

[4] https://www.theregister.co.uk/2003/11/06/microsoft_forgets_t...

[5] http://dbooth.org/2010/rdf2/


> I know about the serialization formats, and none of them will fit my needs, because the problem lies in the data model, not a concrete syntax.

So create your own serialization format suited to your own needs.

> I don't really understand the point you're trying to make about literals. Mine is that because literals cannot be the origin of an edge, they form sinks. And that's very annoying when building a graph (of strings), because you then need to create proxy nodes for data all the way up.

If by origin you mean what RDF calls the subject, then yes, as I said, the RDF data model does not support subject literals. That is by design. What you could do is create a resource that encapsulates strings in the way you want and use that for both the subject and object. Don't think of them as proxy nodes; think of it as gaining the ability to decorate your string literals in an extensible way.
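Roughly, in JSON-LD, that pattern could look like this (the example.org names are invented): the string gets its own node, carries its content via rdf:value, and can then be the subject of further statements:

  {
    "@context": {
      "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "ex": "http://example.org/vocab/"
    },
    "@id": "http://example.org/strings/bonjour",
    "rdf:value": "bonjour",
    "ex:language": "fr",
    "ex:translates": {"@id": "http://example.org/strings/hello"}
  }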

> For annotations, you can read [1]. What I meant, however, is a really simple graph construct: information on edges. In Neo4j lingo, that is called a property [2]. Their absence in RDF makes it difficult to represent situations such as "Paris"---distance:54---"Rennes".

  PREFIX distance: <http://www.my_cool_domain.fr/prop/distance>

  wd:Q90 distance:354 wd:Q647 .
Because Paris and Rennes are already in Wikidata, why not use them? Create your own namespace, call the distance property "distance" when prefixing it, and reserve numerals in this property to signify geographical distance in km. (354 km between Paris and Rennes, not 54?)

> I'm clearly not the only one bothered by this [5].

Two wrongs don't make a right. Considering that link was calling for this in 2010 suggests to me that you oughtn't hold your breath. Also, them self-describing as an expert is crass.

> Yes, you can use URNs, but some projects don't support them.

Well, that is not the fault of RDF now, is it?

> Moreover, renting a domain (you can't buy a domain name) may be cheap, but why put this burden on yourself if you can avoid it?

If you can't afford €10 per year, or the institution where your project is located can't afford it, then you have bigger problems than imaginary RDF restrictions.

> Anyway, I'd be curious to see something you've built with RDF, if anything. I failed to find your work besides two papers that aren't related to the subject.

I've used Wikidata and DBpedia to build a corpus of philosophers and their philosophical texts without too much of a problem once I wrapped my head around how RDF and SPARQL work.

Paris on Wikidata: https://www.wikidata.org/wiki/Q90

Rennes on Wikidata: https://www.wikidata.org/wiki/Q647


You are changing the problem at hand. It is not how to create a distance relation; it is how to add more data to an existing edge. To do that in RDF you need reification, which is notoriously painful [1]. Moreover, <http://www.my_cool_domain.fr/prop/distance54> is not equal to "54", and 54 cannot be inferred from it because URIs are opaque. In addition, you now have a totally different property from that of every pair of cities that are not 54 km apart. So, RDF has some strengths and good points, but in the current case it creates unnecessary problems to encode a very simple situation.
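To illustrate how heavy that gets, here is roughly what annotating that single edge looks like with standard RDF reification, sketched in JSON-LD (the example.org names are invented); you end up with four extra statements just to hang one number on one edge:

  {
    "@context": {
      "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
      "ex": "http://example.org/"
    },
    "@type": "rdf:Statement",
    "rdf:subject": {"@id": "ex:Paris"},
    "rdf:predicate": {"@id": "ex:roadTo"},
    "rdf:object": {"@id": "ex:Rennes"},
    "ex:distanceKm": 354
  }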

[1] https://www.safaribooksonline.com/library/view/practical-rdf...



