Not the entire point. Relational databases have schemas that are not well suited to graph structures, and SQL is also a barrier. A better question is whether you need to implement your own storage backend or whether something like Postgres would suffice.
OK, then what type of graph would be hard to store in an RDBMS?
There are obviously queries that one would want to run on graph data that an RDBMS will not run efficiently (but the queries in this test are actually trivial for an RDBMS). In what way is SQL a barrier compared to learning the query language of some graph database, which is not applicable to anything else you do?
"A better question is whether you need to implement your own storage backend or whether something like Postgres would suffice". You could just use a more established Graph Database that actually performs well on complex graphs?
Here's a thing that most RDBMS don't do well but graph DBs do: find me nodes connected to at least X nodes of type T where those nodes have attribute A and are also connected to node N. Now duplicate that filter a potentially arbitrary number of times.
The issue here is that the number of joins explodes, and depending on your schema you may be doing lots of self joins.
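To make the join shape concrete, here is a minimal sketch of one round of that filter in SQL, run through Python's built-in sqlite3. The `nodes`/`edges` schema, the directed edges, the toy data, and the names `find_candidates` and `build_demo` are all assumptions for illustration, not anything from the benchmark.

```python
import sqlite3

def build_demo():
    # Hypothetical two-table schema: typed nodes plus a directed edge list.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT, attr TEXT);
        CREATE TABLE edges (src TEXT, dst TEXT);
    """)
    conn.executemany("INSERT INTO nodes VALUES (?, ?, ?)", [
        ("u1", "user", ""), ("u2", "user", ""),
        ("f1", "film", "drama"), ("f2", "film", "drama"),
        ("f3", "film", "comedy"), ("N", "studio", ""),
    ])
    conn.executemany("INSERT INTO edges VALUES (?, ?)", [
        ("u1", "f1"), ("u1", "f2"), ("u2", "f1"), ("u2", "f3"),
        ("f1", "N"), ("f2", "N"), ("f3", "N"),
    ])
    return conn

def find_candidates(conn, node_type, attr, anchor, min_count):
    # One filter = one self-join on edges plus one join on nodes.
    # Stacking k such filters adds roughly k more self-joins on edges.
    sql = """
        SELECT e1.src AS candidate
        FROM edges e1
        JOIN nodes n  ON n.id = e1.dst
        JOIN edges e2 ON e2.src = e1.dst AND e2.dst = ?
        WHERE n.type = ? AND n.attr = ?
        GROUP BY e1.src
        HAVING COUNT(DISTINCT e1.dst) >= ?
        ORDER BY candidate
    """
    rows = conn.execute(sql, (anchor, node_type, attr, min_count))
    return [row[0] for row in rows]

if __name__ == "__main__":
    conn = build_demo()
    print(find_candidates(conn, "film", "drama", "N", 2))
```

Each extra copy of the filter means joining `edges` to itself again, which is where the planner starts to struggle as the table grows.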
An additional complication is if your dataset is too big for a single host. In Postgres you shard, but that is manual and has significant cost.
In DGraph you lose some performance but (hopefully) if you know something about your queries you can optimize the distribution function to minimize cross node queries. This is a pretty hard problem to generalize, but even a partial solution is good.
Yours is a great example of what graph DBs should be good at, but many self-styled graph DBs out there at the moment are not. Graph DB means to me only two things: index-free edge traversal and scalable built-in graph operations. While these would seem to be necessary and sufficient criteria to distinguish a graph DB, some instead use only the criteria of the GGP and equate graph DBs with schema-free DBs, which should be orthogonal axes of database features.
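For anyone unfamiliar with the term, a rough in-memory sketch of what index-free traversal means: each node holds direct references to its neighbours, so following an edge is a pointer hop rather than a lookup in a global edge index. The `Node` class and `reachable` function are hypothetical names for illustration and say nothing about how any particular product stores its data.

```python
from collections import deque

class Node:
    def __init__(self, name):
        self.name = name
        self.out = []  # direct references to neighbouring Node objects

    def link(self, other):
        self.out.append(other)

def reachable(start, max_hops):
    # Breadth-first walk that only follows the references stored on each node.
    seen = {start.name}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in node.out:
            if nxt.name not in seen:
                seen.add(nxt.name)
                frontier.append((nxt, depth + 1))
    seen.discard(start.name)
    return sorted(seen)

if __name__ == "__main__":
    a, b, c, d = (Node(n) for n in "abcd")
    a.link(b); b.link(c); c.link(d)
    print(reachable(a, 2))
```

The cost of each hop depends on the degree of the node being visited, not on the total size of the graph, which is the property the term is meant to capture.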
OrientDB would indeed be an example of something that's only a "graph database" because marketing said it should be.
Last time I asked how to import an actual graph into OrientDB, a marketing person of theirs pointed me at a Java API for writing extensions to their code.
Dgraph is aimed at minimizing network calls. In fact, the number of network calls is directly proportional to the complexity of the query, not the number of results, which means queries should maintain their latency even as you add more machines to the cluster.
Right, but unless you know the queries in advance there is always the risk of pathological queries that thrash the network.
Naive example: in the movie dataset, if you partition by node type and put actors on one server and films on another, a query like "find me all films with actors whose names start with M who also starred in films with actors whose names start with N" will perform horribly, but if you partition by actor and film name it will be OK.
Titan (and I think most distributed graph DBs) uses pluggable distribution strategies and defaults to random to try to combat this problem.
I agree with you that there are queries where graph DBs perform better than an RDBMS (and you provide a good example of such a query), so it would be really cool to see appropriate benchmarks. It would also be nice to see benchmarks against more established graph DBs.
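A toy way to see the difference is to count how many edges cross a server boundary under each placement. Everything here is assumed for illustration: the made-up actor and film names, the two-server setup, and the `by_type` and `by_name` placement functions. The data is deliberately chosen to make the contrast visible and is not a measurement of any real system.

```python
# Hypothetical actor -> film edges; names are invented for the example.
EDGES = [
    ("actor:Mona", "film:Alpha"),
    ("actor:Mona", "film:Zulu"),
    ("actor:Nils", "film:Zulu"),
    ("actor:Nils", "film:Omega"),
]

def by_type(node):
    # All actors on server 0, all films on server 1.
    return 0 if node.startswith("actor:") else 1

def by_name(node):
    # Names starting with A-M on server 0, N-Z on server 1, regardless of type.
    name = node.split(":", 1)[1]
    return 0 if name[0].upper() < "N" else 1

def cross_server_edges(edges, placement):
    # Every edge whose endpoints live on different servers costs a network hop.
    return sum(1 for src, dst in edges if placement(src) != placement(dst))

if __name__ == "__main__":
    print("by type:", cross_server_edges(EDGES, by_type))
    print("by name:", cross_server_edges(EDGES, by_name))
```

With type-based placement every actor-to-film hop leaves the server; with name-based placement only some do, and how many depends entirely on the data and the query.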
Yes. But I think that this is showing the scalability of the network and query stacks in DGraph, not other kinds of scalability.
Both are important. As someone who sometimes has very large graphs, I'm more interested in this benchmark than absolute performance: I'm happy to take a performance hit if it means I can scale out.
It doesn't. That paper proposes a data layer called Grail on top of an RDBMS, which is exactly what I described in my comment. An RDBMS may perform well at storing graph-like structures, but is not ergonomically suited to be used directly by humans for that purpose.