Not the *entire* point. Relational databases have schemas that are not well suit...

qaq · on June 21, 2016

OK, then what type of graph would be hard to store in an RDBMS? There are obviously queries that one would want to run on graph data that an RDBMS will not run efficiently (but the queries in this test are actually trivial for an RDBMS). In what way is SQL a barrier compared to learning the query language of some graph database, which is not applicable to anything else you do? "A better question is whether you need to implement your own storage backend or whether something like Postgres would suffice". Couldn't you just use a more established graph database that actually performs well on complex graphs?

nl · on June 21, 2016

Here's a thing that most RDBMS don't do well but graph DBs do: find me nodes connected to at least X nodes of type T where those nodes have attribute A and are also connected to node N. Now duplicate that filter a potentially arbitrary number of times.

The issue here is that the number of joins explodes, and depending on your schema you may be doing lots of self joins.
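To make the join growth concrete, here is a minimal sketch using Python's built-in sqlite3. The two-table schema (nodes, edges), the column names, and the helper names build_query, find_nodes and make_db are all assumptions made up for illustration, not anything from the benchmark. Each repeated filter adds one more join against edges and one more against nodes.

```python
import sqlite3


def build_query(num_filters):
    """Return SQL with one (edges, nodes) join pair per repeated filter."""
    if num_filters < 1:
        raise ValueError("need at least one filter")
    # One join for the "also connected to node N" condition.
    joins = ["JOIN edges a ON a.src = n.id AND a.dst = :anchor"]
    having = []
    for i in range(num_filters):
        # Every extra filter costs two more joins (hypothetical schema).
        joins.append(
            f"JOIN edges e{i} ON e{i}.src = n.id "
            f"JOIN nodes t{i} ON t{i}.id = e{i}.dst "
            f"AND t{i}.type = :type{i} AND t{i}.attr = :attr{i}"
        )
        having.append(f"COUNT(DISTINCT t{i}.id) >= :min{i}")
    return (
        "SELECT n.id FROM nodes n "
        + " ".join(joins)
        + " GROUP BY n.id HAVING "
        + " AND ".join(having)
        + " ORDER BY n.id"
    )


def find_nodes(conn, anchor, filters):
    """filters is a list of (node_type, attr, minimum_count) tuples."""
    params = {"anchor": anchor}
    for i, (node_type, attr, minimum) in enumerate(filters):
        params[f"type{i}"] = node_type
        params[f"attr{i}"] = attr
        params[f"min{i}"] = minimum
    rows = conn.execute(build_query(len(filters)), params).fetchall()
    return [row[0] for row in rows]


def make_db():
    """Tiny in-memory graph: three people, three films, one studio (id 99)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, attr TEXT);
        CREATE TABLE edges (src INTEGER, dst INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [
            (1, "person", None), (2, "person", None), (3, "person", None),
            (10, "film", "drama"), (11, "film", "drama"),
            (12, "film", "comedy"), (99, "studio", None),
        ],
    )
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?)",
        [
            (1, 10), (1, 11), (1, 12), (1, 99),
            (2, 10), (2, 12), (2, 99),
            (3, 10), (3, 11),
        ],
    )
    return conn
```

With this toy data, find_nodes(make_db(), 99, [("film", "drama", 2)]) returns [1], and the generated SQL contains 1 + 2k joins for k filters. Whether a given planner copes well with that is exactly the open question; the sketch only shows how fast the query text grows.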

An additional complication is if your dataset is too big for a single host. In Postgres you shard, but that is manual and has significant cost.

In Dgraph you lose some performance, but (hopefully) if you know something about your queries you can optimize the distribution function to minimize cross-node queries. This is a pretty hard problem to solve in general, but even a partial solution is good.

aschampion · on June 21, 2016

Yours is a great example of what graph DBs should be good at, but many self-styled graph DBs out there at the moment are not. To me, "graph DB" means only two things: index-free edge traversal and scalable built-in graph operations. While these would seem to be necessary and sufficient criteria to distinguish a graph DB, some instead use only the criteria of the GGP and equate graph DBs with schema-free DBs, which should be orthogonal axes of database features.
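For anyone unfamiliar with the term, a rough sketch of what "index-free" means here, in Python. This is an illustration under my own assumptions, not how any particular product is implemented: the Node class and both function names are made up. In the first style a node holds direct references to its neighbors; in the second, every hop goes through a lookup in a sorted edge index, the way a join on an indexed edge table would.

```python
import bisect


class Node:
    """Node that stores direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # list of Node objects, no index involved


def neighbors_index_free(node):
    # One pointer dereference per neighbor, independent of graph size.
    return [n.name for n in node.neighbors]


def neighbors_via_index(sorted_edges, src):
    # sorted_edges is a sorted list of (src, dst) name pairs, standing in
    # for an index on the edge table. Each hop pays a binary search.
    i = bisect.bisect_left(sorted_edges, (src, ""))
    result = []
    while i < len(sorted_edges) and sorted_edges[i][0] == src:
        result.append(sorted_edges[i][1])
        i += 1
    return result
```

Both return the same neighbors; the difference is only that the second pays a logarithmic lookup on every hop, which adds up on deep traversals.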

nl · on June 21, 2016

What is GGP?

I'm not aware of any schema-free databases marketing themselves as graph DBs. I'm sure there are, though.

There is also a distinction between graph databases and graph processing frameworks (GraphX etc.), but I don't think that's what you mean.

rspeer · on June 21, 2016

I'm assuming GGP = "great-grandparent poster" = "what the person three posts up from here said".

jskywalk · on June 21, 2016

OrientDB: http://orientdb.com/docs/last/Schema.html

rspeer · on June 21, 2016

OrientDB would indeed be an example of something that's only a "graph database" because marketing said it should be.

Last time I asked how to import an actual graph into OrientDB, a marketing person of theirs pointed me at a Java API for writing extensions to their code.

mrjn · on June 21, 2016

Dgraph is aimed at minimizing network calls. In fact, the network calls are directly proportional to the complexity of the query, not the number of results. Which means the queries would maintain their latency even as you add more machines to the cluster.

https://github.com/dgraph-io/dgraph/blob/master/present/sydn...

nl · on June 21, 2016

Right, but unless you know the queries in advance there is always the risk of pathological queries that thrash the network.

Naive example: in the movie dataset, if you partition by node type and have actors on one server and films on another, a query like "find me all films with actors whose names start with M who also starred in films with actors whose names start with N" will perform horribly, but if you partition by actor and film name it will be OK.

Titan (and I think most distributed Graph DBs) use pluggable distribution strategies and default to random to try to combat this problem.
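A small Python sketch of why the placement function matters, counting how many edges end up spanning two servers. The four-edge dataset and the three placement functions are invented for illustration; by_film_group in particular is a hypothetical hand-tuned strategy, not something Titan or Dgraph ships.

```python
import zlib

# Toy actor -> film edges (made-up data).
EDGES = [
    ("Marlon", "Film A"),
    ("Nicole", "Film A"),
    ("Meryl", "Film B"),
    ("Nick", "Film B"),
]

NODE_TYPE = {
    "Marlon": "actor", "Nicole": "actor", "Meryl": "actor", "Nick": "actor",
    "Film A": "film", "Film B": "film",
}

# Hypothetical hand-tuned placement: each film lives with its cast.
FILM_GROUP = {
    "Marlon": 0, "Nicole": 0, "Film A": 0,
    "Meryl": 1, "Nick": 1, "Film B": 1,
}


def by_type(node):
    """Actors on server 0, films on server 1 (the naive scheme)."""
    return 0 if NODE_TYPE[node] == "actor" else 1


def by_hash(node, servers=2):
    """Pseudo-random placement from a stable hash of the node name."""
    return zlib.crc32(node.encode("utf-8")) % servers


def by_film_group(node):
    """Co-locate each film with the actors who appear in it."""
    return FILM_GROUP[node]


def cross_server_edges(edges, place):
    """Count edges whose two endpoints land on different servers."""
    return sum(1 for src, dst in edges if place(src) != place(dst))
```

Here cross_server_edges(EDGES, by_type) is 4 (every actor-film hop crosses the network) and cross_server_edges(EDGES, by_film_group) is 0. The hash placement lands somewhere in between, which is roughly the trade a random default makes: never ideal, but never the worst case either.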

qaq · on June 21, 2016

I agree with you that there are queries where graph DBs perform better than an RDBMS (and you provide a good example of such a query), so it would be really cool to see appropriate benchmarks. It would also be nice to see benchmarks against more established graph DBs.

nl · on June 21, 2016

Yes. But I think that this is showing the scalability of the network and query stacks in Dgraph, not other kinds of scalability.

Both are important. As someone who sometimes has very large graphs, I'm more interested in this benchmark than absolute performance: I'm happy to take a performance hit if it means I can scale out.

dominotw · on June 21, 2016

This project, https://github.com/google/cayley, seems to have gone the route of using pluggable backends.

assface · on June 21, 2016

> Not the entire point. Relational databases have schemas that are not well suited to graph structures, and SQL is also a barrier.

The latest research disagrees with your statement:

http://cidrdb.org/cidr2015/Papers/CIDR15_Paper20.pdf

anonetal · on June 21, 2016

I think the following is a better reference for putting unstructured property graphs or RDF into relational schemas.

SQLGraph: An Efficient Relational-Based Property Graph Store; SIGMOD 2015.

Previous Hacker News discussion: https://news.ycombinator.com/item?id=11101013

atombender · on June 21, 2016

It doesn't. That paper proposes a data layer called Grail on top of an RDBMS, which is exactly what I described in my comment. An RDBMS may perform well at storing graph-like structures, but is not ergonomically suited to be used directly by humans for that purpose.