
Jepsen: Dgraph 1.1.1 - aphyr
https://jepsen.io/analyses/dgraph-1.1.1
======
mrjn
Hey folks, author of Dgraph here. If you're interested in the design details
of Dgraph, and have the appetite for a very technical research paper, please
check this out:

[https://dgraph.io/paper](https://dgraph.io/paper)

I'd like to thank Kyle for doing another round of testing. Some of the bugs
we fixed (in 2018 and 2019) were very tricky edge cases -- it's incredible to
see Dgraph running so much more stably now, under all the varied failure
scenarios.

Let me know if you have any questions, I'm around to answer.

~~~
bullen
Did you have to pay Kyle to test your database?

~~~
aphyr
Yep! Testing databases is my full time job. Vendors pay me for the work, and
that means the Jepsen library, test harness for each database, and all the
reports are free for everyone. :)

~~~
bullen
Does the Jepsen test require you to host the databases or can you test a
system completely remotely?

~~~
aphyr
Sure, I could run Jepsen against anything I can talk to over the network, but
a.) it might be painful or impossible to get a fresh, healthy cluster for each
test run (It's... fairly common that databases stop being databases during a
test) and b.) doing fault injection means I need a way to, well, inject
faults. That's why Jepsen tests almost always involve Jepsen controlling the
cluster.
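
For a flavor of what "inject faults" means in practice: Jepsen itself is
Clojure, but mechanically its nemeses boil down to things like SSHing into a
node and cutting it off from a peer with iptables. A rough illustrative
sketch in Go (hostnames made up; assumes passwordless SSH, as a Jepsen
control node has):

    package main

    import (
        "log"
        "os/exec"
    )

    // run executes a command on a remote DB node over SSH.
    func run(host string, args ...string) {
        cmd := exec.Command("ssh", append([]string{host}, args...)...)
        if out, err := cmd.CombinedOutput(); err != nil {
            log.Fatalf("%s: %v\n%s", host, err, out)
        }
    }

    func main() {
        // Partition: make n1 drop all traffic arriving from n2.
        run("n1", "iptables", "-A", "INPUT", "-s", "n2", "-j", "DROP")
        // ... run the test workload while the partition holds ...
        // Heal: delete the rule again.
        run("n1", "iptables", "-D", "INPUT", "-s", "n2", "-j", "DROP")
    }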

~~~
bullen
Ok, what API calls do you need now, and what would you need to remotely
inject faults? Could you make an HTTP server API with a client implementation
for it, so that we could just implement the server API and point your test
cluster at the nodes for automatic testing? I understand this would be
counter to you making your living out of this, but think of it as a "light"
pro-bono version!

~~~
aphyr
It sounds like you're asking for contract work; you're welcome to email me at
aphyr@jepsen.io.

------
simonw
I find Dgraph to be one of the most interesting of the current batch of non-
relational data stores.

I've wanted a robust, easy-to-use graph database for years (ever since reading
about the crazy brilliant graph database stuff that goes on inside Facebook
[https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920/](https://www.facebook.com/notes/facebook-engineering/tao-the-power-of-the-graph/10151525983993920/))
and Neo4J never really cut it for me.

Watching Dgraph mature - and survive two rounds of Jepsen with relatively
decent marks - is intriguing. I don't have a project that needs it yet but
maybe something will come up soon.

~~~
johnymontana
> Neo4J never really cut it for me.

What did you find lacking in Neo4j?

~~~
j-e-k
In my experience, from ~1.5 years ago, performance became an extreme challenge
to overcome if you wanted to service user-facing requests in <100ms (for a
non-trivially sized graph). I really did enjoy Cypher, though, and the tooling
around it was very polished. I'm tempted to try it again now that they have a
new version.

------
wiradikusuma
I've been using Dgraph for over a year (on and off; it's a side project). I
first saw Dgraph on HN.

I thought Dgraph was going to be the "secret sauce" for my app after reading
the list of features (maybe I was mesmerized by the cute mascot). A few months
down the line, though, I sometimes question my decision and whether I should
have used good old PostgreSQL. Let me explain.

1. Coming from SQL and key-value NoSQL, Dgraph was very foreign to me. It's
like an OOP guy learning functional programming. Now I'm quite comfortable
with it, but it took me months to become productive. To give an analogy,
learning Dgraph is like learning Scala instead of Python.

2. Actually, it's worse: the resources are mostly in the forum (micheldiz is
my hero) and on the official website. Until recently, the navigation for the
docs was horrible -- everything was in one big HTML file, which made it
difficult to jump in from a Google search result.

3. Rudimentary tooling for the development phase. When you're working on a
new idea for a product, you experiment A LOT with the schema and data. When
you write wrong data, with most RDBMSs you can use a GUI to right-click and
delete or edit. In Dgraph, you must write a mutation query (assuming you
remember the syntax, as it is a bespoke language). Dgraph's GUI is very
minimal.

4. About mutations... in Dgraph (as far as I know) there's no referential
integrity in the DB sense. For example, you can make an FK to a non-existent
object, or insert something invalid without getting an error back (though
it's not stored, since it's invalid). The "integrity check" lives in your
app.

5. Because of #3 and #4, I find it easier to just drop the whole database and
recreate it along with seed data, every. time.

6. The documentation for the Java client library is very minimal. So there
you go: unfamiliarity with the query language ("GraphQL+-"), with Dgraph
itself, and with the client library.

I still use Dgraph, and it's a good fit for my app, but if you're starting on
a new business idea, maybe don't use anything fancy. My mistake: being a
developer, I mixed research (new tech!) with bootstrapping.

(In case you're wondering, my app is
[http://s.id/axtiva-android-test](http://s.id/axtiva-android-test) -- still
version 0.0.x, but recently I've been releasing weekly.)

~~~
mrjn
Hmm... That's probably not the testimonial I was hoping to get from a Dgraph
user.

Though, I can see that part of your pain comes from the custom query
language, GraphQL+-. We now offer official, spec-compliant GraphQL support as
well, which tackles a lot of the issues you ran into.

1 and 2. GraphQL is becoming very common, so there are plenty of resources.

3. GraphQL has many amazing editors.

4. GraphQL accommodates the lack of referential integrity: you can set
certain fields in an object as non-nullable, which can remove incomplete
objects from the results, and so on -- which is, frankly, the thesis we have
around a distributed graph database, sharded by predicates (not nodes).

5. arrr... sad.

6. GraphQL has many client libraries, and so on.

Hope we can change your opinion by switching you to standard GraphQL --
particularly if you don't need the advanced features provided by plus-minus.

~~~
jcims
TBH I think most of those complaints are true of the graph database ecosystem
in general. I haven't used Dgraph but I echo the same concerns from my
experience.

~~~
mrjn
Sentiments are changing now, particularly with GraphQL. Dgraph bet on GraphQL
early on, and it's really catching on as a replacement for REST.

------
cube2222
Haven't used Dgraph itself, but I've used the storage engine they built for
it -- Badger, an alternative to RocksDB that's better optimized for SSDs --
in two projects.

One was for event saving and retrieval, and it was able to sustain a stable
60k writes/s with 10k simultaneous reads/s. It worked great overall; the two
things that needed nontrivial tuning were 1. RAM usage and 2. write stalls:
if you overwhelm it with writes, it'll stall to keep up with level 0/level 1
compactions.

The other is OctoSQL [1], where we're building exactly-once, event-time-based
stream processing around Badger. So far it's been a breeze, and I don't think
we'd have built it if not for Badger.

Overall, at least the storage engine they're using is awesome, and I can
definitely recommend it!

[1]:[https://github.com/cube2222/octosql](https://github.com/cube2222/octosql)
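
To give a feel for Badger's API: it's an embedded, transactional key-value
store for Go. A minimal sketch (paths and keys made up):

    package main

    import (
        "log"

        badger "github.com/dgraph-io/badger/v2"
    )

    func main() {
        db, err := badger.Open(badger.DefaultOptions("/tmp/badger"))
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Write an event in a read-write transaction.
        err = db.Update(func(txn *badger.Txn) error {
            return txn.Set([]byte("event:42"), []byte(`{"type":"click"}`))
        })
        if err != nil {
            log.Fatal(err)
        }

        // Read it back in a read-only transaction.
        err = db.View(func(txn *badger.Txn) error {
            item, err := txn.Get([]byte("event:42"))
            if err != nil {
                return err
            }
            return item.Value(func(val []byte) error {
                log.Printf("event:42 = %s", val)
                return nil
            })
        })
        if err != nil {
            log.Fatal(err)
        }
    }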

~~~
mrjn
Love it! You should send a PR to add OctoSQL to the list of projects using
Badger (GitHub README).

~~~
cube2222
I certainly will, once we actually release the version based on Badger!

------
enzotar
Can anyone share their production experience with Dgraph?

~~~
chintan
We are doing PoCs around it -- however, the text search is not ready for
prime time.
[https://github.com/dgraph-io/dgraph/issues/5102](https://github.com/dgraph-io/dgraph/issues/5102)

~~~
mrjn
(Author of Dgraph here.) We want to improve full-text search to bring it in
line with Elasticsearch. A lot of people compare Dgraph against Elastic,
because they'd rather have just one solution (Dgraph) instead of two.

It's in our backlog to improve FTS drastically from where it stands today.

~~~
onefuncman
can any of the infrastructure for indexing edge properties be reused for FTS?

------
maximente
one thing i'm not a huge fan of is dgraph's UID model, which is effectively an
auto-incrementing uint across the entire cluster. because it auto-increments
server side, it's non-deterministic; ingesting 10 nodes before 10 others means
the UIDs will differ across runs despite being the same XIDs. there is a way
to use "blank nodes" to link nodes and edges with non-int UIDs, but that only
works per-mutation, not per-commit or per-transaction. there is no way to tell
dgraph what the UID should be.

that means that if you have externally unique IDs that you have infrastructure
around, you are either caching that node's UID externally or doing an XID->UID
lookup in order to create edges.

there is a bulk loader but that's only available in HA mode, and the UID:XID
map it generates is obviously for data you already had in flat files (or
whatever). so it's ok for static data sets, but not ideal for live updating
data.

the gRPC API also has strange undocumented (AFAICT) behavior where even
smallish batches of 100 hit some unspecified gRPC limit, so you need smaller
batches ergo more commits ergo more wasted compute.

~~~
mrjn
> there is no way to tell dgraph what the UID should be.

There is. You can lease UIDs from Zero and do your own assignment. Look at
the /assign endpoint [1].
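
Roughly, leasing looks like this (a sketch in Go against Zero's default HTTP
port 6080; the response field names are an assumption based on the docs, so
double-check them):

    package main

    import (
        "encoding/json"
        "log"
        "net/http"
    )

    func main() {
        // Ask Zero to lease a block of one million UIDs.
        resp, err := http.Get("http://localhost:6080/assign?what=uids&num=1000000")
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        // Assumed response shape: {"startId": "...", "endId": "..."}.
        var lease struct {
            StartId string `json:"startId"`
            EndId   string `json:"endId"`
        }
        if err := json.NewDecoder(resp.Body).Decode(&lease); err != nil {
            log.Fatal(err)
        }

        // Every UID in [startId, endId] is now yours; assign them to your
        // entities however you like.
        log.Printf("leased UIDs %s..%s", lease.StartId, lease.EndId)
    }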

> doing an XID->UID lookup in order to create edges.

Also, you can use upserts to do an XID lookup before creating a new node,
which is practically what other DBs do too.
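
With the Go client (dgo), that pattern is a conditional upsert. A sketch,
assuming a predicate named "xid" with an index that supports eq (the names
here are placeholders, not a fixed schema):

    package main

    import (
        "context"
        "log"

        "github.com/dgraph-io/dgo/v2"
        "github.com/dgraph-io/dgo/v2/protos/api"
        "google.golang.org/grpc"
    )

    func main() {
        conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()
        dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

        // Look the XID up; only create the node if it doesn't exist yet.
        req := &api.Request{
            Query: `query { u as var(func: eq(xid, "user-123")) }`,
            Mutations: []*api.Mutation{{
                Cond:      `@if(eq(len(u), 0))`, // run only when no match
                SetNquads: []byte(`_:new <xid> "user-123" .`),
            }},
            CommitNow: true,
        }
        if _, err := dg.NewTxn().Do(context.Background(), req); err != nil {
            log.Fatal(err)
        }
    }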

> there is a bulk loader but that's only available in HA mode

Don't know what that means. The Bulk Loader is a single process (not
distributed), and can be used to bootstrap a Dgraph cluster. The cluster can
be a 2-node cluster or an HA cluster; that doesn't matter.

> where even smallish batches of 100 hit some unspecified gRPC limit

Never heard of that. gRPC does have a 4 GB per-message limit, but I doubt
you'd hit that with 100 records.

[1]: [https://dgraph.io/docs/deploy/#more-about-dgraph-zero](https://dgraph.io/docs/deploy/#more-about-dgraph-zero)
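
One client-side thing worth checking on the batch-size point: grpc-go
defaults to a 4 MB receive limit on both clients and servers, which is far
easier to hit than the 4 GB cap. If that's the limit being hit, the client
side can be raised when dialing -- a sketch (the 64 MB figure is arbitrary):

    package main

    import (
        "log"

        "google.golang.org/grpc"
    )

    func main() {
        // Raise grpc-go's per-message size limits for this connection.
        conn, err := grpc.Dial("localhost:9080",
            grpc.WithInsecure(),
            grpc.WithDefaultCallOptions(
                grpc.MaxCallRecvMsgSize(64<<20), // default is only 4 MB
                grpc.MaxCallSendMsgSize(64<<20),
            ),
        )
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()
        // conn can now be handed to the Dgraph client as usual.
    }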

~~~
maximente
thanks for the reply!

i hope this comes across as non-critical feedback, but it'd be really, really
nice to put that assign endpoint in some form or fashion in the Mutation
documentation. it is completely absent from there, and i don't recall seeing
it in the tour of dgraph either.

furthermore, it's absent from the golang client. the documentation states:

> It’s possible to interface with Dgraph directly via gRPC or HTTP. However,
> if a client library exists for your language, this will be an easier option.

however, it looks like i'll need an additional HTTP layer to interface with
the /assign endpoint. not a huge deal, but that seems like a big functionality
gap in the golang client - would definitely like to see that added in there.

lastly, the /assign endpoint and the bulk loader can only be run with a
Dgraph Zero instance, which, as far as i can tell, doesn't run by default with
the provided docker image. that's an important detail that's not super duper
obvious from the docs, until you start seeing parameters like dgraph-zero and
then realize that it doesn't come with the quick start docker image.

again, hope this isn't taken personally. thanks for your work on the project!

~~~
mrjn
No worries at all, I like to hear feedback from users, whether it's positive
or negative. Though, I also like to separate the wheat from the chaff, which
is why I have suggestions, follow-up questions, etc.

The assign endpoint is something you can call just once. You could say: give
me a million UIDs, and then use them however you want. You don't need to call
it repeatedly.

Also, it's an endpoint on Zero, not on Alpha. Zeros are not supposed to be
talked to directly in a running cluster. We're now doing work to expose some
of Zero's endpoints via the Alphas, in our GraphQL rewrite of the /admin
endpoint. So, that might make this easier.

I think the consistent theme I'm hearing here is that our documentation isn't
clear -- we aim to improve that. But we could use more critical, logical
feedback / suggestions on our forum -- so please feel free to pitch in there.

------
wpietri
There's a part I don't get here: "To store large datasets Dgraph shards the
set of triples by attribute, breaks attributes into one or more tablets, and
assigns each tablet to a group of nodes." But earlier, it says, "For
convenience, Dgraph can also represent all triples associated with a given
entity as a JSON object mapping attributes to values—where values are other
entities, that entity’s attributes and values are embedded as an object,
recursively."

I know almost nothing about graph databases, so presumably this is just my
ignorance. But if entity-focused retrieval is an important use case, isn't
clustering by attribute going to kill performance? Naively, I'd think that one
would cluster by entity and what an entity is connected to.

~~~
aphyr
The first sentence is describing how Dgraph shards data for storage; the
second sentence discusses how data can be represented in the query API. You're
right that this could lead to broad fanout, if users typically retrieved _all_
attributes for a given UID. It also impacts the performance of joins: graph
traversal across a single attribute is _much_ faster when all those edges are
on the same node, but graph traversal across _different_ attributes might pay
a higher latency cost. This is a classic dilemma in distributed graph storage
--does one shard by attribute? By entity? Each leads to distinct performance
tradeoffs, and Dgraph happened to choose attribute sharding.

Also keep in mind that typical Dgraph workloads request specific attributes
(think "SELECT NAME, AGE") rather than everything ("SELECT *"), which reduces
the impact of fanout. :)

~~~
mrjn
> graph traversal across different attributes might pay a higher latency cost

Those can be done concurrently if at the same query level, so not necessarily
any slower. In other words, the number of network calls required (in a
sufficiently distributed cluster, where each predicate/attribute is on a
different server) is proportional to the number of attributes asked for in
the query, not to the number of results (at any step of the graph traversal).

And that's the big part of the design. By constraining the number of network
calls to very few machines, even for traversals whose intermediate steps lead
to millions of results, Dgraph can deal with high fan-out queries (with lots
of node results) much better.

The alternative would be to shard by nodes (entities) -- in which case, if the
intermediate steps have millions of results, they could end up broadcasting to
the entire cluster to execute a single query. That'd kill latency.

So, the problem is not how many attributes a query is asking for -- that's
generally bounded. The problem is how many nodes you end up with as you
traverse the graph; those could number in the millions.

That's why many graph-layer systems suck at doing anything deeper than 1- or
2-level traversals / joins.

~~~
aphyr
>> graph traversal across different attributes might pay a higher latency cost

> Those can be done concurrently if at the same query level, so not
> necessarily any slower.

An important clarification, yes! I should have made that more explicit. :)

------
anonymousDan
Does Jepsen ever open-source any of its tooling/test harnesses? I teach a
distributed systems class, and it would be great to have an automated test
framework/tools for distributed systems issues.

~~~
aphyr
Yes: the library is prominently linked on the home page, and there are deep
links to the Dgraph test suite code throughout the report. Pretty much all of
my work is OSS, and the public release of the test harness for each report is
explicitly part of the Jepsen ethics policy. :)

[https://github.com/jepsen-io](https://github.com/jepsen-io)
[https://jepsen.io/](https://jepsen.io/)
[https://jepsen.io/ethics](https://jepsen.io/ethics)

~~~
anonymousDan
Awesome! Quick question - how much of the test harness you use for each
report is generic/reusable, and how much is system-specific? I have my
students implement various algorithms/systems in Elixir, e.g. Raft/Paxos,
various broadcast algorithms, etc. It would be nice to have something both
they and I could use to simulate network partitions etc.

~~~
aphyr
Kinda depends on what you're doing, how much of Jepsen you're using, and how
complex the system-specific code is. You can write a minimal Jepsen test in
~100 lines of code, if that's helpful. Jepsen and its main supporting
libraries (Elle and Knossos) clock in at about 19K lines of code; a little
over six years of full-time work.

For simulation testing, I'd suggest looking at Maelstrom, which uses Jepsen to
provide a sort of workbench for writing toy Raft implementations in any
language. You give it a binary which takes messages as JSON on STDIN and emits
messages to STDOUT; it spawns a bunch of "nodes" (local processes) of that
binary, connects them via a simulated network, generates pathological network
behavior, simulates client requests, and verifies the resulting histories with
Jepsen.

[https://github.com/jepsen-io/maelstrom](https://github.com/jepsen-io/maelstrom)
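
To make the protocol concrete: a whole toy node can be tiny. A hedged Go
sketch of just the init and echo exchanges (field names follow the protocol
described above; this isn't taken from Maelstrom's own demos):

    package main

    import (
        "bufio"
        "encoding/json"
        "fmt"
        "os"
    )

    // msg is the Maelstrom envelope: source, destination, free-form body.
    type msg struct {
        Src  string                 `json:"src"`
        Dest string                 `json:"dest"`
        Body map[string]interface{} `json:"body"`
    }

    func main() {
        in := bufio.NewScanner(os.Stdin)
        for in.Scan() {
            var m msg
            if err := json.Unmarshal(in.Bytes(), &m); err != nil {
                continue
            }
            // Replies swap src/dest and reference the request's msg_id.
            reply := msg{Src: m.Dest, Dest: m.Src, Body: map[string]interface{}{
                "in_reply_to": m.Body["msg_id"],
            }}
            switch m.Body["type"] {
            case "init":
                reply.Body["type"] = "init_ok"
            case "echo":
                reply.Body["type"] = "echo_ok"
                reply.Body["echo"] = m.Body["echo"]
            default:
                continue
            }
            out, _ := json.Marshal(reply)
            fmt.Println(string(out))
        }
    }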

------
sontek
This is missing the hand-drawn stuff and memes. This isn't the Jepsen I
remember :P

