
Jepsen: RethinkDB 2.1.5 - akerl_
https://aphyr.com/posts/329-jepsen-rethinkdb-2-1-5
======
krat0sprakhar
This is awesome! I've been waiting for Aphyr to publish his analysis of
RethinkDB as this is a project I carried out in a recent distributed systems
class that I took. Although our analysis[0] is not as comprehensive (or
correct?) as Aphyr's, we still managed to learn quite a lot.

If you are looking at using Jepsen to do your own analysis, I have a few
takeaways that might be worth sharing -

- Have a passable knowledge of Clojure

- Get a beefy workstation. We used a 160 gig EC2 instance[1] and still
couldn't get Knossos (the linearizability checker) to complete for longer
runs.

- Use the Docker-in-Docker setup[2] to minimize the frustration

- Pick an existing system closest to the system you want to analyze and see
Aphyr's version of the tests for guidance and overall flow. The code is well
commented and you should be able to follow through.

All in all, our major takeaway was that Jepsen is a wonderful (albeit complex)
piece of software that takes time to get up and running. Once you are past
that though, it stands as a very complete testing tool in itself.

Sincere thanks to Aphyr for open-sourcing it and helping us with our project!

[0] - [https://github.com/prakhar1989/ADS](https://github.com/prakhar1989/ADS)

[1] -
[https://twitter.com/prakharsriv9/status/675049396636110848](https://twitter.com/prakharsriv9/status/675049396636110848)

[2] -
[https://hub.docker.com/r/tjake/jepsen/](https://hub.docker.com/r/tjake/jepsen/)

------
GordyMD
Great to finally see a write up by Aphyr on RethinkDB. Ever since reading
these blogs and seeing RethinkDB's long-standing GitHub issue [1], I was keen to
hear how RethinkDB would hold up to the tests once Raft was implemented.

Given the thoroughness of the Jepsen test suite it is something people want to
see these days before being able to choose a database with any confidence.
Hopefully this sets out expectations with high transparency.

Kudos to the team at RethinkDB for funding and assisting Aphyr in his work.

[1]:
[https://github.com/rethinkdb/rethinkdb/issues/1493](https://github.com/rethinkdb/rethinkdb/issues/1493)

~~~
lomnakkus
> Given the thoroughness of the Jepsen test suite it is something people want
> to see these days before being able to choose a database with any
> confidence.

Definitely agreed on the "something people want to see" part, but this is the
thing... Jepsen isn't actually _that_ thorough[1]. I rather think
that this is an indictment of the state of "practical" distributed computing
as it currently stands that a "simple" test for linearizability (nowadays) or
even simple CAS (which I believe Jepsen started out testing) in a partitioned
system would turn up such a huge amount of badly implemented distributed
systems and... frankly dishonest documentation around those systems.
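To make "a simple test for linearizability" concrete, here is a toy sketch for a single register with read, write, and CAS operations. It is a brute-force illustration only, not how Knossos or Jepsen work internally; the names `Op`, `step`, and `is_linearizable` are made up for this example, and it assumes every operation in the history completed successfully.

```python
from itertools import permutations
from typing import Any, List, NamedTuple, Tuple


class Op(NamedTuple):
    kind: str     # "read", "write", or "cas"
    arg: Any      # value read, value written, or (expected, new) for cas
    start: float  # invocation time as seen by the client
    end: float    # completion time as seen by the client


def step(state: Any, op: Op) -> Tuple[bool, Any]:
    """Apply one op to the register; return (was_legal, new_state)."""
    if op.kind == "write":
        return True, op.arg
    if op.kind == "read":
        return state == op.arg, state
    if op.kind == "cas":
        expected, new = op.arg
        return state == expected, (new if state == expected else state)
    raise ValueError(f"unknown op kind: {op.kind}")


def is_linearizable(history: List[Op], initial: Any = None) -> bool:
    """Try every total order of the history (factorial cost)."""
    n = len(history)
    for order in permutations(history):
        # Real-time constraint: an op that finished before another one
        # started may not be placed after it.
        if any(order[j].end < order[i].start
               for i in range(n) for j in range(i + 1, n)):
            continue
        state = initial
        for op in order:
            legal, state = step(state, op)
            if not legal:
                break
        else:
            return True  # found a legal sequential order
    return False
```

For example, `is_linearizable([Op("write", 1, 0, 1), Op("write", 2, 2, 3), Op("read", 1, 4, 5)])` is `False`, because the read returns a value that was already overwritten. Trying every ordering costs n! in the worst case, so a sketch like this is only usable on very short histories.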

It's not that I could necessarily do any better -- except maybe the "honesty"
part, or at least adding lots of qualifiers -- I just find it a bit... sad in
a way that we haven't come farther. Still, it _is_ a young field, so there may
be grounds for optimism for the future. (Thinking of e.g. dependent type
systems coming together with model checking coming together with verified
model->machine translation, chips with verified semantic models, etc. etc.)

[1] As Aphyr explicitly states, it's actually very limited in the state space
that it can explore simply because it's constrained to be "external" to the
system being tested. Model checkers can do much more -- but then you usually
don't get a fully automatic _and_ verified translation to machine code... and
who knows if you've modeled the CPU/IOMMU/etc. correctly anyway?

EDIT: Btw, Aphyr deserves a _HUGE_ amount of praise. AFAIK he's the only
person so far who has stepped up to the plate and "dared" to actually test
this stuff. It's kind of amazing in a way, but I think I'll blame corporate
culture for this sort of thing... "Oh, the documentation says $X therefore
we'll believe $X. At least they can't fire us for that." Not surprisingly I
was known to be hugely skeptical of any claims made of distributed systems,
but I was too much of a coward to embark on "Aphyr's Quest" :). I'm hoping to
coin a phrase with that last bit.

------
phamilton
> Consistent with the documentation, I have never found a linearization
> failure with these settings. If you use hard durability, majority writes,
> and majority reads, single-document ops in RethinkDB appear safe.

That's the juicy bit you all should care about.

~~~
rco8786
Afaik this is the only thing he's tested that has actually achieved that,
despite nearly every system claiming that.

~~~
bri3d
I believe that Zookeeper + Curator were entirely consistent with their
documentation as well as linearizable:
[https://aphyr.com/posts/291-jepsen-zookeeper](https://aphyr.com/posts/291-jepsen-zookeeper).
Now, of course,
Zookeeper is built as a state management / coordination service more than a
full-fledged database system, so it's not quite apples to apples, but it did
offer a similarly strong showing.

I also believe Riak was entirely consistent with its documentation when using
CRDTs:
[https://aphyr.com/posts/285-jepsen-riak](https://aphyr.com/posts/285-jepsen-riak).

I think ElasticSearch is also now consistent with its documentation...
although the documentation basically says "here be dragons."

As an aside, re-reading these makes me miss the old Aphyr work before he worked
on contract. The goofy image memes and 100% irreverent tone made for good
times.

------
overcast
My last two projects have been developed with RethinkDB as the backend, and
personally I love it. I tried playing with Mongo for a bunch of projects, and
I just didn't enjoy its shortcomings. Coming from years of MySQL, and other
relational databases, RethinkDB gives me the best of both worlds. A relational
query language, with the flexibility of schemaless tables. Change feeds make
real time applications super easy, and the admin GUI is awesome. It's
essentially the next evolution of databases, developed by people who know what
they are doing. You'll also definitely want to grab Thinky.io ORM when working
in Node.js.

------
tracker1
I'm consistently impressed with the results that Aphyr publishes. Always very
thorough and as balanced as I've ever seen regarding database testing.

In addition, I really appreciate the approaches that the RethinkDB team has
taken. Their approach to features and growing the product has been impressive
and well calculated. I've followed several issues, and know that they work
well on changes that take multiple steps and iterations to achieve (automatic
failover, for example). I'm hoping to have more opportunity to use their
product in the future (out of my hands at my current position).

~~~
ninjakeyboard
Especially considering he's sponsored by RethinkDB in this article.

------
qaq
I think in 3-5 years consultants will be making a killing converting projects
from one of the hundreds of NoSQL flavors to an RDBMS or one of the few
surviving NoSQL flavors.

~~~
iheartmemcache
I'm already carving out a niche on the east coast with Merb and Rails 2
systems. Every time I think the gold rush is over, another gift falls into my
lap. Keep on keepin' on, novelty kings!

------
eeZi
RethinkDB is great. It's the first NoSQL database I've used so far which I
actually liked. The query language is sane and the developers definitely know
what they're doing. The realtime features are amazing and work very well with
Python + asyncio.

~~~
MrBuddyCasino
How would you compare it to Cassandra?

~~~
habitue
Biggest difference is RethinkDB is a JSON document store, and Cassandra is
column-oriented. Cassandra is eventually-consistent, and RethinkDB is
immediately consistent with all writes going through a primary node for each
table.

RethinkDB also has a query language that's integrated very deeply with the
driver language itself, vs CQL which is a SQL-like text string.

------
drhodes
Last month, there was a Software Engineering Radio episode about RethinkDB:
[http://www.se-radio.net/2015/12/se-radio-episode-243-rethink...](http://www.se-radio.net/2015/12/se-radio-episode-243-rethinkdb-with-slava-akhmechet/)

~~~
OmarIsmail
And the month before that they did an episode with aphyr :)
[http://www.se-radio.net/2015/11/se-radio-episode-241-kyle-ki...](http://www.se-radio.net/2015/11/se-radio-episode-241-kyle-kingsbury-on-consensus-in-distributed-systems/)

------
tveita
Great article as usual, but I found it hard to read due to the use of en-
dashes without spaces.

E.g., this sentence:

 _However, only operations on a single document are atomic–queries which
access multiple keys may read and write inconsistent data._

which I spent a minute trying to parse as:

 _However, only operations on a single document are "atomic-queries", which
access multiple keys may read and write inconsistent data._

~~~
aidenn0
Indeed; every style guide I've read recommends either unspaced em-dashes or
spaced en-dashes.

~~~
chipotle_coyote
I'd bet this is due to a "smart typography" processor that converts runs of
hyphens to em- or en-dashes.

The standard typing convention for decades has been to use "--" to indicate
an em-dash, and that's certainly what I've learned. Many smart converters
(including SmartyPants, the one John Gruber created as a companion to
Markdown) make that conversion. However, there's also the TeX convention of
converting "--" to an _en-dash_ and requiring "---" for an em-dash; some
processors will use that standard by default, like Pandoc and MultiMarkdown.
If you're typing "--" for em-dash but you're unknowingly using a processor
that turns that into en-dashes, well, you get this.

(At some point, Tumblr converted their internal Markdown processor from
_"--" for em-dash_ to _"--" for en-dash,_ and hundreds of my tech blog posts
now look stupid because all the em-dashes have become en-dashes. Thanks,
Tumblr.)
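A minimal sketch of the two conventions, to show how the same typed input ends up with different dashes. The function names are invented for illustration, and these are simplified stand-ins rather than the actual SmartyPants or Pandoc implementations, which handle many more cases.

```python
EM_DASH = "\u2014"
EN_DASH = "\u2013"


def typewriter_convention(text: str) -> str:
    """Older typing convention: "--" means an em-dash."""
    return text.replace("--", EM_DASH)


def tex_convention(text: str) -> str:
    """TeX convention: "---" is an em-dash, "--" is an en-dash."""
    # Replace the longer run first so "---" is not split into "--" + "-".
    return text.replace("---", EM_DASH).replace("--", EN_DASH)


typed = "single document are atomic--queries which access"
print(typewriter_convention(typed))  # unspaced em-dash, as the author intended
print(tex_convention(typed))         # unspaced en-dash, as seen in the article
```

Text written for the first convention and rendered with the second gets an en-dash wherever the author meant an em-dash, which would match the sentence quoted above.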

~~~
doug1001
nice one! i've seen the same problem but didn't know why (always reluctant to
eliminate myself as the root cause). Agreed that re-wiring 20+ year old print-
to-digital-symbol conversions on one side then doing it again on the other
side--all under the hood is a questionable design decision.

------
zenlikethat
Great write-up and delightful to see RethinkDB performing well. I think it is
a truly fantastic database to use. Easy installation, the built-in HTTP admin,
and query language make ad-hoc data querying fun and smooth.

------
juzffoo
I really wanted to try RethinkDB in one of my side projects, which is gonna
be a SaaS product. What are the best practices/support for multi-tenancy
with RethinkDB? The front end will mostly be Rails (I currently use apartment
gem with postgres).

------
seivan
Still waiting for first class Heroku support.

------
zeisss
I was most surprised by the linked research ethics -
[http://jepsen.io/ethics.html](http://jepsen.io/ethics.html)

I think this is a reasonable approach to getting funding but also being
open/transparent about it.

~~~
Perceptes
That was a great read. It's a thorough discussion of Kyle's approach and
thought process regarding conflicts of interest. It's humble and honorable.

