
“Follow-The-Workload” Beats the Latency-Survivability Tradeoff in CockroachDB - awoods187
https://www.cockroachlabs.com/blog/follow-the-workload/
======
muxator
I always longed for the existence of such an Ops-friendly DB, which, building
on solid distributed-systems concepts, could make the hard things
(consistency, geographical replication, a real possibility of zero downtime)
easy and correct.

Now that I have a good candidate, I find myself wondering whether the
performance would be good enough to ditch traditional DBs (I know: I should
define "workload" before thinking about "performance"...)

I guess it's the curse of tradeoffs: nothing in life is free.

~~~
arrty88
We had it with RethinkDB already!

~~~
a012
Does rethinkdb have ACID? No?

~~~
elvinyung
Whether that actually matters is arguable: _No one_ has performant distributed
(i.e. cross-partition) ACID. (Except maybe Google.)

For the thing that _does_ matter, which is ACID within the same
partition/entity group, I would argue that RethinkDB already does have that.

Mike Stonebraker says that there are no fast distributed transactions [1]. I'm
inclined to believe him, and that's _not_ just an appeal to authority.

[1]
[https://www.youtube.com/watch?v=KRcecxdGxvQ&t=54m22s](https://www.youtube.com/watch?v=KRcecxdGxvQ&t=54m22s)

------
ericand
For reference, this is part 2 of a two part series. Here is part 1:
[https://www.cockroachlabs.com/blog/tunable-controls-for-late...](https://www.cockroachlabs.com/blog/tunable-controls-for-latency-survivability/)

------
ahahuaya
CockroachDB looks very exciting. I'm waiting for its Postgres compatibility
to be far enough along that it works with Elixir/Phoenix. Is that likely to
happen any time soon?

~~~
legojoey17
If you are willing to give it a shot, there is currently a fork of postgrex
from someone in the Elixir community! It handles some of the rough edges of
the incompatibility. You can find it at
[https://hexdocs.pm/postgrex_cdb/readme.html](https://hexdocs.pm/postgrex_cdb/readme.html)
and the source code at
[https://github.com/jumpn/postgrex](https://github.com/jumpn/postgrex).

For any other issues you run into when using it, you may want to see the
discussion about postgrex compatibility at
[https://github.com/cockroachdb/cockroach/issues/5582](https://github.com/cockroachdb/cockroach/issues/5582).
If you run into other problems, please do file an issue :).

I'm also an Elixir junkie, so I would also love to see ORM compatibility here!
I definitely want to dedicate time to it if I can. (Disclaimer: I'm currently
interning at Cockroach Labs.)

~~~
cpursley
I'd love to see a port of
[https://github.com/begriffs/postgrest](https://github.com/begriffs/postgrest)
to Elixir. Cockroachdb + Elixir-based Postgrest implementation would be the
ultimate platform for realtime distributed web scale.

------
sempron64
Can anyone elaborate as to how CockroachDB compares to Scylla? It seems to me
like they're very similar on a surface level.

~~~
coltonv
Sure!

CockroachDB is a fully SQL-compatible, strongly consistent key-value store.
This means you get the scalability and performance of a key-value store
alongside the comfortable SQL query language. You, and all the other
developers on your team who already know SQL, can use pretty much everything
you know and love from SQL (joins, secondary indexes, etc.) with full ACID
compliance. This means that when you read data from the database, you always
know with 100% certainty that you're going to get up-to-date values (no
stale reads).

ScyllaDB is similar in that data is stored in tables with a defined schema,
but it uses a different query language, CQL[1], which is often similar to SQL.
You can't do joins, but you can have secondary indexes. You can store most of
the same data types that you know and love from a standard SQL store.
Interestingly enough, you get to CHOOSE the level of consistency, so you can
make your ScyllaDB strongly consistent or choose from an array of eventually
consistent options[2]. Most people, however, go with one of the eventually
consistent options, which allow Scylla to be insanely performant and
scalable. At the cost of strong consistency, you get extremely high
performance at almost infinite scale. CockroachDB, while performant and
scalable, can't match it here. Scylla stands almost on a tier of its own in
terms of scalability and performance.

So really, the choice is yours based on what you're looking for. I'd choose
CockroachDB for my purposes since I'm not storing Apple levels of data and
consistency is important to my work, but your specifications and needs may be
different.

[1] [http://docs.scylladb.com/getting-started/ddl/](http://docs.scylladb.com/getting-started/ddl/)

[2] [http://docs.scylladb.com/architecture/architecture-fault-tol...](http://docs.scylladb.com/architecture/architecture-fault-tolerance/)

------
adrianratnapala
So is this for helping with read latency, but not write latency?

I know little about CockroachDB, but I vaguely remember that it has consistent
replication. If so, then writes would require some round trips before
consensus is reached, killing any latency benefits from having the master
follow the load (unless you move all your replicas around!).

~~~
a-robinson
It helps with both, but is more dramatic for reads.

Reads go from 1 RTT to 0 RTTs, while writes go from 2 RTTs to 1 RTT. For
example, consider a cluster in which the nodes are 200ms RTT away from each
other.

* A read will take 200ms if the leaseholder isn't in the local datacenter vs single-digit milliseconds if it is local.

* A write will take 400ms if the leaseholder isn't in the local datacenter vs 200 milliseconds if it is local.

The write takes 400ms because it takes 100ms to get from the local datacenter
to leaseholder, 200ms to reach consensus (the leaseholder needs to send a
request to and get a response from one of the followers), and another 100ms
for the response from the leaseholder to get back to the local datacenter
where the request originated.
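The arithmetic above can be sketched as a toy model. The numbers (200ms inter-DC RTT, single-digit local latency) come from the example in the comment, and the symmetric topology is an assumption for illustration, not a description of any real cluster:

```python
# Toy model of the latencies described above, assuming every remote
# datacenter is one inter-DC round trip away from the client.

INTER_DC_RTT_MS = 200  # round trip between datacenters (from the example)
LOCAL_MS = 1           # "single-digit milliseconds" within a datacenter

def read_latency_ms(leaseholder_is_local: bool) -> int:
    # Reads are served by the leaseholder alone (0 extra RTTs),
    # so the only cost is reaching the leaseholder.
    return LOCAL_MS if leaseholder_is_local else INTER_DC_RTT_MS

def write_latency_ms(leaseholder_is_local: bool) -> int:
    # A write pays one full RTT for consensus (leaseholder -> follower
    # and back), plus the cost of reaching the leaseholder itself.
    consensus = INTER_DC_RTT_MS
    to_leaseholder = LOCAL_MS if leaseholder_is_local else INTER_DC_RTT_MS
    return to_leaseholder + consensus

print(read_latency_ms(True), read_latency_ms(False))    # 1 200
print(write_latency_ms(True), write_latency_ms(False))  # 201 400
```

With the leaseholder remote, the write's 400ms is exactly the 100ms + 200ms + 100ms breakdown above, collapsed into two RTTs.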

~~~
humanfromearth
Really looking forward to seeing this in action. Is follow-the-workload
already present in 1.1.3?

Is there a more detailed technical doc on how this is actually achieved,
ideally with some examples? When is it decided that the leaseholder moves
from one node to another? Is it based on some sort of stats?

Edit: found the follow-the-workload doc here:
[https://www.cockroachlabs.com/docs/stable/demo-follow-the-wo...](https://www.cockroachlabs.com/docs/stable/demo-follow-the-workload.html)

~~~
a-robinson
It's actually even in 1.0.0! We just haven't written about it previously.

The doc you found is the best resource. If you want even more detail, you can
check out the original RFC:
[https://github.com/cockroachdb/cockroach/blob/master/docs/RF...](https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20170125_leaseholder_locality.md)

------
jinqueeny
Great post! Do you have benchmarks showing how CRDB compares with TiDB in
terms of performance and latency? And how does CRDB compare with Amazon
Aurora with PostgreSQL compatibility?

------
toolslive
Isn't "follow-the-workload" something that is naturally achieved when Mencius
is used instead of Raft?

~~~
a-robinson
WAN-optimized consensus protocols like Mencius and Egalitarian Paxos are
indeed better for writes, because any node can commit a write in one round
trip, but they don't allow for reads with zero round trips. Every read must
go through the consensus machinery to ensure consistency, whereas in
CockroachDB the leaseholder can serve reads without needing to consult other
nodes.
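The idea behind zero-round-trip reads can be sketched in a few lines. The lease mechanics here (a simple wall-clock expiry, a `read` method) are a deliberate simplification for illustration, not CockroachDB's actual implementation:

```python
# Toy sketch of lease-based local reads: while a node holds the lease,
# it is the only node that can commit writes to the range, so it can
# answer reads from its local store without contacting anyone.

class Leaseholder:
    def __init__(self, lease_expiry: float):
        self.lease_expiry = lease_expiry  # time until which the lease is valid
        self.store = {"k": "v"}

    def read(self, key: str, now: float) -> str:
        if now < self.lease_expiry:
            # 0 RTTs: no other node can have committed a newer value,
            # so the local copy is guaranteed up to date.
            return self.store[key]
        # Lease expired: the node must go back through consensus
        # (i.e. pay round trips) before it may serve reads again.
        raise RuntimeError("lease expired: must renew via consensus")
```

In a Mencius/EPaxos-style protocol there is no single node with this guarantee, which is why every read there has to touch a quorum.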

~~~
toolslive
Why do reads have to go through the consensus machinery? If they don't, read
results will still be consistent; they just might not include the very latest
updates, but you cannot avoid that anyway. Raft has the same problem (they
all do, and it's unavoidable): suppose you read from the leader (master,
whatever you call it); the leader sends you a perfectly fresh, consistent
value; however, your network takes time to deliver it to you, and meanwhile
an update might have happened. It just shows that the key-value store should
provide atomic multi-updates (or at least compare-and-swap).

~~~
cube2222
Without the consensus read you can get inconsistent values.

Suppose you write value a to leader A and then value b to leader A. Node B got
both updates, node C got only the a value because of network timeouts.

Now you read both values from B: you get the current a and the current b.
Then you read from C: you receive the current a and the old b. Inconsistent.

Basically, that's a violation of serializability and linearizability across
the cluster with respect to the client.
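The scenario can be made concrete with a toy two-replica sketch (the replica class and names are illustrative, not CockroachDB or Raft internals; it assumes each replica naively applies whatever updates it happens to receive):

```python
# Toy sketch of the scenario above: two keys, two writes, and a replica C
# that misses the second write because of a "network timeout".

class Replica:
    def __init__(self, name: str):
        self.name = name
        self.store: dict[str, str] = {}

    def apply(self, key: str, value: str) -> None:
        self.store[key] = value

    def read(self, key: str) -> str:
        return self.store[key]

b = Replica("B")
c = Replica("C")

# Initial state on both replicas.
for r in (b, c):
    r.apply("a", "old-a")
    r.apply("b", "old-b")

# Two writes go through the leader; B receives both,
# but the write to "b" is lost on the way to C.
b.apply("a", "new-a"); c.apply("a", "new-a")
b.apply("b", "new-b")  # never reaches C

# A client reading from B and then from C sees "b" go backwards
# in time -- a stale read, i.e. a linearizability violation.
print(b.read("b"))  # new-b
print(c.read("b"))  # old-b
```

The follow-up comment's point is that a log-based replica would refuse to apply updates past a hole in its log, so its store would always be a consistent (if stale) prefix rather than this arbitrary mix.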

~~~
toolslive
No. In your example, the missing update 'b' leaves a hole in the log, so any
updates after it will not have been applied to the store in C yet. The view
in A will be more up to date than the view in C, but both will be consistent.

