
Don't Settle for Eventual Consistency (2014) - alrex021
https://queue.acm.org/detail.cfm?id=2610533
======
taeric
Alternatively, don't require strong consistency everywhere. Instead, make sure
to have it in the places where it makes sense, and always reason about it.

I view this as a generalization of optimistic locking in source control. I
don't envy my past self for having to check out code with a lock just to
signal that I was going to change it.

~~~
sriram_malhar
Are you saying that optimistic locking isn't strongly consistent? If so,
that's not quite correct. When the locking is optimistic, a serialisable
transaction aborts if it is found to conflict with another transaction.

~~~
taeric
I'm saying that requiring strong consistency is like requiring pessimistic
locking in an org. In some ways, it sounds easier and seems to have better
guarantees. In practice, it is more work and slows you down.

~~~
sriram_malhar
Strong consistency has strong guarantees, not just 'seems' to have -- hence
the name.

In practice, strong consistency is less work for the client in many many
(most?) cases. If clients can update the same key concurrently in several
places, then the onus is on the client to somehow merge the differing
versions at some point. There are few good options for doing so correctly and
efficiently. Let's consider the options.

a) Timestamps and last writer wins. Every version is time-stamped, and while
merging, the latest timestamped version wins out. This is error-prone; a
misconfigured clock can cause havoc.

b) Version vectors: these allow you to track the causal history of different
versions, but you still have the problem that if two versions are indeed
concurrent, the client has to have code to merge them automatically --
experience with the Bayou system showed that this isn't easy. Your source-code
example is in this category; recall how Git cannot always automatically merge
conflicting changes ... the developer has to step in and merge manually.

c) Lattices. If you have the good fortune to have your data exhibit lattice
properties (like CRDTs), then automatic merging is easy and elegant. However,
you still don't get compositionality. (A rough sketch of all three options
follows.)
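
To make the three options concrete, here is a toy, self-contained Python
sketch -- my own illustration, not from the article, Bayou, or Git, and all
names are made up -- showing a last-writer-wins merge, a version-vector
comparison that can only report "concurrent", and a grow-only counter whose
lattice merge always converges:

    # a) Last-writer-wins: merge by timestamp. Simple, but a skewed clock on
    #    one replica silently discards the other replica's write.
    def lww_merge(v1, v2):
        # each version is a (timestamp, value) pair
        return v1 if v1[0] >= v2[0] else v2

    # b) Version vectors: detect whether two versions are causally ordered or
    #    concurrent. If concurrent, client code still has to reconcile them.
    def dominates(a, b):
        return all(a.get(k, 0) >= b.get(k, 0) for k in set(a) | set(b))

    def compare(a, b):
        if dominates(a, b):
            return "a is newer or equal"
        if dominates(b, a):
            return "b is newer"
        return "concurrent"              # the caller must merge by hand

    # c) Lattice/CRDT: a grow-only counter. Merge is an element-wise max, so
    #    replicas converge automatically regardless of merge order.
    def gcounter_merge(c1, c2):
        return {n: max(c1.get(n, 0), c2.get(n, 0)) for n in set(c1) | set(c2)}

    print(lww_merge((12, "stale"), (10, "fresh")))          # skewed clock keeps "stale"
    print(compare({"r1": 2, "r2": 1}, {"r1": 1, "r2": 3}))  # "concurrent"
    print(gcounter_merge({"r1": 5, "r2": 2}, {"r1": 3, "r2": 7}))  # {'r1': 5, 'r2': 7}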

For the vast majority of the examples in recent history that have made a case
for eventual consistency, subsequent work has shown how to achieve strong
consistency while maintaining respectable performance, as the subject of this
thread indicates. Google Spanner and CockroachDB are linearizable, the
strongest possible consistency, and are geo-replicated and unburdened by
traditional lock or 2PC issues.

~~~
taeric
Yeah, I'm not saying to completely abandon it. However, folks are well served
by learning that idempotency is your friend. Not always achievable, to be
sure. But even EC2 lets you launch instances with an idempotency key.
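
As a toy illustration of that point (hypothetical names -- this is the
general idempotency-key pattern, not EC2's actual API): a retried "launch"
carrying the same key returns the original result instead of creating a
duplicate.

    import uuid

    _launched = {}          # idempotency key -> instance id (stand-in for a durable store)

    def launch_instance(idempotency_key, image):
        # Replay of a request we already handled: return the same result.
        if idempotency_key in _launched:
            return _launched[idempotency_key]
        instance_id = "i-" + uuid.uuid4().hex[:12]   # pretend we launched something
        _launched[idempotency_key] = instance_id
        return instance_id

    key = str(uuid.uuid4())
    first = launch_instance(key, "img-123")
    retry = launch_instance(key, "img-123")   # timeout, client blindly retries
    assert first == retry                     # no duplicate instance was created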

So, where I said "seems" to have better guarantees: take that more as "these
seem to be required guarantees." In practice, they are not required, and you
need to have plans for them failing at any rate. Rather than building up a
system where fluke failures are painful, make one that is constantly checking
for failures and doing incremental work.

But, above all, my point was to reason about it. Not to take anything as
written in stone.

------
pdexter
See previous comments here:
https://news.ycombinator.com/item?id=7632303

------
ianamartin
I would love to see a Jepsen evaluation of this system. I am not an expert in
this area, but it still strikes me as eventually consistent. Discussions
around data stores need to be bounded by what the data store can guarantee,
not by what optimistic conditions allow for. The paper is pretty light on
details about how the dependency graph works. I think that in theory there is
an interesting idea here that basically packages a dependency graph along
with every message. But consider this failure case:

t0: Alice: "I lost my wedding ring."
t1: Alice: "Nevermind, I found it."

Because of peculiar traffic circumstances at this moment, the nodes closest
to Alice don't get that second message. Instead, the nodes closest to Bob
receive it, and Bob sees the message before Alice does. Bob replies, and
because everything is okay with the nodes close to him, the message gets
passed back to Alice. What happens? The node Alice is reading from is having
issues, so things are re-routed to Bob's cluster. Bob's message has a
timestamp that legitimately puts it in between the two messages from Alice,
because the server clocks are off a little.

Now, whatever was going wrong with the cluster close to Alice has been
repaired, and it receives Bob's message, which carries a node in the
dependency graph that Alice's cluster is completely unaware of because it
hasn't caught up.

Bob's message is telling the cluster, "You can't display this message (c)
unless you have already acknowledged prior messages (a and b)." Alice's
cluster says, "I have message a, but b doesn't exist."
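
Roughly, the rule described above looks like this (my own sketch, not the
paper's actual protocol): a replica buffers a message until every message it
depends on is visible locally, which is exactly why Bob's reply would be
withheld here.

    visible = {}     # message id -> text shown to users
    pending = []     # (message id, dependency ids, text) waiting on dependencies

    def receive(msg_id, deps, text):
        pending.append((msg_id, deps, text))
        drain()

    def drain():
        progressed = True
        while progressed:
            progressed = False
            for item in list(pending):
                msg_id, deps, text = item
                if all(d in visible for d in deps):
                    visible[msg_id] = text       # all dependencies satisfied
                    pending.remove(item)
                    progressed = True

    receive("a", [], "I lost my wedding ring.")
    receive("c", ["a", "b"], "Glad you found it!")   # Bob's reply: buffered, "b" missing
    print(sorted(visible))                           # ['a']
    receive("b", ["a"], "Nevermind, I found it.")    # the delayed write finally arrives
    print(sorted(visible))                           # ['a', 'b', 'c']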

According to the paper, Bob's message will never be delivered to Alice until
the cluster closest to Alice syncs up with some other remote cluster and her
second message is restored. Which, of course, we know can fail to happen with
Cassandra, and the projects mentioned are forks of Cassandra.

I can't prove it formally, but this feels like the problem you run into with
messaging queues: you can guarantee at-least-once delivery or at-most-once
delivery, but not both at the same time. Here it is a different set of
tradeoffs: under this theoretical model, you can deliver in order or not at
all.

Obviously, this is a use case Facebook would care about. But I don't know
that it's generally applicable to very many other use cases.

Unless I'm missing something important, transmitting the graph doesn't really
do anything to change the fact that this is eventual consistency, and it
brings all of the problems that come with that. I guess the difference
between eventual consistency and causal consistency here is that one provides
out-of-date/out-of-order data in an error condition, and the other presumably
provides nothing but an error. YMMV.

But to a larger point, calling a combined system of client libraries and a
forked version of Cassandra some other-category-of-consistency datastore
seems to me to be disingenuous. You're no longer talking about a data store.
You're talking about an ecosystem of libraries, developer practices, and a
datastore that work together to provide some workarounds for the challenge
that the CAP theorem presents. This is, unless I'm badly misreading the
article, granting a lot of leeway.

I also find it unfortunate that the optimized version of this only compared
throughput, without discussing failure cases at all. The CAP theorem is all
and only about what can be guaranteed in failure cases. We need to agree from
now on that if we're going to talk about the CAP theorem, we are talking
about guarantees in failure modes, not optimistic modes. Otherwise, there is
no value in invoking the theorem.

And this strikes me as a particularly bold assumption given almost anyone's
real-world experience: "It is partition-tolerant because all reads are in the
local data center (partitions are assumed to occur only in the wide area, not
in the local data center)."

No, kind sir. It is not partition tolerant if that is your assumption. It has
already failed the P part of CAP if that's an assumption you are relying on.

I'm not trying to shoot the article or the authors down here. Now that the
CAP theorem has been proved to be a conceptual "pick 2" situation, of course
we have to do everything we can to make up for that and do our best, and this
seems like a great effort at that.

But I think that the authors fundamentally misunderstand the point of writing
a paper about a data store and its relationship to CAP theorem. It's not about
optimistic cases, software development patterns, or libraries that sit on top
of the data store.

It's about guarantees of what happens in failure modes. From reading this
paper, all I can tell with certainty is that it's not consistent, and it's not
partition-tolerant. It's highly available. To me that seems like it's a step
backwards from platforms that are, or at least aim to be, CA or AP.

~~~
wuch
That is good point, so far I see one potential explanation, but it presumes
that clients are on equal footing with other peers in the system. Whenever you
send a write and destination discovers that it doesn't have all of writes
operations this write depends on, you send them as well. On the other hand, if
clients are tracking only dependencies without actual data behind them, then I
don't see how to ensure availability - maybe with fixed entry point to the
system (one for each client), but then it is not exactly available.
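
A minimal sketch of that first idea (hypothetical names, assuming clients
really do carry the full bodies of the writes they depend on): the write
ships its dependencies, and the replica backfills whatever it is missing.

    class Replica:
        def __init__(self):
            self.store = {}                     # write id -> value

        def accept(self, write_id, value, deps):
            # deps carries full (id, value) pairs, not just ids
            for dep_id, dep_value in deps:
                self.store.setdefault(dep_id, dep_value)   # backfill missing writes
            self.store[write_id] = value

    client_history = [("a", "I lost my ring."), ("b", "Found it!")]  # what the client has seen
    replica = Replica()                          # this replica has seen neither write
    replica.accept("c", "Glad to hear it!", deps=client_history)
    print(sorted(replica.store))                 # ['a', 'b', 'c'] -- available, but clients hold data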

EDIT:

I did look at the actual paper [0]. They provide strong consistency within
the local data center, and causal consistency with convergent conflict
handling across data centers. They presume that each client resides within a
single data center, so it is essentially a variant of the fixed-entry-point
solution I mentioned earlier.

Additionally, and quite importantly, you lose causal consistency if a client
connects to a different data center.

[0] Don't Settle for Eventual: Scalable Causal Consistency for Wide-Area
Storage with COPS

