Safety and liveness: Eventual consistency is not safe - pbailis
http://bailis.org/blog/safety-and-liveness-eventual-consistency-is-not-safe/

======
cperciva
_[1] Eventual convergence is likely the strongest convergence property we can
guarantee given unbounded partition durations._

I don't think this is true. Consider the property I call "eventually known
consistency", wherein the system can be asked "are all operations performed
before time T visible everywhere?", with a "yes"/"maybe" response, where "yes"
is guaranteed to eventually be returned after some bounded period of non-
partition.

Eventually known consistency can be used to get AP (just ask for the data), CP
(let T be the current time; spin until ConsistentUpTo(T) returns true; then
perform the read), or CA (in the sense that, as long as a partition does not
occur, the algorithm for CP provides a response within a bounded time), and is
thus strictly stronger than other properties.
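A toy sketch of how those three modes could sit on top of such a primitive (all names here, including `ConsistentUpTo`/`consistent_up_to`, are illustrative, not any real system's API):

```python
import time

class Replica:
    """Toy model of a replica exposing the hypothetical
    ConsistentUpTo primitive described above."""

    def __init__(self, data, partitioned=False):
        self.data = data
        self.partitioned = partitioned

    def consistent_up_to(self, t):
        # "yes"/"maybe" query: a healthy replica can confirm any past
        # time t after a bounded delay; a partitioned one answers
        # "maybe" (False) until the partition heals.
        return (not self.partitioned) and time.time() >= t

def ap_read(replica, key):
    # AP-style read: return whatever this replica has right now.
    return replica.data[key]

def cp_read(replica, key, poll=0.01):
    # CP-style read: pin T to the current time, spin until
    # ConsistentUpTo(T) answers "yes", then read. Under an unbounded
    # partition this loop never exits -- availability is sacrificed.
    t = time.time()
    while not replica.consistent_up_to(t):
        time.sleep(poll)
    return replica.data[key]
```

On a healthy replica `cp_read` returns within a bounded time (the CA case); on a partitioned one it blocks, which is exactly the CP trade.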

~~~
pbailis
I'm not sure I entirely follow your argument. I agree that you can use some
sort of heartbeat protocol or negative acknowledgments to determine whether
writes occurred within a given time window (e.g., I haven't heard from the
cluster, so _maybe_ I missed an update in the time since the last heartbeat).
However, in general, I don't believe it's possible to guarantee non-trivial
convergence for fixed T and unlimited partition durations.

For any given T, if I partition each of your nodes for T+1 seconds, you won't
be able to guarantee convergence--your nodes won't communicate. Am I missing
something?

~~~
Dylan16807
I think you missed this line.

> "yes" is guaranteed to eventually be returned after some bounded period of
> non-partition

~~~
pbailis
If you can bound partition durations, you can definitely make stronger
guarantees.

If you can model your network delays, you can use some modeling like our work
on PBS (Probabilistically Bounded Staleness) to predict staleness:
<http://pbs.cs.berkeley.edu/#demo>

------
saurik
FWIW, the original Dynamo concept supported safety by returning not just the
latest version, but all conflicting versions: the client then had the
opportunity to make a merged version of the data that was newer than either of
the inputs, and store that as a replacement.
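A minimal sketch of that client-side merge, using the shopping-cart set-union example often associated with Dynamo (this is an illustration, not Dynamo's actual API):

```python
def merge_carts(siblings):
    # Semantic, application-level merge of conflicting versions, in the
    # spirit of Dynamo: the union is newer than every input, so the
    # client can write it back as a replacement for all of them.
    merged = set()
    for cart in siblings:
        merged |= cart
    return merged

# A read returns *all* conflicting versions, not just a "latest" one:
siblings = [{"book", "pen"}, {"book", "mug"}]
resolved = merge_carts(siblings)  # the union: {"book", "pen", "mug"}
```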

It should therefore be remembered that while many/most implementations of
"eventual consistency" have these issues, they are not a requirement of the
mechanism, and some implementations realize this and either provide merge
implementations or have plans to do so.

(I am not certain where Cassandra is on this axis, but last I paid attention
they were actively trying to decide whether to modify the client protocol to
match Dynamo, or provide server-assisted merge operators more similar to their
existing server-assisted comparison operators.)

~~~
bonzoesc
> FWIW, the original Dynamo concept supported safety by returning not just the
> latest version, but all conflicting versions: the client then had the
> opportunity to make a merged version of the data that was newer than either
> of the inputs, and store that as a replacement.

Riak can do this as well, tracking siblings and inheritance with vector
clocks and allowing a writer to say that a particular version represents the
merge of two previous versions.
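A minimal sketch of how vector clocks distinguish concurrent siblings from an ancestor/descendant pair (hypothetical helper names, not Riak's API):

```python
def descends(a, b):
    # True if clock a dominates clock b, i.e. version a has seen
    # every event that version b has.
    return all(a.get(node, 0) >= n for node, n in b.items())

def merge_clock(a, b):
    # Pointwise max: the clock a merged version carries so that it
    # descends from both of its inputs.
    return {node: max(a.get(node, 0), b.get(node, 0))
            for node in set(a) | set(b)}

v1 = {"A": 2, "B": 1}
v2 = {"A": 1, "B": 2}
# Neither dominates the other -> concurrent siblings; keep both.
concurrent = not descends(v1, v2) and not descends(v2, v1)
# A writer's merged version carries the pointwise-max clock, so it
# dominates both siblings and can replace them.
merged = merge_clock(v1, v2)
```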

~~~
pbailis
You're right--I definitely simplified the discussion of handling concurrent
writes. Keeping around concurrent versions and using user-specified merge
functions are two ways of deciding which versions to store. Riak indeed
supports keeping concurrent versions or using an automatic last-writer-wins
policy [1].

The question of _which versions_ will be returned depends on the safety
properties of the consistency model. My main point isn't that Dynamo or Riak
don't provide safety properties, it's that "eventual consistency" isn't a
safety property on its own.

[1] <http://wiki.basho.com/Vector-Clocks.html>

------
StefanKarpinski
tl;dr. "Eventual consistency" is a b.s. marketing term that actually has
nearly no meaning.

The easiest way to see just how empty the definition is, is by negating it and
seeing what undesirable property a system would have to have in order to fail
to be eventually consistent. Take the definition from Wikipedia:

"Given a sufficiently long period of time over which no changes are sent, all
updates can be expected to propagate eventually through the system and all the
replicas will be consistent."

Negation:

"No matter how long you wait without sending changes, updates may not
propagate through the system, and disagreement may continue to exist between
replicas indefinitely."

So basically, saying that something is "eventually consistent" just means "we
won't completely ignore you forever". Great.

~~~
pbailis
Eventual consistency is definitely weak, but it's useful: if we couple
eventual convergence with safety properties, then we can describe non-trivial
properties.

The problem, which I alluded to in my first footnote, is that it's hard to
guarantee anything stronger than _eventual_ in the presence of partitions. If
you want to make sure your system behaves "correctly" under arbitrary
partitioning, you necessarily have to admit the trivial cases. What you _also_
want to guarantee is that, in the absence of partitions, the system still
"does the right thing"; this often gets lost in the definition of eventually
consistent systems, which is why you should consider both safety and liveness.

~~~
jf271
If you completely understand the weaknesses of eventual consistency, you will
use it only where it is viable and the risk is justified. There is a reason
relational databases haven't gone away completely.

~~~
pbailis
As a shameless plug, I might add that if you can model your network delays,
you can use some modeling like our work on PBS (Probabilistically Bounded
Staleness) to predict staleness: <http://pbs.cs.berkeley.edu/#demo>

I'd also add that the consistency related to ACID semantics from relational
databases refers to transactional consistency, not replica consistency.
Indeed, distributed RDBMSs often opt for strong (replica) consistency models,
but there is no reason a distributed relational database can't be weakly
(replica) consistent while maintaining ACID semantics on a single machine.
Moreover, if the RDBMS needs to be available in the presence of
partitions, it must be weakly (replica) consistent.

