Honestly, Cassandra's Jepsen results didn't set a high bar: https://aphyr.com/posts/294-...

cbsmith · on Sept 23, 2015

Except that problem has been largely addressed now.

saurik · on Sept 23, 2015

I really really really want to see Aphyr attack the patched version to see if he thinks the fix actually worked.

gegtik · on Sept 23, 2015

DataStax is presenting on the topic at their Summit on Thursday: http://cassandrasummit-datastax.com/agenda/testing-cassandra...

sargun · on Sept 23, 2015

How so? The fundamental flaw was using timestamps.

acconsta · on Sept 23, 2015

Right, I should add that it was two years ago. My point is that the age of a project has nothing to do with the correctness of its Paxos implementation.

cbsmith · on Sept 23, 2015

Jepsen's finding wasn't that there was a bug in Paxos. The problem was in how Cassandra handled conflicts.

acconsta · on Sept 23, 2015

"So you confer with DataStax for a while, and they manage to reproduce and fix the bug: #6029 (Lightweight transactions race render primary key useless), and #5985 (Paxos replay of in progress update is incorrect). You start building patched versions of Cassandra."

"Cassandra lightweight transactions are not even close to correct. Depending on throughput, they may drop anywhere from 1-5% of acknowledged writes–and this doesn’t even require a network partition to demonstrate. It’s just a broken implementation of Paxos. In addition to the deadlock bug, these Jepsen tests revealed #6012 (Cassandra may accept multiple proposals for a single Paxos round) and #6013 (unnecessarily high false negative probabilities)."

That's four bugs independent of the conflict resolution issue.

cbsmith · on Sept 23, 2015

Oh, LWTs are a mess, indeed. Thankfully, you don't normally need them.