
Call Me Maybe: Percona XtraDB Cluster - stefans
https://aphyr.com/posts/328-call-me-maybe-percona-xtradb-cluster
======
sciurus
For those who want the backstory:

Discussion of Aphyr's MariaDB Galera Cluster post:
[https://news.ycombinator.com/item?id=10171942](https://news.ycombinator.com/item?id=10171942)

Discussion of Percona’s response to it:
[https://news.ycombinator.com/item?id=10238690](https://news.ycombinator.com/item?id=10238690)

------
MichaelGG
Excellent as usual. One question though. When refuting the rebuttal's claim:

>Moreover, if we test the same workload on a simple single instance InnoDB, we
will get the same result.

He simply states that he tested it and it didn't happen, so it must be OK. But
later on says the x=x+y form perhaps only accidentally works as "I suspect
that this only passes because the window of concurrency for the read/write
cycle is very short".

Why wouldn't this be true of the single-node scenario? Certainly things are
faster on a single node, so unless Jepsen is pre-empting and delaying each
thread-instruction combination, a test passing certainly isn't proof of it
actually being OK, right? It's possible that the same workload on a heavily
overloaded instance, might, somehow, exhibit the same behaviour, eh? I
understand that the next sentence goes on to attribute it to a likely problem
with Galera's locks. But the test alone shouldn't be conclusive evidence. Not
trying to be pedantic or discuss what "proof" means, just curious as to whether
I missed something.
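
To make the "window of concurrency" point concrete, here is a toy sketch. It is not Jepsen, InnoDB, or Galera: it models a client-side read followed by a write-back, and every name in it (`run_schedule`, the clients `a` and `b`) is made up for illustration. The same two increments give the right answer when each read/write pair is adjacent and lose an update when the pairs overlap, so a passing run may only mean the overlapping schedule never came up.

```python
def run_schedule(schedule, increments, initial=0):
    """Replay a read/write schedule against one shared counter.

    schedule:   list of (client, op) pairs, where op is "read" or "write".
    increments: dict mapping client name -> amount y that client adds.
    Hypothetical model only; it does not reflect any real database's locking.
    """
    stored = initial   # the value held by the "database"
    local = {}         # each client's private copy from its last read
    for client, op in schedule:
        if op == "read":
            local[client] = stored
        elif op == "write":
            # Write back the (possibly stale) local copy plus the increment.
            stored = local[client] + increments[client]
        else:
            raise ValueError(f"unknown op: {op}")
    return stored


increments = {"a": 1, "b": 2}

# Narrow window: each client's read and write are adjacent, nothing is lost.
serial = [("a", "read"), ("a", "write"), ("b", "read"), ("b", "write")]

# Wide window: b reads before a writes, so a's increment is overwritten.
overlapped = [("a", "read"), ("b", "read"), ("a", "write"), ("b", "write")]

serial_result = run_schedule(serial, increments)          # 0 + 1 + 2
overlapped_result = run_schedule(overlapped, increments)  # a's +1 is lost
print(serial_result, overlapped_result)
```

Both schedules are legal orderings of the same four operations; which one a real test hits depends on timing, which is why a fast single node could pass without being safe.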

------
_Codemonkeyism
Concerning his claims about CAP and that their cluster is neither C nor A,
because of P, while Percona claims it is CA,

some days ago a paper was linked here

"A Critique of the CAP Theorem" -
[http://arxiv.org/abs/1509.05393](http://arxiv.org/abs/1509.05393)

which I found very enlightening about the P in CAP.

~~~
pgaddict
I think this article from 2012 by Brewer himself is a good read too - a bit
shorter and somewhat more comprehensible:
[http://www.infoq.com/articles/cap-twelve-years-later-how-the...](http://www.infoq.com/articles/cap-twelve-years-later-how-the-rules-have-changed)

There's a bunch of other articles listed on the Wikipedia CAP page
([https://en.wikipedia.org/wiki/CAP_theorem](https://en.wikipedia.org/wiki/CAP_theorem));
for example, the post by Abadi is worth reading.

The #1 issue is that the "2 out of 3" formulation is utterly misleading, as it
sounds as if all three options are features of the distributed system. In
reality, people don't control "P" - it's a feature of the network. You may be
very careful about designing and operating your network, but one day a
partition will happen. So "CA" systems are nonsense.

The other problem is that while CAP uses the same basic terms as ACID, the
meaning of those terms is entirely different. This makes discussions about the
CAP vs. ACID stuff pretty much impossible, because each group thinks
Consistency or Availability means the same thing in both worlds. So RDBMS
people will shout "We have availability, just like you!" and NoSQL people will
shout "We have consistency, just like you!" Madness ...

~~~
lomnakkus
Re: ACID vs CAP.

You're right about the C(onsistency) bit, but the A in ACID stands for Atomic,
not Available.

~~~
pgaddict
Good point, but I wasn't implying that the "A" in ACID stands for Availability
(sorry if that was unclear). I was referring to what people dealing with
traditional systems (built on ACID) mean when they speak about "Availability".

For example it may mean a master-slave system with an automatic failover,
which is rather different from what Availability means in CAP.

~~~
lomnakkus
Oh, sorry. The phrasing led me to believe that's what you were saying.

As you were.

------
MrBuddyCasino
And another one bites the dust. Talked to a Couchbase guy last week,
apparently a Jepsen test is in the works - I'm looking forward to it!

------
fasteo
I had high hopes for the Percona guys, given the terrific job they are doing in
the MySQL space. In our experience, Percona Server is working great in
production and Percona Toolkit is a life saver for us.

My very personal takeaway from all these aphyr tests: Distributed systems are
hard. Out of the box, very few work properly and probably none of them will
work as expected because of a faulty deployment. Think twice before deploying
one and consider other ways to scale your system.

In our case, we are going with the memcached approach: client-side sharding
with some kind of replication in the backend (MySQL master-slave, DRBD, etc.).
Far easier to deploy and, what is more important, far easier to recover when
something goes wrong.
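
For anyone unfamiliar with the term, client-side sharding can be sketched roughly as below. This is a minimal illustration, not our production code: the shard names and the `shard_for` helper are hypothetical, and it assumes plain hash-modulo routing, with replication handled separately behind each shard.

```python
import hashlib


def shard_for(key, shards):
    """Pick a backend for `key` by hashing it on the client side.

    `shards` is a non-empty list of backend identifiers (hypothetical names).
    A stable hash is used instead of Python's built-in hash(), which is
    randomized per process for strings.
    """
    if not shards:
        raise ValueError("at least one shard is required")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


# Hypothetical backends; each would be e.g. a master with its own replica.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]

for user_key in ("user:1", "user:2", "user:42"):
    print(user_key, "->", shard_for(user_key, SHARDS))
```

Every client computes the same mapping, so no coordination between nodes is needed. The trade-off with simple modulo routing is that changing the number of shards remaps most keys; consistent hashing is the usual way to soften that.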

