
Jepsen: Aerospike 3.99.0.3 - aphyr
https://jepsen.io/analyses/aerospike-3-99-0-3
======
shaklee3
I highly recommend people watch Kyle Kingsbury's talks on YouTube. They're
extremely interesting, especially to see the differences in issues that affect
each database. He's also a great speaker, and makes it entertaining.

------
atombender
Anyone using Aerospike in production? It's not a project you hear about much
these days.

~~~
jamesblonde
MySQL Cluster (NDB) is a better choice for many looking for a near real-time
key-value store. It supports cross-partition transactions, partition-pruned
index scans, partition-aware transactions, multi-master asynchronous
geographic replication between clusters, an Event API, on-disk columns, and
online add node, and it has been around the block a lot longer. We benchmarked
Aerospike vs NDB (a key-value store on top of NDB), and NDB out-performed
Aerospike for most workloads with equivalent hardware:
[http://www.diva-portal.org/smash/get/diva2:849736/FULLTEXT01...](http://www.diva-portal.org/smash/get/diva2:849736/FULLTEXT01.pdf)

~~~
jbergens
Is this tested with Jepsen?

~~~
jamesblonde
I don't think so. I believe it would fail quite a few of Jepsen's tests, as it
is a real-time DB. Typically, you set the transaction deadlock detection
timeout to be a couple of seconds. Jepsen thinks transactions can take minutes
to complete, and it's ok - the messages are just late. Most people think 2PC
is a blocking protocol, but it isn't in NDB. If the Transaction Coordinator
(TC) fails, there is a quick leader election protocol and a new one takes over
from the failed TC. Participant failures cause blocking - but only for a few
seconds (the configured deadlock detection timeout). So, its behaviour is
similar to abortable consensus (aka Paxos), if you automatically retry failed
transactions.
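
A minimal sketch of what "automatically retry failed transactions" could look
like on the client side. This is not NDB's API: `TransactionAborted`,
`run_with_retry` and `make_flaky_txn` are hypothetical names, and the abort is
simulated rather than caused by a real deadlock detection timeout.

```python
class TransactionAborted(Exception):
    """Hypothetical error: the transaction was aborted, e.g. after a timeout."""


def run_with_retry(txn, max_attempts=5):
    """Run txn(), retrying when it aborts. Returns (result, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn(), attempt
        except TransactionAborted:
            # Give up only once the retry budget is exhausted.
            if attempt == max_attempts:
                raise


def make_flaky_txn(failures):
    """Build a simulated transaction that aborts `failures` times, then commits."""
    state = {"left": failures}

    def txn():
        if state["left"] > 0:
            state["left"] -= 1
            raise TransactionAborted("participant timed out")
        return "committed"

    return txn
```

With a bounded retry count the caller still sees an error if aborts keep
happening, so the short blocking window turns into a visible failure instead of
an indefinite wait.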

~~~
aphyr
> Jepsen thinks transactions can take minutes to complete, and it's ok

Jepsen takes great pains to make no assumptions about the time it takes for
transactions to execute. We test systems with millisecond-level timeouts, or
multi-minute timeouts, and both work fine.

~~~
jamesblonde
Ok, my mistake. I saw you tried to install NDB several years ago, without any
luck. Nowadays, there's mysql cluster mgr, severalnines installer, and we
wrote some chef code to automate it. So it's pretty straightforward.

------
bbulkow
Aerospike Founder here. Does anyone care that, after working with Kyle and
generating new versions throughout the development process, Jepsen found no
real consistency problems?

i.e., “In hundreds of tests of SC mode through network partitions, 3.99.1.5 and
higher versions have not shown any sign of nonlinearizable histories, lost
increments to counters, or lost updates to sets.” – Kyle Kingsbury, Aerospike
3.99.0.3, 12-27-2017

Curious.

~~~
zzzcpan
As someone who follows Kyle's work I can say that with Jepsen it's only
impressive and shows how serious you are about consistency and distributed
systems if there are no real consistency problems on the first try.

------
argestes
Be aware it's nothing to do with rocket science.

~~~
lokimedes
There are so many software libraries named after normal concepts from physics
and engineering. While I see it as dreamful flattery, it is really damned
annoying when you travel in all of these worlds, especially here on HN where
people post all of it.

(Best Regards, Grumpy software-, (defense-)systems engineer and former
particle physicist).

~~~
pdelbarba
I find this one especially egregious because Jepsen looks a lot like Jeppesen,
the aerospace products company. They mostly do charts and stuff though so I
was pretty confused that they were mentioned alongside the term 'aerospike'.

~~~
nemothekid
IIRC, Jepsen here is named after Carly Rae Jepsen, the pop singer.

~~~
johnymontana
As in "call me maybe", both a reference to the song lyrics and what might
happen in a distributed system when a network partition occurs.

------
themonk
Community edition is not a production-ready product.

Use Aerospike only if you are ready to buy enterprise edition. Jepsen test
results are applicable to enterprise edition only.

~~~
5ersi
Can you elaborate on that? What's missing from CE that makes it not suitable
for production? Thanks for your input.

~~~
manigandham
One of the biggest issues is that deletes aren't durable in CE. The index
entry will be removed but the key and data remain, and may or may not be
reclaimed by a background compaction process. If the server restarts before
the data is compacted, the deleted data is revived because the index is
recreated.
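
A toy model of that failure mode, as a sketch only. `ToyStore` is hypothetical
and makes no claim about how Aerospike actually lays out its index or storage;
it just assumes an in-memory index that is rebuilt from on-disk records at
restart.

```python
class ToyStore:
    """Hypothetical model: a volatile index over records that survive restarts."""

    def __init__(self):
        self.disk = []    # (key, value) records; persists across restarts
        self.index = {}   # key -> position in self.disk; lost on restart

    def put(self, key, value):
        self.disk.append((key, value))
        self.index[key] = len(self.disk) - 1

    def get(self, key):
        pos = self.index.get(key)
        return None if pos is None else self.disk[pos][1]

    def delete(self, key):
        # Only the index entry is dropped; the record stays on disk.
        self.index.pop(key, None)

    def compact(self):
        # Background reclamation: keep only records the index still points to.
        live = [(k, v) for i, (k, v) in enumerate(self.disk)
                if self.index.get(k) == i]
        self.disk = live
        self.index = {k: i for i, (k, _) in enumerate(live)}

    def restart(self):
        # The index is rebuilt by scanning the disk, so stale records reappear.
        self.index = {k: i for i, (k, _) in enumerate(self.disk)}


def demo():
    store = ToyStore()
    store.put("user:1", "alice")
    store.delete("user:1")
    before = store.get("user:1")   # deleted, as expected
    store.restart()                # restart before compaction ran
    after = store.get("user:1")    # the deleted record is back
    return before, after
```

In this model, calling `compact()` before `restart()` is what makes the delete
stick, which is why the outcome depends on timing.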

------
bmn__
I would like to run the Jepsen tests on my own with the current versions of
Mongo and Redis. Does anyone know how to do that?

~~~
aphyr
The Redis test was quite early in Jepsen's history, and I totally rewrote the
library after that--you can still find the original code in the `old` branch,
if you'd like to experiment. The Mongo tests are reasonably current; you'll
just need to pass a new --package-url on the CLI, I thiiiink. No promises, of
course, compatibility is always a moving target. ;-)

------
RobertRoberts
I am asking this because I think it's funny (so don't be mad) but does anyone
know what this means?

 _" AEROSPIKE 4.0 NOW GA

Strong Consistency with High Performance for Systems of Record and Systems of
Engagement"_

It seems like Marketing 3.0 speak, maybe their target market would understand
it intuitively?

~~~
glibgil
> Systems of Record

OLTP

> Systems of Engagement

OLAP

~~~
RobertRoberts
ty, knowing they actually mean something is reassuring. I've been seeing too
many of these buzzword mashups lately. Glad to see I am wrong. :)

------
bboreham
This kind of analysis is awesome. In a past job I might have deployed
Aerospike, then eventually realised it sometimes lost my data. But that would
be years later, after much midnight oil had been burned.

------
breakyerself
This reminds me of the turbo encabulator.

~~~
ttul
[https://www.youtube.com/watch?v=rLDgQg6bq7o](https://www.youtube.com/watch?v=rLDgQg6bq7o)

------
thriftwy
My take - if you didn't test your app in a node loss scenario, you are likely
to lose data when that happens, even if your DB won't.

