
Jepsen: VoltDB 6.3 - samber
https://aphyr.com/posts/331-jepsen-voltdb-6-3
======
jhugg
John here from VoltDB. We really enjoyed working with Kyle on this project.
We’ve got some content on our website related to this work if you’re hungry
for more, including a blog post, a FAQ on transactions and consistency and
more detail on the issues Kyle found in VoltDB 6.3 that have been fixed in
6.4. It can all be found here:
[https://voltdb.com/jepsen](https://voltdb.com/jepsen)

There are also a number of us here to answer any questions about this Jepsen
work or about VoltDB generally.

~~~
baq
why would i use voltdb? serious question. this is the first time i hear about
voltdb.

i mean, i can use a postgres or mssql cluster for SQL needs and/or i can use
riak/... for kv and/or cassandra/... for column indexed storage and/or ...
etc. why should i look at voltdb?

~~~
jhugg
Well, one thing we can say definitively today is that VoltDB offers strong
serializability, when almost no other clustered systems do, and when they do,
they are slow.

But my more typical answer is the combination of throughput and transactional
logic. No other system does both as well as VoltDB. This comes up a lot in
policy enforcement, fraud detection, online gaming, ad-tech, billing-support
and more.

Here are some blog posts that might help:

[https://voltdb.com/blog/winning-now-and-future-where-voltdb-...](https://voltdb.com/blog/winning-now-and-future-where-voltdb-shines)
[https://voltdb.com/blog/apps-need-acid](https://voltdb.com/blog/apps-need-acid)
[https://voltdb.com/blog/call-center-example-integrating-proc...](https://voltdb.com/blog/call-center-example-integrating-processing-and-state-make-streaming-problems-simple-solve)

This video's not bad:

[https://voltdb.com/resources/video/transactional-streaming-i...](https://voltdb.com/resources/video/transactional-streaming-if-you-can-compute-it-you-can-probably-stream-it)

------
heavenlyhash
Potential readers, not sure whether or not to make the slog: Do.

This is the most effective explanation (for me, anyway) of the difference
between "serializable" and "linearizable" of any of aphyr's blogs so far.
They've been keywords in that little cladistic tree of consistency models he
draws for a while now, but with this explanation and the examples, I finally
grok what they _mean_.

Thanks, aphyr.

~~~
grogers
I'm not so sure. The definition of strict serializable/linearizable given in
this post is weaker than what most people use. Most people require that
linearizable execution must happen as-if it matches a global clock. Simply
requiring that the effects of a transaction happen within the start/end time
of the partition executing the transaction does not guarantee this. Most
distributed databases (from what I can tell, including voltdb) only guarantee
that for operations which read/write the same keys. Non-conflicting
transactions execute without any synchronization overhead - and that's a good
thing! But in the presence of side channels, you need a truly global clock
(like spanner) to achieve strict serializability.

~~~
aphyr
_Most people require that linearizable execution must happen as-if it matches
a global clock. Simply requiring that the effects of a transaction happen
within the start/end time of the partition executing the transaction does not
guarantee this._

I'm not sure what you mean by "within the start/end time of the partition
executing the transaction". Nothing in the informal definition I provided
mentioned partitions, or even process-local orders (that'd be sequential
consistency) so ... yeah, I dunno where you got this from. I agree that
linearizability is a global real-time constraint, and I use that sense in the
post and in the Knossos verifier.
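
A minimal sketch of what that real-time constraint adds, assuming a single register and completed transactions only. This is not how Knossos works; it is a brute-force toy, and the names `Txn`, `serializable`, and `strictly_serializable` are made up for illustration. It shows a stale read that is serializable (some order explains it) but not strictly serializable (no order that respects real time explains it).

```python
from itertools import permutations
from typing import NamedTuple, Optional


class Txn(NamedTuple):
    """One completed single-register transaction (hypothetical model)."""
    name: str
    kind: str              # "write" or "read"
    value: Optional[int]   # value written, or value the read returned
    start: float           # invocation time on one global timeline
    end: float             # completion time on the same timeline


def valid_register_order(order, initial=None):
    """Replay the transactions in this order against a single register."""
    state = initial
    for t in order:
        if t.kind == "write":
            state = t.value
        elif t.value != state:
            return False
    return True


def respects_real_time(order):
    """If b finished before a started, b may not be ordered after a."""
    return all(
        later.end >= earlier.start
        for i, earlier in enumerate(order)
        for later in order[i + 1:]
    )


def serializable(txns):
    """Some total order explains the results; real time is ignored."""
    return any(valid_register_order(o) for o in permutations(txns))


def strictly_serializable(txns):
    """Some total order explains the results and respects real time."""
    return any(
        respects_real_time(o) and valid_register_order(o)
        for o in permutations(txns)
    )


# T1 writes 1 and completes; T2 starts afterwards and still reads the
# initial (empty) value: a stale read.
stale_read = [
    Txn("T1", "write", 1, start=0.0, end=1.0),
    Txn("T2", "read", None, start=2.0, end=3.0),
]

print(serializable(stale_read))           # True: order T2, T1 explains it
print(strictly_serializable(stale_read))  # False: T1 must precede T2
```

The only difference between the two checks is the `respects_real_time` filter, which is the "global real-time constraint" in the sense used above. The search is factorial in the number of transactions, so this is only useful for tiny histories.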

 _Most distributed databases (from what I can tell, including voltdb) only
guarantee that for operations which read/write the same keys._

You raise an interesting question: if I verify only that operations on a
single key are linearizable, have I also verified that operations on _systems_
of independent keys are linearizable? The answer, as far as I know, is yes:
linearizability is a local (or "composable") property. From Herlihy & Wing
([https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf](https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf)):

 _Unlike alternative correctness conditions such as sequential consistency
[31] or serializability [40], linearizability is a local property: a system is
linearizable if each individual object is linearizable. Locality enhances
modularity and concurrency, since objects can be implemented and verified
independently, and run-time scheduling can be completely decentralized._

This is a commonly cited property in the literature, and has been proven
several ways--for instance, see Lin's recent constructive proof
([http://arxiv.org/abs/1412.8324](http://arxiv.org/abs/1412.8324)). There is
research showing linearizable systems vary in the probability distribution of
outcomes
([https://arxiv.org/pdf/1103.4690.pdf](https://arxiv.org/pdf/1103.4690.pdf)),
but this does not affect safety.
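
To make the locality claim concrete, here is a small sketch, assuming independent read/write registers and completed operations only. It is not the Knossos algorithm, and `Op`, `linearizable`, and `linearizable_per_key` are hypothetical names. It checks a history two ways: once as a whole multi-key system, and once key by key. On these toy histories the two checks agree, which is what locality predicts.

```python
from itertools import permutations
from typing import NamedTuple, Optional


class Op(NamedTuple):
    """One completed operation on a named register (hypothetical model)."""
    key: str
    kind: str              # "write" or "read"
    value: Optional[int]   # value written, or value the read returned
    invoke: float          # invocation time on one global timeline
    complete: float        # completion time on the same timeline


def respects_real_time(order):
    """If b completed before a was invoked, b may not come after a."""
    return all(
        later.complete >= earlier.invoke
        for i, earlier in enumerate(order)
        for later in order[i + 1:]
    )


def legal_for_registers(order):
    """Replay the order against a map of independent registers."""
    state = {}
    for op in order:
        if op.kind == "write":
            state[op.key] = op.value
        elif state.get(op.key) != op.value:
            return False
    return True


def linearizable(ops):
    """Brute force: try every total order of the whole history."""
    return any(
        respects_real_time(o) and legal_for_registers(o)
        for o in permutations(ops)
    )


def linearizable_per_key(ops):
    """Locality: check each key's sub-history on its own."""
    keys = {op.key for op in ops}
    return all(
        linearizable([op for op in ops if op.key == k]) for k in keys
    )


good = [
    Op("x", "write", 1, 0, 1),
    Op("x", "read", 1, 2, 3),
    Op("y", "write", 2, 0, 5),
    Op("y", "read", 2, 1, 4),   # overlaps the write to y
]

stale = [
    Op("x", "write", 1, 0, 1),
    Op("x", "write", 2, 2, 3),
    Op("x", "read", 1, 4, 5),   # reads an overwritten value
    Op("y", "write", 7, 0, 5),
]

print(linearizable(good), linearizable_per_key(good))    # True True
print(linearizable(stale), linearizable_per_key(stale))  # False False
```

The per-key check searches far fewer orderings than the whole-system check, since it permutes each key's operations separately. Agreement on two examples is an illustration, not a proof; the proof is in the papers cited above.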

However, your comment led me to Charron-Bost & Cori 2003
([http://www.worldscientific.com/doi/abs/10.1142/S012962640300...](http://www.worldscientific.com/doi/abs/10.1142/S0129626403001100)),
whose abstract claims a counterexample system of two linearizable objects
whose composed system is nonlinearizable. I haven't found the full text yet,
and I'm not familiar with their sense of "The Global Time Axiom", so it's
possible their finding is still consistent with "linearizability is
composable". Not sure.

In any case, the multi-key tests in this analysis _do_ perform single-key
transactions (as well as multi-key transactions), and verify that their
composed system is fully linearizable. Because the state space for composed
systems is larger, these tests aren't as good at finding bugs--but if
composability turns out _not_ to hold, I can use this strategy more often.

 _But in the presence of side channels, you need a truly global clock (like
spanner) to achieve strict serializability._

As I understand it, Spanner's global clocks are a performance optimization,
not a correctness condition. If linearizability required a global clock,
Zookeeper
([http://static.cs.brown.edu/courses/cs227/archives/2012/paper...](http://static.cs.brown.edu/courses/cs227/archives/2012/papers/replication/hunt.pdf))
and Raft ([https://raft.github.io/raft.pdf](https://raft.github.io/raft.pdf))
wouldn't be able to provide linearizable semantics. It is, of course, possible
that these papers are wrong, in which case I encourage you to publish!

~~~
aphyr
(I've since skimmed Charron-Bost & Cori, and it shows that linearizability is
not composable when there does not exist a total order of invocation and
response events. This might be of use in relativistic scenarios with...
accelerating spacecraft which still need to perform linearizable computation?
I don't think it's particularly relevant to clocks down here on the geoid.)

~~~
jhugg
I will update the docs to acknowledge that consistency guarantees may be
compromised if the relative speed between servers in a cluster or clients is a
nontrivial fraction of the speed of light.

Let me know if I don't have that right.

------
MichaelGG
VoltDB is a really fun platform. I used it around the v1 and v2 era.
Incredibly high tx rate (on few-hundred-dollar servers 5 years ago you could
get 150K tx/sec). Being able to use SQL is fantastic. The original research
papers (Volt has diverged significantly since, as I understand it) are good
reads[1].

It's a good fit any time you're considering storing a bunch of data in-memory
for performance reasons. (Like telecom routing info.) Instead of writing a
custom daemon, just spit it into VoltDB. Get replication, performance, etc.
for free! Very neat.

Sadly the open source version isn't very ACID. They dropped the D from
community edition. So you can scale out, but if any node dies, you're toast.
There are still some uses, where you're running a transient or
easily-rebuildable dataset. Or where you can manually run multiple full nodes
(though I guess you'd need to implement cluster failover manually).

I guess it shows that it is hard to make a living off of open source products
if they're really great. I've heard this from other open-source companies: the
product's fantastic, no one pays. But make a taste as open source, basically a
demo/trial, and get them to upgrade to commercial.

1:
[http://hstore.cs.brown.edu/publications/](http://hstore.cs.brown.edu/publications/)

------
po
This post is all about the failings of version 6.3 but it buries the lede:

 _Version 6.4 includes fixes for all the issues discussed here: stale reads,
dirty reads, lost updates (due to both partition detection races and invalid
recovery plans), and read-only transaction reordering are all fixed, plus
several incidental bugs the VoltDB team identified. After 6.4, VoltDB plans to
introduce per-session and per-request isolation levels for users who prefer
weaker isolation guarantees in exchange for improved latency. VoltDB’s pre-6.4
development builds have now passed all the original Jepsen tests, as well as
more aggressive elaborations on their themes. Version 6.4 appears to provide
strong serializability: the strongest safety invariant of any system we’ve
tested thus far. This is not a guarantee of correctness: Jepsen can only
demonstrate faults, not their absence. However, I am confident that the
scenarios we identified in these tests have been resolved. VoltDB has also
expanded their internal test suite to replicate Jepsen’s findings, which
should help prevent regressions._

I read the top part and thought 'oh, another system that fails to meet their
claims' but it's pretty impressive that they did the work to fix it. Nice job.

------
baq
delightful read for the inquisitive mind. do yourself a favor and consume this
piece of technology writing at its best.

------
anilgulecha
[offtopic] May I know what was used to build that handwritten-but-not-really
font in the images? Is there a program or app that allows multiple glyphs for
the same character?

------
kalantri
This is good to know.

