Hacker News new | past | comments | ask | show | jobs | submit login

Why roll your own consensus algorithm?



Historical reasons mostly.

First, Raft wasn't a thing when we started.

Second, the consensus algorithm we use today is used for cluster membership, schema changes and other agreement purposes, but not for the main transactional path. However, in versions before 3.0 it was.

At the time, we wanted to make really different tradeoffs than something like Paxos. Namely, we wanted to be able to run with two copies, rather than the three that Paxos/Raft require. We also wanted crazy throughput and low performance variation, which are hard things to get with Paxos/Raft. The main tradeoff is that our system doesn't work well across high-latency links, which is ok with us.

Ultimately we added a second layer protocol for transactions in 3.0, and this made the transaction path not dependent on clock skew. This layer actually looks a lot like Raft, but does differ in some key ways.

We may even someday switch to Raft for the core agreement stuff, but there's a lot of work to be done for that to happen.

This link has more info about how transactions work in VoltDB: https://voltdb.com/sites/default/files/tn-transactions.pdf


This is a great explanation. You can see the thought process and care that went into the design.


Did you make formal specification of your algorithms in something like TLA+?


That would be cool.

Kyle Kingsbury started something like this (in Clojure) as part of this project (https://github.com/jepsen-io/voltdb/tree/master/replication-...), but it's a large task and would probably take more than a few days.

We have done some initial work on TLA+, but it may be a while to finish anything. If someone wants to help, there's probably a publishable paper in it.


Cool. I am always a little wary of distributed algorithms that don't go through that level of validation. But I recognize that few commercial firms do that, although AWS is a stand out. I had the same criticism of ZooKeeper for a long time. Glad to hear you have at least considered it.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: