Is anyone else surprised at the relatively mild reaction to this result here on Hacker News? Only a couple dozen comments, and many are addressing other topics.
Can we take a second to appreciate that this result will rapidly become the first thing taught in every single distributed systems class? And if this holds as a generalization of non-Byzantine (trusted) distributed consensus as a field, then she has defined its Turing machine equivalent. And it is even remarkably easy to understand! That is wild!!
Props to the authors. Personally, this is going on the top of my “to memorize” list.
I disagree. This paper generalizes variants of Paxos - certainly not the class of all consensus algorithms. Moreover, the Paxos variants it generalizes come with different requirements - weaker quorums are always a trade-off against liveness in the presence of faults.
tl;dr: I don't see world-changing theoretical novelty here. There are dozens of other consensus algorithms in the world, some randomized, some not quorum-based, etc. This is interesting if you're in Paxos land, trying to tune your Paxos for different fault-tolerance requirements. It's in no way comparable to a computational model.
It generalizes specifically a class of consensus algorithms that write to sequences of registers in a monotonic way, such that if a "previous" register decides v, future "registers" and associated quorums will decide v.
This is not a property of all consensus algorithms. Consensus algorithms don't need to behave this way. Paxos and derivatives behave this way, but I'm not sure Raft, etc. slide in as neatly (that would require a substantial analysis).
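For readers who haven't opened the paper, here is a minimal Python sketch of that register idea as I understand it (the class and function names are mine, not the paper's notation): registers are write-once, and a value is decided once every register in some quorum holds it.

    # Rough illustration only; the paper organizes registers into register sets
    # with quorums defined per set.
    class WriteOnceRegister:
        def __init__(self):
            self.value = None              # None means "unwritten"

        def write(self, v):
            if self.value is None:         # a register accepts only its first write
                self.value = v
            return self.value              # later writes see the original value

    def decided(registers, quorum):
        # A value is decided when every register in some quorum holds that value.
        values = {registers[i].value for i in quorum}
        if len(values) == 1 and None not in values:
            return values.pop()
        return None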
In particular, this generalization does not capture any nondeterministic algorithm, any algorithm tolerating Byzantine faults (or various other consensus problems), any non-quorum-based algorithm, Nakamoto-style consensus (which is a Byzantine consensus algorithm), Ben-Or, etc.
So, my question for you: what other algorithms do you think this paper generalizes, that are useful outside of the narrow scope of optimizing Paxos?
Not really, in fact it's something I've come to expect here. There's a loss in social status for appearing too enthusiastic about a new thing (that might not be so great) and if the new thing replaces a difficult-to-use old thing then the people who invested effort in understanding the old thing don't want to participate in making it obsolete.
Don't forget that people often just want to use an algorithm and are not necessarily that interested in the underlying mathematical structure, only in whether it works reliably and does a better job than the existing solution.
I'm very skeptical that many people will want this. Data misrepresentation is rarely (never?) due to a fault in the persistence medium. It's usually a political reason, or human error, that perverts consensus.
What? Consensus algorithms in this class are used everywhere in web programming, from MySQL to AWS. Ironic that your comment may very well have touched several consensus algorithms on its way to my eyes. The fact that you remain blissfully unaware of them is a testament to their power.
This has nothing to do with "data misrepresentation". It's about distributed consensus. It's not even really about storage, but about multiple parties agreeing on a single truth.
Great paper with an incredibly important result. It will take years to shake out in industry and implementations, but should dramatically improve situations where strong consistency is important.
Can you explain how? In many cases, when a "generalized result" comes out, there was already a generalized result that covered all but obscure corner cases.
This is the first paper I've seen that generalizes consensus using only immutable state.
That is, you can describe PaxOS, Raft, etc. using the generalization given here.
The result is fairly understandable.
(Of course, as mentioned, Raft is trivial in her framework because there is only one thing deciding values; Raft spends its time in the leader-election part finding that one thing.)
Also, from the abstract, "the surprising result of this analysis is a substantial weakening to the quorum requirements of these widely studied algorithms."
The big thing here is that the analytical framework allows the authors to clearly analyze Paxos and some important variants and identify performance improvements (e.g., steps in the algorithm you can elide) that don't undermine the guarantees provided.
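To make the quorum weakening concrete, here's a toy Python check in the spirit of her earlier Flexible Paxos observation (my own example, not the paper's full register framework): the phase-1 and phase-2 quorums only need to intersect each other, they don't each need to be a majority.

    from itertools import combinations

    nodes = range(5)

    # Classic Paxos uses majorities (size 3 of 5) for both phases.
    # A weaker, still-safe choice: size-4 quorums for phase 1, size-2 for phase 2.
    phase1 = [set(q) for q in combinations(nodes, 4)]
    phase2 = [set(q) for q in combinations(nodes, 2)]

    # Safety only requires every phase-1 quorum to intersect every phase-2 quorum.
    assert all(q1 & q2 for q1 in phase1 for q2 in phase2)

The smaller phase-2 quorum is what buys you a cheaper steady state, at the cost of a more expensive recovery/election phase.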
Making multiple computers all store the same thing is hard because they have to send messages between each other, and sometimes messages get lost.
Other people have tried to write down how they solved this problem, but their explanations were hard for adults to understand.
This person has tried a new way to solve this problem, and a new way to explain their solution that uses ideas that many big kids who go to big kid school to study computers can understand.
1) Heidi Howard's papers are very fun to read. She's done lots of work I've enjoyed, especially with respect to Raft.
2) She touches on two things in this story's paper that I first read in another paper of hers, and they are a joy to realize for two reasons: a) she made a practical improvement to the already simple Raft protocol, and b) a quorum != an (obvious) majority - in Raft, for example, she identifies that the quorum actually spans two states:
* leader election, and
* "regular" mode, with a leader and the requisite number of followers
Really cool result and a very readable paper. I'm struggling to figure out how this would be applicable to iterated consensus (i.e., logs). It seems like the extension examples are focused on reducing the message round trips for specific cases that don't matter when applied to multi-Paxos or Raft.
Twitter used to be a useful place for general distsys discussions, but I haven't been around for a few years now so I'm not certain how much still takes place.
This tweet, for example, I would have expected to see more people chime in on.
To inquire what avenues for theoretical discussion are currently active, I would suggest reaching out on Twitter to Chris Meiklejohn (cmeik), Michael Bernstein (mrb_bk) or Kyle Kingsbury (aphyr).
I believe you have to go through literature to get this kind of back-and-forth. At least, that's how I manage, even if that feels slow and opinionated.
Yes. In theory, maybe it's useful to have more flexibility to trade off durability for performance (a smaller quorum size hopefully reduces deciding latency [unless your small quorum contains the straggler!]) for specific kinds of data in the same replica group.
In practice though, it seems easier just to run separate instances of Paxos.
Forgive the tangent, but could quantum entanglement one day provide consensus with trivial (if expensive) complexity? E.g., nodes maintain a supply of pre-entangled particles, observing them at predetermined timestamps and using that known shared randomness to make decisions?
Edit: OK, after that sat there for a while, I realized that generating shared pseudorandom bits would be just as effective.
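For what it's worth, a minimal sketch of what those shared pseudorandom bits could look like, assuming a seed distributed out of band ahead of time (this only gives you a common coin; you still need a protocol like Ben-Or around it to actually reach consensus):

    import hmac, hashlib

    def shared_coin(seed: bytes, round_number: int) -> int:
        # Every node holding the same seed derives the same bit for a given round.
        digest = hmac.new(seed, str(round_number).encode(), hashlib.sha256).digest()
        return digest[0] & 1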
If you abstract quantum-entangled particles to a communication channel with zero latency, you still have the consensus problem—you just run into the edge cases less frequently. So it would help, but consensus is still a fundamental challenge with parallel processes.
Given the handwaviness I might just be misunderstanding what you mean by "zero latency", so I just wanted to clarify: entangled particles do NOT enable FTL communication. Sorry if I am saying something you know, but others might misread your comment.
As an alternative to Paxos-related consensus algorithms, some projects may want to take a look at the Raft consensus protocol: https://www.brianstorti.com/raft/
Whereas Paxos tries to maintain a fully distributed consensus state, Raft instead proposes a protocol for automatically electing a leader node from a group of follower nodes, and coordinating the replication of the write log through that leader node.
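Roughly, the steady state looks something like this (a simplification of my understanding; real Raft also tracks terms, prevLogIndex/prevLogTerm consistency checks, and retries):

    # Hypothetical leader-side replication loop; illustration only.
    def replicate(leader_log, entry, followers):
        leader_log.append(entry)
        acks = 1                                  # the leader counts itself
        for follower in followers:
            if follower.append_entries([entry]):  # RPC that may fail or time out
                acks += 1
        # The entry is committed once a majority of the whole cluster stores it.
        return acks > (len(followers) + 1) // 2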
I haven't had the pleasure of experimenting with either one though and would love to hear from people who have.
Raft is arguably more difficult to understand at a deep level. After extensive study, I've noticed that Raft hides all of its complexity under its leader election setup. There's nuance after nuance. With Paxos, the complexity is explicit and at the forefront, which is actually better when trying to reason about a distributed system and its invariants.
It's not that simple. Here's an overview of Paxos/Raft consensus algorithms: https://vadosware.io/post/paxosmon-gotta-concensus-them-all/ and even that is incomplete and doesn't necessarily describe the trade-offs well. The submitted generalized solution is a better point to start experimenting from.
By the way, Paxos is the name of a Greek island; it doesn’t have funky capitalisation. The name is from Lamport’s paper “the part-time parliament” which describes the algorithm in the setting of Ancient Greece.
Exactly this; not sure why people think it's a different thing - just wrap nodes into a higher level of abstraction. I believe Raft might be a better approach to scalable consensus, though there need to be more layers to the network so it can be more self-balancing and localised.
If we call the frequency at which a node is synced to the network (achieves consensus) its "heartbeat", then lower-heartbeat nodes need to stay close only to nodes with a heartbeat faster than theirs, and elect leaders in that group. The leaders of that group are also in a group with the other leaders of that layer, listening to their own leader, which belongs to a group above (like zooming out from a group and the group becoming a node in a larger graph).
In the end, there is only one state after all and the graph is the ultimate leader with the highest heartbeat. We wouldn't need many layers to cover all nodes, assuming laws of nature and all. Then just go a layer up/down if the network is getting too large, use a karmic system to position new/reactivated nodes. Seems scalable enough.
Sorry OP, it might be too early and I'm just thinking out loud in a comment box.
They do, effectively.
They show what is necessary for quorums, and show that a large number of algorithms meet these requirements.
Hence, the single leader election is not actually necessary (and this is one of their points - the existing algorithms require much stronger quorum guarantees than are actually necessary for correctness).
It took me months to do a reliable Raft implementation. The author of the Raft paper, Diego Ongaro, wrote in a comment on a Raft mailing list that it could take weeks.
The Raft paper is really easy to read and understand, but the actual code is very tricky.
To make it work with any kind of performance, all message passing needs to be asynchronous, and you need an event loop to process messages one at a time. Things are much simpler if your code is effectively single-threaded when processing an inbound message. That took me a while to figure out. Futures and promises are very helpful.
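For anyone heading down the same road, the single-threaded message loop I mean looks roughly like this (a sketch; the `handle` method is where your actual Raft logic would live):

    import queue

    class RaftNode:
        def __init__(self):
            self.inbox = queue.Queue()

        def receive(self, msg):
            self.inbox.put(msg)         # network threads only enqueue, never touch state

        def run(self):
            while True:
                msg = self.inbox.get()  # blocks until a message arrives
                # All state changes happen here, one message at a time,
                # so the Raft state machine itself needs no locks.
                self.handle(msg)

        def handle(self, msg):
            ...                         # AppendEntries / RequestVote / client requests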
It took a while to get the building blocks together: the log file, which can be traversed, read, and rolled back/truncated (use a kv store for this, trust me), and the transport mechanism to deal with sending complex objects and handling errors and timeouts.
Tracking terms, states (leader, follower, candidate), rejecting out of date terms, voting, elections -- it's all very complex. You'll have to build some really, really good multi-threaded tests that simulate out-of-order and dropped messages, then really pound away at the system. And good logging to find that odd corner case that keeps coming up and throwing mysterious messages.
I did two implementations, one with plain text files and one with a kv store. For plain text files, I had to maintain file offsets of the records so I could traverse backward and truncate records that had to be replaced, and then break the files into segments so I could retire segments that are no longer needed. It had all kinds of little bugs.
Then I reimplemented using a kv store, and it took a tiny fraction of the time. And surprisingly had better performance.
Just create a good interface at the start, use the kv store underneath, and then later if you don't want the dependency swap it out for the more complicated code.
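Concretely, the kind of interface I mean looks something like this (method names are mine; adapt as needed):

    from abc import ABC, abstractmethod
    from typing import Optional, Tuple

    class RaftLog(ABC):
        """Minimal log interface; back it with a kv store first, swap later if needed."""

        @abstractmethod
        def append(self, index: int, term: int, entry: bytes) -> None: ...

        @abstractmethod
        def get(self, index: int) -> Optional[Tuple[int, bytes]]:
            """Return (term, entry) at index, or None if absent."""

        @abstractmethod
        def truncate_from(self, index: int) -> None:
            """Drop index and everything after it (used when resolving conflicts)."""

        @abstractmethod
        def last_index(self) -> int: ...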
Not OP, but it is very easy to make mistakes while implementing it and end up with something that is not actually reliable. You have to have the paper open and write the code line by line while looking at it. I implemented it a while ago, so that's the most I remember, but I did a bit of fuzzing to nail down every bug.
No. Raft and Bitcoin's algorithms for agreeing on what the blockchain is are completely different.
AIUI, Bitcoin's "consensus" is just that whoever has the longest valid blockchain is the winner. The algorithm tunes the difficulty so that it takes ~10 minutes to find a block, so it's unlikely that two blockchains keep the same length for any long amount of time. That is, one chain will find the next block before the other and, in the meantime, be able to distribute that fact to the other nodes and thus "win".
> Honest generators only build onto a block (by referencing it in blocks they create) if it is the latest block in the longest valid chain. "Length" is calculated as total combined difficulty of that chain, not number of blocks, though this distinction is only important in the context of a few potential attacks.
No. Bitcoin uses its own protocol. Raft could not be used for a secure cryptocurrency because Raft does not handle Byzantine faults[1]. It assumes nodes are not run by malicious actors and that messages are unaltered in transit.
True, though the trade-off is massively increased throughput. I'm always frustrated when I see blockchains recommended as a solution for a consensus problem in anything but a zero-trust scenario.
Even in a zero trust scenario, you need a majority of nodes to be good faith, non-colluding actors. Even anarchy breaks down in the face of unequal power.
Bitcoin consensus is what is called Byzantine consensus - that is, your consensus still has to work with some nodes actively trying to send bad information to mess up the protocol.
Raft and Paxos are non-Byzantine. They assume that the nodes are actively trying to follow the protocol, though there may be such things as network partitions and crashes.
Bitcoin consensus is extremely simple: the chain with the most accumulated difficulty. Furthermore blocks aren't removed or changed, so consensus is in fact a monotonic process. You don't need either Paxos or Raft or any other sophisticated algorithm; just broadcast every block you believe to be part of the most difficult chain and consensus will be achieved.
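In code form, the rule is about as small as consensus gets (a toy sketch, assuming each block record carries its own difficulty and has already been validated):

    def total_work(chain):
        # "Longest" really means greatest accumulated difficulty, per the quote above.
        return sum(block["difficulty"] for block in chain)

    def best_chain(candidate_chains):
        # Fork choice: adopt whichever valid chain has the most accumulated work.
        return max(candidate_chains, key=total_work)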