Hacker News

I have to admit that over time I've ended up with mixed feelings about this paper. This is mainly due to how people read it, though: without knowing much about consensus, they draw conclusions like "Raft is better than Paxos" or "Raft is the best consensus algorithm". Some thoughts (please elaborate if you think I'm simplifying it too much or if you disagree with me!):

First of all, remember that Paxos is a family of protocols for solving consensus. When doing research, it's useful to reduce a problem into smaller and smaller parts. The "standard" Paxos algorithm (often called single-decree Paxos) is a very simple consensus algorithm which can only decide a single value, once. It's not practical at all, but it provides a good framework for thinking about consensus.
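To make "decide a single value once" concrete, here is a minimal in-memory sketch of single-decree Paxos in Python. It is an illustration under simplifying assumptions, not a production implementation: there is no networking, no persistence, no failure handling, and ballot numbers are assumed to be unique per proposer. The names `Acceptor` and `propose` are made up for this example.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Acceptor:
    promised: int = -1          # highest ballot this acceptor has promised to honor
    accepted_ballot: int = -1   # ballot of the last proposal it accepted (-1 = none)
    accepted_value: Any = None  # value of that proposal

    def prepare(self, ballot: int):
        """Phase 1b: promise to ignore lower ballots and report any accepted value."""
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted_ballot, self.accepted_value
        return False, -1, None

    def accept(self, ballot: int, value: Any) -> bool:
        """Phase 2b: accept unless a higher ballot has been promised meanwhile."""
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted_ballot = ballot
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: Any) -> Optional[Any]:
    """Run both phases against in-memory acceptors; return the chosen value or None."""
    quorum = len(acceptors) // 2 + 1

    # Phase 1a: ask every acceptor for a promise.
    replies = [a.prepare(ballot) for a in acceptors]
    promises = [(b, v) for ok, b, v in replies if ok]
    if len(promises) < quorum:
        return None  # a higher ballot is in progress; retry with a larger ballot

    # If any acceptor already accepted something, adopt the value with the highest ballot.
    highest_ballot, highest_value = max(promises, key=lambda bv: bv[0])
    if highest_ballot >= 0:
        value = highest_value

    # Phase 2a: the value is chosen once a majority accepts (ballot, value).
    accepted = sum(a.accept(ballot, value) for a in acceptors)
    return value if accepted >= quorum else None


acceptors = [Acceptor() for _ in range(3)]
print(propose(acceptors, ballot=1, value="A"))  # A
print(propose(acceptors, ballot=2, value="B"))  # A: the earlier choice is preserved
print(propose(acceptors, ballot=1, value="C"))  # None: ballot 1 is now stale
```

Once a value has been chosen, any later proposal with a higher ballot discovers it in phase 1 and re-proposes it, which is why the second call returns "A" even though it asked for "B". Deciding a whole log of commands needs something more, which is where MultiPaxos comes in.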

When this article frames it as "Raft vs Paxos", it is actually comparing Raft against a standard way of configuring Paxos with a leader (MultiPaxos). Note that MultiPaxos leaves room for a lot of variation between implementations that are all still called "MultiPaxos". MultiPaxos is not a spec you implement; it's a set of ideas.
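A rough way to see what the leader buys you: the hypothetical helper below counts proposer-to-acceptor round trips needed to fill a sequence of log slots, assuming no failures, no competing proposers and no leader changes. Running independent single-decree Paxos per slot needs a prepare and an accept every time, while a stable MultiPaxos-style leader runs phase 1 once for all future slots and then only phase 2 per command.

```python
def round_trips(commands: int, stable_leader: bool) -> int:
    """Proposer-to-acceptor round trips needed to choose `commands` log slots,
    assuming no failures, no competing proposers and no leader changes."""
    if stable_leader:
        # MultiPaxos-style: one Phase 1 (prepare) covers all future slots,
        # after which each command only needs Phase 2 (accept).
        return 1 + commands
    # Independent single-decree Paxos per slot: prepare + accept every time.
    return 2 * commands


print(round_trips(1000, stable_leader=False))  # 2000
print(round_trips(1000, stable_leader=True))   # 1001
```

How the leader is elected, how it recovers in-flight slots after a failover, and so on, is exactly the part where MultiPaxos implementations differ.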

Raft on the other hand is a concrete protocol with well-defined, specified behavior. In fact, Raft is essentially an implementation of MultiPaxos[1]. This is a very good thing! Paxos provides a framework for thinking about consensus, while Raft puts some of these ideas into a concrete specification which is easy to implement. And it is a good point that we should make the knowledge in the field of consensus available for a wider audience. Yay, Raft is good!

And here comes the problem: A lot of people have read the Raft paper and concluded that "Raft is the best way of solving consensus". Raft is (relatively) easy to implement and get started with, and it gives you a very simple model to program against (a log of commands), but it's far from a panacea.
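The "log of commands" model boils down to a deterministic state machine: every node applies the same committed log in the same order and therefore ends up in the same state. A minimal sketch, with a made-up `set`/`delete` command format (not any particular library's API):

```python
from typing import Dict, List, Tuple

Command = Tuple[str, ...]  # hypothetical format: ("set", key, value) or ("delete", key)


def apply_log(log: List[Command]) -> Dict[str, str]:
    """Apply committed commands in log order to an empty key-value state.
    Must be deterministic: same log in, same state out, on every node."""
    state: Dict[str, str] = {}
    for command in log:
        op, key, *args = command
        if op == "set":
            state[key] = args[0]
        elif op == "delete":
            state.pop(key, None)
        else:
            raise ValueError(f"unknown command: {op!r}")
    return state


committed = [("set", "x", "1"), ("set", "y", "2"), ("delete", "x")]
print(apply_log(committed))      # {'y': '2'}
print(apply_log(committed[:2]))  # a lagging replica: {'x': '1', 'y': '2'}
```

A replica that has applied only a prefix of the log is behind, but it never diverges. That simplicity is the appeal; the cost is that every command has to go through the log.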

The most important thing to know about Raft is that it's neither performant (every command has to be sent to a single leader, which becomes a bottleneck) nor scalable (every command needs to be processed by all nodes). Etcd advertises "1000s of writes" per second and recommends clusters of up to "7 nodes".
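A back-of-the-envelope sketch of why adding nodes doesn't add write capacity, assuming a plain Raft leader that sends each entry to every follower and ignoring batching and pipelining. The helper names are made up for illustration. Larger clusters tolerate more failures, but the leader does more work per write and has to wait for a larger quorum:

```python
def quorum(n: int) -> int:
    """Smallest majority of an n-node cluster."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Nodes that can fail while a majority is still reachable."""
    return (n - 1) // 2


def leader_messages_per_write(n: int) -> int:
    """One AppendEntries request to, and one response from, each follower."""
    return 2 * (n - 1)


for n in (3, 5, 7):
    print(f"nodes={n} quorum={quorum(n)} tolerates={tolerated_failures(n)} "
          f"leader_msgs_per_write={leader_messages_per_write(n)}")
```

Every write still funnels through one machine, so throughput is bounded by that leader no matter how large the cluster gets.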

This doesn't mean that Raft is bad; it's just a trade-off you need to be aware of: simplicity vs. performance. If you're integrating Raft into your stack and aiming for scalability/performance, you must always be very wary of when you use it. You should minimize writes at all costs. Unfortunately, many developers get the impression that you can just plug Raft into an existing system and suddenly have a performant and scalable distributed system.
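One common way to cut down on consensus rounds is batching: queue client commands and propose many of them as a single log entry. A minimal sketch, where `propose` is a hypothetical stand-in for whatever call your Raft library uses to submit an entry (not a real API):

```python
from typing import Callable, List


def propose_in_batches(pending: List[str], max_batch: int,
                       propose: Callable[[List[str]], None]) -> int:
    """Submit queued commands in batches so one consensus round carries many of them.
    `propose` is a hypothetical stand-in for your Raft library's submit call.
    Returns the number of consensus rounds (log entries) used."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    rounds = 0
    for start in range(0, len(pending), max_batch):
        propose(pending[start:start + max_batch])  # one log entry per batch
        rounds += 1
    return rounds


log_entries: List[List[str]] = []
commands = [f"set k{i} v{i}" for i in range(1000)]
print(propose_in_batches(commands, max_batch=100, propose=log_entries.append))  # 10
```

Batching amortizes the per-round cost and trades a bit of latency for throughput, but it does not remove the single-leader bottleneck.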

A good example is CockroachDB: They're using plain Raft for writes, but use "leader leases" for scaling reads. Suddenly things become a lot more complicated (for instance, see this issue about how leader leases are implemented in the Go library for Raft: https://github.com/hashicorp/raft/issues/108). I'm sorry, but you're going to have to get your hands dirty if you want something that's both fast and correct.
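For a rough idea of what a leader lease check involves, here is a hypothetical sketch (not CockroachDB's or hashicorp/raft's actual code). It assumes followers promise not to help elect a new leader for `duration` seconds after hearing from the current one, and that clock rates differ by at most `max_clock_drift` over that window. Both are assumptions you have to make hold in practice, which is where it gets complicated.

```python
from dataclasses import dataclass


@dataclass
class LeaderLease:
    duration: float          # followers won't back a new leader for this long (assumed)
    max_clock_drift: float   # safety margin for clock-rate differences (assumed bound)
    start: float = float("-inf")  # send time of the latest heartbeat a majority acked

    def on_quorum_ack(self, heartbeat_sent_at: float) -> None:
        """Extend the lease once a majority has acknowledged a heartbeat.
        The lease counts from when the heartbeat was sent, not when acks arrived,
        which errs on the side of expiring early."""
        self.start = max(self.start, heartbeat_sent_at)

    def can_serve_local_read(self, now: float) -> bool:
        """Serve reads from local state only while the lease is safely valid;
        otherwise fall back to a read that goes through the Raft log."""
        return now < self.start + self.duration - self.max_clock_drift


# Times would come from a monotonic clock such as time.monotonic().
lease = LeaderLease(duration=9.0, max_clock_drift=0.5)
lease.on_quorum_ack(heartbeat_sent_at=100.0)
print(lease.can_serve_local_read(now=105.0))  # True
print(lease.can_serve_local_read(now=108.9))  # False: inside the drift margin
```

If the drift assumption is violated, a deposed leader can serve stale reads, which is why lease implementations deserve so much scrutiny.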

The end result is that you have two choices: (1) you can use a library which provides a simple model (a log of commands) but doesn't scale well, or (2) you can use a more complicated consensus algorithm and then deal with all of the Hard Problems™ that come with it. If you're going for the second option, you might as well take advantage of all of the research from the last few years (see https://vadosware.io/post/paxosmon-gotta-concensus-them-all/).

It should also be noted that even though the consensus algorithm doesn't scale, that doesn't mean your system can't scale. Scalog (https://www.usenix.org/system/files/nsdi20-paper-ding.pdf) is an example of a system which uses a consensus algorithm at a constant rate (i.e. regardless of the load on the system). Once again: Focus on how you can avoid using a consensus algorithm due to the way your system works.
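The idea of using consensus at a constant rate can be sketched roughly like this, loosely modeled on the Scalog paper and heavily simplified: shards append records locally without consensus, and only a periodic "cut" (how far each shard's log has gotten) is agreed upon. Everyone can then derive the same global order from the sequence of cuts. The function name and data layout are hypothetical, and the consensus step that commits each cut is not shown.

```python
from typing import Dict, List


def global_order(shards: Dict[str, List[str]], cuts: List[Dict[str, int]]) -> List[str]:
    """Derive one total order from shard-local logs plus a sequence of agreed cuts.
    A cut maps shard name -> that shard's log length at the time of the cut and is
    assumed to never move backwards. Only the cuts would go through consensus
    (not shown); records not yet covered by a cut are not ordered yet."""
    order: List[str] = []
    prev = {name: 0 for name in shards}
    for cut in cuts:
        for name in sorted(shards):  # any fixed, agreed-upon shard order works
            end = cut.get(name, prev[name])
            order.extend(shards[name][prev[name]:end])
            prev[name] = end
    return order


shards = {"a": ["a1", "a2", "a3"], "b": ["b1", "b2"]}
cuts = [{"a": 2, "b": 1}, {"a": 3, "b": 2}]
print(global_order(shards, cuts))  # ['a1', 'a2', 'b1', 'a3', 'b2']
```

The number of consensus decisions depends on how often cuts are taken, not on how many records are written, which is the property the comment above is pointing at.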

TL;DR: Hard problems are still hard. Don't think that Raft is a magical piece of software which solves all of your problems.

[1]: There are some nuances between Raft and "standard" MultiPaxos as mentioned in https://arxiv.org/abs/2004.05074. I would still consider Raft to be in the same class as MultiPaxos (compared to other solutions to consensus).




Yours is an excellent comment, factual and even-handed. As someone who had to delve into these things and had to implement, e.g., a variant of MultiPaxos for work (this was before Raft existed), I agree that there is a bit of unhealthy confusion going around.

We generalize every day because no one can be expected to learn everything, so we abstract away, define constraints and best practices, and select our library-fied tools and apply them. We do this with operating systems, file systems, encryption, and also with distributed systems. The basics of distributed systems and their invariants and trade-offs are not that hard if you give yourself time to study them properly, but when you want high performance and scale it does get Challenging.

It is important that any abstractions we make, any generalizations, constraint definitions and best practices that we expect people to read, accept and adhere to are as correct as possible without breaking that abstraction. So thank you for your comment. Please note that mine is not a pro-Paxos comment either, just one appreciating good information being spread so that people can make good choices and trade-offs.

Distributed systems are hard. To paraphrase: In the data flow, no one can hear your canary msg scream.


FWIW, CockroachDB uses and co-maintains etcd/raft, not hashicorp/raft.

https://github.com/etcd-io/etcd/tree/master/raft#raft-librar...

Users:

- cockroachdb A Scalable, Survivable, Strongly-Consistent SQL Database

- dgraph A Scalable, Distributed, Low Latency, High Throughput Graph Database

- etcd A distributed reliable key-value store

- tikv A Distributed transactional key value database powered by Rust and Raft

- swarmkit A toolkit for orchestrating distributed systems at any scale.

https://github.com/etcd-io/etcd/tree/master/raft#notable-use...

Go docs

https://pkg.go.dev/go.etcd.io/etcd/raft?tab=doc


Huh, I wasn't aware of that. Somehow I always thought that hashicorp/raft was used in etcd. Thanks for the correction!


If people are interested in studying a combination of Raft and SQLite, you could check out this Raft-based database I created:

https://github.com/rqlite/rqlite

Your point about "the impression that you can just plug Raft into an existing system and suddenly have a performant and scalable distributed system" is an excellent one. One of the biggest misconceptions I come across is that rqlite is distributed for performance, when it's actually all about reliability. In fact, performance is significantly reduced versus just writing to SQLite itself.


You make super important points.

> First of all remember that Paxos is a family of protocols for solving consensus... Raft on the other hand is a concrete protocol with well-defined, specified behavior. In fact, Raft is essentially an implementation of MultiPaxos... You have two choices: (1) You can use a library which provides a simple model (a log of commands), but doesn't scale well or (2) You can use a more complicated consensus algorithm and then deal with all of the Hard Problems™ that come with it.

At AWS [0], everyone (I spoke to) who worked on distributed consensus had this exact same opinion, so you're not at all off the mark.

> A good example is CockroachDB: They're using plain Raft for writes, but use "leader leases" for scaling reads.

The Chubby paper by Google [1] goes into excruciating detail about running a production Paxos system.

> Focus on how you can avoid using a consensus algorithm due to the way your system works.

Amazon SQS may be one such example: I'd presume it scales by, in a way, avoiding consensus: simply maintaining multiple copies [2] and placing guard-rails around delivery [3][4], ingestion, and duration of storage [5].

[0] https://aws.amazon.com/builders-library/leader-election-in-d...

[1] https://blog.acolyer.org/2015/02/13/the-chubby-lock-service-...

[2] https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQS...

[3] https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQS...

[4] https://patents.google.com/patent/US10362131B1/en

[5] https://patents.google.com/patent/US8261286B1/en
