
Paxos in 25 Lines - Cieplak
http://nil.csail.mit.edu/6.824/2015/notes/paxos-code.html
======
GeneralMayhem
This is missing what to me is the most important part of the algorithm: a
quorum of acceptors must propagate writes to the learners. With just what's
shown here, you're not tolerant to network partitions that cause a subset of
the "accept" messages to be lost.

That process can of course be optimized in a number of ways that drastically
cut down on the network overhead as compared to the naive MxN write pattern,
but what's written here is not safe on its own.
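A minimal Python sketch of that learner side. It assumes each acceptor sends an accepted(n, v) message straight to every learner (the naive M x N pattern) and that acceptors have unique ids. The class and method names are hypothetical, not taken from the 6.824 notes. The learner treats a value as chosen only once a majority of distinct acceptors report accepting the same proposal number, so a lost message delays the decision instead of producing a wrong one.

```python
from collections import defaultdict


class Learner:
    """Hypothetical learner: a value is chosen once a majority of acceptors
    report accepting the same proposal number."""

    def __init__(self, acceptor_ids):
        self.acceptor_ids = frozenset(acceptor_ids)
        self.majority = len(self.acceptor_ids) // 2 + 1
        # proposal number -> ids of acceptors that reported accepting it
        self.accepted_by = defaultdict(set)
        self.chosen = None

    def on_accepted(self, acceptor_id, n, v):
        """Handle accepted(n, v) from one acceptor; return the chosen value or None."""
        if self.chosen is not None or acceptor_id not in self.acceptor_ids:
            return self.chosen
        # A set, so a duplicated (re-sent) message is only counted once.
        self.accepted_by[n].add(acceptor_id)
        # Paxos never pairs one proposal number with two values, so v is
        # the same for every accepted(n, ...) message counted here.
        if len(self.accepted_by[n]) >= self.majority:
            self.chosen = v
        return self.chosen


learner = Learner(["A", "B", "C"])
print(learner.on_accepted("A", 1, "x"))  # None: one of three is not a majority
print(learner.on_accepted("A", 1, "x"))  # None: the duplicate doesn't count twice
print(learner.on_accepted("C", 1, "x"))  # x: A and C are a majority of three
```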

~~~
jakewins
Mostly unrelated, but a fun fact about quorums that I enjoy noting whenever I
can because it still seems under-explored: A quorum != a majority. Currently
most (all?) production implementations I've seen of RAFT and the various
Paxoses use "majority" as the quorum algorithm, so the two get mostly
conflated.

In my layman's understanding: given a set, a quorum system is some method of
choosing subsets (quorums) such that any two such subsets always have at least
one member in common.

Majority is one quorum algorithm - given a set [A,B,C], the majorities are:
[A,B,C], [A,B], [A,C] and [B,C]. Any two of those sets will have at least one
member overlapping.
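The same check in a few lines of Python, as a sketch: enumerate the majority subsets of a member list and confirm that every pair overlaps. The function names are made up for illustration.

```python
from itertools import combinations


def majorities(members):
    """All subsets of `members` that contain a strict majority of them."""
    members = list(members)
    need = len(members) // 2 + 1
    return [frozenset(c)
            for size in range(need, len(members) + 1)
            for c in combinations(members, size)]


def is_quorum_system(quorums):
    """True if every pair of quorums shares at least one member."""
    return all(q1 & q2 for q1, q2 in combinations(quorums, 2))


qs = majorities(["A", "B", "C"])
print(sorted(sorted(q) for q in qs))
# [['A', 'B'], ['A', 'B', 'C'], ['A', 'C'], ['B', 'C']]
print(is_quorum_system(qs))  # True
```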

However, majority is somewhat wasteful, because the latency of these
quorum-based algorithms is almost always bounded by the slowest member of the
quorum - the more machines you need to wait for, the more likely one of them
will be outlier-slow.

You'd potentially be better off choosing a quorum algorithm whose quorums are
smaller than a majority (while still always overlapping) - because that'd mean,
in the best case, fewer responses to wait for, lowering the probability that
one of those members will be very slow. There are drawbacks to this - it makes
fault tolerance and provisioning harder to calculate - but it's got some cool
potential benefits.

Some cool ones to explore here:
[https://pdfs.semanticscholar.org/a243/7f18205414f6398b29c4f8...](https://pdfs.semanticscholar.org/a243/7f18205414f6398b29c4f86d59d76f27a200.pdf)
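One well-known example of quorums smaller than a majority (not necessarily what the linked paper proposes) is the grid quorum: lay out k*k nodes in a k x k grid and let a quorum be one full row plus one full column. Any quorum's row crosses any other quorum's column, so every pair still overlaps, but each quorum has only 2k - 1 members. A Python sketch with hypothetical names:

```python
from itertools import combinations


def grid_quorums(k):
    """Quorums for a hypothetical k x k grid of nodes numbered 0..k*k-1:
    each quorum is one full row plus one full column."""
    quorums = []
    for r in range(k):
        for c in range(k):
            row = {r * k + j for j in range(k)}
            col = {i * k + c for i in range(k)}
            quorums.append(frozenset(row | col))
    return quorums


k = 4
qs = grid_quorums(k)
# The row of any quorum crosses the column of any other at one cell,
# so every pair of quorums overlaps.
print(all(q1 & q2 for q1, q2 in combinations(qs, 2)))  # True
print(len(qs[0]), "per grid quorum vs", k * k // 2 + 1, "per majority")
# 7 per grid quorum vs 9 per majority
```

The catch matches the drawbacks above: if one node in every row is down, no full row is available and the grid has no live quorum, even though 12 of the 16 nodes (a majority) may still be up.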

~~~
elvinyung
I would argue that by the time you've chosen Paxos or some other majority
quorum commit protocol, you're already well aware that you're building a CP
system, and that availability and latency aren't your main concern. A majority
quorum is basically the most obvious (and somewhat brute force) way of
providing serializable consistency in the system.

The one non-majority quorum commit protocol that most people are probably
already familiar with is the "sloppy quorum" replication in Dynamo systems[1]
(e.g. Cassandra, Riak, Voldemort, etc.). Basically, since the quorum is
configurable on a per-cluster basis instead of being inherent to the protocol,
and usually isn't a majority of the cluster, the system can still make
progress when half of the nodes are unreachable. (But of course, as the paper
notes, this means that you need to resolve conflicts some other way, which
adds a whole bunch of complexity.)

1: [http://www.allthingsdistributed.com/files/amazon-dynamo-sosp...](http://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf)
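The Dynamo paper's knobs are N (replicas per key) and R and W (replicas that must answer a read or a write), and it notes that choosing R + W > N yields a quorum-like system. A brute-force Python sketch of why: it checks whether every possible read set overlaps every possible write set among the N replicas. It deliberately ignores the sloppy part (hinted handoff to nodes outside the original N), which is where that overlap guarantee can stop holding. The function name is made up.

```python
from itertools import combinations


def reads_see_writes(n, r, w):
    """Does every r-replica read set share a replica with every w-replica write set?"""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))


for n, r, w in [(3, 2, 2), (3, 1, 3), (3, 1, 1), (5, 2, 3)]:
    print(f"N={n} R={r} W={w}: R+W>N is {r + w > n}, "
          f"overlap guaranteed: {reads_see_writes(n, r, w)}")
```

For small N the brute-force answer always agrees with R + W > N, which is just the pigeonhole argument: R + W distinct slots can't fit into N replicas without sharing one.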

~~~
dllthomas
> you're already well aware that you're building a CP system, and that
> availability and latency aren't your main concern

Assuming you've chosen correctly between CP and AP approaches, this tells us
that availability and latency aren't as important as consistency. But there's
nothing that says they aren't arbitrarily close...

~~~
elvinyung
Yeah, definitely -- I agree that the decision doesn't mean you should blindly
throw away availability optimizations once you've decided that consistency is
important.

Actually, invoking CAP probably didn't add to my message. What I meant to say
is that people don't talk about non-majority quorum commits that much because
the interesting part is that serializability comes from majority, or more
generally _overlapping_, quorums.

~~~
dllthomas
As I read it, the comment you were replying to was still restricting its
discussion to overlapping quorums, and merely pointing out that that's not
actually synonymous with majority.

~~~
elvinyung
Fair enough :) I guess my mind latched onto the "it still seems under-
explored" part and wanted to try and respond to that.

------
elvinyung
I realize this is pseudocode, but I still feel like the bigger challenge is
not implementing a theoretically correct Paxos, but implementing a
production-ready one. The Chubby[1] team's experiences dealing with unexpected
complexity when using Paxos in production are probably pretty well known.

A choice quote: "While Paxos can be described with a page of pseudo-code, our
complete implementation contains several thousand lines of C++ code."

1: [https://static.googleusercontent.com/media/research.google.c...](https://static.googleusercontent.com/media/research.google.com/en//archive/paxos_made_live.pdf)

~~~
bastawhiz
When I hear about these algorithms taking many thousands of lines of code in a
"low-level" language like C or C++, I wonder how much of that could be
simplified away if you didn't need to manually manage memory. Performance
aside, how much of those "several thousand lines" would be unnecessary in a
higher-level language?

I implemented Raft in a couple hundred lines of succinct JavaScript a few
years ago. I'd imagine someone smarter than me could write a production-ready
Paxos implementation in fewer than a thousand well-commented lines of
JavaScript or Python.

~~~
elvinyung
> I implemented Raft in a couple hundred lines of succinct JavaScript

But is it production-ready? :)

None of the extra complications described in the paper were inherent to C/C++.
The paper covered things like leader leases, log compaction, handling disk
corruption, and group membership changes. None of these are intrinsic to Paxos
itself, but all of them are crucial for running it in production.
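To make one of those concrete, here is a rough Python sketch of leader-side lease bookkeeping, the kind of thing that lets a leader serve reads locally without a full Paxos round. It is not the paper's implementation; the class, the fixed drift margin and the injectable clock are all assumptions for illustration. The conservative choice is to start the lease clock when the request is sent, so the leader's view of the lease should end before the replicas' promises do.

```python
import time


class LeaderLease:
    """Hypothetical leader-side lease bookkeeping (not Chubby's code)."""

    def __init__(self, duration_s, max_drift_s, clock=time.monotonic):
        self.duration_s = duration_s
        # Assumed fixed safety margin for clock drift between machines.
        self.max_drift_s = max_drift_s
        self.clock = clock
        self.expires_at = None

    def start_request(self):
        """Call just before sending lease requests; returns the send time."""
        return self.clock()

    def on_majority_granted(self, requested_at):
        """A majority granted the lease requested at `requested_at`.
        Counting from the send time (not the grant time) is conservative."""
        self.expires_at = requested_at + self.duration_s - self.max_drift_s

    def can_serve_local_read(self):
        """Serve reads without a Paxos round only while the lease is safely valid."""
        return self.expires_at is not None and self.clock() < self.expires_at


# Usage with a fake clock so the timing is deterministic.
now = [100.0]
lease = LeaderLease(duration_s=10.0, max_drift_s=0.5, clock=lambda: now[0])
t = lease.start_request()
now[0] = 100.2                        # grants arrive a little later
lease.on_majority_granted(t)
print(lease.can_serve_local_read())   # True: 100.2 < 109.5
now[0] = 109.6
print(lease.can_serve_local_read())   # False: past the conservative expiry
```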

Another choice quote from the paper: "There are significant gaps between the
description of the Paxos algorithm and the needs of a real-world system. In
order to build a real-world system, an expert needs to use numerous ideas
scattered in the literature and make several relatively small protocol
extensions."

Also, a random data point: etcd's Raft implementation stands at about 4000
lines of Go right now, not including tests.

~~~
bastawhiz
> But is it production-ready? :)

Production-ready enough for my use case ;)

I also didn't mention Go in my post because, despite having managed memory,
it's syntactically verbose. Not a complaint, but all of the Go code I've
seen and written tends to be "taller and skinnier" (less dense?) than the code
I've seen and written in other languages like Scala or Python.

------
TuringTest
[https://en.wikipedia.org/wiki/Paxos_(computer_science)](https://en.wikipedia.org/wiki/Paxos_\(computer_science\))

------
bfung

        1  proposer(v):
        2    while not decided:
        2      choose n, unique and higher than any n seen so far

26 lines (two of them are numbered 2).

It's pseudocode, so it's not really only 26 lines: it needs supporting
functions to "choose n, unique and..." and other machinery to make updates to
the variables' state atomic.

Good way to explain the algo though.
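Two of those supporting pieces, sketched in Python under stated assumptions: proposal numbers made unique by pairing a round counter with the node's id (Python tuples compare lexicographically), and an acceptor whose handlers hold a lock so each check-and-update is atomic. The handler logic follows the standard single-decree Paxos acceptor rules; all names are hypothetical, and the state lives only in memory, whereas a real acceptor must persist n_p, n_a and v_a before replying.

```python
import threading


class ProposalNumbers:
    """Hypothetical generator of unique, increasing proposal numbers (round, node_id)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.highest_round_seen = 0

    def observe(self, n):
        """Record a proposal number seen in any incoming message."""
        self.highest_round_seen = max(self.highest_round_seen, n[0])

    def next_number(self):
        """Higher than any n seen so far; unique because node ids differ."""
        self.highest_round_seen += 1
        return (self.highest_round_seen, self.node_id)


class Acceptor:
    """Single-decree acceptor; the lock makes each check-and-update atomic.
    In-memory only: a real acceptor must persist its state before replying."""

    def __init__(self):
        self._lock = threading.Lock()
        self.n_p = None  # highest prepare seen
        self.n_a = None  # highest accept seen
        self.v_a = None  # value of that accept

    def prepare(self, n):
        with self._lock:
            if self.n_p is None or n > self.n_p:
                self.n_p = n
                return ("prepare_ok", n, self.n_a, self.v_a)
            return ("prepare_reject", n)

    def accept(self, n, v):
        with self._lock:
            if self.n_p is None or n >= self.n_p:
                self.n_p = self.n_a = n
                self.v_a = v
                return ("accept_ok", n)
            return ("accept_reject", n)


gen_a, gen_b = ProposalNumbers("a"), ProposalNumbers("b")
acc = Acceptor()
n1 = gen_a.next_number()     # (1, 'a')
print(acc.prepare(n1))       # ('prepare_ok', (1, 'a'), None, None)
print(acc.accept(n1, "x"))   # ('accept_ok', (1, 'a'))
gen_b.observe(n1)
n2 = gen_b.next_number()     # (2, 'b'), higher than (1, 'a')
print(acc.prepare(n2))       # ('prepare_ok', (2, 'b'), (1, 'a'), 'x')
print(acc.accept(n1, "y"))   # ('accept_reject', (1, 'a'))
```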

~~~
cortesoft
Number of lines is the most ridiculous metric anyway. Most languages have no
line length limit: just replace all newlines with semicolons and you have a
one-line program!

~~~
d0vs
> Most languages have no line length limit

Some languages do?

~~~
porges
("Free-format") Fortran has a max line length of 132 chars, up from ("fixed-
format") 72 chars on punch cards.

------
tromp
Curiously, Paxos takes exactly as many lines as a self-contained interpreter
for a pure functional programming language, written in C :-)

[http://www.ioccc.org/2012/tromp/tromp.c](http://www.ioccc.org/2012/tromp/tromp.c)

[http://www.ioccc.org/2012/tromp/hint.html](http://www.ioccc.org/2012/tromp/hint.html)

------
gosubpl
For me this shows the difference between the theoretical setting and what you
would want to do in practice. I have been following 6.824 (where this is
sourced from) to learn something about distributed systems programming, and it
was great fun to shed a lot of figurative sweat converting those 26 (actually)
lines into working "production" code. Hundreds of lines of code, because in
real life we have packet loss, network partitions, etc. The pseudo-code in the
link itself is correct; it just doesn't tell the whole story.

Now I am repeating that experience as an Akka project contributor
([http://akka.io/news/2017/03/17/akka-2.5.0-RC1-released.html](http://akka.io/news/2017/03/17/akka-2.5.0-RC1-released.html)),
working on getting delta-CRDTs into Akka. And again - what was a few lines of
pseudo-code in the original paper, or even tens of lines of real code in some
ideal setting ([https://github.com/CBaquero/delta-enabled-crdts](https://github.com/CBaquero/delta-enabled-crdts)),
is becoming literally thousands of lines of "production-grade" code.
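For a flavor of the "few lines" end of that spectrum, a minimal Python sketch of a delta-state grow-only counter: an increment returns a small delta instead of the whole state, and merging is a per-replica max, so deltas can be re-sent or applied in any order. This illustrates the general delta-CRDT idea only; it is not Akka's API or the linked library's code, and the class and method names are made up.

```python
class DeltaGCounter:
    """Hypothetical delta-state grow-only counter (one entry per replica id)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> that replica's total increments

    def increment(self, amount=1):
        """Apply locally and return only the changed entry as a delta."""
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        new = self.counts.get(self.replica_id, 0) + amount
        self.counts[self.replica_id] = new
        return {self.replica_id: new}

    def merge(self, delta_or_state):
        """Join by per-replica max: idempotent, commutative and associative."""
        for rid, n in delta_or_state.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self):
        return sum(self.counts.values())


a, b = DeltaGCounter("a"), DeltaGCounter("b")
d1 = a.increment()
d2 = a.increment(2)
b.increment()
b.merge(d2)        # deltas may arrive out of order...
b.merge(d1)
b.merge(d2)        # ...or more than once
print(b.value())   # 4: replica a contributed 3, replica b contributed 1
```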

Finally - I wholeheartedly recommend the 6.824 course to anyone interested in
distributed systems. Even if you don't like strong consistency, you'll learn a
lot about testing and debugging distributed systems, knowledge you can reuse
later in your career.

------
Maro
Here's a version I wrote in C++ for ScalienDB about 5-6 years ago; the
startup has since folded, so it's dead code:

[https://github.com/scalien/scaliendb/tree/master/src/Framewo...](https://github.com/scalien/scaliendb/tree/master/src/Framework/Replication)

Paxos: for replicating data

PaxosLease: for negotiating a lease, e.g. for the leader

Quorum: pluggable "majority" rules, not that important

ReplicatedLog: use Paxos for each append, initiated by leader

~~~
ww520
Pretty nice code.

------
barhun
Everyone, please read the following blog post before using any 'wow!'s in
your next exclamation:

RAFT Explained – Part 1/3: Introduction to the Consensus Problem
[http://container-solutions.com/raft-explained-part-1-the-con...](http://container-solutions.com/raft-explained-part-1-the-consenus-problem/)

"While Paxos can be described with a page of pseudo-code, our complete
implementation contains several thousand lines of C++ code."

------
jlebrech
I miss numbered code; cba scrolling up to a line and doing: end, return, space
space, delete, enter, mouse, space space, end.

Just type 8.5: code here

(a float, to insert between lines 8 and 9)

Also no nesting.

Then run a processor like gofmt that checks the format for you.

And use the directory structure for classes and methods: a directory is a
class, and a filename is a method.

------
jdnier
Or if you're interested in the opposite of pseudocode, here's a TLA+ spec for
Paxos.

Related to yesterday's TLA+ video post
[https://news.ycombinator.com/item?id=13918648](https://news.ycombinator.com/item?id=13918648)

~~~
algorithmsRcool
You forgot to include a link to the spec

~~~
jdnier
Ugh, so I did. Here's the link:
[https://github.com/bringhurst/tlaplus/blob/master/examples/P...](https://github.com/bringhurst/tlaplus/blob/master/examples/Paxos/MCPaxos.tla)

------
krat0sprakhar
Interesting! This should be renamed to Paxos Pseudo-code in 25 lines.

------
nojvek
Can someone explain Paxos to a layman? What is it even supposed to do?

------
hubert123
This is what I mean when I say that all scientific papers should have a
minimal, reproducible working sample with instructions attached. Let's say I
am interested in dam building with turbines in all its glory: one would
assume that this is really complex, cross-cutting tech, but I still firmly
believe that if you can't show me how to build a tiny sample dam that powers my
mobile phone or my computer, you haven't done your part to make your theory
sufficiently reproducible.

------
dpc_pw
Here's one in 1 line:

run_paxos()

~~~
mickronome
Very succinctly put, regardless of whether it is a function call or a builtin
statement :)

I have used something similar to defuse endless arguments about which language
is more expressive, or better, and turn them into a more productive discourse.
I simply make a tentative assertion that there is a perfect language for every
problem, one where only one line of code is needed to solve the problem. It
reads as follows: doit

Then I follow up with stating that the language is probably rather useless for
anything else.

I don't know why it usually works to open up the discussion; it seems to me
such a trivial and obvious observation, but apparently it's a perspective many
people rarely arrive at without prompting.

I'm well aware that 'doit' can't really be considered a language, except in a
very limited sense; it can also simply be a function call, which maybe helps
bring into focus the intersection between languages, libraries, and their
relative applicability to the task that needs solving and the environment it
must be solved in.

Trivial and obvious, but somehow deeply at the heart of writing the correct
code to solve a particular problem, because everything is a tradeoff somewhere
between extremes.

