
Ask HN: Recommendations for a book on Distributed Systems? - bogomipz
I wanted to ask what book(s) on Distributed Systems people have read and would recommend. Both the Coulouris book and the Tanenbaum book seem to be pretty standard, but the reviews and ratings on these seem to be underwhelming:

https://www.amazon.com/Distributed-Systems-Principles-Paradigms-2nd/dp/0132392275/

https://www.amazon.com/Distributed-Systems-Concepts-Design-5th/dp/0132143011
======
MediumD
It's not released yet, but I've been reading the early release version of
Designing Data-Intensive Applications by Martin Kleppmann
([http://shop.oreilly.com/product/0636920032175.do](http://shop.oreilly.com/product/0636920032175.do)).
I've found it pretty useful and well-written thus far. He does a good job of
explaining concepts and then tying them to real-world implementations and
examples. It's a good balance of theory and practical knowledge.

~~~
mariokostelac
This is definitely a must-read. It takes a very systematic approach to
explaining different concepts, with real examples, the problems each solution
brings, and the tradeoffs we make by choosing approach A vs. approach B. Not
sure I can recommend any technical book more than this one!

------
justicezyx
Reading all of Lamport's papers is a one-stop shop for the theoretical aspect
of distributed systems. He has a page [0] that collects short comments on his
own papers; start from that page to get an overview.

0\. [http://research.microsoft.com/en-us/um/people/lamport/pubs/p...](http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html)

------
davidw
These are some decent notes:

[http://videlalvaro.github.io/2015/12/learning-about-distribu...](http://videlalvaro.github.io/2015/12/learning-about-distributed-systems.html)

[http://dancres.github.io/Pages/](http://dancres.github.io/Pages/)

[http://book.mixu.net/distsys/single-page.html](http://book.mixu.net/distsys/single-page.html)

There does not seem to be a lot out there that bridges the gap between very
theoretical papers and real world usage. There's also a lot of handwaving
about how "distributed systems are hard!", which is absolutely not telling me
something I did not already know.

------
filereaper
We used Coulouris in our Distributed Systems course back in 2010; it covers
all the fundamentals used in today's modern systems.

Things like ring quorums are implemented by Cassandra and other systems. I
just didn't know about those systems (i.e. Cassandra et al.) back then, so I
couldn't put a face to the name.
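To make "ring quorums" a little more concrete, here is a minimal Python sketch under a simplified model I'm assuming for illustration: nodes sit on a consistent-hash ring, a key is stored on the next N nodes clockwise from its hash, and a read quorum R and a write quorum W are guaranteed to overlap only when R + W > N. The `Ring` class, the `token` helper, and the use of MD5 for placement are hypothetical choices for this sketch, not Cassandra's actual partitioner or API.

```python
import bisect
import hashlib
from typing import List


def token(value: str) -> int:
    """Place a string on a 64-bit ring. MD5 is an illustrative choice only."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class Ring:
    """A toy consistent-hash ring: one token per node, no virtual nodes."""

    def __init__(self, nodes: List[str]):
        # Sort (token, node) pairs so we can walk the ring clockwise.
        self._points = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._points]

    def replicas(self, key: str, n: int) -> List[str]:
        """Return the n nodes found walking clockwise from the key's token."""
        if n > len(self._points):
            raise ValueError("replication factor exceeds node count")
        # First node whose token is strictly greater than the key's token;
        # the modulo wraps past the highest token back to the start.
        start = bisect.bisect_right(self._tokens, token(key))
        return [self._points[(start + i) % len(self._points)][1] for i in range(n)]


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Any r replicas and any w replicas out of n are guaranteed to share
    at least one replica if and only if r + w > n (pigeonhole principle)."""
    return r + w > n


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    print(ring.replicas("user:42", 3))  # three distinct nodes
    print(quorums_overlap(3, 2, 2))     # True
    print(quorums_overlap(3, 1, 1))     # False
```

With N = 3, reading and writing at R = W = 2 means every read touches at least one replica that saw the latest write, while R = W = 1 gives no such guarantee. Real systems add virtual nodes, replica placement strategies, and repair mechanisms on top of this basic idea.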

One thing to do is to start reading papers and branch out from there. There
are a few major ones, like the MapReduce paper, Bigtable, Dremel, Raft, perhaps
Paxos, etc., and you can use their citations to deepen your understanding.

Also, there are plenty of people here who can always answer questions. :)

Cheers.

~~~
_dark_matter_
I concur with reading papers - we never used textbooks. I've linked to the
course readings in [0].

[0]
[http://pages.cs.wisc.edu/~swift/classes/cs739-fa14/wiki/pmwi...](http://pages.cs.wisc.edu/~swift/classes/cs739-fa14/wiki/pmwiki.php/Main/ReadingList)

------
sergiotapia
I recommend The Little Elixir & OTP Guidebook.

[https://www.manning.com/books/the-little-elixir-and-otp-guid...](https://www.manning.com/books/the-little-elixir-and-otp-guidebook)

It's easier to build distributed apps with Elixir because of Erlang's OTP.
Check it out, I'm sure it'll be interesting for you.

~~~
davidw
Erlang (and Elixir) are great systems, but they're not a magic bullet - you
still need to know what you're doing when designing a distributed system.

~~~
macintux
Agreed. They give you much more powerful concurrency primitives than most
languages, but there are still a host of challenges to overcome.

------
ubercow
I really enjoyed Distributed Systems for Fun and Profit. [0]

0: [http://book.mixu.net/distsys/single-page.html](http://book.mixu.net/distsys/single-page.html)

------
adamnemecek
To add to what others have said, I've found "Designing for Scalability with
Erlang/OTP: Implement Robust, Fault-Tolerant Systems"
[https://www.amazon.com/Designing-Scalability-Erlang-OTP-Faul...](https://www.amazon.com/Designing-Scalability-Erlang-OTP-Fault-Tolerant/dp/1449320732/ref=as_li_ss_tl?_encoding=UTF8&psc=1&refRID=8KVM0G4T0EQR2C9XH2F9&sa-no-redirect=1&linkCode=ll1&tag=adamnemecek03-20&linkId=f33dc06050b9ce619e48627a26b28df4)
to be pretty useful even though I don't care about Erlang (that much), because
the last 4 chapters are about designing scalable and fault-tolerant
distributed systems in generic terms that apply even to non-Erlang systems.

Also, as a side note, does anyone know why the Amazon categories are so bad?
[http://prntscr.com/dr3itj](http://prntscr.com/dr3itj) This book is classified
under "javascript".

You might also find this book helpful:
[https://www.amazon.com/Systems-Performance-Enterprise-Brenda...](https://www.amazon.com/Systems-Performance-Enterprise-Brendan-Gregg/dp/0133390098/ref=as_li_ss_tl?sa-no-redirect=1&linkCode=ll1&tag=adamnemecek03-20&linkId=b8efbf8af9791d145623c92ebf3711e8),
since it's essentially about how to measure the performance of, and debug,
computers in the cloud, which is the annoying part of distributed systems.

Also keep in mind that some books are specifically about building
distributed systems with a particular framework, e.g. this one on using Akka on
the JVM: [https://www.amazon.com/Akka-Action-Raymond-Roestenburg/dp/16...](https://www.amazon.com/Akka-Action-Raymond-Roestenburg/dp/1617291013/ref=as_li_ss_tl?s=books&ie=UTF8&qid=1483469962&sr=1-1&keywords=akka+scala&linkCode=ll1&tag=adamnemecek03-20&linkId=0e5d32cc30b59064afe99c4fd59fb035)

------
aduffy
Not exactly a book, but Yale CS hosts a set of notes from one of their
courses, almost 500 pages long.

[http://cs-www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf](http://cs-www.cs.yale.edu/homes/aspnes/classes/465/notes.pdf)

------
macintux
Haven't updated this in quite a while, but here's a list of reading lists +
relevant books.

[https://gist.github.com/macintux/6227368](https://gist.github.com/macintux/6227368)

------
davidrupp
I like [https://mitpress.mit.edu/books/distributed-algorithms](https://mitpress.mit.edu/books/distributed-algorithms).

------
morazow
I would suggest "Introduction to Reliable and Secure Distributed Programming"
([https://www.amazon.de/Introduction-Reliable-Secure-Distribut...](https://www.amazon.de/Introduction-Reliable-Secure-Distributed-Programming/dp/3642152597/)).

I never read the book, but I took a course based on it with the author. It
was fun and interesting, covering both basic and advanced concepts in
distributed systems.

~~~
johndubchak
I have the book and am currently reading it. I like it; it seems very
thorough. The first part is fairly theoretical, and the 2nd part gets into the
more practical implementation of distributed systems based on the theory
covered earlier.

I'd recommend it.

------
sangaya
This was my favorite back in 2008. If I recall correctly, the fundamentals
it covers should still apply.

Andrew S. Tanenbaum and Maarten van Steen, Distributed Systems: Principles and
Paradigms (2nd Edition), ISBN-13: 978-0132392273, ISBN-10: 0132392275

------
kartiksura
I find the Redis Cluster spec one of the simplest explanations of clustering,
HA, and leader election.
[https://redis.io/topics/cluster-spec](https://redis.io/topics/cluster-spec)
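One piece of that spec that is easy to try out is how keys are assigned to hash slots: the spec defines HASH_SLOT = CRC16(key) mod 16384 using the XMODEM variant of CRC16, and it honors "hash tags", so that when a key contains a non-empty `{...}` section, only the text between the first `{` and the next `}` is hashed. Below is a minimal Python sketch of just that rule; the function names are my own, and it doesn't cover failover or leader election.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16, XMODEM variant: polynomial 0x1021, initial value 0,
    no bit reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: bytes) -> int:
    """Map a key to one of 16384 slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Hash only the tag if a '}' follows the first '{' and the tag is
        # non-empty; otherwise the whole key is hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key) % 16384


if __name__ == "__main__":
    print(hash_slot(b"123456789"))  # 12739, i.e. 0x31C3
    # Same hash tag, so both keys land in the same slot.
    print(hash_slot(b"{user1000}.following") == hash_slot(b"{user1000}.followers"))  # True
```

Hash tags are what let multi-key operations work in a cluster: keys that share a tag are guaranteed to live in the same slot, and therefore on the same node.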

------
panzagl
Reviews for textbooks (which these are) are always underwhelming due to the
uneven quality of CS classes and students.

~~~
bogomipz
But there are books like CLRS and SICP that are also textbooks and are
almost universally recommended.

~~~
panzagl
24% of the Amazon reviews for SICP are one-star. I probably should have
specified 'popular reviews', as obviously the reviews in professional journals,
etc. will be different from a place like Amazon, where you have undergrads
ranting about their abysmal distributed systems class and the $100 book the
professor never used.

------
contingencies
pacemaker/corosync documentation @
[http://clusterlabs.org/](http://clusterlabs.org/)

nontrivial distributed systems are like crypto: don't roll your own primitives
unless you are a masochist, fetishist, time-rich or forced to.

