
Distributed systems theory for the distributed systems engineer - shakkhar
http://the-paper-trail.org/blog/distributed-systems-theory-for-the-distributed-systems-engineer/
======
catwell
For people who are not already into distributed systems but want to get started, I
blogged my (very) short list of things to read last year [1].

Today I would add a fifth item to that list: "Why Logical Clocks are Easy",
which is one of the best explanations of causality I have seen so far [2].

[1] [https://blog.separateconcerns.com/2015-07-07-four-easy-reads...](https://blog.separateconcerns.com/2015-07-07-four-easy-reads-distsys.html)

[2]
[http://queue.acm.org/detail.cfm?id=2917756](http://queue.acm.org/detail.cfm?id=2917756)
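For anyone who wants a concrete picture of what "causality" means here, this is a minimal sketch of vector clocks in Python. It is not taken from either linked article; the `Process` class and the helper names `leq`, `happened_before` and `concurrent` are made up for illustration. Each process keeps a counter per process name, and one event happened before another only if its stamp is less than or equal in every entry and the stamps differ.

```python
class Process:
    """Hypothetical process that stamps its events with a vector clock."""

    def __init__(self, name):
        self.name = name
        self.clock = {}  # process name -> number of events seen from it

    def tick(self):
        # A local event (or a send): bump our own entry, return a copy as the stamp.
        self.clock[self.name] = self.clock.get(self.name, 0) + 1
        return dict(self.clock)

    def receive(self, stamp):
        # Merge the sender's knowledge entry by entry, then count the receive itself.
        for key, count in stamp.items():
            self.clock[key] = max(self.clock.get(key, 0), count)
        return self.tick()


def leq(a, b):
    # a <= b when every entry of a is matched or exceeded in b (missing means 0).
    return all(count <= b.get(key, 0) for key, count in a.items())


def happened_before(a, b):
    # a causally precedes b: a <= b, and b is not <= a (so they differ).
    return leq(a, b) and not leq(b, a)


def concurrent(a, b):
    # Neither stamp dominates the other: the events are causally unrelated.
    return not leq(a, b) and not leq(b, a)
```

If process `a` stamps an event and sends it to `b`, the stamp `b` produces on receipt compares as later than `a`'s, while two events on processes that never exchanged messages compare as concurrent.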

~~~
hyperpape
Then add it ;)

We can't keep everything we ever write up to date, but there's little point in
reading someone's 2015 list of what to read when they've decided there's a
good addition in 2016.

~~~
catwell
You are right, I will.

------
Jupe
Practically speaking, I've learned much more from Aphyr's Jepsen Test framework
(and write-ups about test results) than from any other single source.

Ref: [https://aphyr.com/tags/Jepsen](https://aphyr.com/tags/Jepsen)

------
krat0sprakhar
I recently took a distributed systems course
([https://roxanageambasu.github.io/ds2-class/](https://roxanageambasu.github.io/ds2-class/))
in school and our professor referred us to Prof Steve Gribble's videos which,
IMHO, are extremely informative and fun to listen to.

Couldn't recommend it more -
[http://courses.cs.washington.edu/courses/csep552/13sp/video/](http://courses.cs.washington.edu/courses/csep552/13sp/video/)

Class Webpage -
[http://courses.cs.washington.edu/courses/csep552/13sp/](http://courses.cs.washington.edu/courses/csep552/13sp/)

------
hyperpape
I'm not sure how it would read to someone who hasn't been previously reading
anything on the subject, but I like aphyr's notes on a two day course on
distributed systems as a high level overview of the topics involved:
[https://github.com/aphyr/distsys-class](https://github.com/aphyr/distsys-class).

------
marinabercea
Great timing and submission, thank you for posting! I've been meaning to get
more in depth knowledge on distributed systems, but despite having access to
several academic (text)books, I felt overwhelmed and didn't know where to
start exactly and what sub-topics I might want to focus on.

Just downloaded 'Distributed Systems for Fun and Profit', a book recommended
in the article, and sent it to my Kindle. It's a free PDF written by an
engineer currently working for Stripe. It's only 62 pages and doesn't feel
intimidating!

------
sciurus
I'm looking forward to the publication of Martin Kleppmann's book Designing
Data-Intensive Applications.

[http://shop.oreilly.com/product/0636920032175.do?sortby=publ...](http://shop.oreilly.com/product/0636920032175.do?sortby=publicationDate)

~~~
ibash
Join safari books online and start reading it -- it's good.

~~~
nazgob
I want to, but I never read incomplete books. And it's still missing a few
chapters.

------
davidw
> But I’ve come to thinking that recommending a ton of theoretical papers is
> often precisely the wrong way to go about learning distributed systems
> theory (unless you are in a PhD program). Papers are usually deep, usually
> complex, and require both serious study, and usually significant experience
> to glean their important contributions and to place them in context. What
> good is requiring that level of expertise of engineers?

Bingo! We need some "O'Reilly style" distributed systems material. Most of us
are not going to be designing new algorithms, but plugging in various pieces.
A generic understanding of those pieces, where they work well, and
when to actually go to the research is kind of missing right now in that
world.

Some other links that people might find interesting:

[http://videlalvaro.github.io/2015/12/learning-about-distribu...](http://videlalvaro.github.io/2015/12/learning-about-distributed-systems.html)

[http://book.mixu.net/distsys/single-page.html](http://book.mixu.net/distsys/single-page.html)

[http://dancres.github.io/Pages/](http://dancres.github.io/Pages/)

~~~
collyw
Pretty much the same as most aspects of IT. How often does anyone write their
own sort routine? How often does that sort of thing get asked in interviews?

------
craigching
Probably one of the more used books (by universities) on the topic is
"Distributed Systems: Principles and Paradigms" by Tanenbaum and van Steen. I
just finished a class that used this book and I understand that there are
criticisms of it, but it did seem to me to be reasonable given the breadth of
the subject. And most, if not all, of those papers are covered to some degree
in this book.

Something I'm looking forward to: Pearson has returned the copyright of the
book to the authors and they are supposedly updating it. Could be interesting:
[http://www.distributed-systems.net/index.php?id=distributed-...](http://www.distributed-systems.net/index.php?id=distributed-systems-principles-and-paradigms)

The main web site says the 3rd edition is nearing completion.

------
johnbender
Can anyone familiar with the linked material comment on whether there is a
standard model used in the proofs there and in the DS literature?

I'm thinking of something like Lamport's global time model from "On
interprocess communication".

~~~
einarvollset
No there is not.

------
dschiptsov
MIT biology courses teach very fine distributed systems theory.)

------
rollulus
> Gwen Shapira, SA superstar and now full-time engineer at Cloudera [...]

Gwen is at Confluent, the Kafka company. Doing a great job there!

~~~
kod
The post in the OP is from 2014

------
einarvollset
(Before you down vote: I have a PhD in distributed systems and fault
tolerance. Okay, now you can down vote for the douchebaggery of this prescript)

I think a fundamental and very underrated paper and concept (which actually
predates Paxos, yet Lamport ignored or was unaware of) is the notion of
randomized consensus protocols. Simpler than "structured" leader type
algorithms. Believe Ben Or's algorithm was first.

~~~
mjb
> Believe Ben Or's algorithm was first.

Ben-Or's "Another Advantage of Free Choice" beat Rabin's "Randomized Byzantine
Generals" by a couple of months in 1983. These algorithms show how much people
over-extend results like FLP. The result is about a very particular system
model, and the addition of even a very tiny extra piece (in Ben-Or's case, a
random oracle) makes the consensus problem possible again.
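To make that concrete, here is a toy Python sketch in the spirit of Ben-Or's binary consensus for the crash model with f < n/2. It is not the paper's protocol verbatim. The function name `ben_or` and its parameters are made up; real asynchrony is replaced by lock-step rounds in which each process hears a random n - f of the n messages; nobody actually crashes; and the randomness is a local coin flip per process.

```python
import random
from collections import Counter


def ben_or(inputs, f, seed=0, max_rounds=1000):
    """Toy round-based simulation of Ben-Or-style binary consensus."""
    n = len(inputs)
    assert 2 * f < n, "this sketch assumes f < n/2"
    rng = random.Random(seed)
    x = list(inputs)        # current preference of each process
    decided = [None] * n    # decision of each process, once it has one

    for _ in range(max_rounds):
        # Phase 1: everyone broadcasts its preference. Each process hears only
        # n - f of them and proposes v if more than n/2 processes reported v.
        proposals = []
        for _p in range(n):
            heard = rng.sample(x, n - f)
            value, count = Counter(heard).most_common(1)[0]
            proposals.append(value if 2 * count > n else None)  # None is "?"

        # Phase 2: everyone broadcasts its proposal. Two strict majorities for
        # different values cannot coexist, so all real proposals agree.
        for p in range(n):
            heard = [v for v in rng.sample(proposals, n - f) if v is not None]
            if heard:
                x[p] = heard[0]              # adopt the proposed value
                if len(heard) >= f + 1:
                    decided[p] = heard[0]    # enough support: decide
            else:
                x[p] = rng.randint(0, 1)     # no proposal seen: flip a coin

        if all(d is not None for d in decided):
            return decided
    raise RuntimeError("no decision within max_rounds")
```

When every input is the same, every process decides that value in the first round. With mixed inputs, the coin flips eventually line up on a majority and all processes decide the same bit, which is the probabilistic termination that a deterministic protocol cannot offer in the FLP model.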

I wouldn't say that these algorithms were really ignored by Lamport when he
wrote the Paxos paper. Again, they're solving a different problem in a
different system model. If you want to pick on Lamport, talk about Liskov's
Viewstamped Replication.

If anybody has a digital copy of Ben-Or's paper that isn't partially cut off,
please make it available. Both the copy in the ACM library and the only copy
the author himself has are missing some of the right hand side.

~~~
einarvollset
I disagree - an ex-colleague at Cornell wrote a paper proving equivalence.
Will have to dig that up..

