
Distributed systems for fun and profit (free ebook) - phiggy
http://book.mixu.net/distsys/
======
peterwwillis
You can get more practical information by reading the Wikipedia page on
distributed computing. There are many different models of distributed systems
that this ebook doesn't touch on. For example, several of the systems I'm
familiar with don't require consensus, consistency or time ordering (or
those issues are resolved based on certain assumptions about the model, data,
application, etc.).

~~~
mixu
Any chance you could be more specific as to what you feel is missing in the
book?

Granted, "distributed systems" is an enormous topic that no book can cover
fully, but I have tried to cover things like:

\- key papers (Lamport; Fischer, Lynch and Paterson; Chandra and Toueg etc.)

\- topics relevant to highly successful commercial systems (e.g. 2PC => *SQL
systems, Paxos => GFS/Chubby, ZAB => Zookeeper, Dynamo =>
Riak/Voldemort/Cassandra)

\- and recent topics such as CRDTs and the CALM theorem.

Having a sense of how time, consistency and fault tolerance have been
explained and handled is (I think) a prerequisite to more advanced topics, but
I'd be interested in hearing which parts you feel need improvement. Some day
(probably some years from now) I will revise the book, and it would be nice
to have a solid list of issues to address.

~~~
chubot
I'm not sure exactly what the grandparent comment meant, but I think I have an
idea. I only skimmed the contents so take this with a grain of salt.

Your book is focusing on a pretty narrow part of distributed computing. I
would rename it "Managing State in Distributed Systems", or "Distributed
Storage Systems". Your examples are Bigtable and Dynamo, which fall in this
category.

The book seems to be aimed at sort of a "beginning" audience. But the topics
are inappropriate for a beginning audience, and skewed for an expert audience.

Real distributed systems try to be stateless wherever possible. You need "big
computer science" to manage state in distributed systems, but most code in a
distributed system should not manage state. These techniques should be
confined to specialized storage systems.

Here are some examples of real world distributed systems that don't use the
described techniques to manage state:

    
    
      - clusters of stateless web servers + single master database (99%+ of websites people use)
      - message queue / work queue.  A single machine can productively manage 1,000 - 10,000 stateless workers, depending on the workload.
      - MapReduce
      - Original GFS
      - Napster
      - BitTorrent (tracker and trackerless would be interesting to write about)
      - Bitcoin
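
As a rough illustration of the "message queue / work queue" item above, here is a minimal single-process sketch in Python. It is not taken from the book or from any particular system: `handle`, `worker` and `run` are hypothetical names, threads stand in for separate worker machines, and the in-memory `queue.Queue` stands in for a real queue server. The point is only that the workers keep no state between tasks, so any worker can take any task.

```python
import queue
import threading


def handle(task: int) -> int:
    """Hypothetical unit of work: a pure function of its input, no stored state."""
    return task * task


def worker(tasks: queue.Queue, results: queue.Queue) -> None:
    """Pull tasks until a None sentinel arrives; nothing is remembered between tasks."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel: no more work for this worker
            return
        results.put((task, handle(task)))


def run(inputs, num_workers: int = 4) -> dict:
    """The single coordinator owns the queue; workers are interchangeable."""
    tasks: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for item in inputs:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        key, value = results.get()
        collected[key] = value
    return collected


if __name__ == "__main__":
    print(run(range(5)))
```

Because a worker holds nothing between tasks, a crashed worker can simply be replaced and its task re-queued; no consensus protocol is involved in this sketch.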
    

The title seems to imply a practical bent, but the book reads more like a
collection of ideas, which are important and interesting but not really what
engineers need to know. (IMO the #1 skill for distributed computing is to be
competent at BOTH programming a single computer and system administration.)

If I wanted to be harsh, I would say it looks like you read a bunch of stuff
and didn't work with it or implement it? At the very least, the ideas don't
seem to be put in the context of commonly deployed distributed systems.

People need to understand these simpler, more robust, and more performant
techniques, and how to apply them to their specific problem domain, rather
than blindly throwing consensus at every problem (which is a disturbing trend
I've seen).

~~~
throwawaykf03
It goes even beyond that. A lot of other very important, fundamental topics
belong under the umbrella of distributed systems, starting with routing. The
Internet is, after all, a giant distributed routing system.

Another topic that's huge all by itself is peer-to-peer networks, and all
their associated aspects, such as structured (DHTs like Chord, Cassandra,
etc.) vs unstructured (Gnutella, Kazaa, etc.), P2P search, handling churn,
handling peers with heterogeneous capabilities, peer selection, topology
organization, decentralized routing, file-sharing (torrents) vs streaming
(PPLive, Spotify), etc.

Other topics (with several overlapping aspects) include:

\- Security, such as Sybil attacks, group key management, etc;

\- Overlay networks;

\- CDNs;

\- Ad hoc and mesh networks;

\- MMOs and multiplayer games;

\- SCADA and industrial control systems;

\- Pub/Sub systems and application layer multicast;

\- Distributed file systems;

\- Load balancing and bandwidth management;

And that's just off the top of my head... I'm sure I'm missing other important
topics.

~~~
mixu
Indeed, but ultimately covering all of those topics would require an
incredible amount of time and effort. So I need to pick and choose my battles
as some topics are more important or interesting to me than others. :)

~~~
throwawaykf03
Completely understand. But as chubot suggests, the topic of "Distributed
Systems" is really broad, and something narrower in the title, such as
"Distributed Data Systems", may be more apt.

------
abofh
As a 'bottom up' interviewer, I'm often criticized for the same approach
you're taking. You might do well to start at the highest level (and
pseudocode) and work your way down to the clock.

In fact, I'd start with bad examples of temporal messaging and work down from
there: ("Debit 250", to "Debit account A 250", to "Debit account A for 250 at
timestamp Y", to "Debit account A for 250 at signed timestamp Y") -- showing
bad is often more effective than explaining good.
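
One way to picture the last step of that progression ("Debit account A for 250 at signed timestamp Y") is the Python sketch below. Everything in it is an assumption made for illustration: the `Debit` fields, the shared `SECRET_KEY`, and the choice of HMAC-SHA256 over the whole message (rather than over the timestamp alone) are not from the comment or the book.

```python
import hashlib
import hmac
import json
from dataclasses import asdict, dataclass

# Hypothetical shared secret; a real system would manage keys very differently.
SECRET_KEY = b"demo-shared-secret"


@dataclass(frozen=True)
class Debit:
    """'Debit account A for 250 at timestamp Y' as an explicit message."""
    account: str
    amount: int
    timestamp: int  # assumed here to be seconds since the Unix epoch


def sign(msg: Debit, key: bytes = SECRET_KEY) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of the message."""
    payload = json.dumps(asdict(msg), sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(msg: Debit, signature: str, key: bytes = SECRET_KEY) -> bool:
    """Check the tag in constant time; any changed field invalidates it."""
    return hmac.compare_digest(sign(msg, key), signature)


if __name__ == "__main__":
    original = Debit(account="A", amount=250, timestamp=1700000000)
    tag = sign(original)
    print(verify(original, tag))
    print(verify(Debit("A", 250, 1700000001), tag))
```

The earlier, "bad" messages in the progression correspond to dropping fields: without `account` the debit is ambiguous, without `timestamp` a replayed message cannot be told apart from a new one, and without the tag the receiver has to trust whatever timestamp it is given.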

------
tantalor
The ePub chapter titles are "The third chapter", etc. Would be better if they
used the actual chapter titles, like "Time and order".

[http://i.imgur.com/T6YtJ0F.png](http://i.imgur.com/T6YtJ0F.png)

