
What We Talk About When We Talk About Distributed Systems - ra7
http://videlalvaro.github.io/2015/12/learning-about-distributed-systems.html
======
maglite77
Also, remember that even building a "simple" web app (comprised of a browser
(client), web server, and database) counts as a distributed system. Concerns
over event ordering, lack of global time, and data consistency models are all
alive and well even within boring LOB systems.

To put it another way, distributed system does not necessarily imply a massive
research cluster connected via MPI.

------
pron
A nice survey of the topic can be found in _Total Order Broadcast and
Multicast Algorithms: Taxonomy and Survey_ , Xavier Défago, André Schiper,
Péter Urbán, 2004[1]. Interestingly, the unexpected fame of the CAP theorem
and especially its various misinterpretations has done some harm to this field
of distributed systems when people started believing that eventual consistency
is the only way to achieve the scalability requirements of their applications.

On the JVM, the best group communication infrastructure is the terrific
JGroups library[2]. On .NET (and soon the JVM, too) there's Vsync (Isis 2)[3].
For C/C++ (with Java and Python binding), there's Spread[4]. All three are
heavily inspired by the work of (or, in the case of Vsync/ISIS, directly
created by) Ken Birman of Cornell, and implement group membership and leader
election, atomic multicast, failure detection and more.

[1]:
[http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102....](http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.3146)

[2]: [http://jgroups.org/](http://jgroups.org/)

[3]: [http://vsync.codeplex.com/](http://vsync.codeplex.com/)

[4]: [http://www.spread.org/](http://www.spread.org/)

~~~
monocasa
I can't speak for JGroups (definitely going to check that out tonight), but
spread and Vsync/ISIS both struck me as perfect examples of the meme of poor
quality academic code.

Spread is an #ifdef soup containing scary stuff like "#ifndef _REENTRANT" in
their hello world examples. Additionally it has a weird not quite BSD licence
that doesn't look compatible with the GPL. It also makes you send them either
an ssh key to checkout from subversion, or send them email/name/company info
just to get the download link working.

Vsync/ISIS is a single 50KLOC C# file that spawns upwards of a dozen threads
on init.

IMHO there's room for someone with more of a software engineering background
to come in and clean them up (assuming that this person/group knows enough
about distributed systems to clear things up without introducing more
problems).

~~~
yid
> spread and Vsync/ISIS both struck me as perfect examples of the meme of poor
> quality academic code.

I was about to say that academic code is almost always poor quality, but that
should never, ever apply to _distributed systems_ code. If there's one thing
we should see from academic distributed systems research, it's principled,
well-constructed, solidly tested libraries, rather than wannabe commercial
databases written by graduate students.

------
gamapuna
[http://nil.csail.mit.edu/6.824/2015/schedule.html](http://nil.csail.mit.edu/6.824/2015/schedule.html)
MIT distributed systems course is pretty good IMO.

~~~
rajathagasthya
Do you know if they have lecture videos for this course?

~~~
v1ct0r
Cloud Computing Concept on Cousera is really good. I've learned a lot from
there.

~~~
rajathagasthya
Yes, it's really good too.

------
jamesblonde
Much of the theory of Distributed systems is published at Principles of
Distributed Systems (PODC). However, a lot of emphasis there is on the
asynchronous model. Results that hold for the asynchronous model generally
hold for other models. The main point a layman needs to know about the
asynchronous model is that a slow node cannot be distinguished from a dead
node. Implementers of distributed systems will generally ignore most results
on the asynchronous model. A slow node might as well be a dead node, so we can
use heartbeats and timeouts to exclude these nodes. Typically, robust
distributed systems in industry will follow some form of partially synchronous
model, adapted to the type of environment it works in: LAN on baremetal =>
reliable, almost synchronous when not overloaded; WAN (Internet) => as close
to the asynchronous model as is practical.

------
jlas
I appreciate the Carver reference
[https://en.wikipedia.org/wiki/What_We_Talk_About_When_We_Tal...](https://en.wikipedia.org/wiki/What_We_Talk_About_When_We_Talk_About_Love)

~~~
jessaustin
At this point it might be more of a reference to _Birdman_.

~~~
cdcarter
I came to start making the references via Murakami, personally.

------
buckbova
I enjoy reading on distributed systems, but does anyone have suggestions for
practical blogs/books/articles on the subject with directly applicable
information?

~~~
thecage411
Distributed Systems for Fun and Profit is pretty good:
[http://book.mixu.net/distsys/single-
page.html](http://book.mixu.net/distsys/single-page.html).

Focuses on theory results that are applicable to real systems and analyzes
those systems.

------
davidw
Nice to see this, and completely agree with the sentiment. There is a lot of
academic stuff, which is great but not always easy to evaluate or apply, and
there is also a lot of harrumphing about how distributed systems are really
hard without going into much in the way of details about how to actually
evaluate and approach the problem.

~~~
sbilstein
Perhaps one of the consequences of getting information about programming
topics mostly through twitter or other quick one take sources ;)

More seriously, when I started digging into distributed systems I just started
picking up papers and reading and asking questions. It was definitely a
meandering path until I synthesized, maybe after 1 year, what it was all
about. Reading Paxos one week and DyanmoDB the next week made it tough to see
the bigger picture. I think this survey post would have been really helpful to
me in the past.

~~~
davidw
> It was definitely a meandering path

Which is fine, but not everyone has the time or inclination for that.
Sometimes academic papers leave out lots of important practical details, or
don't make a good connection with how a system might look in the real world.

I think there's some room for a few good books that fill that space, although
perhaps they exist and I'm not aware of them.

------
Myrmornis
I have a question, hopefully it doesn't sound too stupid.

Basically my question is: do you have any recommendations for reading about
"concurrency" but at a higher level than "concurrent programming" and less
formal than "distributed systems" algorithms? Here's an attempt at explaining
what I mean:

Suppose I worked for a company which mostly writes web apps in python. They
don't write much multi-threaded code. They don't write native desktop
graphical apps or do "systems programming". So books about "concurrent
programming" with threads, or in go/scala/clojure/erlang etc seem,
superficially, slightly irrelevant. Also, books about formal distributed
systems algorithms seem, superficially, slightly irrelevant. But "concurrency"
is still highly relevant! It's just at a higher level than is meant by
"concurrency" in a programming language and it's not as formal or regular as
is usually meant by "distributed systems".

What I mean is: the company handles concurrent HTTP requests resulting in
concurrent database transactions; celery tasks might also be running at the
same time; message-queue type problems come up frequently; they are
increasingly tending towards a "service-oriented" architecture which is
inherently concurrent; and, at the end of the day, all the higher-level
"business logic" is some sort of emergent property of a complex, and
concurrent, underlying system of apps communicating via HTTP, with
asynchronous background tasks, etc. Wise senior programmers say: "Oh, you're
going to need to apply some row-locking there.". Wide-eyed junior programmers
say: "Oh yes, of course" and wonder, "What should I be reading to know this?".

What should they be reading?

Also, how the hell do people diagram this stuff in order to talk about it?
Google searches tend to end up either in UML languages that seem to be
unfashionable (but might in fact be just the right thing?) or in academic CS
papers that seem to be uninfluential. But there must be something better than
those pessimistic paragraphs from No Silver Bullet:

> Software is invisible and unvisualizable... The reality of software is not
> inherently embedded in space... As soon as we attempt to diagram software
> structure, we find it to constitute not one, but several, general directed
> graphs superimposed one upon another. The several graphs may represent the
> flow of control, the flow of data, patterns of dependency, time sequence,
> name-space relationships... In spite of progress in restricting and
> simplifying the structures of software, they remain inherently
> unvisualizable, and thus do not permit the mind to use some of its most
> powerful conceptual tools. This lack not only impedes the process of design
> within one mind, it severely hinders communication among minds.

~~~
pron
Sometimes, in order to understand the higher level concepts to a satisfying
degree, there is no choice other than to learn and understand the low-level
stuff. I think _Java Concurrency in Practice_ has everything you need (even if
you're not using Java). It is aimed at the practitioner, but covers all the
necessary background.

