
Getting real about distributed system reliability - boredandroid
http://blog.empathybox.com/post/19574936361/getting-real-about-distributed-system-reliability
======
dfc
Nice commentary by zooko:

https://tahoe-lafs.org/pipermail/tahoe-dev/2012-March/007185.html

------
jleader
I really liked "These systems end up being large hunks of monitoring, tests,
and operational procedures with a teeny little distributed system strapped to
the back."

------
swah
Where is the flaw in the reasoning? Is it the dreaded Hadoop single points of
failure? No, it is far more fundamental than that: __the problem is the
assumption that failures are independent.__ Surely no belief could possibly be
more counter to our own experience or just common sense than believing that
there is no correlation between failures of machines in a cluster.
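
To put rough numbers on that point, here is a minimal sketch. It is not from the article: the function names, the common-cause model, and every figure in it are made up for illustration. It compares the chance of losing all replicas when failures are assumed independent with the chance when a small probability `q` of a shared event (say a bad config push or a dead switch) takes out every replica at once.

```python
def p_all_fail_independent(p: float, replicas: int) -> float:
    """Probability that every replica fails, assuming independent failures."""
    return p ** replicas


def p_all_fail_common_cause(p: float, replicas: int, q: float) -> float:
    """Probability that every replica fails under a simple common-cause model.

    With probability q a shared event kills all replicas together;
    otherwise the replicas fail independently with probability p each.
    This model is an illustrative assumption, not a measured one.
    """
    return q + (1.0 - q) * p ** replicas


if __name__ == "__main__":
    # Hypothetical numbers: 1% per-node failure chance, 3 replicas,
    # 0.1% chance of a correlated event hitting all of them.
    p, replicas, q = 0.01, 3, 0.001
    independent = p_all_fail_independent(p, replicas)
    correlated = p_all_fail_common_cause(p, replicas, q)
    print(f"independent assumption: {independent:.3e}")
    print(f"with common cause:      {correlated:.3e}")
    print(f"ratio:                  {correlated / independent:.0f}x")
```

Under these assumed inputs the correlated term dominates the result, so adding replicas barely helps unless the shared causes are dealt with too.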

