
Call me maybe: final thoughts - llambda
http://aphyr.com/posts/286-call-me-maybe-final-thoughts
======
ChuckMcM
So a long time ago, in a galaxy far far away, when the network was purportedly
the computer, these questions kept me awake at night.

When Remote Procedure Calls were a new thing technologists got into an
interesting debate about whether or not you could create a distributed
infrastructure which had the same semantics as local execution. It was after
all the holy grail to be able to write code and then seamlessly pull it apart
into as many pieces as you wanted for scaling (caveat Amdahl's law).

There were great flame wars over something called "At most once" semantics,
where a remote procedure call would either succeed or fail, and if it
succeeded it would have done so exactly one time. And marshalling between
systems, with "network canonical order" vs "receiver makes it right" order for
bit fields, integers, floats, or strings.
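
To make the "at most once" idea concrete, here is a minimal sketch in Python. It assumes the client tags each call with a unique request ID and the server remembers results by that ID, so a retried call is answered from the table instead of being executed again. The class and function names are hypothetical, and this is only one way such semantics could be approximated, not how any particular RPC system of that era did it.

```python
class AtMostOnceServer:
    """Hypothetical server wrapper: runs each request ID at most one time."""

    def __init__(self, handler):
        self._handler = handler
        self._results = {}  # request_id -> cached result

    def call(self, request_id, *args):
        # A retry of an already-executed request returns the cached result
        # rather than running the handler a second time.
        if request_id in self._results:
            return self._results[request_id]
        result = self._handler(*args)
        self._results[request_id] = result
        return result


counter = {"value": 0}


def increment(amount):
    # A non-idempotent operation: running it twice would double-count.
    counter["value"] += amount
    return counter["value"]


server = AtMostOnceServer(increment)
first = server.call("req-1", 5)   # executes the handler
retry = server.call("req-1", 5)   # duplicate delivery, served from the cache
print(first, retry, counter["value"])
```

The sketch ignores the hard parts (the server crashing between executing the call and recording the result, and when it is safe to forget old request IDs), which is roughly where the arguments started.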

I have a lot of scars from those days.

It left me with one really strong belief though. That was that to reason about
distributed systems _at all_ you had to be ruthlessly simple in your designs.
You have to be able to know exactly what the lowermost piece of the puzzle is
going to do before you can reason about what pieces that depend on that may or
may not do. And at each layer every alternative cuts like a scalpel and makes
reasoning about the next layer that much harder.

~~~
rdtsc
> So a long time ago, in a galaxy far far away, when the network was
> purportedly the computer, these questions kept me awake at night.

Interesting. That was a bit before "my time". It was just passed down as folk
wisdom -- "never make the network transparent, it cannot be done easily" and
I never quite knew where that came from.

I remember asking a senior developer about RPC after seeing a bunch of RPC
daemons on a Sun machine, and he said it is some really complicated stuff
that you don't really want to know about unless you absolutely need to use
it. I was just a young intern then.

But there are a whole bunch of technologies from that time that try to
abstract the network away (I can think of NFS, or I guess pretty much any
remote storage that makes itself appear as a local file system).

Then I've seen RPC API philosophies that take two sides. Some magically proxy
method calls and marshal data between objects; others want users to always
marshal by hand and send via the sockets.

> You have to be able to know exactly what the lowermost piece of the puzzle
> is going to do before you can reason about what pieces that depend on that
> may or may not do.

So is it possible to abstract any of that away? Maybe it is not a horizontal
abstraction as in "hide the network away", but maybe a vertical abstraction,
as in creating socket-like objects that have extra features (like ZeroMQ).
Connecting, sending, receiving is still there, but there is extra helping on
top at each step.

On another side note, I think network topologies and characteristics changed.
It used to be that creating consistent distributed systems was easier even
though network speeds were slower. Networks were more likely to be local in a
data center, so the chance of a network split was much lower. (And I've
learned there are few things more evil in a general distributed system than a
network split.) Today's distributed systems are more likely to experience
network splits (multi-zone, multi-datacenter clusters I guess are more common
+ unpredictability added by using VMs). So maybe paradoxically distributed
systems really got harder, not easier.

~~~
ChuckMcM
_So is it possible to abstract any of that away? Maybe it is not a horizontal
abstraction as in "hide the network away", but maybe a vertical abstraction,
as in creating socket-like objects that have extra features (like ZeroMQ).
Connecting, sending, receiving is still there, but there is extra helping on
top at each step._

It's possible to design bits of it away (and that becomes necessary) in order
to build larger systems. Protobufs were, in a lot of ways, one response.

The key takeaway for me from that time, and going forward, was that
distributed services are effectively messages in flight. The more messages in
flight, the more combinations of arrival times; the fewer messages in flight,
the less scaling. This became what is known as the CAP theorem. (Wonderfully
articulated by Brewer; when I read it I said, "Yeah, that!")

For things like RAID arrays, where you can relate the state of the data in a
RAID array mathematically to the data that you know, you can abstract it
away to a pretty straightforward open/read/write interface. For things where
the data set mutates along a path (imagine a state machine of 'n' states,
where its 'path' is the sequence of states it is in between time 0 and time t),
if correctness is a function of the path then you have a much harder time of
it. The canonical example of this is two writers to a file. The sequence
Insert A -> Insert B -> Insert C -> Delete Last leaves behind a different file
than if it sees Insert A -> Insert B -> Delete Last -> Insert C.
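
The two-writer example can be replayed in a few lines of Python. This is only an illustrative sketch: it assumes "Insert" appends to the end of the file and "Delete Last" removes the final entry, and the `apply_ops` helper is a made-up name.

```python
def apply_ops(ops):
    """Replay a sequence of operations against an initially empty 'file'."""
    contents = []
    for op, *arg in ops:
        if op == "insert":
            contents.append(arg[0])   # assumption: insert appends at the end
        elif op == "delete_last":
            if contents:
                contents.pop()        # remove whatever is currently last
    return contents


# The same four operations, arriving in two different orders.
order_1 = [("insert", "A"), ("insert", "B"), ("insert", "C"), ("delete_last",)]
order_2 = [("insert", "A"), ("insert", "B"), ("delete_last",), ("insert", "C")]

result_1 = apply_ops(order_1)
result_2 = apply_ops(order_2)
print(result_1)  # ['A', 'B']
print(result_2)  # ['A', 'C']
```

Both runs apply exactly the same set of operations; only the arrival order differs, and that alone changes the final state.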

Unwinding those sorts of things (or at least illuminating them) seems to be
what Jepsen is trying to do.

------
voidmain
I _love_ that someone is actually testing database fault tolerance in public,
instead of just assuming that anything with replication is fault tolerant...

Stateful distributed systems are really, really hard to get right. The only
hope is to do a LOT of simulation. Not that testing is a substitute for
designing things right, but no one is smart enough to be sure they've thought
of all cases.

At FoundationDB we do this sort of testing millions of times over:
<http://foundationdb.com/white-papers/testing/>

so that we can let people plug and unplug our servers at trade shows without
fear (video):
<http://blog.foundationdb.com/post/50924738464/foundationdb-fault-tolerance-demo-video>

and ultimately so that applications on top of our system face a somewhat less
scary world.

~~~
shin_lao
Nice to see you are serious about testing; however, this PR page has
something that strikes me as strange:

 _As a rough measure of the effectiveness of our simulation testing framework,
Quicksand has only ever found one bug._

To me, this means your test is useless, not that your software is reliable.

Generally speaking, simulating hardware faults is not very useful because
you're testing the software layer below you: you don't talk to the hardware,
you talk to the kernel which talks to the hardware.

(Unless your filesystem and network code is in kernel land in which case I
will retract this comment and go die in a dark corner of Earth. ;) )

~~~
voidmain
Quicksand (the physical fault testing environment) doesn't find our bugs
because our simulation-based testing finds them first!

Our deterministic simulations cover all of the failure cases that quicksand
can test and many more. The main reason we built quicksand is precisely to
find out if our assumptions about the layers below us are wrong, since the
simulator can't test its own assumptions. For example, there are
hardware/software combinations out there where fsync() can't be trusted...

~~~
malkia
Can this piece of software be used to test other systems/DBs in combination
with the OS?

------
jamwt
I'm late to the party on this, but unless I'm misreading things... the author
used Riak the wrong way and it failed, and then he used it the right way, and
it succeeded.

Granted, as the author says, Basho tries to delay the necessity of "breaking
the news" about the flaws of LWW to prospective Riak users, but the author
references the Dynamo paper. The Dynamo paper is chock full of information
about siblings and the key role they play. If you read the Dynamo paper,
you would know immediately that using a system like Riak without thinking
through sibling resolution is akin to lighting your data on fire. This
complexity of siblings is the tradeoff you pay for high availability (no
silver bullet, etc).
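
For anyone who hasn't hit this yet, a rough sketch of the difference in Python. This is not Riak's API: the function name, the timestamps, and the "shopping cart as a set" data model are assumptions made for illustration, and a real Dynamo-style store detects concurrent writes with vector clocks rather than by being told.

```python
def lww_merge(write_a, write_b):
    """Last-write-wins: keep the (timestamp, value) pair with the newer timestamp."""
    return write_a if write_a[0] >= write_b[0] else write_b


# Two clients update the same key concurrently during a partition.
client_1 = (100, {"milk"})   # (timestamp, cart contents)
client_2 = (101, {"eggs"})

# With LWW, the older write is silently discarded.
lww_result = lww_merge(client_1, client_2)[1]

# With siblings, the store keeps both values and the application resolves
# them. Here the (assumed) resolution rule is a set union of the carts.
siblings = [client_1[1], client_2[1]]
resolved = set().union(*siblings)

print(lww_result)        # {'eggs'}
print(sorted(resolved))  # ['eggs', 'milk']
```

The union only works because this toy data type happens to merge cleanly. Picking a resolution rule that is correct for your own data is the part you have to think through.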

And if your scaling/availability problems are serious enough to even begin
saying "we need a distributed database like Riak!", hopefully it's not asking too
much to have you read the Dynamo paper to understand how these sorts of
databases actually work.

( note - we admin a largish riak cluster here at bu.mp )

~~~
brown9-2
It makes sense to me to test the default configuration that Riak ships, and
even with the caveat of "you probably shouldn't use this", its failures are
an instructive lesson (which is the underlying goal of the series of blog
posts, not just benchmarking various databases).

------
cs648
Slightly off topic, but I didn't understand the first line: "Notorious
computer expert Joe Damato explains: “Literally no one knows.”". I guess it's
a joke I didn't get?

~~~
aphyr
In-joke. You gotta know the guy. :)

