
Testing Distributed Systems with Deterministic Simulation [video] - tosh
https://www.youtube.com/watch?v=4fFDFbi3toc
======
parley
I did this for a decently complex distributed system for embedded devices, and
it practically saved my life.

It was in C and I didn't do any language extending/precompiling, but I had
interfaces for everything related to I/O, execution actors, randomness, etc.

On target hardware everything used TCP/UDP/real disk, pthreads, normal rand
sources, etc. In simulation everything used virtual networking, a single
simulation thread stepping all event loops, test-specified random seeds for
reproducibility, etc.
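
A minimal sketch of what such an interface layer might look like in C. Every name here (`net_ops`, `sim_net`, `node_ping`) is hypothetical and made up for illustration, not taken from the system described above. The point is only that application code calls through a table of function pointers, so a TCP/UDP backend on target and an in-memory backend in simulation can be swapped without touching that code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical network interface: application code only calls through this. */
typedef struct net_ops {
    int (*send)(void *ctx, int dst, const void *buf, size_t len);
    int (*recv)(void *ctx, int self, void *buf, size_t cap);
} net_ops;

/* Simulated backend (assumption: one in-memory mailbox slot per node). */
#define SIM_NODES   4
#define SIM_MSG_MAX 64

typedef struct sim_net {
    uint8_t data[SIM_NODES][SIM_MSG_MAX];
    size_t  len[SIM_NODES];
    int     full[SIM_NODES];
} sim_net;

/* Store a message in the destination mailbox; -1 if it cannot be queued. */
static int sim_send(void *ctx, int dst, const void *buf, size_t len) {
    sim_net *n = ctx;
    if (dst < 0 || dst >= SIM_NODES || len > SIM_MSG_MAX || n->full[dst])
        return -1;
    memcpy(n->data[dst], buf, len);
    n->len[dst] = len;
    n->full[dst] = 1;
    return 0;
}

/* Take the pending message for `self`; returns its length, or -1 if none. */
static int sim_recv(void *ctx, int self, void *buf, size_t cap) {
    sim_net *n = ctx;
    if (self < 0 || self >= SIM_NODES || !n->full[self] || n->len[self] > cap)
        return -1;
    memcpy(buf, n->data[self], n->len[self]);
    n->full[self] = 0;
    return (int)n->len[self];
}

static const net_ops SIM_NET_OPS = { sim_send, sim_recv };

/* Application code: has no idea whether `ops` is real sockets or the sim. */
int node_ping(const net_ops *ops, void *ctx, int dst) {
    assert(ops != NULL && ops->send != NULL);
    return ops->send(ctx, dst, "ping", 4);
}
```

A test would zero-initialise a `sim_net`, pass `&SIM_NET_OPS` to the code under test, and inspect the mailboxes directly. A production build would supply a second `net_ops` table backed by real sockets; the same pattern would apply to disk, timers, and the random source.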

It is completely invaluable. I can concisely write completely deterministic
system tests that will execute tens of thousands of lines of code. I can fuzz
test actor scheduling, I/O problems like dropped packets/msgs, and everything
you can think of. I can run the entire test suite in valgrind and other nice
tools. I can put a big machine in a corner of the office to fuzz test the
suite for weeks on end and email me when a test fails and tell me exactly
which random seed to use to reproduce the failure myself within minutes. I can
debug the entire simulation perfectly in GDB.
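
To make the seed-replay idea concrete, here is a toy sketch in C, not the actual test harness described above. It assumes a single-threaded loop in which one seeded PRNG (xorshift32, chosen here only for brevity) makes every scheduling and fault-injection decision. All names (`run_scenario`, `find_failing_seed`, `invariant_holds`) and the "nothing may be dropped" invariant are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* xorshift32: tiny deterministic PRNG; its state must never be zero. */
typedef struct { uint32_t s; } sim_rng;

static uint32_t sim_rng_next(sim_rng *r) {
    uint32_t x = r->s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    r->s = x;
    return x;
}

#define ACTORS 3
#define STEPS  1000

typedef struct {
    uint32_t trace;     /* fingerprint of every decision taken in the run */
    uint32_t delivered; /* messages that were not dropped */
} sim_result;

/* One simulated run: the seed alone decides which actor is stepped next
 * and whether its message is dropped (drop_pct is a percentage, 0..100). */
sim_result run_scenario(uint32_t seed, unsigned drop_pct) {
    assert(drop_pct <= 100);
    sim_rng rng = { seed ? seed : 1u };   /* avoid the all-zero state */
    sim_result r = { 2166136261u, 0 };    /* FNV-1a offset basis */
    for (int step = 0; step < STEPS; step++) {
        uint32_t actor = sim_rng_next(&rng) % ACTORS;           /* scheduling */
        uint32_t dropped = (sim_rng_next(&rng) % 100) < drop_pct; /* fault */
        if (!dropped)
            r.delivered++;
        r.trace = (r.trace ^ (actor * 2u + dropped)) * 16777619u;
    }
    return r;
}

/* Hypothetical invariant: this toy system "passes" only if nothing is lost. */
static int invariant_holds(sim_result r) {
    return r.delivered == STEPS;
}

/* Fuzz loop: try seeds first..first+count-1 (first >= 1) and return the
 * first seed that breaks the invariant, or 0 if every run passed. */
uint32_t find_failing_seed(uint32_t first, uint32_t count, unsigned drop_pct) {
    for (uint32_t i = 0; i < count; i++) {
        uint32_t seed = first + i;
        if (!invariant_holds(run_scenario(seed, drop_pct)))
            return seed;
    }
    return 0;
}
```

Because nothing in `run_scenario` reads the clock, the OS scheduler, or a real socket, calling it again with the reported seed replays exactly the same interleaving, which is what makes stepping through a failure in GDB practical.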

I've barely begun to describe how great it is to have, how many bugs these
tests have caught or what a reliable regression test suite one can build. It
doesn't replace testing on target - I do that extensively as well. Big system
scenario tests don't replace smaller module and unit tests - I do those too.
But deterministic simulation testing saved my sanity. Don't hesitate to
evaluate this approach if you're doing something similar.

------
wwilson
I'm glad people have been getting a kick out of this, but for those who don't
want to listen to me drone on for 40 minutes:

(1) Here are the slides I used:
http://www.slideshare.net/FoundationDB/deterministic-simulation-testing

(2) Here is a (significantly out of date) white paper of ours which covers
some of the same territory:
https://foundationdb.com/key-value-store/white-papers/testing

~~~
tinco
You actually have a very pleasant presentation style, it drew me in and I
enjoyed every minute of it. Though perhaps it's just because I find the
subject and your solution very interesting.

What's your standpoint on formal verification? Did you guys think about it and
reject it?

------
jamii
http://db.cs.berkeley.edu/papers/dbtest12-bloom.pdf
has a really nice extension of this idea - since their language is unordered
by default and has only a few explicit ordered primitives, they can use an SMT
solver to determine which event interleavings cannot possibly affect the
result. This lets them generate schedules that more efficiently explore the
space of all possible schedules and find bugs faster.

------
tinco
This talk is awesome. The idea that the simulation framework becomes part of
the production code is brilliant. I wonder if these ideas could be merged with
formal methods, so that perhaps a model of the simulation could be generated,
and then through model analysis it could generate simulation stories for
itself that humans might overlook.

