
Fuzzing Raft for Fun and Publication - otoolep
http://colin-scott.github.io/blog/2015/10/07/fuzzing-raft-for-fun-and-profit/
======
Smerity
I'm very excited by this work. As they say, testing for distributed systems
leaves a great deal to be desired, but their work is a great step!

I was excited for FoundationDB, who put a great deal of focus on testing[1],
but they were acquired into mystery by Apple. To give you an idea of how well
tested they were, Aphyr of "Call Me Maybe" fame didn't test them "because
their testing appears to be waaaay more rigorous than mine"[2]. FoundationDB
ran Jepsen internally anyway[3].

DEMi is a really nice tool that has me excited. I've worked with similar test
systems for non-blocking IO models and they're hugely useful. For non-blocking
IO, you can artificially accelerate time as you can leap to the next
interrupt, speeding up fuzzing substantially (see: Flow from FoundationDB in
[1]).
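To illustrate the "leap to the next interrupt" idea, here is a minimal sketch of a simulated clock. It is not Flow's or DEMi's API; `SimClock`, `call_later`, and the heartbeat/timeout callbacks are hypothetical names made up for this example. The only assumption is that every wait in the system under test goes through the simulated clock rather than the wall clock.

```python
import heapq


class SimClock:
    """Hypothetical simulated clock: time only moves when an event fires."""

    def __init__(self):
        self.now = 0.0
        self._queue = []  # heap of (fire_time, sequence_number, callback)
        self._seq = 0     # tie-breaker so equal fire times stay in FIFO order

    def call_later(self, delay, callback):
        """Schedule callback to run `delay` simulated seconds from now."""
        heapq.heappush(self._queue, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        """Run all events, jumping straight to each one instead of sleeping."""
        while self._queue:
            when, _, callback = heapq.heappop(self._queue)
            self.now = when  # leap over the idle gap
            callback()


clock = SimClock()
log = []


def heartbeat():
    log.append(("heartbeat", clock.now))
    if clock.now < 3600:
        clock.call_later(1800, heartbeat)  # reschedule 30 simulated minutes out


clock.call_later(1800, heartbeat)
clock.call_later(5, lambda: log.append(("timeout", clock.now)))
clock.run()
```

An hour of simulated heartbeats finishes almost instantly in real time, and because events fire in a fixed order the run is repeatable, which is what makes this style of testing attractive for fuzzing.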

@ikneaddough, the demi-applications repo mentions Spark. Have you tested
against it yet? I quite like Spark but had some very temperamental issues at
scale.

[1]:
[https://www.youtube.com/watch?v=4fFDFbi3toc](https://www.youtube.com/watch?v=4fFDFbi3toc)

[2]:
[https://twitter.com/aphyr/status/405017101804396546](https://twitter.com/aphyr/status/405017101804396546)

[3]:
[https://web.archive.org/web/20150325003511/http://blog.found...](https://web.archive.org/web/20150325003511/http://blog.foundationdb.com/call-me-maybe-foundationdb-vs-jepsen)

~~~
ikneaddough
Yup! We ran on Spark as well, though we were only focused on reproducing (and
minimizing) known bugs, not on finding bugs. For more info, see our research
paper:
[http://eecs.berkeley.edu/~rcs/research/nsdi_draft.pdf](http://eecs.berkeley.edu/~rcs/research/nsdi_draft.pdf)

------
resc1440
This is super interesting - especially the tool they used. It looks like it
interposes itself at the RPC level, unlike Jepsen, which intervenes at the
network level. That lets it create _reproducible_ test cases in a distributed
system, which it then _minimizes_ so that they can be understood more easily.
Very cool.

[https://github.com/NetSys/demi](https://github.com/NetSys/demi)
[http://www.eecs.berkeley.edu/~rcs/research/nsdi_draft.pdf](http://www.eecs.berkeley.edu/~rcs/research/nsdi_draft.pdf)
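As a rough illustration of what minimizing a reproducible test case can look like, here is a simplified greedy reduction in the spirit of delta debugging. It is a sketch, not DEMi's actual algorithm; `minimize`, `still_fails`, and the event names in the trace are hypothetical. It assumes replaying a candidate trace is deterministic, which is the property that interposing on message delivery is meant to provide.

```python
def minimize(events, still_fails):
    """Greedily drop chunks of events while the failure still reproduces.

    `still_fails(candidate)` is assumed to replay the candidate trace
    deterministically and return True if the bug is still triggered.
    """
    chunk = len(events) // 2
    while chunk >= 1:
        i = 0
        while i < len(events):
            candidate = events[:i] + events[i + chunk:]
            if still_fails(candidate):
                events = candidate  # the chunk was irrelevant, keep it removed
            else:
                i += chunk          # the chunk matters, move past it
        chunk //= 2
    return events


# Hypothetical failing trace: the bug needs these three events, in this order.
needed = ["partition n1|n2", "elect n2", "write x"]


def still_fails(events):
    it = iter(events)
    return all(step in it for step in needed)  # ordered subsequence check


trace = [
    "start n1", "start n2", "partition n1|n2", "heartbeat",
    "elect n2", "heartbeat", "write x", "heal",
]
minimized = minimize(trace, still_fails)
```

Here `minimized` ends up holding only the three events the bug depends on. In a real system the `still_fails` check would replay the candidate schedule against the actual implementation instead of matching strings.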

------
r0naa
I am working on an implementation of Raft and this is a gold mine of issues to
look for. It's also a relief to see testing tools for distributed systems
coming up.

DEMi is written very clearly, and it's a pleasure to read the code:
[https://github.com/NetSys/demi](https://github.com/NetSys/demi) . Definitely
challenges the research "spaghetti code" stereotype.

Really cool. :-)

~~~
ikneaddough
Haha, I'm flattered. Also, thinking to myself that you must have gotten very
lucky in which source file you chose to read first :-)

------
munin
Next, they should fuzz IronFleet:
[http://research.microsoft.com/apps/pubs/default.aspx?id=2558...](http://research.microsoft.com/apps/pubs/default.aspx?id=255833)

~~~
ikneaddough
Actually not such a crazy idea! There are always parts of verified
implementations that aren't verified. And who knows, maybe they got some of
their proofs wrong!

~~~
munin
it would be really, really interesting if fuzzing found an error in the proof!
there isn't a good track record for this though. i've been thinking about what
to do to have a chance of success with that for a little.

~~~
viraptor
I've got a feeling that would be a very common reference point for many future
"theoretical or practical security" debates...

~~~
munin
those debates don't use facts or evidence, so, probably not ;)

