

Show HN: Damocles – An Erlang library for distribution testing - lostcolony
https://github.com/lostcolony/damocles

======
lostcolony
This is something I've been working on in my spare time for a bit in
preparation for a project I'm about to start at work.

While there are libraries out there for testing certain distribution behaviors
across multiple nodes (such as Aphyr's Jepsen), and utilities/libraries that
will monkey with outbound traffic (Crapify, Comcast, etc.), I didn't see
anything that gave me the control I wanted: the ability to programmatically
drive multiple adapters in a way I could make easy use of in automated testing.

I always like being able to run my tests locally, without having to rely on
shared network resources. To that end, this basically lets you spin up a bunch
of local adapters, and start dropping/delaying packets between them, without
affecting eth0, and without affecting 127.0.0.1/localhost (so running tests
shouldn't interfere with anything else, network-wise). You'll need to get some
IPs not in use on the network, but everyone running the tests should be able
to use those concurrently since they're not being exposed.
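To make the setup concrete, here's a minimal by-hand sketch of the kind of loopback aliases being described (this is illustrative, not Damocles' actual code; the 10.10.10.x addresses are an assumption, so substitute any addresses not in use on your network). The commands are printed rather than executed, since creating interfaces needs root.

```shell
#!/bin/sh
# Sketch: per-test loopback aliases (lo:0, lo:1, ...) done by hand.
# Addresses are placeholders; run the printed commands as root.
mk_alias() {
  # $1 = alias index, $2 = address to assign to lo:$1
  echo "ifconfig lo:$1 $2 netmask 255.255.255.255 up"
}
mk_alias 0 10.10.10.1
mk_alias 1 10.10.10.2
```

Traffic between those addresses never touches eth0 or 127.0.0.1, which is why tests on different machines can reuse the same addresses concurrently.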

This lets you easily test everything from simple stuff (one node disappeared,
and we try a write to a node; what happens?) to completely insane stuff (nodes
1 and 2 can't talk to 4 and 5, nor the reverse; everyone can talk to node 3,
but 3 can't respond; and there's a 200ms delay between 4 and 5. And we try a
write to a node. Or whatever), all on one box, in a repeatable, automated way.

I eventually want to expand it to work across remote interfaces, so that it
can be used at load, but this is a start.

A command line hook is included; all functionality is callable from the
command line through it.

~~~
rdtsc
Great work, thanks for sharing. Nice Dialyzer specs too!

I see you used the tc command. It's an often-forgotten gem, but really useful
for testing failure scenarios in networked (distributed) systems.

For everyone else:

[http://linux.die.net/man/8/tc](http://linux.die.net/man/8/tc)
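For a flavor of what tc can do on its own, here's a small sketch of netem usage: add a 200ms delay to a device, change it to also drop 10% of packets, then remove the qdisc. The device name and numbers are placeholders, and the commands are printed rather than executed (tc needs root).

```shell
#!/bin/sh
# Sketch of tc/netem fault injection. DEV and the numbers are placeholders;
# commands are printed, not run, since tc requires root.
DEV=lo   # qdiscs attach to a device, not to an address
delay_cmd="tc qdisc add dev $DEV root netem delay 200ms"
loss_cmd="tc qdisc change dev $DEV root netem delay 200ms loss 10%"
reset_cmd="tc qdisc del dev $DEV root"
printf '%s\n' "$delay_cmd" "$loss_cmd" "$reset_cmd"
```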

------
mlakewood
I've been thinking of doing something like this for a while. However, I want
to be able to get datapoints on all the requests going through the proxies, so
that I can run integration tests on a bunch of distributed systems working
together, and then be able to induce failures as well.

The other thing that I think would be great would be to add generative testing
to this. Although I'm not sure how you would model the system, as I haven't
done much generative testing.

~~~
lostcolony
Well, I'm not actually passing anything through proxies. Instead, I'm creating
new loopback interfaces, lo:0, lo:1, etc. Presumably any tooling that monitors
an interface could be set to watch those. (In fact, you can create the
interfaces on a machine and leave them up between runs, telling Damocles about
them using 'register_interface'; that way they stay consistent and can be
referenced by other tooling.)

You can actually use this to test multiple systems working together as well.
This is completely agnostic to what's running on the interfaces. As long as
everything can be run on one machine (at least until I find the time to do the
work to get it working remotely), you can set up multiple interfaces each of
which runs an instance of one app and then another set of interfaces that each
run an instance of another app (etc), configure the apps to talk to each other
however you like, and then start monkeying with the connections between
interfaces using this.
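As a rough illustration of degrading one specific pair of those interfaces without touching anything else, here's what the equivalent would look like done by hand with iptables (just an illustration of the idea, not how Damocles implements it; the addresses are placeholders, and the commands are printed rather than executed since iptables needs root).

```shell
#!/bin/sh
# Sketch: fully partition two local addresses from each other, leaving all
# other pairs untouched. Addresses are placeholders; run printed commands
# as root.
drop_ab="iptables -A INPUT -s 10.10.10.1 -d 10.10.10.2 -j DROP"
drop_ba="iptables -A INPUT -s 10.10.10.2 -d 10.10.10.1 -j DROP"
printf '%s\n' "$drop_ab" "$drop_ba"
```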

I'm not familiar with generative testing, but in general my goal wasn't to
provide a testing framework for distributed systems, but rather a library
with higher-level primitives that can be used to easily compose pretty
complex failure scenarios.

~~~
mlakewood
Ahh I see. So all the connecting up of what listens on which interface is done
somewhere else. This just creates failures at the network level in some
scriptable manner. Everything else is up to you.

Makes sense.

What I was thinking about was actually setting up a proxy so you could mess
with or record specific requests, and then if all the services in a system
were wrapped like that then you would be able to do some really cool
integration testing.

