
Distributed Systems Testing: The Lost World - scott_s
http://tagide.com/blog/research/distributed-systems-testing-the-lost-world/
======
akkartik
_"..trace logs, a relatively old technique that glorifies printf as First-
Class Citizen Of The Testing Guild.."_

I don't understand the stigma attached to logs and debug-by-print. When is a
tool that uses x to provide y 'glorifying' x?

I've built and regularly use two tools that rely on logs:

a) I write white-box tests that allow me to check not just what a function
returns but specific events in the course of the function running:
[http://akkartik.name/post/tracing-tests](http://akkartik.name/post/tracing-tests). This allows me to check, for example, that searching in a sorted array
doesn't require double the lookups when the size of the array doubles. It also
allows me to test subcomponents without calling them directly, leaving their
interfaces underspecified and flexible to change later without needing to
modify a bunch of tests.
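The idea can be sketched in a few lines (a hypothetical `trace` list and a toy `binary_search`, not the actual implementation from the linked post): the code under test appends events to a trace, and the test asserts on the trace rather than the return value:

```python
# White-box tracing-test sketch (hypothetical API, for illustration):
# the function emits trace events; tests check the events, not just
# the return value.
trace = []

def binary_search(arr, target):
    lo, hi = 0, len(arr)
    while lo < hi:
        trace.append("lookup")          # record each probe
        mid = (lo + hi) // 2
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

def lookups(arr, target):
    """Count probes made during one search."""
    trace.clear()
    binary_search(arr, target)
    return len(trace)

# Doubling the array should add at most one extra lookup, not double them.
small = lookups(list(range(1000)), 999)
large = lookups(list(range(2000)), 1999)
assert large <= small + 1
```

Note the test never calls a "count the probes" interface directly; the search's public signature stays free to change.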

b) I often debug my programs by dumping a trace and running it through a
'trace browser' which starts out showing just the coarsest level and allows me
to drill down in specific places as I want. This 'zoomable' google-maps-like
UI gives me all the benefits of time-travel debugging at a fraction of the
system engineering it usually takes. (I _still_ can't get the feature to work
in gdb..) Try it out:

    
    
      $ git clone https://github.com/akkartik/mu
      $ cd mu
      $ ./mu browse-trace .traces/factorial-test
    

It'll take ~30s to compile the first time you run it (C compiler required on
Linux/OS X). Once it compiles, you'll see an ncurses program. Use 'q' to quit,
hit enter to zoom into lines (you can see how many lines are collapsed in
parens), backspace to zoom out on a line.
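The zoom-by-depth mechanic can be sketched independently of mu (hypothetical depth-tagged trace format and `collapse` helper, not mu's actual representation):

```python
# Trace-browser sketch: each trace line carries a depth; the coarsest
# view shows only depth-1 lines, noting how many deeper lines are
# collapsed beneath each (like the parenthesized counts in mu).
trace = [
    (1, "parse input"),
    (2, "read token 'x'"),
    (2, "read token '='"),
    (1, "run program"),
    (2, "call factorial(3)"),
    (3, "call factorial(2)"),
    (3, "call factorial(1)"),
]

def collapse(trace, max_depth=1):
    """Render lines at or above max_depth, counting hidden ones."""
    out = []
    for depth, msg in trace:
        if depth <= max_depth:
            out.append((msg, 0))
        elif out:
            prev, hidden = out[-1]
            out[-1] = (prev, hidden + 1)
    return [f"{m} ({n} collapsed)" if n else m for m, n in out]

coarse = collapse(trace)              # the google-maps-style overview
zoomed = collapse(trace, max_depth=2) # one level of zoom
```

"Zooming in" is just re-rendering with a larger `max_depth` for the selected region.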

These are both things nobody can do with a conventional toolchain, yet they're super easy to build. Maybe print seems trivial because nobody's building tools around it?

------
jedberg
The best way to test a distributed system is to break it intentionally after
you think you have sufficient monitoring to find all the problems. Then you'll
know if you have sufficient monitoring.

And if you have an outage where you didn't have sufficient monitoring, you add it afterwards. But by breaking the system intentionally, you can at least be watching when it breaks.
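A toy sketch of that loop (hypothetical `Cluster`/`Monitor` classes, nothing like a real chaos tool): the assertion is not that the node died, but that monitoring noticed:

```python
# "Break it on purpose" sketch: inject a fault, then verify that
# monitoring actually fired. All names here are illustrative.
import random

class Cluster:
    def __init__(self, nodes):
        self.up = {n: True for n in nodes}

    def kill(self, node):               # the injected fault
        self.up[node] = False

class Monitor:
    def __init__(self, cluster):
        self.cluster = cluster
        self.alerts = []

    def poll(self):                     # real monitors poll continuously
        for node, healthy in self.cluster.up.items():
            if not healthy:
                self.alerts.append(f"node {node} is down")

cluster = Cluster(["a", "b", "c"])
monitor = Monitor(cluster)

victim = random.choice(list(cluster.up))
cluster.kill(victim)                    # intentional breakage
monitor.poll()

# The test of the *monitoring*, not of the node:
assert any(victim in alert for alert in monitor.alerts), \
    f"monitoring gap: no alert for {victim}"
```

If the assertion fails, you've found a monitoring gap on your own schedule instead of during an outage.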

~~~
timboio
+1

This is the entire idea behind Netflix's Failure Injection framework ([http://techblog.netflix.com/2014/10/fit-failure-injection-te...](http://techblog.netflix.com/2014/10/fit-failure-injection-testing.html)), which was briefly alluded to in OP's link.

~~~
joneholland
Which makes sense considering jedburg works (worked?) for Netflix after he
left Reddit.

~~~
jedberg
Worked, but yes I was deeply involved in that project.

------
jakozaur
Distributed testing is very important, but based on the article it doesn't get many papers. Some plausible explanations:

1. Making things that actually work is industry's domain, and industry writes very few papers compared to academia. Academia cares more about design, less about working large-scale systems.

2. Testing is more about well-oiled machinery, less about some super clever algorithm. Testing is more of a craft than an art.

~~~
was_boring
Let's not forget the social aspect -- testing is seen by many in the industry as a less prestigious activity, one for which you hire barely competent "developers", and accepting a job doing it can hurt career progression (exceptions always exist).

~~~
im_down_w_otp
It seems like the marketplace has decided that it's usually okay to
push/deliver/ship questionable software. We even have a cute colloquial
industry term for this, "MVP". _Minimally_ viable. That's Grade-D beef... by
design.

When the focus is pushing out Grade-D goods what's the point of rigorous
testing? If you get it wrong, it barely matters. If it matters at all.

Perhaps the social view of software testing is just a corollary to the state
of software engineering?

~~~
ethbro
I feel like MVP design winning is the market clearly deciding that it values
features over stability. I'm curious if the economics of this will change when
/ if we come to the end of exponential hardware growth.

~~~
im_down_w_otp
I think the friction that's going to grow between software creators and hardware manufacturers, as various shapes of Smart-thing-X become more prevalent, might ebb this a bit.

Because as more feature-laden software is written to drive growth in consumer and industrial markets, and is deployed on something more akin to an embedded device than a PC or server, more rigor will have to be applied to the correctness of the software to compensate for how tightly coupled it is to the largely immutable, use-specific hardware.

~~~
ethbro
Frighteningly possible counter-alternative: continue to ship bad software on
embedded devices and just toss network connectivity in there in the event the
device becomes popular / customers demand an update.

~~~
im_down_w_otp
Yeah. I certainly fully expect a certain cross-section that is shaped just
like that. Take "Nest" as an example. Unless the "smart" thermostat can
somehow manage to burn my house down by turning the heat up way, way too
high... then the fact that it's almost comically broken as a "smart" device
doesn't really matter. Just keep pushing updates to it.

However, I think there's a growing surface area where our consumer expectations for iteration and features are going to bleed into places that were previously left largely untouched by such demands. "Smart" cars are probably a good example. There's growing and intensifying competition for ownership of the experience in the cabin/cockpit of the car, and to draw lines of competition there's a lot of focus on how to integrate that space with the rest of your normal and digital lifestyle. That ends up being a place where, to avoid becoming the subject of very public screw-ups, some of the rigor applied to the lower-level stack will eventually bubble up to the feature/application level, at the same time that enablement for features/applications (network connectivity, new kinds of input devices, etc.) is forced into the lower levels of the stack. Or at the very least, a useful set of stable abstractions and tools will be built so that today's software developers don't have to care much about applying that rigor; the abstractions/tools will just take care of it for them.

------
chris_va
Having seen both sides, I would argue that academia doesn't really have a
subfield that fits distributed systems well (most industry systems span
multiple academic fields, from networking to fault tolerance, databases, OS,
etc). Not to say that people don't research it, but more people research
subcomponents.

Also, I would argue that academia is about 7 years behind industry in this
area, and so papers don't match up to reality well.

On the other side, industry testing is usually very specific to the internal
product created, and so there is an incentive not to publish anything.

There is some nice academic research on systems testing outside of CS, notably in mechanical engineering, and it is pretty applicable: contracts, sub-unit testing, etc. are all analogous.

My two cents.

~~~
sevensor
Interesting point about the MechE literature. There's a whole academic field
of multidisciplinary design analysis and optimization that tries to deal with
systems problems. They have a whole suite of tools for coming to grips with
the couplings between subsystems. I hadn't ever thought about it in terms of
distributed software systems, but it definitely applies.

------
metasean
A friend and colleague of mine developed a simple distributed testing framework in JavaScript - [https://github.com/PsychoLlama/panic-server](https://github.com/PsychoLlama/panic-server) (It works in conjunction with Mocha, Jasmine, and possibly other test frameworks - [https://github.com/PsychoLlama/panic-demo](https://github.com/PsychoLlama/panic-demo))

(We work on an open source distributed data sync engine and needed a way to
test distributed aspects of that engine -
[https://github.com/amark/gun/](https://github.com/amark/gun/))

------
Ericson2314
IMO if testing this stuff is difficult, rather than simply understudied, one
shouldn't be dismissing formal verification sans evidence.

------
markbnj
>> There’s too many things packed into the concept of testing distributed
systems, and that is pretty clear in what came into my Twitter feed

I think this gets closest to the truth. That first slide regarding "testing a
microservice architecture" with its complicated block diagram hints at why the
level of abstraction is too high to be useful. In the end a distributed system
consists of components that have interfaces and which interact with each other
through those interfaces. You test the component modules in unit testing, the
components themselves at the interface level, and the whole system in end-to-end request and response flows. The latter is assisted by centralized event collection and storage tools like logstash + elasticsearch, but it could just as easily be dumped log files. Whatever works. That's the closest I've personally come to "distributed systems testing."
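One way to picture "testing at the interface level" is a shared contract check run against both the real component and the test double that other components use (hypothetical user-lookup service, purely illustrative):

```python
# Interface-level testing sketch: the contract is expressed once and
# applied to every implementation of the interface, so components can
# be tested without wiring up the whole distributed system.
def check_contract(lookup):
    """Any implementation of the user-lookup interface must satisfy this."""
    assert lookup("alice") == {"name": "alice", "active": True}
    assert lookup("nobody") is None

# Real implementation (stands in for the deployed component).
USERS = {"alice": {"name": "alice", "active": True}}
def real_lookup(name):
    return USERS.get(name)

# Test double used by *other* components' unit tests.
def fake_lookup(name):
    return {"name": "alice", "active": True} if name == "alice" else None

check_contract(real_lookup)   # interface-level test of the component
check_contract(fake_lookup)   # keeps the double honest with the real thing
```

Running the same contract against the double is what keeps component-level tests meaningful once everything is integrated.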

------
crocal
Distributed systems testing is well studied and applied in industry, but there
is significant IP and competitive advantage attached to it. For this reason,
papers are rarely published. So, this world is not lost to everyone ;). You
will see such expert frameworks defined and applied in aeronautics, military
and transportation systems, for example. The key challenge is reproducibility
of the testing. Central logging of events is not enough, you need some way to
enforce or guarantee the global causal ordering of events. Leslie Lamport is
recommended reading.
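Lamport's logical-clock rules, which give exactly that causal ordering, fit in a few lines (a minimal sketch; the `Process` class and toy message passing are illustrative):

```python
# Lamport logical clocks: increment on every local event, and on
# receive take max(local, received) + 1, so timestamps respect the
# happened-before (causal) order across processes.
class Process:
    def __init__(self, name):
        self.name = name
        self.clock = 0
        self.log = []                        # (timestamp, event) pairs

    def local_event(self, desc):
        self.clock += 1
        self.log.append((self.clock, desc))

    def send(self, desc):
        self.clock += 1
        self.log.append((self.clock, f"send {desc}"))
        return self.clock                    # timestamp travels with message

    def receive(self, ts, desc):
        self.clock = max(self.clock, ts) + 1
        self.log.append((self.clock, f"recv {desc}"))

p, q = Process("p"), Process("q")
p.local_event("write x")
ts = p.send("x=1")
q.receive(ts, "x=1")                         # q's clock jumps past p's
q.local_event("read x")

# Merging logs sorted by timestamp yields a replayable total order
# consistent with causality -- the reproducibility property above.
merged = sorted(p.log + [(t, f"q:{e}") for t, e in q.log])
```

Vector clocks refine this when you also need to *detect* concurrency, but the Lamport rule alone is enough to get a causally consistent replay order.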

~~~
ethbro
It seems there's an underappreciation, at the ground levels of computer science, for pre-digital systems-engineering lessons. There's a lot of cool and really useful stuff in there.

------
polskibus
What worked best for me was to have monitoring that's as thorough as possible (IO, mem, CPU, per process, etc.) on the infrastructure side, and event sourcing that's as complete as possible on the app domain side. It's easier said than done, of course, and in a brownfield situation you will never achieve perfect coverage.

Once that's in place, write tests with Selenium or a unit-test framework with parallel extensions (depending on whether the cases are frontend- or backend-based) and analyze the results using home-made tools.
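The event-sourcing half can be sketched with a toy domain (hypothetical `record`/`replay` helpers, not the poster's system): every state change is an appended event, so any past state can be rebuilt by replay, which is what makes failures analyzable after the fact:

```python
# Event-sourcing sketch: state is never mutated directly; it is
# derived by replaying an append-only event log.
events = []

def record(kind, **data):
    events.append({"kind": kind, **data})

def replay(log):
    """Fold the event log into current state (here, a balance)."""
    balance = 0
    for e in log:
        if e["kind"] == "deposit":
            balance += e["amount"]
        elif e["kind"] == "withdraw":
            balance -= e["amount"]
    return balance

record("deposit", amount=100)
record("withdraw", amount=30)
record("deposit", amount=5)

assert replay(events) == 75
# Replaying a prefix gives the state at any earlier point in time.
assert replay(events[:2]) == 70
```

That prefix-replay property is what lets you reconstruct "what did the app think at the moment things went wrong" alongside the infrastructure metrics.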

