
What I wish systems researchers would work on - ivanist
http://matt-welsh.blogspot.com/2013/05/what-i-wish-systems-researchers-would.html
======
jacques_chester
Computer systems, which are _deterministic_ , have long since passed out of
the realm of being _understandable_.

They're chaotic systems[1]. Parameters which are below the threshold of
detection can drastically affect their trajectory in phase space. Accidental
complexity has zoomed off to live a life all its own, and that takeoff has
been continuous for decades [2].

Everything we build around software that's not directly bearing on the problem
domain is chipping away at this or that aspect of the ungoverned bits. As we
corral each new level of unpredictability, we merely set the scene for the
next level of complexity.

 _There is no universal solution_. There is no final boss which, once
defeated, leads to the scrolling names and the 8-bit theme song. It never
stops, ever, because systems that frequently fail unpredictably are still
worth more than not building them at all, while flawless systems would cost
more than they are worth.

Edit: sounds very unhelpful, even defeatist. I know. I'm deeply ambivalent
about what _can_ be done versus what we should _assume is possible_. I'm a fan
of serious software engineering and tool support and all that jazz ... yet
thoroughly pessimistic about the limits of the possible. Deep sighs for every
occasion!

[1] [http://chester.id.au/2012/06/27/a-not-sobrief-aside-on-
reign...](http://chester.id.au/2012/06/27/a-not-sobrief-aside-on-reigning-in-
chaos/) (warning, self linkage)

[2]
[http://www.cs.nott.ac.uk/~cah/G51ISS/Documents/NoSilverBulle...](http://www.cs.nott.ac.uk/~cah/G51ISS/Documents/NoSilverBullet.html)
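The sensitivity claim above can be illustrated with a toy chaotic system; the logistic map stands in for a real computer system here, and the specific numbers are just for demonstration:

```python
# Sensitive dependence on initial conditions in the logistic map (r = 4).
# A perturbation far below any realistic detection threshold grows until
# the two trajectories are unrelated -- the "parameters below the threshold
# of detection" effect described above, in miniature.

def logistic_trajectory(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x), returning the whole trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.3, 100)
b = logistic_trajectory(0.3 + 1e-12, 100)   # perturbed by one part in 10^12
divergence = max(abs(x - y) for x, y in zip(a, b))
print(divergence)  # order 1: the tiny perturbation ends up dominating
```

The same doubling-per-step error growth is why replaying a production incident from "the same" starting state so rarely reproduces it.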

~~~
codex
They're not even deterministic. Even if your input is totally ordered, entropy
from the memory subsystem will bubble up to communicating threads if you have
more than one. Sometimes one thread will read the newly written value,
sometimes the old. That said, there is some clever research out of UW into how
to make such systems deterministic while still recovering almost all
parallelism.
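The deterministic-execution idea (presumably the deterministic multithreading line of work) can be sketched by forcing shared-state accesses into a fixed round-robin order, so the "old value or new value?" race comes out the same way on every run. All names here are invented for illustration:

```python
import threading

class RoundRobinGate:
    """Serialize shared-state accesses in a fixed round-robin thread order,
    so the outcome of a racy read is reproducible regardless of how the OS
    schedules the threads -- a toy version of deterministic execution."""

    def __init__(self, n_threads):
        self.cond = threading.Condition()
        self.turn = 0
        self.n = n_threads

    def step(self, my_id, fn):
        with self.cond:
            while self.turn % self.n != my_id:
                self.cond.wait()
            result = fn()              # the guarded shared-memory access
            self.turn += 1
            self.cond.notify_all()
        return result

def run_once(rounds=5):
    gate = RoundRobinGate(2)
    shared = {"x": -1}
    reads = {0: [], 1: []}             # each thread appends only its own list

    def worker(tid):
        for _ in range(rounds):
            reads[tid].append(gate.step(tid, lambda: shared["x"]))
            gate.step(tid, lambda: shared.update(x=tid))

    threads = [threading.Thread(target=worker, args=(tid,)) for tid in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return reads

print(run_once() == run_once())  # True: same schedule, hence same reads
```

A real system does this at memory-access granularity with much cleverer scheduling to recover parallelism; fully serializing everything, as here, is just the correctness baseline.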

~~~
InclinedPlane
I think the more important point jacques_chester was making is that even in
the case where computer systems are 100% deterministic it doesn't matter
because they are so complex they become chaotic systems. And then hypothetical
determinism won't save you because there are too many variables in play
simultaneously, and even the slightest variation in any one of them could
change the behavior completely.

~~~
slacka
Sorry guys, but I don't buy your defeatist arguments for one second. Sure,
using traditional tools, you may be right. However, all we have to do is look
at the brain with its 100 trillion connections (synapses) or a CPU with its
billions of transistors to see that your logic is flawed. If it were not
possible to design fault-tolerant systems at large scale, neither of these
two examples could exist.

I don't claim to have the answers, but a few years ago, I had to help out some
EEs write some testing software in LabVIEW. The first thing that shocked me
was that the system was immune to bad data; the second was how elegantly it
took advantage of our multiprocessing and multithreading hardware.

What I wish systems researchers would work on is concurrent, signal-based,
synchronous languages. This is a winnable game, but we must swallow the red
pill and change the rules of the game with new tools and fresh ideas.

------
ethereal
Speaking as a student studying systems (particularly security and OS
development) . . . the second point sounds really interesting.

Recently I've been working on a simple microkernel (called Sydi) that should
actually make this possible. It's a distributed system straight down to the
kernel: when two instances of the kernel detect each other running on a
network, they freely swap processes and resources between each other as
required to balance the load. This is a nightmare from a security perspective,
of course, and I haven't figured out the authentication bits yet -- too busy
working on an AML interpreter because I don't like ACPICA -- but that's not
the issue here. (It's far from finished, but I'm pretty confident that I've
worked out the first 85% of the details, with just the remaining 115% left,
plus the next 95% of the programming . . .)

Due to some aspects of its design (asynchronous system calls via message-
passing, transparent message routing between nodes in the network, etc) I feel
that it will be entirely possible to take a snapshot of an entire cluster of
machines running the kernel. It would be expensive -- requiring a freeze of
all running processes while the snapshot is taking place, to maintain any
level of precision -- but I'm confident I could code that up in the space of
a week or two . . .
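A stop-the-world snapshot along those lines can be sketched in a few lines; the Node structure here is invented for illustration, not taken from Sydi:

```python
import copy

class Node:
    """A toy kernel instance: some processes (just state dicts here) plus a
    mailbox of in-flight messages. Structure invented for illustration."""
    def __init__(self, name):
        self.name = name
        self.frozen = False
        self.processes = {}
        self.mailbox = []

    def deliver(self, msg):
        if self.frozen:
            raise RuntimeError("no delivery during snapshot")
        self.mailbox.append(msg)

def snapshot_cluster(nodes):
    """Freeze every node, copy all state, then resume. The freeze is what
    makes the snapshot consistent: no message is caught halfway between
    nodes while state is being copied."""
    for n in nodes:          # stop-the-world
        n.frozen = True
    image = {n.name: {"processes": copy.deepcopy(n.processes),
                      "mailbox": list(n.mailbox)} for n in nodes}
    for n in nodes:          # resume
        n.frozen = False
    return image

nodes = [Node("alpha"), Node("beta")]
nodes[0].processes["pid1"] = {"pc": 0}
image = snapshot_cluster(nodes)
nodes[0].processes["pid1"]["pc"] = 42            # later mutation...
print(image["alpha"]["processes"]["pid1"]["pc"])  # 0: ...doesn't touch the snapshot
```

To avoid the global freeze, the classic Chandy-Lamport snapshot algorithm captures a consistent cut by flushing channels with marker messages instead of stopping every process; that may be worth a look for the transparent-message-routing design described above.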

I haven't thought much about what kind of analysis one could do on the
instrumentation/snapshot results, though. I'm sadly too inexperienced with
'real-world' systems stuff to be able to say. Anyone have any suggestions for
possible analysis avenues to explore?

~~~
pi18n
This sounds interesting! Have you published anything more substantial on it? I
yearn for the day when I can have a Plan 9-ish resource sharing and file
system connecting my personal computer to my cloud computers.

------
david927
I want to register a complaint. _Everyone_ talks about wanting ground-breaking
research and innovation, but in my own experience I see little-to-no support
for it.

It reminds me of the scene from the film _Election_, where a character intones,
after you've seen him ski down a double-black diamond slope and crash, "Last
winter when I broke my leg, I was so mad at God."

We're spending money to squeeze the last drops of gold out of the research
done in the 60's and 70's, while research starves, and then complaining that
there's nothing new. I don't see how we can expect anything different.

~~~
jnazario
i originally opened the link expecting to conclude the same thing, however if
you read the article again i think the guy is a) highly qualified to complain
(after all he chaired HotOS's PC, no small recognition of his stature in the
community) and b) he raises good points.

systems are about interactions, and he's identified the challenges in
analyzing systems.

this isn't so much about work from the 60s and 70s, but about modern
architectures and the new things they enable. we're still working with models
from the 60s and 70s and ignoring various aspects of new system architectures
and wondering why we still suck.

~~~
david927
Thanks, I think you're right that I need to read it again. I have a personal
issue with this and it leads to knee-jerk reactions. Thanks for that.

------
zobzu
i find it "interesting" that all the complaints are about usability from a
"userland" developer POV. Generally what comes from that are patched-up
systems to "run stuff in kernel" because "it's safe and fast"

Which of course is not true. There are, however, various attempts to make a
"proper", "modern" OS, but all of these have failed due to the lack of
commercial backing. There is apparently no financial interest in having a
secure, scalable, fast, deterministic, simple OS right now (which, oh,
requires rewriting all your apps, obviously)

Even Microsoft tried. Yes, Microsoft built an excellent such OS.

~~~
qznc
Well, there is a lot of interest in a secure, scalable, fast, deterministic,
simple OS which is also backwards compatible. ;)

What do you mean by "safe" though? How is kernel-space safer than user-space?

~~~
pjmlp
By being written in systems programming languages safer than C.

~~~
alberich
This could catch some programming bugs, sure, but it wouldn't make the OS much
safer. You can still have attacks on integrity, privacy and availability.

~~~
pjmlp
Agreed, but the attack window would be smaller, that is what counts.

If you really want to attack a system, nothing beats social engineering
anyway.

------
PaulHoule
CS research has been pathological for a long time.

Anybody near the field has their own list of "taboo" research topics that are
very interesting but just don't get funding.

Then you go to the library and realize they have 120 shelf-feet of conference
proceedings about Internet QoS that have gone precisely nowhere.

------
InclinedPlane
Configuration, performance, package management, reliability/robustness,
security, monitoring, virtualization, and distributed computing. Those are the
big problems and missing pieces that people doing systems level engineering
are facing today. And, not coincidentally, these are the areas where you see
the most repeated, overlapping tools addressing the same issues.

Consider package management. You have an OS level package manager, you have OS
level application management (on android, for example), you have node package
management, you have ruby gems, you have perl and python packages, you have
wordpress themes and plugins, etc. Obviously it doesn't make sense to have a
single global package manager, but what about standards for package
management? Competing package management infrastructures which serve as
foundations for all these lower level systems? A mechanism for aggregating
information and control for all of the package management sub-systems?

As the article points out a lot of systems folks and OS folks are still
retreading the same paths. They want to pave those paths, then build moving
walkways, and so forth. That's all fine and good, but there's a lot of
wilderness out there, and a lot of people are out in that wilderness hacking
away with machetes and pickaxes building settlements, but for some reason most
OS folks don't even imagine that roads can go there too.

~~~
regularfry
As far as the ruby/python/perl package problem goes, I think we're heading
towards a standard with virtualenv-style tools and potted environments. The
simplest possible standard there is a <foo>/bin/activate script to set the
right environment variables; you can build pretty much anything else you want
on top of that.
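That activate/deactivate contract can be sketched in Python as a context manager (the variable name MYAPP_HOME is invented):

```python
import os
from contextlib import contextmanager

@contextmanager
def activate(env):
    """Minimal analogue of a virtualenv-style bin/activate script: set some
    environment variables on entry, restore the old values on exit."""
    saved = {k: os.environ.get(k) for k in env}
    os.environ.update(env)
    try:
        yield
    finally:
        for k, old in saved.items():
            if old is None:
                os.environ.pop(k, None)   # wasn't set before: remove it
            else:
                os.environ[k] = old       # was set before: put it back

with activate({"MYAPP_HOME": "/opt/myapp"}):
    print(os.environ["MYAPP_HOME"])  # /opt/myapp inside the environment
```

The point of standardizing on something this small is exactly as described above: any richer tooling can be layered on top of a dumb set-variables/restore-variables contract.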

------
rlpb
I think all of these areas are making great progress.

> An escape from configuration hell

debconf, chef, puppet, git, augeas, gsettings. In the Windows world, the
registry.

There's tons of work being done in this area. You may say that more can be
done, or that it isn't good enough, but I think you're wrong to claim that
nobody is working on it.

> Understanding interactions in a large, production system

How about the Google Prediction API? Send the data in, and look for changes
when you have a problem that you're trying to fix, such as in your example.

> Pushing the envelope on new computing platforms

This is incredibly vague. I find it difficult to understand your definition of
"new computing platforms". Based on my interpretation of what you mean, I
think a modern smartphone might qualify: a new form factor, more ubiquitous
than before, novel UX (capacitive touchscreen), and an array of interesting
sensors (accelerometers, compasses, GPS).

I think your post highlights the lamentation of AI researchers: we don't have
a good, static measure of success. Once we've achieved something, our
definition changes to exclude it.

~~~
mprovost
None of those configuration management projects really advance the state of
the art beyond cfengine which has been around for 20 years. They're still just
pushing text files around.

------
jsnell
I think people replying here aren't quite appreciating the complexity of
configuration. One might think that configuration doesn't get harder with
scale, but it does. Suggesting that the solution is simply using a general-
purpose language for configuration, or storing configs in a key-value db or a
version control system, is almost adorable. It's basically completely missing
where the pain points are.

And you can be sure that a company like Google will have done configuration
using flat files, key-value stores, DSLs, general purpose languages that
generate config files, general purpose languages that actually execute stuff,
and probably many other ways :-) If there were simple solutions like that,
they'd already be in use.

Here's a small list of desirable properties:

\- Configurations must be reusable, composable and parametrizable. When
running a system with multiple components, you need to be able to take base
configurations of those components and merge them into a useful combination,
rather than writing the configurations from scratch. You must
also be able to parametrize those base configurations in useful ways -- which
external services to connect to, which data center and on how many machines to
run on, etc. And note -- this can't be a one time thing. If the base
configurations change, there needs to be some way of eventually propagating
those changes to all users.

\- Configurations must be reproducible. If the configuration is a program, it
shouldn't depend on the environment where it's run on (nor should it be able
to have side-effects on the environment). Why? Because when somebody else
needs to fix your system at 3am, the last thing you want them to do is need to
worry about exactly replicating the previous person's setup.

\- Tied to the previous point, configurations also need to be static or
snapshottable. If a server crashes and you need to restart a task on another
machine, it'd be absolutely inexcusable for it to end up using a different
configuration than the original task due to e.g. time dependencies in
programmatic configuration.

\- It must be possible to update the configuration of running services without
restarting them, and it must be possible to roll out config changes gradually.
During such a gradual rollout you need to have options for manual and
automatic rollback if the new configuration is problematic on the canaries.

\- Configurations need to be debuggable. Once your systems grow to having tens
of thousands of lines of configuration or hundreds of thousands of
programmatically generated key-value pairs, the mere act of figuring out "why
does this key have this value" can be a challenge.

\- It'd be good if configurations can be validated in advance, rather than
only when actually starting the program reading the configuration. At a
minimum, this would probably mean that people writing configurations that
other people use as a base be able to manually add assertions or constraints
on whatever is parametrized. But you might also want constraints and
validation against the actual runtime environment. "Don't allow stopping this
service unless that service is also down", or something.

Some of these desirable properties are of course conflicting. And some are
going to be things that aren't universally agreed on. For example there is a
real tension between not being able to run arbitrary code from a configuration
(important for several of these properties) vs. reuse of data across different
services.

My thinking on this is already a few years out of date, it's probable that
configuration has again gotten an order of magnitude more complex since the
last time I thought about this stuff seriously, and there are already new and
annoying properties to think about. The main point is just that this really is
a bit more complex than your .emacs file is.
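A minimal sketch of the first two properties -- composable, parametrizable merging plus a reproducible snapshot -- assuming plain dicts as the config representation (everything here is illustrative, not any particular company's system):

```python
import copy
import json

def merge(base, override):
    """Composable configs: recursively merge an override into a base,
    returning a new dict. Neither input is mutated, so base configs can
    be reused by many services."""
    out = copy.deepcopy(base)
    for key, val in override.items():
        if isinstance(val, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], val)
        else:
            out[key] = copy.deepcopy(val)
    return out

def snapshot(config):
    """Reproducibility: serialize the fully evaluated config, so a task
    restarted on another machine sees exactly what the original task saw."""
    return json.dumps(config, sort_keys=True)

base = {"server": {"threads": 8, "port": 80}, "log_level": "info"}
canary = merge(base, {"server": {"port": 8080}, "log_level": "debug"})

print(canary["server"]["threads"])  # 8: inherited from the base config
print(canary["server"]["port"])     # 8080: parametrized for the canary
```

The hard parts listed above -- propagating base-config changes to all users, gradual rollout with rollback, debuggability of "why does this key have this value" -- are exactly what this toy omits, which is the point: the pain is not in the merge.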

~~~
vsh426
The properties that you listed reminded me of the Nix package manager
<http://nixos.org/nix/> and NixOS that uses it. I came across it originally
while looking into package managers that would allow rollbacks, or at least
make it easy to keep multiple versions of a package installed.

~~~
abecedarius
I had the same thought. (I haven't used Nix, only read about it.)

------
tikhonj
Dealing with configuration files is a problem that extends well beyond
distributed systems, or even systems programming in general. It's certainly a
very important issue.

One of the fundamental problems with configuration files is that they are
written in ad-hoc external DSLs. This means that the config files behave
nothing like actual software--instead, they have their own special rules to
follow and their own semantics to understand. These languages also tend to
have very little abstractive power, which leads to all sorts of incidental
complexity and redundancy. Good software engineering and programming language
practices are simply thrown away in config files.

Some of these issues are mitigated in part by using a well-understood format
like XML or even JSON. This is great, but it does not go far enough: config
files still stand alone and still have significant problems with abstraction
and redundancy.

I think a particularly illuminating example is CSS. When you get down to it,
CSS is just a configuration file for how your website looks. It certainly
isn't a programming language in the normal sense of the word. And look at all
the problems CSS has: rules are easy to mess up, you end up copying and
pasting the same snippets all over the place, and CSS files quickly become
tangled and messy.

These problems are addressed, to a first approximation, by preprocessors like
Sass and Less. But they wouldn't have been needed in the first place if CSS
were an embedded DSL instead of a standalone language.

At the very least, being an embedded language would give it access to many of
the features preprocessors provide for free. You would be able to pack rules
into variables, return them from functions and organize your code in more
natural ways. Moreover, you could also take advantage of your language's type
system to ensure each rule only had acceptable values. Instead of magically
not working at runtime, your mistakes would be caught at compile time.

This is particularly underscored by how woefully complex CSS3 is getting. A
whole bunch of the new features look like function calls or normal code;
however, since CSS is not a programming language, it has to get brand new
syntax for this. This is both confusing and unnecessary: if CSS were just
embedded in another language, we could just use functions in the host
language.

I think this approach is promising for configuration files in general. Instead
of arbitrary custom languages, we could have our config files be DSLs
shallowly embedded in whatever host language we're using. This would
simultaneously give
us more flexibility _and_ make the configuration files neater. If done
correctly, it would also make it very easy to verify and process these
configuration files programmatically.

There has already been some software which has taken this approach. Emacs has
all of its configuration in elisp, and it's probably the most customized
software in existence. My .emacs file is gigantic, and yet I find it much
easier to manage than configuration for a whole bunch of other programs like
Apache.

Another great example is XMonad. This is more along the lines of what I
really want because it lets you take advantage of Haskell's type system when
writing or editing your configuration file. It catches mistakes before you can
run them. Since your config file is just a normal .hs file, this also makes it
very easy to add new features: for example, it would be trivial to support
"user profiles" controlled by an environment variable. You would just read the
environment variable inside your xmonad.hs file and load the configuration as
appropriate.

Specifying configuration information inside a general-purpose language--
probably as an embedded DSL--is quite a step forward from the ad-hoc config
files in use today. It certainly won't solve all the problems with
configuration in all fields, but I think it would make for a nice improvement
across the board.

~~~
rwmj
You really don't want to use a Turing-complete language for configuration. It
makes all sorts of things impossible such as:

\- Automatically scanning for insecure configurations (eg. OpenSCAP)

\- Parsing the configuration in other programs.

\- Modifying the configuration programmatically (cf. Puppet et al)

Also, <http://augeas.net/>

~~~
tikhonj
Those are the things that would actually get easier: your configuration would
be a data structure in the host language rather than a text file. So you
wouldn't have to parse it at all. Other programs would have to either be in
the same language or have some way of communicating with the host language,
but that isn't too bad a restriction.

Similarly, this would make modifying configuration programmatically better.
Instead of dealing with serializing and deserializing to random text formats,
you would just provide a Config -> Config function. This would also make tools
modifying configuration parameters more composable because they wouldn't be
overwriting each other's changes unnecessarily.
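A sketch of that Config -> Config style, using a frozen dataclass as the config value; the field names are invented for illustration:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Config:
    """Configuration as a first-class value in the host language."""
    workers: int = 4
    debug: bool = False
    endpoint: str = "https://example.invalid/api"

# Tools ship Config -> Config functions instead of editing text files.
def enable_debug(cfg: Config) -> Config:
    return replace(cfg, debug=True)

def scale_workers(factor):
    def tool(cfg: Config) -> Config:
        return replace(cfg, workers=cfg.workers * factor)
    return tool

def compose(*tools):
    """Apply tools in order. Each one touches only its own fields, so tools
    can't clobber each other's unrelated changes -- the composability win
    over rewriting a whole serialized file."""
    def combined(cfg: Config) -> Config:
        for tool in tools:
            cfg = tool(cfg)
        return cfg
    return combined

cfg = compose(enable_debug, scale_workers(2))(Config())
print(cfg.workers, cfg.debug)  # 8 True
```

The frozen dataclass also gives the snapshot property from the earlier discussion for free: a Config value can't drift after it is constructed.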

------
kraemate
All three points in the "wishlist" are real-world problems, something which
academic conferences will shun straight away. All work in areas like
configuration management must be either a) evolutionary work based on
existing tools, which unfortunately doesn't qualify as good-enough research,
or b) a radically new approach, which is almost useless unless a robust
solution is developed, deployed, and tested. Researchers prefer multiple
papers over perfecting the engineering of software tools.

Hence there is no incentive for academic systems researchers to focus on
these areas (it's no fun having paper after paper shot down).

------
graycat
In the OP, the

> The "bug" here was not a software bug, or even a bad configuration: it was
> the unexpected interaction between two very different (and independently-
> maintained) software systems leading to a new mode of resource exhaustion.

can be viewed as pointing to the solution:

Broadly in computing, we have 'resource limits', from the largest integer
that can be stored in 32-bit two's complement to disk partition size, maximum
single file size, number of IP ports, etc. Mostly we don't pay really careful
attention to all of these resource limits.

Well, for 'systems' such as are being considered in the OP, do consider the
resource limits. So, for each such limit, have some software track its usage
and report it, especially when that resource is close to being exhausted. And
when the resource is exhausted, report that the problem was resource
exhaustion and who the users of that resource were.
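A minimal sketch of that bookkeeping -- per-resource usage tracking with a warning threshold and a blame list on exhaustion (names invented):

```python
class ResourceMeter:
    """Track usage of a named, bounded resource, warn when it nears its
    limit, and report who holds it when it runs out."""

    def __init__(self, name, limit, warn_fraction=0.9):
        self.name, self.limit = name, limit
        self.warn_fraction = warn_fraction
        self.holders = {}          # user -> amount currently held

    def acquire(self, user, amount=1):
        used = sum(self.holders.values())
        if used + amount > self.limit:
            # Exhaustion reports the culprits, not just the symptom.
            raise RuntimeError(
                f"{self.name} exhausted; holders: {self.holders}")
        self.holders[user] = self.holders.get(user, 0) + amount
        if used + amount >= self.warn_fraction * self.limit:
            print(f"warning: {self.name} at {used + amount}/{self.limit}")

    def release(self, user, amount=1):
        self.holders[user] -= amount

fds = ResourceMeter("file-descriptors", limit=10)
for _ in range(9):
    fds.acquire("query-server")    # the last acquire crosses 90%: warns
```

This only covers resources someone thought to meter; as a reply below notes, the next outage tends to come from a resource nobody had named yet.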

For a little more, we can make some progress by doing simulations of
'networks of queues' with whatever stochastic processes we want to assume for
'work arrival times' and 'work service times'.

For more, do some real-time system monitoring. For problems that are well
understood and/or have been seen before, e.g., running out of a critical
resource, just monitor for the well-understood aspects of the problem.

Then there are problems never seen before and not yet well understood. Here
monitoring is essentially forced to be a continually applied 'statistical
hypothesis test' for the 'health and wellness' of the system complete with
rates of 'false positives' (false alarms) and rates of 'false negatives'
(missed detections of real problems).

For the operational definition of 'healthy and well', collect some operational
data, let it 'age' until other evidence indicates that the system was 'healthy
and well' during that time, and, then, use that data as the definition. Then
decree and declare that any data 'too far' from the healthy and well data
indicates 'sick'.

Then, as usual in statistical hypothesis testing, we want to know the false
alarm rate in advance and want to be able to adjust it. And if we get a
'detection', that is, reject the 'null hypothesis' that the system is healthy
and well, then we want to know the lowest false alarm rate at which the
particular data observed in real time would still be a 'detection', and use
that false alarm rate as a measure of the 'seriousness' of the detection.

If we make some relatively weak probabilistic assumptions about the data,
then, in calculating the false alarm rate, we can get some probabilistically
'exchangeable' data. Then we get to apply some group theory (from abstract
algebra) and borrow a little from classic ergodic theory to calculate the
false alarm rate. And we can use a classic result in measure theory by S.
Ulam, which the French probabilist Le Cam nicely called 'tightness', to show
that the hypothesis test is not 'trivial'. See Billingsley, 'Convergence of
Probability Measures'.

Of course, the 'most powerful' test for each given false alarm rate is from
the Neyman-Pearson lemma (the elementary proofs are not much fun, but there is
a nice proof starting with the Hahn decomposition, a fast result from the
Radon-Nikodym theorem), but for problems never seen before we do not have data
enough to use that result.

For the statistical hypothesis test, we need two special properties: (1)
Especially for monitoring systems, especially complex or distributed systems,
we should have statistical hypothesis tests that are multi-dimensional (multi-
variate); that is, if we are monitoring once a second, then each second we
should get data on each of a certain n variables for some positive integer n.
So, we are working with n variables. (2) As U. Grenander at Brown U. observed
long ago, operational data from computer systems is probabilistically,
certainly in distribution, quite different from data from most of the history
of statistics. He was correct. Especially since we are interested in data on
several variables, we have no real hope of knowing the distribution of the
data even when the system is 'healthy and well' and still less hope when it is
sick with a problem we have never seen before. So, we need hypothesis tests
that are 'distribution-free', that is, make no assumptions about probability
distributions. So, we are faced with calculating false alarm rates for multi-
dimensional data where we know nothing about the probability distribution.

There is a long history of hypothesis tests for a single variable, including
many tests that are distribution-free. See old books by S. Siegel,
'Nonparametric Statistics for the Behavioral Sciences' or E. Lehmann,
'Nonparametrics'.

For multi-dimensional hypothesis tests, there's not much and still less when
also distribution-free.

Why multi-dimensional? Because for the two interacting systems in the example
in the OP, we guess that to detect the problem we would want data from both
systems. More generally in a large server farm or network, we are awash in
variables on which we can collect data at rates from thousands of points per
second down to a point each few seconds. So, for n dimensions, n can easily be
dozens, hundreds, ....

For such tests, look at "anomaly detector" in 'Information Sciences' in, as I
recall, 1999.

If want to implement such a test, might want to read about k-D trees in, say,
Sedgewick. Then think about backtracking in the tree, depth-first, and
accumulating some cutting planes.

For monitoring, there was long some work by Profs. Patterson and Fox at
Stanford and Berkeley, in their RAD Lab, funded by Google, Microsoft, and
Sun. The paper in 'Information Sciences' seems to be ahead.

More can be done. E.g., consider the Rocky Mountains and assume that they are
porous to water. Let the mountains be the probability distribution of two-
variable data when the system is healthy and well. Now pour in water until,
say, the lowest 1% of the probability mass has been covered. Now the test is,
observe a point in the water and raise an alarm. Now the false alarm rate is
1%. And the area of false alarms is the largest we can make it for being 1% (a
rough surrogate for the best detection rate -- with a fast use of Fubini's
theorem in measure theory, we can say more here). As we know, lakes can have
fractal boundaries, and the techniques in the paper will, indeed, approximate
those. Also, the techniques do not require that all the lakes be connected.
And the dry land need not all be connected either; it might be in islands.
So, it is quite general.
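One way to make the water-level picture concrete is a distribution-free detector that scores a point by its distance to its k-th nearest 'healthy and well' observation and sets the alarm threshold at the empirical 99% quantile of those scores over the healthy data itself. The k-NN score is my own choice for illustration, not necessarily what the paper does:

```python
import math
import random

def knn_score(point, data, k=5):
    """Anomaly score: distance to the k-th nearest healthy observation.
    No distributional assumptions; works in any dimension; low density
    (deep water) means a high score."""
    dists = sorted(math.dist(point, x) for x in data)
    return dists[k - 1]

def build_detector(healthy, alpha=0.01, k=5):
    """Put the alarm threshold at the empirical (1 - alpha) quantile of the
    healthy data's own scores. Under exchangeability of healthy
    observations, roughly an alpha fraction of future healthy points will
    score above it: the false alarm rate is set in advance, as argued above.
    Nothing requires the alarm region or the dry land to be connected."""
    scores = sorted(knn_score(p, healthy, k) for p in healthy)
    threshold = scores[int((1 - alpha) * (len(scores) - 1))]
    return lambda point: knn_score(point, healthy, k) > threshold

random.seed(0)
healthy = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)]
is_anomalous = build_detector(healthy, alpha=0.01)
print(is_anomalous((8.0, 8.0)))   # True: far from all healthy data
print(is_anomalous((0.1, 0.0)))   # False: inside the healthy region
```

The brute-force scoring here is quadratic in the healthy sample; the k-d tree with backtracking mentioned below is the standard way to make the nearest-neighbor lookups fast.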

But may want to assume that the region of healthy and well performance is a
convex set and try again. If this assumption is correct, then for a given
false alarm rate will get a higher detection rate, that is, a more 'powerful'
test.

Still more can be done. But, really, the work is essentially some applied math
with, at times, some moderately advanced prerequisites from pure and applied
math. Theme: The crucial content of the most powerful future of computing is
in math, not computer science. Sorry 'bout that!

~~~
nostrademons
All of this is done at Google - there's extensive monitoring for all
production systems, with alerts firing once parameters move outside of their
statistical "normal" range. In practice the range tends to get set more by
"How tight can we make it before we get really annoyed by these pagers?" than
any rigorous statistical method, but the run-time operation carefully measures
average & extreme values and determines when a spike is just a momentary
anomaly vs. when it's worth paging someone.

The problem is that inevitably the _next_ problem is exhaustion of some
resource that nobody even thought of as a resource. In my time at Google, some
of the resource limits I've faced have included (besides the obvious
RAM/disk/CPU): binary size, size of linker inputs, average screen size of
websearch users, time required for an AJAX request, byte-size of the search
results page, visual space on the monitoring UI, size of a Javascript alert
box, maximum length of a URL in Internet Explorer, maximum length of a request
in Google's load-balancers, time until someone submits a CL touching the same
config file as you, and time before a cosmic ray causes a single-bit error in
memory.

File descriptors are a comparatively obvious resource, but nobody thought of
them because in previous normal operation they never even came close to
running out. Should someone set an alert on "wind speed running through the
Columbia River Gorge" (which would've been responsible for an actual outage
had there not been significant redundancy in the system, BTW)?

~~~
graycat
> All of this is done at Google - there's extensive monitoring for all
> production systems, with alerts firing once parameters move outside of their
> statistical "normal" range. In practice the range tends to get set more by
> "How tight can we make it before we get really annoyed by these pagers?"
> than any rigorous statistical method, but the run-time operation carefully
> measures average & extreme values and determines when a spike is just a
> momentary anomaly vs. when it's worth paging someone.

Really, "All"?

The first question is about 'parameters'. Usually, and likely at Google, they
are monitoring just one parameter at a time. This is a really big bummer.
E.g., even if you do this for several parameters, you are forced into
assuming that the 'healthy and well' geometry of the parameters is just a
rectangle. Bummer that hurts the combination of false alarm rate and
detection rate.

Next, the "normal range" is a bummer because it assumes that what is healthy
and well is just a 'range', that is, an interval, and this is not always the
case. The result, again, is a poor combination of false alarm rate and
detection rate.

Again, yet again, please read it again: as I wrote very clearly, to do well
the monitoring just must be multi-dimensional. I doubt that there is so much
as a single large server farm or network in the world doing anything at all
serious with multidimensional data for monitoring. Not one.

Next, your remark about the false alarm rate points to a problem I pointed
out is solved: the method, with meager assumptions, permits knowing the false
alarm rate in advance and setting it, actually setting it exactly.

For how many and what variables to monitor, yes, that is a question that can
need some overview of the server farm or network and some judgment, but there
is some analytical work that should help.

For "rigorous" statistics, the point is not 'rigor' but useful power. Being
multidimensional, knowing false alarm rate and being able to adjust it, etc.
are powerful.

