
Human Error in Software - antirez
http://soveran.com/human-error.html
======
antirez
I really loved this post, and experienced first-hand what Michel says in the
context of Redis. The Redis code base is in general not rocket science; it is
pretty understandable and does not use overly complex ideas. I simply don’t
trust my ability to write complex and correct code at the same time, and so far
this approach has worked well: Redis is quite stable, all things considered.

However the file implementing replication, replication.c, gained complexity
very fast compared to the other parts of Redis over the years. It’s 2500 lines
of code, while for example cluster.c is 5000 lines of code, but even at 2x the
size, cluster.c is simpler to understand. The reason is exactly what Michel
says: code that matches a mental model.

How replication.c got its complexity is easily explained. Redis replication
was extremely simple at the start: just streaming replication between master
and slave(s). You connect, get the initial payload representing the dataset on
the master, and from there it is just streaming of the write commands received
by the master (from the clients) to the connected slaves. Then we added
partial resynchronization. Later we added in-memory replication (called
diskless replication) for environments with slow disks, chained replication,
and so forth.

Instead of redesigning the code in order to cope with the new complexity, I
usually approach the issue in a different way: I modify what I have with the
minimal set of changes needed to make it work. This is usually a good strategy
in my opinion, since making something _truly_ general sometimes makes it more
complex than having something simpler with a few exceptions to handle corner
cases. However this approach does not scale well: eventually the code is
structured as if the problem were simpler, but has a lot of exceptions. When
you reach this point, you can no longer build a mental model of how the code
works.

The result was that we (the Redis Labs Redis core team and I) recently found a
number of bugs in corner cases that could arise from mixing diskless
replication with PSYNC and other unexpected events. While fixing those bugs I
started to refactor replication.c in order to structure it in a way that makes
it possible to build a mental model of it again, so that the actual layout of
the code reflects a bit more what the moving pieces are: the action of
creating the initial synchronization payload, the slave attempting a partial
resynchronization, and so forth. There is still work to do, but in general
it’s very important to write code for which a simple mental model exists _and
matches_ the code layout.
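
The original two-phase flow described above (full snapshot first, then a
stream of write commands) can be sketched as a toy model. All names here are
illustrative; this is not the real Redis protocol or code, just the mental
model rendered as a minimal in-memory simulation:

```python
# Toy model of streaming replication: a full sync (initial payload),
# then the slave applies the master's subsequent write-command stream.

class ToyMaster:
    def __init__(self):
        self.data = {}
        self.stream = []              # write commands forwarded to slaves

    def set(self, key, value):
        self.data[key] = value
        self.stream.append(("SET", key, value))

    def snapshot(self):
        return dict(self.data)        # the "initial payload"

class ToySlave:
    def __init__(self):
        self.data = {}

    def full_sync(self, master):
        self.data = master.snapshot()

    def apply(self, command):
        op, key, value = command
        if op == "SET":
            self.data[key] = value

master = ToyMaster()
master.set("a", 1)

slave = ToySlave()
offset = len(master.stream)           # replication offset at snapshot time
slave.full_sync(master)               # phase 1: initial payload

master.set("b", 2)                    # phase 2: stream later writes
for cmd in master.stream[offset:]:
    slave.apply(cmd)

assert slave.data == master.data      # {"a": 1, "b": 2}
```

The `offset` recorded at snapshot time is also, very roughly, the idea behind
partial resynchronization: a reconnecting slave that remembers its offset can
ask for only the commands it missed instead of a new full payload.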

~~~
fapjacks
Through the years, Redis has become and stayed my favorite database
technology. Simple, lightning-fast, with plenty of functionality to do
anything you need. This post is off topic with respect to the OP, but I just
wanted to take a minute to say thanks.

~~~
slowernet
I made the decision to use Redis/Ohm and Cuba for a personal project several
months ago and I've never felt more connected to and in command of what I'm
doing. I owe both these gentlemen a debt of gratitude.

------
kevinr
While I like James Reason's book and I find its discussion of mental models
instructive in my code and in my life, I do have to break with him on his
conclusions, which chiefly amount to the takeaway that systems at the level of
complexity we're building today are impossible to run safely, and we should
shut them all down. (I exaggerate, but only a little.)

A book I like which responds to Reason's is Nancy Leveson's Engineering a
Safer World (free PDF from the MIT Press, even!
[https://mitpress.mit.edu/books/engineering-safer-
world](https://mitpress.mit.edu/books/engineering-safer-world)) which says,
okay, if we don't want to shut these systems (like the Internet) down, how can
we run them safely, and provides some guidance.

I gave a short talk about it at Facebook's most recent Security@Scale
conference in Boston a couple weeks ago, the video of which is here:
[https://www.youtube.com/watch?v=e_-n5wX8okQ](https://www.youtube.com/watch?v=e_-n5wX8okQ)

Edit: To tie this back to the OP, I think while it's desirable for software
to be as simple as is reasonable given constraints, it's decreasingly possible
to say that all software can be built so simply that analytic reduction works,
and we need tools to help us cope with software systems which exhibit emergent
complexity.

------
j_h_s
"the rational behavior would be to read the code, understand what it does, and
reject it if it doesn't work for their use case."

This just isn't practical in most cases. I just don't have time to read the
source of every tool I use. No matter how much we strive for simplicity, the
fact of the matter is that nearly any software system that is useful these
days is going to be too big for every developer who uses it to read its
source. You have to accept that in a lot of cases, you're going to need to use
tools whose inner workings are obscure to you.

Simplicity in programming is great, but we passed the point of understanding
all the software we used a LONG time ago.

~~~
nulltype
The inner workings may be obscure, but I really like it when the outer
workings are not. I use the Google Datastore (although Postgres would work
here too), which I'm sure is super complicated internally. Externally though,
it has certain properties that form a fairly simple mental model.

With that mental model, you can predict from reading some code what the
possible error cases or race conditions could be, or what the state of the
datastore entity would be after running some code against it. Perhaps that's
not precisely "Software Complexity", but "Library Complexity" instead.

I was going to use Redis as an example, but the internal workings are probably
too easy to understand.
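
One way to make "predicting race conditions from the external model" concrete:
if the store's contract is simply "individual reads and writes are atomic, but
read-then-write is not", the model alone predicts that concurrent increments
can lose updates. A deterministic sketch of that interleaving, using a toy
store with illustrative names (not any real Datastore or Redis API):

```python
class ToyStore:
    """External contract: get and put are each atomic,
    but a get-then-put sequence is not. That is the whole mental model."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=0):
        return self._data.get(key, default)

    def put(self, key, value):
        self._data[key] = value

store = ToyStore()
store.put("counter", 0)

# Interleave two read-modify-write "clients" by hand: both read 0,
# both write back 1, so one increment is lost -- exactly what the
# external contract predicts without reading any internals.
a = store.get("counter")
b = store.get("counter")
store.put("counter", a + 1)
store.put("counter", b + 1)

assert store.get("counter") == 1      # a lost update, not 2
```

The point is that this reasoning needs only the store's outer contract; the
"super complicated" internals never enter into it.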

------
TheAndruu
100% agree with this, particularly on the part emphasized:

"An advanced programmer can create a program that is correct, but complex and
hard to understand. For the purpose of creating an accurate mental model,
<emphasis>even the program's correctness is of secondary importance: code that
is understandable can be fixed.</emphasis>"

~~~
douche
That's the old Kernighan quote[1]:

"Everyone knows that debugging is twice as hard as writing a program in the
first place. So if you're as clever as you can be when you write it, how will
you ever debug it?"

[1]
[https://en.wikiquote.org/wiki/Brian_Kernighan](https://en.wikiquote.org/wiki/Brian_Kernighan)

------
mwsherman
I think Go is optimized around this idea: that the average reader will have an
accurate mental model. (This of course implies trade-offs with other
considerations.)

