
Why code that never goes wrong can still be wrong - matt_d
http://www.pathsensitive.com/2018/01/the-three-levels-of-software-why-code.html
======
jasode
Instead of first digging into the author's _"Definition #3"_ and _"Level 3
Design/Logic"_, I would recommend reading the 1985 paper _"The Limits of
Correctness"_ by Brian Cantwell Smith.[1]

I think it's one of the top 5 papers every programmer should read. It's a
short and easy read that makes one think about the limits of _models_, the
limits of _specifications_, and the limits of _formal verification_. After
grokking BCS's insights, there seems to be (an unintended) hubris in James
Koppel's statement, _"I now have two years experience teaching engineers a
better understanding of how to [...] make code future-proof"_ in his framework
of removing defects from "Level 3" design.

(As a side note, there's also a cosmic irony as that paper starts with an
anecdote about the "correctness" of detecting ballistic missiles in 1960 given
the recent false alarm in Hawaii.)

[1]
[https://www.student.cs.uwaterloo.ca/~cs492/11public_html/p18...](https://www.student.cs.uwaterloo.ca/~cs492/11public_html/p18-smith.pdf)

~~~
zitterbewegung
The incident in Hawaii was user error, which makes it a UI design problem.

~~~
jasode
_> a UI design problem._

When a military general or Secretary of Defense asks if the 1960s missile
warning system is working "correctly", it means it's working _incorrectly_
when the moon is categorized as an incoming enemy attack.

Likewise, when a general asks if the 2018 warning system is working
"correctly", it means the system is designed _incorrectly_ if the pixels of
text on the screen[1] lead operators to trigger the unintended action.

Categorizing one error as a "radar error" and another as a "UI error" as if
they are 2 disconnected concepts of correctness is missing the point of BCS's
essay.

When people think of something working "correctly", they want the whole system
to work correctly. They will not explicitly enumerate all the subcomponents
nor all the subdisciplines such as "UI design".

[1]
[https://twitter.com/CivilBeat/status/953127542050795520](https://twitter.com/CivilBeat/status/953127542050795520)

~~~
tritium
In this instance, the phrase "_code that never goes wrong_" does not imply
recognition of human factors affecting an operational system.

~~~
alanbernstein
> _(Where “wrong result” is broadly defined to include performance bugs,
> usability bugs, etc.)_

The author seems to disagree - isn't bad design a usability bug?

------
psyc
If code is wrong, yet nothing goes wrong, it's miles ahead of almost all of
the software I battle with to get work done every day.

~~~
therealmarv
Exactly. Also: never touch a running system. I'm sure there is a lot of
"wrong" software out there, but as long as it works well enough, hopefully we
are not doomed.

~~~
Ace17
How much time does it have to run without failing to qualify as "safe"?

------
mikekchar
I think this is a bit overthought. There are 2 classes of errors: errors that
can manifest as a problem (we usually call these "bugs") and errors that
cannot manifest as a problem (these are usually called "software errors").
The second class of error is not really an issue until you change the code
(or some other factor) and the error manifests as a problem (and then it is a
bug). You might imagine some off-by-one error where, by chance, it doesn't
affect the result. However, if you modify the code, then the off-by-one
suddenly matters and you get a bug.
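
For instance, a minimal (entirely hypothetical) C sketch of such an error:
the loop bound is wrong, but the array happens to have a harmless zero in the
extra slot, so the result is unaffected.

    #include <stdio.h>

    #define N 4

    /* One spare slot, zero-initialized -- which is the only reason
     * the off-by-one below has no visible effect today. */
    static int values[N + 1] = {10, 20, 30, 40, 0};

    int sum(void) {
        int total = 0;
        for (int i = 0; i <= N; i++)  /* off-by-one: should be i < N */
            total += values[i];       /* values[N] is 0, so no harm done */
        return total;
    }

    int main(void) {
        printf("%d\n", sum());  /* prints 100, the expected answer */
        return 0;
    }

Shrink the array to N elements, or put a real value in the spare slot, and
the very same loop becomes a bug.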

Apart from that, software either has some functionality or it does not. It's
actually not particularly meaningful to discuss whether or not the software
was originally supposed to have that functionality, unless you just want to
point fingers. There may be a bug that stops the functionality from working
correctly, or it may be that you got the requirements wrong and implemented
the wrong thing, or it may be that you never even tried to implement it. It
doesn't matter -- the functionality doesn't exist.

When you are designing code, or implementing it, or refactoring it, or
whatever, then you need to consider how easy the result is to modify. It's
tempting to look ahead and assume that you know what you will need in the
future, but you should also consider that you might be wrong (the YAGNI (You
Aren't Gonna Need It) rule is surprisingly effective). Generally speaking, as
a programmer, your goal should be high throughput of development throughout
the lifetime of the project. Attempts to cut corners are usually repaid with
interest later down the road. Similarly, YAGNI work not only ends up being
wasted, it complicates the code base and slows you down later. Because of
this, you should usually just keep the code as simple to modify as possible.
You should also make the functionality you have developed as accessible as
possible (i.e. hiding your functionality under 1000 layers of abstraction is
usually a recipe for slowing you down later, even if it looks cool now).

Worrying overmuch about whether your code is wrong is probably a class of
YAGNI. As long as it is easy and obvious to fix, then it doesn't really
matter if it is wrong. Concentrate on making sure that it produces the result
that you want and that you can modify it if it turns out to be wrong.
Incidentally, it will be wrong. Eventually. You can pretty much count on it.

~~~
TimJYoung
This is fine if the bug is _only_ triggered when you modify the code. Many
bugs that classify as #3 (from the article) are not _only_ triggered by
modifications to the code. For example, many lower-level languages have
runtimes with sub-allocators that grab chunks of memory from the OS and dole
them out in smaller bits to the calling code. This means that an off-by-one
error on the read of an array can cause absolutely zero problems when you run
the same binary 1000+ different ways, but have the user change the order of
operations in the application in one particular way and boom!, now you get an
AV/segfault in your application. It all comes down to how the OS memory
manager and the runtime sub-allocator work and what the user does (and in
what order). The solution is to make sure to use a sub-allocator that allows
you to test your application with rigorous allocation tracking, which will
catch these types of bugs, but not all developers are so thorough.
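
A hypothetical C sketch of the kind of error I mean: the stray read is one
past the end of a heap block, and whether it crashes depends entirely on what
the allocator happens to have put there.

    #include <stdlib.h>
    #include <stdio.h>

    /* Off-by-one read: valid indices are 0..n-1. Most sub-allocators
     * round small requests up to a larger size class, so a[n] usually
     * lands in slack space and "works" -- until the allocation pattern
     * around it changes and the read hits an unmapped page. */
    int last_plus_one(const int *a, size_t n) {
        return a[n];
    }

    int main(void) {
        size_t n = 4;
        int *a = malloc(n * sizeof *a);
        if (!a) return 1;
        for (size_t i = 0; i < n; i++) a[i] = (int)i;
        printf("%d\n", last_plus_one(a, n));  /* UB; usually harmless-looking */
        free(a);
        return 0;
    }

Running the same binary under a tracking allocator (valgrind,
AddressSanitizer) flags the read immediately -- that's the rigorous
allocation tracking I mean.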

~~~
Retric
It's also possible for two off by one errors to cancel out. The code is not
correct as implemented, but you need to fix both locations to avoid noticeable
bugs.
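
A contrived C sketch (made-up names): each function is wrong on its own, but
the two errors cancel and the program prints the right answer.

    #include <stdio.h>

    /* Off-by-one #1: reports one more element than the array holds. */
    size_t count_items(void) {
        return 5 + 1;  /* actual count is 5 */
    }

    /* Off-by-one #2: stops one short of the count it is given. */
    int sum_items(const int *a, size_t n) {
        int total = 0;
        for (size_t i = 0; i + 1 < n; i++)  /* should be i < n */
            total += a[i];
        return total;
    }

    int main(void) {
        int items[5] = {1, 2, 3, 4, 5};
        printf("%d\n", sum_items(items, count_items()));  /* prints 15 */
        return 0;
    }

Fix either error alone and the output goes wrong; you have to fix both.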

------
lisper
Whenever anyone starts talking about formal proofs of correctness I like to
tell the story of the RAX bug:

[http://spinroot.com/spin/Doc/rax.pdf](http://spinroot.com/spin/Doc/rax.pdf)

This was code that ran on a $150M spacecraft, so getting it right really
mattered. It had a formal proof of correctness. We tested the living shit out
of it. And it _still_ failed.

~~~
brokenmachine
_> One of the errors found with SPIN, a missing critical section around a
conditional wait statement, was in fact reintroduced in a different subsystem
that was not verified in this first preflight effort. This error caused a real
deadlock in the RA during flight in space._

So they used code that they had already discovered an error in, in a different
subsystem. I would not contend that that means that software verification
doesn't work.

~~~
lisper
That sentence is not a completely accurate account of what happened. The error
was not in a "different subsystem", it was in the same subsystem as the
verified code: the executive. The code was structured as a verified kernel
with unverified application code written on top of it. The verification proved
that the code would be free of deadlocks if the application code used the
kernel operations exclusively, and did not directly use any primitives like
mutexes, but it turned out that this design rule was violated.
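
For readers unfamiliar with the bug class: "a missing critical section around
a conditional wait" looks roughly like this generic pthreads sketch (not the
actual RAX code, just an illustration of the pattern):

    #include <pthread.h>
    #include <stdbool.h>

    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    bool event_ready = false;

    /* BUGGY: the flag is tested outside the critical section. If another
     * thread sets event_ready and signals between the test and the wait,
     * the wakeup is lost and this thread blocks forever -- a deadlock
     * that only appears under one particular interleaving. */
    void wait_for_event_buggy(void) {
        if (!event_ready) {
            pthread_mutex_lock(&lock);
            pthread_cond_wait(&cond, &lock);
            pthread_mutex_unlock(&lock);
        }
    }

    /* CORRECT: test and wait form one critical section, with the usual
     * while loop to tolerate spurious wakeups. */
    void wait_for_event(void) {
        pthread_mutex_lock(&lock);
        while (!event_ready)
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
    }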

------
sirclueless
I don't think there's any particular reason to apply the label of "bug" to
erroneous reasoning that cannot result in erroneous behavior.

If you can't in actuality cause a flaw in a program despite the presence of
unsafe operations or faulty reasoning, then I would say that by definition
there is no bug. Rather I might say that the code is brittle and perhaps
unmaintainable, with risks for people who modify it later. Which is of course
bad, but thinking pragmatically there's not much difference between a latent
pitfall like this and some poorly documented or gnarly spaghetti code that is
equivalently difficult to maintain.

Fixing what the author calls "Level 1" and "Level 2" problems is more or less
mandatory, as you can't realistically ship software with prominent issues at
these levels. But shipping software with latent "Level 3" problems is
something that happens every day, and it might be entirely correct to do the
hacky and expedient thing rather than the "correct" thing at this level --
this is not a luxury you can afford at the other levels, which is what makes
a "Level 3" problem fundamentally not a "bug".

That doesn't mean it's not worth thinking about erroneous reasoning. Erroneous
reasoning is often the source of actual bugs that slip through a test suite.
And if you're using loose reasoning and relying on assumptions holding true in
the future, this is the sort of thing you should be documenting next to the
code for the benefit of future maintainers. And indeed we should be working to
write more maintainable, safer software, but not under the guise of a "bugfix"
-- this is just good software engineering practice.

Thinking about "Level 3" problems is good. But be pragmatic. The reason
you're doing it is that they make software difficult to reason about, and
hence to maintain. Investing time into fixing these problems helps future
maintenance, but there are probably a lot of things you can do to improve
software maintainability, and you do need to actually ship the software at
some point.

~~~
loup-vaillant
> _I don't think there's any particular reason to apply the label of "bug" to
> erroneous reasoning that cannot result in erroneous behavior._

Undefined behaviour. Can't go wrong now on current compilers with current
architectures, but who knows what will happen at the next rounds of
optimisations.
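
The canonical sketch of this trap (assuming a GCC/Clang-style optimizer):

    #include <limits.h>
    #include <stdio.h>

    /* Intended as an overflow check. Signed overflow is undefined, so
     * the compiler may assume x + 1 never wraps and fold this test to 0.
     * Often "works" unoptimised on today's compilers; a later round of
     * optimisation is allowed to silently delete it. */
    int will_overflow(int x) {
        return x + 1 < x;
    }

    int main(void) {
        printf("%d\n", will_overflow(INT_MAX));  /* 1 or 0: optimizer's choice */
        return 0;
    }

(The defined-behaviour check is `x == INT_MAX`.)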

------
twic
The opening account of what it means to be right seems to me related to the
"justified true belief" model of knowledge:

[https://plato.stanford.edu/entries/knowledge-
analysis/#KnowJ...](https://plato.stanford.edu/entries/knowledge-
analysis/#KnowJustTrueBeli)

I wonder if that inspired it, or if the author just ended up heading in that
direction?

------
kulu2002
Good article. I think this is even more relevant for model-based development
and auto code generators. In auto-generated code there usually exist many
unreachable paths and unresolved conditions which only surface when the code
is run over a certain period of time or when some environmental conditions
are met. Visualising Level 3 as explained in the article is easy in
model-based development, but its consequences on Level 1 and Level 2 are
difficult to predict. These things are of topmost priority in safety-critical
systems, e.g. Toyota's unintended acceleration case.[1]

[1] [https://embeddedgurus.com/barr-
code/2013/10/an-update-on-toy...](https://embeddedgurus.com/barr-
code/2013/10/an-update-on-toyota-and-unintended-acceleration/)

------
hinkley
"Just because it works, doesn't mean it isn't broken."

~~~
unwind
As a frequent poster of C answers on Stack Overflow, I find this a succinct
way of expressing the realities of programming in a language with undefined
behavior ("UB").

Many newcomers to C seem to think that just because they wrote something, and
it compiled, and it had the expected output, they wrote a correct program. But
since there can be UB which results in _any_ (hence undefined) behavior,
including whatever the person expected, that really isn't true. Unfortunately
that seems to go against many people's expectations/intuition. :/
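
A typical (hypothetical) example of what I mean:

    #include <stdio.h>

    int main(void) {
        int total;               /* never initialized: using it is undefined */
        int a[3] = {1, 2, 3};
        for (int i = 0; i < 3; i++)
            total += a[i];       /* UB -- but if the stack slot happens to
                                    hold 0, this prints the expected 6 */
        printf("%d\n", total);
        return 0;
    }

It compiles, it may well print 6 every single time on one machine, and it
still isn't a correct C program.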

~~~
MaxBarraclough
A valid pet-peeve.

<ramble>

Mistaking a working build for good code isn't specific to C. Students being
taught Java (or any other safe language with an explicit
verification/compilation phase) can also be tempted to assume that just
because their code finally compiles ok, it must be the program they hoped to
write.

Then there's the issue of the code working for one or two particular
combinations of inputs, but not all. Again, failure to test well is not
specific to C.

There is the possibility that the UB only rears its ugly head in very rare
circumstances, but again, obscure bugs slipping past testing isn't specific to
C.

Where C is unique is that, as you say, even where the resulting binary
_always_ gives the right outputs (on your current compiler+platform), that
doesn't mean you have a 'truly correct' C program, even assuming that platform
independence is a non-goal.

To put it another way, exhaustively demonstrating the correctness of the
resulting binary, doesn't prove the correctness of the C program. This isn't
the case for all other languages, however - sometimes exhaustive testing of
the binary _can_ prove program-correctness.

(At least, if we ignore non-deterministic runtime factors like threads'
scheduling and RNGs. One could concoct a deterministic subset of Java that
would have this property.)

Regarding multi-platform code: It's not just UB that gives C the ability to
behave so differently between different compilers/platforms. If you switch
platform and are suddenly subjected to a new width of unsigned int, your code
might behave differently in this new environment (it might fail to wrap-around
when it used to, say) even if it never invokes UB.
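
A minimal sketch of that:

    #include <stdio.h>

    int main(void) {
        /* Perfectly defined on every platform (unsigned arithmetic
         * wraps), yet the result depends on the width of unsigned int:
         * prints 0 where unsigned int is 16 bits, 65536 where it is
         * 32 bits. No UB anywhere. */
        unsigned int x = 65535;
        x = x + 1;
        printf("%u\n", x);
        return 0;
    }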

</ramble>

> that seems to go against many people's expectations/intuition

In an ideal world we'd take the edge off by requiring all new C programmers to
use something akin to Clang's 'ubsan' runtime UB detector.
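
Something like this (assuming a reasonably recent Clang or GCC):

    /* demo.c -- compile with: clang -fsanitize=undefined demo.c && ./a.out
     * (GCC also supports -fsanitize=undefined) */
    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = INT_MAX;
        x = x + 1;  /* signed overflow: undefined behavior */
        printf("%d\n", x);
        return 0;
    }

Without the flag this typically prints -2147483648 and looks fine; with it,
the run reports something like "signed integer overflow: 2147483647 + 1
cannot be represented in type 'int'".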

------
PaulAJ
The trouble with formal specification is that for a lot of systems the
specification is a substantial fraction of the length of the program it is
meant to be specifying. In fact if you code in a high level language like
Haskell it is actually possible for the code and specification to be the same
length. At this point finding errors in the specification is just as difficult
as finding errors in your code by inspection.

What it comes down to is that a formal specification is a program we don't
know how to compile.

The exceptions to this are specifications that are non-constructive; they
define an allowable behavior without defining how the result is computed. An
example would be a specification for TCP that says the data must be presented
to the receiving program in the same order it was sent; the behavior is
simple but the underlying mechanisms are complicated. However, this is often
not what the specification looks like. More usually it's like the
specification of a bank transaction, which says that afterwards the sending
account must be debited and the receiving account credited. Well, gee!
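
To make that concrete, a hypothetical sketch: write the bank-transaction
"specification" as executable postconditions and it restates the
implementation almost line for line.

    #include <assert.h>

    typedef struct { long balance; } account;

    /* The implementation... */
    void transfer(account *from, account *to, long amount) {
        from->balance -= amount;
        to->balance += amount;
    }

    /* ...and its "specification": afterwards the sender is debited and
     * the receiver credited. An error here is about as easy to overlook
     * as an error in transfer() itself. */
    void transfer_checked(account *from, account *to, long amount) {
        long f0 = from->balance, t0 = to->balance;
        transfer(from, to, amount);
        assert(from->balance == f0 - amount);
        assert(to->balance == t0 + amount);
    }

    int main(void) {
        account a = {100}, b = {0};
        transfer_checked(&a, &b, 40);
        return 0;
    }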

The real problem is that the "correct" behavior of the program is complex and
contingent, and making sure you have covered all of the issues during
requirements gathering is very difficult. And you can't put a formal
specification in front of the customer and say "Is this what you want it to
do?" because they can't read it, any more than they can read source code.

------
kazinator
A program is wrong if it doesn't meet its explicit requirements, plus certain
common implicit requirements for good engineering (unless they are explicitly
waived by the former).

An example of an unwritten requirement might be, say, that the word processing
program doesn't violate the user's privacy or security. That might not be on
the list of functional requirements for a word processor, but would be
implicit.

The problem is that requirements can be deficient in some ways. The program is
the most visible deliverable related to the requirements and so it takes the
blame for requirement problems.

Requirements can be outright wrong and that's when they are 1) contradictory
in some way (there exists a subset of the requirements which cannot all be
simultaneously implemented) or 2) unclear: they have multiple different
interpretations (unintentionally) and such.

When requirements are clear and consistent, then they are subject to opinion:
someone would like the requirements to be more complete, or for some of them
to be something else entirely. This turns into a criticism of the program:
why isn't it that way, rather than this way? People blur the boundary between
this kind of criticism and a criticism of correctness. "How can this be a
correct word processor; it has only three levels of undo?" (But the
requirement specification was written such that it calls for three levels of
undo; how can the program be incorrect on the grounds of that requirement?)

------
donatj
I went on a similarly themed rant [1] a while ago when talking about why
static analysis is important, mostly out of frustration that my coworkers seem
to think it's largely unnecessary.

Trying to get people to listen makes you feel like Cassandra. No one listens
even though you're right. lol

[1]
[https://donatstudios.com/StaticAnalysis](https://donatstudios.com/StaticAnalysis)

------
skybrian
This article makes some useful and clear distinctions, but it's worth pointing
out that the goal of level 3 (guaranteed compatibility with new versions) is
impossible even in principle when most libraries and environments don't have
formal specifications that they guarantee they will stick to in future
versions.

For example, there's no guarantee a web app will work in a new browser or even
a new version of a browser. Browser vendors try to avoid breakage for _most_
apps but not necessarily _your_ app. They will make backward-incompatible
changes to rarely used APIs.

A web app usually will work with a new browser, but this is an argument based
on statistics, not logic. There is good reason to run automated level 1 tests
against beta versions of browsers, to get an early warning of breakage.

Reasoning based on API specifications has a lot in common with mock-heavy
testing. It can sometimes find logic bugs that are your own fault, but it's no
good at finding breakage that is not your fault (but may still be your
responsibility).

------
virgilp
There are in fact more levels - the definition of "level 3" is hand-wavy
enough that it may capture everything, but there surely is at least a level
4, illustrated by the recent Hawaii missile alert: the software worked "as
designed", but if it's easy to misuse, can you actually argue that it was
"correct"? Somewhat different (and, ironically, illustrated in the article
itself) are the very-reasonable-but-wrong assumptions about the environment.
Take the case of Microsoft Windows & SimCity - who fixed the bug? Microsoft
did... for all intents and purposes, even if in a technical/theoretical sense
the Windows memory manager was "correct" - in a very practical sense, it was
buggy and needed to be fixed. This occurs in practice far more frequently
than we are willing to admit.

~~~
neo2006
UI design is a different disciplines than software design, the Hawaii missile
alert issue was a UI issue the software presenting the UI worked as intended.
The issue is that we consider UI design as software and we perform UI testing
(which is not even testing the UI but the software underneath) and we say
everything is good we are ready for prod.

~~~
virgilp
It's really just an easy-to-understand symptom of the general problem: if you
write/design software that is easy to misuse, that's really faulty software.
If you want a different example - I think Tony Hoare's "billion-dollar
mistake" statement is an admission of the fact that 'null' can really be
interpreted as a language design bug. Technically speaking, a language design
can't be buggy if it's coherent - it just "is". But in practical, everyday
terms - the fact that null leads to lots of problems, very often,
could/should be considered a "language design bug".

That was kinda my point. The article claims that "bug" is not just "coding
errors", and I agree - and furthermore, I say that design considerations are
still bugs by that definition.

~~~
neo2006
I don't agree with the author. IMO bugs are coding errors; the crash or the
bad behaviour coming out of them is just a symptom. Not having any symptoms
does not mean that I do not have a defect/disease: with the appropriate set
of tests the doctor could find that I have a disease even with no symptoms.
To come back to the author's SimCity example, the software had a coding
error, and the proof of it is that when the symptoms became visible (with the
Windows 3.1 malloc version) a change in the code was needed to fix the error.
The only other category that does not fall under coding error, for me, is
design/architecture error, and the Hawaii issue falls into that category.
It's not a bug because it does not prevent us from using the software as
intended; it just makes the user more error-prone, or makes it harder to add
features, or slower to use, or, worst of all, shows no symptom at all until
you want to modify or expand the software.

~~~
kulu2002
NFRs, or Non-Functional Requirements, are the root cause of bugs in most
cases. Functional requirements are easy to meet; it is with NFRs that our
assumptions come into the picture. As an example, consider selecting an
improper data type to represent something. When this data type fails to
accommodate a given quantity and overflows, it creates the bug, but that will
not happen immediately. Such issues can easily escape rigorous testing. See
the 911 outage case [1] - I think it was more than just a coding error: a bad
design decision.

[1] [https://www.theverge.com/2014/10/20/7014705/coding-
error-911...](https://www.theverge.com/2014/10/20/7014705/coding-
error-911-fcc-washington)
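
A hypothetical sketch of that failure mode: a counter whose type quietly caps
its capacity. It passes every test run and fails only after enough real
traffic.

    #include <stdio.h>

    /* Call IDs stored in a 16-bit type. Tests that handle a few
     * thousand calls always pass; months of traffic eventually wrap
     * the counter and break the "IDs are unique" assumption. */
    unsigned short next_call_id = 0;

    unsigned short assign_call_id(void) {
        return next_call_id++;  /* wraps to 0 after 65535 */
    }

    int main(void) {
        for (unsigned long i = 0; i < 70000; i++) {
            unsigned short id = assign_call_id();
            if (i > 0 && id == 0)
                printf("wrapped after %lu calls\n", i);  /* the latent bug */
        }
        return 0;
    }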

~~~
virgilp
> Functional requirements are easy to meet.

That really depends. Coding against specs is just like walking on water: easy,
if frozen :)

~~~
kulu2002
Yeah... I mean, apart from 'Functionality', these are to be taken into
consideration: [https://msdn.microsoft.com/en-
in/library/ee658094.aspx](https://msdn.microsoft.com/en-
in/library/ee658094.aspx)

But again... you try to make the system foolproof and nature will create
better fools :-D

------
debt
This is kind of my argument for championing logical robustness over design.
Design feels good; logical robustness is hard.

But the latter can be achieved, and in some cases the design must be
discarded in some ways to achieve it.

I took a grad course on formal verification. Basically, there are ways to
mathematically ensure a program will never go wrong.

[http://old-www.cs.dartmouth.edu/~cs50/data/tse/wikipedia/wik...](http://old-
www.cs.dartmouth.edu/~cs50/data/tse/wikipedia/wiki/Formal_verification.html)

------
djinnandtonic
I would still argue level three does not exist - future-proofing is futile
over the short or long term, and maintenance is the only solution. Good
article otherwise though!

~~~
sirclueless
This doesn't really work as a blanket statement. Future proofing and trying to
guarantee bug-free code is worth something -- for example if I'm building a
space shuttle to go to the moon, I want some degree of certainty that software
bugs will not cause that mission to fail, and I might be willing to spend a
lot of time and money to ensure that doesn't happen.

I think a more accurate statement of this sentiment is that investing time and
effort into preventing future bugs and problems stops being worth it after a
certain point (and perhaps earlier than many software engineers would think).

~~~
boilerupnc
Interesting read about the level of effort and process invested in the Space
Shuttle's software (~1996): [https://www.fastcompany.com/28121/they-write-
right-stuff](https://www.fastcompany.com/28121/they-write-right-stuff). For a
TLDR, this answer ([https://space.stackexchange.com/questions/9260/how-often-
if-...](https://space.stackexchange.com/questions/9260/how-often-if-ever-was-
software-updated-in-the-shuttle-orbiter#answer-9271)) is great.

My favorite excerpt: The Shuttle software consists of ca. 420,000 lines. The
total bug count hovers around 1. At one point around 1996, they built 11
versions of the code with a total of 17 bugs.

------
known
Designed to fail?

