
How Is Critical Life or Death Software Tested? - pablobaz
http://motherboard.vice.com/en_uk/read/how-is-critical-life-or-death-software-tested
======
acomjean
I wrote software for radars. Kind of important (not like plane software). We
used Ada a lot, which in my estimation helped. Software was reviewed. Tests
were reviewed. Reliability was favored over other things (for example,
recursion was discouraged). We used Ada's constrained types (this value is
between 1 and 99; if it goes out of range, throw an exception).
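
Ada's constrained subtypes can be approximated in other languages; here is a hedged Python sketch of the idea (the class name and the 1..99 channel example are illustrative, not from the actual radar code):

```python
class Constrained:
    """Approximates an Ada constrained subtype: a value that must stay
    within a fixed range, raising on any out-of-range assignment."""

    def __init__(self, low, high, value):
        self.low, self.high = low, high
        self.value = value  # routed through the range-checking setter below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, v):
        # Ada raises Constraint_Error; here we raise ValueError instead.
        if not (self.low <= v <= self.high):
            raise ValueError(f"{v} outside range {self.low}..{self.high}")
        self._value = v

# A value constrained to 1..99, as in the comment above.
channel = Constrained(1, 99, 42)
```

The point of doing this at the type level is that the out-of-range exception fires at the moment of corruption, not later when the bad value has propagated.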

For external hardware inputs, we had software simulating the messages when we
couldn't get the real equipment in the lab. We were writing software for a
radar that didn't yet exist.

And it was tested and tested and tested and tested.....

Then, when the pieces came together in integration, they tested and tested.
And when they built the physical radar, the software worked. A few
configuration things here and there needed adjusting.

Actually, testing was built into the software. When it came up, it would talk
to the physical parts to make sure everything was communicating OK before it
could start running.

Was it bug free? Probably not. You can't test every possible scenario. But we
took the work seriously. It took longer to get code done than any other
environment I've been in, but the code ended up very solid quality.

~~~
outworlder
> Actually testing was built into the software. When it came up it would talk
> to the physical parts to make sure everything was communicating ok before it
> could start running.

I've been wondering about this for a while. We tend to run unit tests,
integration tests, whatever tests, while the software is in development.
However, once it is in "production" (for whatever definition of production),
usually no tests are performed. At most, there's some sort of metrics and
server monitoring, but nothing in the actual software.

It's a work of fiction, but Star Trek has these "diagnostics" that are run,
with several levels of correctness checks, for pretty much everything. In your
typical app, it could be useful to ask it to "run diagnostics", to see if
everything was still performing as expected.

~~~
dugmartin
Early in my career I worked on military flight data recorders, including the
development of the software for the F-22's "black box". Those systems have
SBIT, IBIT, PBIT and MBIT sub-systems, where BIT is "built-in test" and S =
startup, I = initiated, P = periodic and M = maintenance. I remember making
the Star Trek diagnostic joke myself when I was assigned the SBIT work.

Each BIT does varying levels of testing based on its runtime budget, but there
are a lot of very basic tests that don't make much sense until you see your
first field report of some register bit "sticking". It's much better to ground
a plane that can't add 1+1 than to find that out in mid-flight.
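
A startup BIT's "can the hardware still add 1+1" style of check might look like this minimal sketch (the register access callables and bit patterns are invented for illustration):

```python
def check_register(write, read, patterns=(0x00, 0xFF, 0xAA, 0x55)):
    """Detect stuck bits by writing alternating patterns to a register
    and reading them back; a bit stuck at 0 or 1 fails one of the pairs."""
    for p in patterns:
        write(p)
        if read() != p:
            return False
    return True

def sbit(alu_add, write_reg, read_reg):
    """Startup built-in test: refuse to come up if basic arithmetic
    or register storage is broken."""
    if alu_add(1, 1) != 2:
        return False
    return check_register(write_reg, read_reg)

# Simulated healthy hardware passes:
reg = [0]
assert sbit(lambda a, b: a + b,
            lambda v: reg.__setitem__(0, v),
            lambda: reg[0])

# Simulated register with bit 3 stuck high is caught and grounds the plane:
stuck = [0]
assert not sbit(lambda a, b: a + b,
                lambda v: stuck.__setitem__(0, v | 0x08),
                lambda: stuck[0])
```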

~~~
derekp7
What types of errors would cause one of the tests to fail? Is it mostly
testing for hardware errors, or are there any software logic errors that could
make it to production, but be caught by one of the tests several months down
the road? The only software related items I can think of are edge cases where
a built in test is based on real time input. Kind of like running the
calculations through multiple independent implementations of the software.

~~~
npatrick04
There's actually a decent probability of memory corruption in space
applications due to radiation. So in addition to checking communication across
busses, application checksums are typically run continuously.
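
A continuous application checksum can be sketched like this (CRC32 over a byte buffer standing in for the program image; the function names and scrub pattern are illustrative, not from any particular flight system):

```python
import zlib

def snapshot_crc(memory: bytes) -> int:
    """Record the known-good CRC of the code region at boot."""
    return zlib.crc32(memory)

def scrub(memory: bytes, expected_crc: int) -> bool:
    """Periodic check: recompute the CRC and compare. A single flipped
    bit (e.g. a radiation-induced upset) changes the CRC."""
    return zlib.crc32(memory) == expected_crc

code = bytes(range(256)) * 16           # stand-in for the program image
good = snapshot_crc(code)
assert scrub(code, good)                # healthy memory passes

corrupted = bytearray(code)
corrupted[100] ^= 0x04                  # one flipped bit
assert not scrub(bytes(corrupted), good)
```

On real hardware the scrub would run from a timer and trigger a reset or failover on mismatch rather than returning a boolean.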

------
airza
My experience with testing avionics controllers was: everyone seemed to have
the correct idea (that bugs were basically Not Allowed). The company set up
enough testing that those bugs were eventually eliminated. However, the
difference between good projects and bad projects was mostly the amount of
time and money that this took.

My favorite was one where we had an entire test harness written in Python that
could completely control the operation of the device being tested in a way
that resembled human input. Code was first written in Ada against a monolithic
requirements document, while testers wrote their standard test cases against
the same document. After the exhaustive amount of testing that took place by
developers, testers themselves had the freedom to create contrived test cases
that might have escaped the attention of devs (What if we just turn the
machine on and off 10 times, because why not?).

This had the advantages of a formal software process as well as the ability to
exploit human creativity. It also led to me losing a bet that a doppler radar
can't be fooled with an empty potato chip bag.

~~~
nchelluri
Care to elaborate on the potato chip bag story? Sounds intriguing.

~~~
mbrameld
My wild guess, loosely based on my time spent writing software to analyze
observed radar signals: I believe the parent probably opened the bag along the
seams to get one flat piece of reflective (on the inside of the bag) material.
I know weather balloons are often reflective so they can be tracked by radar.
I'm guessing a full-size potato chip bag opened flat would be big enough for
the radar to see, but I couldn't tell you if that's what the parent meant by
fooling the radar or if something else was done to the bag to make the radar
interpret it as an aircraft.

~~~
airza
This is pretty much it. My boss was definitely a level 99 duct tape
programmer, and this was one of the many examples of his ingenuity.

------
Aleman360
Worked on anesthesia machines for a few years. Since both hardware and
software are involved, the testing was quite extensive.

* Lots of manual testing. While we did unit testing and some automated integration testing, most defects were found using exhaustive manual testing by trained engineers.

* Randomized UI testing. Used UI automation to exercise the UI with various physical configurations of the system. Would often run this overnight on many systems and analyze failures every day.

* Extensive hazard analysis. Basically, we wrote down everything that could possibly go wrong with the system (including things like gamma radiation), estimated the likelihood and harm, and then listed mitigations. The entire system could run safely even if there was full power failure. "Fail safe"

* Detailed software specifications, each of which was linked to manual test cases. Test cases were signed off when executed.

* Animal testing for validation. We went to a vet school and put a bunch of dogs under and brushed their teeth.

* Limited release for production. We would launch the system at one or two hospitals and monitor it for a few weeks before broader release.
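
The randomized UI testing described above is essentially fuzzing the input stream with invariant checks; a toy Python sketch of the idea (the UI model, button names, and invariant are invented for illustration, not from the actual device):

```python
import random

class DeviceUI:
    """Toy UI state machine standing in for the real device screen."""
    def __init__(self):
        self.screen = "home"
        self.alarm_muted = False

    def press(self, button):
        if button == "settings":
            self.screen = "settings"
        elif button == "home":
            self.screen = "home"
        elif button == "mute":
            self.alarm_muted = not self.alarm_muted
        # Invariant: the UI must always be on a known screen.
        assert self.screen in ("home", "settings"), "UI in unknown state"

def random_soak(seed, presses=10_000):
    """Overnight-style soak: hammer the UI with random input and rely on
    invariant assertions to surface failures for next-day analysis."""
    rng = random.Random(seed)   # seeded, so any failure is reproducible
    ui = DeviceUI()
    for _ in range(presses):
        ui.press(rng.choice(["settings", "home", "mute"]))
    return ui

ui = random_soak(seed=42)
assert ui.screen in ("home", "settings")
```

Seeding the generator is the important detail: a failure found overnight can be replayed exactly the next morning.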

~~~
manarth
You just have to look at the Therac-25 for the risks of relying on software
interlocks alone. One of the prevailing pieces of feedback was the lack of
_hardware_ interlocks. Whether that's possible on anesthesia machines I don't
know…but the prevailing wisdom is to use/include hardware interlocks wherever
that's feasible, for any critical life-supporting equipment.

~~~
Aleman360
Not sure what you mean by "interlocks", but the hardware was quite
distributed. Each critical component had its own board and industrial
microcontroller. And we had various levels of watchdogs keeping track of
system health at all times.

~~~
david-given
Interlocks are usually fairly crude safety measures, normally in hardware, to
make sure that particular combinations of events cannot happen.

The Therac-25 is a famous comp.risks cautionary tale. Among the many, many
design misfeatures (if you haven't come across it, it's worth a read) was the
one that killed people:

It was capable of providing two kinds of radiation therapy; electron beam
radiation and X-ray radiation. It worked by having an electron beam generator
which could be operated at either high power or lower power. Low power was
used directly. High power was only used to irradiate a tungsten target which
produced X-rays. (I'm simplifying here.)

You can probably guess what went wrong; people were directly exposed to the
high power electron beam. Several of them died.

The obvious interlock here (which apparently previous versions had) was to
have a mechanical switch which would only enable the high-power beam when the
tungsten target was rotated into place. No target, no high power. Simple and
relatively foolproof (although it's possible for interlocks to go wrong too).
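
The interlock logic described above amounts to a simple guard: high power is refused unless the target-in-place switch reads closed. A hedged sketch (class and parameter names are invented for illustration; a real interlock would be in hardware, not code):

```python
class BeamInterlock:
    """Sketch of a target-position interlock: the high-power beam can
    only be enabled while the tungsten-target switch reads 'in place'."""

    def __init__(self, target_in_place_switch):
        self._switch = target_in_place_switch  # callable -> bool

    def enable_beam(self, high_power: bool) -> bool:
        if high_power and not self._switch():
            # No target, no high power: refuse and fail safe.
            return False
        return True

# Target rotated into place: high power allowed.
assert BeamInterlock(lambda: True).enable_beam(high_power=True)
# Target absent: high power refused, low power still fine.
assert not BeamInterlock(lambda: False).enable_beam(high_power=True)
assert BeamInterlock(lambda: False).enable_beam(high_power=False)
```

The virtue of the mechanical version is that the guard holds even when the software is wrong, which is exactly what the Therac-25 lacked.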

~~~
pdkl95
Really, every engineer should read the report[1] from the Therac-25
investigation. I would hope anybody that is working on anything that could be
potentially dangerous has _already_ read it.

The problems in the Therac-25 went a lot further than just the bad design of
the target selection, which _did_ have a (badly designed) interlock. It checked
that the rotating beam target was in the correct position to match the
high/low power setting (and NOT the third "light window" position, which has
no beam attenuator at all).

While many design choices contributed to the machine's problems, you could
probably say that two big design failures led to the deaths associated with
the Therac-25. One was this interlock, which _failed_ if you didn't put it in
place (there was no locking mechanism either, just a friction stopper). If
the target was turned slightly, the 3 micro-switches would sense the wrong
pattern (bit shift)... which was the pattern for one of the OTHER positions.

There was also a race condition in the software that would turn on the beam at
a power _MUCH_ higher than is ever used. This race was only triggered when
you typed in the treatment settings _very quickly_, which is why the
manufacturer denied there was a problem: when they tried to recreate the bug
by carefully - that is, very _slowly_ - following the reported conditions, it
never failed.

Therac-25 is an incredibly powerful lesson in what we mean by "Fail Safe", and
why it is absolutely necessary to have _defense in depth_. Fixing the target
wouldn't have fixed the race-condition power-level bug. Fixing any of the
software wouldn't have fixed the bad target design that could be turned out of
alignment. Oh, and they had a radiation sensor on the target (which could shut
off the machine as _another_ independent layer of defense)... but they mounted
it on the turnable target, so the micro-switch problem allowed the sensor to
be moved away from the beam path.

The really telling thing, though, is how the _previous_ model acted. It was
not software controlled, and was an old-style electromechanical device. It
turns out the micro-switch problem existed there as well (among other
problems)... and it would _blow fuses_ regularly. Which was yet another layer
of safety. It turns out that when they upgraded it to a software-based control
system, they got cheap and took out all those "unnecessary" hardware
interlocks and "redundant" features. There is a lot of blame to go around, but
this is where I put most of the responsibility. You never assume that one (or
even a few) safety features will work - the good engineer assumes everything
will break at any moment, and makes sure that it will _still fail safe_.

> (although it's possible for interlocks to go wrong too)

If there is one lesson to learn from the Therac-25, this was it. Things break,
mistakes happen, and when you're building a device that shoots high-energy
x-rays at people, you need to assume that everything _did_ go wrong, and make
sure the rest of the device can safely handle that situation.

[1]
[http://sunnyday.mit.edu/papers/therac.pdf](http://sunnyday.mit.edu/papers/therac.pdf)

~~~
manarth
_"The really telling thing, though, is how the previous model acted…it would
blow fuses regularly"_

Good. When a fuse blows, it shows something is wrong, and needs fixing.
Replacing the fuse with a nail or something else that doesn't blow is a sure-
fire way to set the thing on fire. Bad enough for a desk lamp, a little worse
for a radiotherapy machine.

Sounds like people were irritated by fuses blowing, and decided to simply
short-circuit the fuses instead.

~~~
pdkl95
The people _using_ the machine at the hospital would replace the (expensive)
fuses when they blew. It was the _manufacturer_ that made the later model (the
Therac-25) that didn't have the fuses (and other "old" hardware features).

Obviously, something was still very wrong. User error (or other bugs? I'm not
sure) in the older hardware and the infamous race condition in the software-
controlled Therac-25 were causing the beam to turn on at some shockingly high
power. The better design of the older models _saved people's lives_ by simply
blowing fuses when the power went too high.

You could, perhaps, blame the poor communication between the hospitals and the
manufacturer, because the fuse problem _should_ have caused a bit of a panic
among the engineers who designed the machine.

------
cesarbs
> As for the code itself, its perfection came as the result of basically the
> opposite of every trope normally assigned to "coder." Creativity in the
> shuttle group was discouraged; shifts were nine-to-five; code hotshots and
> superstars were not tolerated; over half of the team consisted of women;
> debugging barely existed because mistakes were the rarest of occurrences.
> Programming was the product not of coders and engineers, but of the Process.

Serious question: where do I get a job like this? It's my dream way of
programming professionally.

~~~
acomjean
Large corporations with government contracts have this kind of work. Probably
financial institutions too.

The coding process is slower than most people are used to and can become
frustrating.

~~~
DannoHung
Financial institutions do not necessarily do things this way. Some parts
might, but I don't have any experience with them. In the parts I _do_ have
experience with, it is utterly a miracle that anything works.

~~~
noir_lord
> In the parts I do have experience with, it is utterly a miracle that
> anything works.

That's my feeling about every industry I've worked in.

The stuff that runs telecoms (mostly billing side) particularly is the stuff
of nightmares.

~~~
aswanson
Seriously? _Billing_ code? I would have imagined that code would be so subject
to customer complaint that it would be forced into quality.

~~~
gtirloni
I've worked in a support team for a telecom billing system, where I was tasked
with interacting with our development team to investigate bugs in production
(and eventually moved to the development team). These systems are created just
like any other commercial system, without any formal proofs and with minimal
requirements docs. To make things worse, they have to be flexible enough to
support any and all billing plans that the business might come up with, so
there are a lot of moving parts.

As other people have said here, nobody wants to touch it. Developers would
often limit themselves to fixing just a small portion of the code, even though
they thought the overall system could be improved in many ways, for fear of
breaking something, causing a few million dollars of damage and getting fired.
There was no assurance that any part of the system should work like this or
that... only some vague expectations.

You're right that this is the kind of system that should be built from scratch
with that kind of concern, but unfortunately it's not.

~~~
annnnd
Rewriting such a system is a nice challenge. One way would be to build a
parallel system, identify all inputs and duplicate them to the parallel
solution, and then compare the outputs; in case of discrepancies, fix the
erroneous system. Once the systems produce the same results (or once the new
system produces better results than the old one), you just switch the systems.

The rewrite doesn't have to be complete; it can (and should) be done in
pieces, of course.
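
The parallel-run approach described above is often called shadow testing; a minimal sketch of the comparison harness (the toy billing functions are invented placeholders):

```python
def shadow_compare(inputs, legacy_system, new_system):
    """Feed identical inputs to both systems and collect discrepancies
    for investigation, instead of switching over blindly."""
    mismatches = []
    for x in inputs:
        old, new = legacy_system(x), new_system(x)
        if old != new:
            mismatches.append((x, old, new))
    return mismatches

# Toy billing rules: the rewrite changes rounding for amounts over 100.
legacy = lambda amount: round(amount * 1.1)
rewrite = lambda amount: round(amount * 1.1) if amount <= 100 else int(amount * 1.1)

diffs = shadow_compare([10, 50, 105], legacy, rewrite)
# Only the input where behavior diverges surfaces for a human to judge
# which system is the erroneous one.
assert [x for x, _, _ in diffs] == [105]
```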

~~~
noir_lord
"Replace in place" is fairly common.

The big issue is duplicating the system while it is still morphing in
production for all the edge cases; it often feels like trying to paint a
moving bus.

------
bargl
I'm currently writing software that is non-critical for satellites. It's non-
critical in the sense that if we get things wrong our company will lose
millions of dollars, but the satellite won't burn any resources that it can't
get back.

We are currently porting code over from C++ to a C# system with parallel
computation. The current system has been flying for a long time but has no
testing and is tied to a bad UI. So we are re-writing.

That said, accuracy is number one. We have a pretty solid method for testing
so far. We now have some robust input scenarios, and we know that we want to
get a specific output. So we are able to do fairly robust automated
"regression" testing. If the numbers don't match, then we have an issue, and
we have to fix that before moving on.

After every validation that the new code gives us acceptable margins of error,
we wrap it up with unit tests so that we can then modify the code to try to
optimize. Our testing is integrated from the highest level of the code down (I
know, backwards), but that's how we know we can validate the input.
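
This kind of scenario-based regression testing is often done with "golden" input/output pairs captured from the legacy system; a hedged sketch (the `propagate` function, scenarios, and tolerance are invented placeholders, not the actual analysis code):

```python
import math

def propagate(x0, v, t):
    """Stand-in for the ported analysis code: position after time t."""
    return x0 + v * t

# Golden scenarios: known inputs and the outputs the legacy code produced,
# captured before the port began.
GOLDEN = [
    ((0.0, 7.5, 10.0), 75.0),
    ((100.0, -2.5, 4.0), 90.0),
]

def regression_ok(fn, cases, rel_tol=1e-9):
    """Every port or optimization must reproduce the golden outputs
    within the acceptable margin of error before work can continue."""
    return all(math.isclose(fn(*args), expected, rel_tol=rel_tol)
               for args, expected in cases)

assert regression_ok(propagate, GOLDEN)
```

A tolerance (rather than exact equality) matters when the port changes floating-point evaluation order, as a C++ to C# move with parallelism would.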

We have a lot of testing and a long schedule. If this weren't critical
software, we'd have a much shorter turnaround on what we are writing. We also
work very closely with subject matter experts on every change we make. We have
a guy who's been working with this software (and the underlying theory) for 20
years. He's open to change, but he also validates everything so we don't
accidentally change the output when we're optimizing.

~~~
david-given
C# is an odd choice for avionics software, surely? It's got a really
heavyweight runtime and nondeterministic behaviour due to the garbage
collector. I assume this is the satellite's application layer and isn't real
time?

...and what are you running it on? I wasn't aware of any embedded operating
systems which supported C#!

~~~
snops
I'm guessing it's actually running on a ground station, providing commands or
analysing data from satellites, hence why it's non-critical and why the need
has arisen for parallel computation, presumably to speed it up.

Many ARM embedded systems can support C# with the open-source .NET Micro
Framework ([http://www.netmf.com](http://www.netmf.com)), which doesn't
require any OS and was originally developed for Microsoft's SPOT watch. I
haven't used it myself, and I agree with you that it doesn't sound ideal at
first for realtime apps, but non- or soft-realtime embedded applications are
common too.

~~~
bargl
Yeah it's a ground station application. So it isn't running on any embedded
hardware. We'd have done something different for that.

Performance isn't the only thing we are considering. We want to get improved
performance, but our old analysis code was extremely hard to maintain, so that
went into the decision as well. Honestly, I'd probably pick another language,
but I wasn't on the project when it started.

I personally would have liked to do this with F# because of how functional it
is at its core, but that's because we have a lot of Microsoft expertise in
house.

Also, one thing with the engineering apps is that anything where engineers
(not software engineers) don't have to learn a new language is going to be an
easier sell.

------
henrik_w
The story of the Boeing engineers flying on the test flights is a perfect
example of "skin in the game" (from Antifragile by Nassim Nicholas Taleb).

Here's what I wrote about that in a blog post on Antifragility and SW
development:

At the end of the book, there is a chapter on ethics that Taleb calls “skin in
the game”. To have skin in the game, you should share both in the upside and
downside. Taleb quotes the 3,800 year old Hammurabi’s code: “If a builder
builds a house and the house collapses and causes the death of the owner – the
builder shall be put to death”.

It is interesting to view this from a software development perspective. I have
never worked on software where people’s lives were in danger if the software
failed, but I would not be willing to submit to Hammurabi’s code if I did. But
I think a little less extreme form of skin in the game is actually very good.
Being on call, for example. If the software you wrote fails, you may get
called in the middle of the night to help fix it. I have been on call at most
of the places I have worked in the past, and I think it has a lot of benefits.
It gives you an incentive to be very thorough, both in development and
testing. It also forces you to make the software debuggable – otherwise you
yourself will suffer.

Another way of introducing skin in the game is dog-fooding – using the
software you are developing in your daily work. I have never worked on
software that we have been able to dog-food, but I think that is another great
practice.

[http://henrikwarne.com/2014/06/08/antifragility-and-
software...](http://henrikwarne.com/2014/06/08/antifragility-and-software-
development/)

~~~
dbdr
The examples you are giving are all about sharing the downside. What are the
options for the upside?

------
markbnj
It's essentially an analog of the concept of tolerance in the physical world
of manufacturing and assembly. The less your tolerance for error the more
formal and carefully controlled the process, and the more money spent in
testing, verification, feedback, and improvement.

And yet you can still measure one value in the metric system and another in
English units and drill a smoking hole in Mars. It was sort of striking to
read Charles Fishman's statement about the software being bug free, followed
immediately by the supporting fact that the last three versions had one bug
each. If they had one bug, how are you 100% sure they didn't have two?

~~~
pwnna
> It was sort of striking to read Charles Fishman's statement about the
> software being bug free, followed immediately by the supporting fact that
> the last three versions had one bug each. If they had one bug, how are you
> 100% sure they didn't have two?

I think you're being overcritical here. It's really, really difficult to
reach 100% in any real form of measurement. So I think when they say "bug
free", they probably mean the chance of a bug is below some threshold of
probability. The famous six-sigma rule comes to mind.

I do grant you that this is not specified explicitly in the article, but they
do say: "each 420,000 lines long - had just one error each. The last 11
versions of this software had a total of 17 errors. Commercial programs of
equivalent complexity would have 5,000 errors."

~~~
markbnj
>> So I think when they mean "bug free", they probably mean the chance of a
bug is below some threshold of probability. The famous six sigma rule comes in
mind.

No doubt, and that was in fact my point. I think most of us would be very
reluctant to use the phrase "bug free," and in this case his statement was
obviously meant to be in stark contrast to that reluctance.

~~~
pwnna
I think this is more of a technicality that's very subjective. I feel that
whether you can say "no" depends on the probability of something happening as
well as how catastrophic it is.

For example, after buying a lottery ticket, I can pretty comfortably say that
I'm not going to win. Sure, there is a chance, but is it really going to
happen? No.

As another example, if I put a water bottle down on my bike seat and there's a
1% chance of the water spilling onto the sidewalk, I'm comfortable saying that
it's not going to happen. However, if the water bottle has a 1% chance of
exploding when I put it down, I don't think I'll be comfortable saying that it
won't happen. The chance of that must be much, much lower before I can accept
it.

I would like to give the person who said that the benefit of the doubt. I'm
sure they understand the implication of bug free, but it's likely just easier
to say that rather than explaining stats/tradeoffs to a journalist.

------
zyxley
For people interested in the idea of formal verification, you may want to look
at TLA+
([https://en.wikipedia.org/wiki/TLA%2B](https://en.wikipedia.org/wiki/TLA%2B))
and PlusCal
([https://en.wikipedia.org/wiki/PlusCal](https://en.wikipedia.org/wiki/PlusCal)),
which have been mentioned on HN before.

They're a specialty system for writing specifications (and mathematical
proofs) where every possible system behavior for a given range of inputs can
be examined for safety (outputs within allowed ranges, with no unexpected
behavior) and for liveness (the expected progression from one output to
another).
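
What a TLA+ model checker does can be approximated in miniature: enumerate every reachable state and check a safety invariant in each. A toy Python sketch of the idea (this is an illustration of exhaustive state exploration, not TLA+ itself):

```python
from collections import deque

def check_safety(initial, next_states, invariant):
    """Breadth-first exploration of every reachable state; returns the
    first state violating the invariant, or None if all states are safe."""
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            return state        # counterexample found
        for s in next_states(state):
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return None                 # every reachable state satisfies the invariant

# Toy system: a counter that wraps at 5. Safety property: it never hits 7.
step = lambda n: [(n + 1) % 5]
assert check_safety(0, step, lambda n: n != 7) is None

# Remove the wrap and the checker finds the unsafe state.
broken = lambda n: [n + 1] if n < 10 else []
assert check_safety(0, broken, lambda n: n != 7) == 7
```

Real tools like TLC add fairness, liveness checking, and state-space reduction on top of this basic idea.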

------
ef4
It's all just a question of cost.

We know how to write software that comes arbitrarily close to perfection. But
as defects asymptotically approach zero, cost skyrockets.

The interesting question is what technologies can bend that cost/quality
curve.

~~~
jerf
This is why this discussion sometimes frustrates me. A lot of the defects we
have are because you aren't willing to pay for the sort of software that
wouldn't have defects. It's natural to read that as a sort of cynical
accusation, but instead, I mean it straight... you really aren't willing to
pay what it would take, and you shouldn't be. A $1000 Facebook-access app for
your phone (that still somehow has some sort of economies of scale going for
it, but that's another discussion) might not crash on you and might take a lot
fewer phone resources, but there's no way it's going to be $1000 better than
what we get today for free for the vast bulk of users.

On the flip side, the cavalier attitude of developers who are on the very,
very low side of the curve, where huge quality improvements can be obtained
very cheaply, towards those cheap practices also frustrates me. What do you
mean you can't be bothered to run the automated test suite that I have handed
you on a silver platter on your build server? Are you serious? How can you
pass up something so cheap, yet so powerful? And I don't just mean, how can
you not be bothered to set it up, etc... I mean, as a professional engineer,
how can you justify that?

~~~
manarth
"What do you mean you can't be bothered to run the automated test suite that I
have handed you on a silver platter on your build server?"

As devil's advocate, why not just run this for me (e.g. on every commit/every
push)? Much like the web usability ethos of "don't make me think" - why make
me work? The lower the barrier to testing - ideally zero, it just happens
without the dev having to do anything - the more testing will happen.

I don't often get the chance to set things up this way, but when I do, each
dev works in their own git branch, and sends a pull-request with their
changes. The test server(s) then run the complete test-suite on the branch,
and either note the PR with "Tests passed" or emails the dev with "Tests
failed" and the reasons. Devs don't need to think about running tests,
reviewers/release managers don't need to even consider PRs until the "Tests
passed" message shows up…saves time and effort for everyone, and improves code
quality. The cost is simply the initial setup time.

~~~
jerf
"As devil's advocate, why not just run this for me (e.g. on every commit/every
push)?"

In my real-life experience with a multiple-team environment, which is where
the question came from, _my_ running the tests doesn't do any good if you're
going to consider it "my" server and simply _ignore the results_.

The key point here isn't a technical one. The key point was: as a
_professional engineer_, how can you justify not taking such a great
bang/buck quality option that will far, far more than pay for itself?
Explaining how I can be a professional engineer on your behalf rather
misses the point.

------
brudgers
No discussion of mission critical software is complete without mentioning
Margaret Hamilton:

[https://en.wikipedia.org/wiki/Margaret_Hamilton_%28scientist...](https://en.wikipedia.org/wiki/Margaret_Hamilton_%28scientist%29)

------
istvan__
The article is a bit misleading; it almost suggests that software correctness
is achieved by testing, while this is definitely not true. Just for the
record, none of the other technical fields practicing safety-critical
engineering use testing as the main means of ensuring safety, and this is no
different for software components in mission-critical use cases. There are
several ways to build a reliable system out of unreliable parts (like
combining 3 different units by the 2-out-of-3 principle, etc.), but testing is
just not the way. It is a nice-to-have thing.
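
The "2 out of 3" principle mentioned above is triple modular redundancy: three independent units compute the result and a voter takes the majority, masking a single faulty unit. A minimal sketch (the voter semantics here are illustrative):

```python
from collections import Counter

def vote_2oo3(a, b, c):
    """2-out-of-3 majority voter: returns the value at least two of the
    three redundant units agree on, or raises if all three disagree."""
    winner, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three units disagree")
    return winner

assert vote_2oo3(42, 42, 42) == 42       # all units healthy
assert vote_2oo3(42, 42, 7) == 42        # one faulty unit is outvoted
try:
    vote_2oo3(1, 2, 3)                   # triple disagreement: fail safe
    assert False
except RuntimeError:
    pass
```

Using three _different_ implementations (as the comment suggests) additionally protects against a common-mode bug that all three copies of one implementation would share.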

------
coderjames
I think the article might be mistaken about one point: "For one thing, the
Boeing approach is going out of style or has mostly gone out of style,
according to SE poster Uri Dekel (handle: Uri), a Google software engineer."

It absolutely has not gone out of style in avionics software engineering. As a
person who writes software for avionics, I can say that extensive design
reviews at every step combined with rigorous testing is exactly how we build
software. That's how it's done at every avionics software company I've ever
worked at (3 so far). Formal methods are generally still too cutting-edge and
complicated for many people in this industry.

So maybe Google doesn't bother with design reviews, but those of us writing
life-or-death software definitely do.

~~~
SnacksOnAPlane
He meant the practice of sending the software engineers up on the first
flight.

------
jarrettch
I've been wondering about this lately. My stepdad has heart failure and they
put a heart pump in his chest. It regulates blood flow and settings can be
changed, etc. Leave it to a dev to think "How much testing has gone into this
thing?" Even one minor slip-up in his blood flow, either too high or too low,
could mean a stroke and possibly death. Or, God forbid, the thing crashes
somehow and stops working.

That machine is much easier to code than a space shuttle obviously, but I
still wondered about it. The tech has to be rock solid. Even one malfunction
could cause so much despair in a family, and could also cost your company
millions.

~~~
mrsteveman1
I don't know much about heart pumps, but I do have an implanted
pacer/defibrillator, and I'm currently not too happy with the people who made
it.

I've been told by my cardiologist (and engineers working for the manufacturer)
that it doesn't "fail safe" if the battery level drops too low to keep the
device running (which is inevitable if it isn't replaced after 7-8 years, but
it can and sometimes does happen prematurely and without warning).

In that situation, not only can it suddenly become unable to correct an
arrhythmia (as expected), it could actually _cause one_ all by itself, or pace
above 200BPM for no reason.

No one I've talked to in the healthcare industry seems at all surprised about
this for some reason. They just started monitoring it more often the closer it
got to the "replace me now" indicator level.

------
alkonaut
I think in the "normal" software industry we have a skewed picture of quality,
simply because it's not the primary focus. For everyday software, it's fine if
it works to 95%. Even bugs you _have identified_ must be weighed against new
features, and features often win. For the customer it's better to have a piece
of software that has all the features they need, but calculates a bad result
every 1/1000 times, or crashes a couple of times per day, than a program that
doesn't have all the features. It's also the "release early release often"
thing where the cheapest testers are your end users. That is far from the
"release once patch never" of rockets. So feature bloat and suffering quality
isn't really due to bad practices, it's an active choice.

------
VLM
Could have at least mentioned DO-178C

[http://en.wikipedia.org/wiki/DO-178C](http://en.wikipedia.org/wiki/DO-178C)

------
x0x0
I worked on an emr.

Each developer got between 1/2 and 3/2 QA people, in addition to dedicated QA
engineers. You had to submit detailed test plans for features. Surprisingly,
the company didn't expend much effort on unit tests -- they were there, but
not heavily emphasized.

------
amelius
If Amazon can do it, then surely an airplane manufacturer can do it [1]

[1] [http://cacm.acm.org/magazines/2015/4/184701-how-amazon-
web-s...](http://cacm.acm.org/magazines/2015/4/184701-how-amazon-web-services-
uses-formal-methods/fulltext)

~~~
packetslave
That Amazon paper is talking about design/algorithm verification using formal
methods. This does NOT verify that the design or algorithm in question was
actually implemented correctly and matches the design.

~~~
lambdaelite
And as I recall, TLA+ doesn't tie into code generation, so the verification
stops at the design rather than extending to the implementation.

------
mariopt
It depends on the software budget and managers. At a company where I worked,
one of the CEOs had an idea about adding a camera to traffic lights so that
cars wouldn't have to stop if the road is empty. I asked: what happens if
sunlight hits the camera too much? The guy laughed in my face and told me I
was being ridiculous. I left the company some months later for other reasons,
but it was pretty scary to hear such words at the time. If you're writing
software that might cost someone else's life, neither the budget nor the
schedule should be a limitation. I'm not aware of any software safety
regulation: if I'm going to release critical software, does some entity exist
to validate my code? That kind of oversight should exist to block companies
that only see the profit in getting a contract.

~~~
manarth
Genuinely, in road safety, the company typically considers and weighs the cost
of fixing an issue, vs the cost of lawsuits in the event of death/injury.

One paper on the topic talks about the Ford Pinto fuel system design:
[http://users.wfu.edu/palmitar/Law&Valuation/Papers/1999/Legg...](http://users.wfu.edu/palmitar/Law&Valuation/Papers/1999/Leggett-
pinto.html)

The GM ignition-switch recall also sparked a similar debate:
[http://en.wikipedia.org/wiki/2014_General_Motors_recall](http://en.wikipedia.org/wiki/2014_General_Motors_recall)

So it's not uncommon that economics outweighs risk-to-life in a lot of
businesses.

~~~
rodgerd
Which is one reason people should be a lot more suspicious of "tort reform"
than they are.

------
rexignis
Additional reading material: some of NASA's own rules for safe code (with
explanations of each! love the stack one, free memory management :P).

[http://spinroot.com/gerard/pdf/P10.pdf](http://spinroot.com/gerard/pdf/P10.pdf)

~~~
rzzzt
JPL also published a C coding standard, which details language constructs that
one should and shouldn't use in a mission critical embedded system. Some of
the rules make a reappearance there (the "Power of Ten" article is mentioned
in the introduction).

[http://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf](http://lars-
lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf)

------
guelo
The kind of process needed to achieve high quality is just not that fun. Most
of the programmers I know would run away screaming, complaining about the
bureaucracy and red tape.

------
mavdi
Not sure if true, but I heard a couple of researchers at my uni wrote some
code for Boeing. They wrote it in Prolog, I assume because it's easier to
formally verify. And they didn't trust the compiler, so apparently they had
to check the generated machine code line by line.

~~~
tjr
If it was DO-178B/C Level A, then yeah, they'd have to inspect the machine
code. Regardless of trusting the compiler.

And the compiler would have to be inspected too.

------
zobzu
"There's a serious move towards formal verification rather than random
functional testing," he writes. "Government agencies like NASA and some
defense organizations are spending more and more on these technologies.
They're still a PITA [pain in the ass] for the average programmer, but they're
often more effective at testing critical systems."

This part is kinda cool, because there's also always been a movement toward
this for open-source/commercial security software, at least for the kernel.
Of course, we have no such thing today (as in, I doubt anyone here runs such
a system in production), but the interest is there.

------
jokoon
It's true that writing software requires only computers, and that it would be
too expensive to test it in real situations. But when the stakes are high,
maybe it's also important to do live testing?

~~~
tormeh
Hardware-in-the-loop (HIL) is a sort of middle ground, where all the
expensive stuff that breaks is replaced by ordinary computers, but the actual
control hardware is not.

Corporate video by people who do this, explaining it:
[https://www.youtube.com/watch?t=116&v=YpxPAuHNpdM](https://www.youtube.com/watch?t=116&v=YpxPAuHNpdM)

~~~
neurotech1
The mission systems avionics on an F/A-18F cost around $1-2m per aircraft.
This includes displays, PowerPC-based AMC boards, power supplies, etc.
Hardware-in-the-loop testing is quite practical and was run as an engineering
flight simulator. Missions were flown in the simulator, and problems were
located without costing $20k/hr per aircraft for an actual flight test.

SpaceX do HIL tests with the Falcon 9 & Dragon spacecraft.
[http://www.spaceflightnow.com/falcon9/003/120424date/](http://www.spaceflightnow.com/falcon9/003/120424date/)

------
imrehg
Okay, the LightSail spacecraft is not a "life or death" thing, but, as it was
in the news yesterday [1]: seriously, logging to a CSV file onboard can crash
the system, and they have to wait until it reboots itself (it doesn't react
to a soft reboot either)?

It feels sad that a lot of lessons learned are getting lost along the way...

[1]: [http://www.planetary.org/blogs/jason-
davis/2015/20150526-sof...](http://www.planetary.org/blogs/jason-
davis/2015/20150526-software-glitch-pauses-ls-test.html)

------
crucini
I think the right answer here is to develop languages or libraries that
enforce the constraints you want.

I think we should beware of adding layers of "silver bullet" technologies that
promise to fix the last crisis, but increase the friction of development.

To illustrate what I mean, PLCs are normally programmed in ladder logic. From
that level, I don't think you can crash the machine or corrupt memory. So the
risks are limited to the "operating system" if you will, which can be more
mature and tested than the "application".

------
lambdaelite
Testing is certainly very important, but I think it should be emphasized that
testing is but one part of the _software development life cycle_ that produces
an acceptably safe product.

------
anoopelias
I always thought that the way to ensure zero failures at one level is to
ensure accuracy at the next level.

For example, if all trains run with minute-level accuracy at every point,
then there won't be any collisions by mistake.

------
Davesjoshin
As a professional tester, I find this pretty interesting. I've always
wondered how these ultra-critical programs were tested. (I work in the
creative website space.)

------
danieltillett
Have languages been written that are designed to be maximally orthogonal to
each other with regard to introducing bugs? If so, what combinations are used?

