Ask HN: How did NASA make reliable software if they didn't invent unit tests? - sebastianconcpt
======
PaulKeeble
I used to work in aerospace as a hardware/software engineer and while my team
did use unit tests that wasn't how software was tested and qualified.

The process in aerospace is vastly slower, which is reflected in the fact
that on average a programmer doing aerospace produces just 1000 lines of
code a year. There are clear reasons why aerospace engineers produce a lot
less code:

\- Documentation - You get a pile as tall as your desk for a few thousand
lines of code. The software is designed to the degree of knowing precisely
what the maximum N going into a function can be and its exact runtime; every
function has fixed maximum and minimum values and a runtime budget
preallocated before anyone writes code. Then the code is checked against
the documentation.
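
A minimal sketch of that spec-first style, in Python rather than flight code (all names and limits here are invented for illustration):

```python
# Hypothetical "spec-first" function: the documented bounds are enforced
# at runtime with asserts, so the code can be checked line-by-line
# against the documentation. All names and limits are invented.

MAX_SAMPLES = 64                        # documented maximum N for this function
ALT_MIN_FT, ALT_MAX_FT = -1000, 60000   # documented value range

def average_altitude(samples):
    """Documented contract: 1 <= len(samples) <= MAX_SAMPLES,
    every sample within [ALT_MIN_FT, ALT_MAX_FT]; runs in O(MAX_SAMPLES)."""
    assert 1 <= len(samples) <= MAX_SAMPLES
    total = 0
    for s in samples:                   # loop bound fixed by the spec
        assert ALT_MIN_FT <= s <= ALT_MAX_FT
        total += s
    result = total // len(samples)
    assert ALT_MIN_FT <= result <= ALT_MAX_FT   # postcondition from the spec
    return result
```

The point is that the bounds come from the documentation first, and the code is then checked against them, rather than the other way around.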

\- Bench testing - We had a complete dev/test environment in which all
in-development hardware/software could fly missions, with all the different
teams and their avionics devices interacting on one network. You then flew
countless "missions" (real cockpit parts, but swivel chair + simulator),
testing all the different scenarios and whatever you had just changed.

\- The wider team of engineers working on the avionics software would print
out the entire code and all branches and "walk" the entire system from
beginning to end with documentation in hand. Every line is validated by
someone from a different team and every decision scrutinised.

\- System test - multiple phases of it. Completely separate teams would
rigorously test every new release to death: against the client's specs,
against the documentation produced, and with their own independent simulator
and cockpit setup. Not only that, but there was a second team doing the same
thing, trying to catch the first missing something.

\- Then it spends years being tested in prototype vehicles before finally
being signed off as ready.

End to end, it took about 10 years of full-time work to get 60k lines of code
released for one avionics device. We had unit tests, but only for the purpose
of testing the hardware, not really for our own software beyond startup tests.

All of that rigour comes out of one principle: every time you find a problem
in the code, at any point, it's not the individual's fault but the team's,
and you work out the genuine cause and ensure it can't slip through again.
Everything must be testable.

There are quite a lot of parallels with unit testing, and it could indeed be
a better way to capture tests for aerospace, potentially a slightly less
labour-intensive way to run all the tests every time a new release is
produced. But it wouldn't look anything like what your typical commercial
company does, because for them it's not worth reducing the 10 bugs per 1000
lines of code they currently have down to more or less 0. Unit testing in
aerospace would be about efficient repetition of known tests, much as it is
in the commercial space, but not about driving any form of design process,
though I could see test definitions being produced from the function lists.
We did a lot of specification testing and limiting of the language to allow
that to occur and make the code more straightforward.

~~~
kazinator
The only problem with producing 1000 LOC per year is that your brain will rot
from being so narrowly focused on this one thing.

It could be a tolerable job with a programming hobby on the side (FOSS
projects or whatever) in which you produce another 15,000 lines to get your
"coding fix".

~~~
zerohp
The process described is not narrowly focused. It's full system design,
implementation, and validation. Engineering is more than just coding and can
be much more satisfying.

------
invaliduser
Unit tests are not the one and only metric for software quality. Believe it
or not, before the term "unit test" was coined, we managed to build reliable
software too. A lot of the internet was actually built that way.

But your assumption is very interesting, because it states a very common
misconception of quality (whatever that means, reliability, maintainability,
resilience, etc):

* "Unit tests imply reliable software". They do not; you'd need reliable unit tests for that, and from what I've seen in the industry, not every unit test out in the wild increases reliability.

* "Without unit tests, you can't have quality software". That is just not true either. First, not all tests are unit tests; end-to-end (integration/functional/whatever) tests are extremely useful and raise quality too. Even without any automated tests, the quality of documentation, contract programming, manual testing, etc. also help a lot.

Also, note that apart from testing, a lot of different practices contribute to
reliability: separation of concerns, the language's level of abstraction, the
overall experience and involvement of the developers, and many others.

~~~
teej
Could you go into more detail about what techniques NASA specifically used to
achieve quality software? I'm honestly very interested. I feel like you didn't
answer the spirit of the question "How did NASA build highly reliable
software" while pigeonholing on "without unit tests".

~~~
Cacti
Their hardware was essentially fixed, well defined, and limited in scope.

They reviewed every single piece of code many, many times.

They were draconian about memory handling, loops, preprocessor directives, and
recursion.

All functions had to check all return types from other functions.

No more than one level of pointer dereferencing.

All code bodies had to fit on a piece of paper.

/edit should add, they used their tools thoroughly: compilers were run on max
warning levels and absolutely no warnings were accepted, ever. They also used
several static analysis tools on the code (under-appreciated these days!).
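
In Python rather than flight C, the "check every return value, bound every loop" discipline described above might look something like this (all names here are invented):

```python
# Sketch of the "check every return, bound every loop" discipline,
# transposed to Python for illustration; names are invented.

MAX_RETRIES = 3   # every loop gets a fixed, statically known upper bound

class FakeBus:
    """Stand-in for a hardware bus, for demonstration only."""
    def __init__(self, readings):
        self.readings = list(readings)

    def read(self, channel):
        return self.readings.pop(0)

def read_sensor(bus, channel):
    # every call's result is checked before use; no unchecked returns
    raw = bus.read(channel)
    if raw is None:
        return None                # propagate failure explicitly
    if not 0 <= raw <= 4095:       # say, a 12-bit ADC's valid range
        return None
    return raw

def read_with_retry(bus, channel):
    for _ in range(MAX_RETRIES):   # bounded; never `while True`
        value = read_sensor(bus, channel)
        if value is not None:
            return value
    return None                    # caller must check this too
```

Tedious compared to commercial code, but it makes every failure path explicit and statically reviewable.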

~~~
yardie
I'd also add they had excellent, mature coders.

None of this crunch time bullshit, though I'm sure it happened.

I've been on one of those binges. If another coder hadn't caught my errors,
caused by diminishing sleep, I'd probably have kept banging my head against
the wall, just compounding the situation.

~~~
danudey
I can't count how many times I stayed late at the office working on a problem
for 6 hours straight, and then after a good night's sleep found and fixed the
issue after 20 minutes at work the next day. Sleep is king.

~~~
yardie
I have latent dyslexia which seems to emerge after too many hours at a
computer screen. I've come back from a good night's rest to find variable
names that are just slightly misspelled, recursive loops that are unreachable,
and functions that are incomplete because I was distracted by a bug hunt.

------
abtinf
Unit testing is not a method of achieving reliability but of accelerating
development. Unit tests do not prove that a system functions correctly; that
is more in the domain of systems and integration testing.

Also, testing is only one small part of building reliable systems. Instead,
reliability requires actual engineering practice, which is non-existent in the
software world except in a few life-critical domains.

NASA's and their contractors' software engineering practices have been covered
extensively elsewhere. Here are some links.

[https://www.fastcompany.com/28121/they-write-right-
stuff](https://www.fastcompany.com/28121/they-write-right-stuff)

[http://history.nasa.gov/computers/Ch4-5.html](http://history.nasa.gov/computers/Ch4-5.html)

~~~
rootusrootus
My takeaway from reading the fastcompany.com article is that it's not process
improvement per se that produces quality software, it's money. NASA has
invested a great deal more time, and therefore money, in developing software
than a typical commercial software company can afford. Getting commercially
viable software production within spitting distance of NASA's quality
expectations would require quite a lot more programmers, necessarily getting
paid a lot less.

~~~
Spearchucker
That's just it. If a bug in software can cause death, you're not going to use
agile. You're going to use a waterfall-like process, and you're going to have
stringent quality gates at every single transition - from envisioning, through
requirements analysis, design, development, test, release, and into operation.
It's the only way. I know because I built a military guaranteed-messaging
system. The tech aspect was small, but it took 24 months.

~~~
e12e
Then again, agile has some relation to lean (which of course is a _production_
method, not a _design_ method) - and cars probably have a much higher chance
of killing people than space craft do.

I think it is more a case of "what you can get away with" in terms of domain
knowledge and risk: If there's only some money on the line - taking a hit from
time to time from bugs, bad design etc - will often be cheaper than slowing
down the design and production process. Especially if a system with (some)
bugs can start earning money on day X, while a "perfect" system would require
some N > X days more to start earning money.

Waterfall is a perfectly valid process - but for it to work, it puts a very
high demand on the domain knowledge of those involved, as well as information
management. And in _reality_ I think very, very few large projects get away
with _no_ iterations for sub-modules.

I was working with a crew that was refurbishing deep sea drilling platforms -
originally built in the 70s. One particular task was moving anchor winch
engines, along with ventilation (vent-hole had to be displaced due to moving
the engine(s)). On the design the crew got from engineering, the hole was just
moved some 10 meters. But the engineer was clearly working from outdated
drawings - moving the hole as indicated would have led the team to cut
through half of a fuse board, some computer controls for another sub-system,
and an interior supporting strut. In the end the solution turned out
to be prototyping a longer shaft with pvc pipe, and then welding the thing in
place, in pieces. The one-day job turned into a two week job - so the cost of
the single mistake on a drawing cost an enormous amount of money (but of
course, there were many such mistakes, so who knows how early the rig could've
been out of dock anyway...). On the other hand, it got fixed.

I absolutely agree that you need more than "just agile", though. You need (a
probably multi-stage) verification process, for example. In the example above,
the welding and angle of the pipe were inspected first by the crew leader, and
later by external inspectors -- that particular system didn't directly affect
life or death, but as it was venting from the propulsion engines (I think),
an error leading to rust or system suspension would be very costly.

------
wwkeyboard
In addition to everything else mentioned here they spent a TON of money[1].
For example they spent what would be 3.6 billion 2016 dollars on Guidance and
Navigation. If your current project could drop a few hundred million on QC I
bet quality would go way up regardless of automated testing or not.

[1][http://history.nasa.gov/SP-4029/Apollo_18-16_Apollo_Program_...](http://history.nasa.gov/SP-4029/Apollo_18-16_Apollo_Program_Budget_Appropriations.htm)

~~~
kakali
I think your number includes designing and building one of the first
solid-state computers.

~~~
wwkeyboard
The "few hundred million for quality control" number still stands. In
addition, they were able to draw on everyone from the people designing the
transistors through to the programmers to debug problems. Not something many
modern projects can do.

------
noblethrasher
This article by Walter Bright (creator of D, also hangs out on HN as
'WalterBright) might be illuminating as well as instructive:

[http://www.drdobbs.com/architecture-and-design/safe-
systems-...](http://www.drdobbs.com/architecture-and-design/safe-systems-from-
unreliable-parts/228701716)

------
dsl
Your submission falsely implies that unit tests create reliable software. But
that aside...

Avionics software is subjected to multiple rounds of peer review (often by
different companies), integration testing, and in some cases multiple
different implementations are developed and black box tested against each
other for different outputs given the same inputs.
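
That kind of back-to-back testing can be sketched like so (a toy example, with two invented implementations of an integer square root standing in for independently developed avionics code):

```python
# Sketch of back-to-back (N-version) testing: two independently written
# implementations of the same spec are fed identical inputs, and any
# divergence is flagged. The functions here are toy examples.

def isqrt_newton(n):
    """Integer square root via Newton's method."""
    if n < 2:
        return n
    x = n
    y = (x + n // x) // 2
    while y < x:
        x, y = y, (y + n // y) // 2
    return x

def isqrt_linear(n):
    """Independent, deliberately naive implementation of the same spec."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r

def back_to_back(inputs):
    """Return every input on which the two implementations disagree."""
    return [n for n in inputs if isqrt_newton(n) != isqrt_linear(n)]
```

An empty disagreement list doesn't prove either implementation correct, but a non-empty one is a guaranteed bug in at least one of them.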

------
hollander
Back in the times when we counted in bytes or kilobytes, programs were simpler
to manage. One person could overview all the code. Now we have programs with
millions of lines. Nobody can oversee that.

~~~
aardvark291
"One person could overview all code." Not sure that's true. Imagine reviewing
the code of the Apollo Guidance Computer, just to take an example from the
60's. The stack of paper containing the source code printout is more than five
feet tall: [http://skeptics.stackexchange.com/questions/31602/is-
this-a-...](http://skeptics.stackexchange.com/questions/31602/is-this-a-photo-
of-margaret-hamilton-standing-next-to-apollo-project-code-that-s)

~~~
nostrademons
It's on GitHub:

[https://github.com/chrislgarry/Apollo-11](https://github.com/chrislgarry/Apollo-11)

Reviewing it would be non-trivial, but in terms of absolute code size, it's
not much larger than a moderately-popular open-source project today. It's
significantly smaller than React, for example:

[https://github.com/facebook/react/tree/master/src](https://github.com/facebook/react/tree/master/src)

Edit: Downloaded both repositories. 'wc' puts the AGC at roughly 60K LOC,
while just the 'src' directory in React is 70K.

------
njharman
Unit tests aren't really about testing software (in the validating sense).
They are in fact quite deficient in that role: no integration tests, no
end-to-end tests, no fuzz testing, etc.

Unit Tests are foremost a software development tool. They force (in order to
be unit testable) separation of concerns, definition of interfaces (i.e. the
"contract" with function's users), etc. Mechanically, they also enable
validating that internal refactoring has not violated that contract.
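
A minimal sketch of that last point, using Python's stdlib `unittest` (the function under test is invented): the test pins the contract, not the internals, so a refactor can be validated against it.

```python
# The test below asserts only the documented contract of parse_version,
# never its internals, so the implementation can be freely refactored.
# The function itself is a made-up example.
import unittest

def parse_version(s):
    """Contract: 'X.Y.Z' -> (X, Y, Z) as ints; raises ValueError otherwise."""
    parts = s.split(".")
    if len(parts) != 3:
        raise ValueError(s)
    return tuple(int(p) for p in parts)

class TestParseVersionContract(unittest.TestCase):
    def test_happy_path(self):
        self.assertEqual(parse_version("1.2.3"), (1, 2, 3))

    def test_rejects_malformed(self):
        with self.assertRaises(ValueError):
            parse_version("1.2")
```

Rewrite `parse_version` with a regex or a hand-rolled parser and the same tests still decide whether the contract survived.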

~~~
gwbas1c
As big a fan as I am of unit tests, I find their value somewhat dubious. They
take a long time to write, and when they fail, it often means the unit test
itself needs to be updated. Mocking interactions among objects takes a long
time.

What I find has a better ROI is using the same unit testing tools, but writing
what unit test purists will call integration tests. Basically, using unit test
tools where most of a program is instantiated, but I/O is mocked, is much more
useful. When these tests fail, it usually indicates a bug instead of a part of
a unit test that needs to be updated.
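
A rough sketch of that style with Python's stdlib `unittest.mock`: the real objects are wired together and only the I/O edge is faked (the classes here are invented for illustration):

```python
# Integration-style test using unit-test tooling: everything is real
# except the I/O boundary, which is a Mock. All names are invented.
from unittest import mock

class Store:
    """Real persistence layer; only its write callable does actual I/O."""
    def __init__(self, writer):
        self.writer = writer

    def save(self, key, value):
        self.writer(f"{key}={value}\n")

class Registry:
    """Real business logic, wired to a real Store."""
    def __init__(self, store):
        self.store = store

    def register(self, name):
        self.store.save("user", name.strip().lower())

fake_io = mock.Mock()                 # only the edge is mocked
registry = Registry(Store(fake_io))   # everything else is the real thing
registry.register("  Alice ")
fake_io.assert_called_once_with("user=alice\n")
```

When a test like this fails, it is usually because the behaviour changed, not because a mock's expectations went stale.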

~~~
njharman
This is why/how they are a development tool. "Mocking interactions among
objects", often breaking, taking too long to write (I expect to spend 30% dev
on units, 30% on code rest thinking, planning, etc). Those are all indicators
that your code/architecture is bad. Too complex, too coupled, too much
concern, not layered into clean interfaces, etc.

------
efnx
A while back someone posted some programming guidelines issued by NASA. They
detailed requiring assert statements at least every 10 LOC. They spoke of many
other things as well, but that was one that stuck in my head.

~~~
jdavis703
Is it this: [http://pixelscommander.com/wp-
content/uploads/2014/12/P10.pd...](http://pixelscommander.com/wp-
content/uploads/2014/12/P10.pdf)?

If so, it looks like functions can't be more than 60 lines (rule #4), and that
the "assertion density" (rule #5) must average at least two asserts per
function.
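
A toy illustration of that assertion-density style, in Python rather than the C the rules target (the function and its bounds are invented):

```python
# A short, bounded function in the "minimum two assertions" style of
# rule #5: one precondition, one postcondition. Names are invented.

def clamp_throttle(percent):
    """Clamp a commanded throttle setting to the physical range [0, 100]."""
    assert isinstance(percent, int), "precondition: integer command"
    result = min(100, max(0, percent))
    assert 0 <= result <= 100, "postcondition: within physical range"
    return result
```

The asserts document and enforce the envelope the surrounding system is allowed to assume.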

------
collyw
According to this, unit tests are one of the least effective ways of catching
bugs.

[https://kev.inburke.com/kevin/the-best-ways-to-find-bugs-
in-...](https://kev.inburke.com/kevin/the-best-ways-to-find-bugs-in-your-
code/)

------
typetypetype
I think you mean to say "automated, script-based unit tests". You can test
code manually.

~~~
jpindar
Back in the day, people used to do something called "desk checking".

------
twunde
From
[http://link.springer.com/article/10.1007/BF01845743#page-1](http://link.springer.com/article/10.1007/BF01845743#page-1)
1) integration testing, 2) systems testing, 3) load testing, and 4) user
acceptance testing. This would have been done with simulation tools alongside
manual testing. Think about the way hardware testing is done and extrapolate
those techniques to software.

~~~
ubercode5
This is right on the money.

The test engineers as well as the implementation engineers also have a lot of
experience within their specialized areas. This provides a lot of insight into
expected results when running these different types of tests and scenarios.

Also, these tests are long duration. Multiple iterations of simulations can
take years to complete.

------
nickpsecurity
Cacti had a nice summary. Some of those methods are also detailed in NASA's
Software Safety Guidebook, which was recently posted. My comment below tells
you which pages have the good stuff and, as usual, links to the PDF itself.

[https://news.ycombinator.com/item?id=12016046](https://news.ycombinator.com/item?id=12016046)

Here's a recent comment with so-called correct-by-construction methods for
software development that knock out tons of defects, usually without unit
testing. The cost ranged from lower (e.g. some Cleanroom projects, due to less
debugging) to around 50% higher (e.g. the typical Altran/Praxis number).
Time-to-market isn't significantly affected with some but is sacrificed for
others. So you don't need several hundred million in QA, as some suggested.

[https://news.ycombinator.com/item?id=12081603](https://news.ycombinator.com/item?id=12081603)

------
geonnave
Some thoughts I have:

\- I think they applied methodologies of reliability that worked in hardware
to software (e.g. redundancy, automatic fault
detection/tracing/correction)

\- Also, in the early days, software was so tied to hardware, and hardware was
faaar less complex, so they probably _could_ understand everything that was
happening in the system at a given time
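
One of those hardware ideas, majority voting over redundant computations, can be sketched like this (a toy illustration; the fault model and names are invented):

```python
# Sketch of triple modular redundancy (TMR) carried into software:
# three independently computed results are majority-voted.

def tmr_vote(a, b, c):
    """Return the majority value of three redundant results,
    or None if all three disagree (an uncorrectable fault)."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None   # triple disagreement: signal the fault upstream
```

A single corrupted result is masked; only a multi-way disagreement has to be escalated as a fault.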

------
ThomPete
I think it's important to remember that the kind of software written for any
kind of flight control or guidance system is very, very far from the kind of
dynamic software normally written today.

Hardware errors were more likely than software errors; that's how close to
the hardware they wrote.

------
abalashov
Really? Because how could quality assurance, software testing, and formal
verification have existed before some hipster buzzword for a particular
approach to them came about?

------
andersthue
This is an article about the team writing code for the space shuttle; it's
not so much about how they tested but more about the (insane) amount of time
and energy spent on making sure everything worked.

[http://www.fastcompany.com/28121/they-write-right-
stuff](http://www.fastcompany.com/28121/they-write-right-stuff)

------
avalenciano
Talk by Dr. Gerard Holzmann (leads the JPL Laboratory for Reliable Software):
[https://www.youtube.com/watch?v=x6yDubV3a9I](https://www.youtube.com/watch?v=x6yDubV3a9I)

You can also search for the paper/final report: "NASA Study on Flight Software
Complexity" to expand the topic.

------
carapace
Margaret Hamilton

~~~
nickpsecurity
I suggest always including links when you drop her name, as the significance
will be lost on many.

[https://en.wikipedia.org/wiki/Margaret_Hamilton_(scientist)](https://en.wikipedia.org/wiki/Margaret_Hamilton_\(scientist\))

[http://htius.com/](http://htius.com/)

~~~
sebastianconcpt
Indeed. Thank you for the links

~~~
nickpsecurity
The "lessons learned from Apollo" article and 001's features should show how
badass they were. Note that recent and upcoming model-driven tools aim for
what hers already did two or more decades ago. One of the first
high-assurance toolchains.

------
mbrodersen
To create really reliable software you either use formal methods (the seL4
kernel, CompCert, etc.) or you spend a huge amount of money and work really
slowly. NASA is spending a lot of money and working really slowly. In other
words, they are not in any way state-of-the-art. Just brute force.

------
daxfohl
By writing unit tests? Unit tests existed decades before they were
popularized. Nobody "invented" them.

------
daxfohl
The funny thing is it's not just "NASA way back when". Even now, the _most_
critical code (e.g. implantable heart defibrillators) likely has less unit
testing than the advert code that posts junk to your Facebook wall.

Waterfall is still the more sensible approach in some industries.

------
yeukhon
My wild guess is just a lot of testing. Rigorous testing. Note that a lot of
early NASA code was assembly.

------
owiejg
I think they spent a lot of time doing very detailed code reviews.

------
daxfohl
Who invented unit tests? Does that person have a monopoly on highly reliable
software?

What an _awesome_ patent trolling opportunity that would be!

------
PaulHoule
Code review.

------
tuned
Asserting on the involved variables every 10 lines of code (:

------
nekopa
Process too.

