
NASA Software Safety Guidebook (2004) [pdf] - Tomte
http://www.hq.nasa.gov/office/codeq/doctree/871913.pdf
======
gibsjose
As a NASA flight software engineer who has created and worked with coding
standards for critical software, I would highly recommend Gerard Holzmann's
(JPL) Power of 10 Rules
[[http://spinroot.com/gerard/pdf/P10.pdf](http://spinroot.com/gerard/pdf/P10.pdf)].
This is a set of the 10 most important (and thus highly restrictive) embedded
flight software rules. It gets straight to the point and, at only 6 pages, is
much easier to apply to actual software development than this ~400 page tome.

EDIT: Another great point referenced in the Power of 10 rules: a requirement
is only valuable as long as you can enforce it. With some exceptions, if you
can't enforce a rule with a static analysis tool or something of the sort, it
might not belong in your coding standards.
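To make the flavor concrete, here is a hedged sketch (my own illustration, not taken from the paper) of how a few of the rules look in practice: a fixed, statically checkable loop bound (Rule 2), assertions on anomalous conditions (Rule 5), and an explicit return status the caller must check (Rule 7). `find_byte` and `BUF_MAX` are invented names:

```c
#include <assert.h>
#include <stddef.h>

#define BUF_MAX 64  /* Rule 2: fixed bound a static analyzer can verify */

/* Returns the index of `target` in `buf`, or -1 if absent or on bad input. */
int find_byte(const unsigned char *buf, size_t len, unsigned char target)
{
    assert(buf != NULL);               /* Rule 5: assertion density */
    if (buf == NULL || len > BUF_MAX)
        return -1;                     /* Rule 7: explicit error status */

    for (size_t i = 0; i < BUF_MAX; i++) {   /* Rule 2: bounded loop */
        if (i >= len)
            return -1;                 /* exhausted the valid data */
        if (buf[i] == target)
            return (int)i;
    }
    return -1;
}
```

The point about enforceability applies directly: every one of these properties is mechanically checkable by a static analyzer, which is exactly why they made the list of 10.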

------
mysterypie
I don't doubt for a minute that NASA has highly competent programmers who
write bulletproof ultra-reliable software, but they didn't get their knowledge
or skill from this tedious bureaucratic tome.

After skimming the document for 15 minutes, I didn't find a single thing that
would be insightful to experienced software developers or their managers.
Things like, "Increased complexity means increased errors at all levels of
development." It should be possible to write a useful book about software
safety, but this is not it.

~~~
nabla9
I think many people fail to understand that safety critical software can't
rely on "highly competent programmers". No matter how competent, programmers
can't be trusted to write safety critical code.

Safety critical software is all about the process and organization. The
process must be so good that you could hire a total newbie to write code and
none of their sloppy work would make it into the final product.

What you need is a software engineer who writes 20 lines of code and can then
justify each of the 100 resulting lines in a spreadsheet documenting full
MC/DC code coverage at the processor-instruction level. Then you need two test
engineers who can do the same for code that others have written. Then you need
everyone to go through every change, justify it, rewrite the tests, and
provide reports.
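As a rough illustration of what MC/DC demands (my own toy example, not drawn from any flight project): every condition in a decision must be shown to independently flip the outcome, which for a three-condition decision a minimal set of four vectors can do.

```c
#include <stdbool.h>

/* A toy decision with three conditions: armed && (ground_cmd || auto_seq).
 * All names are invented for illustration. */
bool deploy_permitted(bool armed, bool ground_cmd, bool auto_seq)
{
    return armed && (ground_cmd || auto_seq);
}

/* A minimal MC/DC test set: four vectors (not all eight). Each condition
 * has a pair of vectors differing only in that condition, with the
 * decision outcome flipping:
 *
 *   armed  ground_cmd  auto_seq  ->  result
 *   T      T           F             T    baseline
 *   F      T           F             F    armed shown independent
 *   T      F           F             F    ground_cmd shown independent
 *   T      F           T             T    auto_seq shown independent
 */
```

The spreadsheet nabla9 describes is essentially this table, maintained per decision, down at the instruction level rather than the source level.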

~~~
ben_jones
To extrapolate on this: the skill set needed to identify "highly competent
programmers" AND populate your organization exclusively with them either does
not exist, or is at least unreliable enough that it can never be assumed to be
a permanent state of your organization, as demonstrated by every long-term
software project and the organizations that created them.

~~~
termain
Including those at NASA.

------
stuartmalcolm
The thing to bear in mind is that not every organisation needs 'NASA-level'
quality. IMHO The Capability Maturity Model is a must-read for everyone
interested in professional software development.
[https://en.wikipedia.org/wiki/Capability_Maturity_Model_Integration](https://en.wikipedia.org/wiki/Capability_Maturity_Model_Integration)

~~~
pestaa
What do I look for in this model if I'm merely employed as a software
developer?

Other than the fact that you can quantify how developed your employer's
processes are.

~~~
gte525u
The model is really just an abstract compilation of best practices and
probably isn't helpful for a line-level developer. Personal Software Process
(PSP), with its pluses and minuses, is a more concrete implementation aimed at
a single developer. You could in theory do it within the framework of your
company's development process. Overall it's not that dissimilar from Joel's
checklist[1], with an emphasis on metrics.

[1]
[http://www.joelonsoftware.com/articles/fog0000000043.html](http://www.joelonsoftware.com/articles/fog0000000043.html)

------
fao_
I find this more useful and informative, to be honest (while C-specific, it
can be adapted for other languages as well): [http://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf](http://lars-lab.jpl.nasa.gov/JPL_Coding_Standard_C.pdf)

------
nickpsecurity
This actually has a lot of good information. It's just that you have to get to
it to know that. The sections discussing methods like OOP are a pretty good
summary of pros and cons. The "Good programming practices..." section (p93 in
a PDF reader) has good ones common in safety-critical embedded and some OS
development, but which I see almost no other C programmers doing. There's some
common advice mixed in. That whole section alone makes the document worth
finding, as low-level programmers will find at least 5 things they hadn't
thought about. At least.
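For flavor, here is the kind of defensive C practice that section covers. This is my own sketch, not quoted from the guidebook; `record_t`, `set_name`, and `NAME_MAX_LEN` are invented names. The habits shown are statically sized buffers instead of dynamic allocation, every pointer and length checked, and rejection instead of silent truncation:

```c
#include <string.h>

#define NAME_MAX_LEN 32

typedef struct {
    char name[NAME_MAX_LEN];  /* statically sized: no malloc to fail */
} record_t;

/* Copies `src` into the record, rejecting bad pointers and over-long
 * input rather than truncating silently. Returns 0 on success, -1 on
 * any invalid input. */
int set_name(record_t *rec, const char *src)
{
    if (rec == NULL || src == NULL)
        return -1;                     /* reject bad pointers */
    size_t len = strlen(src);
    if (len >= NAME_MAX_LEN)
        return -1;                     /* reject, don't truncate silently */
    memcpy(rec->name, src, len + 1);   /* bounded copy, includes the NUL */
    return 0;
}
```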

Section 6.4 (p109) is a nice overview of requirements analysis benefits and
types. 6.4.1.2 covers high-assurance requirements, which most projects don't
have. The Appendix H checklist is decent. 6.6.3 (p126) covers many useful
analyses often done at the compiler or type-system level in CompSci. It
follows with a basic, but inadequate, explanation of formal specs.

Section 7.4.2 (p141) goes into all sorts of techniques for fault tolerance.
7.4.4 (p145) talks about language considerations for reducing defects. The
following sections cover limiting complexity and designing for easy
maintenance. If only enterprises and their management read that stuff... 7.5.2
onward talks about Design Analysis, with examples nicely rating benefit
against effort required. The first and interface failure lists, plus
especially the design constraints, are good, as people usually overlook some
subset of them.

Section 8.4 (p169) is where coding & testing practices begin. Quite thorough
with cost benefit analysis I mostly agree with on coding side. Remember that
requirements & design already knocked out most issues with code basically just
implementing a precise spec. That's why some get "Low" rating when, in tossed-
together coding, they might otherwise have high impact.

Ch 9 (p179) is the main section on testing. It could be subsetted, but
pp182-183 is a nice, exhaustive list of what to look at. 9.4 (p184) is a nice
list of testing types. Nevermind, the latter sections are even better. Section
11.1.4 (p210) on languages, compilers, etc. is pretty thorough, with a sound,
uncommon recommendation on language used. :) CASE tools (p236) has a nice list
of capabilities worth imitating in general software tooling.

p264 has a list of common human errors. p273 has a list of questions to ask
about dependencies, esp. 3rd-party software.

So, contrary to mysterypie et al, I find the document to have about everything
you need to know to write software that either doesn't fail or handles failure
well. It's meant to get you started on every aspect so you can follow up on it
with specialist texts. It also drops literally hundreds of useful heuristics
and list items that help you achieve your goal. Many of them are non-obvious.
Quite a few would've prevented failures I see regularly on HN from otherwise
competent developers. I'm for trimming the fat out of this thing to make it
the reference text on high-assurance system development that it deserves to
be. Plus, collecting together with it key information it references (esp
specialist guides) so people can selectively look up and master pieces at a
time.

~~~
cybertronic
I also liked that bit (p48): "“Cutting edge” techniques or processes may not
be the best choice, unless time and budget exist to allow extensive training
of the development and assurance teams. Staying with a well understood process
often produces better (i.e. safer and more reliable) software than following
the process-of-the-year."

~~~
nickpsecurity
Oh yeah! On Schneier's blog, I often post Nick P's Law of Trustworthy
Technology: "Don't put your trust in a new method until there's been at least
10 years proving or improving it." Also, "tried and true beats novel or new"
for high-integrity development.

------
adrianN
_Be aware of potential problems if you split control of software configuration
management (e.g. having software documents maintained by a project or company
configuration management group, and the source code version control handled by
the programmers). It may be difficult to keep the documents (e.g. design)
synchronized with the code. Someone with configuration management experience
should be “in charge” of the source code, to enforce change control and
comprehensive documentation of changes._

Great idea, forbid normal developers from officially changing the source code
and let an expert handle this important matter. Don't even think about using a
modern VC system, IBM has fully integrated Enterprise SCM(tm) ready for you.

 _Change control is an important part of developing safe software. Arbitrary
changes should be avoided. Once a piece of software has reached a level of
maturity, it should be subject to a formal change control process. What that
level of maturity is will vary by group. It could be when the component
compiles, when the CSCI (which may contain several components) is completed,
or when the whole program is at its first baseline._

Oh, you noticed that this 500 line copy paste function could be replaced by
three 80 line functions? Sorry buddy, before you can change this you have to
fill out this form in triplicate and wait for a decision from three middle
managers. Maybe we can fit this change in the next release, but don't get your
hopes up.

~~~
Kirth
> Oh, you noticed that this 500 line copy paste function could be replaced by
> three 80 line functions? Sorry buddy, before you can change this you have to
> fill out this form in triplicate and wait for a decision from three middle
> managers. Maybe we can fit this change in the next release, but don't get
> your hopes up.

Because as it turns out, writing mission critical, one-off software is
difficult. Touching a piece of code that's firmly embedded in the system could
have serious consequences. (Of course, it shouldn't; any mistakes should be
caught by testing but we all know how that goes.)

These are rules for a place that makes massive, expensive and dangerous
things, not a webdev shop.

~~~
informatimago
Until somebody tries to drive a surgical robot using your webdev javascript
components...

And don't dismiss the idea: they do use Microsoft Windows in surgical robots
(including eye laser robots, which is why I'll never have it).

~~~
notalaser
I've worked on such devices before. I don't know how it goes for the surgical
robots you mentioned, but _in general_ , this can be done in a sane manner,
even using a Windows application for the UI. I wouldn't be partial to
operating a surgical robot through a web interface, but only because of the
unpleasant UI latency and the inconvenience of doing anything that's not HTTP
from it.

The way this is (normally/sanely) done is that the control code for the robot
runs on its own CPU and does all the failsafe, real-time-constrained stuff.
The robot's control code doesn't receive control-level instructions, but
messages of the form "move to position X,Y,Z" or "move N units on axis Y",
which it validates (and, ideally, also backs with mechanical limits, i.e. the
mechanism itself cannot move to an invalid or dangerous position). The
application that gets user input and sends these messages doesn't need to be
NASA-level stuff as long as it runs on another CPU.

In more recent years, this CPU and the one that runs Windows and the nice
interface have been starting to come in the same box (and, more recently, even
on the same die), but until not so long ago these were usually separate boxes,
connected through serial, USB or Ethernet. This is still common; I finished
the firmware for one such device just a few months ago.

I don't know if you can actually pass through FDA's process with a robot
that's running Windows, but frankly, I wouldn't try it. But there's nothing
inherently unsafe in running the UI on general-purpose software and hardware,
as long as the control code is robust enough in its validation and the
interface between the two is correctly designed.
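The validation layer described above can be sketched roughly like this. All names and limits are invented for illustration; a real controller would also enforce rate limits, interlocks, and mechanical stops:

```c
#include <stdbool.h>

typedef struct { double x, y, z; } pos_t;

/* Invented safe envelope for the mechanism, in millimetres. */
static const pos_t POS_MIN = {   0.0,   0.0,  0.0 };
static const pos_t POS_MAX = { 100.0, 100.0, 50.0 };

/* The controller runs this check on every "move to X,Y,Z" message it
 * receives from the UI process; any command outside the safe envelope
 * is simply dropped before an actuator is ever driven. */
bool move_cmd_valid(pos_t target)
{
    return target.x >= POS_MIN.x && target.x <= POS_MAX.x
        && target.y >= POS_MIN.y && target.y <= POS_MAX.y
        && target.z >= POS_MIN.z && target.z <= POS_MAX.z;
}
```

Because this check lives on the controller's CPU, a buggy or compromised UI can at worst send commands that get rejected, which is what makes the split architecture safe.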

