I agree with almost all of it except for one thing: gotos have pretty much one legitimate use, as C's equivalent of the finally {} part of a try {} block, i.e., that specific form of cleanup after error handling. NASA implies they have no legitimate use.
The banning of longjmp is also slightly questionable (although I can see why, because it is very easy to get wrong). I use it inside my code as part of an STM implementation (so begin_tx() setjmps[1], abort_tx() longjmps; it's faster than manually unwinding with if (tx error) { return; } spam in deep call stacks.)
Using longjmp for this makes writing the code much easier (no need to error-check every single tx function call), so there's less chance for bugs to slip in.
1: The only ugly part is that begin_tx() is a function-like macro, which I prefer never to use in code that is executed; I tolerate them in "fancy template-like generator" setups, though.
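For anyone curious, a minimal sketch of the control flow (hypothetical names, heavily simplified; a real STM would also record writes so an abort can roll them back):

    #include <setjmp.h>

    static jmp_buf tx_env;

    /* begin_tx() has to be a macro: a setjmp environment is only valid
       while the function that called setjmp is still active. */
    #define begin_tx() (setjmp(tx_env) == 0)

    static void abort_tx(void) {
        longjmp(tx_env, 1);        /* unwind straight back to begin_tx() */
    }

    static void deep_helper(int fail) {
        if (fail)
            abort_tx();            /* no if-error-return spam up the stack */
    }

    int run_tx(int fail) {
        if (begin_tx()) {
            deep_helper(fail);     /* imagine an arbitrarily deep call stack */
            return 0;              /* commit */
        }
        return -1;                 /* aborted: we landed back here via longjmp */
    }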
Anything that makes it "very easy to do wrong" is not a good idea in flight software. A missing semi-colon in FORTRAN once completely killed a space mission.
I remember reading semi-colon in the report ;-) . I can't find the PDF that I believe I read this in, unfortunately, and I'm not sure if it's publicly available. However, this Wikipedia article points to a hyphen and suggests that there's some folklore involved, and that revisions have been made in the past: http://en.wikipedia.org/wiki/Mariner_1 .
Edit: Andrew, thanks for pointing out my ignorance, I've never coded FORTRAN, and have been thinking it's a semicolon this entire time. What fitting irony. Sort of makes for an even better story now. >_<
this seems like it may be definitive (well, it's a quote that gives a reference; it also fits with what i remember from the news at the time, which is that it was a do statement and related to the parsing of spaces, but i have no idea how reliable that is): http://catless.ncl.ac.uk/Risks/5.66.html#subj2
if it's correct then it was actually a decimal point instead of a comma.
A few summers ago I was an intern at JPL working on a static analysis suite for this exact standard.
Writing code checkers for these sorts of rules is a really interesting exercise and it helped me grow a lot as a programmer! I went from having no exposure to formal languages, parsing, and grammars to actively playing around with these concepts to try and help build more reliable software. It was a humbling, challenging, and incredibly rewarding experience.
Sometimes, a rule is extremely simple to implement. For example, checking a rule that requires an assert every so many lines within a given scope is just a matter of picking the right sed expression. Other times, you really need an AST to be able to do anything at all.
A rule like "In compound expressions with multiple sub-expressions the intended
order of evaluation shall be made explicit with parentheses" is particularly challenging. I spent a few weeks on this rule! I was banging my head, trying to learn the fundamentals of parsing languages, spending my hours diving into wikipedia articles and learning lex and yacc. The grad students at LaRS were always extremely helpful and were always willing to help tutor me and teach me what I needed to learn (hi mihai and cheng if you're reading!). After consulting them and scratching our heads for a while, we figured we might be able to do it with a shift-reduce parser when a shift or reduce ambiguity is introduced during the course of parsing a source code file. This proved beyond the scope of what I'd be able to do within an internship, but it helped me appreciate the nuance and complexity hidden within even seemingly simple statements about language properties.
Automated analysis of these rules gives you a really good appreciation of the Chomsky language hierarchy, because the goal is always to create the simplest possible checker that you can reliably show covers all the possible cases. Sometimes that is as simple as a regular language, but the next rule might require a full parser for the language.
For what it's worth, this is only one of the ways the guys at LaRS (http://lars-lab.jpl.nasa.gov/) help try to improve software reliability on-lab. Most of the members are world-class experts in formal verification and try to integrate their knowledge with missions as effectively as possible. Sometimes this means carrying the dual responsibility of functioning as both a researcher and an embedded flight software engineer, working alongside the rest of the team.
If anyone's interested in trying out static analysis of C on your own, I highly recommend checking out Eli Bendersky's awesome C parser for Python (http://code.google.com/p/pycparser/). I found it leaps and bounds better than the existing closed-source toolsets we had licenses for, like Coverity Extend. At the time, it had the extremely horrible limitation of only parsing C89, but Eli has since improved the parser to support C99. Analyzing C in Python is a dream.
Gerard Holzmann (http://spinroot.com/gerard/) came to JPL from Bell Labs in 2003, and the period since he arrived has coincided with a time of greater prominence for methodologies of producing reliable software. Although many people contributed to the document in this post (see page 5), Gerard was the driving force behind drafting it and getting buy-in from the people who write flight software. The latter part -- cultural -- is as big a challenge as the technical stuff.
Another thing that happened around this time was the licensing of Coverity and other tools, and the introduction and promotion of static code verification, even for non-flight software.
> A rule like "In compound expressions with multiple sub-expressions the intended order of evaluation shall be made explicit with parentheses" is particularly challenging. I spent a few weeks on this rule! I was banging my head, trying to learn the fundamentals of parsing languages, spending my hours diving into wikipedia articles and learning lex and yacc.
Hmmm.... how about this? If the code is parenthesized enough, then the precedence and associativity of the operators have no effect on the shape of the parse tree. So, if you take the expression, repeatedly make random changes to the operators, and parse it, and you keep getting the same shape for the parse tree, it is sufficiently parenthesized.
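Here's a toy sketch of that idea (mine, single-character operands only, nothing like a real C front end): parse with precedence climbing into a fully parenthesized "shape", then randomly swap the operators and reparse; if any swap changes the shape, the expression relied on precedence.

    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static const char *cur;                 /* parser cursor */

    static int prec(int c) {                /* 0 means "not an operator" */
        if (c == '+' || c == '-') return 1;
        if (c == '*' || c == '/') return 2;
        return 0;
    }

    static void skip_ws(void) { while (isspace((unsigned char)*cur)) cur++; }

    static void parse_expr(int min_prec, char *out);

    static void parse_primary(char *out) {
        skip_ws();
        if (*cur == '(') {                  /* parenthesized subexpression */
            cur++;
            parse_expr(1, out);
            skip_ws();
            cur++;                          /* consume ')' */
        } else {
            strncat(out, cur++, 1);         /* single-character operand */
        }
    }

    /* Precedence climbing.  The emitted "shape" writes '.' for every
       operator, so comparing shapes compares only the grouping. */
    static void parse_expr(int min_prec, char *out) {
        char lhs[256] = "", rhs[256], tmp[256];
        parse_primary(lhs);
        skip_ws();
        while (prec(*cur) >= min_prec) {
            char op = *cur++;
            rhs[0] = '\0';
            parse_expr(prec(op) + 1, rhs);  /* left-associative */
            snprintf(tmp, sizeof tmp, "[%s.%s]", lhs, rhs);
            strcpy(lhs, tmp);
            skip_ws();
        }
        strcat(out, lhs);
    }

    static void shape_of(const char *expr, char *out) {
        cur = expr;
        out[0] = '\0';
        parse_expr(1, out);
    }

    int main(void) {
        const char *tests[] = { "a + (b * c)", "a + b * c", "a + b + c" };
        char orig[256], mut[256], buf[64];
        for (int i = 0; i < 3; i++) {
            shape_of(tests[i], orig);
            int stable = 1;
            for (int t = 0; t < 100 && stable; t++) {
                strcpy(buf, tests[i]);
                for (char *p = buf; *p; p++)        /* random operator swap */
                    if (prec((unsigned char)*p)) *p = "+-*/"[rand() % 4];
                shape_of(buf, mut);
                stable = (strcmp(orig, mut) == 0);
            }
            printf("%-12s %s\n", tests[i],
                   stable ? "explicit enough" : "relies on precedence");
        }
        return 0;
    }

Note this test is stricter than the allowance suggested downthread: it flags a + b + c too, since swapping the second + for a * changes the grouping.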
I thought the sometimes-required-parentheses rule was interesting. Here's what I came up with in 30 minutes using ANTLR. Highly recommend it! The ANTLRWorks grammar IDE is incredibly useful -- it shows rules and trees visually, and can single-step the generated code so you can see your parse tree being built one token at a time.
The following grammar accepts input like 3 + (5 * 7) but rejects 3 + 5 * 7.
The key is that, if your expression doesn't start with a parenthesis, you know that all the operators at that level have to be the same. (I assume that sums or products of several things like 1 + 5 + (2 * 3) are permitted without parenthesizing further.)
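(The grammar itself seems to have been eaten by HN's markup, so here is the same idea sketched as a recursive-descent validator in C instead; this is my reconstruction of the idea, not the original ANTLR grammar.)

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    static const char *s;              /* cursor into the expression */

    static int expr(void);

    /* operand, or a parenthesized subexpression (checked recursively) */
    static int primary(void) {
        while (isspace((unsigned char)*s)) s++;
        if (*s == '(') {
            s++;
            if (!expr()) return 0;
            while (isspace((unsigned char)*s)) s++;
            if (*s != ')') return 0;
            s++;
            return 1;
        }
        if (!isalnum((unsigned char)*s)) return 0;
        while (isalnum((unsigned char)*s)) s++;
        return 1;
    }

    /* An unparenthesized level may chain several operands, but every
       operator at that level must be the same one. */
    static int expr(void) {
        char level_op = 0;
        if (!primary()) return 0;
        for (;;) {
            while (isspace((unsigned char)*s)) s++;
            if (*s == '\0' || *s == ')') return 1;
            char op = *s;
            if (strchr("+-*/", op) == NULL) return 0;
            if (level_op != 0 && op != level_op) return 0;  /* mixed ops */
            level_op = op;
            s++;
            if (!primary()) return 0;
        }
    }

    int main(void) {
        const char *tests[] = { "3 + (5 * 7)", "3 + 5 * 7", "1 + 5 + (2 * 3)" };
        for (int i = 0; i < 3; i++) {
            s = tests[i];
            printf("%-16s %s\n", tests[i], expr() ? "accept" : "reject");
        }
        return 0;
    }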
Also, tool choice matters. ANTLR is an LL parser and Yacc/Bison are LR parsers; IMHO with LL it's much easier to understand what's going on. This grammar would need substantial rewriting for Yacc to deal with the fundamental differences between LL and LR parsing.
(edited to deal with HN markup issues related to asterisks and fix implementation bugs)
Oh if only the project that I had been working on followed any of these rules. Most of the code was generated from Matlab, but some had to be translated by hand. I'm not sure any of us knew this even existed...
Wait.... no malloc or sbrk? That means all space has to be stack allocated? That's a pretty serious limitation and would probably make it hard to do anything really interesting.
You don't want your space ship flight to be "interesting".
Note that it says "after task initialization". What this really implies is that you must use O(1) heap space, and you should get what you need ahead of time.
Having to deal with out of memory conditions 4 seconds after main engine turn-on is not a fun party. Neither is blocking on malloc() so you can prepare your struct course_adjustment to send to another task.
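A sketch of what "get what you need ahead of time" can look like (hypothetical names, simplified): carve out a static pool during task init and recycle it through an O(1) free list, so nothing after initialization can fail for lack of memory.

    #include <stddef.h>

    #define POOL_SIZE 32

    struct msg { struct msg *next; char payload[64]; };

    static struct msg pool[POOL_SIZE];     /* statically allocated up front */
    static struct msg *free_list;

    void msg_pool_init(void) {             /* run once, during task init */
        for (size_t i = 0; i + 1 < POOL_SIZE; i++)
            pool[i].next = &pool[i + 1];
        pool[POOL_SIZE - 1].next = NULL;
        free_list = &pool[0];
    }

    struct msg *msg_alloc(void) {          /* O(1); can fail, never blocks */
        struct msg *m = free_list;
        if (m != NULL)
            free_list = m->next;
        return m;
    }

    void msg_free(struct msg *m) {         /* O(1) recycle */
        m->next = free_list;
        free_list = m;
    }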
You don't want to do anything interesting in a space flight mission. Code affecting a space mission only has one chance to get it right in many cases. In general, memory management in these systems is extremely serious business. Think of all the ways that manual memory management in C has damaged the reliability of software you have written in the past.
In addition, a lot of the rules are created to make the task of reading source code easier, both for humans and machines. I remember Dr. Gerard Holzmann once half-joked in a meeting that he wanted to disallow any declaration of pointers except at static initialization. I sort of thought he was joking, but then he assured me that it was a serious consideration. He reminded me of the gravity of the situation and explained that $2 billion of public funds were on the line.
Disallowing pointer indirection would make certain automated analysis techniques much, much simpler to perform. Adding a pointer indirection can really complicate matters sometimes.
But without pointer indirection and dynamic memory allocation, why even use C? The big idea of C is pointers. Aren't there languages designed for mission-critical and embedded environments (Ada, for instance)?
This is a much larger discussion, akin to most religious wars in software ;-). There is a huge argument for what you're saying, but there are other forces at play. One is that C is a really nice layer on top of assembly, and there are a lot of extremely talented embedded C software engineers, (I'm assuming) more so than the number of available Ada engineers. Also, keep in mind this is a pretty conservative domain. What has worked in the past is trusted much more than what might work better. Up until a decade or so ago, there was no operating system to speak of and most development used hardware controllers.
Also, C is not the only game in town. In fact, I hear that Ada is actually pretty popular in space flight. Some missions have even successfully leveraged Forth. And using a Lisp read-eval-print loop from many millions of miles away once saved the day on Deep Space 1.
In my experience, I have never come across Ada in the spaceflight domain.
Among the big players in spacecraft flight software, there seems to be a divergent east-coast/west-coast preference for C and C++, respectively. In my estimation, the reason is this: there is a wide variety of target hardware and OSes, and FSW needs to be reused across all of them (embedded Linux, VxWorks, QNX on PowerPC, SPARC, Intel architectures). In terms of development environments and compiler toolchains, only ISO C (and to a slightly lesser degree C++) is supported by all of them.
Edit: Various instruments on spacecraft may be programmed in Forth or other nifty languages, for example, and there's a growing effort to code some of the more "interesting" challenges in spaceflight (autonomy, fault management, guidance-and-control, etc.) in custom domain-specific languages or scripting languages like Lua.
What do pointers have to do with memory allocation? Pointers don't discriminate against the otherwise allocated.
That said, this system still uses a dynamic memory system - it's somewhat non-optional when you use an operating system, which has to maintain memory for stacks and its own resources.
These guidelines just forbid using the memory allocation routines after the task initialization phase, to make memory allocation and usage completely predictable.
I brought up pointer indirections as a way of illustrating how stringent some of these rules are and what sorts of forces are at play. Sorry for muddying matters.
The point is that you can still do all of that stuff even with statically allocated memory.
The other limitation in a lot of these systems is the underlying virtual memory system, and sometimes there isn't one. Memory fragmentation issues are a huge problem when you have a couple of kB to a few MB of physical RAM and a limited VM subsystem.
> That's a pretty serious limitation and would probably make it hard to do anything really interesting.
No it's not, and no it doesn't.
No dynamic allocation is a pretty standard precaution for safety critical software. It requires careful coding and design, but it eliminates an entire class of runtime errors and makes it relatively easy to put an upper bound on memory usage.
It's very common in embedded systems. It's not so limiting, since you need to plan the exact size of your buffers anyway to make sure the system has enough memory; might as well just allocate them statically. It's also rare to use linked lists or other structures that grow dynamically in that kind of software.
It doesn't mean you have to allocate on the stack. It means you have to declare buffer sizes at compile time rather than at runtime. Instead of saying char *buf = malloc(size_of_my_data()), you just say char buf[MAX_SIZE_OF_MY_DATA].
> would probably make it hard to do anything really interesting
The Rockbox music player firmware has the same restriction and it doesn't seem to prevent it from running Doom or decoding FLACs.
It just means no dynamic allocation. This is not an uncommon restriction for certain classes of embedded systems (engine controls, avionics, medical, etc.) I've worked primarily in this environment and haven't found it to be too much of a nuisance.
But that's AFTER startup. I think this is a very nice pattern for servers. Do as much work as possible at startup time. Do anything that can fail, including allocating memory.
And then in the request path, be miserly about what you are willing to do. "Modern" web servers like nginx seem to be designed this way. Node.js's HTTP parser makes a point of not allocating any memory.
I think you'd be surprised how far this pattern can go.
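A sketch of the shape of such a server (hypothetical and heavily simplified): every allocation happens before the serve loop, so the request path can't fail for lack of memory.

    #include <stdio.h>
    #include <stdlib.h>

    #define MAX_CONNS 1024
    #define BUF_SIZE  8192

    struct conn { int fd; char *buf; };
    static struct conn conns[MAX_CONNS];

    int main(void) {
        /* Startup: do everything that can fail, including allocation. */
        for (int i = 0; i < MAX_CONNS; i++) {
            conns[i].fd  = -1;
            conns[i].buf = malloc(BUF_SIZE);
            if (conns[i].buf == NULL) {
                fprintf(stderr, "startup allocation failed\n");
                return 1;
            }
        }

        /* Request path (elided): accept connections, parse, and respond
           using only the preallocated conns[i].buf slots, never calling
           malloc(), so out-of-memory simply cannot happen here. */
        return 0;
    }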
Actually it is very wise. I started out doing COBOL with Jackson Structured Programming standards, and when I started my first job I was totally unaware of the goto verb - I even got told to use it. This was years ago, and real-world standards have caught up with education standards; maybe not cutting edge, but at least on a sharp surface, if you know what I mean.
With C I have never used a goto. Sure, you can compact your code, but is it manageable later on by somebody else? By avoiding gotos you also tend (at least I have found it to be so) to get more structured, easier-to-follow code. Also smaller functions, albeit more of them, I'd say from what I have experienced.
Also remember that a goto may be fine for what the program is to do today, but what about changes down the line? In that respect, as much structure, and with it control, is the ideal.
Some might say that if you want to code gotos, then go code in assembler instead.
It is harsh, but there again so is space (sorry had to say it).
Do you write parsers or state machines very often?
goto has its uses. It shouldn't be used indiscriminately, but (especially since C does not have tail-call optimization) it is the best approach sometimes.
It's a bit strong to say that C doesn't have TCO. The language doesn't require TCO, but compilers are free to implement it (and indeed many of them do).
"Do you write parsers or state machines very often?" Nope, not at all. So I can't comment further on that area beyond saying for every rule there is an exception - that is the rule. As a rule goto's are bad in highlevel languages.
If you have a state machine with several mutually recursive states, you can end up with a dangerously deeply nested call stack*. (Particularly if the state machine is reacting to an infinite stream of data and never actually returns.) One way around this is moving as much of the logic as is feasible into one large function and using gotos instead of function calls, managing the accumulated data yourself, as sketched below. (In a language with tail-call optimization, you can just use function calls. Much better.) This is usually a bit cumbersome to manage by hand, but a good strategy when generating C code from a DSL. (Look at the output from lex, for example.) This is also common when implementing a virtual machine in C.
* Especially on embedded hardware, which may only have a few KB for the stack.
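To make that concrete, a rough sketch (mine, a hypothetical mini scanner, not from any real flight code):

    enum result { DONE, ERROR };

    /* Two mutually recursive "states" flattened into one function, with
       goto standing in for the tail calls C won't guarantee to optimize.
       With guaranteed TCO these would be two functions calling each other. */
    enum result scan(const char *p) {
    s_text:                             /* state 1: outside a tag */
        if (*p == '\0') return DONE;
        if (*p++ == '<') goto s_tag;
        goto s_text;                    /* "tail call" back to this state */

    s_tag:                              /* state 2: inside a tag */
        if (*p == '\0') return ERROR;   /* unterminated tag */
        if (*p++ == '>') goto s_text;   /* "tail call" to the other state */
        goto s_tag;
    }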
Another use case for gotos is handling cleanup on error, when writing code that has to be fault-tolerant. Here's a rough example, generalized from a VM I'm working on:
    typedef struct thing {
        int id;          /* object ID */
        int buf_sz;      /* current size of the buffer */
        char *buf;       /* internal buffer */
        foo *f;          /* some other thing that needs alloc / init */
    } thing;

    thing *thing_new(int id, int buffer_size) {
        thing *t = malloc(sizeof(*t));
        foo *f = NULL;
        char *buf = NULL;
        if (t == NULL) goto cleanup;
        buf = malloc(buffer_size);
        if (buf == NULL) goto cleanup;
        t->id = id;
        t->buf_sz = buffer_size;
        t->buf = buf;
        f = foo_new();
        if (f == NULL) goto cleanup;
        t->f = f;
        return t;

    cleanup:
        /* Avoid leaking memory if any part failed. */
        if (f) foo_free(f);
        if (buf) free(buf);
        if (t) free(t);
        return NULL;
    }
I don't tend to use goto by hand much outside of that particular idiom, but it's common enough that it should be recognized, and the equivalent with ifs and multiple returns would be much worse: "if not B, free A; if not C, free A and B; if not D, free A, B, and C; if not E, ...".
This is OK, as it is what is deemed exception handling. Though personally I like to handle as many known exceptions individually as possible. I also like to have all functions return a status code; with that you can clean up after any possible exception and still have all your error handling code in one place, allowing graceful handling of any exception. It's a little more effort, but when you need that level of assurance you pay for it one way or another.
I would not really call that a state machine. (Or a parser.) Which were the topics at hand. All of my state machines have been for embedded targets and generally look like this:
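(The snippet didn't survive HN's markup; what follows is my reconstruction of the classic shape, with made-up states and events.)

    enum state { IDLE, RUNNING, FAULT };
    enum event { EV_GO, EV_STOP, EV_FAULT, EV_CLEAR };

    /* One switch on the current state, called from the task's loop. */
    static enum state step(enum state s, enum event ev) {
        switch (s) {
        case IDLE:
            if (ev == EV_GO)    return RUNNING;
            break;
        case RUNNING:
            if (ev == EV_STOP)  return IDLE;
            if (ev == EV_FAULT) return FAULT;
            break;
        case FAULT:
            if (ev == EV_CLEAR) return IDLE;   /* latched until cleared */
            break;
        }
        return s;                              /* ignore irrelevant events */
    }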
Simple state machine; no gotos are needed. Works fine for non-simple ones too. And the first rule of fault-tolerant code (particularly for embedded) is: never use malloc.
I agree that goto is useful for error handling, but not for parsers or state machines. Anyway, I'll play along with your bait-and-switch. Here is error handling without goto.
    typedef struct thing {
        int id;          /* object ID */
        int buf_sz;      /* current size of the buffer */
        char *buf;       /* internal buffer */
        foo *f;          /* some other thing that needs alloc / init */
    } thing;

    thing *thing_new(int id, int buffer_size) {
        thing *t = NULL;     /* declared out here so the cleanup code can see them */
        foo *f = NULL;
        char *buf = NULL;
        do {
            t = malloc(sizeof(*t));
            if (t == NULL) {break;}
            buf = malloc(buffer_size);
            if (buf == NULL) {break;}
            t->id = id;
            t->buf_sz = buffer_size;
            t->buf = buf;
            f = foo_new();
            if (f == NULL) {break;}
            t->f = f;
            return t;
        } while (0);

        /* clean up */
        if (f) {foo_free(f);}
        if (buf) {free(buf);}
        if (t) {free(t);}
        return NULL;
    }
Now normally I would never do that. More typically I would actually use the loop. Because I would not be malloc-ing; I would be trying to initialize a piece of hardware over SPI or I2C. The do...while would be replaced with a for (i = maxtries; i; i--) loop. After maxtries, the loop terminates and the peripheral is shut down.
And now, how do you break out from within the 2nd and 3rd levels of nesting? break can only break out of one of them. If only "break" took a label, so you could say "break toplevel;" instead of just "break;" (and name the relevant scope "toplevel", of course).
Turns out, you actually can! Instead of "break toplevel;", you just write "goto toplevel;". There's another minor change, in that you have to put the name at the end of the scope rather than the beginning, which is why people tend to name it after the next block (e.g. "goto cleanup;" in this case).
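A toy illustration (my example, not from the thread):

    /* Escape two nested loops at once: the labeled break C never got. */
    int find(int grid[4][4], int target, int *row, int *col) {
        for (int i = 0; i < 4; i++)
            for (int j = 0; j < 4; j++)
                if (grid[i][j] == target) {
                    *row = i;
                    *col = j;
                    goto found;         /* "break toplevel;" */
                }
        return 0;                       /* not found */
    found:
        return 1;
    }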
I was only replying to your "here's error handling without goto". And I'm not implying that it is impossible - just that your solution does not scale as is to more than one level of scoping.
There is nothing that absolutely needs goto (including error handling), because Turing completeness does not require goto, so you might wait forever for an example that truly needs it; I'm not sure what it is that you guys are arguing about with respect to state machines.
(Of note, I keep waiting for TCO proponents to show me an example in which the guarantee of TCO in Scheme makes the world so much better. The claim always comes up, and every example I've seen so far requires at most adding two more lines in Python, and no adding of TCO.)
These are guidelines for code which controls multi-million-dollar spacecraft. "No recursion" and "always predictable loop bounds" are also incredibly harsh, but they are helpful for verifying correctness.
And most of it seems to stem from the old "goto considered harmful" dogma that has been passed down from generation to generation.
gotos have a pretty important niche in error-handling blocks. You can see them all over the Linux kernel, arguably one of the biggest (and most successful) C projects out there.
We're talking about flight software, not software in general. Implying that the people who drafted this guideline were blindly accepting dogma is simply wrong. Look at the list of contributors (page 5) -- it includes Ritchie, Kernighan, and Doug McIlroy -- in addition to several others who have spent their careers writing flight software.
> I am not defending the bans, but find it strange that people would be surprised by them.
yes, i think a lot of replies here are speaking past each other, making valid and not actually contradictory points, but not really replying to each other.
if you read the entire doc it is clear that they are pushing c in a very safe, but somewhat unusual direction. there's no dynamic memory for example, so the main reason that most of my c code uses goto - to free memory on failure in a "catch" - is irrelevant.
taken as a whole, it's not what i would call normal c use, and i don't think it's very useful for most other people as a guideline, but it is internally consistent and, for the specific use case, reasonable.
This document reinforces my opinion that most coding standard documents suck. I've seen countless coding standards from different companies (some of which I even worked for), and they all sucked. No exception. Even though coding standards contain some common-sense advice and guidelines that are generally helpful for producing good-quality code, the arbitrary, irrational rules and beliefs that the writers put into them and try to enforce end up hurting the quality of the code produced by developers trying to follow those rules.
Case in point with examples from the NASA JPL coding standards for C:
* no direct or indirect recursion
What is it, FORTRAN-77? Some algorithms are way easier to implement recursively, whereas the iterative version can be much less straightforward and buggier. Think sorting: it's easy to prove that the recursion is finite and that the implementation of the algorithm is correct. Do they use sorting at NASA, or is it prohibited by this rule?
* no dynamic memory after initialization
FORTRAN-77 again! While dynamic memory management can be challenging in real-time systems and a generic malloc/free implementation is not acceptable, that doesn't mean statically pre-allocated fixed-size memory is better. It inevitably leads to brittle code rife with excessive memory use and bugs like static buffer overruns, and sometimes even the inability to use dynamic data structures like linked lists. To work around this restriction, a developer can construct a linked list structure in statically allocated memory, but doing so is essentially equivalent to creating your own dynamic memory manager, which is more likely to be poorly implemented than a good dynamic memory manager. Instead of denying the use of dynamic memory, they should develop memory managers with acceptable performance characteristics.
* The return value of non-void functions shall be checked or used by each calling function, or explicitly cast to (void) if irrelevant.
Given that a lot of library functions in C return error codes that are rarely useful, this rule leads to code littered with (void) casts: "(void) printf(…)", "(void) close(…)", etc. Besides the littering, the rule doesn't make the code any more robust, because it encourages using (void) casts to ignore error codes, so error codes will likely be ignored rather than handled correctly.
* All functions of more than 10 lines should have at least one assertion.
This leads to littering code with assertions in functions that don't necessarily have anything to assert and that are accidentally longer than 10 lines, for example due to mandatory parameter validation checks. (I hope parameter validation checks are not assertions, are they? See the sketch after this list.)
* All #else, #elif and #endif preprocessor directives shall reside in the same file as the #if or #ifdef directive to which they are related.
This is just a bizarre rule. What developer puts an #ifdef in one file and the #endif in another? Unless, of course, he's drunk or high, but I hope that's not how NASA develops its software.
* Conversions shall not be performed between a pointer to a function and any type other than an integral type.
Wait, pointers to functions should be converted to which integral type? There are a number of integral types: char, short, unsigned long long. Which one do I choose? Why not void* or intptr_t?
* Functions should be no longer than 60 lines of text and define no more than 6 parameters.
Finally, a good rule. But what does the explanation say? "A function should not be longer than what can be printed on a single sheet of paper in a standard reference format with one line per statement and one line per declaration." Printed on a sheet of paper? Is that still how code is reviewed at NASA?
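On the assertion rule above: for what it's worth, the usual distinction, sketched with a hypothetical example (mine, not taken from the standard), is that parameter validation reports bad input to the caller at runtime, while an assertion documents an invariant that should be impossible to violate:

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct { uint8_t buf[64]; size_t head, count; } ring_t;

    int ring_put(ring_t *r, uint8_t byte) {
        if (r == NULL) return -1;                  /* validation: report bad input */
        if (r->count == sizeof r->buf) return -1;  /* validation: ring is full */
        assert(r->head < sizeof r->buf);           /* invariant: "can't happen" */
        r->buf[(r->head + r->count) % sizeof r->buf] = byte;
        r->count++;
        return 0;
    }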
And before you say "these coding standards are for a special kind of software that runs on space flight control systems," embedded devices these days are more powerful than desktop computers ten years ago. Embedded software grew beyond draconian restrictions a long time ago and is much closer now to non-embedded software.
Let's not forget that NASA did use Lisp in their systems and was able to solve pretty difficult problems remotely with the help of a Lisp REPL (http://www.flownet.com/gat/jpl-lisp.html). Lisp code certainly can't be subject to any of the restrictions in these coding standards, which is another indication of how irrelevant they are to producing robust software.