
Rules for Writing Safety Critical Code - plumeria
http://spinroot.com/p10/
======
TickleSteve
A lot of these rules are just good practice for general embedded s/w
development (see also MISRA): rules such as fixed upper bounds on loops are
intended to minimise dynamic behaviour and assist analysis.

Just to be clear... Following these rules does not get you any kind of safety
related certification. There is a lot more work, both up-front in requirements
& analysis, and after implementation to gain any kind of certification.

~~~
PinguTS
Following some or all of those rules would be wise for any developer,
especially Web developers. That would eliminate a number of bugs: bugs that
seem to get fixed over and over again.

~~~
TickleSteve
exactly... higher level s/w has a lot to learn from embedded techniques
regarding performance & reliability.

~~~
eterm
At a higher level, especially in web development, readability is often more
important than performance & reliability.

Time to market and how easy it is for a new team member to pick up the code is
more important than catching every bug or sub-par performance because patching
and deployment is relatively cheap.

~~~
TickleSteve
Readability (and more importantly, understandability) _is_ critical in these
systems; this is one of the things that higher-level languages can learn.

Higher level s/w has piled on the abstraction layers and that hides what is
really happening on the h/w level.

Web s/w has gone in the direction of simplicity-by-increasing-abstraction.
Embedded s/w gains simplicity by _removing_ it.

Of course, some knowledge of the h/w is necessary for this, but that's a
prerequisite for performant systems anyway.

~~~
sacado2
This. The very fact that lower-level, famously unsafe languages (namely C) can
actually deliver guarantees that _no_ higher-level language can offer was kind
of an epiphany.

~~~
TickleSteve
exactly... unfortunately (IMHO) the s/w world is moving in the wrong
direction...

Abstractions need to be appropriate (and shallow, in order to be easily
analysable). A degree of 'mechanical sympathy' is needed.

------
SCHiM
However, off the top of my head I can give you one example that is likely used
in very security-critical applications and that also uses goto:

The linux kernel.

The kernel coding conventions actually state that this is because over-zealous
nesting is more harmful than using goto statements.

> The rationale for using gotos is:
>
> - unconditional statements are easier to understand and follow
> - nesting is reduced
> - errors by not updating individual exit points when making modifications are prevented
> - saves the compiler work to optimize redundant code away ;)

[https://www.kernel.org/doc/Documentation/CodingStyle](https://www.kernel.org/doc/Documentation/CodingStyle)
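A minimal sketch of the idiom that rationale describes (the function and its resources are hypothetical; the single-exit goto pattern itself is the one from the kernel style guide):

```c
#include <stdlib.h>

/* Hypothetical two-resource setup function using the kernel-style
 * single-exit goto idiom: each failure jumps to a label that unwinds
 * only what was successfully acquired, keeping nesting flat and the
 * exit points in one place. */
static int setup(char **out_a, char **out_b)
{
    char *a = malloc(32);
    char *b;

    if (!a)
        goto err;
    b = malloc(64);
    if (!b)
        goto err_free_a;

    *out_a = a;
    *out_b = b;
    return 0;

err_free_a:
    free(a);            /* undo the first allocation only */
err:
    return -1;
}
```

Without the labels, each added resource would deepen the nesting or duplicate the cleanup code at every early return.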

~~~
TickleSteve
security-critical != safety-critical.

Linux would never get chosen for any higher level safety-related s/w.

~~~
fit2rule
Linux is deployed all over the world in safety-critical applications .. if you
take a train in Europe, parts of China, Brazil, some parts of the US .. Linux
is running along, track-side, protectin' yo' ass.. Linux is used in systems
that have SIL-4 and ISO 27001 up the wazoo, its just that in the
commercial/industrial use, 'tis all extremely well documented. And tested.
Tested, tested, tested. Onboard, offline, staged, etc. Test the hell out of
all safety-critical software.

No goto un-tested. No branches not taken.

~~~
TickleSteve
In those systems, the Linux kernel tends to be compartmentalised, basically
being run as a task under the control of the real RTOS while the safety
critical & hard real-time portions are run in a _proper_ RTOS.

The other situation is that the actual control is deferred to a _proper_
RTOS/micro combination which can do the correct thing when Linux goes off the
rails.

The Linux kernel is not a hard-real-time system, it doesn't have complete test
coverage and its inclusion in highly-safety-critical systems is as a lower-
safety-level subsystem. You will not find a Linux task controlling (for
example) a GPIO enabling an X-Ray beam in human rated medical devices.

BTW: I can promise you that there are _many_ untested goto's in the Linux
Kernel...

~~~
fit2rule
"proper" RTOS.

What part of Linux precludes it from being a proper RT OS in the hands of
industrial developers whose purpose it is to certify it as such?

>BTW: I can promise you that there are many untested goto's in the Linux
Kernel...

Not in a development environment looking for SIL-4 certification, there isn't.
Perhaps in the public sphere you may feel that is an adequate position, but in
industrial computing a significant part of the value is in the testing and
real certification of the Linux kernel for certain ratings.

>The Linux kernel is not a hard-real-time system, it doesn't have complete
test coverage and its inclusion in highly-safety-critical systems is as a
lower-safety-level subsystem

I assure you, I have found none of these statements to be truly true.

~~~
TickleSteve
The linux kernel is tuned for throughput, not latency and certainly not
bounded-latency.

My background is industrial/defense/medical s/w on hard real-time systems.
e.g. I've worked on train braking systems, high-throughput radar systems,
medical devices, lots of hard real-time control stuff....

I personally have never seen any evidence of anything greater than SIL-2
certified Linux ([https://www.osadl.org/SIL2LinuxMP.sil2-linux-
project.0.html](https://www.osadl.org/SIL2LinuxMP.sil2-linux-project.0.html)),
and that is only in a very narrowly defined context.

Like I said earlier, for any higher certification, Linux would be run as a
compartmentalised, lower-level subsystem. The real control would not be via a
Linux-controlled task.

Until you can point me to the test-suite for a Linux kernel & driver set that
gives 100% code coverage, Linux will never get past this point.

Don't get me wrong, I like Linux and it has a valuable place in embedded
systems.... but hard real-time control and safety-related systems are not it.

As a final word on the subject, don't take my word for it:
[http://www.hse.gov.uk/research/rrhtm/rr011.htm](http://www.hse.gov.uk/research/rrhtm/rr011.htm)
"it is not likely to be either suitable or certifiable for SIL 4 applications"

"It is unlikely that Linux would be useful for SIL 4 applications and it would
not be reasonably practicable to provide evidence that it meets a SIL 4
integrity requirement. "

...And in
([https://www.osadl.org/fileadmin/dam/presentations/61508/6150...](https://www.osadl.org/fileadmin/dam/presentations/61508/61508_paper.pdf))
"One of the main conclusion: Linux is not suitable for SIL4."

~~~
fit2rule
>Until you can point me to the test-suite for a Linux kernel& driver set that
gives me 100% code-coverage, then Linux will never get past this point.

Of course, this is proprietary to the whims of the companies deploying SIL-4
Linux: THALES, et al.

[http://ti.tuwien.ac.at/ecs/teaching/courses/hwswcode_vu_WS_2...](http://ti.tuwien.ac.at/ecs/teaching/courses/hwswcode_vu_WS_2011/hwsw-
codesign-student-presentations/gv1-thales)

[http://ercim-news.ercim.eu/en75/special/tas-control-
platform...](http://ercim-news.ercim.eu/en75/special/tas-control-platform-a-
platform-for-safety-critical-railway-applications)

[http://www.ibm.com/developerworks/library/l-real-time-
linux/](http://www.ibm.com/developerworks/library/l-real-time-linux/)

(BTW, HSE isn't exactly an industrial authority on much ..)

~~~
TickleSteve
> Of course, this is proprietary to the whims of the companies deploying SIL-4
> Linux: THALES, et al.

I've worked with Thales companies many times in the past. Never seen any
evidence of SIL-4 Linux. And no, this is absolutely _not_ at the whim of the
companies deploying it.

> [http://ti.tuwien.ac.at/ecs/teaching/courses/hwswcode_vu_WS_2...](http://ti.tuwien.ac.at/ecs/teaching/courses/hwswcode_vu_WS_2..)

No mention of Linux.

> [http://ercim-news.ercim.eu/en75/special/tas-control-
> platform...](http://ercim-news.ercim.eu/en75/special/tas-control-platform..)

1 mention of Linux... no mention of the role it plays.

> [http://www.ibm.com/developerworks/library/l-real-time-
> linux/](http://www.ibm.com/developerworks/library/l-real-time-linux/)

Verifies my point. Only the kernel-in-kernel/hypervisor systems get hard
real-time performance. This is necessary for safety-critical work.

> (BTW, HSE isn't exactly an industrial authority on much ..)

HSE == Health & Safety Executive... they are very much involved in this....
they're the government authority for this type of thing. They license software
for use in nuclear reactors and the like (along with other agencies).

My points absolutely still stand.

~~~
fit2rule
Hint: I was on the team that released TAS PLF 2.x, and it is definitely SIL-4,
definitely hard real-time, and definitely a well-maintained Linux project
which many other divisions (e.g. Space) also come to use as a reference
platform. I suggest you simply missed it: it's running an awful lot of rail
transportation systems around the world, though...

~~~
TickleSteve
Hint: you're mistaken about the role Linux played in it. Go study your
architecture diagrams. The system as a whole is SIL-4; that's not the
argument. Linux _certainly_ isn't, that I can guarantee. The work to provide
full traceability trees and 100% coverage through the Linux kernel is many
man-centuries of work. That has not been done. The SIL-rated side will be a
different kernel/hypervisor.

If what you're talking about involves Red Hat MRG, then that product has a
completely different kernel. It's not standard Linux.

------
jacquesm
In 'C' or 'C++'.

Though some of the rules are portable to other languages.

I'd add: have someone else familiar with the code base look over your code,
add tests for your functions, test exhaustively where feasible, document your
code and document your reasoning behind the code as well, make it look good,
try to avoid being clever.

~~~
AtmaScout
I agree completely with avoid being clever. If the situation requires you
write clever code, make sure it is commented fully.

~~~
mannykannot
Where 'fully commented' includes your arguments for its correctness. A person
who writes 'clever' code that is not backed by an argument for its correctness
is not being clever: the best he can hope for is to be lucky.

------
erikb
I never worked in a safety critical environment, so my ideas are probably
flawed.

My feeling is that limiting loops and banning certain language constructs
can't enhance quality all that much. Checking and rechecking requirements,
having different levels of testing, and, most importantly, having static and
unit-test checks at the source-code abstraction level have a huge impact,
though. Example: even if you use only simple constructs like if and for, you
still run into problems with complexity, because some problems are complex.
The complexity is simply spread out, which in some regards might make it
easier to handle but in other cases makes it much harder to see the
dependencies.

~~~
TickleSteve
You are correct in those points, and implementation rules like these do not
gain you a safety accreditation.

There is a lot of up-front analysis of the system as a whole and of the
algorithms before a line of code is written. Complexity and dynamism are
minimised, and all code paths are checked in the final code. This is one of
the few places where 100% code coverage _is_ needed: you need to account for
_every_ path in the code, all the way down to bare metal.

Also, a common thing to do is to have h/w fail-safe systems (if possible), due
to the difficulty of getting s/w approved.

------
chopin
I am not sure whether I understand rule #2 correctly. My main use case for
loops is iterating over an array or a list of statically unknown size,
therefore the loop is limited to the actual size of the array or list. Would
that count as statically provable? At least in Java, these cannot hold more
than 2^31 elements for the standard library collections. For this use case,
would it be required to have a hard-coded upper bound additionally?

~~~
gte525u
It's closely coupled with rule #3. An array, at least in safety-critical
embedded software, has a fixed size determined at initialization. You may have
a sentinel value you're looking for, but the loop should never exceed the
preallocated size.

There is another set of arguments /against/ stopping the loop at the sentinel
value, because it may make the software less deterministic. But that's another
topic.

~~~
erikb
Less deterministic? Isn't it quite deterministic if something stops when a
static limit is exceeded?

~~~
victorNicollet
This helps ensure that if your software runs in 10ms on the test data set,
then it will always run in 10ms on all data sets. In general, it is
recommended to avoid "best case" optimization, so that your critical software
is always running with "worst case" performance.

EDIT: On re-reading your message and the message it is responding to, I
believe you are meaning opposite things by "sentinel".

------
kazinator
This is misleadingly titled; it should be called:

 _Rules, Mostly Pertaining to C, For Writing Safety-Critical Code (with Rule 0
Removed)_

(Rule 0 is: Don't use C for safety-critical code!)

(And you know there was a Rule [0], obviously!)

------
lectrick
This applies to machine-level C code, not to high-level languages. Functional
high-level languages prevent most of these issues out of the box, for example.

~~~
sacado2
This is far from true.

> 1\. Restrict to simple control flow constructs.

The details are specific: no recursion allowed. Functional languages are out,
there.

> 2\. Give all loops a fixed upper-bound.

Functional languages give no guarantee on that either.

> 3\. Do not use dynamic memory allocation after initialization.

You have no access to memory layout, and functional languages heavily rely on
the on-the-fly creation of data and on garbage collection (which is forbidden,
too).

> 4\. Limit functions to no more than 60 lines of text.
> 5\. Use minimally two assertions per function on average.
> 6\. Declare data objects at the smallest possible level of scope.
> 7\. Check the return value of non-void functions, and check the validity of
> function parameters.

OK, those might be easier with functional languages.

> 8\. Limit the use of the preprocessor to file inclusion and simple macros.

Not relevant.

> 9\. Limit the use of pointers. Use no more than two levels of dereferencing
> per expression.

OK, no pointer manipulation with these languages.

> 10\. Compile with all warnings enabled, and use one or more source code
> analyzers.

Name one professional code analyzer for any functional language of your
choice.
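For contrast, a minimal C sketch of what rules 5 and 7 look like in practice (the functions and values here are hypothetical illustrations, not from the rule set):

```c
#include <assert.h>

/* Rule 5: assert preconditions and invariants inside the function.
 * (Hypothetical computation used purely for illustration.) */
static int scale(int value, int factor)
{
    assert(factor > 0);        /* parameter validity check */
    assert(value <= 1000000);  /* keep the input in a sane range */
    return value * factor;
}

/* Rule 7: actually check the return value of a non-void function
 * instead of silently discarding it. */
static int demo(void)
{
    int r = scale(21, 2);
    if (r != 42)
        return -1;             /* propagate the unexpected result */
    return 0;
}
```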

~~~
lectrick
> Details give the precision : no recursion allowed. Functional languages are
> out, there.

> Func languages give no warranty on that either.

Actually, what you can do in this case is: when you call recursively, pass an
incremented counter as part of the state, and if it exceeds some upper limit,
handle that appropriately. You don't need to "avoid recursion" just to get
control over the number of times something loops.
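Sketched in C for concreteness (the depth budget and function are hypothetical; the technique is just the counter-threading described above):

```c
#include <stdbool.h>

#define MAX_DEPTH 100  /* hypothetical recursion budget */

/* Thread a depth counter through the recursion and bail out when the
 * budget is exceeded, restoring a provable upper bound on call depth. */
static bool countdown(int n, int depth)
{
    if (depth > MAX_DEPTH)
        return false;   /* bound exceeded: report failure upstream */
    if (n <= 0)
        return true;    /* base case reached within the budget */
    return countdown(n - 1, depth + 1);
}
```

Note this bounds the depth but still consumes stack per call, which is one reason the original rules forbid recursion outright rather than merely capping it.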

> and rely on garbage collection (which is forbidden, too)

Fair enough, but if I work in a language that doesn't even give me much
control (if any) over the GC, then how is this relevant to non-C coders,
again? Which was my original point.

> Name one professional code analyzer for any func language of your choice.

Erlang (and by association Elixir) has Dialyzer
[http://www.erlang.org/doc/man/dialyzer.html](http://www.erlang.org/doc/man/dialyzer.html)
which is a static code analyzer. Static code analysis has been endorsed by
John Carmack (and of course, all the 3 year old links to the blog post are now
dead, sigh)

For anyone (like me) who is hooked on Elixir,
[https://github.com/jeremyjh/dialyxir](https://github.com/jeremyjh/dialyxir)
provides a bit of easier Dialyzer management

Elixir (and I suppose Erlang) has this concept of "specs" which you don't have
to use, but which help code analysis if you do... [http://elixir-
lang.org/docs/v1.0/elixir/Kernel.Typespec.html](http://elixir-
lang.org/docs/v1.0/elixir/Kernel.Typespec.html)

~~~
sacado2
> Fair enough, but if I work in a language that doesn't even give me much
> control (if any) over the GC, then how is this relevant to non-C coders,
> again? Which was my original point.

This is relevant to users of assembly, Forth, C++, Fortran, Ada, Rust and
probably a few others. Besides the very last one (which is way too young),
these are the languages embedded developers use most of the time, precisely
because of their virtues regarding the 10 points in the original post.

I have to agree with you, though: regarding this list, functional languages
are probably as good as, or even better than, most other high-level languages.
But they clearly fail most of the requirements.

Thanks for the reference about Dialyzer, I'll check it out.

------
lordnacho
What about using smart pointers? There's a whole class of potential errors
related to using raw pointers.

~~~
TickleSteve
Smart pointers are useful for automatic resource tracking.

In embedded systems, very, very little is dynamically allocated. Everything is
as static as possible: no new/delete, malloc/free, etc. It's all static data.

In general, embedded systems have taken the opposite tack to solving the
memory-allocation issues of malloc/free: we just don't do it. Desktop- and
server-type applications use garbage collection/ARC-type tracking; you won't
find anything like that on small systems, and certainly not in safety-critical
code.
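A minimal sketch of that style in C (pool size, the message type, and the function names are all hypothetical): every byte is reserved at compile time, and "allocation" just hands out a slot from a fixed pool.

```c
#include <stddef.h>

#define POOL_SLOTS 8  /* fixed pool size chosen at design time (assumed) */

typedef struct {
    int  id;
    char payload[32];
} msg_t;

/* All storage is static: no malloc/free anywhere. */
static msg_t         pool[POOL_SLOTS];
static unsigned char in_use[POOL_SLOTS];

/* Hand out a free slot, or NULL when the pool is exhausted. Pool
 * exhaustion is a bounded, testable condition, unlike heap
 * fragmentation or an out-of-memory malloc at an arbitrary moment. */
static msg_t *msg_alloc(void)
{
    for (size_t i = 0; i < POOL_SLOTS; i++) {
        if (!in_use[i]) {
            in_use[i] = 1;
            return &pool[i];
        }
    }
    return NULL;
}

static void msg_free(msg_t *m)
{
    in_use[m - pool] = 0;   /* slot index from pointer arithmetic */
}
```

Worst-case memory use is visible in the linker map, which is exactly the determinism the comment above is describing.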

~~~
lordnacho
Good point, I forgot the context. Very similar to certain trading systems,
where you also simply pre-allocate everything to avoid potential latency
issues, along with gaining fine-grained control.

