

Ask HN: How to develop software that could be run on a spaceship? - kluck

How do you create software so reliable and stable that it uses very limited processing power and recovers from (almost) all failures? So I thought: the one true challenge for software is when it has to run on a spaceship!

What tools and development strategies could one use today to achieve such a goal?

(Note: This is not about spaceship software in particular, but about how "lessons learned" from spaceship software development could be valuable for non-spaceship software.)
======
seren
Food for thought: [http://spinroot.com/p10/](http://spinroot.com/p10/) (2006)

From people who actually write code for spacecraft and exploration robots.

However, I am not sure how applicable this kind of advice is to other
domains. I work on critical medical devices, so we already follow most of
these recommendations, but if you are designing a website serving millions of
users, the advice would look totally different. It would be silly to write
everything from scratch in C without any kind of dynamic memory allocation.

I am not convinced the former is necessarily _better_ or more challenging
than the latter; it is just a different context.

~~~
kluck
The software quality criteria this is mainly about are reliability and
efficiency
([https://en.wikipedia.org/wiki/Software_quality#Measurement](https://en.wikipedia.org/wiki/Software_quality#Measurement)).
I think these are important for all software domains.

------
brudgers
Margaret Hamilton [0] coined the term "software engineering" to describe the
standard to which software for Apollo had to be designed, written, and tested.
It was a high standard because her team was embedded in a larger culture of
engineering excellence, where shortcuts and being in love with one's own ideas
meant people might die needlessly.

Adopting that attitude may be a place to start.

Good luck.

[0]:
[https://en.wikipedia.org/wiki/Margaret_Hamilton_(scientist)](https://en.wikipedia.org/wiki/Margaret_Hamilton_\(scientist\))

------
mtimjones
Around 26 years ago, I developed firmware for geosynchronous satellites in a
"safe" subset of Ada87. I still think it's a wonderful language and wish there
were more opportunities to develop with it. Since then, I've worked solely in
C and assembly.

------
sqrt_minus_1
Money. Time helps.

Creating reliable software-intensive systems is something we know how to do;
we have repeatedly done it in the past, and we are doing it in the present.
The only problem is that it isn't cheap in terms of money or time, so to be
cost effective, the costs associated with failure or malfunction must be high.

Safety-critical systems (where system failures and malfunctions can endanger
human health) are one example. They're usually developed under some sort of
regulatory oversight (e.g., FAA, FDA). There is a set of minimal government
standards that must be met, much of it related to documentation, and much of
the documentation relating to risk analysis and quality management. There are
also standards administered by industry groups
---often, these are "voluntary" in name but for liability reasons practically
required (in my experience). There's going to be some sort of (mandated)
monitoring of the product after release (which feeds into the quality system)
to correct design flaws, defects, etc. You have regulatory affairs people on
staff to handle this area. The safety critical system itself is developed
under some sort of formal product lifecycle management system. There will be a
formal set of requirements. Early in the process, there will be a risk
analysis (continually updated) of the design. There will be an a priori
development plan, timeline, and cost estimates. As for development models, the
waterfall model is alive and well, in part because it fits well with the
demanding documentation requirements, though waterfall does seem to be on the
decline in favor of more agile models (but not too agile!).

The software itself may be assembler if you're really resource constrained,
but more likely, it will be C or C++ with a restrictive coding standard, or
real-time Java, or possibly Ada. If you're fancy (and progressive), there
might be a system model with automatic code generation. The tooling you adopt
(e.g., from Atego) will assist with verification and validation, as well as
requirements traceability. Most people, I think, find software work under
these conditions to be boring and frustrating---it's certainly limiting.
You're maybe generating 100 SLOC a day? The system is almost certainly running
on some sort of real-time OS like vxWorks or QNX, and the hardware, software,
and any associated libraries are (as much as possible) validated for use in
your application. You can, given the right circumstances, use COTS software
(a.k.a. SOUP, software of unknown pedigree), but then the validation becomes
your responsibility.

The HN crowd hates it, but in my experience on these sorts of projects,
degrees and certifications are required and expected except at perhaps the
very lowest level of employees. Electrical and mechanical designs are probably
stamped by a real engineer; depending on the nature of the software portion
(and where it's being developed) the software design might get stamped too,
but more likely, it's being supervised by someone with a bunch of industry-
relevant certifications.

I've glossed over a ton, and skipped even more.

In my opinion: I think the most important lesson to learn from safety-
critical projects (and good engineering practice in general) is that tools
are not the major driver of quality: proper engineering processes and quality
management are. A "right-sized" product lifecycle management plan, including
a "right-sized" quality system, is critical and necessary to predictably and
reliably produce a quality product. All else being equal, I'd predict that a
C application developed under a lightweight PLM and QMS will be superior to
one developed under Agile and Rust, or Scrum and Haskell, or whatever the fad
du jour might be.

~~~
seren
I agree with 90% of your post and detailed description.

However, on the conclusion, I am fairly convinced that process is necessary
but not sufficient to produce a reliable system. I would say the most
important parameter is "People".

If you have a team where people trust each other, where you can report odd
issues without fear of reprisal, and where people really want to deliver
something safe, it will work; otherwise, with a dysfunctional team using the
exact same process, you'll get a terrible mess.

~~~
sqrt_minus_1
I did neglect people, and I was wrong to do so; I think what you say is
absolutely correct.

