
Erlang programmer’s view on Curiosity Rover software - deno
http://jlouisramblings.blogspot.com/2012/08/getting-25-megalines-of-code-to-behave.html
======
pron
I absolutely love Erlang and think that, along with Clojure, it provides a
complete ideology for developing modern software.

But the article implies (and more than once) that the rover's architecture
borrows from Erlang, while the opposite is true. Erlang adopted common best
practices from fault-tolerant, mission-critical software, and packaged them in
a language and runtime that make deviating from those principles difficult.

The rover's software shows Erlang's roots, not its legacy.

~~~
nirvana
How is that possible since Erlang dates from the early 1980s, and the Rover's
OS is from the 1990s?

~~~
makmanalp
Almost all spacecraft software is written in a similar fashion, not just
Curiosity's.

------
Tloewald
Back in the 90s there was a software engineering fad (unfair term, but it was
faddish at the time) called the process maturity index. JPL was one of two
software development sites that qualified for the highest rank (5), which
involves continuous improvement, measuring everything, and going from rigorous
spec to code via mathematical proof.

This process (which Ed Yourdon neatly eviscerated when applied to business
software) produces software that is as reliable as the specification and
underlying hardware.

~~~
vonmoltke
It may be a fad for the industry at large, but it's a requirement in US
government contracting (as CMMI). It goes beyond software, too. My former
employer just got their systems engineering up to CMMI Level 5[1] and was
working hard on getting electrical engineering there (they are only at 3).

[1] Software had been there for a few years.

------
1gor

       Any _robust_ C program contains an ad-hoc, 
       informally-specified, bug-ridden, slow 
       implementation of half of Erlang...
    

<http://c2.com/cgi/wiki?GreenspunsTenthRuleOfProgramming>

~~~
gruseom
But in this case that's almost completely wrong. For example, "bug-ridden"?
Outfits like NASA use classical techniques (code inspection etc.) to ensure
that their software has exceedingly low error rates. This has been well
studied. Such an approach works, it's just too expensive for most commercial
projects. As for "slow", how likely is that?

On another note, it's pretty cool that the first three names credited in the
JPL coding standard document (which is linked to at the bottom of the OP and
is surprisingly well written) are Brian Kernighan, Dennis Ritchie, and Doug
McIlroy.

~~~
GuiA
>But in this case that's almost completely wrong.

On the second line of linked article:

"This is a _humorous_ observation"

(emphasis mine :) )

~~~
gruseom
Humor doesn't make it applicable. Greenspun's line had a specific meaning, and
it doesn't apply here.

------
donpdonp
"Recursion is shunned upon for instance,...message passing is the preferred
way of communicating between subsystems....isolation is part of the coding
guidelines... The Erlang programmer nods at the practices."

Best "Smug Programmer" line ever.

------
rubyrescue
Great article. The only thing he left out is the parallel to Erlang Supervisor
Trees, which give the ability to restart parts of the system that have failed
in some way without affecting the overall system.

------
matthavener
The biggest difference from Erlang is VxWorks' inability to isolate task
faults or runaway high-priority tasks. (Tasks are analogous to processes in
Erlang.) VxWorks 6.0 supports isolation to some degree, but it was released in
'04, after the design work on the rover started. Without total isolation, a
lot of the supervisor benefits go away.

~~~
vbtemp
Hm.. What do you mean by isolating task faults? I think a lot of that depends
on the underlying hardware, right (e.g., if the board has an MMU)? I know you
can insert a taskSwitchHook (I think it's called) that can detect and kill
runaway high-priority processes.

Edit: in response to the reply, I suppose I should have said tasks instead of
"processes" (which in VxWorks would be the RTP)

~~~
noselasd
VxWorks has no processes[1]. It has tasks. Basically, you write kernel code,
there's no user mode.

[1] As someone mentioned, VxWorks 6 did introduce processes and "usermode",
called RTP. As with most features of VxWorks, you compile that into your image
if you want the feature. But there's a lot of inertia, and much of the VxWorks
stuff I see doesn't use RTP yet.

------
vbtemp
The motivation for writing the software in C is this: code reuse. NASA and
its associated labs have produced some rock-solid software in C. Space
missions commonly use the RAD750 (along with its non-hardened version, the
MCP) and the Leon family of processors. Test beds and other ground hardware
are often little-endian Intel processors. VxWorks is commonly used on many
missions and ground systems, but so are QNX, Linux, RTEMS, etc... The only
common thing the diverse set of hardware, operating systems, and compiler
toolchains all support is ANSI C. This means that nifty languages like Erlang
or whatever - though there may be a solid case for using them - are not
practical in this circumstance.

I know some clever folks in the business have done interesting work on ML-to-C
compilers, but it's still in the early R&D phase at this point - the compiler
itself would have to be thoroughly vetted.

~~~
jensnockert
I didn't read it as arguing against C, just noting that there seems to be a
lot of commonality between the way the code in the Mars rovers is designed
and the way that robust Erlang applications are typically designed.

~~~
jlouis
Precisely. One thing that weighs heavily against using Erlang for this
problem is that you need hard real-time behaviour, and Erlang does not
provide that. The other point, that you need static allocation almost
everywhere, also counts against using Erlang for the rovers.

That leaves you with very few languages you can use, and C is a good
predictable one for the problem. Its tool support is also quite good with
static verification etc. And it is a good target for compilation. As someone
else notes, most of those 2.5 Megalines are auto-generated.

------
pgeorgi
"We know that most of the code is written in C and that it comprises 2.5
Megalines of code, roughly[1]. One may wonder why it is possible to write such
a complex system and have it work. This is the Erlang programmers view."

Contrast this with <https://www.ohloh.net/p/erlang>: "In a Nutshell, Erlang
has had 7,332 commits made by 162 contributors representing 2,346,438 lines of
code"

I'm not sure if those roughly 154kloc really make a difference...

~~~
deno
The 2.5 MLOC of NASA code is mostly generated.

I’m not really sure what point you’re trying to make…?

~~~
pgeorgi
From an erlang programmer's point of view, 2.5MLOC are a complex system.

On the other hand, every Hello World in Erlang drags in about 2.5MLOC of
liability (even if much of that is never run). And I doubt it's all
autogenerated.

So if anything, 2.5MLOC of generated NASA code is probably less complex than
the erlang runtime.

~~~
deno
The article is not about the MLOC. And anyway, that 2.5 MLOC of Erlang
includes all kinds of libraries, like Mnesia, an entire database application.

------
jeremiep
Great article! I'd like to add that the D programming language also offers a
lot of features to create robust code with multiple paradigms, although the
syntax is heavily C oriented rather than functional.

'immutable' and 'shared' join the familiar C 'const' qualifier: 'immutable'
marks data that will never change (as opposed to merely not changing in the
declaring scope), and 'shared' marks data shared across threads. For
everything else, message passing via the std.concurrency module is
encouraged.

Pure functional code can be enforced by the compiler using the 'pure'
qualifier. There is even compile-time function evaluation when a function is
called with constant arguments, which is awesome when combined with its
type-safe generics and metaprogramming.

There are unit tests, contracts, invariants and documentation support right
in the language. Plus the compiler does profiling and code coverage.

I'd be curious to test D against Erlang for such a system. (Not saying Erlang
shouldn't be used, it's the next language on my to-learn list, just that the
switch to functional might be too radical for most developers used to
imperative and OO and D provides the best of both worlds.)

~~~
misnome
I've been interested in D for a while, for these reasons and more - its
features look nice, but it never seems to have gotten much
popularity/mindshare. Could you hazard a guess why?

~~~
pjmlp
In the early days there were some issues in the community which led to the
Phobos/Tango divide in the standard library for D1.

This is now past history, as the community has rallied around D2, known
simply as D, and strives to reach compliance with the book "The D Programming
Language" written by Andrei Alexandrescu.

D2 development is done in the open, with source code available on GitHub.

Besides the reference compiler, dmd, owned by Digital Mars, there are also
the LDC and GDC compilers. Currently it seems that GDC might be integrated
into GCC as of the 4.8 release.

Right now it seems more people are picking up D, mainly for game development
projects.

~~~
CJefferson
Indeed, the D1 splits, and also questions being answered with "that will be
in D2". There is now a problem that there is not yet a consensus about how D2
compares to C++11.

------
sausagefeet
Does anyone have any knowledge of why Ada isn't used over C? Specifically, it
seems like Ada gives you a lot better tools when it comes to numerical
overflows/underflows.

Also, what compiler does NASA use? Something like CompCert? What kind of
compiler flags? Do they run it through an optimizer at all?

~~~
vbtemp
See my post below - to reuse code cross platform. There's a diverse set of
compiler toolchains, operating systems, architectures. Only ANSI C is
supported by all of them. The compilers are specific to the target OS and
hardware, and flags are unsurprisingly the strictest possible for C89.

~~~
ibotty
I'd think that you can run Ada-generated code pretty much everywhere, even on
obscure hardware that works in space.

------
davidw
Great article and comparison, and a nice way of highlighting one of Erlang's
strengths.

However: I'm dubious that it's a strength many people here need. No, the
article did not say anything about that, but I am. A few minutes of downtime,
now and then, for a web site that's small and iterating rapidly to find a good
market fit, is not the biggest problem. And while Erlang isn't _bad_ at that,
I don't think it's as fast to code in as something like Rails, which has all
kinds of stuff ready to go out of the box.

That said, I'd still recommend learning the language, just because it's so
cool how it works under the hood, and because sooner or later, something will
take its best ideas and get popular, so having an idea how that kind of thing
works will still be beneficial.

~~~
timClicks
As you mentioned Rails, I thought I should mention Chicago Boss. It's a
blindingly fast Rails-inspired framework that takes many of the Erlangisms
out of coding in Erlang: <http://chicagoboss.org/>

------
DanielBMarkham
Message-passing better than STM? Wonder why?

~~~
matthavener
I think two reasons: 1) VxWorks directly supports message passing
(<http://www.vxdev.com/docs/vx55man/vxworks/ref/msgQLib.html>). 2) They seem
to prefer simple, obvious, "less magic" interfaces. STM is nice for its
"magic", but message passing creates very well defined, documented interfaces
between code.

~~~
deno
> STM is nice for its "magic", but message passing creates very well defined,
> documented interfaces between code.

The author’s previous post has a good overview of how message passing
contributes to that as well.

<http://jlouisramblings.blogspot.com/2012/06/protocols-in-kingdom-of-distribution.html>

------
thepumpkin1979
deno, I was wondering: if it's so similar to Erlang, why not use Erlang
instead of C? What is the major drawback, footprint?

~~~
malkia
The OTP virtual machine takes a lot of memory. It's an interpreter, which
means much slower execution. And as the article and others above pointed out,
the Erlang VM is soft-realtime: it can't guarantee that something will finish
within a certain number of micro- or milliseconds, or if it can, the bound is
too loose for what they need (just guessing here).

But the concepts are very similar - message passing being the way to
communicate between modules, rather than shared memory.

This brings up another topic - the Linux vs Minix debate :) - I guess there
are right things to be done at the right time, for the right target. It's
just that getting all these things right is the hardest part.

~~~
deno
> It's an interpretter, which means much slower execution

Only BEAM is an interpreter; there are HiPE and ErlLLVM backends as well. You
can also write NIFs — functions in C that are executed within the VM.

------
ricardobeat
So a Mars Rover is much closer to a browser/backbone/node.js app than I could
ever imagine. The basic structure is surprisingly similar to javascript apps
these days: isolated modules, message passing/event loop, fault tolerance.

~~~
jlouis
Node.js is cooperatively multitasked. VxWorks (and Erlang) are preemptively
multitasked. So the basic structure is quite different. If one of your
node.js event handlers loops forever, it is game over, be it web server or
rover. Not so here.

~~~
jeremiep
Node.js is preemptive: there's only one thread running JavaScript, but
multiple C++ worker threads doing work on behalf of the script.

vibe.d on the other hand is cooperative since it uses coroutines for
concurrency.

~~~
malkia
Please explain what you mean by that - "pre-emptive"?

My understanding (coming from C and OS terms) is that pre-emptive means taken
over: e.g., if I have a real OS thread, it is temporarily "paused" and the
resources (CPU/mem/io) are given to something else. At some point control is
restored.

But this happens without the knowledge or instructions of the thread itself.
So things like priority inversion are hard to battle with pre-emptive
multitasking - for example, thread A with low priority holding a mutex while
thread B with higher priority waits for it. (And there's no need for mutexes
if only message passing is used.)

~~~
jeremiep
The node.js worker threads are native threads, which are preemptive on all
current platforms. The JavaScript context is running an event loop which most
likely must perform locking on its message queue and callbacks to async
operations are queued for execution on a future tick of this event loop. All
of this seems very preemptive to me.

What seems like cooperation in node.js is really just async operations
queuing up on the event loop. Since requests are also async events, they get
interleaved with callbacks from existing requests.

To me, cooperation is when you yield the thread to another coroutine. This
saves the state of the call stack, the registers, everything; meaning you
don't force your user to keep that state in closures. The user code in a
cooperative environment feels sequential and blocking and results are passed
by return values, not by calling continuations.

It's also friendlier to exceptions, since it doesn't lose the entire call
stack; with node.js you only get the stack since the beginning of the current
event loop tick.

~~~
deno
> The JavaScript context is running an event loop which most likely must
> perform locking on its message queue and callbacks to async operations are
> queued for execution on a future tick of this event loop. All of this seems
> very preemptive to me.

There’s a single loop which blocks until the task _yields_ while waiting on
the result from one of the worker threads. That’s cooperation. All queued
connections are starved until that happens. In Erlang, or just with pthreads,
the connections are processed independently. Think separate event loops for
each connection.

> To me, cooperation is when you yield the thread to another coroutine. This
> saves the state of the call stack, the registers, everything; meaning you
> don't force your user to keep that state in closures. The user code in a
> cooperative environment feels sequential and blocking and results are passed
> by return values, not by calling continuations.

That has nothing to do with how execution is scheduled. Cooperative
scheduling requires passing continuations[1], so the execution can be resumed
while it waits. The simplest implementation is to use callbacks, the way
node.js does it. Futures and deferreds[2] are a little more sophisticated
(Python's Twisted has them, and something similar probably exists for node.js
as well), as they allow for better composition. And of course you can hide
the continuations entirely, as is done in both Scala (via a compiler plugin)
and Python (gevent, or generators), rewriting the direct control flow by
breaking it at yield points automatically (this is how exception throws work
in most languages, btw), but the limitations inherent in having a single
event loop per thread will still exist.

[1] <https://en.wikipedia.org/wiki/Continuation-passing_style>

[2] <https://en.wikipedia.org/wiki/Futures_and_promises>

~~~
ricardobeat
> a single loop which blocks until the task yields while waiting on the result
> from one of the worker threads

Yes, node.js is cooperative, yet since all I/O is asynchronous, the time
spent blocking is mostly dispatching and simple operations; it doesn't block
while waiting. That's where its performance and high concurrency come from.
Doing CPU-heavy work in the server/main process is a no-no.

~~~
deno
Obviously that approach is fine for many things. Before node.js, people wrote
those kinds of servers in Twisted or Netty, with great results. A Netty-based
framework powers, for example, much of Twitter. I was just explaining how the
scheduling works :)

