
What really happened to the software on the Mars Pathfinder spacecraft? - skeletonjelly
http://www.rapitasystems.com/blog/what-really-happened-to-the-software-on-the-mars-pathfinder-spacecraft?reposting
======
ams6110
_Engineers later confessed that system resets had occurred during pre-flight
tests. They put these down to a hardware glitch_

This doesn't really ring true to me. Sounds like the sort of programmers who
are inclined to blame "cosmic rays" for odd behavior that they don't
understand.

I would think on a mission as expensive as Pathfinder, and knowing that
once the hardware is launched you will never have physical access to it again,
ANY anomalous behavior would have been tracked down to root causes during
testing.

~~~
gizmo686
> Sounds like the sort of programmers who are inclined to blame "cosmic rays"
for odd behavior that they don't understand.

In fairness, they go out of their way to harden their computers against cosmic
rays. I am sure that they decide what amount of cosmic-ray-induced error is
acceptable, and design the system knowing that it will happen. Having said
that, I agree that unexplained problems on the ground should get an
explanation before launch. Even if it was a hardware glitch, they should
either have told the hardware people that there was a glitch, or the level of
error was within the designed-for range and they should not have been
surprised when it happened in space.

~~~
nknighthb
In further fairness, Pathfinder's name was also its job description. It was
built on a relative shoestring and in a way, fulfilled its most important
function the moment Sojourner rolled onto Martian soil. It was basically a
combination prototype/advance scout.

~~~
mturmon
Absolutely correct. More context: Pathfinder was done during the Goldin era of
better, faster, cheaper.

------
taspeotis
Seeing this article reminds me of
[http://www.fastcompany.com/28121/they-write-right-stuff](http://www.fastcompany.com/28121/they-write-right-stuff)

~~~
jon2512chua
Interesting read.

Though correct me if I'm wrong, but I find the article rather biased against
the iterative/agile way of writing software. Don't get me wrong, I believe
there's a time and place for an over-2500-pages-of-specs-for-6000-LoC kinda
process, but the article seems to be saying that quick iterations are of the
stone age/how little children do it and that the way the on-board shuttle
group software team wrote their software is the "perfect" way.

~~~
taspeotis
It's been a while since I've read the article, but I skimmed over it and
didn't really get that vibe from it.

I felt it was more "these guys are extreme outliers in terms of software
quality, but also extreme outliers in terms of process."

------
kotnik
For a (much) more detailed story, go here:
[https://www.cs.duke.edu/~carla/mars.html](https://www.cs.duke.edu/~carla/mars.html)

~~~
mturmon
Thanks for this. It gets into details about how the fix was deployed and says
more about why it was not caught during testing.

------
mdturnerphys
Did anyone else notice the "?reposting" in the URL? It looks like someone else
submitted this 8 hours earlier
([http://news.ycombinator.com/item?id=5991503](http://news.ycombinator.com/item?id=5991503)).

~~~
willvarfar
None of us are trawling /new and upvoting anything, sadly. Great posts
regularly don't get a single click the first time through the HN mill. To be
on the front page an article needs to be 1) clickbait title, 2) good content,
3) lucky, 4) posted when the Americans are awake.

Oh well.

~~~
lucaspiller
The last bit is the main thing, I think. The original has exactly the same
title and the original poster has much higher karma.

------
Sami_Lehtinen
I've been using OCC to avoid this problem with my apps, and I couldn't be
happier. It mostly avoids issues with priority inversion. (Not exactly, if
using preemptive scheduling and long-running high-priority tasks.) It doesn't
matter if something is stuck, looping, or crashed; it still won't bring down
the whole system. It also offers great parallel performance as long as
resources aren't too widely shared. I got sick'n'tired of administering (and
creating) systems in a DevOps role when I was using traditional locks, with
all the problems those caused on top of poor performance.

~~~
raverbashing
What's OCC?

~~~
taspeotis
Took quite a bit of Googling, but it looks like it's optimistic concurrency
control.
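
For anyone else wondering what that looks like in practice, here is a minimal
sketch of the idea, not the parent's actual setup. The `VersionedCell` class
and `update` helper are names made up for illustration. Each writer reads a
value together with a version number, computes a new value without holding
anything, and commits only if the version is still the one it read; otherwise
it retries. The short internal lock only stands in for an atomic
compare-and-swap, which a real system would more likely get from the database
or the hardware.

```python
import threading


class VersionedCell:
    """Hypothetical container: a value plus a version counter."""

    def __init__(self, value):
        self._value = value
        self._version = 0
        # Stand-in for an atomic compare-and-swap; held only for the
        # instant of the read or the commit, never while computing.
        self._guard = threading.Lock()

    def read(self):
        """Return a consistent (value, version) snapshot."""
        with self._guard:
            return self._value, self._version

    def try_commit(self, new_value, expected_version):
        """Commit only if nobody else committed since our read."""
        with self._guard:
            if self._version != expected_version:
                return False  # conflict: our snapshot is stale
            self._value = new_value
            self._version += 1
            return True


def update(cell, fn, max_retries=100):
    """Optimistically apply fn to the cell, retrying on conflict."""
    for _ in range(max_retries):
        value, version = cell.read()
        if cell.try_commit(fn(value), version):
            return True
    return False  # gave up; caller decides what to do
```

A writer that gets stuck or dies between `read` and `try_commit` holds
nothing, so nobody else ends up waiting on it. The cost is wasted work when
conflicts are frequent, which matches the "as long as resources aren't too
widely shared" caveat above.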

------
dschiptsov
There were rumors that older generations of spacecraft flew on Lisp systems.
Nevertheless, you see, they had a C interpreter (REPL) there.))

~~~
dandrews
You're likely thinking of this essay:
[http://www.flownet.com/gat/jpl-lisp.html](http://www.flownet.com/gat/jpl-lisp.html)

~~~
lisper
Or this:
[http://www.youtube.com/watch?v=_gZK0tW8EhQ](http://www.youtube.com/watch?v=_gZK0tW8EhQ)

~~~
dschiptsov
Thank you for this. This is almost a meta-story about how
"politicians"/"managers" are always ruining everything that is good and true,
the meaning of which they are genetically incapable of grasping.

------
comatose_kid
I did some embedded sw on VxWorks-based systems back in the day. Any embedded
engineer we interviewed had to know what priority inversion was.

~~~
joezydeco
Yeah, this has been a common topic in any embedded job interview I've had
since the 90s, and not just on VxWorks jobs. The Pathfinder case study is
pretty well known.
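
For anyone who hasn't met it, priority inversion can be shown with a toy
simulation. The sketch below assumes a made-up tick-based scheduler: the
`Task` class, the `simulate` function, the step names, and the timings are all
hypothetical, and none of this is VxWorks code or the real Pathfinder task
set. A low-priority task holds a mutex that a high-priority task needs, while
a medium-priority task keeps preempting the low one. With `inherit=True` the
mutex holder temporarily borrows the priority of the task blocked on it, which
is the general idea behind priority inheritance.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int   # larger number = more important
    release: int    # tick at which the task becomes ready
    steps: list     # each step costs one tick: "lock", "work" or "unlock"
    pc: int = 0     # index of the next step to run


def simulate(inherit):
    """Run the toy scenario; return {task name: tick at which it finished}."""
    tasks = [
        Task("low", 1, 0, ["lock", "work", "work", "work", "unlock"]),
        Task("high", 3, 1, ["lock", "work", "unlock"]),
        Task("medium", 2, 2, ["work"] * 5),
    ]
    owner = None    # task currently holding the single shared mutex
    finish = {}
    t = 0
    while len(finish) < len(tasks):
        live = [x for x in tasks if x.release <= t and x.pc < len(x.steps)]

        def blocked(x):
            # A task is blocked if it wants the mutex and someone else has it.
            return x.steps[x.pc] == "lock" and owner is not None and owner is not x

        def effective(x):
            # With inheritance, the holder runs at the highest priority
            # among itself and the tasks waiting on the mutex.
            if inherit and x is owner:
                waiting = [w.priority for w in live if blocked(w)]
                return max([x.priority] + waiting)
            return x.priority

        runnable = [x for x in live if not blocked(x)]
        if runnable:
            cur = max(runnable, key=effective)
            step = cur.steps[cur.pc]
            if step == "lock":
                owner = cur
            elif step == "unlock":
                owner = None
            cur.pc += 1
            if cur.pc == len(cur.steps):
                finish[cur.name] = t + 1
        t += 1
    return finish


print("no inheritance:  ", simulate(inherit=False))
print("with inheritance:", simulate(inherit=True))
```

Under these assumed timings the high-priority task finishes at tick 13 without
inheritance, because it sits behind all of the medium task's work, and at tick
8 with it.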

------
damian2000
TL;DR concurrency is hard

