What really happened on Mars? (1997) (microsoft.com)
67 points by noir_lord on June 23, 2015 | 19 comments



“We’ve been looking through the old Pathfinder software. We got duplicate computers up and running for testing. Same computers they used to find a problem that almost killed the original mission. Real interesting story, actually; turns out there was a priority inversion in Sojourner’s thread management and—”

“Focus, Jack,” interrupted Venkat.

Weir, Andy (2014-02-11). The Martian: A Novel (p. 121). Crown/Archetype. Kindle Edition.
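
A toy simulation makes the priority inversion mentioned in that excerpt concrete. This is a minimal sketch under stated assumptions, not Pathfinder's software or the VxWorks scheduler. Three hypothetical tasks (`low`, `medium`, `high`) share one CPU and run in one-tick steps. `low` holds a mutex that `high` needs, and the CPU-bound `medium` task preempts `low`, so `high` ends up waiting on a task of lower priority than itself. The `inheritance` flag turns on priority inheritance, the usual remedy, which lets the mutex owner borrow the priority of the highest-priority task waiting for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    priority: int           # higher number = more urgent
    arrival: int            # tick at which the task becomes ready
    steps: tuple[str, ...]  # "crit" = one tick holding the shared mutex, "cpu" = one tick without it


def simulate(tasks, inheritance):
    """Single-CPU, preemptive, priority-driven toy scheduler.

    Returns {task name: tick at which the task finished}.
    """
    owner = None                           # name of the task holding the mutex
    pc = {task.name: 0 for task in tasks}  # index of each task's next step
    finish = {}
    t = 0
    while len(finish) < len(tasks):
        ready = [x for x in tasks if x.arrival <= t and x.name not in finish]

        def blocked(x):
            # Blocked if the next step needs the mutex and another task holds it.
            return x.steps[pc[x.name]] == "crit" and owner not in (None, x.name)

        def effective(x):
            # With inheritance, the owner runs at the highest priority among its waiters.
            if inheritance and owner == x.name:
                return max([x.priority] + [w.priority for w in ready if blocked(w)])
            return x.priority

        runnable = [x for x in ready if not blocked(x)]
        if runnable:
            cur = max(runnable, key=effective)
            if cur.steps[pc[cur.name]] == "crit":
                owner = cur.name  # acquire (or keep holding) the mutex
            pc[cur.name] += 1
            done = pc[cur.name] == len(cur.steps)
            # Release the mutex as soon as the critical section ends.
            if owner == cur.name and (done or cur.steps[pc[cur.name]] != "crit"):
                owner = None
            if done:
                finish[cur.name] = t + 1
        t += 1
    return finish


TASKS = [
    Task("low", priority=1, arrival=0, steps=("crit", "crit", "crit")),
    Task("high", priority=3, arrival=1, steps=("crit", "cpu")),
    Task("medium", priority=2, arrival=2, steps=("cpu",) * 5),
]

print(simulate(TASKS, inheritance=False))  # {'medium': 7, 'low': 8, 'high': 10}
print(simulate(TASKS, inheritance=True))   # {'low': 3, 'high': 5, 'medium': 10}
```

Without inheritance, `high` finishes last despite having the top priority, because `medium` keeps `low` (and with it the mutex) off the CPU. With inheritance, `low` finishes its critical section first and `high` is done by tick 5.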


I couldn't put that book down. It was probably one of the best bits of hard science fiction I've read recently.


Ahh... The classic "better than we ever imagined" turns out to be a nightmare. This should be required reading for those who test software.


I didn't understand that part. It describes a "high data rate" as the "best case". Does "better than we ever imagined" mean they were overflowing with good, usable data? Because too much data sounds like a problem.

Unrelated, note how difficult it is for people to accept 100% responsibility for a problem. For example:

> Did we (the JPL team) make an error in assuming how the select/pipe mechanism would work ? Yes, probably.

There's no "probably"! A mistake was clearly made.


Programmers and lawyers are notorious for adding conditions to their statements to avoid liability.

"Will this be done Thursday?" If you answer with a confident "yes" you're going to learn the cost of commitment when an unknown bites you in the ass.

It's always about degrees of probability.


This is science; all knowledge is both inductively reached and imperfect. More people should use the word "probably". If you're going to assign blame, which is not always necessary, it is important to do so accurately.


Everything is probability. Why single out some statements as deserving of qualification?

You said all knowledge is inductively reached, why didn't you say "probably"?!


There are (at least) two ways of reading this. The first is probably how you are reading it: the assumption caused an error. In this case, it is probably not a "probably," since an error did indeed occur. A second way to read it is whether it was an error on their part to make the assumption, and that is harder to say since they could have just been following the data they had on hand (a "you couldn't have known" scenario).


I think he's speaking from the perspective of the mission, not from the perspective of the computer.

From a mission perspective, overflowing with data is a good place to be. From a computer perspective, overflowing with data is a problem that needs management.


What I find weirder is how quick people are to assign 100% of the blame when, 99.9% of the time, several people could have detected or prevented the problem.

People work as a team - in fact often teams of teams - yet we assign blame to individuals.


I experienced a layoff as a result of this once. Essentially, the product development cycle at this company did not properly account for implementation, much less for scaling to clients' use cases. QA was flawed as a result, and products shipped buggy. Deploys happened either to release new features or to fix bugs that important people at our clients had noticed. The process ended with me accidentally releasing a broken Android APK to production, and I ended up falling on my sword, with no one else taking responsibility for the resulting quality.

Needless to say, that company bled a lot of high-quality developers as a result, somehow turning a likely success into a likely failure.

Startups should beware of such behavior - it is a great way to cause the company to self-destruct when you are not giving all stakeholders adequate say on deliverables.


Well, the "probably" might refer to whether it was reasonable to make that assumption.


> No, we did not use the vxWorks shell to change the software (although the shell is usable on the spacecraft)

That adds new meaning to the phrase "remote shell". Can you imagine how irritating the latency would be trying to use a shell on Mars?
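
Rough numbers for that latency: light needs minutes to cross the Earth-Mars distance, and seeing a typed character echoed back needs a full round trip. The distances below are commonly cited approximate extremes (about 54.6 million km at the closest possible approach, about 401 million km near conjunction) and are treated here as assumptions. The sketch ignores relay passes, ground-station scheduling and processing time, all of which only add to the delay.

```python
# Speed of light in vacuum, km/s (exact by definition).
C_KM_PER_S = 299_792.458
# Assumed, approximate Earth-Mars distance extremes in km; the real distance
# changes continuously with the two planets' orbital positions.
CLOSEST_KM = 54.6e6
FARTHEST_KM = 401e6


def one_way_minutes(distance_km):
    """Light travel time over distance_km, in minutes."""
    return distance_km / C_KM_PER_S / 60


def keystroke_echo_minutes(distance_km):
    """A remote shell echo needs the keystroke to go out and the echo to come back."""
    return 2 * one_way_minutes(distance_km)


for label, d in [("closest", CLOSEST_KM), ("farthest", FARTHEST_KM)]:
    print(f"{label}: one-way {one_way_minutes(d):.1f} min, "
          f"echo {keystroke_echo_minutes(d):.1f} min")
# closest: one-way 3.0 min, echo 6.1 min
# farthest: one-way 22.3 min, echo 44.6 min
```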


>Can you imagine how irritating the latency would be trying to use a shell on Mars?

I can; I have Comcast. But on a serious note, I think for applications like this a Puppet/Chef-style recipe would be used: a complex series of checks and bounds before and after the changes have been applied, and if anything goes even slightly awry, revert to the previous working configuration.
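
The apply/check/revert idea in that comment can be sketched in a few lines. This is a hypothetical in-memory model, not how Puppet, Chef or any flight software actually works: a configuration is a plain dict, and `apply_with_rollback` plus the example keys (`telemetry_rate_bps`, `safe_mode`) are invented for illustration. A change is kept only if every pre-check and post-check passes.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, object]
Check = Callable[[Config], bool]


def apply_with_rollback(current: Config, change: Config,
                        pre_checks: List[Check],
                        post_checks: List[Check]) -> Tuple[Config, bool]:
    """Return (config to run with, whether the change was kept)."""
    # Refuse to touch a system that is already outside its expected state.
    if not all(check(current) for check in pre_checks):
        return dict(current), False
    # Apply the change to a copy so the known-good config is never mutated.
    candidate = {**current, **change}
    # Any failed post-check means "revert to the previous working condition".
    if not all(check(candidate) for check in post_checks):
        return dict(current), False
    return candidate, True


# Hypothetical settings and checks, for illustration only.
known_good = {"telemetry_rate_bps": 1_000, "safe_mode": False}
pre = [lambda cfg: not cfg["safe_mode"]]
post = [lambda cfg: 0 < cfg["telemetry_rate_bps"] <= 10_000]

cfg, kept = apply_with_rollback(known_good, {"telemetry_rate_bps": 50_000}, pre, post)
print(kept, cfg)  # False {'telemetry_rate_bps': 1000, 'safe_mode': False}
cfg, kept = apply_with_rollback(known_good, {"telemetry_rate_bps": 2_000}, pre, post)
print(kept, cfg)  # True {'telemetry_rate_bps': 2000, 'safe_mode': False}
```

Working on a copy keeps the revert trivial: the last known-good configuration is simply returned unchanged.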


Remote Agent on board Deep Space 1 was debugged that way:

http://flownet.com/gat/jpl-lisp.html


RS6000 processor! Wow... I remember doing work on one of these back when I was in college. It turned out I became the only person in my state able to maintain the old RS6000 machines.

...haven't touched or heard of one in over a decade.


Is an embedded real-time OS like VxWorks still used in this role on this kind of mission today, or is a general-purpose OS like Linux used instead?


Pretty sure VxWorks is still used today on a lot of spacecraft; it looks like Curiosity[^1] and, at least at one point, the SpaceX Dragon capsule[^2] use it.

^1: http://www.windriver.com/announces/curiosity/
^2: http://www.spacex.com/sites/spacex/files/pdf/DragonLabFactSh...


Yes, lots of spacecraft use traditional RTOSes like VxWorks, RTEMS, ThreadX, uCos, etc. Keep in mind that there are many computers on a large spacecraft - attitude control, command and data handling, instruments, and other microcontrollers.



