The First Bug on Mars (viva64.com)
49 points by AndreyKarpov on Dec 19, 2016 | 9 comments



From "REALLY REMOTE Debugging: A Conversation with Glenn Reeves" (2010) [0]:

We did see the problem before landing, but could not get it to repeat when we tried to track it down. It was not forgotten nor was it deemed unimportant. Yes, we were concentrating heavily on the entry and landing software. Yes, we considered this problem lower priority. Yes, we would have liked to have everything perfect before landing. However, I didn't see any problem, other than that we ran out of time to get the lower priority issues resolved.

We did have one other thing on our side -- we knew how robust our system was because that is the way we designed it. We knew that if this problem occurred, we would reset. We built in mechanisms to recover the current activity so that there would be no interruptions in the science data (although this wasn't used until later in the landed mission). We built in the ability (and tested it) to go through multiple resets while we were going through the Martian atmosphere. We designed the software to recover from radiation induced errors in the memory or the processor. The spacecraft would have even done a 60-day mission on its own, including deploying the rover, if the radio receiver had broken when we landed. There were a large number of safeguards in the system to ensure robust, continued operation in the event of a failure of this type. These safeguards allowed us to designate problems of this nature as lower priority. We had our priorities right.

[0] http://www.drdobbs.com/architecture-and-design/really-remote...


I don't understand. Flip a flag. That was the bug fix. Instead, they tested around it, considered it a reasonable risk, and launched a rocket to Mars without flipping the flag. What could they have been thinking?


First, note the phrase "could not get it to repeat when we tried to track it down." Second, I'm pretty sure that flag wasn't just labelled SERIOUS_RUNTIME_ISSUE_ACTIVE; there were presumably other good reasons for it to be set the way it was, and other serious problems that flipping it might have caused, as far as they could determine pre-launch.


Nope. It was a semaphore, it had a flag, the flag was set wrong (contrary to policy), and they noticed it and documented it. Whether or not they could reproduce the problem, they had already predicted it. The only reason NOT to fix the bug was institutional inertia, i.e. "we can't change the code because the process says we can't without an elaborate ritual."


Is it just me, or is this 'article' completely unreadable? There are diagrams given with very little explanation, and there seems to be very little to guide the reader through what the problem was and how it was solved.


On a side note, I'm guessing many of us have seen the movie "The Martian", but whether you enjoyed it or not, I would highly recommend the novel. The detail about how he survives is amazing, and much of it is missing from the film.



The title is borderline clickbait considering the search for life on Mars. It's really the "First [Computer] Bug on Mars".


I assumed that if life had been found on Mars, I wouldn't be hearing about it from a little-known blog on Hacker News. It's just a pun, and that doesn't make it clickbait, just good headline-writing.



