Things that Saved Apollo 13 (2010) (universetoday.com)
66 points by bootload on April 10, 2015 | 9 comments


I always assumed the electrical short happened the first time they stirred the oxygen tank, not the fifth time. It makes me nauseous to think about what would have happened if the short hadn't happened until the tenth time the button was pushed, when two of the astronauts were on the Moon. Certain, slow death for all three...


This original (2010) article is being expanded this month to "13 MORE Things That Saved Apollo 13":

http://www.universetoday.com/119747/13-more-things-that-save...

Unfortunately, it's basically only a teaser article at the moment:

    Over the next few weeks, we'll look at 13 additional
    things that helped bring the crew home safely.


The fact that they launched knowing about the pogo problem and its possibly catastrophic implications is brutal. I could almost understand it (not justify it) if they had yet to land on the Moon, given the fervor of the space race, but given that milestone had already been reached, you would think that avoiding a disaster would have been a higher priority.

Presumably (or at least hopefully), like the space shuttle foam, the problem was known, but not the extent or likelihood of its consequences.


Then there's the Challenger incident. The fact that the O-rings were failing was well known; there had been documented damage on flights before that one. They were a mission-critical component, but someone decided that, since they were redundant, damage to one set of O-rings was acceptable. That was actually against regulations.

That they could fail in very cold environments was also known. And yet, the shuttle was ordered to launch anyway.


It's even worse. On top of the erosion issues, the O-rings were certified for a temperature range from x to y degrees. That day the temperature was below the minimum of the certification range, so management reasoned that, since it was outside the test range, there were not enough data points to warrant a scrub.


You know, I don't think that management making such a call is rare or unique to NASA, and I don't think that such a call costing lives is even that rare.

The problem is that the decision wasn't "push a button and kill half a dozen people", it was "launch and maybe have a problem, or don't launch and definitely waste X billion dollars and Y man-centuries of time". And we're terrible at those decisions. We, as individuals and organizations, are terrible at eating the sunk costs and paying a definite price to avoid a maybe-catastrophe. Maybe sometimes it's rational, but a lot of the time it isn't.

And it's not just "management". All of us make these kinds of decisions when we ignore bugs to meet deadlines or push a feature that isn't really working just because we spent so much time on it.


I understand the point you are trying to make, but I disagree that 'all of us' make these kinds of decisions. The catastrophes the vast majority of us will run into are nothing compared to the catastrophes at play here.


The catastrophes involving space vehicles are certainly more impressive, public, and obvious, but they aren't particularly large in terms of loss of life. They're not hard to beat with a building fire or a large freeway accident, let alone a series of small failures or the really impressive failures like structural collapses or industrial-scale explosions. And while it's true that most of us aren't working on things which by themselves have the potential to cause these disasters, many of us are working on things that could spark an accident or cause one in progress to get worse. Any small electrical device could start a building fire. I've seen small USB devices, when left in a laptop in a bag, start smoldering. It was only luck that it was noticed before combusting. Nearly any component of an automobile could cause an accident -- the ignition switch, for instance. I don't remember if the floor mats were exonerated in the "stuck accelerator" cases (did it end up being driver error?). People use Maps applications and GPSs to find hospitals -- if you don't return the closest hospital, or you send people to a wrong address, people could die. Most of us don't work on Therac software that can directly shoot people to death, but it's not hard for your bug to be one of a chain of things that go wrong.

Meanwhile, a lot of us work in or contribute to a culture which promotes risk-taking. Risk-takers get a lot done and only sometimes have problems, and they get promotions, bonuses, congratulations, and trips to tropical locales. Sometimes things go wrong, but a lot of the time you'll be forgiven because you perform so well. If you're overly cautious, things may never go wrong, but you also won't get as much done, you probably won't get as much praise, and you might even be let go for being a low performer. This may not directly result in any deaths, but it contributes to an industry-wide culture where risk-taking is the norm and success is defined as "getting a lot right" rather than "getting very little wrong".


I'd add that these are companies where management is there to organize. When management takes over engineering, it's over.

see also http://www.amazon.it/Car-Guys-vs-Bean-Counters-ebook/dp/B004...

And for something closer to my experience, and a partial counterpoint, see http://www.joelonsoftware.com/articles/DevelopmentAbstractio...

It is most apparent in engineering-focused companies.



