Engineers at Thiokol also were increasingly concerned about the problem. On July 22, 1985, Roger Boisjoly of the structures section wrote a memorandum predicting NASA might give the motor contract to a competitor or there might be a flight failure if Thiokol did not come up with a timely solution.
Nine days later (July 31) Boisjoly wrote another memorandum titled "O-ring Erosion/Potential Failure Criticality" to R. K. Lund, Thiokol's Vice President of Engineering:
"The mistakenly accepted position on the joint problem was to fly without fear of failure and to run a series of design evaluations which would ultimately lead to a solution or at least a significant reduction of the erosion problem. This position is now changed as a result of the [51-B] nozzle joint erosion which eroded a secondary O-ring with the primary O-ring never sealing. If the same scenario should occur in a field joint (and it could), then it is a jump ball whether as to the success or failure of the joint because the secondary O-ring cannot respond to the clevis opening rate and may not be capable of pressurization. The result would be a catastrophe of the highest order-loss of human life."
Boisjoly recommended setting up a team to solve the O-ring problem, and concluded by stating:
"It is my honest and very real fear that if we do not take immediate action to dedicate a team to solve the problem, with the field joint having the number one priority, then we stand in jeopardy of losing a flight along with all the launch pad facilities."
> Boisjoly recommended setting up a team to solve the O-ring problem
And there was a team working on it. IIRC, a redesign of the joint was being worked on. But no one suggested, before the evening before the launch, that launches should be stopped until the redesign was completed.
The fate of the Challenger engineers is depressing because they were essentially kicked out of their profession for knowing the right and moral answer. Thankfully, very few of us will be faced with these honestly life-or-death technical decisions.
I think there are parallels to right-or-wrong issues that we face in our own industry on a regular basis, for example:
Are phrases like "standard industry practice" and "covered by the click-through license" today's weasel words that we use to rationalize implementing immoral management demands?
> I think there are parallels to right-or-wrong issues that we face in our own industry on a regular basis, for example:
If our software deals with privacy and personal information then it can even become life or death in some countries. A dissident in a brutal authoritarian regime could lose his life if the software has a bug in it, for example.
The other area where moral choices take place is in licensing and patents. It can be on a large scale (like we see with big companies using busybox for their platforms) or a smaller one (copying code from one project to another without even giving credit).
Yeah that is a good point. One can argue that technology is a tool and it can be used for evil. But with this stuff:
"without IBM's machinery, continuing UPKEEP and SERVICE, as well as the supply of punch cards, whether located ON-SITE or off-site, Hitler's camps could have never managed the numbers they did" [emphasis mine]
The idea is basically:
So these were not pocket calculators that they bought and then deployed in concentration camps, where IBM could claim complete ignorance of their use. These machines needed constant, professional service. Having them on-site meant that IBM probably sent technicians there. One has to wonder what those technicians' response to the 24/7 billowing crematoria and emaciated bodies was.
However, on the other side (and I have not read the book, just looked at the topic many years ago superficially) I am not aware of any IBM engineers' stories about an actual site visit... So I don't know what to think and how guilty IBM really is...
I don't know; I don't think that IBM per se was really in a position to know how the census machines they had given to Germany were actually being used. Remember that this was before the days of satellite communication, and as part of their war strategy the British cut Germany's telegraph cables (just like they had in WWI) and intercepted packets of mail between the US and Germany whenever they could, and generally did a pretty good job of it[1]. I'm happy to believe that people who were nominally IBM employees were servicing the machines, but if IBM headquarters wasn't being told what was happening, and if the employees faced getting shot if they didn't do what they were told, I'm not sure how much you can say that IBM knew in any meaningful sense.
[1] This was a point of contention between the UK and US before the fall of France.
However, I was struck by how wrong he got it. He was making a point about how just having the data available isn't enough - the modern challenge is to make sense of big data in a meaningful way through visualization. And to back this point up, he made it sound like a bunch of engineers didn't look closely at the data sheet for the rubber used in their O-rings and oops! The Challenger blew up. Shrug. Who knew?
I'll tell you who knew. Roger. "The engineers" knew perfectly well what was going to happen. It was management who refused to listen to them and launched anyway.
Feynman's account of his involvement in investigating this disaster in "The Pleasure of Finding Things Out" was excellent. In one exercise, he had the engineers and management write down what they honestly thought the failure rate of the shuttle was. Management quoted 1 in 100,000, while engineering quoted 1 in 100. The Challenger disaster was a symptom of a complete breakdown in communication between engineering and management at NASA.
One of my college mechanical engineering professors was a former NASA engineer involved in determining risk and investigating problems. He gave a lecture & led a discussion on the Challenger O-Ring case (and I have since read a lot of the reports). As you mentioned, the CMU lecturer wasn't even close to the real story. But it also isn't so simple as to say 'management didn't listen'.
The engineers knew what was going to happen with a fairly high degree of certainty - enough that the launch shouldn't have happened. Then they turned to management and tried to present their findings. Clearly the business guys didn't want to be the ones to hold up the launch; they had a strong incentive not to. So the engineers had to convince them why the company should basically stick its neck out, as one of hundreds and hundreds of vendors, and stop this launch. The engineers didn't do the best job of communicating the certainty of the problem, the magnitude of the problem, etc. So it was a little bit of both.
That lecture was probably 8 years ago for me, but it definitely left a lasting impression. Today I work on systems that truthfully are more complicated than the space shuttle and also have a lot more lives at stake. "Intellectual honesty" is a phrase I live by. There have been times I fucked up some math and knew we needed to fix something - even though it would cost money and delay - but I stuck to my guns and made sure it happened. The alternative is too likely to lead to disaster. It keeps me up at night, and those are the near misses. I can't imagine what that guy went through knowing they almost stopped it, but didn't.
Interesting perspective, thanks for sharing. Can you say what you work on, or at least give us an idea? Systems more complicated than the space shuttle - that sounds really interesting and I'm not sure which ones those are.
I'd rather not share on HN because of how often it gets crawled by Google. You can generally find it through my LinkedIn which is easy enough to find from my HN handle.
I just read "The Challenger Launch Decision" (after someone here recommended it) and it has a more nuanced view of what went wrong. While there was a long history of concern about the O-rings, there was also a history of getting conflicting data, of implementing "fixes" that appeared to work, only to come back later. Even the engineers themselves, while expressing concern like the memo mentioned in the article, they did not, until the launch decision conference the night before, voice anything as strong as "stop the launch".
While reading the book, I was struck by two things: one was the "slippery slope" introduced by first seeing something and arguing it was not a launch risk. With the decision-making system in place, that meant it was very difficult, when another datum came in, to say that the two data together implied something was wrong, because they had already argued that the first datum was not a risk. This is of course a well-known phenomenon, but it didn't seem like the system was well equipped to deal with sparse and sometimes contradictory data.
The second was the feeling that what was happening was that the engineers were observing a random process and post-rationalizing cause and effect. (I had just read "Fooled by Randomness", which may have contributed to this.) Every piece of information was made to fit into some model, but it seemed like no one was considering that maybe what they were seeing was just inherent randomness. With that view, the progressively bad outcomes that happened before indicated that what was observed was a poorly-characterized random process with a fat tail. At some point, one rationalization was that they had a "safety factor of 4 left", but if you have indications of a fat-tailed process, that's not much to bank on.
It's detailed, but well-written and the quality of the analysis is extraordinary. If you want a preview, take a glance at the debris analysis on page 40. If you like understated drama, start at the comm transcript on page 42. ("Lock the doors.")
Yeah, I remember reading the CAIB report when it came out. Scott Hubbard, one of the board members, also came by UCSC and gave a talk about the investigation, like how everyone had assumed that a piece of foam couldn't do anything serious to the RCC leading edge until they shot a foam piece of similar size and speed at a spare with an air gun and it went clean through.
His argument (which the linked article takes to task) is that the engineers knew but were unable to convince management because they couldn't put together a compelling case for it. While Tufte's argument might have some merit, there was probably a lot more going on than just that. Perhaps even with a very compelling case presented, the launch would have gone through.
I think the point with that was that those who needed to know didn't know because they hadn't been presented the data in a meaningful way.
So you are correct that he knew, but you are wrong in claiming the lecturer got it wrong. It is exactly because those who make the decisions don't have the necessary know-how that visualizing the data in a meaningful way is so much more important.
The decision makers have a very, very strong incentive to keep things going. And the PR circus surrounding the teacher in space made it worse in the Challenger case. Meanwhile, everyone else has a strong incentive to shut the hell up.
Besides, saying the engineers didn't "try hard enough" is essentially claiming the engineers were in charge. They were not. Managers are paid to manage their department, not just please the higher ups. When a claim is made that failure to address a problem will result in catastrophe and loss of life... really what more needs to be said, or drawn, to make the point? If you're not going to trust your own experts, why have them at all?
And how does the Challenger explosion in any way suggest that there aren't enough engineers? Certainly this one major issue had engineers urging for delay, so there doesn't seem to have been a lack of them.
You know that all of the shuttle's risks are well documented, right? It's a mountain of issues that basically gets the "OK to be ignored for now" stamp at every launch.
After reading Feynman's account, I believe that no amount of visualization or compelling case-making would have made any difference.
1 in 100,000 means launching the shuttle 3 times a day for nearly 100 years and experiencing only one failure. If the logical side of their brain is that far out to lunch, why would you think that the problem was simply an insufficient appeal to logic?
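(A quick back-of-the-envelope check of that arithmetic - a minimal Python sketch, where the 3-launches-a-day cadence is purely the hypothetical from the comment above, not a real schedule:)

    # Sanity check: at claimed odds of 1 failure per 100,000 launches,
    # how many years does a hypothetical 3-launches-per-day cadence imply?
    launches_per_failure = 100_000      # management's quoted odds
    launches_per_day = 3                # hypothetical cadence from the comment above
    years = launches_per_failure / (launches_per_day * 365)
    print(f"{years:.0f} years")         # ~91 years, i.e. "nearly 100 years" per failure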
Visualization is not as such about logic; it's about interpretation and communication.
So the problem was more about insufficient appeal to emotion than to logic. Whether that would have been possible in the case of Challenger is another question, obviously retrospective analysis is always easier.
But in my mind there is no doubt that visualizations are able to change people's opinions and understanding. I have seen that first hand on some projects.
I don't dispute what you say, but I think it's too easy to absolve the engineers of any fault. If they knew that the public was being lied to, and that there was a good chance of people being killed, they should have been more vocal. When lives are on the line, you can't just say, "well, I told management ... "
"When he was pressed by NASA the night before the liftoff to sign a written recommendation approving the launch, he refused, and later argued late into the night for a launch cancellation. When McDonald later disclosed the secret debate to accident investigators, he was isolated and his career destroyed."
This is quite simply astonishing; everything I have ever learned in my engineering classes said to do what McDonald did and look what happened to him.
Having integrity would be easy if you never had to fight and the results always benefitted you. Life is not a fair thing.
Engineers place a lot of emphasis on having the right answer and almost no emphasis on ensuring their influence. It doesn't matter how right you are if nobody will follow your direction.
Yes, this to me is the most disturbing aspect of the story. It's one thing that the mistake was made in the first place; no one could be completely sure what would happen. But to then ostracize the people who warned of the possible catastrophe -- that's a corrupt culture.
Maybe slightly off-topic, but this kind of news touches me. It shouldn't, since NASA is a government agency and we all know how bureaucratic those can get, but for some reason NASA always stood out to me as a well-behaved agency with strong morale - an exception among all the other government entities. But this news proves my feelings were wrong.
Ronald Reagan was supposed to give a State of the Union address a couple of days after the launch. He wanted to use the success of the space program as part of that speech. While there was no direct order, it's clear that NASA management felt significant political pressure to push the launch forward or risk reduced funding from Congress. It's clear they screwed up; I just wanted to add some context about the climate in which they made these decisions.
This comment needs more upvotes. The political pressure on NASA is extreme. The only people doted over more than the astronauts were the politicians. Look into the origins of the Triana satellite and its subsequent fate for a particularly ridiculous example.
"When the space shuttle Columbia burned up on reentry in 2003, killing its crew of seven, the accident was blamed on the same kinds of management failures that occurred with the Challenger. By that time, Boisjoly believed that NASA was beyond reform, some of its officials should be indicted on manslaughter charges and the agency abolished."
"NASA's mismanagement "is not going to stop until somebody gets sent to hard rock hotel," Boisjoly said. "I don't care how many commissions you have. These guys have a way of numbing their brains. They have destroyed $5 billion worth of hardware and 14 lives because of their nonsense." "
Yes, nothing has changed. Lessons were not learned. And this was known before the Columbia disaster, in fact not long after the commission on the Challenger. It bears striking resemblances to the financial crisis.
"Their pleas and technical theories were rejected by senior managers at the company and NASA, who told them they had failed to prove their case and that the shuttle would be launched in freezing temperatures the next morning. It was among the great engineering miscalculations in history."
Horsepucky: the engineers' calculations weren't mis-anything. The historical record has long since proven that the engineers were repeatedly ignored by their management and by bureaucrats at NASA.
I spent 6 years as a structural engineer on space shuttle flights. A few thoughts:
While I don't know the people involved with Challenger - I was in 6th grade at the time - it goes against my own experience to suggest that NASA management had anything but the interests of the crew in mind. To a fault. In fact, your average NASA employee doted on astronauts like a star-struck little girl. What the crew wanted, the crew got. You could always tell who the astronaut was when you saw a group walking about the centers - he/she was the one whose every semi-whimsical comment extracted voluminous and polite laughter from the others in the group.
I was, however, working when Columbia blew up. In fact, my mission was supposed to fly on it when it got back. Although sad, I feel comfortable saying that most of the people working on these things sort of know it's going to happen from time to time. It wasn't exactly surprising to us or the crew.
Blaming managers and celebrating engineers is overly simplistic. The line is not as well defined as you might think. I had few - if any (I can't think of a single one, actually) - managers (either contractors or NASA employees) who were not experienced engineers.
The safety rules for a shuttle payload, let alone the actual orbiter, are voluminous and arcane. It is the primary reason that very little new technology comes out of the manned space flight program. Everything new is considered too dangerous because it hasn't been flown before.
This stuff is insanely dangerous. It is pretty damn easy to come up with a way some piece of hardware you're working on could kill someone. The complexity is enormous. The number of people involved is in the thousands, and they're spread all over the country. Different centers have different rules.
As a result, you make life-and-death decisions literally every day. It's not such a big deal, because there is a lot of formal process in place to make sure it gets done right. The "standards" are what keep space flight as we know it as safe as it is. Are they, or the processes by which they are enforced, perfect? Hell no.
The system failed. People failed. But we knew this would happen, and we did it anyway because it's the price of exploring the frontiers. We learned from Challenger. We learned from Columbia. We will learn from the next catastrophic failure. NASA isn't perfect. In fact, you might say the bloated organization and government involvement makes this sort of thing inevitable. But I bet the small privateers exploring manned space flight will run into their own challenges.
Basically, what I'm saying is that we need to keep this in a larger perspective. Obsessing over one failure in what is a centuries-long quest is not helpful. Dissect it, learn from it, and move on.
I took a professional ethics class in college, and the professor was a personal friend of Roger. The whole class was about Challenger, and the incredible failure of judgement around its demise. Roger came into one of our classes and spoke to us late in the semester.
After listening to tapes of the trials, interviews, reading transcripts, and reading articles it was very apparent to me that this was a failure of management. The lead engineer, during the discussions of whether to launch the night before, was arguing that the engineering evidence did not support a launch under the temperature conditions projected for the following morning. He was told by Morton-Thiokol business reps to "take off your engineer hat, and put on your manager hat".
Evidence points to this failure happening because NASA needed a PR boost for funding, and M.T. wanted to continue doing business with them delivering solid rocket boosters.
Because Roger Boisjoly spoke to Congress during the hearings he was black-listed from his industry. At no point during the decisions leading up to that disaster did good engineering practices that could have prevented this destruction come into play.
> The system failed. People failed. But we knew this would happen, and we did it anyway because it's the price of exploring the frontiers. ... you might say the bloated organization and government involvement makes this sort of thing inevitable. But I bet the small privateers exploring manned space flight will run into their own challenges.
I bet they will do better, and go farther. They will know a disaster like that will likely ruin their company, so they will make damn sure that the communication process between managers and engineers doesn't break down, and that the process complexity is kept in check.
You're describing (and excusing) a bloated and dysfunctional system that sprang up around the need to manage the complexity of the space shuttle. People tried to fix the organization after Challenger, but the fact that Linda Ham stopped the request for imagery as described by CAIB shows that they failed or it reverted. And as your attitude shows, there is a bit of a fatalist perspective ("bloated organization and government involvement"). In the long term, it has to be fixed, or stuff will keep blowing up.
I'm not excusing it. And my hope is that you are correct on the smaller, leaner private launches. (I suspect, however, that they will kill some people too, and sadly, those people may not be as aware of the risks as astronauts are).
Some day, NASA will likely be seen as strange and primitive. That insanely high risk of death is the current state of the art, however. It will get worked out of the system over a very long period of time - but only if we don't lose the nerve to launch these things because someone might die.
How many people died crossing oceans back in the day because someone screwed up? Shit - they still die on ships, and in cars and in planes. Space ships are going to blow up, crash, and fail. It's just life.
It's interesting that you mention planes. Boeing and Airbus (with FAA's help I guess) have figured out a way to build astonishingly safe planes. Yes, their systems are simpler and get a lot more use, but the fact that there has been exactly one hull loss for a 777 with no fatalities in 20 years with 1000 planes might also be telling us something about them getting the organization right.
A safe 777 is hard to build, but a safe orbiter is an order of magnitude harder, with respect to the range of velocity, temperature, and pressure that the vehicle endures.
It's not a totally fair comparison to say "Boeing can do it, why can't NASA and their contractors"
With current technology, it seems obvious that if we want to (further) develop manned spaceflight, we should launch unmanned orbiters with crash-test dummies and telepresence surrogates, and keep doing so until we've established a safety record.
> In fact, your average NASA employee doted on astronauts like a star struck little girl. What the crew wanted, the crew got.
This is the question I've always had about the Challenger, but never heard addressed: How much of a factor was the crew's opinion considered to be? I strongly suspect that the astronauts themselves exerted informal but real pressure to prefer flying, since it would be their moment in the sun.
I can't opine on the crew's input to launch decisions because I didn't work in that area. I suspect that they have little say about it.
I did work on astronaut EVA training, however. In that case, the astronauts were king. No matter how silly the request, their wishes were catered to - but only in areas of usability, not safety. It turns out it's really difficult to connect wires in space, for example - small things matter a lot. I can remember at least one case where what they wanted was just plain dumb. But we did it anyway. Usually, though, they were pretty good about that sort of stuff.
You raise two main themes here: NASA/space travel has grown into a Byzantine set of rules, and also that it is an accepted cost of doing business that you'll lose lives.
Do you believe that we'd be better served by simpler (but potentially much more dangerous) craft flown by private industry? If not, why is the NASA approach better, seeing as how it is mired in both politics from without and bureaucracy/process from within?
Yes - I believe a loss of life is acceptable and inevitable. The state of the art is currently the bloated NASA system. My hope is that a leaner, private approach will be more effective and safer. I'm not sure I'm willing to make a prediction on whether that will be true or not. For one thing, NASA doesn't have the problem of profit to be concerned about.
I wish I could upvote your comments more than once. It's interesting that this forum that celebrates taking extreme (financial) risks in exchange for possibly great (financial) reward has a hard time with NASA scientists, engineers, and astronauts taking calculated risks. 100% safety is not a productive strategy.
Hell, even 20-50% safety isn't necessarily a productive strategy, if you pay too much for it.
I think the main trick is to reduce the unit cost/training investment in astronauts so that we can send up more, and to make very cheap vehicles to get them there, so that we won't have to worry about losing a big investment when (not if) something goes wrong. Putting hundreds of millions of dollars' worth of trained meat into billions of dollars' worth of aerospace tech is not sustainable.
If it would get me to the moon with a 1 in 3 chance, hell, give me a banana and call me Albert VII.
"Extreme financial risk might or might not kill people.
Space shuttles exploding will definitely kill people."
Don't make such a big jump from the first to the second. You could have also said, "Extreme space exploration risk might or might not kill people." Fact is, in both finances and space exploration, in rare cases it has led to the death of those involved (directly or indirectly). But those cases are rare and the rewards are so great, and so we press on.
Why would we allow a private craft to be more dangerous? They probably will be, but because they will soon be subject to even stricter rules, sneaky things will have to be done to compete in that environment (see the financial system).
When the person making the rules (the government) is no longer subject to the rules, but held responsible when they're broken, do you think they'll get looser or stricter?
And now not only does your company have to obey safety laws or whatever else applies, you've got insurance companies, shareholder lawsuits, and so on.
The Challenger/Columbia astronauts are (correctly) treated as heroes who sacrificed for the greater good. The Space Inc. astronauts who die will be considered victims of corporate negligence.
I have full confidence in SpaceX having safety at the forefront of their manned spaceflight program. If they have a fatal mishap and lose the crew, the consequence is that they would effectively be out of business. They would go bankrupt.
Losing 7 lives is a tragedy. A CEO who knows that losing those lives will cost the company everything is going to pay more attention to safety than a government bureaucracy.
After the Columbia disaster, I confess I was a bit shocked the heat shield was never examined in orbit to assess the damage that could have occurred during launch. Would it be that hard to do?
I don't honestly know. I didn't work on the orbiter - I did the Hubble Space Telescope servicing mission payloads. I don't know what techniques they've figured out after the accident, but I'm not sure there was a reliable way to go out and look at the time. The shuttle's arm is only so long, and even if you could put an astronaut in position to look, the EVA time required would likely be prohibitive. You'd need some way to inspect the underside of the orbiter from a satellite, earth, or some other vantage point. I'm sure google can tell us what they'd do today to help mitigate this kind of failure.
IIRC, two ways to inspect the orbiter's underside were developed - an inspection boom equipped with visual and laser sensors that could be attached to the end of the arm, and a maneuver procedure before docking with the ISS, in which the orbiter would make a full rotation allowing it to be photographed from the ISS. During one of the first post-Columbia flights, a problem - a spacer which protruded from its proper position between two tiles - was fixed by removing the spacer during an EVA. The rotation maneuver was the most shocking to me, as it showed it never occurred to anyone just to rotate the shuttle so it could be inspected with a pair of binoculars. Such an inspection could have been conducted as early as STS-63.
In a sense, every shuttle was an X-plane - it was as much a research vehicle as a commercial transport to LEO. One builds shuttles to learn how to better build shuttles and, in order to do that, learn as much as possible.
BTW, I envy you (in a good way, of course) more than a little. Working for NASA is a really cool thing.
Going to extreme lengths they could have examined the heat shield by repurposing spy satellites. But there was very little they could have done about it anyway.
My understanding is that the effort wasn't expended because of the understanding that the crew were doomed regardless of what information was gathered.
My view is that even if the crew were doomed, gathering additional information in advance of reentry would have allowed for a better understanding of circumstances, better post-disaster modeling of what went wrong and how the orbiter failed (both in the launch-time foam strike, and in the reentry heat-shield penetration and structural failure).
Whether or not to inform the crew is yet another decision. Astronauts are aware that theirs is a highly risky venture; with a low sample size the specific odds are somewhat uncertain, but they are on the order of 4 in 100 per human space flight. If you're going to go on a space mission, you'd better be prepared to die.
If NASA refrained from assessing strike damage on those grounds, I feel a grave error was committed.
An engineer started arranging spysat imaging through "back channels" but was struck down by management - he/she didn't have the guts to escalate it into an official thing.
Perhaps the culture changed because of the Challenger disaster... nothing like having the shuttle blow up to make you reevaluate your priorities on safety.
Different teams working on modules is a very different animal than one engineer saying "this item is going to blow up" for years and obviously everyone ignoring him... I find it particularly shocking how your answer suggests a strong "well, shit happens" attitude when clearly the potential AND a strong reason to make things better was right there.
What would really be interesting is why he "failed to make his case" according to executives.
The engineering team had approximately 3 hours notice prior to a teleconference to make a presentation. The telecon occurred the evening before launch, with a midnight deadline for the go / nogo decision to be made.
Due to the short timescale to build their presentation, they re-used info from existing presentations. Unfortunately, the same info had previously been used to demonstrate why their o-rings were safe. The NASA people basically said, 'last time you showed me this graph it meant things were safe, this time it means things are dangerous. What gives?'
The decision makers were looking for excuses to move forward, and that gave them an excuse to ignore the warnings.
What you're not seeing is that pretty much everything you work on has some risk of failure, and much of it could be catastrophic. Sorting through all of that is not easy. Yes, in this case, there were systemic and human failures.
It just goes to show you that even when everyone is paying attention, things still go wrong. Some people heard him and made the call that it was still safe. They were wrong. That, unfortunately, is the state of the art today (or it was back in the 80's). The alternative is to stay on the ground.
But yes, shit does happen, and nobody climbing aboard the orbiter is under any illusion that it is a safe thing to do.
If you've read Feynman's appendix to the report on the accident and investigation (http://www.ralentz.com/old/space/feynman-report.html), I don't see how you can possibly believe that the decisions made around the Challenger launch were made with the right process.
I have read it, and I do think there are defects in the processes. But what do you do about it? I'm sure there are many more unknown vulnerabilities in the orbiter that were never found, but you keep trying and fixing.
Your willfully ignorant (and I don't mean that in a crude, insulting way) responses here lead me to think there remains some very serious cultural issues within NASA.
I think you've read something I didn't write. What you call willful ignorance I call a realistic assessment and acceptance of the risks of pioneering space flight.
Nobody is forced into an orbiter - people BEG for the opportunity. We gave them that opportunity, working in good faith to the best of our ability. Sometimes it doesn't work out. Sometimes things break. Sometimes people screw up. We all know the risks.
You can sit on the porch with a near 100% safety record or you can give it a try. Your choice.
Actually I have to say I am with "mistermann" on this... you make it sound like there is no other way. I can totally accept and understand that it cannot be all that safe to sit you on tons of rocket fuel, fire you into the oxygen-less and freezing depths of space, and then hope you somehow make it onto another planet AND then do the same stunt from there back to earth. I get it, and I can also understand the trade-off between "making it 100% safe" and "otherwise we'd never get lift-off".
What I cannot understand is this: an unknown, unforeseen contingency is a completely different thing from an engineer pointing out "this WILL fail, it will blow up, and I have proof", and there really should not be any excuse for ignoring a warning like that... yes, you cannot make it 100% safe, but you should at least aim to make it as safe as humanly possible given your current level of technology and knowledge... so, in my book, overriding an engineer saying "this WILL fail and it'll blow up" is actually negligent manslaughter. When I get into my car in the morning and don't care that the brakes aren't working even though my mechanic told me my brake lines were cut, what would you call that?
While this sentiment is understandable, it's not justified unless you know how many times people said "this will fail" and it didn't. We only have definite data on this one statement. You can not from that data conclude (I'm not saying there isn't other data) that this was negligent. If every engineer who disagreed with something said "this thing is going to blow up", eventually one would be right. But you can not then infer that that individual was any different than the others and that people should have known this. It's the "monkeys on typewriters" fallacy.
This is science and engineering not statistics. It is not a numbers game or "monkeys on typewriters" or how many bug reports we can file on the same issue to get said issue fixed!
At the end of the day, if even ONE person demonstrates scientific or engineering knowledge that shows a serious safety concern, then why would you actively choose to ignore it? Period.
NASA management - whether it be by organisational process and or personally identifiable decision making - failed in their responsibilities in spectacular fashion!
While I agree (especially with the last sentence), I would point out that the engineering behind these problems is rarely black and white, and hindsight tends to make it look more so than it is.
I do not believe that if someone knew with 100% certainty that Challenger would blow up that it would ever have launched. The trouble came in in the judgment of that risk. In this case, from what I've read, they got it wrong - very wrong[1].
You can argue about how certain they have to be, or how negligent people were to ignore estimated failure probabilities of whatever magnitude. But it's not like someone says, "this will blow up 85% of the time, period. Make a call." It's more subtle, complex, and less concrete than that.
1. Note that this is not equivalent to "if it blew up, they got it wrong." Sometimes the small, properly calculated risk blows up on you just because you're unlucky - which is different from a miscalculated risk blowing up on you.
No hindsight was required to observe the following:
O-rings are supposed to seal on compression, not expansion.
As it is now, the O-rings are getting blown out of their tracks but still managing to seal the whole assembly quickly enough.
The above unplanned behavior, which is the only thing preventing a hull loss (and a crew loss since there's no provision for escape) is sufficiently iffy that sooner or later we're likely to run out of luck.
(I'd also add about the Columbia loss that NASA had a "can't do" attitude towards the problem they observed of the foam hitting the wing. Hardly a "crew first" attitude.)
You are leaving out data. The same engineers also, during that time, agreed to decisions that the problem had been fixed. Apparently, there was an established mechanism for any engineer working on the shuttle to file some official "bug report" that then would have required a thorough investigation. None of the engineers did, all concerns were voiced through informal channels.
Part of the conclusions of the Challenger post-accident report was that engineers were discouraged from filing such "official bug reports." Informal reports made in a briefing did not require investigation, so they were not discouraged in the same way.
When I visit a NASA center, there are posters up all over saying "If it's not safe, say so." Part of the reason for the 2-year grounding of the Shuttle fleet, post-Challenger, was to put in place a stronger culture of safety at NASA.
A commenter mentioned the false dichotomy of "engineers vs. managers". It's a hard call, as an engineer, to disappoint a manager (or a whole line of managers, all the way up) with a call to solve a possible problem. Civil engineers may be more used to this sort of accountability.
[...] "We all knew what the implication was without actually coming out and saying it," a tearful Boisjoly told Zwerdling in 1986. "We all knew if the seals failed the shuttle would blow up."
Armed with the data that described that possibility, Boisjoly and his colleagues argued persistently and vigorously for hours. At first, Thiokol managers agreed with them and formally recommended a launch delay. But NASA officials on a conference call challenged that recommendation.
"I am appalled," said NASA's George Hardy, according to Boisjoly and our other source in the room. "I am appalled by your recommendation."
Another shuttle program manager, Lawrence Mulloy, didn't hide his disdain. "My God, Thiokol," he said. "When do you want me to launch--next April?"
These words and this debate were not known publicly until our interviews with Boisjoly and his colleague. They told us that the NASA pressure caused Thiokol managers to "put their management hats on," as one source told us. They overruled Boisjoly and the other engineers and told NASA to go ahead and launch.
"We thought that if the seals failed the shuttle would never get off the launch pad," Boisjoly told Zwerdling. So, when Challenger lifted off without incident, he and the others watching television screens at Thiokol's Utah plant were relieved.
"And when we were one minute into the launch a friend turned to me and said, 'Oh God. We made it. We made it!'" Boisjoly continued. "Then, a few seconds later, the shuttle blew up. And we all knew exactly what happened."
What strikes me about the episode in terms of "general life lessons" isn't just "Listen to the engineers" (you should, though); it's that under the pressure to Get Stuff Done, there's a huge temptation to brush legitimate concerns under the rug. "These guys tell me this shuttle is unsafe, but space launch is never completely safe" --> "These guys tell me this user data isn't secure but no software is completely safe." Now that the newness of the space program has worn off a bit, it's easy to say "why didn't they just delay the launch?" but back in the day, it was an issue of national pride, and the managers, simple-minded as they may have been, were under an extreme amount of pressure to pull the launch off.
I guess it's just worth remembering that even if you're under pressure to ship, launch, or publish, if the guys whose job it is to know tell you to reconsider, you probably should.
I guess this is also a problem of differing goals, the corresponding compensation, and what you have to pay in case you were wrong... the guys having to push things through and make it happen on time and under budget get paid for just that - not for having prevented a disaster. Just like bank managers make fat profits from taking on huge risks and then don't have to pay for it when it explodes in their faces; no person is responsible. It just happened.
(Another problem probably is communication between engineers and (project) managers - you cannot be blatantly un-subtle enough... if the worst case is complete data loss, then you paint them that picture in no subtle terms; if the worst case is catastrophic equipment failure, explosions, the launch pad annihilated, and several lives lost - including the millions spent training them - then that's what you tell them and document in very direct terms.)
You can go through any catastrophe in a complex system and piece together a chain of events that shows how "obvious" it was that it was going to happen. What you miss are all the chains that say every complex system is going to end in catastrophe, because when they don't end that way, nobody looks. It is kind of an anti-survivor bias.
It is also why it is a bad idea to make policy changes strictly off the cause of a single failure, and that is where things like commissions should help: you can move the focus to looking at the entire problem set for weaknesses instead of just leaving it with "make a better o-ring".
For more background on the engineer-manager disconnect that led to the Challenger disaster, it's worth reading about Richard Feynman's famous appendix to the Rogers report (even Wikipedia's summary is a fascinating read):
What I find most interesting is the contrast this article's discussion paints with the one we saw only 2 weeks ago, How Much Is an Astronaut's Life Worth?[1]. Some of the highly-rated HN comments included:
Space is dangerous. We should stop pretending it can be made "safe". It just gives politicians something to wag their tongues at when something inevitably goes wrong.
The problem here is that NASA is a political agency, not a scientific one. Each year, elected politicians sit down and decide how much they're going to get.
This provides thoughtful perspective on policy trade-offs. As Thomas Sowell has written, "The first lesson of economics is scarcity: There is never enough of anything to satisfy all those who want it. The first lesson of politics is to disregard the first lesson of economics."
I don't know science so please pardon me if my comment makes no sense in this context.
What if, by some stroke of luck, Challenger had not exploded on that launch... What would have happened to these righteous people such as Roger? Condemned by the people around them as over-worrying, insane, self-righteous people who thought they knew everything. If you get what I mean...
It's sad that the Challenger explosion happened, but at the same time it helps to highlight an important issue which might otherwise have remained buried.
In a software/web development context, it's of course harder to say "this thing is going to blow up", because serious technical debt usually only creeps in after a much longer time, by which point the people responsible may have left, leaving the next victim to clean it up. This is also a sad state of affairs when it comes to final-year projects, where students try to do everything that could impress the graders on the outside to get the top grade, while students who make the extra effort for a clean and maintainable backend do not get the top grades because the lecturer only looks at the outside during the presentation.
That's one of my two must-reads on this incident. I can't think of a better example of how visualization illiteracy can contribute to disaster. The other one is Feynman's account of the investigation afterwards: http://www.amazon.com/gp/aw/d/0553347845
What are the names of those above Boisjoly who ignored him and made the call to launch? Good to know the names of the heroes in this story, but good also to know the names of the villains.
It's worth reading the Wikipedia entry on the following report and Feynman's findings. Anonymous polling of engineers showed they estimated a general probability of catastrophic disaster in a shuttle launch between 1% and 2%.
So latimes.com's article blew it out of proportion.
2% chance of disaster is not nearly the same as "so certain was he that the shuttle would blow up."
You're conflating two different estimates: Boisjoly was certain that the particular launch would fail, and a survey of engineers gave a 1-2% estimate for failure across all flights.
Both of those predictions turned out to be extraordinarily accurate.
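(A rough check of the second estimate against the eventual record - a Python sketch assuming the commonly cited totals of 135 shuttle missions and 2 vehicle losses, Challenger and Columbia:)

    # Compare the engineers' anonymous 1-2% per-flight estimate with the
    # shuttle program's eventual record: 2 losses in 135 flights.
    losses, flights = 2, 135
    print(f"{losses / flights:.1%}")    # ~1.5%, inside the engineers' 1-2% range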
[1] http://history.nasa.gov/rogersrep/v1ch6.htm