Engineers at Thiokol also were increasingly concerned about the problem. On July 22, 1985, Roger Boisjoly of the structures section wrote a memorandum predicting NASA might give the motor contract to a competitor or there might be a flight failure if Thiokol did not come up with a timely solution.
Nine days later (July 31) Boisjoly wrote another memorandum titled "O-ring Erosion/Potential Failure Criticality" to R. K. Lund, Thiokol's Vice President of Engineering:
"The mistakenly accepted position on the joint problem was to fly without fear of failure and to run a series of design evaluations which would ultimately lead to a solution or at least a significant reduction of the erosion problem. This position is now changed as a result of the [51-B] nozzle joint erosion which eroded a secondary O-ring with the primary O-ring never sealing. If the same scenario should occur in a field joint (and it could), then it is a jump ball whether as to the success or failure of the joint because the secondary O-ring cannot respond to the clevis opening rate and may not be capable of pressurization. The result would be a catastrophe of the highest order-loss of human life."
Boisjoly recommended setting up a team to solve the O-ring problem, and concluded by stating:
"It is my honest and very real fear that if we do not take immediate action to dedicate a team to solve the problem, with the field joint having the number one priority, then we stand in jeopardy of losing a flight along with all the launch pad facilities."
And there was a team working on it. IIRC, a redesign of the joint was underway. But no one suggested, before the evening before the launch, that launches should be stopped until the redesign was completed.
I think there are parallels to right-or-wrong issues that we face in our own industry on a regular basis, for example:
Are phrases like "standard industry practice" and "covered by the click-through license" today's weasel words that rationalize us implementing immoral management demands?
If our software deals with privacy and personal information, then it can even become a matter of life and death in some countries. A dissident in a brutal authoritarian regime could lose his life if the software has a bug in it, for example.
The other area where moral choices take place is in licensing and patents. It can be on a large scale (as we see with big companies using BusyBox in their platforms) or a smaller one (copying code from one project to another without even giving credit).
Please don't downvote me - I'm not invoking Godwin's Law here ;-)
"without IBM's machinery, continuing UPKEEP and SERVICE, as well as the supply of punch cards, whether located ON-SITE or off-site, Hitler's camps could have never managed the numbers they did" [emphasis mine]
The idea is basically:
So these were not pocket calculators that were bought and then deployed in concentration camps, where IBM could claim complete ignorance of their use. These machines needed constant, professional service. Having them on-site probably meant that IBM sent technicians there. One has to wonder what those technicians' response to the 24/7 billowing crematoria and emaciated bodies was.
However, on the other side (and I have not read the book, just looked at the topic superficially many years ago), I am not aware of any IBM engineers' stories about an actual site visit... So I don't know what to think or how guilty IBM really is...
 This was a point of contention between the UK and US before the fall of France.
You'd be surprised. Check out the computer RISKS digest - always a sign of how pear-shaped stuff can go:
However, I was struck by how wrong he got it. He was making a point about how just having the data available isn't enough - the modern challenge is to make sense of big data in a meaningful way through visualization. And to back this point up, he made it sound like a bunch of engineers didn't look closely at the data sheet for the rubber used in their O-rings and, oops! The Challenger blew up. Shrug. Who knew?
I'll tell you who knew. Roger. "The engineers" knew perfectly well what was going to happen. It was management who refused to listen to them and launched anyway.
Feynman's account of his involvement in investigating this disaster in "The Pleasure of Finding Things Out" was excellent. In one exercise, he had the engineers and management write down what they honestly thought the failure rate of the shuttle was. Management quoted 1 in 100,000, while engineering quoted 1 in 100. The Challenger disaster was a symptom of a complete breakdown in communication between engineering and management at NASA.
The engineers knew what was going to happen with a fairly high degree of certainty - enough that the launch shouldn't have happened. Then they turned to management and tried to present their findings. Clearly the business guys didn't want to be the ones to hold up the launch; they had a strong incentive not to. So the engineers had to convince them why the company should basically stick its neck out as one of hundreds and hundreds of vendors and stop this launch. The engineers didn't do the best job of communicating the certainty of the problem, the magnitude of the problem, etc. So it was a little bit of both.
That lecture was probably 8 years ago for me, but it definitely left a lasting impression. Today I work on systems that truthfully are more complicated than the space shuttle and also have a lot more lives at stake. "Intellectual honesty" is a phrase I live by. There have been times I fucked up some math and knew we needed to fix something - even though it would cost money and cause delay - but I stuck to my guns and made sure it happened. The alternative is too likely to lead to disaster. It keeps me up at night, and those are the near misses. I can't imagine what that guy went through knowing they almost stopped it, but didn't.
That's a recurring theme when management screws up badly.
Intellectual honesty applied to this particular situation:
1. The management should know that the engineers understand the topic more deeply than they do.
2. The management should know that lives are at stake.
3. The management should notice that the engineers take their time and argue until late at night, mentioning "death" and "explosion".
4. The management should know that communication skills may vary.
But management still did not act upon it, because they did not know their limits. That's their fault, nobody else's.
I'm glad you take your responsibility seriously.
While reading the book, I was struck by two things: one was the "slippery slope" introduced by first seeing something and arguing it was not a launch risk. With the decision making system in place, that meant it was very difficult, when another datum came in, to then say that the two data together implied something was wrong, because they had already argued that the first datum was not a risk. This is of course a well-known phenomenon, but it didn't seem like the system was well-equipped to deal with sparse and sometimes contradictory data.
The second was the feeling that what was happening was that the engineers were observing a random process and post-rationalizing cause and effect. (I had just read "Fooled by Randomness", which may have contributed to this.) Every piece of information was made to fit into some model, but it seemed like no one was considering that maybe what they were seeing was just inherent randomness. With that view, the progressively bad outcomes that happened before indicated that what was observed was a poorly-characterized random process with a fat tail. At some point, one rationalization was that they had a "safety factor of 4 left", but if you have indications of a fat-tailed process, that's not much to bank on.
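To make that fat-tail point concrete, here is a minimal illustrative sketch (all numbers are made up for illustration, not taken from any Thiokol or NASA data): it compares how often a nominal "safety factor of 4" is breached when the underlying variability is thin-tailed versus fat-tailed, even though typical observations look similar.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 1_000_000
    allowable = 4.0   # erosion depth at which the seal is assumed to fail (arbitrary units)
    typical = 1.0     # worst erosion seen so far -> nominal "safety factor of 4"

    # Thin-tailed model: erosion varies modestly around the typical value.
    thin = rng.normal(loc=typical, scale=0.3, size=n)

    # Fat-tailed model: similar ballpark for ordinary cases, but with a heavy tail
    # (Lomax/Pareto II with shape 1.5, shifted so values start near the typical depth).
    fat = typical * (1.0 + rng.pareto(1.5, size=n))

    for name, sample in (("thin-tailed", thin), ("fat-tailed", fat)):
        p_exceed = np.mean(sample > allowable)
        print(f"{name}: P(erosion > allowable) ~ {p_exceed:.2%}")

Under the thin-tailed assumption the margin is essentially never threatened; under the fat-tailed one the "factor of 4" is blown through a surprisingly large fraction of the time, which is roughly the worry being described.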
It's detailed, but well-written and the quality of the analysis is extraordinary. If you want a preview, take a glance at the debris analysis on page 40. If you like understated drama, start at the comm transcript on page 42. ("Lock the doors.")
Everyone at NASA got a hard copy of this report.
His argument (which the linked article takes to task) is that the engineers knew but were unable to convince anyone because they could not put together a compelling case. While Tufte's argument might have some merit, there was probably a lot more going on than just that. Perhaps even with a very compelling case presented, the launch would have gone through.
So you are correct that he knew, but you are wrong in claiming the lecturer got it wrong. It is exactly because those who make the decisions don't have the necessary know-how that visualizing the data in a meaningful way is so much more important.
Besides, saying the engineers didn't "try hard enough" is essentially claiming the engineers were in charge. They were not. Managers are paid to manage their department, not just please the higher ups. When a claim is made that failure to address a problem will result in catastrophe and loss of life... really what more needs to be said, or drawn, to make the point? If you're not going to trust your own experts, why have them at all?
And how does the Challenger explosion in any way suggest that there aren't enough engineers? Certainly this one major issue had engineers urging for delay, so there doesn't seem to have been a lack of them.
You know that all of the shuttle's risks are well documented, right? It's a mountain of issues that basically gets the "ok to be ignored for now" stamp at every launch.
I think we are talking past each other.
I am talking about the problem in general.
1 in 100,000 means launching the shuttle 3 times a day for nearly 100 years and experiencing only one failure. If the logical side of their brain is that far out to lunch, why would you think that the problem was simply an insufficient appeal to logic?
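Spelling out the arithmetic behind that comparison (just a back-of-the-envelope check):

    # 1-in-100,000 odds vs. a hypothetical cadence of 3 launches per day
    failure_rate = 1 / 100_000
    launches_per_year = 3 * 365                      # ~1,095 launches per year
    years_per_expected_failure = (1 / failure_rate) / launches_per_year
    print(round(years_per_expected_failure, 1))      # ~91.3 years

At the same cadence, the engineers' 1-in-100 estimate would mean an expected loss roughly every month - which is how far apart the two sides were.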
So the problem was more about insufficient appeal to emotion than to logic. Whether that would have been possible in the case of Challenger is another question, obviously retrospective analysis is always easier.
But in my mind there is no doubt that visualizations are able to change people's opinions and understanding. I have seen that first hand on some projects.
This is quite simply astonishing; everything I have ever learned in my engineering classes said to do what McDonald did and look what happened to him.
Engineers place a lot of emphasis on having the right answer and almost no emphasis on ensuring their influence. It doesn't matter how right you are if nobody will follow your direction.
Gordon Cooper: You boys know what makes this bird go up? FUNDING makes this bird go up.
Gus Grissom: He's right. No bucks, no Buck Rogers.
"When the space shuttle Columbia burned up on reentry in 2003, killing its crew of seven, the accident was blamed on the same kinds of management failures that occurred with the Challenger. By that time, Boisjoly believed that NASA was beyond reform, some of its officials should be indicted on manslaughter charges and the agency abolished."
"NASA's mismanagement "is not going to stop until somebody gets sent to hard rock hotel," Boisjoly said. "I don't care how many commissions you have. These guys have a way of numbing their brains. They have destroyed $5 billion worth of hardware and 14 lives because of their nonsense." "
Horsepucky: the engineers' calculations weren't mis-anything. The historical record has long since proven that the engineers were repeatedly ignored by their management and by bureaucrats at NASA.
While I don't know the people involved with Challenger - I was in 6th grade at the time - it goes well against my own experience to suggest that NASA management had anything but the interests of the crew in mind. To a fault. In fact, your average NASA employee doted on astronauts like a star-struck little girl. What the crew wanted, the crew got. You could always tell who the astronaut was when you saw a group walking about the centers - he/she was the one whose every semi-whimsical comment extracted voluminous and polite laughter from the others in the group.
I was, however, working when Columbia blew up. In fact, my mission was supposed to fly on it when it got back. Although sad, I feel comfortable saying that most of the people working on these things sort of know it's going to happen from time to time. It wasn't exactly surprising to us or the crew.
Blaming managers and celebrating engineers is overly simplistic. The line is not as well defined as you might think. I had few - if any (I can't think of a single one, actually) - managers (either contractors or NASA employees) who were not experienced engineers.
The safety rules for a shuttle payload, let alone the actual orbiter, are voluminous and arcane. It is the primary reason that very little new technology comes out of the manned space flight program. Everything new is considered too dangerous because it hasn't been flown before.
This stuff is insanely dangerous. It is pretty damn easy to come up with a way some piece of hardware you're working on could kill someone. The complexity is enormous. The number of people involved is in the thousands, and they're spread all over the country. Different centers have different rules.
As a result, you make life-and-death decisions literally every day. It's not such a big deal, because there is a lot of formal process in place to make sure it gets done right. The "standards" are what keep space flight as we know it as safe as it is. Are they or the processes by which they are enforced perfect? Hell no.
The system failed. People failed. But we knew this would happen, and we did it anyway because it's the price of exploring the frontiers. We learned from Challenger. We learned from Columbia. We will learn from the next catastrophic failure. NASA isn't perfect. In fact, you might say the bloated organization and government involvement makes this sort of thing inevitable. But I bet the small privateers exploring manned space flight will run into their own challenges.
Basically, what I'm saying is that we need to keep this in a larger perspective. Obsessing over one failure in what is a centuries-long quest is not helpful. Dissect it, learn from it, and move on.
After listening to tapes of the trials, interviews, reading transcripts, and reading articles it was very apparent to me that this was a failure of management. The lead engineer, during the discussions of whether to launch the night before, was arguing that the engineering evidence did not support a launch under the temperature conditions projected for the following morning. He was told by Morton-Thiokol business reps to "take off your engineer hat, and put on your manager hat".
Evidence points to this failure happening because NASA needed a PR boost for funding, and M.T. wanted to continue doing business with them delivering solid rocket boosters.
Because Roger Boisjoly spoke to Congress during the hearings, he was blacklisted from his industry. At no point during the decisions leading up to that disaster did good engineering practices that could have prevented this destruction come into play.
> The system failed. People failed. But we knew this would happen, and we did it anyway because it's the price of exploring the frontiers. ... you might say the bloated organization and government involvement makes this sort of thing inevitable. But I bet the small privateers exploring manned space flight will run into their own challenges.
I bet they will do better, and go farther. They will know a disaster like that will likely ruin their company, so they will make damn sure that the communication process between managers and engineers doesn't break down, and that the process complexity is kept in check.
You're describing (and excusing) a bloated and dysfunctional system that sprang up around the need to manage the complexity of the space shuttle. People tried to fix the organization after Challenger, but the fact that Linda Ham stopped the request for imagery as described by CAIB shows that they failed or it reverted. And as your attitude shows, there is a bit of a fatalist perspective ("bloated organization and government involvement"). In the long term, it has to be fixed, or stuff will keep blowing up.
Some day, NASA will likely be seen as strange and primitive. That insanely high risk of death is the current state of the art, however. It will get worked out of the system over a very long period of time - but only if we don't lose the nerve to launch these things because someone might die.
How many people died crossing oceans back in the day because someone screwed up? Shit - they still die on ships, and in cars and in planes. Space ships are going to blow up, crash, and fail. It's just life.
It's not a totally fair comparison to say, "Boeing can do it, why can't NASA and their contractors?"
With current technology, it seems obvious that if we want to (further) develop manned spaceflight, we should launch unmanned orbiters with crash-test dummies and telepresence surrogates, and keep doing so until we've established a safety record.
This is the question I've always had about the Challenger, but never heard addressed: How much of a factor was the crew's opinion considered to be? I strongly suspect that the astronauts themselves exerted informal but real pressure to prefer flying, since it would be their moment in the sun.
I did work on astronaut EVA training, however. In that case, the astronauts were king. No matter how silly the request, their wishes were catered to - but only in areas of usability, not safety. It turns out it's really difficult to connect wires in space, for example - small things matter a lot. I can remember at least one case where what they wanted was just plain dumb. But we did it anyway. Usually, though, they were pretty good about that sort of stuff.
Do you believe that we'd be better served by simpler (but potentially much more dangerous) craft flown by private industry? If not, why is the NASA approach better, seeing as how it is mired in both politics from without and bureaucracy/process from within?
Hell, even 20-50% safety isn't necessarily a productive strategy, if you pay too much for it.
I think the main trick is to reduce the unit cost/training investment in astronauts so that we can send up more, and making very cheap vehicles to get them there, so that we'll not have to worry about losing a big investment when (not if) something goes wrong. Putting hundreds of millions of dollars' worth of trained meat into billions of dollars' worth of aerospace tech is not sustainable.
If it would get me to the moon with a 1 in 3 chance, hell, give me a banana and call me Albert VII.
Space shuttles exploding will definitely kill people.
We're human. Fellow humans died in this pursuit of space exploration. Is it inevitable? Maybe. Probably.
Does it still hurt? Hell yes.
"Space shuttles exploding will definitely kill people."
Don't make such a big jump from the first to the second. You could have also said, "Extreme space exploration risk might or might not kill people." Fact is, in both finances and space exploration, in rare cases it has led to the death of those involved (directly or indirectly). But those cases are rare and the rewards are so great, and so we press on.
When the person making the rules (the government) is no longer subject to the rules, but held responsible when they're broken, do you think they'll get looser or stricter?
And now not only does your company have to obey safety laws or whatever else applies, you've got insurance companies, shareholder lawsuits, and so on.
The Challenger/Columbia astronauts are (correctly) treated as heroes who sacrificed for the greater good. The Space Inc. astronauts who die will be considered victims of corporate negligence.
Losing 7 lives is a tragedy. A CEO who knows that losing those lives will cost the company everything is going to pay more attention to safety than a government bureaucracy will.
In a sense, every shuttle was an X-plane - it was as much a research vehicle as a commercial transport to LEO. One builds shuttles to learn how to better build shuttles and, in order to do that, learn as much as possible.
BTW, I envy you (in a good way, of course) more than a little. Working for NASA is a really cool thing.
My view is that even if the crew were doomed, gathering additional information in advance of reentry would have allowed for a better understanding of circumstances, better post-disaster modeling of what went wrong and how the orbiter failed (both in the launch-time foam strike, and in the reentry heat-shield penetration and structural failure).
Whether or not to inform the crew is yet another decision. Astronauts are aware that theirs is a highly risky venture; with a low sample size the specific odds are somewhat uncertain, though on the order of 4:100 per human space flight. If you're going to go on a space mission, you'd better be prepared to die.
If NASA refrained from assessing strike damage on those grounds, I feel a grave error was committed.
What would really be interesting is why he "failed to make his case" according to executives.
Due to the short timescale to build their presentation, they re-used info from existing presentations. Unfortunately, the same info had previously been used to demonstrate why their o-rings were safe. The NASA people basically said, 'last time you showed me this graph it meant things were safe, this time it means things are dangerous. What gives?'
The decision makers were looking for excuses to move forward, and that gave them an excuse to ignore the warnings.
It just goes to show you that even when everyone is paying attention, things still go wrong. Some people heard him and made the call that it was still safe. They were wrong. That, unfortunately, is the state of the art today (or it was back in the 80's). The alternative is to stay on the ground.
But yes, shit does happen, and nobody climbing aboard the orbiter is under any illusion that it is a safe thing to do.
Nobody is forced into an orbiter - people BEG for the opportunity. We gave them that opportunity, working in good faith to the best of our ability. Sometimes it doesn't work out. Sometimes things break. Sometimes people screw up. We all know the risks.
You can sit on the porch with a near 100% safety record or you can give it a try. Your choice.
What I cannot understand is this: an unknown, unforeseen contingency is a completely different thing from an engineer pointing out "this WILL fail, it will blow up, and I have proof," and there really should not be any excuse for ignoring a warning like that. Yes, you cannot make it 100% safe, but you should at least aim to make it as safe as humanly possible given your current level of technology and knowledge. So, in my book, overriding an engineer saying "this WILL fail and it'll blow up" is actually negligent manslaughter. When I get into my car in the morning and don't care that the brakes aren't working even though my mechanic told me my brake lines were cut, what would you call that?
At the end of the day, if even ONE person demonstrates scientific or engineering knowledge that shows a serious safety concern, then why would you actively choose to ignore it? Period.
NASA management - whether it be by organisational process and/or personally identifiable decision making - failed in their responsibilities in spectacular fashion!
I do not believe that if someone knew with 100% certainty that Challenger would blow up that it would ever have launched. The trouble came in the judgment of that risk. In this case, from what I've read, they got it wrong - very wrong.
You can argue about how certain they have to be, or how negligent people were to ignore estimated failure probabilities of whatever magnitude. But it's not like someone says, "this will blow up 85% of the time, period. Make a call." It's more subtle, complex, and less concrete than that.
1. Note that this is not equivalent to "if it blew up, they got it wrong." Sometimes the small, properly calculated risk blows up on you just because you're unlucky - which is different from a miscalculated risk blowing up on you.
O-rings are supposed to seal on compression, not expansion.
As it is now, the O-rings are getting blown out of their tracks but still managing to seal the whole assembly quickly enough.
The above unplanned behavior, which is the only thing preventing a hull loss (and a crew loss, since there's no provision for escape), is sufficiently iffy that sooner or later we're likely to run out of luck.
(I'd also add about the Columbia loss that NASA had a "can't do" attitude towards the problem they observed of the foam hitting the wing. Hardly a "crew first" attitude.)
When I visit a NASA center, there are posters up all over saying "If it's not safe, say so." Part of the reason for the 2-year grounding of the Shuttle fleet, post-Challenger, was to put in place a stronger culture of safety at NASA.
A commenter mentioned the false dichotomy of "engineers vs. managers". It's a hard call, as an engineer, to disappoint a manager (or a whole line of managers, all the way up) with a call to solve a possible problem. Civil engineers may be more used to this sort of accountability.
[...] "We all knew what the implication was without actually coming out and saying it," a tearful Boisjoly told Zwerdling in 1986. "We all knew if the seals failed the shuttle would blow up."
Armed with the data that described that possibility, Boisjoly and his colleagues argued persistently and vigorously for hours. At first, Thiokol managers agreed with them and formally recommended a launch delay. But NASA officials on a conference call challenged that recommendation.
"I am appalled," said NASA's George Hardy, according to Boisjoly and our other source in the room. "I am appalled by your recommendation."
Another shuttle program manager, Lawrence Mulloy, didn't hide his disdain. "My God, Thiokol," he said. "When do you want me to launch--next April?"
These words and this debate were not known publicly until our interviews with Boisjoly and his colleague. They told us that the NASA pressure caused Thiokol managers to "put their management hats on," as one source told us. They overruled Boisjoly and the other engineers and told NASA to go ahead and launch.
"We thought that if the seals failed the shuttle would never get off the launch pad," Boisjoly told Zwerdling. So, when Challenger lifted off without incident, he and the others watching television screens at Thiokol's Utah plant were relieved.
"And when we were one minute into the launch a friend turned to me and said, 'Oh God. We made it. We made it!'" Boisjoly continued. "Then, a few seconds later, the shuttle blew up. And we all knew exactly what happened."
I guess it's just worth remembering that even if you're under pressure to ship, launch, or publish, if the guys whose job it is to know tell you to reconsider, you probably should.
(Another problem probably is communication between engineers and (project) managers - you cannot be too blunt here... if the worst case is complete data loss, then you paint them that picture in no subtle terms; if the worst case is catastrophic equipment failure, explosions, the launch pad annihilated, and several lives (including the millions spent training them) lost, then that's what you tell them and document in very direct terms.)
It is also why it is a bad idea to make policy changes based strictly on a single failure, and that is where things like commissions should help: you can move the focus to examining the entire problem set for weaknesses instead of just leaving it at "make a better O-ring".
Space is dangerous. We should stop pretending it can be made "safe". It just gives politicians something to wag their tongues at when something inevitably goes wrong.
The problem here is that NASA is a political agency, not a scientific one. Each year, elected politicians sit down and decide how much they're going to get.
This provides thoughtful perspective on policy trade-offs. As Thomas Sowell has written, "The first lesson of economics is scarcity: There is never enough of anything to satisfy all those who want it. The first lesson of politics is to disregard the first lesson of economics."
What if, by some stroke of luck, Challenger had not exploded on that launch? What would have happened to righteous people such as Roger? Condemned by the people around them as over-worrying, insane, self-righteous people who thought they knew everything. If you get what I mean...
It's sad that the Challenger explosion happened, but at the same time it helps to highlight an important issue which may otherwise remain buried.
In a software/web development context, it's of course harder to say "this thing is going to blow up," because serious technical debt usually only bites after a much longer time, by which point the people responsible may have left, leaving the next victim to clean it up. It's a similarly sad state with final-year projects, where students who try to do everything that could impress the graders on the outside get the top grade, while students who make the extra effort for a clean and maintainable backend do not, because the lecturer only looks at the outside during the presentation.
It's quite fascinating and pretty eye opening.
One thing is to believe that the chance of blowing up is ~1% (which is enough to prevent launch).
Another thing is to be certain that it WILL blow up (I assume 90%+ probability here).
If he was so certain, why could he not convince his management?
Both of those predictions turned out to be extraordinarily accurate.
Hopefully no loss of life ever...