We don't know how those who were on the wrong side of the issue coped, and they should not be pressured to make public anything beyond what the inquiry required, but it seems plausible to me that those who could persuade themselves that they made the right decision, given the circumstances and despite the outcome, probably fared best. That is usually the case.
Edward Tufte tried to suggest that Boisjoly could have presented his case more effectively. Tufte may have been thinking purely pedagogically, but regardless, the implied criticism was unjustified, as Boisjoly's point should have been clear to anyone familiar with the issue, and in fact it was clear to quite a few, though unfortunately not to the few who mattered, and I doubt that, for them, a different presentation would have made a difference.
We can't always be right, and we can't all be heroes, but I hope we can all avoid being the person who said to Boisjoly, when it appeared that Boisjoly's testimony might be fatal to Morton Thiokol, that he would leave his children for Boisjoly to raise if he lost his job.
what was the communication issue here, besides the obvious "the executive is deaf"?
From Shuttle disasters to Diesel scandals to Data breaches.
The obvious solution to this is to blame the engineers. Engineers must try harder. If management still doesn't agree, well, then you are not trying hard enough. Remember the rule: they are never wrong; if you fail to convince them, it's your mistake, because you didn't convince them otherwise.
Hence a 'communication problem', which implies management is mostly innocent and engineering didn't do its job well enough.
This absurd rule exists for a simple reason.
Who is supposed to audit these disasters and arrive at a root cause?
(Yes I know, it's awful that people have to do this, in a perfect world things like this would be unnecessary, etc. When you find a perfect world, call me, I would like to buy some real estate.)
Please note that this scenario arises only after the disaster happens. These people have to make a decision before that. There is no concept of blame before a disaster, only accountability, for the simple reason that the disaster has not happened yet.
Now, if a room full of engineers says NO in consensus and you still go ahead and say YES, then either you know something they don't, in which case you must state what that information is and have it reviewed by the engineers who lacked it. Then vote again and make the YES/NO decision appropriately. Repeat until there is consensus between you and them. If consensus is never reached, take complete responsibility for making that decision, because only you know why you are making it.
Decisions this crucial are not left to gut feeling and Rambo level on-the-field thinking. There is a reason you have a panel of experts/engineers sitting.
You don't make any decisions in reality. You only iterate the process of arriving at YES/NO once all the data and scenarios are reviewed.
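The iterate-until-consensus rule described above can be sketched in code. This is a minimal illustration with hypothetical names, not anyone's actual procedure:

```python
def consensus_decision(panel_vote, executive_vote, undisclosed_info, rereview):
    """Iterate the process described above: the executive discloses anything
    the panel doesn't know, the panel re-reviews and re-votes, and this
    repeats until the votes agree. If they never do, the override belongs
    to the executive alone."""
    while executive_vote != panel_vote and undisclosed_info:
        info = undisclosed_info.pop()
        panel_vote = rereview(info)  # panel re-votes with the new information
    if executive_vote == panel_vote:
        return panel_vote, "shared responsibility"
    # Nothing left to disclose and still no consensus: the executive may
    # override, but then carries the responsibility entirely.
    return executive_vote, "executive's responsibility alone"
```

With a unanimous NO from the panel and no withheld information, a YES that launches anyway is owned entirely by the executive.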
Situations like these happen when managers/administrators think they are kings with a veto to override decisions at whim.
Even at the very highest levels, including those of Prime Ministers and Presidents, governments are largely run by expert panels and commissions, with politicians only giving policy direction and managing things based on recommendations.
The executive needs to be able to determine if the current "no" is "more no" than the usual "no". That might not exactly be straightforward depending on how dysfunctional the relationship between management and engineers had become.
In contrast, lawyers freely play with words as if they have no meaning, because they will claim, "In our legal system, justice is the responsibility of the jury. We lawyers merely serve as opposing advocates. Thus we are obliged to confuse and mislead the jury toward our desired ends, using whatever verbiage / argument best serves our client."
That's why I'm an engineer.
Had they seen the engineers' actual answer of "NO" on the traceability report, NASA would have stopped right there, treated it as a blocker, and pushed hard.
Having execs make safety-critical engineering decisions is a way to kill people. Hopefully NASA now forces this rule on all their contractors and internal staff.
This is completely unsupported by the extensive investigation and reports of the catastrophe. NASA simply would not accept NO, and all evidence available says they would have found a way to justify ignoring any weather + seals evidence that would have stopped the launch.
Were the safe conditions for use specified in the contract? Or was the breakdown in communication long before the launch day? NASA expecting an all weather booster, and the engineering firm producing a booster which would be unsafe in freezing conditions?
To be clearer: the hubris of Titanic wasn't technological, it was operational. It was going too fast, in an ice field, with a lack of lookouts; beyond this, it didn't reduce speed after ice was spotted.
Yes, there were issues: damage to such a small portion of the ship (less than 1/3) shouldn't have caused it to sink (I'll note that many other ships had similar or worse flaws than Titanic did).
The lifeboats, on the other hand (the most commonly cited issue), are not a real issue. Titanic started launching boats about 25 minutes after it struck the iceberg, which is about how long a damage survey would take. It had not launched all 20 boats by the time the forward list became so great that they were unable to launch more; this doesn't even touch the multitudes below decks, many of whom didn't speak English or couldn't be moved up to the boat deck in a timely enough fashion.
I'd also like to share something I dug up from the internet:
"As far as I can determine, Titanic is the single example of a passenger ship that sank with decks level. It was probably the only time in history when lifeboats could be launched from both sides simultaneously. The Andrea Doria sinking is a more typical event. In that case, half of the lifeboats were rendered useless by the cant of the decks. Given the testimony that Titanic was "lolling" as it foundered (listing from side to side), it is highly probable that carrying sufficient lifeboats would have raised the center of gravity sufficiently to have caused a permanent list early in the evening. In other words, the weight of those extra boats might have created a situation in which they could not have been used, anyway.
And, it is conveniently overlooked that from the moment when launching boats became necessary until the moment when it became impossible was not long enough to launch the 24 boats the ship carried. If it had carried more boats, they could not have been filled and launched properly. This is not to say that some people might not have used the un-launched boats to survive--just to point out that more lifeboats was not the answer to saving everyone aboard Titanic.
The only way to save everyone on any passenger ship is to not let it sink. That's the key, not lifeboats.
White Star and the British Board of Trade had an embarrassing situation on their hands. They had lost the world's largest ship...not to mention some 1,500 irreplaceable souls. They needed an issue to divert public attention from the real problems behind Titanic's foundering. For instance the lack of lookout, the speed, and the fact that the ship was already well into the ice when it struck. Why wasn't the conduct of the voyage changed at 10:00 p.m. instead of after the accident? Or, what was wrong with the design of the ship that allowed damage to the bow to ultimately claim the whole ship? These issues go to the heart of the matter, but public attention was easily diverted by the lifeboat issue. It was a diversion that worked so well that the lack of lifeboats has become almost the only safety issue ever discussed."
I am, however, quite skeptical of the claim in that link that Titanic would have remained afloat for 6 hours had there been no watertight compartments. Various marine architects have done simulations and have determined that the ship would have capsized or rolled over onto its side well before that point. Beyond that, the ship would have lost lighting between 45 minutes and one hour after the collision, creating a panic situation as well.
burning more coal than necessary -> high speed
coal fire -> burning more coal
Didn't they declare it to be "unsinkable" due to the technology they employed? That's hubris.
The term was used in a publicity brochure by WSL.
I work on complex systems. I design them to be failure-proof; that doesn't make them so.
From the article.
"I am appalled," said NASA's George Hardy, according to Boisjoly and our other source in the room. "I am appalled by your recommendation."
Another shuttle program manager, Lawrence Mulloy, didn't hide his disdain. "My God, Thiokol," he said. "When do you want me to launch — next April?"
They told us that the NASA pressure caused Thiokol managers to "put their management hats on," as one source told us. They overruled Boisjoly and the other engineers and told NASA to go ahead and launch.
That's the tribal impulse right there. The one that says: screw the rest of the world, what matters is us and our in-group. Circle the wagons. Be a team player. Prize loyalty.
Maybe the type of person who bends this way has something valuable to contribute during their time on this earth, but often I struggle to see just what that might be.
Designing processes to appropriately address such concerns seems to hinge on the answer.
[edit for grammar]
The Challenger disaster is a commonly used case study in engineering classes, especially when discussing engineering ethics or communication with clients/managers.
This was not related to the Challenger disaster in any way; I mention it just to show that the ideas of the report are applied more broadly.
> The Challenger disaster is a commonly used case study in engineering classes, especially when discussing engineering ethics or communication with clients/managers.
Yes, and so is the Mars Climate Orbiter to some extent, though there are no ethical questions there.
My understanding is that it was basically zero: there was no instance where technical experts raised a concern of this magnitude (remember that Boisjoly explicitly said in a memo the previous July that there was a risk of loss of the spacecraft and loss of life) that was not valid.
> Were the NASA administrators bombarded with such concerns for every launch, or were reports a relatively rare occurrence?
Critical flight risks were reviewed before every launch, but the number of them at any given time was small: AFAIK the SRB O-ring issue was the only critical flight risk before the Challenger launch.
Another crucial piece of back story that isn't often mentioned: Boisjoly and the other engineers at Thiokol had already tried, the previous summer (after the July memo that Boisjoly wrote) to get all shuttle flights stopped until the SRB issue could be understood and fixed. NASA refused. So when they were trying to convince NASA not to launch the Challenger, the night before the flight, they were already handicapped because of that previous response by NASA.
Is that proper aerospace lingo, like "controlled flight into terrain"?
There's an equivalent term for CFIT too: lithobraking. It's like aerobraking (using a planet's atmosphere to shed some speed), except with rocks.
I could cite an incident where an engine (on my plane) threatened to tear itself and the wing to bits because combustion became unbalanced, causing a vibration that, had I not shut the engine down, would have led to a rapid unplanned disassembly too, and likely would not have ended as well as the preventative shutdown did.
Nobody understood exactly at the time what was causing the F1 engine instability. They were reduced to trying random things and hoping it worked. The mathematical models to figure it out were not developed until years later.
The schedule was rushed out of fear the Soviets would beat them.
Don't mistake the success of the gamble as the same as the risk was low and understood. NASA gambled big time, and they were very lucky.
I'm not saying they were wrong to take the gamble, either. But they knew the engines might blow up and launched anyway.
Hm, interesting, I didn't realize the state of knowledge was that limited at the time.
> Don't mistake the success of the gamble as the same as the risk was low and understood.
I didn't mean to imply that the risk was low. You're right that it wasn't (and you've clarified that it wasn't well understood either).
It still seems like a different decision process from the one that led to the Challenger explosion, though. In the Saturn V case, it seems like NASA made an open-eyed decision to gamble with everyone involved being aware of the risk and on board with it. In the Challenger case, NASA basically refused to properly evaluate the risk in the first place.
Here, the problem is not the engineers saying no. It is that, for some reason, administrators weren't able to properly discern trivial matters from life-threatening ones.
That suggests a very serious flaw in the overall capability, seriousness and knowledge of the project of the person making that decision.
The administrators are not that stupid. Reading the article carefully, these people were worried about the launch being delayed by a year, which subsequently raises the question of how that would work out for their careers.
Also, what is the downside for these people? Say the Shuttle crashes; they are not going to be billed $2 billion or charged with manslaughter. Heck, all they get is a job transfer in the worst possible case (https://en.wikipedia.org/wiki/Linda_Ham#Columbia_disaster_an...).
If there are literally no repercussions for making wrong decisions, you have now incentivized the administrators to make whatever decision works best for them personally.
No amount and quality of engineering reports are going to work from here.
Engineers should be qualified to carry responsibility for their own work.
Engineers may be best qualified to quantify and estimate the risks. That does not mean they are best placed to judge whether the risks are acceptable. It is fair to vest that responsibility in another role.
Real world examples include, for software engineers, risks due to security issues or reliability issues. The business (perhaps a non-technical CEO) has to decide the prioritization of effort and funds on those items that will be acceptable to them to meet business goals while maintaining an acceptable level of risk. Not line engineers.
As a many-time CTO, several time CEO, several time Chairman, I have had plenty of promotions but in the end you always do what your boss tells you (or resign). :)
EDIT: Essentially "socialism" is an economic status, while Authoritarianism deals with social freedoms.
Historically most "communist" societies are also Authoritarian, but that's not always the case.
And as a side note: we have to somehow learn to deal with "inevitable human error and conflict of interest" in events where failure near population centers might be catastrophic. We either have to, or give up on a lot of progress. Quite a lot of interesting technologies and research areas involve managing higher and higher energies.
There is also a false dichotomy presented here, because the alternative to nuclear is not limited to fossil fuels. There are numerous forms of clean energy that don't have the failure modes of nuclear.
On the other hand, if you look at actual fatality statistics, you can make a fairly strong case for wind and solar being far deadlier than nuclear (hydro is pretty close though). Just because solar and wind often involve installing things high up and people fall.
Nuclear has the potential to create a whole no-go area around it. And I have very little faith in us humans as a wider group to learn to deal with conflict of interest. I believe mitigation is better. Widespread solar and other distributed tech like that might not hold the raw promise of nuclear but I hope it will give us a more robust energy system in the world, harder to disrupt either by malice, conflict of interest or by accident.
Personally I am in love with things like nuclear fission powered rockets (nuclear engines could be started from bases in high orbit to avoid spewing dirty stuff into the atmosphere), submarine derived nuclear mini electric generator stations towed to coastal cities, fusion research and all that. I hope humanity in the long run will make use of such things...
Nuclear is somewhat special because radiation feels like magic (an invisible killer), but I'd like to point out that we already deal with lots of comparably dangerous large-scale endeavours like this. Consider hydroelectric plants and large chemical plants. Both create as much danger of a "no-go area" as nuclear plants. A few times, accidents did happen, and people died (in the case of hydro, a lot of people) and ground became uninhabitable. But mostly, we manage all of them well worldwide.
I don't see how nuclear power plants could create extra in-group conflict of interests beyond what we already know to handle both in nuclear, and other industrial processes. Humans suck, but not that hard.
> Widespread solar and other distributed tech like that might not hold the raw promise of nuclear but I hope it will give us a more robust energy system in the world, harder to disrupt either by malice, conflict of interest or by accident.
That's a good point and I'm sympathetic to it, but my worry now is twofold: long-term, I'm not convinced solar/wind will give us enough energy (note how progress of mankind is mostly tied to commanding more and more energy). I'd love to see fusion working in particular. Short-term, due to unsuitability of solar/wind for base load and current lack of battery technology that would compensate, I'm not convinced solar/wind will be enough.
(Also, I don't understand why it's always nuclear vs. solar/wind, instead of nuclear+solar/wind vs. coal&gas. Ecological activism has weird priorities.)
source: from 2011 phys.org - Why nuclear power will never supply the world's energy needs
The amount of available Uranium is disputed of course, but I could not find sources about significant new finds. Canada is still the largest provider.
So "long-term" isn't something nuclear power excels at. Yes, yes, Uranium is found in traces everywhere. But economically extracting it for energy production is a different matter. Demand is higher than production even today.
What really would help us would be a good energy sink to store power produced by water/wind/sun.
The article from phys.org you quote doesn't show its math much; I'll try to track down the proceedings it references, as I'm genuinely surprised by the difference with what's in "Without Hot Air".
 - https://www.withouthotair.com/c24/page_162.shtml
Worth noting that 17 Chernobyl-style RBMK reactors were built and have largely operated without incident. Lithuania was entirely powered by two such reactors until they were shut down in the 2000s, to be replaced by fossil fuels, on the grounds that they lack containment buildings. The EU ponied up nearly a billion Euros for the decommissioning, which I can't help but notice is almost as much as the new Chernobyl sarcophagus. Might it have been cheaper and better to just... build a containment building? Perhaps. But there would never have been the political will for that.
The fear of nuclear is doing more damage than nuclear ever did.
Wind/solar don't fit the demand curve, but they crush nuclear from a cost perspective. This is making nuclear non-viable, as the remaining gap ends up looking more random, which is the worst place for nuclear power.

Basically, if wind/solar is 50% cheaper than nuclear, having a 30% oversupply of wind/solar still costs less, resulting in zero need for other power most of the time. In large power grids, hydro can then fill most to all of the gaps. Nuclear ends up competing with storage, but its cost per kWh goes up when it's spending most of its time off.
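The arithmetic behind that claim, with hypothetical round numbers (a sketch to illustrate the argument, not real prices):

```python
# Hypothetical normalized costs, just to illustrate the oversupply argument
nuclear_cost = 1.00        # cost per kWh of demand met by nuclear (normalized)
renewable_cost = 0.50      # "50% cheaper than nuclear"
oversupply_factor = 1.30   # build 30% more wind/solar capacity than demand

# Even paying for 130% of demand, the renewable bill is still lower
renewable_bill = renewable_cost * oversupply_factor  # 0.65 per kWh of demand
assert renewable_bill < nuclear_cost
```

The oversupply buys you fewer hours where the gap has to be filled by something else, which is the trade the comment describes.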
It's less than two-thirds (<1bn vs. 1.56bn), and the new sarcophagus is (hopefully) the last of a series of confinement projects spanning a century (i.e., there were costs before it), while the decommissioning reduces these two reactors to mostly nothing, where the remaining troublesome radioactive material can be stored long-term together with any spent fuel (once there's a proper solution for that, which we need, decommissioning or not).
I say, play to our strengths. Massive roll out of solar now - keep nuclear on the table (it's good tech), but don't rely on it to solve a substantial part of short to medium term future energy need. I also think we should plan for the event of "Global Thermonuclear War". Solar deployments and other renewables will fare much better. You can count on any centralized structures being targeted. That means (large) nuclear power plants and large hydro installations.
You’re accepting a real threat (global warming) to sustain an imaginary one (nuclear meltdowns of modern plants).
This deserves a lot of upvotes
Why is it that everyone says the area will be unusable for such a long time, when the Nagasaki and Hiroshima areas are usable?

Is there a different half-life for those nukes than the ones we have today?
Air-burst bombs don't produce as much dangerous fallout as ground-bursting ones. Ground bursts cause debris on the ground to be pulverized into dust and transmuted into radioactive isotopes. Nuclear power plants, meanwhile, accumulate nasty radioactive products in their spent fuel over time.
What I was alluding to above, though, was that in the event of total war, what the enemy (regardless of side) will target is large centralized infrastructure, such as power plants, including nuclear ones. So if we have to rebuild in the aftermath, the more we relied beforehand on centralized power plants like nuclear, the harder it gets, since these will have been bombed to craters.
In this scenario, I'm not even considering fallout - just how easily we could get any infrastructure up and running again.
If you want to be really nasty, you can use something like cobalt to blanket your secondary, cobalt having isotopes with fairly long half-lives that are still rather potent.
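For scale, cobalt-60 has a half-life of about 5.27 years, and the decay arithmetic is simple exponential:

```python
HALF_LIFE_YEARS = 5.27  # cobalt-60

def fraction_remaining(t_years):
    """Fraction of the original activity left after t_years of decay."""
    return 0.5 ** (t_years / HALF_LIFE_YEARS)

# After one half-life, half remains; after ten half-lives (~53 years),
# less than a tenth of a percent remains.
```

That's long enough to deny an area for decades, but short compared to, say, the multi-century persistence of some fission products in spent fuel.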
Fortunately, nobody has built weapons like this.
The cynic in me says yet.
As a species some of us are really fucked up.
From what I was taught anyway.
That's a somewhat different scenario (scientists saying "yes, but" and administrators saying "hell, yes!"), but it's still pretty interesting to see what the push for nuclear power (eg. France's strategic reliance on it, UK's new reactor that is dead on arrival economically, or Iran's interest in obtaining the means) is really about.
On the other hand, China and India apparently were looking into building systems that are safer with no weapon grade by-products.
The "administrative" picture is not so rosy! Of course, the history of atomic energy in the US is replete with examples of lies told to the public to obtain "consent" - exhibit A, the Nevada Test Site and the "downwinders" . These lies were told to citizens by the AEC, the predecessor agency of the NRC, which now regulates nuclear power in the US.
A review of a recent book  by a former member of the NRC indicates that, in the view of some, administrators rolling over for commercial operators of atomic plants remains standard operating procedure. This sounds very similar to the problem with the Challenger shuttle:
"The political infighting was particularly intense after the 2011 Fukushima disaster. Jaczko visited Japan and grew impatient with the “litany of guarantees” from industry about American nuclear facilities. He tried to insist on new requirements to mitigate accidents triggered by natural disasters such as floods, earthquakes, and tsunamis. One internal NRC report drafted after Fukushima criticized the practice of relying on voluntary industry initiative to address safety concerns. Jaczko's descriptions of other commissioners' attempts to quash or edit the report provide a disturbing glimpse of the dynamic of trust and betrayal within the agency."
Linda Ham, the manager who rejected these requests, left the space shuttle program after the Columbia disaster and was moved to other positions at NASA.
As with the final Columbia mission, attempts to study the damage were thwarted (though in this case on specious security grounds, as this was a military mission.)
That's a decision I don't understand. Would taking the pictures have been a big change in flight plan or something like that?
Secondly, there was a strong prevalence of the idea that, even IF they had found something, nothing could have really been done, which was probably true according to the post-mortem.
Either way, I'm highly dubious that some of the world's most competent scientists, engineers and military people couldn't work this problem and find a solution. NASA finding damage and having a few days to figure out how to fix it sets the stage for another Finest Hour. Instead, we have tragedy and disgrace.
That would be a pretty weak excuse and only applicable in hindsight. I have read that depending on the size of the damage they could potentially have changed the descent profile to put more load on the other wing. To decide this you first need the data.
Business schools use variations of the Challenger launch as a case study in group decision making and organizational behavior. I experienced one while at business school. The crucial parts of Challenger were applied to another scenario. Risk and safety were brought up in discussion but were outlier considerations by the group. The group agreed to proceed with the plan.
EDIT: In response to the suggestion, below I've removed the reference to actual case study, to allow future students to enjoy it as much as I did.
I was the only dissenter in my class. You and I should connect.
In particular, they showed the data points of failed launches; you could hardly derive a correlation between temperature and failed launches from those. If you add the data points of all the successful launches, you can clearly see that successful launches happened at higher temperatures. Not a single student who has taken that class would've allowed that launch to go ahead.
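To make that concrete, here is a sketch with illustrative numbers (not the actual NASA data): restricting attention to the damaged launches washes out the temperature correlation, while including the clean launches makes it much starker.

```python
def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no external libraries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Illustrative data: (launch temperature in F, O-ring incident count)
damaged = [(53, 3), (57, 1), (63, 1), (70, 1), (70, 1), (75, 2)]
clean = [(t, 0) for t in (66, 67, 67, 67, 68, 69, 70, 72, 73,
                          75, 76, 76, 78, 79, 81)]

r_damaged_only = pearson(*zip(*damaged))             # weak correlation
r_all_launches = pearson(*zip(*(damaged + clean)))   # clearly more negative
```

The point is not the exact coefficients but the comparison: throwing away the successful launches discards exactly the evidence that low temperature was unusual.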
It's worse because so far the few really bad accidents caused by "self driving" cars would have been stopped if a human was in control (see: Tesla + truck accident).
Also not widely known was that she was gay. She had to conceal that, too, from NASA. She died young of pancreatic cancer.
Kutyna: On STS-51C, which flew a year before, it was 53 degrees [at launch, then the coldest temperature recorded during a shuttle launch] and they completely burned through the first O-ring and charred the second one. One day [early in the investigation] Sally Ride and I were walking together. She was on my right side and was looking straight ahead. She opened up her notebook and with her left hand, still looking straight ahead, gave me a piece of paper. Didn't say a single word. I look at the piece of paper. It's a NASA document. It's got two columns on it. The first column is temperature, the second column is resiliency of O-rings as a function of temperature. It shows that they get stiff when it gets cold. Sally and I were really good buddies. She figured she could trust me to give me that piece of paper and not implicate her or the people at NASA who gave it to her, because they could all get fired.
Kutyna: I wondered how I could introduce this information Sally had given me. So I had Feynman at my house for dinner. I have a 1973 Opel GT, a really cute car. We went out to the garage, and I'm bragging about the car, but he could care less about cars. I had taken the carburetor out. And Feynman said, "What's this?" And I said, "Oh, just a carburetor. I'm cleaning it." Then I said, "Professor, these carburetors have O-rings in them. And when it gets cold, they leak. Do you suppose that has anything to do with our situation?" He did not say a word. We finished the night, and the next Tuesday, at the first public meeting, is when he did his O-ring demonstration.
We were sitting in three rows, and there was a section of the shuttle joint, about an inch across, that showed the tang and clevis [the two parts of the joint meant to be sealed by the O-ring]. We passed this section around from person to person. It hit our row and I gave it to Feynman, expecting him to pass it on. But he put it down. He pulled out pliers and a screwdriver and pulled out the section of O-ring from this joint. He put a C-clamp on it and put it in his glass of ice water. So now I know what he's going to do. It sat there for a while, and now the discussion had moved on from technical stuff into financial things. I saw Feynman's arm going out to press the button on his microphone. I grabbed his arm and said, "Not now." Pretty soon his arm started going out again, and I said, "Not now!" We got to a point where it was starting to get technical again, and I said, "Now." He pushed the button and started the demonstration. He took the C-clamp off and showed the thing does not bounce back when it's cold. And he said the now-famous words, "I believe that has some significance for our problem."
The program was such a disaster, by every conceivable reckoning; using the money wasted on that white elephant, we could have had, thirty years ago, what Elon Musk is just finally getting around to, and (as a bonus!) without Elon.
Otherwise one person achieving what Musk has is just as good as any other. Just as long as they are stable enough to not pose a risk to the mission.
https://retromat.org/en/?id=130 is an example of this. Trust is the most important thing about teamwork imo.
If you trust each other there is no need for anonymity.
If you forgot how different things were back then, here are a few highlights:
Thiokol spokesmen ''consistently and falsely'' portrayed Mr. Boisjoly as ''a disgruntled or malcontented employee whose views should be discounted and whose professional expertise should be doubted,'' the suit said. It cited press interviews in which Thiokol spokesmen labeled Mr. Boisjoly a ''tattletale'' and an ''impatient'' employee who tried to hire subcontractors in violation of a contract.
Roger stood his ground and paid dearly for it, kudos to him for having some integrity.
Tufte made a number of errors, including the sorts of errors Tufte might freak out about:
The other temperatures Tufte lists on Table 1 are of the ambient air at time of launch. Tufte has mixed apples and oranges.

Tufte thus has both coordinates on the scatterplot wrong. The vertical axis should be "blow-by", not "O-ring damage", and the horizontal axis should be "O-ring temperature", not a mixture of O-ring temperature and ambient air temperature.
I don't know if he ever offered a response to this paper.
Maybe he didn't offer a response to this paper because the fundamental design approach is correct, albeit the actual numbers need refinement?
But, as the response that justin66 linked to shows, Tufte's "improved" chart is incorrect on both axes: its "temperature" axis wrongly conflates ambient air temperature with O-ring temperature, and its "damage" axis wrongly conflates erosion with blow-by. (See comment at end.)
The reason why the charts shown by the Thiokol engineers did not communicate a "correlation between temperature and damage" was, as the Boisjoly response makes clear, that at the time (the night before the Challenger launch), nobody knew what that correlation was. The argument the engineers were making was not "the risk of O-ring failure increases with decreasing temperature". It was "since we don't understand the root cause of the O-ring issue, we should not launch at any temperature outside the previous range of launch temperatures". The lowest previous launch temperature was 53 F; the temperature on the morning of the Challenger launch was 29 F.
Why did the engineers not know the correlation between temperature and damage? Because the data they had at the time was inconclusive and incomplete (for example, they did not even have complete data on the ambient air temperature and the O-ring temperature for every previous launch--a point Tufte overlooks), and their attempts to obtain more data had been mostly unsuccessful.
And why were the engineers reduced to making what is, on the surface, a fairly weak argument the night before the Challenger launch? Because, as I noted in another comment upthread, they had already tried, the previous summer, to get NASA to stop all Shuttle flights until the O-ring issue could be properly understood and fixed, on the grounds that with it not fixed, every Shuttle flight had a significant risk of loss of vehicle and loss of life (see further comment below on this). And NASA refused. So the engineers that night already knew they were dealing with a NASA management that was simply ignoring a critical flight risk; therefore, arguments of the form "this is a critical flight risk we don't understand, so we shouldn't launch" were out of bounds, since they had already been tried and had failed. The engineers were simply trying to do the best they could to get at least some Shuttle flights stopped, and making the best arguments they could to do that, against the background of their much better argument for having all flights stopped having already failed.
A further comment on what I said above, that every Shuttle flight was a significant risk with the O-ring issue not understood. Tufte's assumption that the root issue was in fact a "correlation between temperature and damage" was wrong. It is true that the data from previous launches showed more "witness events" (evidence of either erosion or blow-by) at lower temperatures. But the engineers also had test stand data showing that under some conditions, the O-ring joints were failing to seal at any temperature below 100 F! (And there was at least one flight that showed blow-by, i.e., evidence of the O-ring joint failing to seal, at 75 F.) So the problem wasn't "the joint is OK at higher temperatures but unacceptably risky at lower temperatures". The problem was "the joint is unacceptably risky at any temperature below 100 F"; the fact that it was more unacceptably risky at lower temperature than higher was a relatively minor detail. But NASA had already refused to listen to that argument.
And a final brief comment on erosion vs. blow-by. Erosion is damage to the O-ring due to hot gas eroding part of it while the O-ring is sealing the joint. Blow-by is hot gas going right past the O-ring because it is not sealing the joint. The O-rings were designed to tolerate a certain amount of erosion, based on the expected temperature of the gas and the duration of the burn; so erosion, in and of itself, was not evidence of a problem not anticipated by the design. But the O-rings were not designed to not seal at all: not sealing was a failure of the design. So blow-by was direct evidence of a failure of the design. That's why conflating the two is wrong: blow-by is the problem, not erosion. (It's true that an O-ring that has hot gas blowing by it because it's not sealing will also have erosion; but blow-by without erosion is still a problem, while erosion without blow-by is not. So blow-by is the indicator that should be focused on when assessing the risk level of the O-ring joint design.)
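The erosion/blow-by distinction above amounts to a filtering rule for the data: the risk indicator worth charting is blow-by (a design failure) against O-ring temperature, not a conflated "damage" score against ambient temperature. A minimal sketch of that rule, using made-up placeholder launch records (not the real flight data):

```python
# Illustrative only: these launch records are made-up placeholders,
# NOT the actual Challenger-era flight history.
launches = [
    {"flight": "A", "oring_temp_f": 66, "erosion": True,  "blow_by": False},
    {"flight": "B", "oring_temp_f": 75, "erosion": True,  "blow_by": True},
    {"flight": "C", "oring_temp_f": 53, "erosion": True,  "blow_by": True},
    {"flight": "D", "oring_temp_f": 80, "erosion": False, "blow_by": False},
]

# Erosion alone was anticipated by the design, so it is not the signal.
# Blow-by (the joint failing to seal) is the event to plot against
# O-ring temperature when assessing the joint's risk.
blow_by_events = [(r["flight"], r["oring_temp_f"])
                  for r in launches if r["blow_by"]]
print(blow_by_events)
```

With the placeholder records above, only flights B and C would appear on the corrected chart; flight A, which showed anticipated erosion but sealed properly, would not.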
The Challenger disaster was in 1986, while PowerPoint 1.0 shipped in April 1987.
The problem is that the people who had the power to stop the launch were managers dealing with other concerns. Engineers should have the power to stop the launch.
And with principles like Agile and Lean, engineers are fortunately increasingly getting empowered to stop a launch if they feel it would be irresponsible. I hope NASA now uses these sorts of principles too.
Of course rockets are far bigger with much higher stakes than cars, but nothing about that justifies taking engineers out of the loop.
There is a real lack of leadership in the world, at all levels of society. It's not something an ideal like agile can fix. Without good leaders we end up with broken agile. I'm pretty sure NASA has a good engineering process. It doesn't matter if nobody listens.
In this particular case, there was a person, an expert, who saw a very real danger and was overruled by people who lacked his expertise because his opinion was inconvenient to them. That's not good. If an expert sounds the alarm, you listen to him, no matter how inconvenient it may be.
And in Agile and maybe more explicitly in Lean, the people who do the work are empowered to make decisions regarding their work. Responsibility shouldn't lie entirely with people who may be distracted by other concerns.
It's OT, but isn't it funny that we think that by applying, in exactly the correct way, a method that (to my knowledge) wasn't born from fundamental principles or hard science, you should get all the results you don't get when taking some liberties with it?
I mean, where does the assurance that the method works better than any variation of it come from?
The engineers shouldn't have that responsibility. The higher-ups would just say "why didn't you stop it??".
The question is relevant because I’ve read conflicting sources about whether the damage to the O-ring caused by the freezing temperature was permanent. If the temperature permanently compromised the O-ring material, then a delay wouldn’t have saved the Challenger, only a disassembly of the boosters would have. However, if the O-ring performance would have recovered when brought back to a normal temperature range, then a delay could have prevented the disaster. Does anyone have any definitive sources on this?