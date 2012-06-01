I grew up with stories of the Challenger after my father - a statistician - and two of his co-authors were selected by the National Academy of Sciences to study whether the danger could have been predicted beforehand. They showed that the likelihood of failure was 13% at the launch temperature, but would have been negligible if NASA had waited just a few hours. (His co-author, Ed Fowlkes, was dying of AIDS at the time - and considered this paper one of his life's great achievements.)
Bad statistical inferences were a huge part of the launch story, and you can see more in Richard Feynman's critiques:
https://en.wikipedia.org/wiki/Rogers_Commission_Report
Secondly, the effect I highlight (a biased data sample) is a key issue with news/social media - and can lead us to heavily flawed inferences if we don't correct for it.
I'll dig deep into this in future posts with a substantial amount of data and visualizations.
http://science.ksc.nasa.gov/shuttle/missions/51-l/docs/roger...
The key items for me were:
1) While they had no expectation of erosion, and the design did not call for the o-rings to erode, once they observed them eroding, they retroactively invented a "margin of error" based on what fraction the o-rings eroded. This was not based on an actual understood process, and is akin to saying "well, the bridge didn't break when we drove that truck over it, so it must be okay"
2) The engineers actually knew the risk (~1% chance of loss per launch, not specific to the o-rings, compared with two actual losses of the shuttle over ~130 missions). Management used entirely invented numbers for the risk which were not justified.
[..] In spite of these variations from case to case, officials behaved as
if they understood it, giving apparently logical arguments to each
other often depending on the "success" of previous flights. For
example, in determining if flight 51-L was safe to fly in the face of
ring erosion in flight 51-C, it was noted that the erosion depth was
only one-third of the radius. It had been noted in an experiment
cutting the ring that cutting it as deep as one radius was necessary
before the ring failed. Instead of being very concerned that
variations of poorly understood conditions might reasonably create a
deeper erosion this time, it was asserted, there was "a safety factor
of three." This is a strange use of the engineer's term, "safety
factor." If a bridge is built to withstand a certain load without the
beams permanently deforming, cracking, or breaking, it may be designed
for the materials used to actually stand up under three times the
load. This "safety factor" is to allow for uncertain excesses of load,
or unknown extra loads, or weaknesses in the material that might have
unexpected flaws, etc. If now the expected load comes on to the new
bridge and a crack appears in a beam, this is a failure of the
design. There was no safety factor at all; even though the bridge did
not actually collapse because the crack went only one-third of the way
through the beam. The O-rings of the Solid Rocket Boosters were not
designed to erode. Erosion was a clue that something was wrong.
Erosion was not something from which safety can be inferred.
It's even worse: you drive a truck over it, afterwards 1/3 of the steel is cracked, so you conclude that it must be able to safely accept 3x the weight. Nonsense! This is the sort of moronic engineering that killed the crew of the Challenger.
1. http://science.ksc.nasa.gov/shuttle/missions/51-l/docs/roger...
Note that it wasn't even engineering. In the full story (it's great, and covers the software side, for which Feynman had nothing but praise), Feynman repeatedly noted that engineers were fairly realistic[0] and had been ringing alarms pretty much all along; this was entirely manglement mangling.
[0] unless the spectre of manglement was involved, at least for some of them
Something which, as summarized by Feynman's quote above, should be patently obvious to any engineer as bullshit.
Feynman's appendix is the only part that even tries, but it doesn't go far enough through no fault of Feynman's, he had no resources to pursue this line of inquiry. It was a struggle just to get that appendix into the report.
They should have interviewed every single person even remotely involved in that O-ring decision and found out whether they objected to it and, if they didn't, what financial, institutional, or social obstacles kept them from objecting.
Did some engineer actually sign off on the aforementioned "safety factor"? We don't know, but somehow I doubt that's language management came up with on their own, and if they did that there was no way for an engineer to spot that and report "wtf? The system doesn't work like that!".
Reading between the lines some engineer actually did come up with that estimate, but likely that engineer was where he was because NASA had a culture of promoting mindless yes-men.
I meant Feynman's later recounting of the whole affair (in "What Do You Care What Other People Think?"), rather than just the report.
> Did some engineer actually sign off on the aforementioned "safety factor"? We don't know, but somehow I doubt that's language management came up with on their own
That doesn't mean they were fed that by an engineer, only that they'd encountered the term before.
> and if they did that there was no way for an engineer to spot that and report "wtf? The system doesn't work like that!".
And then what? Upper management uses "safety factor" in a completely bullshit manner, an engineer spots that (because they're masochistic and read management reports?), tells their direct manager it's inane, and then what? You think it's going to go up the chain to upper management, which will fix the issue? Because IIRC (I don't have my copy of What Do You Care on me so I can't check) Feynman noted that engineering input systematically got lost somewhere along the management ladder, as one middle manager or another decided not to bother their own manager with the concerns or suggestions of a mere engineer (or worse, a technician!).
> Reading between the lines some engineer actually did come up with that estimate, but likely that engineer was where he was because NASA had a culture of promoting mindless yes-men.
That's really not what I read behind the lines considering engineers had failure estimates in the % range and management had estimates in the per-hundred-thousand range.
I've read that too. You're dangerously close to getting me to re-read everything Feynman's written, again. I don't know whether to curse you or thank you :)
> And then what? [...]
I feel we're in violent agreement as to what the actual problem at NASA was. Yes, I'm under no illusion that if some engineer had raised these issues it would have gone well for him. This is made clear in the opening words of Feynman's analysis:
[...] It appears that there are enormous differences of opinion as to the
probability of a failure with loss of vehicle and of human life. The
estimates range from roughly 1 in 100 to 1 in 100,000. The higher
figures come from the working engineers, and the very low figures from
management. What are the causes and consequences of this lack of
agreement? Since 1 part in 100,000 would imply that one could put a
Shuttle up each day for 300 years expecting to lose only one, we could
properly ask "What is the cause of management's fantastic faith in the
machinery?"
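Feynman's arithmetic is easy to check. Below is a minimal sketch, assuming independent launches with a constant per-launch loss probability (a simplification of mine, not something the report claims). The function name is made up for illustration, and the 130-launch figure is just the rough mission count mentioned elsewhere in this thread.

```python
def prob_at_least_one_loss(p_loss: float, launches: int) -> float:
    """Chance of one or more losses in `launches` independent launches."""
    return 1.0 - (1.0 - p_loss) ** launches

# One launch per day for 300 years (ignoring leap days).
daily_launches = 300 * 365  # 109,500 launches, i.e. roughly 100,000

# Expected number of losses over that span at management's 1-in-100,000 figure.
expected_losses_mgmt = daily_launches / 100_000

print(f"launches in 300 years: {daily_launches}")
print(f"expected losses at 1 in 100,000: {expected_losses_mgmt:.3f}")

# Compare the two estimates over ~130 launches (assumed program length).
for label, p in [("engineers, 1 in 100", 1 / 100),
                 ("management, 1 in 100,000", 1 / 100_000)]:
    chance = prob_at_least_one_loss(p, 130)
    print(f"{label}: P(at least one loss in 130 launches) = {chance:.4f}")
```

The point of the comparison is that the two estimates are not a matter of nuance: over a program of this length, one predicts a loss is more likely than not and the other predicts it is almost unthinkable.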
The real flaw in the report is that it didn't explore how that came to be institutional practice at NASA; Feynman is the only one who tried.
> That's really not what I read behind the lines.
Regardless of what sort of dysfunctional management practices there were at NASA they couldn't have launched the thing without their engineers. If they were truly of the opinion that shuttle reliability was 3 orders of magnitude less than what management thought, perhaps they should have refused to work on it until that death machine was grounded pending review.
Of course that wouldn't have been easy, but it's our responsibility as engineers to consider those sorts of options in the face of dysfunctional management, especially when lives are on the line.
Seems like it was politically impossible for NASA to say outright that there was a 1% chance of failure for every launch -- it would have led to loss of public support for the Shuttle program. So we have a systems failure where both politicians and the public contributed by making NASA admins feel compelled to lie and cover up to make the launches work.
I mean, the courageous thing to do would be to stand up and say space travel is inherently risky, people might die, but it's still worth it. But courageous politicians regularly get voted out of office, and I would bet a courageous NASA admin who said that would end up fired.
Additionally, when you have a civilian on board, I think it really changes how you think about what an appropriate level of risk is (13% might have been ok with professional astronauts who knew the risks beforehand, but likely was too high for a civilian).
And the line engineers at Morton Thiokol fought back pretty hard on the decision, even if it might have impacted their careers negatively.
I wonder if you told the management that, in those words, if they would have still believed such a ludicrous idea.
Are you not falling victim to a sort of survivorship bias in only applying these analyses to 'failed' missions?
It seems obvious in hindsight that the 'whistle-blowers' were right, but how many people voiced concerns which turned out to be erroneous?
Evidence of this phenomenon in the shuttle program was that foam shedding (insulation for the main fuel tank falling off during launch) was observed as early as 1983, and had been noted as a substantial risk many times over the years. Engineers pressed for high resolution images to inspect damage, but those requests were denied. NASA continued to "drive drunk" over the years, even after Challenger, and inevitably, disaster struck when Columbia was damaged by a piece of foam that broke off during launch, resulting in disintegration during reentry.[1] After grounding the fleet and improving safety, the very next launch suffered similar foam shedding, though from a different part of the tank. The launch after that was also pressed, after delays and objections from the chief engineer and safety officer.
1. https://en.wikipedia.org/wiki/Space_Shuttle_Columbia_disaste...
You're absolutely right though about this being "Monday morning quarterbacking" with any analysis after the events. That said, two things are clear:
1. The most knowledgeable engineers were strongly against launch (see Ebeling and Boisjoly at Morton Thiokol)
2. There were huge statistical flaws made by decision makers at NASA before launch (see Feynman as noted in this thread)
One should always think of denominators (really, the ratio of numerator to denominator).
Health, and public health in particular, is "about denominators."
[0]: https://www.sfu.ca/cmns/courses/2012/801/1-Readings/Tufte%20...
[1]: https://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outb...
http://www.onlineethics.org/Topics/ProfPractice/Exemplars/Be...
https://eagereyes.org/criticism/tufte-and-the-truth-about-th...
It's been longer since I read Feynman but I recall his assessment as being a lot more grounded, and fairer to the engineers.
Feynman laid the vast majority of the blame on management ("NASA officials" in his Appendix F), noting that engineers had fairly realistic views of the matter (and failure rate estimates) and IIRC that they'd tried to raise concerns but those had gotten lost climbing the manglement ladder.
The one unit for which he had nothing but praise was the Software Group:
> To summarize then, the computer software checking system and
attitude is of the highest quality. There appears to be no process of
gradually fooling oneself while degrading standards so characteristic
of the Solid Rocket Booster or Space Shuttle Main Engine safety
systems.
Noting that they had to constantly resist manglement trying to mangle:
> To be sure, there have been recent suggestions by management
to curtail such elaborate and expensive tests as being unnecessary at
this late date in Shuttle history.
Everyone knows about regulatory capture (when companies manage their safety regulators). Normally it's private organizations pushing public safety regulators.
Here, on the other hand, it was a public organization that took risks.
The reason is simple. Every organization has certain needs. Boeing needs to make planes (make $), Ford needs to make cars (make $), etc. Safety is an annoying thing they need to get over with ASAP to get to their primary purpose (make $).
NASA needed to launch, and (at least for the managers) it became an acceptable risk. If it flies 20 times and blows once, they win.
So should there be a NASA and a NASAOC (NASA oversight committee, to check on them)?
Then the organization on top of both (Congress, the President) will choose which one to listen to.
This is the general problem of self-policing.
And the only way to get around this is by having multiple, independent, providers. So if NASA doesn't think SpaceX is safe enough, they can shut down the contract while still having access to space.
If NASA had had that in 1986, the Shuttle would have been (rightfully) decommissioned then and there. Unfortunately, it required _another_ accident before anything moved.
And the lesson can be applied to matters outside space.
The next class period, only after the teams propose their decided course of action, it is revealed where the data really came from. I imagine it's quite jarring, especially for those who decided to proceed, albeit with different risks in mind.
Nature understands no jesting; she is always true, always serious, always severe; she is always right, and errors and faults are always those of man. The man incapable of appreciating her, she despises; and only to the apt, the pure, and the true, does she resign herself and reveal her secrets.
As we learned later about the rubber o-rings failing in cold weather, the solution has always been framed as not launching in those conditions.
But, why not use a different material, or design away the o-rings to avoid the problem in the first place?
Edit: question is already answered here:
https://news.ycombinator.com/item?id=13239241
Source: Anecdotes from an old friend who is a quality assurance engineer. He was one of the boys on the ground in Houston that brought the Apollo 13 crew home.
By the time of the shuttle, NASA had moved from a prestigious project to pork barrel politics. End result was that managers and politicians were endlessly overruling the engineers.
In fact, Americans could have built the entire shuttle without using any parts at all.
(I thought OP meant no O-rings would've been used in the entire shuttle, not just these particular ones.)
A given failure rate (or even worse, a failure count) doesn't tell you much about the system without also including the totals of both success and failure.
I'm sure I could be more rigorous with this though. Is there a way to express a given failure rate in terms of certainty? As in, we have sampled the failure rate of a component with fixed parameters a, b, c, and we are x% certain of the failure rate? (Maybe I'm wording this wrong - I don't have much of a stats background).
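One common way to attach a level of certainty to an observed failure rate is a confidence interval for a binomial proportion. The sketch below uses the Wilson score interval, which is my choice for illustration rather than anything proposed in this thread; it assumes each trial is independent with the same failure probability, and the counts in the loop are hypothetical.

```python
import math

def wilson_interval(failures: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a failure probability.

    z = 1.96 corresponds to roughly 95% confidence.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = failures / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# Same observed 10% failure rate, very different certainty (hypothetical counts).
for failures, trials in [(1, 10), (10, 100), (100, 1000)]:
    low, high = wilson_interval(failures, trials)
    print(f"{failures}/{trials}: plausible failure rate {low:.3f} to {high:.3f}")
```

The interval narrows as the number of trials grows, which is the sense in which a bare failure rate says little without the totals behind it.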
Do you have an example? I love having more viz tools in my toolbox!
Wouldn't there be a whole bunch of different stats measured, which would all say that you should be cautious when trying a region far away from what you know?
In any case, a very good example of how stats is unintuitive. I hadn't guessed about the missing "no error" data until I read it. I'm sure there's many more little things like that. Simpson's paradox, those kinds of things.
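The missing "no error" data can be illustrated with a toy example. The flight records below are invented for illustration and are not the actual shuttle record; the 65°F cutoff between "cold" and "warm" is likewise an arbitrary assumption.

```python
# Hypothetical flights: (joint temperature in °F, number of O-ring incidents).
# These values are made up for illustration, NOT the real shuttle data.
flights = [
    (53, 2), (57, 1), (63, 1),            # cold launches
    (66, 0), (67, 0), (68, 0), (69, 0),
    (70, 1), (70, 0), (72, 0), (73, 0),
    (75, 1), (76, 0), (78, 0), (79, 0), (81, 0),
]
CUTOFF_F = 65  # arbitrary split between "cold" and "warm"

def damage_rate(records):
    """Fraction of flights in `records` with at least one incident."""
    return sum(1 for _, incidents in records if incidents > 0) / len(records)

damaged_only = [f for f in flights if f[1] > 0]
cold = [f for f in flights if f[0] < CUTOFF_F]
warm = [f for f in flights if f[0] >= CUTOFF_F]

# Biased view: damaged flights occur at both low and high temperatures,
# so temperature can look irrelevant.
damaged_temps = sorted(t for t, _ in damaged_only)
print("damaged-flight temperatures:", damaged_temps)

# Full view: including the zero-incident flights supplies the denominators.
print(f"cold damage rate: {damage_rate(cold):.2f} ({len(cold)} flights)")
print(f"warm damage rate: {damage_rate(warm):.2f} ({len(warm)} flights)")
```

In this made-up sample, looking only at damaged flights hides the fact that every cold flight had damage while most warm flights had none.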
No, that's correct, the launch was the coldest yet and reached temperatures at which the O-rings had lost their flexibility and couldn't spring back fast enough to seal. In fact an iconic scene from the Challenger hearings was Feynman showing (on TV!) the loss of ductility after having dunked an O-ring in ice-water.
The piece says: "Below is the key graph of the O-ring test data that NASA analyzed before launch" and reproduces the famous chart. It continues, "NASA management used the data behind this first graph (among many other pieces of information) to justify their view the night before launch that there was no temperature effect on O-ring performance [...] But NASA management made one catastrophic mistake: this was not that chart they should have been looking at."
I think these statements are pretty misleading without some major caveats.
Tufte ("Visual and Statistical Thinking: Displays of Evidence for Making Decisions"; https://blogs.stockton.edu/hist4690/files/2012/06/Edward-Tuf...) writes:
"Most accounts of the Challenger reproduce a scatterplot that apparently demonstrates the analytical failure of the pre-launch debate. The graph depicts only launches with O-ring damage and their temperatures, omitting all damage-free launches (an absence of data points on the line of zero incidents of damage). First published in the shuttle commission report (PCSSCA, volume 1, 146), the chart is a favorite of statistics teachers. [...] The graph of the missing data-points is a vivid and poignant object lesson in how not to look at data when making an important decision. But it is too good to be true! First, the graph was not part of the pre-launch debate; it was not among the 13 charts used by Thiokol and NASA in deciding to launch. Rather, it was drawn after the accident by two staff members (the executive director and a lawyer) at the commission as their simulation of the poor reasoning in the pre-launch debate. Second, the graph implies that the pre-launch analysis examined 7 launches at 7 temperatures with 7 damage measurements. That is not true; only 2 cases of blow-by and 2 temperatures were linked up. The actual pre-launch analysis was much thinner than indicated by the commission scatterplot. Third, the damage scale is dequantified, only counting the number of incidents rather than measuring their severity. In short, whether for teaching statistics or for seeking to understand the practice of data graphics, why use an inaccurately simulated post-launch chart when we have the genuine 13 pre-launch decision charts right in hand?"
(For a response to Tufte's essay, see https://people.rit.edu/wlrgsh/FINRobison.pdf, also cited elsewhere here.)
