
Personal observations on the reliability of the Shuttle (1986) - umanwizard
http://science.ksc.nasa.gov/shuttle/missions/51-l/docs/rogers-commission/Appendix-F.txt
======
kens
Related: Gregg Easterbrook's article "Beam Me Out Of This Death Trap, Scotty"
[1] is long but remarkably prescient, having been written a year before the
first shuttle flight. It goes into the dangers of the tiles, how the costs
would spiral, the danger of relying on a single launch vehicle, the benefits
of disposable rockets, and other warnings that ended up being right.

The article talks about how unlikely the shuttle was to achieve the expected
500 flights, predicting it would more likely fly only about 200. (Real number: 135)

Some of the quotes from the article are scary in retrospect:

Quote: "Here's the plan. Suppose one of the solid-fueled boosters fails. The
plan is, you die."

Another quote: "When Columbia's tiles started popping off in a stiff breeze,
it occurred to engineers that ice chunks from the tank would crash into the
tiles during the sonic chaos of launch: Goodbye, Columbia."

Remember, this article is from 1980, before the shuttle launched.

[1]
[http://www.washingtonmonthly.com/features/2001/8004.easterbr...](http://www.washingtonmonthly.com/features/2001/8004.easterbrook-fulltext.html)

~~~
robryk
Gregg Easterbrook wrote the following about simulated shuttle landings:

> They've never flounced like a twig on the crazy rapids of "bias" -- the
> bland physics term for unexplained variations in the earth's gravitational
> and magnetic fields.

I can't make heads or tails out of it -- I can't find any reference to such a
phenomenon called "bias" and I don't see how gravitational field variations (I
assume he means the ones caused by uneven density) could have any effect at
their minuscule amplitude. Is this a result of some misunderstanding or am I
missing something?

~~~
001sky
[http://maps.ngdc.noaa.gov/viewers/historical_declination/](http://maps.ngdc.noaa.gov/viewers/historical_declination/)

~~~
robryk
Thanks. I didn't know that the differences vary across space in such a weird
fashion. I guess what was meant were spatial differences, not temporal ones,
and now the part about the magnetic field makes some sense to me.

------
InclinedPlane
Interestingly, a thorough risk assessment of the Shuttle was done much later
by NASA (near the end of the program), and it concluded that the risk of
losing a Shuttle in the pre-Challenger era was much higher than 1 in 100,
closer to 1 in 10. Many people look at Challenger and Columbia as instances
where the Shuttle program hit a patch of bad luck. In reality the Shuttle
program was extraordinarily lucky: there were many other close calls, some not
well publicized, that came within a hair's breadth of causing loss of crew and
vehicle (STS-1, STS-8, STS-9, and STS-27 being examples). It was always a
tricky bird to fly, and in the early days there were about half a dozen
different things that could kill it outright with a shockingly high
probability (not just the SRBs or foam/ice strikes on the TPS, also the APUs
(which caught fire and exploded on one flight), the computer (which was
completely inoperable just before landing on another flight), the SSMEs (which
came close to causing loss of the orbiter once or twice), and other
components). Over time some of the systems were improved to such a degree that
they were no longer serious risks, but the whole system was so complicated and
there were so many elements of risk that even at the end of the program many
substantial risks still remained.

------
stiff
The last sentence from this piece is just beautiful, it has become my personal
motto:

 _For a successful technology, reality must take precedence over public
relations, for nature cannot be fooled._

It captures in capsule form the reasons for a huge fraction of all the big
engineering catastrophes, maybe even most of them. For everyone interested in
similar case studies, and in reliability from a wide engineering perspective,
I strongly recommend the book "Design Paradigms: Case Histories of Error and
Judgment in Engineering" by Henry Petroski.

~~~
marze
Collectively, the public has a similar approach to nuclear power plants, which
make vast tracts of land useless for agriculture if they melt down.

We've had three meltdowns in 40 years, about one per 13 years so far. Have we
been lucky? Or have we been unlucky? Time will tell.

~~~
maxharris
 _We've had three meltdowns in 40 years, about one per 13 years so far._

This is not a fair comparison. The shuttle was a bad design, with a high
failure rate, and it doesn't make any sense to lump it in with tried-and-true
rockets, which have a far lower failure rate. Similarly, it doesn't make sense
to lump the dangerous RBMK reactor design used at Chernobyl with the far safer
reactors designed in the West.

How many people died _in flight_ during the Apollo missions? The answer is the
same number that died as a result of the events at Three-Mile Island and
Fukushima: zero. Design matters, and if you're trying to be objective on this
topic, you have to distinguish between good designs and bad ones.

Also, not all meltdowns are created equal, contrary to what your post
suggests. If you look
at Three Mile Island on Google Maps today, you'll see all sorts of arable
cropland _in active use_ all around the site.

EDIT: Why the downvote? Is there anything I've said that isn't factual?

EDIT 2: here's a link that explains the differences between the Soviet RBMK
and the Western LWR designs:
[http://users.owt.com/smsrpm/Chernobyl/RBMKvsLWR.html](http://users.owt.com/smsrpm/Chernobyl/RBMKvsLWR.html)

~~~
keithflower
Wow, NASA-think redux!

I really don't think you want to use Three-Mile Island as some sort of
exemplar of safe technology:

 _The lessons of Three Mile Island have been, for the most part, forgotten.
The nuclear industry changed and improved somewhat, but the deep understanding
of what went wrong was lost on the public in general and the real lessons that
we could have learned as a society were, too. The financial mess we are
experiencing right now isn’t all that different from Three Mile Island. If
we’d taken better to heart the true lessons of TMI we might not be in our
present jam._

 _Looking back at the accident with the benefit of knowing what it took to
clean it up and what the workers found when they were finally able to send
robots inside the containment, the TMI accident was very bad indeed. There
were pressure spikes during the accident that would have cracked an average
containment vessel, releasing radioactive gases into the atmosphere.
Fortunately the Unit 2 containment wasn’t average. TMI-2 was built on the
final approach path to Harrisburg International Airport, a former U.S. Air
Force base, and was therefore beefed-up specifically to withstand the impact
of a B-52 hitting the structure at 200 knots. A normal containment would have
been breached._

 _TMI wasn’t caused by a computer failure but the accident was made vastly
worse by an error of computer design. Specifically, TMI-2 had a terrible user
interface._

 _What happened at Unit 2 was a little more complex. A cascading series of
events caused the computer to notice SEVEN HUNDRED things wrong in the first
few minutes of the accident. The ONE audible alarm started ringing and stayed
ringing continuously until someone turned it off as useless. The ONE visual
alarm was activated and blinked for days, indicating nothing useful at all.
The line printer queue quickly contained 700 error reports followed by several
thousand error report updates and corrections. The printer queue was almost
instantly hours behind, so the operators knew they had a problem (700 problems
actually, though they couldn’t know that) but had no idea what the problem
was._

 _So they guessed._

 _Not good._

[http://www.cringely.com/2009/03/31/three-mile-island-memorie...](http://www.cringely.com/2009/03/31/three-mile-island-memories/)

~~~
maxharris
I don't think you understood my point at all. I'm not saying that _any_
nuclear incident is minor, or that there will never be one again.

A Chernobyl-style incident will never happen with a LWR reactor. That much is
known, and the experts -- _nuclear engineers_ -- are unanimous on this point.
Feynman's point was, "when judging risk, ask the engineers that actually
design and build the technology, not the management." The article I linked to
([http://users.owt.com/smsrpm/Chernobyl/RBMKvsLWR.html](http://users.owt.com/smsrpm/Chernobyl/RBMKvsLWR.html))
was written by nuclear engineers (who were students at the time, but have been
in their field for many years now). The article you linked to was written by
... a journalist that writes about the computer industry? Now I'm left with
the idea that _you_ have missed Feynman's point.

~~~
keithflower
You're right. I don't think I understand your point because I don't think you
know what your point is.

Are you trying to advance the theory that _management_ had no hand in the
terrible decision-making around TMI, including the numerous shortcuts that
Cringely noted? (nb: you are hardly using an _engineering_ approach when you
resort to ad hominem criticism of Cringely as a "journalist" without
responding to any of the legitimate problems he noted.) Or are you trying to
advance the theory that no engineering mistakes were made at TMI?

And are you trying to advance the theory here that somehow, magically,
_management_ will not continue to make NASA-like _management_ mistakes in
current and future nuclear facilities?

Because if so, that's _precisely_ Feynman's point, which you seem to be
ignoring.

And c'mon, you're merely moving goal-posts around when you focus on the
severity of Chernobyl vs TMI. That isn't speaking at all to how technology was
misused in both instances by people making exactly the kinds of errors Feynman
emphasizes.

> A Chernobyl-style incident will _never_ happen with a LWR reactor [emphasis
> added]. That much is known, and the experts - nuclear engineers - are
> _unanimous_ on this point [emphasis added].

Really? _Never?_ _Unanimous_? That sounds like a really interesting
_engineering_ judgment. Could you kindly link to the unanimous consensus
statement from _nuclear engineers_ that support that strong but odd statement?

Because that sounds an awful lot like NASA management who claimed that the
chances of loss of a space shuttle were so remote as to be negligible.

And your statement is even more curious when we read review papers like the
one[1] from _nuclear engineer_ Bal Raj Sehgal[2] of the National Academy of
_Engineering_, who concludes:

"The presently-installed LWR plants in Western countries have been addressing
their safety performance from the day they were installed and operating...
_Clearly, not all the severe accident issues have been resolved for the
presently-installed plants_ [emphasis added].

"The presently-installed LWR plants made improvements in components, systems,
operator training, man-machine interface, safety culture, etc., thereby
significantly reducing the probability of a severe accident occurring. They
also instituted severe accident management backfits, systems and procedures,
which are providing assurance of the elimination of an uncontrolled and large
release of radioactivity even in case a severe accident occurs. _Still, the
presently-installed plants can not provide assurance of coolability of a melt
pool /debris bed, which could be formed during a bounding severe accident. In
that situation, the LWR owner can not assure the public that the accident has
been terminated and that there is no further danger of the release of
radioactivity._ [emphasis added]"

Sorry. It is you with your pronouncements of _never_ who is absolutely not
getting Feynman's point whatsoever. In particular, he decried _management_ and
others with their own wishful pronouncements of _never_ , which stood in stark
contrast with the concerns of _engineers_ who were well aware that there were
quantifiable and real risks associated with their technology.

[1]
[http://www.kns.org/jknsfile/v38/JK0380697.pdf](http://www.kns.org/jknsfile/v38/JK0380697.pdf)

[2] [https://www.nae.edu/69277.aspx](https://www.nae.edu/69277.aspx)

~~~
maxharris
dmfdmf wrote the following right here on HN five years ago. I'm reposting it
for the benefit of those who won't click through to his comment:

"As a former design engineer in the nuclear business I have to make the
following comments;

1) The lessons of TMI are far from forgotten. TMI is one of the most studied
accidents and the lessons learned are incorporated throughout engineering and
technical training.

2) Anyone who claims TMI was worse than Chernobyl is an idiot. One of the
major lessons learned from TMI was that the design basis and safety strategies
of western reactors work. This despite the serious operator training and
control room design flaws that were exposed by the accident.

3) Anyone who mentions Chernobyl and TMI in the same breath does not know what
they are talking about. A few facts about Chernobyl: these RBMK reactors were
originally designed to generate plutonium for bombs and then scaled up for
electric power generation which created all sorts of operational problems.
When I was an undergrad my nuke prof said the design was inherently unstable
and an accident was inevitable. The western countries had tried for years to
get them to shut them down. On the night of the accident the engineers
disabled 4 or 5 safety systems in order to run a turbine spin down test. This
test was ordered by Moscow and the previous lead engineer was fired for not
completing it prior to the last planned shutdown.

4) TMI experienced a partial core melt. I read an engineering report after the
accident that it was technically and economically feasible to fix the damaged
reactor. The PR nightmare this would create dictated that it would not be
fixed. Chernobyl's core was blown sky high by a steam explosion and fuel rods
littered the plant site, thus killing the responding firemen with lethal doses
of radiation. There is no dispute regarding which core had more damage.

5) The claim that the containment would have cracked due to "pressure spikes"
except that TMI was specially reinforced to protect against aircraft impact is
engineering nonsense. First, these are different design requirements and
operate on different physical principles. Second, if the accident exposed such
a serious deficiency in the design of "normal" containment buildings it would
have resulted in the shutdown or at least a reduced operating power at all
other plants of similar design. No such regulatory action ever occurred.

6) While it is scary to write about "releasing radioactive gases into the
atmosphere" the reality is that such releases are pretty harmless. These gases
are typically biologically and physically inert and quickly dissipate in the
wind to harmless background radiation levels. One of the major lessons learned
from TMI was that the more dangerous biologically active materials like
radioactive iodine or potassium do not escape and tend to stick to other
material even in a core melt. That is if you have a containment building,
unlike Chernobyl.

7) It is insulting to say that the operators did not know what was going on
with the reactor "so they guessed" as if they started pushing buttons and
pulling levers willy-nilly. The operators knew that the information they were
receiving was incomplete or wrong. The biggest problem was that their
training was flawed and incorporated an assumption that was incorrect -- thus
leading them to take actions that made the situation worse.

About the only thing that I agree with Cringely on is that we should be
building nuclear reactors now."

~~~
keithflower
I reckon your kind, single posting to HN from a "design engineer in the
nuclear business"... who thinks the events leading to a partial nuclear
meltdown at a LWR like TMI reflect a kind of engineering triumph and the kind
of statistical confidence (that Feynman calls out) that would lead engineers
to conclude that LWRs could _never_ suffer a nuclear meltdown....instead of
the requested posting of a link to a consensus statement about the
_impossibility_ of a major nuclear reactor accident by _all nuclear
engineers_....is better than nothing.

Just kidding with you some. But I think we're done here. Have a nice day.

------
treblig
There were 135[1] Space Shuttle missions with 2 resulting in human casualties
(Challenger and Columbia disasters).

Thus, a failure rate, with loss of vehicle and of human life, of 1.48 in 100.

 _The estimates range from roughly 1 in 100 to 1 in 100,000. The higher
figures come from the working engineers, and the very low figures from
management._

The reality was even more dangerous than the engineers had predicted, and far
more dangerous than management had.

[1]
[http://en.wikipedia.org/wiki/List_of_space_shuttle_missions](http://en.wikipedia.org/wiki/List_of_space_shuttle_missions)
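The arithmetic above, and how wide the uncertainty is on a 2-of-135 sample,
can be sketched in a few lines of Python (the Wilson score interval is my own
choice of method here, not something from the thread):

```python
import math

def wilson_interval(failures: int, trials: int, z: float = 1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = failures / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half

p = 2 / 135                       # observed loss rate: ~0.0148, i.e. 1.48 in 100
lo, hi = wilson_interval(2, 135)
print(f"point estimate: 1 in {1 / p:.0f}")
print(f"95% interval: 1 in {1 / hi:.0f} to 1 in {1 / lo:.0f}")
```

The 95% interval spans roughly 1 in 20 to 1 in 250, which is one more reason
quoting shuttle risk to more than a couple of significant figures is
meaningless.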

~~~
VLM
I would suspect the engineering estimate had one or at most two sig figs,
which isn't bad compared to results. I'm sure the management estimate had six
sig figures of course.

~~~
kens
When NASA was comparing designs for their post-shuttle rocket recently, they
literally used four significant figures in the risk estimates [1], and used
this estimate to pick the design. It seems crazy to me to have four sig figs
of reliability for designs that were basically at the PowerPoint stage.

NASA started with the requirement that their new rocket have less than 1 in
1000 odds of loss of crew (LOC). They concluded that using an existing Atlas V
had LOC odds of 1 in 957 (unacceptable), while the paper design of putting a
capsule on top of a Shuttle booster had LOC odds of 1 in 1918 (totally
acceptable). They then quoted this 1,918 number in a lot of places to justify
the program.
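For perspective on what such finely split odds would mean in practice, here's
a back-of-envelope sketch (my own illustration, plain Python): over an entire
135-flight program, the two quoted rates produce nearly indistinguishable
outcomes.

```python
# Probability of completing n flights with zero crew losses, given a
# per-flight loss-of-crew rate. The rates are the two quoted above.
def p_no_loss(rate: float, n: int) -> float:
    return (1 - rate) ** n

n = 135  # the actual length of the Shuttle program, for scale
atlas = p_no_loss(1 / 957, n)    # the "unacceptable" design
ares = p_no_loss(1 / 1918, n)    # the "totally acceptable" design
print(f"Atlas V (1 in 957):  P(no loss in {n} flights) ~ {atlas:.3f}")
print(f"Ares-I  (1 in 1918): P(no loss in {n} flights) ~ {ares:.3f}")
```

With roughly 87% versus 93% chances of a loss-free program, no flight campaign
of realistic length could even tell the two estimates apart, let alone
validate four significant figures.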

This rocket was the Ares-I [2], which turned into a fiasco and was canceled
four years ago.

My conclusion is that NASA's current risk assessments are as bogus as the ones
for the space shuttle. They start with an unrealistic goal (1 in 1000 risk),
make totally unjustifiable estimates to meet the goal, and then make bad
decisions based on these estimates. Coincidentally, the decisions based on
these estimates line up with the politically-desirable outcome.

The 1 in 1918 risk assessment turned out to be totally wrong, of course. The
Air Force pointed out that the launch escape system wouldn't work since
burning fuel would melt the parachute and everyone would die. [3]

My personal view is that NASA needs to admit that rockets are dangerous and
you probably can't get the risk below 1 in 100. Then NASA can focus on doing
the best job they can. [4]

[1] See for example
[http://www.nasa.gov/pdf/140649main_ESAS_full.pdf](http://www.nasa.gov/pdf/140649main_ESAS_full.pdf)
figure 1-26 [2]
[http://en.wikipedia.org/wiki/Ares_I](http://en.wikipedia.org/wiki/Ares_I) [3]
[http://archive.is/YD1sh](http://archive.is/YD1sh) [4] See "Safe is not an
option" for discussion on how NASA's focus on safety is harming the space
program:
[http://www.thespacereview.com/article/2435/1](http://www.thespacereview.com/article/2435/1)

------
pjmorris
This is an absolute classic of engineering literature. The last sentence,
perhaps deservedly, gets most of the glory, but the whole piece should be
under every engineer's and every manager's fingers.

I constantly see the dynamic observed in the first paragraph, and it would
seem that the question "What is the cause of management's fantastic faith in
the machinery?" is eternal.

~~~
curtis
I think management's big problem is that they are often confused by the
difference between _what they need_ and _what they have_. Of course I've spent
a lot of my career working for venture capital-based startups where this
problem might naturally be more prevalent.

~~~
smacktoward
I would phrase it slightly differently: the problem is that when human beings
know that they need something, and also know that _they will not be allowed to
get it,_ they are remarkably successful at convincing themselves that they
don't really need it after all.

Take the infamous O-rings from the Shuttle's solid rocket boosters, for
instance. As Feynman notes in this appendix, the appearance of erosion on the
O-rings was an indication that their design was fundamentally flawed. So why
did NASA's engineers twist themselves into pretzels to argue that everything
was fine? The reason is that they knew _there was no chance that those
boosters were going to be redesigned._ The political will for that, in NASA
and in Washington, just was not there. Even if the engineers threw down their
tools and refused to launch any more Shuttles on the grounds that they were
unsafe, all that would happen would be that the Powers That Be would come down
on them like a ton of bricks and force them either to shut up and get back to
work, or to get out of NASA and therefore wreck their careers.

When people are forced to choose between bad options, the easiest thing to do
is usually nothing. So that's what NASA's engineers did. And since "I know
this thing I work on is likely to kill people, and I'm not going to do
anything about that" is the kind of self-knowledge that leads to cognitive
dissonance
([http://en.wikipedia.org/wiki/Cognitive_dissonance](http://en.wikipedia.org/wiki/Cognitive_dissonance)),
they put together ad-hoc rationalizations to help them live with it.
Rationalizations like the misuse of the term "safety factor" that Feynman
flagged, for example; if you can twist that term so it fits the facts in front
of you, you can convince yourself that the Shuttle isn't _really_ unsafe.
Which takes care of the cognitive dissonance.

You can see this cycle all the time in software development, too. How many
systems have you worked on that weren't really secure, but you know that
management doesn't have the stomach to accept the tradeoffs -- the extra cost,
the reduced convenience -- it would take to _make_ them secure? What happens
in those cases? In the ones I've seen, the people involved just convince
themselves that the system isn't insecure after all. "We've never been hacked;
if the system was insecure, we would have been hacked; therefore, the system
is not insecure." It's not great logic, but when you desperately want to
believe something it doesn't _have_ to be great logic to convince you. It just
has to tell you what you already want to hear.

~~~
tanzam75
The engineers _knew_ why there was blow-by on the O-ring -- joint rotation.
The obvious solution was to redesign the field joint to resist this rotation,
so that the O-ring would never be exposed to hot gases. All they needed was
time and money.

Then _Challenger_ happened. Now, no Shuttle could ever launch again until the
joint was redesigned and proven to work. Time and money were no object.

------
alexhutcheson
For anyone else interested in how the organizational incentives and
institutional culture at NASA helped to set the stage for the Challenger
disaster, I highly recommend _The Challenger Launch Decision_ [1] by Diane
Vaughan.

From the New York Times review[2]:

In "The Challenger Launch Decision" Diane Vaughan, a sociologist at Boston
College, takes up where the Rogers Commission and Claus Jensen leave off. She
finds the traditional explanation of the accident -- "amorally calculating
managers intentionally violating rules" -- to be profoundly unsatisfactory.
Why, she asks, would they knowingly indulge such a risk when the future of the
space program, to say nothing of the lives of the astronauts, hung in the
balance? "It defied my understanding," she says.

[1]
[https://www.goodreads.com/book/show/995029.The_Challenger_La...](https://www.goodreads.com/book/show/995029.The_Challenger_Launch_Decision)

[2]
[http://www.nytimes.com/books/97/04/13/nnp/19074.html](http://www.nytimes.com/books/97/04/13/nnp/19074.html)

------
marze
Even with Feynman's carefully reasoned essay, the next shuttle disaster was a
mirror of the first: chunks of foam falling off on each flight, careful
monitoring but no serious action until the foam caused the loss of a vehicle.

The first loss came after careful monitoring of near burn-throughs of the SRB
O-rings on many flights, but no decisive action.

~~~
rbanffy
It really astounded me when I learned no shuttle was inspected for damage
while in orbit until after the loss of the Columbia.

The shuttle was an experimental vehicle. It was their job to gather as much
data as possible on it. With that, the foam problem would have become evident
long before the deaths of the Columbia crew.

~~~
toufka
As I recall, part of the rationale was that even if there was an issue, there
was literally no way for it to be fixed. So now you spend the next 2-3 days
in the shuttle, "doing your job," knowing that you will not return alive. With
that in mind, the thought initially was, "let's just not check".

~~~
tanzam75
Not true, there would've been options available. After the loss of _Columbia_
, engineers came up with two ways to save the crew: a rescue mission, and an
emergency repair EVA.

The rescue mission would've been hazardous, but the expected net loss of life
would have been negative. (More likely to save seven astronauts on the
_Columbia_ than to lose two astronauts on the rescue mission.) The repair
would've been jury-rigged and may not have worked, but it would have been
better than reentering without attempting to repair the damage.

Once you'd inspected the Shuttle, you'd know that it was in pretty bad shape.
And then you'd have moved into _Apollo 13_ mode -- how do we come up with a
way to save the crew? And maybe it would've worked. It's only because the
Shuttle was not inspected that NASA proceeded as though nothing were wrong.

Incidentally, they had the STS-27 astronauts check for tile damage, back in
1988. The astronauts were convinced that they were going to die on re-entry,
but they still did their jobs for the rest of the mission.

This also means that NASA had 15 years to develop in-orbit repair methods
before _Columbia_ needed them. But nothing was done in this area either.

------
altero
I think the root problem is that the shuttle launched and landed with people
aboard. It should have been used just for cargo; a second rocket with an
Apollo-style capsule should have transported people to and from orbit.

~~~
bdunbar
Wings were the real problem.

They stuck out where they could be struck by stuff. They precluded a capsule
eject system. They cost the Shuttle tons of fuel. All for a system used in
only one part of its flight regime.

Why _why_ did Shuttle have wings? The Air Force insisted. So they could launch
Shuttle into polar orbit from Vandenberg. Shuttle would need the wings to come
_back_ to Vandenberg.

Then the Air Force withdrew from the program. Too late to get rid of the
terrible wings, however.

~~~
gaius
Curious, how would it land _anywhere_ without wings? Apart from splashing
down.

~~~
tunap
Lifting Bodies

[http://www.nasa.gov/centers/dryden/news/FactSheets/FS-011-DF...](http://www.nasa.gov/centers/dryden/news/FactSheets/FS-011-DFRC.html)

------
benmorris
In case you missed it Discovery Channel aired a pretty interesting Challenger
docu(drama?) in November. The show portrayed Dr. Feynman's path to reaching
these conclusions. Pretty interesting and disturbing at the same time.

[http://www.discovery.com/tv-shows/the-challenger-disaster/ch...](http://www.discovery.com/tv-shows/the-challenger-disaster/challenger-disaster-videos/combing-through-the-wreckage.htm)

