Related: Gregg Easterbrook's article "Beam Me Out Of This Death Trap, Scotty" [1] is long but remarkably prescient, having been written a year before the first shuttle flight. It goes into the dangers of the tiles, how the costs would spiral, the danger of relying on a single launch vehicle, the benefits of disposable rockets, and other warnings that ended up being right.
The article talks about how unlikely the shuttle was to achieve the expected 500 flights, and how it would more likely manage only about 200. (Real number: 135)
Some of the quotes from the article are scary in retrospect:
Quote: "Here's the plan. Suppose one of the solid-fueled boosters fails. The plan is, you die."
Another quote: "When Columbia's tiles started popping off in a stiff breeze, it occurred to engineers that ice chunks from the tank would crash into the tiles during the sonic chaos of launch: Goodbye, Columbia."
Remember, this article is from 1980, before the shuttle launched.
Not only did Gregg Easterbrook's article describe the dangers that would eventually lead to the demise of Challenger and Columbia, it also mentioned a third failure mode: the Space Shuttle Main Engines.
These could easily have failed catastrophically and taken out a third Shuttle. Instead, all the failures of the SSMEs turned out to be survivable. We just got really lucky.
> NASA was claiming that the engines were in the regular range of engineering, but they're not; the engines had many difficulties that the guys at JPL told me about. (I found out later that the people who worked on the engines always had their fingers crossed on each flight, and the moment they saw the Shuttle explode, they were all sure it was the engines. But of course, the TV replay showed a flame coming out of one of the solid rocket boosters.)
Gregg Easterbrook wrote the following about simulated shuttle landings:
> They've never flounced like a twig on the crazy rapids of "bias"--the bland physics term for unexplained variations in the earth's gravitational and magnetic fields.
I can't make heads or tails of it -- I can't find any reference to a phenomenon called "bias", and I don't see how gravitational field variations (I assume he means the ones caused by uneven density) could have any effect at their minuscule amplitude. Is this the result of some misunderstanding, or am I missing something?
Thanks. I didn't know that the differences change in space in such a weird fashion. I guess what was meant were spatial differences, not temporal ones, and now the part about the magnetic field makes some sense to me.
Buran was carried piggy-back on an expendable rocket (Energia), with no attempt at reusable main engines, less aggressive engineering on the expendable main engines than the SSMEs, and no SRBs. (The equivalent function was performed by liquid-fueled strap-ons.) The Buran orbiter looked a lot like the Shuttle, but the systems as a whole were engineered rather differently.
Also, the sole flight of the Buran was entirely unmanned. Unlike the Space Shuttle, which had less sophisticated automation and required a pilot onboard to land it, Buran was capable of an entirely automatic landing. The difference was probably mostly due to political rather than technical reasons, though.
The only reason the Space Shuttle required a pilot during landing, from the point the re-entry burn started, was to push a button that lowered the landing gear at the proper moment. In fact, there was equipment stored on the ISS that, if installed, would have allowed the Shuttle to land autonomously (the intended use for that was deorbiting a Shuttle with significant tile damage).
It is unclear whether Buran's autopilot was more or less complex than the Space Shuttle's fly-by-wire system. The pilot's inputs are the outer loop of a very complex feedback control system on the Space Shuttle; one need only look at how stable Columbia was on its last re-entry (despite heavy damage) to appreciate the fly-by-wire system.
That having been said, it's still not safe enough. For example, what about a birdstrike or other debris? A small capsule on top of the stack would have a protective shroud while it is in the atmosphere, and the heatshield is at the bottom anyway. The Shuttle/Buran design was too big to protect in this way.
From my understanding, a bird strike during launch would only occur very early in the flight, when the shuttle was still traveling at a relatively low speed.
> From my understanding, a bird strike during launch would only occur very early in the flight, when the shuttle was still traveling at a relatively low speed.
There are a number of birds that can fly up to 10,000 feet. At 10,000 feet, the Shuttle is doing half the speed of sound. Hitting a bird at 10,000 feet would've been no picnic.
(In fact, the altitude record for birds is above 30,000 feet -- but these are unlikely to be found in Florida.)
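To put a rough number on "no picnic", here's a back-of-envelope sketch in Python. The bird mass and the speed of sound are my assumptions, not figures from the thread:

```python
# Rough kinetic energy of a bird strike at roughly Mach 0.5.
# Assumed values: a ~1 kg bird (large gulls and ducks are in this range)
# and ~330 m/s for the speed of sound, giving a ~165 m/s closing speed.

bird_mass_kg = 1.0
closing_speed_m_s = 0.5 * 330.0

energy_joules = 0.5 * bird_mass_kg * closing_speed_m_s ** 2
print(f"Impact energy: {energy_joules / 1000:.1f} kJ")  # ~13.6 kJ
# Several times the muzzle energy of a high-powered rifle round,
# delivered to fragile thermal-protection tiles.
```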
Soyuz and Shenzhou both have a payload fairing that covers the entire spacecraft. Apollo had a partial shroud that covered the command module. These spacecraft do not even have an exposed heatshield during launch. Apollo didn't even need a fairing for aerodynamic reasons -- but it still had one anyway.
Interestingly, a thorough risk assessment of the Shuttle was done much later by NASA (near the end of the program), and it concluded that the risk of losing a Shuttle in the pre-Challenger era was much higher than 1 in 100, closer to 1 in 10. Many people look at Challenger and Columbia as instances where the Shuttle program hit a patch of bad luck. In reality the Shuttle program was extraordinarily lucky; there were many other close calls, some not well publicized, that came within a hair's breadth of causing loss of crew and vehicle (STS-1, STS-8, STS-9, and STS-27 being examples). It was always a tricky bird to fly, and in the early days there were about half a dozen different things that could kill it outright with shockingly high probability: not just the SRBs or foam/ice strikes on the TPS, but also the APUs (which caught fire and exploded on one flight), the computers (which were completely inoperable just before landing on another flight), the SSMEs (which came close to causing loss of the orbiter once or twice), and other components. Over time some of these systems were improved to such a degree that they were no longer serious risks, but the whole system was so complicated, and there were so many elements of risk, that even at the end of the program many substantial risks remained.
The last sentence from this piece is just beautiful; it has become my personal motto:
> For a successful technology, reality must take precedence over public relations, for nature cannot be fooled.
It captures in capsule form the reason for a huge fraction of all the big engineering catastrophes, maybe even most of them. For everyone interested in similar case studies, and in reliability from a wide engineering perspective, I strongly recommend the book "Design Paradigms: Case Histories of Error and Judgment in Engineering" by Henry Petroski.
We've had three meltdowns in 40 years, about one every 13 years so far.
This is not a fair comparison. The shuttle was a bad design, with a high failure rate, and it doesn't make any sense to lump it in with tried-and-true rockets, which have a far lower failure rate. Similarly, it doesn't make sense to lump the dangerous RBMK reactor design used at Chernobyl with the far safer reactors designed in the West.
How many people died in flight during the Apollo missions? The answer is the same number that died as a result of the events at Three Mile Island and Fukushima: zero. Design matters, and if you're trying to be objective on this topic, you have to distinguish between good designs and bad ones.
Also, not all meltdowns are created equal, as your post suggests. If you look at Three Mile Island on Google Maps today, you'll see all sorts of arable cropland in active use all around the site.
EDIT: Why the downvote? Is there anything I've said that isn't factual?
I really don't think you want to use Three Mile Island as some sort of exemplar of safe technology:
> The lessons of Three Mile Island have been, for the most part, forgotten. The nuclear industry changed and improved somewhat, but the deep understanding of what went wrong was lost on the public in general and the real lessons that we could have learned as a society were, too. The financial mess we are experiencing right now isn’t all that different from Three Mile Island. If we’d taken better to heart the true lessons of TMI we might not be in our present jam.
> Looking back at the accident with the benefit of knowing what it took to clean it up and what the workers found when they were finally able to send robots inside the containment, the TMI accident was very bad indeed. There were pressure spikes during the accident that would have cracked an average containment vessel, releasing radioactive gases into the atmosphere. Fortunately the Unit 2 containment wasn’t average. TMI-2 was built on the final approach path to Harrisburg International Airport, a former U.S. Air Force base, and was therefore beefed-up specifically to withstand the impact of a B-52 hitting the structure at 200 knots. A normal containment would have been breached.
> TMI wasn’t caused by a computer failure but the accident was made vastly worse by an error of computer design. Specifically, TMI-2 had a terrible user interface.
> What happened at Unit 2 was a little more complex. A cascading series of events caused the computer to notice SEVEN HUNDRED things wrong in the first few minutes of the accident. The ONE audible alarm started ringing and stayed ringing continuously until someone turned it off as useless. The ONE visual alarm was activated and blinked for days, indicating nothing useful at all. The line printer queue quickly contained 700 error reports followed by several thousand error report updates and corrections. The printer queue was almost instantly hours behind, so the operators knew they had a problem (700 problems actually, though they couldn’t know that) but had no idea what the problem was.
I don't think you understood my point at all. I'm not saying that any nuclear incident is minor, or that there will never be one again.
A Chernobyl-style incident will never happen with a LWR reactor. That much is known, and the experts - nuclear engineers - are unanimous on this point. Feynman's point was, "when judging risk, ask the engineers that actually design and build the technology, not the management." The article I linked to (http://users.owt.com/smsrpm/Chernobyl/RBMKvsLWR.html) was written by nuclear engineers (who were students at the time, but have been in their field for many years now). The article you linked to was written by ... a journalist that writes about the computer industry? Now I'm left with the idea that you have missed Feynman's point.
You're right. I don't think I understand your point because I don't think you know what your point is.
Are you trying to advance the theory that management had no hand in the terrible decision-making around TMI, including the numerous shortcuts that Cringely noted? (NB: you are hardly using an engineering approach when you resort to ad hominem criticism of Cringely as a "journalist" without responding to any of the legitimate problems he noted.) Or are you trying to advance the theory that no engineering mistakes were made at TMI?
And are you trying to advance the theory here that somehow, magically, management will not continue to make NASA-like management mistakes in current and future nuclear facilities?
Because if so, that's precisely Feynman's point, which you seem to be ignoring.
And c'mon, you're merely moving goal-posts around when you focus on the severity of Chernobyl vs TMI. That isn't speaking at all to how technology was misused in both instances by people making exactly the kinds of errors Feynman emphasizes.
> A Chernobyl-style incident will never happen with a LWR reactor [emphasis added]. That much is known, and the experts - nuclear engineers - are unanimous on this point [emphasis added].
Really? Never? Unanimous? That sounds like a really interesting engineering judgment. Could you kindly link to the unanimous consensus statement from nuclear engineers that supports that strong but odd claim?
Because that sounds an awful lot like NASA management who claimed that the chances of loss of a space shuttle were so remote as to be negligible.
And your statement is even more curious when we read review papers like the one[1] from nuclear engineer Bal Raj Sehgal[2] of the National Academy of Engineering, who concludes:
"The presently-installed LWR plants in Western countries have been addressing their safety performance from the day they were installed and operating...Clearly, not all the severe accident issues have been resolved for the presently-installed plants [emphasis added].
"The presently-installed LWR plants made improvements in components, systems, operator training, man-machine interface, safety culture, etc., thereby significantly reducing the probability of a severe accident occurring. They also instituted severe accident management backfits, systems and procedures, which are providing assurance of the elimination of an uncontrolled and large release of radioactivity even in case a severe accident occurs. Still, the presently-installed plants can not provide assurance of coolability of a melt pool/debris bed, which could be formed during a bounding severe accident. In that situation, the LWR owner can not assure the public that the accident has been terminated and that there is no further danger of the release of radioactivity. [emphasis added]"
Sorry. It is you with your pronouncements of never who is absolutely not getting Feynman's point whatsoever. In particular, he decried management and others with their own wishful pronouncements of never, which stood in stark contrast with the concerns of engineers who were well aware that there were quantifiable and real risks associated with their technology.
dmfdmf wrote the following right here on hn five years ago. I'm reposting for the benefit of those that won't click on his comment:
"As a former design engineer in the nuclear business I have to make the following comments;
1) The lessons of TMI are far from forgotten. TMI is one of the most studied accidents and the lessons learned are incorporated throughout engineering and technical training.
2) Anyone who claims TMI was worse than Chernobyl is an idiot. One of the major lessons learned from TMI was that the design basis and safety strategies of western reactors work. This despite the serious operator training and control room design flaws that were exposed by the accident.
3) Anyone who mentions Chernobyl and TMI in the same breath does not know what they are talking about. A few facts about Chernobyl; these RBMK reactors were originally designed to generate plutonium for bombs and then scaled up for electric power generation which created all sorts of operational problems. When I was an undergrad my nuke prof said the design was inherently unstable and an accident was inevitable. The western countries had tried for years to get them to shut them down. On the night of the accident the engineers disabled 4 or 5 safety systems in order to run a turbine spin down test. This test was ordered by Moscow and the previous lead engineer was fired for not completing it prior to the last planned shutdown.
4) TMI experienced a partial core melt. I read an engineering report after the accident that it was technically and economically feasible to fix the damaged reactor. The PR nightmare this would create dictated that it would not be fixed. Chernobyl's core was blown sky high by a steam explosion and fuel rods littered the plant site, thus killing the responding firemen with lethal doses of radiation. There is no dispute regarding which core had more damage.
5) The claim that the containment would have cracked due to "pressure spikes" except that TMI was specially reinforced to protect against aircraft impact is engineering nonsense. First, these are different design requirements and operate on different physical principles. Second, if the accident exposed such a serious deficiency in the design of "normal" containment buildings it would have resulted in the shutdown or at least a reduced operating power at all other plants of similar design. No such regulatory action ever occurred.
6) While it is scary to write about "releasing radioactive gases into the atmosphere" the reality is that such releases are pretty harmless. These gases are typically biologically and physically inert and quickly dissipate in the wind to harmless background radiation levels. One of the major lessons learned from TMI was that the more dangerous biologically active materials like radioactive iodine or potassium do not escape and tend to stick to other material even in a core melt. That is if you have a containment building, unlike Chernobyl.
7) It is insulting to say that the operators did not know what was going on with the reactor "so they guessed" as if they started pushing buttons and pulling levers willy-nilly. The operators knew that the information they were receiving was not complete or wrong. The biggest problem was that their training was flawed and incorporated an assumption that was incorrect -- thus leading them to take actions that made the situation worse.
About the only thing that I agree with Cringely on is that we should be building nuclear reactors now."
I reckon your kind, single posting to HN from a "design engineer in the nuclear business" is better than nothing, even though he thinks the events leading to a partial nuclear meltdown at an LWR like TMI reflect a kind of engineering triumph, and displays exactly the kind of statistical confidence (that Feynman calls out) which would lead engineers to conclude that LWRs could never suffer a nuclear meltdown. It still isn't the requested link to a consensus statement from all nuclear engineers about the impossibility of a major nuclear reactor accident.
Just kidding with you some. But I think we're done here. Have a nice day.
You and the parent for some reason also want to vehemently argue a really strange, irrelevant point, that Chernobyl was worse than TMI. Is that in dispute?
So what? The Apollo I fire was not as bad as the Space Shuttle disasters. Your point is....what?
Apollo I, Challenger, Columbia, Chernobyl, and TMI came about in large measure precisely as a result of the kind of silliness that Feynman decries, and that you and the parent for some reason don't want to address.
You do realise this is exactly the attitude to safety that Feynman calls out as having led to the Challenger disaster, right? There were zero deaths from space shuttle accidents until that happened, so NASA dismissed all the close calls and unexpected issues that were warning them it wasn't as safe as their predictions claimed. To quote Feynman, "When playing Russian roulette the fact that the first shot got off safely is little comfort for the next."
Once we have inherently safe reactors powered by safer fuels, we'll be dope-slapping ourselves about those pressurized water uranium reactors while saying those same words.
Different designs matter. For example, airliners have gotten enormously safer over the years. You simply cannot conflate the safety record of 1930's airliners with modern airliners.
There were seventeen Apollo missions that flew (1 was cancelled due to a fatal ground accident so that leaves 2-17 plus the Apollo-Soyuz docking). There were no in-flight Shuttle fatalities in its first seventeen missions either.
You're worried about three meltdowns, but not the 2,000 nuclear weapons that have been detonated since the start of the atomic age? Compared to bomb tests, the fallout from nuclear power is a rounding error.
But let's review those three meltdowns. Three Mile Island didn't cause a single case of cancer. Chernobyl was a ridiculously unsafe design run with incredible incompetence. The WHO estimates it caused 40,000 cases of cancer and 4,000 deaths from those cancers. That may sound bad, but pollution from coal kills around 30,000 people each year just in the US. Finally, there's Fukushima. It's expected to cause 100-200 premature deaths from cancers. Remember, the cause of that meltdown was an earthquake-tsunami combo that killed almost 20,000 people.
If anything, shutting down nuclear power worsens public health. Demand for electricity isn't going to go down, so we end up burning coal, and coal plants are much worse than nuclear. From http://squid314.livejournal.com/292620.html:
> According to the Clean Air Task Force, coal plants kill about thirty thousand people per year in the US through pollution (which causes respiratory disease). There are six hundred coal plants, so that's about 50 deaths per plant. These numbers are much higher - maybe even by an order of magnitude - in Chinese and third-world coal plants, which lack the US' stringent environmental restrictions.
Even in the worst-case scenario, nuclear power still does better than coal:
> When you hit a nuclear plant with the fifth largest earthquake ever recorded, then immediately follow that with a twenty foot high tsunami, and then it explodes, it still kills fewer people than an average coal plant does every single year when everything goes perfectly.
There are lots of things that cause long-term harm to the environment: toxic chemicals, heavy metals, and yes, radioisotopes. If we want to enjoy a first-world quality of life, we have to accept some pollution. The least harmful energy source today is nuclear power.
I'd actually argue that the RBMK-1000 reactor is not horrifically unsafe, so long as it's operated within its design envelope, which is to say, for example, not overriding the automated safety systems. Had it been built with a containment vessel, I'd even argue it was within the safety margins of western reactors. Chernobyl was largely the fault of an incompetent crew taking the design well outside its normal safety margins by shutting down nearly every safety system the reactor had; these choices compounded the design limitations of the reactor. Under no circumstances should the control rods have been fully withdrawn from the reactor; this single action (combined with the graphite tips on the control rods) caused the meltdown.
We know the risks of nuclear power, if you do it wrong it does kill people, and salts the earth in the vicinity of the unit for generations. This is better in my opinion than killing people who live in the vicinity of a power plant as a course of normal operation.
A reactor complex like Fukushima has ~100 tons of reactor fuel per reactor, not counting spent fuel rods. Say 1000 tons total.
A bomb has a few kg of fissionable material, maybe 100,000 times less than a reactor complex. Do the math, if a reactor fails containment, the contaminants dwarf that of a bomb.
I think Fukushima contaminants have tripled the radioactive cesium in the Pacific still left over from the 2000 bombs you mention.
Nuclear is an outdated cold-war technology that won't compete with solar economically. It is slowly being phased out, down from about 17% of worldwide electricity generation a few years ago to 10% now.
I don't think you addressed most of my points, but I feel I need to respond to what you've thrown out here.
Only 1-3% of a fuel rod's mass is fissile, and containment failure doesn't aerosolize a kiloton of fuel rods. Almost all the material stays at the site. While the smallest possible bomb is about 20kg of fissile material, actual bombs are an order of magnitude bigger. In addition, bombs cause neutron activation in the environment and create a mushroom cloud that lifts fallout into the stratosphere. The immediate effects are much worse and much more widespread.
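For what it's worth, here's the arithmetic both comments are doing, as a quick sketch using only the figures quoted above (the 2% fissile fraction is just the midpoint of the 1-3% range):

```python
# Parent comment's figure: ~1000 tons of fuel at a site like Fukushima,
# versus this comment's ~20 kg of fissile material in the smallest bomb.
site_fuel_kg = 1_000_000
bomb_fissile_kg = 20

print(site_fuel_kg / bomb_fissile_kg)        # naive mass ratio: 50,000x

# Correcting for the fissile fraction of fuel-rod mass (1-3%, midpoint 2%):
fissile_fraction = 0.02
print(site_fuel_kg * fissile_fraction / bomb_fissile_kg)   # ~1,000x

# Either way, a raw mass ratio ignores the dispersal mechanism: a detonation
# lofts fallout into the stratosphere, while a containment failure leaves
# almost all the material at the site. Mass alone doesn't determine dose.
```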
You can say scary things like, "tripled the radioactive cesium in the Pacific", but Fukushima leaked around 9kg of cesium radioisotopes. Outside of the immediate area, there's no risk to marine or human life. From http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3341070/:
> We address risks to public health and marine biota by showing that though Cs isotopes are elevated 10–1,000× over prior levels in waters off Japan, radiation risks due to these radionuclides are below those generally considered harmful to marine animals and human consumers, and even below those from naturally occurring radionuclides.
A useful measure here is deaths per terawatt-hour (TWh) of electricity produced. Coal kills about 2,500 times as many people per TWh as nuclear. In fact, nuclear power has the lowest fatality rate per TWh of any form of power in existence. Rooftop solar has a death rate per TWh ten times worse than nuclear because of - I kid you not - people falling off roofs when installing the panels. Hydroelectric power has a worse fatality rate because of dams bursting and flooding people. Even wind power has a worse fatality rate per TWh: seventy-three people have died in windmill-related accidents.
You responded with the economics of solar power instead of addressing my points about safety, but as far as I can tell, you're incorrect on that. By every measure I could find, solar was over twice the cost of nuclear. It would break the chart if it was shown on http://en.wikipedia.org/wiki/Template:Cost_of_energy_sources. While nuclear power has decreased recently, the reason is political ass-covering, not economics. And it's not solar that's picking up the slack. Fossil fuels are the winners: http://en.wikipedia.org/wiki/File:World_energy_consumption.s...
There are two reasons for this: 1. Fossil fuels are cheap, mostly because their costs to the environment and public health aren't priced in. 2. They are reliable baseload sources of power. Coal works no matter how cloudy it gets or how calm the wind is. Without reliable, predictable generation, wind and solar require expensive (and sometimes dangerous) energy-storage systems. At best, they can only supplement baseload sources.
Well, this is certainly relevant to Feynman's essay.
You argue that nuclear power is safer than other forms because there haven't been any accidents that have killed a lot of people, yet. I would guess you would put the odds of a future accident at a probability similar to the NASA managers mentioned by Feynman. Feynman would respond that past avoidance of accidents doesn't mean you can say the probability of future accidents is negligible.
If nuclear is so "safe", why does the industry need liability limits where the taxpayers pay any damages above some minimal amount? Reactor operators should just buy insurance for the maximum possible damage an accident would cause. Of course they'd shut down if they had to do that.
You originally claimed that the fallout from bomb testing was "a rounding error" compared to the fallout from a nuclear accident. Now you accept that the radioactive cesium from Fukushima has tripled the cesium in the Pacific from all 2000 bomb tests? Not exactly a "rounding error". (although I agree, outside the immediate vicinity of the accident, the cesium isn't especially dangerous)
Regarding economics, simply look at the trends. Solar panels are getting cheaper at an insane rate. Nuclear plant construction costs have doubled over the past ten years. Even completely amortized plants are shutting down in the US because they can't compete, about five in the last year or two. Any taxpayers or ratepayers funding a new plant will never get their money back, as solar will be so cheap by the time the plant is ready to start up that it will be instantly mothballed.
> By every measure I could find, solar was over twice the cost of nuclear.
So, please be patient with me; I'm not trying to make you look dumb or anything. It's just that from what I've researched, solar power pays for itself in 5-10 years depending on location. Meanwhile nuclear power has many externalities which are taxpayer-subsidized. It seems to me the reasons we haven't "gone solar" have less to do directly with cost than with waiting until the technology is mature before investing gangbusters.
As for the dangers of storing electricity generated through solar energy, if a nuclear power plant can be run safely, couldn't an electrical storage plant also be run safely?
> Meanwhile nuclear power has many externalities which are taxpayer-subsidized.
And how many of those are taxpayer-imposed? Illogical and uninformed anti-nuclear activists prevent new reactors from being built, leaving us with increasingly dangerous old reactors or even more harmful coal plants. Political bullshit prevents breeder/burner reactors from making nuclear waste a manageable problem. Thorium gets no research money because it's useless for weapons.
That is ridiculous. Let's say D-Wave were even better in what it actually produced, but even worse in its PR: suppose it created a true proof-of-concept quantum computer.
But it is even worse at public relations and EVERYONE thinks that it is a scam with rigged demos. It has no credibility.
Now I ask you: in this thought experiment, is nature going to fund your quantum company, because you actually kicked nature's ass and proved a true quantum computer in concept?
No. You have to actually maintain real credibility, much as the space program did.
Nature can't be fooled, but Nature also doesn't fund shit. Whether the government, VC's, or the people, only people fund people.
I think you are not wrong that you need to convince the public for giant expenditures, but if all of those things don't actually do what they pretend to, people die and all credibility is lost anyway.
This is a sad reality, different from the "reality" in the sentence. So I have to say: this is a sad fact. But I love that sentence. Nature cannot be fooled, in the end.
There were 135[1] Space Shuttle missions with 2 resulting in human casualties (Challenger and Columbia disasters).
Thus, a rate of failure with loss of vehicle and of human life of 1.48 in 100.
The estimates range from roughly 1 in 100 to 1 in 100,000. The higher figures come from the working engineers, and the very low figures from management.
The reality was even more dangerous than the engineers had predicted, and far more dangerous than management had.
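You can make "even more dangerous" concrete with a couple of lines of arithmetic (a sketch; the per-flight probabilities are the estimates Feynman reports, not my own):

```python
# Observed record: 2 losses in 135 flights.
print(f"Observed loss rate: 1 in {135 / 2:.0f}")   # 1 in 68

# If a per-flight loss probability p were accurate, the chance of seeing
# at least one loss in 135 flights would be 1 - (1 - p)**135.
for label, p in [("management estimate, 1 in 100,000", 1e-5),
                 ("engineer estimate, 1 in 100", 1e-2)]:
    print(f"{label}: {1 - (1 - p) ** 135:.1%}")
# management estimate: ~0.1% -- two actual losses make it untenable
# engineer estimate:   ~74% -- roughly consistent with the record
```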
One of the major problems with the Shuttle is that its design meant that a loss of mission pretty much also meant the loss of the crew. A 1 in 70 accident rate is not nearly as big a deal if there's a 90%+ chance of the crew surviving the accident, as the sketch below shows. (I just pulled the 90% number out of the air -- I've never seen any actual estimates for the effectiveness of launch escape systems in the event of accidents. In practice the success rate has been 1/1.)
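Using the comment's own illustrative numbers (the 1 in 70 rate and the admittedly made-up 90%), here is the shape of that calculation:

```python
# How much a launch escape system changes loss-of-crew (LOC) odds
# for a given loss-of-mission (LOM) rate. Both inputs are the
# comment's illustrative figures, not real NASA estimates.
lom_rate = 1 / 70            # accident probability per flight
escape_success = 0.90        # assumed crew-survival probability in an accident

loc_without_escape = lom_rate                      # Shuttle: accident ~= crew loss
loc_with_escape = lom_rate * (1 - escape_success)  # capsule with escape system

print(f"LOC without escape: 1 in {1 / loc_without_escape:.0f}")  # 1 in 70
print(f"LOC with escape:    1 in {1 / loc_with_escape:.0f}")     # 1 in 700
```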
I would suspect the engineering estimate had one or at most two sig figs, which isn't bad compared to results. I'm sure the management estimate had six sig figures of course.
When NASA was comparing designs for their post-shuttle rocket recently, they literally used four significant figures in the risk estimates [1], and used this estimate to pick the design. It seems crazy to me to have four sig figs of reliability for designs that were basically at the PowerPoint stage.
NASA started with the requirement that their new rocket have less than 1 in 1000 odds of loss of crew (LOC). They concluded that using an existing Atlas V had LOC odds of 1 in 957 (unacceptable), while the paper design of putting a capsule on top of a Shuttle booster had LOC odds of 1 in 1918 (totally acceptable). They then quoted this 1,918 number in a lot of places to justify the program.
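To see why four significant figures is absurd here, compare what the two claimed rates would actually let you observe (a quick sketch):

```python
# P(at least one loss of crew in n flights) under each claimed per-flight rate.
for n in (10, 100, 1000):
    p_atlas = 1 - (1 - 1 / 957) ** n    # Atlas V estimate, "unacceptable"
    p_ares = 1 - (1 - 1 / 1918) ** n    # Ares I estimate, "acceptable"
    print(f"{n:>5} flights: {p_atlas:6.2%} vs {p_ares:6.2%}")
#    10 flights:  1.04% vs  0.52%
#   100 flights:  9.93% vs  5.08%
#  1000 flights: 64.85% vs 40.64%
# No program flies remotely enough missions to tell these apart empirically,
# let alone to justify a fourth significant figure for a paper design.
```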
This rocket was the Ares-I [2], which turned into a fiasco and was canceled four years ago.
My conclusion is that NASA's current risk assessments are as bogus as the ones for the space shuttle. They start with an unrealistic goal (1 in 1000 risk), make totally unjustifiable estimates to meet the goal, and then make bad decisions based on these estimates. Coincidentally, the decisions based on these estimates line up with the politically-desirable outcome.
The 1 in 1918 risk assessment turned out to be totally wrong, of course. The Air Force pointed out that the launch escape system wouldn't work since burning fuel would melt the parachute and everyone would die. [3]
My personal view is that NASA needs to admit that rockets are dangerous and you probably can't get the risk below 1 in 100. Then NASA can focus on doing the best job they can. [4]
This is an absolute classic of engineering literature. The last sentence, perhaps deservedly, gets most of the glory, but the whole piece should be under every engineer's and every manager's fingers.
I constantly see the dynamic observed in the first paragraph, and it would seem that the question "What is the cause of management's fantastic faith in the machinery?" is eternal.
I think management's big problem is that they are often confused by the difference between what they need and what they have. Of course I've spent a lot of my career working for venture capital-based startups where this problem might naturally be more prevalent.
I would phrase it slightly differently: the problem is that when human beings know that they need something, and also know that they will not be allowed to get it, they are remarkably successful at convincing themselves that they don't really need it after all.
Take the infamous O-rings from the Shuttle's solid rocket boosters, for instance. As Feynman notes in this appendix, the appearance of erosion on the O-rings was an indication that their design was fundamentally flawed. So why did NASA's engineers twist themselves into pretzels to argue that everything was fine? The reason is that they knew that there was no chance that those boosters were going to be redesigned. The political will for that, in NASA and in Washington, just was not there. Even if the engineers threw down their tools and refused to launch any more Shuttles on the grounds that they were unsafe, all that would happen would be that the Powers That Be would come down on them like a ton of bricks and force them either to shut up and get back to work, or to get out of NASA and therefore wreck their careers.
When people are forced to choose between bad options, the easiest thing to do is usually nothing. So that's what NASA's engineers did. And since "I know this thing I work on is likely to kill people, and I'm not going to do anything about that" is the kind of self-knowledge that leads to cognitive dissonance (http://en.wikipedia.org/wiki/Cognitive_dissonance), they put together ad-hoc rationalizations to help them live with it. Rationalizations like the misuse of the term "safety factor" that Feynman flagged, for example; if you can twist that term so it fits the facts in front of you, you can convince yourself that the Shuttle isn't really unsafe. Which takes care of the cognitive dissonance.
You can see this cycle all the time in software development, too. How many systems have you worked on that weren't really secure, but you know that management doesn't have the stomach to accept the tradeoffs -- the extra cost, the reduced convenience -- it would take to make them secure? What happens in those cases? In the ones I've seen, the people involved just convince themselves that the system isn't insecure after all. "We've never been hacked; if the system was insecure, we would have been hacked; therefore, the system is not insecure." It's not great logic, but when you desperately want to believe something it doesn't have to be great logic to convince you. It just has to tell you what you already want to hear.
The engineers knew why there was blow-by on the O-ring -- joint rotation. The obvious solution was to redesign the field joint to resist this rotation, so that the O-ring would never be exposed to hot gases. All they needed was time and money.
Then Challenger happened. Now, no Shuttle could ever launch again until the joint was redesigned and proven to work. Time and money were no object.
Management's main problem is that they believe the problem is one of people, and so they design counter-productive incentive systems to try to get people to do better work, with raises and firings and carrot-and-stick motivation, instead of, fundamentally, designing systems for quality.
This may apply less to NASA, but it does apply to the business world, yet the result is the same. Failures and defects and poor quality are primarily the result of incomplete systematic control over the end product.
So in that sense, I saw this more as an astute diatribe on the management issues of 20th century America than an engineering piece. The engineering here is known quite well; any engineer can explain it thoroughly as Feynman has done.
The part that flies in the face of truth is how we manage people.
For anyone else interested in how the organizational incentives and institutional culture at NASA helped to set the stage for the Challenger disaster, I highly recommend The Challenger Launch Decision[1] by Diane Vaughan.
From the New York Times review[2]:
> In "The Challenger Launch Decision" Diane Vaughan, a sociologist at Boston College, takes up where the Rogers Commission and Claus Jensen leave off. She finds the traditional explanation of the accident -- "amorally calculating managers intentionally violating rules" -- to be profoundly unsatisfactory. Why, she asks, would they knowingly indulge such a risk when the future of the space program, to say nothing of the lives of the astronauts, hung in the balance? "It defied my understanding," she says.
Even with Feynman's carefully reasoned essay, the next shuttle disaster was a mirror of the first: chunks of foam falling off on each flight, careful monitoring, but no serious action until the foam resulted in the loss of a vehicle.
The first loss came after careful monitoring of near burn-throughs of the SRB O-rings on many flights, but no decisive action.
It really astounded me when I learned no shuttle was inspected for damage while in orbit until after the loss of the Columbia.
The shuttle was an experimental vehicle. It was their job to gather as much data as possible on it. With that, the foam problem would have become evident long before the deaths of the Columbia crew.
> It really astounded me when I learned no shuttle was inspected for damage while in orbit until after the loss of the Columbia.
Atlantis was hit by tank insulation during launch of STS-27 (1988). While in space, the astronauts inspected the tiles using a camera mounted on the Canadarm. Hoot Gibson says that it "looked like it had been blasted by a shotgun." They were convinced that they were going to die on re-entry, and Gibson had even planned what he'd say to Houston in his final transmission before he died.
They got very, very lucky. There were 700 damaged tiles, but only one missing tile. And the one missing tile just happened to be over an antenna mounting bracket, which made the structure stronger than the rest of the wing. If the foam had hit just a couple of tiles over, then they would've died.
This was the second flight after the Challenger disaster. It might have ended the Shuttle program, right then and there.
Another account can be found in Mike Mullane's memoirs, Riding Rockets. Mullane was a mission specialist on STS-27, and operated the Canadarm during the tile inspection. He says that he kept thinking about the tiles, and had trouble sleeping. Hoot Gibson advised him: "No reason to die all tensed up."
Remember -- STS-27 was the second mission after the Challenger explosion. Not enough time yet for complacency to have set in. In addition, Mike Mullane had an it-was-absolutely-not-an-affair-we-were-just-friends relationship with Judy Resnik. So he knew very personally just how dangerous the Shuttle could be.
As I recall, part of the rationale was that even if there was an issue, there was literally no way for it to be fixed. So now you spend the next 2-3 days in the shuttle, "doing your job," knowing that you will not return alive. With that in mind, the thought initially was, "let's just not check."
Not true, there would've been options available. After the loss of Columbia, engineers came up with two ways to save the crew: a rescue mission, and an emergency repair EVA.
The rescue mission would've been hazardous, but the expected net loss of life would have been negative. (More likely to save seven astronauts on the Columbia than to lose two astronauts on the rescue mission.) The repair would've been jury-rigged and might not have worked, but it would have been better than reentering without attempting to repair the damage.
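To make "expected net loss of life" concrete, here's the shape of that calculation with purely hypothetical probabilities (NASA's actual estimates aren't in the thread):

```python
# Hypothetical inputs, for illustration only.
p_rescue_saves_crew = 0.80   # assumed chance the rescue works
p_rescue_crew_lost = 0.05    # assumed chance the rescue astronauts are lost
columbia_crew = 7
rescue_crew_at_risk = 2      # the comment's "two astronauts"

expected_saved = p_rescue_saves_crew * columbia_crew
expected_lost = p_rescue_crew_lost * rescue_crew_at_risk
print(f"Expected net lives saved: {expected_saved - expected_lost:.1f}")
# 5.5 with these inputs. The rescue option wins whenever
# (p_save x 7) exceeds (p_lose x 2), which holds for a wide range of inputs.
```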
Once you'd inspected the Shuttle, you'd know that it was in pretty bad shape. And then you'd have moved into Apollo 13 mode -- how do we come up with a way to save the crew? And maybe it would've worked. It's only because the Shuttle was not inspected that NASA proceeded as though nothing were wrong.
Incidentally, they had the STS-27 astronauts check for tile damage, back in 1988. The astronauts were convinced that they were going to die on re-entry, but they still did their jobs for the rest of the mission.
This also means that NASA had 15 years to develop in-orbit repair methods before Columbia needed them. But nothing was done in this area either.
While it may have made sense in the pre-Mir days, it made none whatsoever on missions that serviced space stations. Knowing the extent of damage possible during ascent would have changed the rules by which shuttles operated.
I think the root problem is that the shuttle launched and landed with people aboard. It should have been used just for cargo; a second rocket with an Apollo-style capsule should have transported people to and from orbit.
They stuck out where they could be struck by stuff. They precluded a capsule eject system. They cost the Shuttle tons of fuel. All for a system used in only one part of its flight regime.
Why _why_ did Shuttle have wings? The Air Force insisted. So they could launch Shuttle into polar orbit from Vandenberg. Shuttle would need the wings to come _back_ to Vandenberg.
Then the Air Force withdrew from the program. Too late to get rid of the terrible wings, however.
It's a legitimate question to ask if the wings were worth the trouble. However, the real problem from a safety standpoint was that Shuttle was riding on the side of the stack rather than on top.
Even if that's the case, those lower wings aren't as vulnerable to minor damage since you don't need them for re-entry. That said, I am firmly in the "wings aren't worth it" camp anyway. Just put a capsule on top.
Elon Musk has explained many times how Dragon will land with helicopter precision. Currently it lands with a precision of a few kilometers (because of the wind; they use parachutes).
They can do that because even on a symmetrical capsule you can have lift-vector control (by offsetting the center of mass, the way Apollo did).
I can't find the video, but you can look it up with Google. It is very easy to find.
In case you missed it Discovery Channel aired a pretty interesting Challenger docu(drama?) in November. The show portrayed Dr. Feynman's path to reaching these conclusions. Pretty interesting and disturbing at the same time.
[1] http://www.washingtonmonthly.com/features/2001/8004.easterbr...