That to me is pretty disgusting. In an incident like the loss of Columbia, there is no one, true "root cause". To assign blame to those foam technicians was disingenuous and just another instance of "passing the buck" that seems to happen so often in the post-mortem of NASA failures. NASA knew of earlier foam strikes (STS-112) yet chose to continue flying without diagnosing the problem. Even during the tragic STS-107 flight, engineers knew of foam strikes and their concerns were ignored. Even though they would have been almost completely powerless to remedy the situation on STS-107, the higher-ups decided to continue on with the mission instead of addressing the concerns with the heat shields. The article author states later in the article that he apologized to the foam technicians. Commendable, but I am still bothered by the fact that NASA was initially so eager to place the blame on a single contractor instead of owning up to their own culpability. Leadership and responsibility need to come from the top, especially in such a prestigious organization!
When he talks about it being their fault, it's not that those engineers are being singled out for punishment and derision. They had to find out where the problem existed that led to the loss of Columbia, and after extremely thorough testing they believed it existed with the foam team. It's simply a matter of finding where a problem is and doing everything you can to fix it.
So it's not a personal, vindictive "your fault", it's an impersonal "the problem is here, let's fix it".
It's not. That's complete bullshit, and the On-Board Software Group demonstrated it by being as flawless as can be during the whole history of the Shuttle: as far as I know there was no personal accountability in the OSG, the only thing accountable was The Process supported by a strong culture of adversarial testing.
Personal accountability in such a system brings politics and career advancement in focus and leads to issues being shoved under the rug when inconvenient and energy being expended in finger-pointing and blame games rather than fixing problems.
> So it's not a personal, vindictive "your fault", it's an impersonal "the problem is here, let's fix it".
No, it's not. Accountability is very precisely "your fault", that's all it is. That's pretty much the definition of it.
Left to its own devices, the group would most certainly have operated as it did every time it found fault in its output: find out how The Process had allowed for a fault to be introduced and reach output, find out how to make The Process prevent the introduction and/or release of such faults, fix The Process.
So no, the "exact same accountability process" would most definitely not have been initiated within the group, a very different one would have taken place.
That. Accountability (worse: "personal accountability" and variants thereof) is a tool provided by the law to determine who to recover damages from after a failure has occurred. It is entirely unsuitable for failure prevention because it is entirely orthogonal to rigorous testing and a culture of workplace safety.
Anyone who insists on accountability on their project does not know what they need. What they get is an extraordinary amount of ass-covering, finger-pointing and blame deflection, though.
Perhaps I was a little too brief in my writeup. Nobody that I know (certainly not me) went to MAF and downdressed any of the workers. What happened was that the conclusion was reached in engineering and management meetings and the word filtered out to the workers that poor workmanship was the proximate cause of the loss of Columbia – as if there wasn’t enough blame to go around in many other areas. I really regret the erroneous conclusion, the impact it made on the workers, and the way the whole scenario played out at MAF. The people there were very hard working, dedicated, and proud of their involvement with America’s space program, many of them second or third generation workers at that location. Now, of course, they have all been laid off and the MAF plant is virtually a ghost town with very limited work going on there for other NASA or commercial projects.
This is stupid, perhaps wilfully so. The fact that foam can come off the external tank and strike critical parts of the shuttle is a fatal design flaw. Insulation has been used on rockets for sixty years now, and it has been observed to come loose every now and then. The difference is this: on a rocket it just falls off, but if any comes off the tank and strikes the Shuttle orbiter, there is disaster.
The fault was at an early stage of design.
But I think this direct wording was a rhetorical device to set the hook for the pivot in the story, when it turned out the problem with the foam wasn't inclusions during installation at all.
Some of the achievements of the Shuttle program have been inspiring, and the vehicle itself is pretty to look at, but we should have canned that program long, long ago.
2% US Space Shuttle
5% R-7 (Russian Soyuz)
5% Ariane 1-4 (European)
6% Tsyklon (Russian)
7% Kosmos (Russian)
10% Thor/Delta/N1/N2/H1 (US)
11% Titan 2/3/4 (US)
12% Proton (Russian)
13% Kosmos 2 (Russian R-12)
14% Atlas (US)
I was wrong, and you're making the mistake that I did. Namely confusing reliability and safety. A reliable rocket is one that successfully does what it is supposed to. A safe rocket is one that doesn't kill people.
The US space shuttle has proven to be more reliable than the Soyuz. It is more likely to actually get you into space. But the Soyuz has been safer than the US space shuttle. If you try to get into space on it, you're less likely to die.
If this seems impossible, consider that in both Soyuz 18a in 1975 and Soyuz T-10-1 in 1983 the rocket failed, but the cosmonauts survived. (In the first case the rocket failure happened 90 miles in the air, but the cosmonauts survived.) The space shuttle, by contrast, had no successful aborts.
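To make the reliability vs. safety distinction concrete, here is a minimal sketch. The counts are purely hypothetical, picked only to show how the two rates can rank two vehicles in opposite order; they are not real Shuttle or Soyuz figures, and `LaunchRecord` and its fields are names invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class LaunchRecord:
    """Hypothetical launch history for one crewed vehicle (illustrative only)."""
    launches: int          # crewed launch attempts
    mission_failures: int  # vehicle did not do its job (includes survived aborts)
    fatal_failures: int    # failures in which the crew died

    def reliability(self) -> float:
        # Fraction of attempts where the vehicle did what it was supposed to.
        return 1 - self.mission_failures / self.launches

    def safety(self) -> float:
        # Fraction of attempts where the crew survived, mission success or not.
        return 1 - self.fatal_failures / self.launches


# Assumed numbers, NOT real data: vehicle A rarely fails but every failure
# is fatal; vehicle B fails more often but most failures are survivable.
vehicle_a = LaunchRecord(launches=100, mission_failures=2, fatal_failures=2)
vehicle_b = LaunchRecord(launches=100, mission_failures=4, fatal_failures=1)

for name, rec in (("A", vehicle_a), ("B", vehicle_b)):
    print(f"{name}: reliability={rec.reliability():.2f} safety={rec.safety():.2f}")
```

With these made-up counts, A comes out as the more reliable vehicle and B as the safer one, which is the same shape as the argument above: a working abort system moves failures out of the fatal column without making the rocket any more likely to reach orbit.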
That the people on that rocket escaped with "bruises" is amazing.
The Challenger explosion could have hypothetically been survivable though. In fact, the explosion itself was survived, likely by all of the crew. The crew cabin (http://upload.wikimedia.org/wikipedia/en/thumb/4/42/Challeng...) remained intact and possibly pressurized after vehicle breakup. The crew were almost certainly alive (and if the cabin remained pressurized, could have been conscious as well) for nearly 3 minutes until it hit the ocean at over 200 miles per hour.
I don't know whether the SR-71 ejection seats used for the first few Shuttle launches could have improved their chances of survival at some point during those 3 minutes, but it seems at least somewhat possible. A parachute system for the crew cabin probably wouldn't work for the same reason the launch abort system on the proposed Ares was flawed (burning solid fuel flying everywhere in the air is bad for parachutes)... nevertheless I think it is conceivable that you could build a Shuttle that would allow the crew to survive an accident like that.
But really, just stick the people on top. It makes way more sense. I know it is hard to compare the two accidents (though from what I understand, as far as solid fuel rocket failures go Challenger was pretty tame), but the contrast between Challenger and T-10-1 is something that lessons should be taken from.
I do not have data on it, but using that yardstick or even the more lenient "get into space within a month of the planned date", I think it was not very reliable. I also have the impression (but again: I do not have data) that the Soyuz is way more reliable in that respect.
Man-rated systems are DESIGNED to be much safer. The trade-off involving dollars is entirely different. You can't criticize a non-man-rated system for blowing up any more than you can criticize a UDP packet for not getting through: that trade-off was engineered in.
>> About five percent of the people that have been launched have died doing so. [..] About two percent of the manned launch/reentry attempts have killed their crew, with Soyuz and the Shuttle having almost the same death percentage rates.
We certainly learned a lot from Shuttle operations but in terms of spacecraft design mostly we learned what not to do.
This part is one of the more disturbing parts though, and a good reminder of why technical people in all fields, whether rocket scientists or programmers, should not adopt a "Well, we worked hard and we're smart so I'm sure everything's fixed" attitude:
> What you probably don’t know is that a side note in a final briefing before Discovery’s flight pointed out that the large chunk of foam that brought down Columbia could not have been liberated from an internal installation defect. Hmm. After 26 months of work, nobody knew how to address that little statement. Of course we had fixed everything. What else could there be? What else could we do? We were exhausted with study, test, redesign. We decided to fly.
How is it that this mentality exists at NASA? Isn't it a matter of logic that if the foam loss was shown not to have come from an installation defect, the engineers have to keep looking for the actual cause? The OP just brushes over this but surely there was some kind of debate, like: "Well, the particular test claiming that the foam loss was NOT caused by an installation defect was poorly conducted, and all our other measurements say that the installation is the likely cause, so moving on..."
I really hope there isn't some kind of "Oh fuck it, just ship it" mentality at NASA.
Despite all the "if it's not safe, say so" posters (e.g., http://www.dpvintageposters.com/cgi-local/detail.cgi?d=9203), the anonymous tip lines, and everything else, it's hard to stand up and say that something is not safe enough, or that this cause has not been fully nailed down. Because it's usually a qualitative thing, and careers and programs are at stake.
I was at a large auditorium at JSC (Houston) once. It's where the big pre-launch briefings are held. They had installed phone handsets all over the periphery and aisles of the room so that anyone could easily stop a briefing to ask a question. (I've never seen a capability quite like that in an auditorium.)
The room had (IIRC) around 200 seats. It's hard to be the guy who stands up and stops the briefing to ask the key question. Even though a lot of infrastructure has been created to make it possible.
Granted, it's a terribly hard thing to fix, getting the right information to the right people with the right priority. But this shows how critical it is to do just that.
 Obligatory Tufte comments:
(See also Feynman's account of his experience on the Challenger investigation commission in "What Do You Care What Other People Think?")
You have a stated problem "the foam that came off didn't come off because of the reasons we thought it did." Now you have no other ideas besides what you've already considered and tested for 26 months. What do you do? Possibly spend another 2 years investigating and find nothing? Or conclude that the risk is small enough to fly while being vigilant about the problem and looking for more data to lead you in the right direction?
Sometimes the only way to get more data to solve the problem is to do the very thing that causes it, while hoping that you've mitigated its effects well enough that the system is still safe.
The OP doesn't say how conclusive this "side note" was, or if it was one such note among many others. If it is the latter situation, then yes, it's understandable that it was seen as an acceptable blind spot.
But the situation, as the OP describes it, sounds pretty clear cut: The foam issues could come from poor installation procedures. But testing found that the defective foam "could not have been liberated from an internal installation defect"...
So I'm just interested in knowing the level of conclusiveness in that sidenote.
Sorry Wayne, it seems to me that you launched knowing there was an unresolved problem, not unlike the Challenger accident decision. What else could you do? Ground the vehicle until the problem is fixed!! The crews’ lives and the future of NASA were at stake.
... to which Hale replied, simply, "Yep."
NASA could have focused more on great science programs (like the Mars rovers, unmanned deep space probes, planetary science -- think of what they could accomplish with even 50% of the current overall NASA budget), military and government launch could have continued with ICBM-derived rockets, and private space could have gotten an earlier start.
Exactly, yes. The design should have been revised until they weren't pushing safety margins so hard. Of course, that would have been an engineer-led approach, which is the opposite approach from the one they used.
The big takeaway from this is what it means to be a good engineer: to be able to bow your head and admit you were wrong despite all prior evidence.
No, I think being a good engineer means building good things. When the things get sufficiently complex, that starts to require control of your ego (what you described), being a good scientist/investigator, organizational skills, etc.
Note to people who didn't read the appendix: he specifically touches on software development in the latter part of his note.
THAT'S THE PROBLEM WITH NASA!
In any other situation, when faced with such a dangerous close call, there would have been emotion and strong language used. But in NASAworld, that's all considered verboten. As Mr. Hale points out in his post, these people were his friends. He knew their families well. They weren't just employees. They dodged a bullet, and all he could call it was "unsatisfactory."
I'm not asking NASA to be full of raving loons. But show some goddamn emotion from time to time! One of the most wonderful things about Curiosity was not just the amazing landing, but the sheer jubilation the JPL team went through once they realized their little rover had safely survived the "7 minutes of terror" and landed. For 10 minutes, they hugged, shouted, and cheered. For crying out loud, the flight director had a mohawk! I have no doubt that by showing themselves as fully human, these amazing people just created a whole new generation of kids who will dream of sending probes to faraway places like Europa, Titan, and beyond.
Bottom line: I admire Mr. Hale's honesty in hindsight. But his bland non-emotionalism is one of the reasons people just don't care about space anymore. Make it exciting and demonstrate emotion, and people will care. Act all Spock-like 100% of the time and people will think you DON'T care (so why should they?)
Until we can say we've got this getting-to-space thing down, spacecraft should be considered research vehicles, and information on every single aspect of their operation has to be gathered. When Columbia was lost, I was appalled that, after more than 100 flights, nobody had ever inspected the heat shield for damage incurred during lift-off. Even if you consider it dangerous (or too much work) to have an astronaut visually inspect it, this could have been done from the Mir space station.
Many spacecraft were lost to arrogance, to the false certainty we know what we are doing when, in fact, we are still learning.
"There is a saying that a wise old program manager once passed along to me: “Great engineers, given unlimited resources and time will achieve exactly . . . . nothing”
Think about it."
It's amazing to hear someone be so honest about this.
(emphasis added by me)