
The result would be a catastrophe (1985) - moviuro
http://www.lettersofnote.com/2009/10/result-would-be-catastrophe.html
======
lutusp
A classic tale of engineering gone wrong. Apropos, I designed high-efficiency
20 KHz inverters for the Space Shuttle crew quarters, flight hardware that
ended up working flawlessly. But during the design phase, the primary Shuttle
contractor revealed that my inverters would be exposed to much higher voltages
than they had originally supposed.

When I saw this, I realized we would have to start over and choose different
components able to tolerate the higher voltages. My managers disagreed,
arguing that if we caused problems, we would be denied a follow-on contract
for similar inverters to be deployed in the payload bay. The managers composed
a reply to NASA saying there was no problem, things were fine, let's forge
ahead.

I was a mere engineer, I had no management authority, and I hadn't been
consulted about the reply. When I heard about it, I sat down and wrote a
letter of resignation and pushed my letter into more hands than was absolutely
necessary. I made some comparisons that at the time might have seemed over the
top (like the Apollo fire that killed three astronauts, a disaster resulting
from lax oversight).

In my case, because of my having distributed the letter farther than was
absolutely necessary, my managers were forced to reverse themselves, I was
able to redesign my inverters in a safe way, and we got the follow-on contract
in spite of not being seen as "team players".

Many years later, at the time of the Challenger disaster, it finally dawned on
me that, had I disregarded the overvoltage issue as my managers had wanted me
to, and if something had gone wrong, I would have been held personally
responsible, because I was the only person with the level of technical
knowledge required to make the call, and my managers could disavow any
responsibility. At the time, I made the right decision, but for reasons that I
hadn't fully thought out -- if my equipment had failed in-flight, I would have
been held responsible, and that would have been a perfectly just outcome.

------
steven777400
Nothing I've worked on has been nearly to this scale, but having management
understand and ignore warnings is commonplace.

Just today, I pointed out that a backend web service system for an important
(HR) function was so insecure that simply putting a single quote in an
end-user input field would crash it, and that a SQL injection would be
trivial.
After a little arguing from them ("You don't know what a SQL injection is")
they changed their tune to "Just put a limit on the frontend", and when I told
them that could be trivially bypassed, they said "No one will ever think to
try that." When I pressed further, they continued to "It's an internal app, no
one would attack it" then to "And do you want me to put locks on all the doors
and cabinets around the office too?"
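The single-quote crash is easy to reproduce. Here's a minimal sketch in
Python/sqlite3 (table and input invented for illustration) showing the naive
concatenation failing on ordinary input, and a parameterized query handling
the same input safely:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT)")
conn.execute("INSERT INTO employees VALUES ('Alice')")

user_input = "O'Brien"  # a perfectly ordinary name

# Naive string concatenation: the stray quote breaks the statement,
# which is exactly the crash described above.
try:
    conn.execute("SELECT * FROM employees WHERE name = '" + user_input + "'")
except sqlite3.OperationalError as e:
    print("crashed:", e)

# Parameterized query: the driver handles quoting, so the same input
# is treated purely as data, never as SQL.
rows = conn.execute(
    "SELECT * FROM employees WHERE name = ?", (user_input,)
).fetchall()
print("safe query returned:", rows)
```

With parameters the input cannot change the query's structure, which is why
parameterization, rather than a limit on the frontend, is the standard fix.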

So that's a case where there's some reasonable level of understanding and just
a refusal to act.

I also recall a time (at a different company) when I discovered a public
read/write anonymous FTP server set up externally facing with no firewall
rules. When I pointed out the problem, they said, "that's what our customers
are used to, we're not going to change it." and when I pushed, "No one will
ever find it anyways." Literally less than a month later that server was
overrun with illicit content.

It happens. Unfortunately, as we've seen with the Volkswagen debacle as well,
it happens even when regulation or (in the Challenger case) human life is on
the line.

~~~
sillysaurus3
They possibly felt that you were suggesting it was a lot of work to fix this,
when in reality it would take a couple of hours to add sanitization checks to
the inputs.

You're also overestimating the business impact of a security breach. This is
an area where people's moral compass overrides their shrewdness. Notice that
Anthem is still in business despite the worst possible breach, for example.

It's worth trying to improve security, if possible. But it takes one of two
things: (a) leverage, or (b) patience. One of the main reasons that companies
get a security audit is because a different company forces them to. They get
an audit in order to obtain a CFD (client facing document) saying that they
are secure. Without this, the other company will not do business with them,
which is the sole reason why the security audit happens.

The other way -- patiently pointing out ways of improving the situation, and
explaining the business merits of allocating time to this pursuit -- does not
usually result in meaningful security improvements. This is an unfortunate
fact of the industry, partly because security breaches do not usually put a
company out of business. There are exceptions to this, but that is the common
case.

~~~
steven777400
To be fair, it would be a lot of work to fix the backend. The SQL is
dynamically generated for every request, and the code that does so is
thousands of lines long, with nested subroutines that construct sub-portions
of the query, so there is no easy way to check where things need to be
parameterized. Also there is some injection from unusual places (like a search
options object separate from the search criteria object which includes the
ASC/DESC which, you guessed it, is directly injected).

Realistically, to be safe, you'd have to gut the whole thing and replace it.
You can be a "little safe" by doing something silly like replacing ' with ''
but that doesn't protect in non-string cases, etc., and I wouldn't propose a
solution like that because it could give a false sense of security.

I understand the push-back. The backend is maintained by others and it would
add to their workload to secure it. That's a real and actual cost.

As for the business impact, you are right. I just wanted it to be "on the
table" that I informed people of the situation, so that if there is an exploit
in the future, I don't get the "why didn't you tell anyone this was
vulnerable?" or something like that.

~~~
sillysaurus3
An attacker can't necessarily leverage ASC/DESC injection unless multiple
queries can be issued by a single SQL statement (i.e. injecting "ASC; SELECT *
FROM...") which isn't commonly enabled in most database deployments. But there
are probably other insecurities here.
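An ORDER BY direction is a good fit for a strict allowlist rather than
escaping. A sketch (function and column names invented), assuming the backend
can validate the value before splicing it into the query text:

```python
def sort_direction(value: str) -> str:
    # Allowlist check: anything other than the two legal keywords is
    # rejected outright, so no user-controlled text reaches the SQL.
    direction = value.strip().upper()
    if direction not in ("ASC", "DESC"):
        raise ValueError("invalid sort direction: %r" % value)
    return direction

clause = "ORDER BY created_at " + sort_direction("desc")
# An injection attempt such as sort_direction("ASC; SELECT * FROM t")
# raises ValueError instead of ever reaching the database.
```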

The solution is to run every input through a function that replaces each
single quote with two single quotes. That is all that's required to prevent
SQL injection, since it's not possible to construct a valid query regardless
of the injection point. (EDIT: This refers to string inputs. Numeric inputs
are handled in the obvious way, as is ASC/DESC. Injection into an ORDER BY
clause is not usually exploitable. These are all of the cases.)

If you propose this solution to management, they may be somewhat more likely
to take action, since it can be applied to the existing system.
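A sketch of the quote-doubling idea described above (function name invented),
with the caveat that it only protects positions inside single-quoted string
literals:

```python
def escape_string(value: str) -> str:
    # Double every single quote so the input can never close the
    # surrounding string literal. This covers quoted string positions
    # only; numeric and ORDER BY positions need separate checks.
    return value.replace("'", "''")

payload = "x' OR '1'='1"
query = "SELECT * FROM users WHERE name = '" + escape_string(payload) + "'"
# The quotes in the payload are now inert:
#   SELECT * FROM users WHERE name = 'x'' OR ''1''=''1'
```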

~~~
steven777400
I respectfully disagree with your claim that replacing ' with '' will complete
the security. Reason: there are non-string cases, so there exist
concatenations like: "WHERE SomeNumber = " + inputValue + " ... "

where inputValue is a string (from untyped XML) but is put into SQL without
quotes because SomeNumber is an int type in the database. Since the XML is
constructed without validation, an attacker could put any value there,
including strings to inject, and do so without using quotes.

~~~
sillysaurus3
Yes, numeric cases are solved by running the input through a function that
allows only numeric characters, [-.0-9]. That's the only other case.

Management is more likely to listen to this, since it doesn't require a
rewrite.
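A sketch of that numeric filter (slightly stricter than the bare [-.0-9]
character class, since it also enforces the shape of a decimal literal; the
function name is made up):

```python
import re

_NUMERIC = re.compile(r"^-?[0-9]+(\.[0-9]+)?$")

def numeric_literal(value: str) -> str:
    # Accept only a plain decimal number; anything else, including
    # "1 OR 1=1", is rejected before it can reach the query text.
    if not _NUMERIC.match(value):
        raise ValueError("not a numeric literal: %r" % value)
    return value

query = "SELECT * FROM orders WHERE id = " + numeric_literal("42")
```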

~~~
continuational
How can you be so sure that's the only other case?

------
thoughtsimple
I worked in a company that made CNC machine tools at the time of the
Challenger disaster. The actions of Roger Boisjoly gave me the courage to
reject a demand from management that I add a "feature" to a control system I
was working on to disable all safety lockouts from a single virtual switch.

I refused the assignment. When they didn't fire me on the spot I followed the
managers who made the request and when they would ask another engineer to do
it, I would warn the engineer that they could be held personally responsible
for any injuries or fatalities that could occur.

I fully expected to be fired but surprisingly, management just gave up on the
idea. Perhaps my insistence that it was a bad idea convinced them.

~~~
JustSomeNobody
Perhaps they consulted with HR/Legal and were told firing you would be a bad
idea.

~~~
meric
Or Legal told them it was bad idea to ignore safety advice from their
engineers?

------
Houshalter
I'm not saying NASA didn't fuck up by any means. I'd just like to point out
that hindsight bias is a very real effect that can have huge consequences when
planning for future events:
[http://lesswrong.com/lw/il/hindsight_bias/](http://lesswrong.com/lw/il/hindsight_bias/)

>Viewing history through the lens of hindsight, we vastly underestimate the
cost of effective safety precautions. In 1986, the Challenger exploded for
reasons traced to an O-ring losing flexibility at low temperature. There were
warning signs of a problem with the O-rings. But preventing the Challenger
disaster would have required, not attending to the problem with the O-rings,
but attending to every warning sign which seemed as severe as the O-ring
problem, without benefit of hindsight. It could have been done, but it would
have required a general policy much more expensive than just fixing the
O-Rings.

>Shortly after September 11th 2001, I thought to myself, and now someone will
turn up minor intelligence warnings of something-or-other, and then the
hindsight will begin. Yes, I'm sure they had some minor warnings of an al
Qaeda plot, but they probably also had minor warnings of mafia activity,
nuclear material for sale, and an invasion from Mars.

>Because we don't see the cost of a general policy, we learn overly specific
lessons. After September 11th, the FAA prohibited box-cutters on airplanes—as
if the problem had been the failure to take this particular "obvious"
precaution. We don't learn the general lesson: the cost of effective caution
is very high because you must attend to problems that are not as obvious now
as past problems seem in hindsight.

~~~
VikingCoder
I think your comment is great, but I have to disagree with one part in
particular. I don't agree that it was a "some minor warning" when President
Clinton told President Bush explicitly; "In his campaign, Bush had said he
thought the biggest security issue was Iraq and a national missile defense. I
told him that in my opinion, the biggest security problem was Osama bin
Laden."

~~~
yellowstuff
"Osama Bin Laden is dangerous" is pretty far from actionable intelligence,
though. I think it's an example of hindsight bias to think that if President
Bush had been more prudent in dealing with Bin Laden then the 9/11 attacks
would have been prevented.

~~~
VikingCoder
Your paraphrase is one thing.

When the current Commander in Chief of the United States of America, the most
powerful military in the world, says "the biggest security problem [is] Osama
bin Laden," that's something entirely different.

And frankly, I trust the opinion of Richard Clarke more than I trust yours OR
MINE.

~~~
Houshalter
Except after 9/11 he did say that, and sent the entire military after him. And
it took 10 years to find him.

~~~
Johnny555
To be fair, we spent most of the effort fighting in the wrong country (Iraq)
chasing the wrong person (Saddam Hussein) who had nothing to do with
orchestrating the attacks, nor had any viable "weapons of mass destruction"
that were the purported reason for going after him in the first place.

~~~
Houshalter
That was 2 years later and had nothing to do with the search for bin Laden,
for whom they put out a huge search effort and kept it up for the next 10
years.

------
sgentle
If you haven't read Richard Feynman's appendix to the Rogers Commission report
on the Challenger disaster, I really recommend it:
[http://science.ksc.nasa.gov/shuttle/missions/51-l/docs/roger...](http://science.ksc.nasa.gov/shuttle/missions/51-l/docs/rogers-
commission/Appendix-F.txt)

It's very readable and goes into a bunch of other issues at NASA at the time
beyond the O-ring failure, which Feynman treats as a symptom of more general
engineering and cultural problems.

~~~
danso
Feynman is well-known for his TV appearance where he dramatically demonstrated
what happened with the O-rings, live...but in his memoir, "What Do You Care
What Other People Think?", he credits being inspired by General Kutyna, who
had asked him to join the commission. Feynman describes that a random comment
from Kutyna led him down the path of revelation:

> _Then he says, “I was working on my carburetor this morning, and I was
> thinking: the shuttle took off when the temperature was 28 or 29 degrees.
> The coldest temperature previous to that was 53 degrees. You’re a professor;
> what, sir, is the effect of cold on the O-rings?” “Oh!” I said. “It makes
> them stiff. Yes, of course!” That’s all he had to tell me. It was a clue for
> which I got a lot of credit later, but it was his observation. A professor
> of theoretical physics always has to be told what to look for. He just uses
> his knowledge to explain the observations of the experimenters!_

_However_, Feynman admits at the end of his multi-chapter account of serving
on the commission that Kutyna had coyly _played_ Feynman:

> _Another thing I understand better now has to do with where the idea came
> from that cold affects the O-rings. It was General Kutyna who called me up
> and said, “I was working on my carburetor, and I was thinking: what is the
> effect of cold on the O-rings?” Well, it turns out that one of NASA’s own
> astronauts told him there was information, somewhere in the works of NASA,
> that the O-rings had no resilience whatever at low temperatures—and NASA
> wasn’t saying anything about it. But General Kutyna had the career of that
> astronaut to worry about, so the real question the General was thinking
> about while he was working on his carburetor was, “How can I get this
> information out without jeopardizing my astronaut friend?” His solution was
> to get the professor excited about it, and his plan worked perfectly._

If we take Feynman at his word, to me this is a great description of how
science and bureaucracy and invention (and, sometimes, disaster) works: it's
not always just random eurekas, or study-real-hard-and-you'll-figure-it-
out...sometimes the information is already well-known, and obvious, but for
various logistical/political/bureaucratic reasons, it isn't immediately
disseminated or explored.

via: Feynman, Richard P. (2011-02-14). "What Do You Care What Other People
Think?": Further Adventures of a Curious Character (Kindle Locations
2923-2930). W. W. Norton & Company. Kindle Edition.

~~~
tizzdogg
The astronaut who tipped Kutyna off was Sally Ride. He kept that secret until
after she died. He talks about it in this oral history of the disaster from
popular mechanics: [http://www.popularmechanics.com/space/a18616/an-oral-
history...](http://www.popularmechanics.com/space/a18616/an-oral-history-of-
the-space-shuttle-challenger-disaster/)

"Kutyna: On STS-51C, which flew a year before, it was 53 degrees [at launch,
then the coldest temperature recorded during a shuttle launch] and they
completely burned through the first O-ring and charred the second one. One day
[early in the investigation] Sally Ride and I were walking together. She was
on my right side and was looking straight ahead. She opened up her notebook
and with her left hand, still looking straight ahead, gave me a piece of
paper. Didn't say a single word. I look at the piece of paper. It's a NASA
document. It's got two columns on it. The first column is temperature, the
second column is resiliency of O-rings as a function of temperature. It shows
that they get stiff when it gets cold. Sally and I were really good buddies.
She figured she could trust me to give me that piece of paper and not
implicate her or the people at NASA who gave it to her, because they could all
get fired."

~~~
danso
_Wow_...you know the bureaucracy is toxic if even an _astronaut_ as well known
as Sally Ride is afraid to speak out.

~~~
schoen
... after she had been _personally appointed by President Reagan_ to determine
_that exact fact_!

(although the oral history above suggests that she was concerned for other
people's careers, not her own, but it's still kind of amazing)

------
maho
It seems almost unbelievable how such a stern, concrete warning can be
ignored. Of course, we look at this memo with perfect 20/20 hindsight, knowing
that the O-ring issue caused a disaster.

I am genuinely curious why the warning was ignored. I am hesitant to believe
it was through malice or sheer incompetence. Does upper management get two
overblown warnings per week by engineering, so that this critical warning was
"drowned by the noise"? Or were there good (at the time!) reasons to focus on
other issues first?

~~~
danek
There was pressure to ignore warnings because President Reagan was scheduled
to be present for the launch. In higher levels of management, things get more
political. Looking good and getting exposure count for a lot. No one wants to
be the guy who made NASA look bad by slipping the launch date. In tech, most
managers tend to be rated by how well their projects meet schedules and how
much operating expense they can minimize, since that's the only thing their
managers have any sense of control over. Your boss doesn't want to take the
heat for delaying project "glitter unicorn" by a week so you can fix 2 major
security holes in the service, one which allows customers to login without a
password. And this is why your company won't spend $500 for a data backup
system or why it takes 4 months to get your broken keyboard replaced. It's
turtles all the way up.

~~~
JackFr
> There was pressure to ignore warnings because President Reagan was scheduled
> to be present for the launch.

I never heard that before. Is there a source?

~~~
danek
I was wrong--he wasn't going to attend, but there (allegedly) were plans to
have a televised conversation between him and Christa McAuliffe during his
State of the Union address that night.

However this may not be true. While trying to find an e-source, I read that
there were some politically motivated rumors that the White House ordered the
shuttle to launch for this reason. Feynman investigated that rumor and didn't
find any evidence that the White House ordered the launch.

It's possible the detail about an on-air tv conversation between the president
and a schoolteacher astronaut could have been fabricated as well to make
Reagan look bad, or it could have been based on elements of truth. For example
she was expected to broadcast 2 lessons to students from space, so the
capability definitely existed. It doesn't seem too far out that they would
have liked for this conversation to happen during the State of the Union,
though that doesn't mean the White House ordered the launch.

I read this in Edward Tufte's essay _Visual and Statistical Thinking_
([http://www.edwardtufte.com/tufte/books_textb](http://www.edwardtufte.com/tufte/books_textb)),
which discusses how better information displays might have convinced NASA
management to postpone the launch. It also appears as a chapter in his book
_Visual Explanations_.

I didn't have any luck finding sources on the internet (lots of keyword
overlap). Tufte is much better informed than pretty much all of us, though
it's possible he printed what amounts to an unsubstantiated rumor. FWIW he
also served on the Columbia accident investigation board.

I think there were other pressures to launch as well. The launch date had been
postponed 3 times already for various reasons. _Cynical danek: I wouldn't be
too surprised if some manager's performance review was based on how many
launches took place on the scheduled date._

------
throwaway2380
_Boisjoly later revealed this memo to the presidential commission
investigating the disaster and was then forced to leave Morton Thiokol after
being shunned by disgruntled colleagues._

I find this phenomenon so difficult to understand.

~~~
pmarreck
By ending up being the only one who was correct about a catastrophe warning,
and being vocal about it in the aftermath, you somehow implicitly throw
everyone else under the bus.

Since "everyone else" > "you", majority rules, you go.

It's a bureaucracy problem. I've seen it, and it's also gotten me at least
once, when I decided to stick up for principles. My ego left intact, my job
did not.

This is also why getting fired should not automatically carry a negative
stigma.

I have a counterpoint - I bet that, given any risky endeavour, there are
ALWAYS some naysayers/doubters. So the probability that _someone_ at an
organization ends up being correct when a disaster occurs is probably fairly
high. Thus, winning the disaster prediction lottery may not entitle you to as
much acclaim as you might think.

~~~
deciplex
> _I have a counterpoint - I bet that, given any risky endeavour, there are
> ALWAYS some naysayers/doubters. So the probability that someone at an
> organization ends up being correct when a disaster occurs is probably
> fairly high. Thus, winning the disaster prediction lottery may not entitle
> you to as much acclaim as you might think._

Conversely, achieving success despite taking a great number of risks in the
process may not entitle you to the acclaim that you do get.

------
tokenadult
I remember the Challenger explosion like it was yesterday. I'll link here to a
scan of the pages from Edward Tufte's book _Visual Explanations_ about what
went wrong with the data analysis during the launch planning, which
understated the risk of a launch in cold weather.

[http://williamwolff.org/wp-content/uploads/2013/01/tufte-
cha...](http://williamwolff.org/wp-content/uploads/2013/01/tufte-
challenger-1997.pdf)

~~~
nehushtan
But Boisjoly rejected Tufte's analysis:

[http://www.onlineethics.org/CMS/profpractice/exempindex/RB-i...](http://www.onlineethics.org/CMS/profpractice/exempindex/RB-
intro/RepMisrep.aspx)

~~~
pmahoney
Can you summarize one or two of Boisjoly's main points?

Tufte's pictures of the original (superfluous) rocket graphics showing the
temperature vs. his redrawing, which places temperature on the x-axis, are
very convincing.

~~~
jasode
Boisjoly's rebuttal is that

1) Tufte had temperature datapoints for the previous launches that Boisjoly's
team didn't know about[1]. I assume "know" to mean the complete historical
data wasn't all consolidated conveniently at his fingertips _before_ the
disaster. Presumably, Boisjoly's team could have gotten it _if it had
occurred to them_ to gather it. Essentially, Tufte had the benefit of
"hindsight" to make his compelling diagram showing cause & effect.

2) GIGO: garbage in, garbage out. Tufte's temperature datapoints treat the
outside ambient temperature as equal to the O-ring temperature[2], so
substituting one for the other is wrong. (E.g. it's wrong to put them both on
the same X-axis.)

Quotes excerpted from
[http://www.onlineethics.org/CMS/profpractice/exempindex/RB-i...](http://www.onlineethics.org/CMS/profpractice/exempindex/RB-
intro/RepMisrep.aspx):

[1] _"He thus supposes that they knew the temperatures at launch of all the
shuttles and, assuming they acted voluntarily, infers they were incompetent."_

[2] _", in addition, mixes O-ring temperatures and ambient air temperature as
though the two were the same."_

~~~
m741
I thought the rebuttal was very well written, if a bit dense.

For (1), you're correct that the engineers did not have the historical data.
More than that, though, it did occur to them to request the data, but they
were stifled by Morton Thiokol Management (and NASA).

To start with, there were a variety of previous problems with the O-rings
caused by variables that appear unrelated to temperature. These problems were
resolved, but prevented anyone from seeing a pattern. It was only on the basis
of the single data point in SRM 15 that Boisjoly requested temperature data in
advance of the launch.

Obtaining such data was far from simple, because, as you mention, the
temperature of the O-ring isn't the same as the ambient air temperature. Thus,
obtaining the data was relatively involved and required knowing many
variables: time on the pad, the gradient of ambient temperature, the
temperature at which testing was conducted, and so forth. For this reason, the
engineers didn't compile the data themselves (unclear what process they'd need
to get the data).

The engineers thus requested the data in advance, but had not received it.
They had precise data on only two data points (at 53 degrees and 75 degrees),
so the rest of the data in the chart was compiled after the fact.

"The data necessary for a calculation of O-ring temperatures was thus not
collected all along during the shuttle history. And when Boisjoly asked for
that data in September, along with much other data, any one of which might
have been the crucial missing piece to explain the anomalous cause, it was not
supplied. In fact, the engineers received none of the data they requested."

------
madaxe_again
"Ach, damned engineers, what do they know about engineering!?" ~ Management,
upon reading the memo.

Actually, more realistically:

"Brinton, do you know what this Boisjoly guy is talking about?"

"No Lund, I don't - his attitude is disappointing, we need team players."

"Thanks Brinton - I'll bring it up with his manager for his next performance
review."

~~~
sjm-lbm
To be honest, I think you are - to an extent - falling into the same trap that
the managers fell into at Thiokol: namely, assuming that your view of the
situation is complete and that input from other work groups can be
disregarded.

In reality, it's a _very_ big problem for companies to disappoint their large
customers (as NASA certainly was for Morton-Thiokol at the time), and I
suspect that the engineers would have been quite annoyed at management if the
ultimate result of a launch no-go was fewer contracts, lower pay and/or job
losses.

This isn't to say that management did the right thing, of course, just that
"lol management just throws some buzzwords around and never attempts to
understand the problem" is basically the same attitude that caused this
disaster - just seen from an engineering viewpoint.

~~~
madaxe_again
I am "management" where I work, and I'd drive my company into the ground and
make everyone unemployed before I allowed my defective product to kill
someone.

~~~
sebastos
Duh. You think a single person at NASA would have reported otherwise had you
asked them before the Challenger disaster?

Social pressure can influence people into engaging in wishful thinking. They
made a horrible judgment call, and we should remember that that tragedy was
authored by them. But if you think you NEVER would have fallen for it...
you're not being self-aware. There's a pretty decent chance you would have.

~~~
madaxe_again
The trick is to not give two hoots about social pressure. Being on the
spectrum can be an asset.

------
samwiseg
As someone born in 1995, I don't have any real emotional connection to the
Challenger explosion. It's pretty bizarre to think that children born after
the 9/11 attacks will probably feel the same indifference.

~~~
mturmon
If you are an engineer, it would be good to try to learn from this and
similar accidents. I.e., noting your lack of emotional connection is beside
the point; the question is, what can you take away from what happened?

~~~
samwiseg
Of course, it's extremely important. I'm currently listening to the
Freakonomics podcast on the Challenger explosion[0]. I was only noting the
difference being born a few years apart can make when it comes to significant
shared cultural memories.

[0][http://freakonomics.com/2015/05/20/failure-is-your-
friend-a-...](http://freakonomics.com/2015/05/20/failure-is-your-friend-a-
freakonomics-radio-rebroadcast/)

~~~
btilly
You will notice this sort of thing more and more as time goes on.

For me the "ah hah" moment was when it hit me that "Where were you when you
heard about Challenger?" was my generation's version of, "Where were you when
JFK was assassinated?"

Another easily observed one is music. Very few people are aware of much that
happened in popular music after they hit 25.

~~~
gradi3nt
I'm turning 25 in two weeks. So this is it? 2015 was my last year of music?
Damn...it went by so fast!

~~~
vlehto
Quick, hurry.

You can listen through 65 years of pop music made so far. Then, if you live 80
years, you only lose about 45.8% of all pop music you could have heard during
your lifetime. That's not half bad.

If you include some blues, jazz and classical, that percentage goes down
significantly.

Regards, a 29-year-old with a 4-year amnesia about music.

------
Pitarou
This letter is an excellent illustration of why good writing matters, and why
bad writing can be disastrous.

The most important sentence in the letter ("The result would be a catastrophe
of the highest order - loss of human life.") is at the end of the third
paragraph, and is effectively hidden by two paragraphs of dense, technical
jargon that I, as a layman, cannot understand at all.

I honestly think that if he had just taken that sentence and moved it to the
end of the first paragraph, 7 astronauts' lives could have been saved.

~~~
djcapelis
I strongly doubt it and I don't think you've worked in an engineering
environment if you think this is true.

This document is written as a credible engineering analysis of a distinct
problem and uses extremely strong terms. It isn't addressed to the general
public, or an article on Buzzfeed, it is an interdepartmental memo from an
engineer signed by his manager to the Vice President of Engineering.

Your example of "clear writing" would have just made this engineer look
hyperbolic and likely undermined the issue even more. In a document like this,
stating the physical problem up front _is_ clear writing, and then, once the
engineering problem is stated, he immediately states the possible outcome and
then describes what he sees as the management failure to allocate the
appropriate resources, and how to fix it.

How it comes off to you as a layman doesn't dictate whether or not this is
good writing. He wasn't writing it to you.

~~~
Pitarou
I respectfully disagree. Safety is of the highest priority, so if there's a
serious risk to human life, there's nothing hyperbolic about drawing attention
to it. If you can't do that, there's something very wrong with your
engineering culture.

To help the busy reader, the first paragraph of a memo should summarize the
whole document. (Like the abstract of a technical paper.) This document is
about an engineering problem _and_ its serious consequences, so they should
both be mentioned in the first paragraph.

~~~
djcapelis
I think it's extremely misguided to assume that the problem here was a matter
of writing style. I think it's naive to assume that the people who received
this memo weren't aware that it was bringing attention to an engineering issue
that could lead to loss of human life. I think it's misguided and not
supported by the evidence to assume that was the gap in understanding that
led to the problem. They're working on rockets. People working on rockets
know what the stakes are. These are the risks engineering projects like this
are structured around dealing with.

And I think it's pretty disrespectful to the engineer who is still haunted by
this to say that what he really should have done was switch some sentences
around and that would have totally solved the problem.

This is a strongly worded document.

Making safety a priority doesn't mean starting every engineering document with
the words "loss of life", which is a really common outcome of engineering
failures on programs like this. Making safety a priority means putting the
risk of an engineering failure up front and knowing that an engineering
failure in a life-critical system is critical. Making safety a priority means
people don't have to tell you what the stakes are every single sentence,
because everyone already knows and so what you really communicate is how much
risk there is, not the fact that risk exists. Making safety a priority means
even if you're working on an engineering problem that wouldn't lead to loss of
life, you fix the thing because you might be wrong and it might be part of a
correlated failure one day that does lead to loss of life. Making safety a
priority doesn't involve writing engineering documents in a way that makes
them more amenable to skimming.

It's a rocket. Engineering doesn't have to go that far wrong on rockets for
people to die. When someone sends a letter to a VP of engineering which begins
with "This letter is written to insure that management is fully aware of the
seriousness" then everyone who receives that memo is paying attention and if
they aren't then it's not the writing skills of the people involved that are
at fault.

I don't think a single person who was aware of the O-ring issue was unaware of
the stakes. That didn't show up in any reports on the panel. What did show up
was they estimated the risk of the problem wrong. The first sentence of this
engineer's letter went towards establishing the seriousness of the engineering
failure. Because that was the part that needed to be communicated most
clearly.

~~~
Pitarou
Okay. You've convinced me. Thanks for taking the time to explain your thoughts
on this. :-)

------
jldugger
Engineering degrees universally require a technical writing course. Engineers
universally revile it as busy work. This memo is Exhibit A for why written
communication is critical for engineering, and I imagine an assignment to
rewrite this memo to be more effective would be a good way to underscore the
importance of clear and concise writing.

~~~
brlewis
I upvoted this comment because of its point about the importance of writing.
However, I don't think conciseness and clarity were the problem here. In 1985,
if an engineer personally composed a long letter, you would automatically know
it was important. If you read this letter at all, the danger is made clear.

~~~
banku_brougham
The call to action was on the second page, though - corporate communications
classes definitely tell you to keep it to one page.

------
sixdimensional
For those interested, social scientists often refer to what happened with the
Challenger as a "normal accident".

[https://en.wikipedia.org/wiki/Normal_Accidents](https://en.wikipedia.org/wiki/Normal_Accidents)

------
hackuser
What stands out to me is the impact of loyalty on corruption. Despite his
doing the obviously right thing: _Boisjoly later revealed this memo to the
presidential commission investigating the disaster and was then forced to
leave Morton Thiokol after being shunned by disgruntled colleagues._

Elsewhere in the discussion[1], we learn that key information about the
O-rings had to be carefully, anonymously disclosed to protect the jobs of
several people, including a prominent astronaut.

How do we avoid corruption when loyalty uber alles is the rule of almost all
organizations?

[1]
[https://news.ycombinator.com/item?id=10992224](https://news.ycombinator.com/item?id=10992224)

~~~
vlehto
For a small enough organization, "loyalty uber alles" prevents corruption.
What is small enough? For a military company, it's around the Dunbar number. A
squad that actually works together on a single issue is often optimal at
around 8 individuals. Company, squad and brigade are the most important
formations, as they have a disproportionately high number of things they're
expected to handle independently.

For this NASA o-ring bullshit, the simplest solution seems to be having the
technical managers and the astronauts in the same in-group. Now your loyalty
is about not getting your mates killed.

If you need a big organization, the trick is to get that internal network of
squads and companies working in a somehow non-corrupt and nice manner. This is
where it gets hairy. Basically, you can go with the assumption of corruption
and apply "transparency" or monetary incentives (subcontractors). Or you can
assume that corruption does not happen. Sometimes the mere assumption that
corruption does not happen stifles it. Practically for big companies failing
less than your competitors is adequate for success.

~~~
hackuser
Some interesting ideas; thanks. I don't quite grasp a couple of them:

> For a small enough organization, "loyalty uber alles" prevents corruption.

How? There are many, many cases of people covering up for their fellow squad
members.

> Or you can assume that corruption does not happen. Sometimes the mere
> assumption that corruption does not happen stifles it.

Hmmm ... that seems like a well-known recipe for encouraging corruption. How
would that approach stifle it?

~~~
vlehto
>How?

It's unlikely that one would be corrupt against one's own squad members. So as
a squad member, you only have to worry about corrupt shit from outside. People
who affect you and are not members of the same in-group are the problem from
any individual's point of view.

>How would that approach stifle it?

Most people are good by nature. The only time I've stolen from work was a
situation where it was expected that I might steal from work. So I kind of
showed them that I'm more cunning than they are careful. A challenge. Another
point comes from self-image: if everybody sees you as a corrupt asshole, you
see yourself as a corrupt asshole. So you might as well act on it.

In general, people have a strong tendency to act as is expected of them.
Stronger than the tendency to act as they are told.

------
DanielBMarkham
This was a terrible thing. So many saw this coming and nobody listened to
them.

In my mind, what made this tragedy even worse was the way the program itself
was conducted. You learn new modes of transportation and hardware by using it,
many times to the point of exhaustion. We should have built a dozen orbiters
and flown the shit out of them through hell or high water, learning as we went
along. Instead we built 5 and every time something happened we backed farther
away from the entire manned space program.

A lot of time was wasted, and the lessons we didn't learn? Somebody else is
still going to have to learn them someday.

~~~
mikeash
I think it's emblematic of the whole Shuttle program to look at the flight
test program. There basically wasn't one. In particular, look at the various
abort scenarios, and then look at the abort testing. Don't look too hard for
the abort testing, because they didn't do any. There was pretty much zero
resiliency in the system, and they knew it.

------
thescriptkiddie
Here's an interesting thing about the Challenger (and Columbia) disasters: we
find them particularly devastating, the same as a major natural disaster or
terrorist attack. But on paper, they don't even compare. The explosion of
the Challenger killed 7 people and cost NASA around $40 billion. The
destruction of the world trade center killed over 2600 and the cost to the
private insurers alone was over $40 billion. Hurricane Katrina killed over
1800 and cost at least $100 billion. Not even the same ballpark. So why the
big emotional impact? Because they stood for something important.

------
DickingAround
Today, there's no way I could get away with writing an entire page of context
before getting to the impact. I wonder how much context the recipient had and
what reading expectations were like back then...

~~~
dtparr
Leaving aside the fact that the subject includes 'Potential Failure
Criticality', the key impact line "The result would be a catastrophe of the
highest order - loss of human life" occurs less than 150 words into this. It's
the 5th sentence (though they're admittedly long sentences).

Are you having serious engineering-related discussions with your management
where they can't read 150 words before getting to the hook? (serious question,
not snark)

Is everyone communicating via twitter or something? (sorry, that was a bit of
snark)

~~~
thrownaway2424
I disagree. This is a fairly poorly-written persuasive letter, and we have
ample evidence for its lack of persuasive powers. The letter opens with
useless content "This letter is written ..." which is self-evident. Then it
proceeds to the grammatical error of "insure" which almost made me stop
reading. If I were writing a letter like this, and I wanted it to have some
effect, this would be the first sentence:

The total loss of a future shuttle mission and the death of its crew is a near
certainty with our current booster o-ring design.

Then I would omit everything else in the original letter except the request
for staffing.

~~~
delazeur
While I agree with you about the difference between "insure" and "ensure,"
that's not a universally accepted grammar rule. I think they may even be in
the minority; _The New York Times_, for example, uses "insure" for all
instances.

~~~
thrownaway2424
It does? I have a 5th Ed. New York Times Manual of Style and Usage right here
and it clearly distinguishes between the two. Under the entries for both
ensure (p 120) and insure (p 170) it gives examples of the other.

~~~
delazeur
Well, I guess that's what I get for believing my first Google hit for "insure
vs. ensure." Nonetheless, I see a lot of highly literate people using "insure"
for all cases.

------
ergothus
It is perhaps sad, but I'm heartened by this letter. I had heard that warnings
were given in advance, but this is clear (in fact, I find the "catastrophe of
the highest order" language to be less impressive than the clear and defined
"loss of human life").

Why am I heartened to see that someone foresaw a lethal accident and was
ignored? Because it was foreseen, and the consequences understood, with
clarity. Getting people to take clear warnings seriously seems a more easily
solved problem than getting us to be able to recognize the danger in the first
place. A single person in management, or a few people, being dense is fixable.
Groupthink where everyone assumes someone else is checking for problems and no
one does is harder to fix.

Now, just because it's an easier problem doesn't make it an EASY problem, but
still, easier.

------
sopooneo
I bet there were other such warnings, many of them valid, that just didn't
turn out to be catastrophic.

If there are any high level managers here, please tell me, does it seem like
your people are constantly alerting you of dire risks? Does it feel like you
are inundated with worriers?

------
js2
A piece NPR did on Boisjoly after he passed away:

[http://www.npr.org/sections/thetwo-
way/2012/02/06/146490064/...](http://www.npr.org/sections/thetwo-
way/2012/02/06/146490064/remembering-roger-boisjoly-he-tried-to-stop-shuttle-
challenger-launch)

And an interview with one of Boisjoly's colleagues:

[http://www.npr.org/sections/thetwo-
way/2016/01/28/464744781/...](http://www.npr.org/sections/thetwo-
way/2016/01/28/464744781/30-years-after-disaster-challenger-engineer-still-
blames-himself)

------
mirceal
In hindsight things are obvious.

I think the way to look at it is from the point of view that people have
before the event happens. You also weigh in all the warnings that you are
receiving.

No matter what you build, if it's complex enough you're always going to have
individuals predicting doom. The challenging part is filtering the signal from
the noise and owning the decisions you make.

------
phkahler
Isn't something missing? I was under the impression that the accident happened
due to the cold. Nowhere in the memo does it say anything about how to avoid
the problem, nor does it call for a full stop of launches until it's fixed. It
seems to be referring to a known issue without explaining what that is or what
to do - specifically. Or did I miss that?

~~~
JshWright
Simplifying a bit...

They knew the o-rings were being eroded away during the flight, but the
secondary o-ring was 'squishy' enough to fill in the gap and prevent the
erosion from actually destroying the vehicle. While the erosion was
unexpected, they figured the 'backup' was doing its job, and actually ended up
increasing the predicted safety margin (i.e. it's only working 1/5th of the
way through, so we have a 500% safety margin, yay! despite the fact that any
erosion was unexpected in the first place)

The problem was, the cold made the secondary o-ring stiff enough that it
didn't 'squish' as much as it had in previous launches, so the o-ring failed
completely.
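The "safety margin" logic described above can be sketched numerically. This is
only an illustration of the flawed reasoning, with round numbers, not the
actual Morton Thiokol analysis:

```python
# Illustrative sketch of the flawed safety-margin reasoning: treating the
# observed erosion depth as the worst case, instead of treating *any*
# erosion as evidence the joint wasn't behaving as designed.

def naive_safety_margin(erosion_depth: float, oring_thickness: float) -> float:
    """Margin computed from observed erosion alone.

    If erosion only reaches 1/5 of the way through the ring, this reports
    a 5x ("500%") margin -- which looks reassuring but assumes erosion
    scales predictably, ignoring that the design predicted zero erosion.
    """
    return oring_thickness / erosion_depth

# Hypothetical numbers: a 0.280-inch ring eroded to a depth of 0.056 inches.
margin = naive_safety_margin(erosion_depth=0.056, oring_thickness=0.280)
print(margin)  # 5.0 -> read by management as a "500% safety margin"
```

The fallacy, as the parent comment notes, is that the baseline expectation was
no erosion at all, so the ratio measures luck under past conditions, not
resilience under new ones (like a cold, stiff secondary ring).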

------
stepanhruda
Here is Richard Feynman talking about the issues
[https://www.youtube.com/watch?v=4kpDg7MjHps](https://www.youtube.com/watch?v=4kpDg7MjHps)

Seems not very different from how companies are managed 30 years later.

------
bluedino
Imagine being on the team that discovered and warned about this issue, and
then a year later watching the replays of the shuttle exploding on television,
just knowing that's what had happened and that your worst fears had come true.

------
mmaunder
My parents-in-law worked with Greg Jarvis at Hughes, who was on that mission.
They had a touching memorial at Hughes for his wife, according to my
mother-in-law. Very sad that it was preventable.

------
metaprinter
National Geographic has an awesome video called "Challenger: The Untold Story"
which highlights the communications breakdowns. Google it.

------
jrcii
Can you imagine this poor man's reaction as he watched the launch?

------
draw_down
I think many of us here have been Boisjoly at some point, though with much
lower stakes. This is how people are, they want to do the thing. Doesn't
matter how much you warn them.

------
jsprogrammer
The transcript is incorrect. It uses "Manager" under Kapp's signature, instead
of the original "Manger".

------
viach
If we don't stop CO2 emissions, the result will be a catastrophe.

Let's see how it will turn out now.

Edit: And yes, it is sarcasm; nothing has changed or is going to change.

------
shams93
I doubt anyone in the Reagan admin could even understand this note; it's
highly technical and was likely tossed out without ever being read.

~~~
crystalmeph
This memo was not sent to the administration, it was sent to the management of
Morton Thiokol, who could reasonably be expected to understand basic
engineering phrases like "loss of human life."

It's a real PITA to engineer safe systems (see e.g. IEC 61508 and ISO 13849 in
industrial automation), but it saves lives, and if you're in a position where
Sales, Production, etc. is trying to get you to rush the Engineering work so
they can make deadline, you've got to find the backbone to say "no," even if
it hurts you professionally.

~~~
madaxe_again
You'd think so, but they probably saw "loss of human life" and went
"hysterical. ignore.".

It's actually really hard to warn people. Warn them loudly and strongly and
they think you're scaremongering and being unnecessarily negative. Warn them
quietly, and nobody listens. Warn them just right, and they'll take it under
consideration, but go ahead anyway.

When of course the inevitable happens, as the guy who predicted doom, you'll
be blamed, because "it wouldn't have happened if you hadn't made a self
fulfilling prophecy".

