
SpaceX CRS-7 Failure Investigation Teleconference Thread - ivank
https://www.reddit.com/r/spacex/comments/3dyvta/rspacex_crs7_failure_investigation_teleconference/
======
beltex
_“Now before every flight I always send out an email company-wide saying, ‘If
anyone can think of any possible reason to hold off on launching, they should
call me immediately on my cell phone or send me an email, whether their
manager agrees with it or not.’ But that’s sort of something I’ve always sent
before every flight, but the 20th time I send that email it just seems like…
you know, ’There’s Elon being paranoid again’, so maybe it doesn’t resonate
with the same force. But I think now everyone at the company appreciates just
how difficult it is to get rockets to orbit successfully and I think we’ll be
stronger for it.”_

Elon at 36:15

[http://nasawatch.com/archives/2015/07/spacex-
releases.html](http://nasawatch.com/archives/2015/07/spacex-releases.html)

~~~
justin66
> ‘If anyone can think of any possible reason to hold off on launching, they
> should call me immediately on my cell phone or send me an email, whether
> their manager agrees with it or not.’

Two thoughts: if you need to send a message like that, the culture of your
organization needs some work. Also, does he really think a person can send a
message like that - when their manager does not agree - without torpedoing
their career?

~~~
wiredfool
That's probably a direct reaction to some of the NASA issues that have
happened in the past. (Most famously the Challenger disaster)

~~~
justin66
Yeah. It is also the sort of thing NASA actually has in place - "anyone can
escalate a safety issue!"

It is worth studying how that kind of thing has NOT fixed the issues at NASA,
and how it did not prevent us from losing Columbia. You either have a culture
that puts safety, engineering and problem solving ahead of politics and
expediency in all cases, or you don't. There are absolutely not any shortcuts
to that.

~~~
lmm
I don't think it's intended as a short cut, just as one component of the right
culture.

------
Osmium
So I know very little about rockets, but here's my simplified understanding:

The liquid oxygen tank provides the liquid oxygen which fuels the rocket. They
use helium to keep the tank pressurised as the liquid oxygen gets used up (why
helium? it's light and unreactive). The helium is stored in its own tanks,
which are secured by struts like in this picture:
[http://i.szoter.com/741dc2bcf5762a48.jpg](http://i.szoter.com/741dc2bcf5762a48.jpg)
(via reddit). The strut failed, causing a helium leak and a complicated series
of events (including the helium tank bouncing around the liquid oxygen tank?),
that basically resulted in the liquid oxygen tank being over-pressurised and
exploding.

Bonus (very cool!) gif of the liquid oxygen tank during a previous launch
(unfortunately cameras apparently weren't included in this launch):
[https://i.imgur.com/WRp2ujX.gif](https://i.imgur.com/WRp2ujX.gif) (also via
reddit)

Some extra cool things:

(1) They narrowed down the source of the failure using "acoustic
triangulation", which, I think, is essentially using sound sensors
(accelerometers?) located at various locations to pinpoint the location of the
failure in 3D space.

(2) The Dragon capsule could have been saved if it'd had the right software
(which would deploy the parachutes). They'd already planned to do this, and
will now have a software update for the next launch. Why hadn't they done this
already? Because if the parachutes deploy accidentally, it could result in
launch failure, so it's something they have to be careful about. But the
capsule survived the explosion and they remained in contact with it until it
was below the horizon.
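
A rough sketch of what (1) could look like, for the curious. This is not SpaceX's method, just a generic time-difference-of-arrival search: the sensor layout, the 2D simplification, the single propagation speed and the brute-force grid are all assumptions made for illustration.

```python
import math
from itertools import product


def locate_source(sensors, arrival_times, speed, grid_step=0.05, extent=4.0):
    """Brute-force 2D source search from arrival times at known sensors.

    Only differences relative to sensor 0 are used, so the (unknown)
    moment the sound was emitted cancels out.
    """
    ref_time = arrival_times[0]
    measured = [t - ref_time for t in arrival_times]
    steps = int(round(extent / grid_step)) + 1
    best_point, best_err = None, math.inf
    for ix, iy in product(range(steps), repeat=2):
        x, y = ix * grid_step, iy * grid_step
        dists = [math.hypot(x - sx, y - sy) for sx, sy in sensors]
        predicted = [(d - dists[0]) / speed for d in dists]
        err = sum((p - m) ** 2 for p, m in zip(predicted, measured))
        if err < best_err:
            best_point, best_err = (x, y), err
    return best_point


# Hypothetical setup: four sensors on the corners of a 4 m square.
SENSORS = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (4.0, 4.0)]
TRUE_SOURCE = (1.0, 2.5)
SPEED = 330.0       # m/s, placeholder propagation speed
EMIT_TIME = 0.1     # s, unknown to the solver

times = [
    EMIT_TIME + math.hypot(TRUE_SOURCE[0] - sx, TRUE_SOURCE[1] - sy) / SPEED
    for sx, sy in SENSORS
]
estimate = locate_source(SENSORS, times, SPEED)
print(estimate)
```

A real system would work in 3D, cope with noise and different propagation paths, and use a proper solver instead of a grid, but the idea of matching arrival-time differences is the same.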

~~~
baggers
As a final note, the 'big explosion' was the Air Force blowing up the Falcon 9.
This is standard procedure to stop the rocket doing damage in further
locations and to stop as much of the fuel as possible hitting the ground (it's
apparently nasty stuff).

~~~
cloudwalking
For pedantry's sake: Falcon 9 blew itself up once an anomaly was detected. The
Air Force kill command was not sent until dozens of minutes after the vehicle
disintegrated.

~~~
MertsA
The rocket can initiate range safety itself like that?? What happens if a
pressure sensor in one of the tanks fails a second after liftoff?

------
mholt
It's amazing what profound effects the culture of a company can have on its
results, and how that changes over time.

> He said the early team had "an extreme level of paranoia" because of the
> difficulty of learning how to design and launch rockets. But now, “the vast
> amount of people at the company today have only ever seen success… when
> you’ve only seen success, you don’t fear failure quite as much."

> Musk said the night before each launch, he sends a company-wide email asking
> employees to send him an email or call his cell phone if they have any
> reason to believe the rocket should not launch.

I think it's good such realities are being realized, then handled in a gentle
(but firm) way.

Source: [http://mashable.com/2015/07/20/spacex-launch-failure-
strut/](http://mashable.com/2015/07/20/spacex-launch-failure-strut/)

------
rebootthesystem
I've done a lot of manufacturing with all sorts of metals. I am not sure one
can blame a vendor for a grain structure problem. This is a testing failure. A
failure to identify parts that don't pass tests.

The problem might very well be that the very tests a part must pass will
weaken the part to the point that it is not usable or less reliable. In other
words, some tests are destructive.

Here's one of many interesting pages that came up by searching for "how to test
aluminum for grain structure".

[https://www.google.com/webhp?sourceid=chrome-
instant&ion=1&e...](https://www.google.com/webhp?sourceid=chrome-
instant&ion=1&espv=2&ie=UTF-8#q=how%20to%20test%20aluminum%20for%20grain%20structure)

I guess I am saying I hope a vendor isn't blamed for this when the reality of
the matter might very well be that testing to 100% certainty is impossible.

~~~
Dylan16807
But there's so much margin for error on these specific parts. Even if the test
only ensures 60% strength, and weakens the part by 50%, that's still strong
enough.

A part being mildly out of spec might not be a vendor problem. A part being
20% strength is absolutely a vendor problem.
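
The arithmetic behind that, as a small sketch. The 60%, 50% and 20% figures are the ones in this comment; treating the 20% level as the load the part actually has to carry is an assumption made here for illustration, not something from the teleconference.

```python
def residual_strength(test_guarantee, test_damage):
    """Fraction of rated strength left after a proof test.

    test_guarantee: fraction of rated strength the test proves (0..1)
    test_damage:    fraction of strength the test itself removes (0..1)
    """
    return test_guarantee * (1.0 - test_damage)


# Numbers from the comment: test proves 60%, then costs the part 50%.
after_test = residual_strength(0.60, 0.50)

# Assumption: the load in flight is about 20% of the rating, i.e. the
# level at which the bad part is said to have failed.
ASSUMED_LOAD_FRACTION = 0.20

print(f"strength left after test: {after_test:.0%} of rating")
print("survives assumed load:", after_test > ASSUMED_LOAD_FRACTION)
```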

~~~
rebootthesystem
The real question rattling around in my head is:

Why was this designed such that the failure of A SINGLE STRUT would be
catastrophic?

I don't like to blame a vendor for what should be an engineering problem,
whether this means design or testing.

The problem here --assuming it is as described-- is that someone designed a
system with the assumption that none of these struts would fail. And,
furthermore, executing on a design where the failure of ONE strut could cause
a disaster.

Anyhow, that's what it looks like to me given what's been released.

I could be totally wrong.

~~~
lutorm
_Why was this designed such that the failure of A SINGLE XXX would be
catastrophic?_

There are plenty of things in a rocket that can fail that would take the
entire vehicle with it. Structural mechanics is pretty well understood and
loads are well predictable, so it seems perfectly reasonable to me to design
with the assumption that a part with a 10x safety factor will not fail.

If you didn't, your rocket would never get to orbit anyway.

------
obituary_latte
Kind of sad to see the reddit group /r/spacex wasn't afforded access to the
conference. I think they are probably one of the best moderated and one of the
most intelligent groups of fans spacex has.

~~~
joezydeco
If /r/spacex wants to pass the hat and buy a flight, maybe they can get
priority.

Being a fan and all is admirable, but I don't think it's "sad" that the
Internet wasn't given a seat on the call. SpaceX has a business to run.

~~~
dlgeek
As opposed to the small local/regional newspapers who were represented? This
was a media call, not a customer one.

------
hcrisp
Acoustic triangulation of accelerometers in upper stage helped pinpoint the
strut, and using only 0.893 seconds of data (unless the accels were in the
part that kept transmitting). That means the streaming sample rate must be
quite high!

~~~
hamiltonkibbe
Assuming 330m/s for the speed of sound, you can get 1 inch of precision at a
sample rate of just under 13kHz. With 3 channels of 12-bit accelerometer data
that's ~ 58kBps with no compression which doesn't seem too out there. Of
course you can get a pretty close approximation at much lower sample rates
(/2, /4, /8, /16) using band-limited interpolation
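
For anyone who wants to check the numbers, here is the same back-of-envelope calculation as a sketch. The inputs (330 m/s, 1 inch, 3 channels, 12 bits) are the ones assumed above; the function names are made up for this example.

```python
SPEED_OF_SOUND_M_S = 330.0  # value assumed in this comment
INCH_M = 0.0254


def sample_rate_for_resolution(speed_m_s, resolution_m):
    """Sample rate at which sound travels `resolution_m` between samples."""
    return speed_m_s / resolution_m


def raw_data_rate_bytes(sample_rate_hz, channels, bits_per_sample):
    """Uncompressed data rate in bytes per second."""
    return sample_rate_hz * channels * bits_per_sample / 8


rate_hz = sample_rate_for_resolution(SPEED_OF_SOUND_M_S, INCH_M)
data_rate = raw_data_rate_bytes(13_000, channels=3, bits_per_sample=12)

print(f"{rate_hz:.0f} Hz")             # 12992 Hz
print(f"{data_rate / 1000:.1f} kB/s")  # 58.5 kB/s
```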

~~~
tsotha
Aren't we talking about the speed of sound in helium, though? That should be
faster.

~~~
baq
helium is 972 m/s, aluminum 5100 m/s, steel 6100 m/s. depending on what they
actually measure, correct triangulation sounds like a non-trivial problem;
good job spacex for solving that. (unless your cad software does that for you
:))
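
A quick sketch of how much the medium matters, using the speeds quoted here and the same "one inch per sample" rule of thumb from upthread. Which medium the signal actually travelled through is not known from the thread, so this only compares the options.

```python
INCH_M = 0.0254

# Speeds of sound as quoted in this comment, in m/s.
SPEEDS_M_S = {"helium": 972.0, "aluminum": 5100.0, "steel": 6100.0}


def rate_for_one_inch_khz(speed_m_s):
    """Sample rate (kHz) at which sound covers one inch per sample."""
    return speed_m_s / INCH_M / 1000.0


rates_khz = {name: rate_for_one_inch_khz(v) for name, v in SPEEDS_M_S.items()}

for name, khz in rates_khz.items():
    print(f"{name:>8}: {khz:.1f} kHz")
```

So the required rate scales directly with the propagation speed: metal paths need sample rates several times higher than a gas path for the same spatial resolution.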

~~~
TeMPOraL
> _(unless your cad software does that for you :))_

As far as I can tell, they write quite a lot of their CA* software ;).

------
ndonnellan
snippet summary: the preliminary failure cause may be that a strut holding a
helium tank inside the oxygen tank failed well below its rated stress, causing
the tank to release helium into the oxygen tank > failure.

snippet:

"Preliminary conclusion is that a COPV (helium container) strut in the CRS-7
second stage failed at 3.2Gs. A lot of data was analysed, it took only 0.893
seconds between first sign of trouble and end of data. Preliminary failure
arose from a strut in the second stage liquid oxygen tanks that was holding
down one composite helium bottle used to pressurize the stage. High pressure
helium bottles are pressurized at 5500 psi, stored inside in LOX tank. Several
helium bottles in upper stage. At ~3.2 g, one of those struts snapped and
broke free inside the tank. Buoyancy increases in accordance with G-load.
Released lots of helium into LOX tank. Data shows a drop in the helium
pressure, then a rise in the helium pressure system. Quite confusing. As
helium bottle broke free and pinched off manifold, restored the pressure but
released enough helium to cause the LOX tank to fail. It was a really odd
failure mode."
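
On the "buoyancy increases in accordance with G-load" part, a minimal sketch of why. The net upward force on a submerged bottle is the weight of displaced liquid minus the bottle's own weight, and both scale with the acceleration. The bottle volume and mass below are hypothetical placeholders, not SpaceX figures; the LOX density is the approximate textbook value.

```python
G0 = 9.80665          # standard gravity, m/s^2
LOX_DENSITY = 1141.0  # kg/m^3, approximate density of liquid oxygen


def net_upward_force(volume_m3, mass_kg, g_load, fluid_density=LOX_DENSITY):
    """Net buoyant force (N) on a fully submerged body under `g_load` g."""
    return (fluid_density * volume_m3 - mass_kg) * g_load * G0


# Hypothetical bottle, for illustration only.
BOTTLE_VOLUME_M3 = 0.1
BOTTLE_MASS_KG = 20.0

force_1g = net_upward_force(BOTTLE_VOLUME_M3, BOTTLE_MASS_KG, g_load=1.0)
force_3_2g = net_upward_force(BOTTLE_VOLUME_M3, BOTTLE_MASS_KG, g_load=3.2)

print(f"at 1.0 g: {force_1g:.0f} N")
print(f"at 3.2 g: {force_3_2g:.0f} N")
```

Whatever the real numbers are, the load on a strut holding the bottle down grows linearly with acceleration, which is why the ~3.2 g figure matters.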

~~~
xenadu02
Indeed, it appears that they had to test thousands of struts. Most passed but
they found one that failed. Microscopic examination showed bad grain
structure. They're made by a vendor so it would appear that vendor has a
quality control problem.

The delays will be in setting up a QA process to test each individual strut
part, as well as eventually testing all vendor-supplied parts. If anyone can
do it, SpaceX can. No reason they can't design robot systems to test every
part.

~~~
thaumaturgy
I worked for a while for a vendor that produced some critical components for
... let's say major entities. NASA, US military, Boeing, and so on. I was in
electrical QA, next-to-last step before shipping. My job was to electrically
scrutinize individual pieces (in some cases) or random samples from a batch
(depending on how much the customer was paying) and compare their output to
customer's spec.

If your stuff is mission critical, if anybody could potentially die, you
really can't trust the parts from these vendors. A few of their smarter
customers would repeat all of our tests and send back defective parts. The
thing was, we had a lot of borderline stuff come through from bad production
(poorly paid or poorly trained staff or defective tools or materials), and
once some of these parts hit QA, they had a _lot_ of expense sunk into them.
The company didn't want to eat that cost, so there were a lot of arguments
between myself and the general manager. I failed a lot of stuff that previous
people in my position had let slide.

They also had a really stupid hockey-stick output graph each month. The
beginning of the month was slow, we were all cleaning our work areas and
retesting our test equipment, and then the last week of the month they'd try
to produce 90% of their expected output for the month. Because of my
reputation for rejecting stuff, he'd hover over my work area for the last day
or two each month.

Given the size of the company I worked for, I have to assume this is not
uncommon practice.

It was a heck of an experience, I finally got a better understanding for why
so many things seem to break all the time.

~~~
HCIdivision17
I'm bookmarking your comment. It is just the slice-of-life that I want to show
newbie engineers. Like some people say you need to spend a year or two in the
service industry to learn empathy, I feel engineers likewise need to spend
time in QA to learn what their ethics _really_ are. QA is _hard_ , and the
pressure to pass is difficult to withstand; eventually you take the "my boss
told me to do it" attitude or you learn to make ... Well, not _enemies_ , but
certainly rock the boat.

Kudos to you!

~~~
thaumaturgy
> _I feel engineers likewise need to spend time in QA to learn what their
> ethics really are. QA is hard, and the pressure to pass is difficult to
> withstand_

You really nailed it. The GM's position -- and he said this more than a few
times -- was that the parts were designed with extra tolerances already, so if
they were a little below spec it was OK.

Engineers have to keep that in mind when designing products: production knows
there's a margin for error and they'll take that into consideration when
deciding whether or not they can get away with shipping something.

(And the GM was a pretty OK guy, we got along fine otherwise. He in turn was
just under a lot of pressure from further up the ladder to meet certain
production goals.)

------
lutorm
The official statement is at
[http://www.spacex.com/news/2015/07/20/crs-7-investigation-
up...](http://www.spacex.com/news/2015/07/20/crs-7-investigation-update)

------
tempestn
I wonder if they'll be looking to recoup some losses from the strut
manufacturer. Would be an interesting case. They can prove that the struts can
fail well below certification, which should be worth something, but unless
they can prove that this particular failure was due to the strut, or at least
more likely than not, it would probably be difficult. Also might be unwise
from a business perspective as it could make other (potential) suppliers
nervous.

------
calinet6
Sometime in the future, in a book...

"And thus it was learnt that on the twentieth repetition, company-wide emails
cease their function, having been seen and imprinted many times, as it was
with the boy who cried wolf. But with the experience of failure comes wisdom
and strength for the future of the civilization who wishes to become
spacefaring." Elon 36:15

~~~
ForHackernews
Jesus Christ, I know you're "joking", but can y'all just drop the pretense and
start a religion dedicated to worshiping Musk already?

~~~
jamestanderson
No quotes necessary, that was a joke.

~~~
ForHackernews
[http://www.urbandictionary.com/define.php?term=kidding%20on%...](http://www.urbandictionary.com/define.php?term=kidding%20on%20the%20square)

The way half the people on this site worship Musk is freaking creepy. Just let
it go, Daddy Elon doesn't love you.

