

One of the SpaceX engines came apart during launch - Anon84
http://arstechnica.com/science/2012/10/that-smooth-spacex-launch-turns-out-one-of-the-engines-exploded/

======
stcredzero
_> We know the engine did not explode, because we continued to receive data
from it._

Uh, they say the engine did not explode. The failed engine shut down and
vented gasses ruptured the engine fairing. How about someone change the
inaccurate headline? _"That smooth SpaceX launch? Turns out one of the engines
exploded"_

~~~
anovikov
I don't know how you came to the conclusion that there was any kind of
mechanical damage anywhere. There was most certainly none. The engine simply
shut down, most probably because one of the many possible abort criteria
happened, e.g. some of the controlled variables of engine, like pressure,
flow, temperature of fluids/gases in many places, or turbopump RPM, etc.
exited the permitted corridor of safety. It can be due to something as simple
as a faulty sensor indicating a false reading, or the condition being too
strict (both happened to SpaceX before). Then the engine was shut down, and
the lower-temperature plume of the engine being shut down appeared to you like
'smoke of an explosion' on video.

I am in fact near certain that the engine remained mechanically fine after
shutdown and nothing else exploded or was broken. They will figure this out
for the next flight.

~~~
anovikov
Yeah, now I see from the slow-mo that the aerodynamic shell got ruptured, but
it's no big deal, it's obviously not designed to handle any loads from INSIDE.

------
ChuckMcM
Looking forward to the SpaceX update.

One of the things that struck me is that trying to watch the control room at
the same time as this anomaly I have yet to pick out anyone who 'flinched' or
made any sort of noted move. That has left me wondering if they knew when this
happened that it happened. I have to believe they did.

I remember watching the faces of the people in the control room when they did
TV shots of the control room of NASA and noting that there was always someone
who knew that things weren't going to plan, their face betrayed that
knowledge.

That said, it looks like their primary cargo was fine, but they ended up
putting their secondary cargo into a 'backup' orbit.

~~~
trafficlight
I suppose they saw changes in the numbers, but they were still hitting their
marks, so they didn't worry too much. I watched the livestream, and honestly,
I didn't even know something bad had happened. Maybe they weren't glued to the
video but rather the numbers on their screens?

Although, historically it seems that NASA control teams pride themselves on
their ability to stay levelheaded during bad situations. Are there a lot of
ex-NASA employees at SpaceX?

~~~
ChuckMcM
It's a lot of data to comprehend. Doing operations on a large cluster
architecture has a similar issue which is that there are many things that can
go wrong and are 'ok' because the system is designed to deal with it
seamlessly. But I'm a big fan of tools that let you have a red/yellow/green
indication of nominal/warn/fail for each sub system. Looking at such a display
you can comprehend the entire system is 'ok' (or not).

~~~
caw
I manage a lot of complex data where many things could result in an undesired
state. Traffic lights don't always illustrate these conditions well enough, or
let you diagnose the problem quickly.

Take the somewhat simple example of rebuilding a drive array.

All drives = green;

Within RAID tolerance = yellow

Degraded = red

What about the status of the hotspare? Is yellow rebuilding? What if there's
no more hot spares, but you're within RAID tolerance?

I'm not knocking traffic lights (sometimes they're good), but it's worth
spending time trying to figure out your data displays to give you the whole
picture. It takes a while, but makes monitoring so much easier.
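
A rough sketch of the drive-array example above, assuming you keep the array state and the hot-spare count as separate fields and only derive the colour from them. Every name here (`ArrayState`, `ArrayStatus`, `color`, `summary`) is hypothetical and not taken from any real RAID tool.

```python
from dataclasses import dataclass
from enum import Enum


class ArrayState(Enum):
    HEALTHY = "healthy"        # all drives present
    REBUILDING = "rebuilding"  # a hot spare is being rebuilt onto
    TOLERATED = "tolerated"    # drive(s) missing, still within RAID tolerance
    DEGRADED = "degraded"      # beyond RAID tolerance


@dataclass
class ArrayStatus:
    state: ArrayState
    hot_spares_left: int

    def color(self) -> str:
        """Collapse the status to a traffic light (this loses information)."""
        if self.state is ArrayState.HEALTHY:
            return "green"
        if self.state is ArrayState.DEGRADED:
            return "red"
        return "yellow"

    def summary(self) -> str:
        """Colour plus the details the colour alone cannot express."""
        return (f"{self.color()}: {self.state.value}, "
                f"hot spares left: {self.hot_spares_left}")


rebuilding = ArrayStatus(ArrayState.REBUILDING, hot_spares_left=1)
no_spares = ArrayStatus(ArrayState.TOLERATED, hot_spares_left=0)
print(rebuilding.summary())
print(no_spares.summary())
```

Both statuses come out "yellow", but the summaries differ, which is the distinction a bare traffic light hides.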

~~~
ChuckMcM
Actually large format data displays are great for calling out unexpected
transitions. I had a chance to look at the control panel of a nuclear reactor
for an aircraft carrier once, and what I learned was that the panel had
annunciator lights that would blink on an 'unexpected' change. So for example
steam pressure would be static in one of two states (cold shutdown, operation)
and smoothly changing when state changing, so the light would start blinking
if either the steam pressure changed when the reactor wasn't changing state,
or if it stopped changing in the correct way when the reactor _was_ changing
state.

Looking across a sea of lights, if you saw one that was blinking you would go
over to it, read off the legend (that would tell you the conditions under
which that light would blink) and then take action. Once action was in process
you pushed the button to 'acknowledge' the anomaly. (which also set new
parameters for when it would start blinking again)

All in all a very cool system. I've been sorely tempted to hack something
similar for our search and crawling clusters. Sadly I don't think it's
practical to have a mockup of the Enterprise-D warp core which pulses in
response to query rate, although that would be very very cool :-).
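
For anyone tempted to hack up the same thing, here is a minimal sketch of the blink-on-unexpected-change and acknowledge behaviour described above. It is an illustration under my own assumptions, not how the real panel works: the `Annunciator` class, its `tolerance` threshold, and the `changing_state` flag are all made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annunciator:
    """Blinks when a reading behaves unexpectedly for the current mode."""
    legend: str
    tolerance: float               # movement below this counts as "static"
    blinking: bool = False
    last: Optional[float] = None

    def update(self, value: float, changing_state: bool) -> None:
        if self.last is not None:
            moved = abs(value - self.last) > self.tolerance
            # Unexpected: moving while steady, or stuck during a transition.
            if moved != changing_state:
                self.blinking = True
        self.last = value

    def acknowledge(self, new_tolerance: Optional[float] = None) -> None:
        """Operator has seen it; optionally set new parameters."""
        self.blinking = False
        if new_tolerance is not None:
            self.tolerance = new_tolerance


light = Annunciator(legend="steam pressure", tolerance=0.5)
light.update(100.0, changing_state=False)
light.update(105.0, changing_state=False)   # jumped while steady
print(light.legend, "blinking:", light.blinking)
light.acknowledge(new_tolerance=1.0)
```

The light stays latched until someone acknowledges it, so a transient blip is not lost just because the reading settled again.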

------
vsearch
There was no explosion: SpaceX CRS-1 Mission Update: October 8, 2012
[http://www.spaceref.com/news/viewpr.html?&pid=38825](http://www.spaceref.com/news/viewpr.html?&pid=38825)

From SpaceX "Approximately one minute and 19 seconds into last night's launch,
the Falcon 9 rocket detected an anomaly on one first stage engine. Initial
data suggests that one of the rocket's nine Merlin engines, Engine 1, lost
pressure suddenly and an engine shutdown command was issued immediately. We
know the engine did not explode, because we continued to receive data from it.
Our review indicates that the fairing that protects the engine from
aerodynamic loads ruptured due to the engine pressure release, and that none
of Falcon 9's other eight engines were impacted by this event."

------
glimcat
Maybe it's the engineer in me, maybe it's the astronomy geek who watched two
shuttles explode.

But I am way more impressed with "engine failed, still got to orbit safely"
than I was with the already titanic feat of making it to orbit in the first
place.

------
ynniv
When pushing the envelope, anything better than catastrophic failure is
success. That an engine exploded and both primary and secondary missions were
still completed is fantastic.

~~~
cowsaysoink
Rocketships are insane; there have to be many, many fail-safes because failures
happen all the time. I remember a NASA engineer giving a talk on engineering
safety when I was going to school where he said that the probability of
catastrophic failure in NASA's launches is 1 in 100 and that is as low as they
are able to make it at this point in time.

~~~
jmharvey
Closer to 2 in 100: 2 shuttle losses in 135 flights, plus 1 catastrophic
failure out of 32 missions in the pre-shuttle era.
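
The arithmetic behind "closer to 2 in 100" can be checked by pooling the two eras. The counts below are the ones quoted in this comment and are not independently verified here.

```python
from fractions import Fraction

# Figures as quoted in the comment above.
shuttle_losses, shuttle_flights = 2, 135
pre_shuttle_losses, pre_shuttle_missions = 1, 32

# Pool both eras into one observed failure rate.
pooled = Fraction(shuttle_losses + pre_shuttle_losses,
                  shuttle_flights + pre_shuttle_missions)

print(f"pooled rate: {pooled} = {float(pooled):.2%}")
```

That pooled rate is 3/167, or roughly 1.8 in 100.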

------
Cushman
This is great for SpaceX. A more-or-less anticipated event that was handled
perfectly by automatic systems is being reported as a potentially catastrophic
failure. They get all the great press for proving the safety of their design
without having to go through the stress of an _actual_ potentially
catastrophic failure.

~~~
jerrya
Fantastic news!

------
spdy
Those engineers who designed this system can be proud. The worst possible
problem occurred and the backup plans worked out perfectly.

~~~
riffraff
they most certainly should, but I'd argue only "The worst possible _planned_
problem occurred" :)

~~~
salem
There is a difference between planned and anticipated.

------
InclinedPlane
Some important points on this subject.

First, the 1st stage has redundancy against engine failures; however, the 2nd
stage (which uses largely the same engine as in the 1st stage) has only one
engine. So if the per-engine failure rate is too high that could spell bad
news for overall vehicle reliability even if the launcher can survive 1st
stage engine failures remarkably well. Some reasons to be optimistic: 2nd
stage engines have much lower aerodynamic loading and don't have to operate
through "max-Q" as the 1st stage engines do (which was incidentally the point
in time that the engine failure on this flight happened).

Second, the Falcon 9's 1st stage engines are arranged in a 3x3 grid which
seems to result in some unfavorable aerodynamic forces on the engines on the
corners. It's possible that this contributed to the engine failure (which
occurred in a corner engine) and it's also possible that it contributed to the
destruction of the engine fairing after it was shut-down.

Third, the particular engine in use on this vehicle (the Merlin-1C) will only
be used on one more flight before being replaced (in the Falcon 9 v1.1) with
substantially redesigned Merlin-1D engines (in both the 1st and 2nd stages).
Additionally, the engine arrangement on the first stage will change to be
octagonal, radially symmetric instead of a grid.

It's good to know the systems and structures to protect against 1st stage
engine failure work well; however, a lot of the reliability analysis up to this
point is somewhat obsoleted by the imminent change in design. I suspect that
the engine layout and upgrade will lead to greater overall reliability, but it
will take several flights to prove that.

Anyway, some things to chew on.

~~~
cpeterso
> _the Falcon 9's 1st stage engines are arranged in a 3x3 grid which seems to
> result in some unfavorable aerodynamic forces on the engines on the
> corners._

Why are the engines arranged in a 3x3 grid instead of something symmetrical
like a circle?

~~~
archon
Uninformed guess: They didn't have enough space for anything fancier than a
grid pattern. Perhaps the new engines have a smaller footprint.

~~~
hga
I've read one of the changes in the new engines will be their using their own
turbopumps. It's easy to imagine how their previously sourced ones wouldn't
allow an internal layout that was optimal for the external layout of the
engines.

------
MikeCodeAwesome
As the article surmised, one of the engines did indeed fail and the craft
corrected for the failure.

" _Falcon 9 detected an anomaly on one of the nine engines and shut it down.
As designed, the flight computer then recomputed a new ascent profile in
realtime to reach the target orbit…_ "

[http://www.parabolicarc.com/2012/10/07/falcon-9-suffers-engi...](http://www.parabolicarc.com/2012/10/07/falcon-9-suffers-engine-anomoly/)

------
zhaphod
I think it is important to read the latest press release from SpaceX where
they clearly state that the engine did not explode.

<http://www.spacex.com/press.php?page=20121008>

I really wish that we had a perfect launch but that's not the reality. I don't
think, technically, this hurts SpaceX. But once the perceptions are formed it
is hard to change them even if you throw mountains of data/facts at them. Case
in point the death panel buzz word that was used against Obamacare. I really
hope this anomaly doesn't harm SpaceX.

------
matt2000
Am I correct in thinking that previous rocket designs don't have any
redundancy built in? This seems like a big improvement in reliability but I'm
not super familiar with other rocket designs.

~~~
lutusp
> Am I correct in thinking that previous rocket designs don't have any
> redundancy built in?

The answer depends on specifics. The NASA Space Shuttle was able to reach
orbit after the failure of one of its three engines, but only if the payload
and/or altitude weren't near their range extremes. In other circumstances, the
timing of the failure might determine whether the mission could proceed.

When I worked on the Shuttle, one design guideline was that no single-point
failure modes should be allowed if it was possible to avoid them. Obviously
this guideline was frequently not met. A one-word summary describing the
avoidance of single-point failure modes is "redundancy".

------
codex
I didn't know this before, but SpaceX has only done eight launches so far--
half of them test flights. Perhaps this explains the anomaly.

~~~
natep
I doubt that. If a software startup designed and built a distributed system
that would automatically detect and recover from hardware faults (see: every
'design for failure' post ever), would you say that the hardware faults were
due to inexperience or incompetence? Building redundancy into the system was a
conscious decision, and they weren't "lucky" to have flown through this
anomaly.

~~~
codex
You misunderstand me. Failures are not due to inexperience. High failure rates
are. If a single failure occurs in one out of eight launches, a double will
occur once every sixty four. While I believe the Falcon is double fault
tolerant, I don't believe for a minute that this is the first failure seen in
a SpaceX launch. If they've seen only one other failure in their eight
launches, their overall failure rate (resulting in loss of rocket, possibly
cargo and crew) would be one in 64. That is nearly the failure rate of the
Shuttle. With more experience, they may be able to lengthen their mean time to
loss (MTTL) by improving the failure rate.
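
The one-in-64 figure above follows only if the two failures are independent and each has the assumed 1-in-8 chance per launch. A small sketch of that reasoning, where the 1/8 rate is the comment's hypothetical and `prob_at_least` is an illustrative helper, not anything from SpaceX:

```python
from fractions import Fraction
from math import comb


def prob_at_least(k: int, n: int, p: Fraction) -> Fraction:
    """P(at least k of n independent events occur), each with probability p."""
    return sum(
        (comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1)),
        Fraction(0),
    )


p_single = Fraction(1, 8)    # assumed single-failure rate per launch
p_double = p_single ** 2     # two independent failures on the same launch

print(f"double failure: {p_double} = {float(p_double):.2%}")
```

If the failures are correlated (say, one engine's debris damaging its neighbour), the real double-failure rate would be higher than this product. The helper generalizes to "at least k of n" for anyone who would rather model per-engine faults, but the `n` and `p` fed into it are the modeller's own assumptions.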

~~~
natep
First of all, this launch was not a failure, full stop.

I don't know what your background in statistics is, but I'm impressed that
you're able to deduce the details of such a complicated, stochastic process,
from only 8 observations, and are willing to extrapolate your predictions for
8 times as many more.

And for someone who loves to comment negatively on SpaceX/Tesla posts, maybe
you could spend 5 minutes looking at their Wikipedia pages and see that yes,
there have been failures (i.e. unable to achieve stated mission goals and
sometimes destroying payloads).

~~~
damoncali
The mission wasn't a failure, but a major component failed in a way that is
very concerning. There is _a lot_ of work to be done before a sane human being
will get in one of those, let alone approach the safety record of aircraft
that Musk is so fond of alluding to.

~~~
natep
If you feel that you are qualified to say it is a very concerning component
failure based on the scant evidence available, then that's up to you.

~~~
damoncali
I am, and of course it is concerning - the engine shut down and debris was
strewn about. Do the math on the failure rates. Unless things are dramatically
improved (the goal of course), these things are just not safe for people
outside of the dare-devil set. That's not a knock against SpaceX - this is
hard stuff. It _is_ a slight knock against Musk's over-the-top marketing that
has us on Mars in 15 years, which, in my opinion, is unrealistic.

------
ceejayoz
Earlier HN discussion: <http://news.ycombinator.com/item?id=4626866>

------
notjustanymike
Never before has "try { ... } catch (e) { ... }" been so important

~~~
btilly
Actually my understanding from talking to their first software developer is
that their systems are written in C++ and they do not use exceptions. Ever.

If that surprises you, consider that the default behavior of an uncaught
exception, anywhere in your code, no matter how minor, is to crash your
program. While you're in flight, the last thing that you want to see is a
software crash. Having software encounter an unanticipated state might or
might not destroy the rocket. Having your control system spontaneously cut out
in flight definitely _will_ destroy the rocket.

~~~
WalterBright
This is incorrect. In avionics software, you want software that self-detects
its own failure to quit immediately, and engage the backup system.

You do NOT want it to "soldier on" once it has entered an unknown state.

The general problem with error codes is they can be so easily ignored, and
then the software is operating in an unknown and untested state.

I suspect the actual reason why they eschewed exceptions is because exceptions
may not be able to guarantee hard realtime latency.
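
A toy sketch of the quit-and-hand-over idea, written without exceptions to match the constraint discussed above. It is an assumption-laden illustration, not real avionics code: the `Channel` class, its `gain` field, the output limit of 10.0, and the `command` function are all invented for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Channel:
    name: str
    gain: float
    healthy: bool = True

    def compute(self, reading: float) -> Optional[float]:
        """Return a command, or None once the channel has taken itself offline."""
        if not self.healthy:
            return None
        out = self.gain * reading
        # Self-check: a non-finite or out-of-range output means unknown state.
        if not math.isfinite(out) or abs(out) > 10.0:
            self.healthy = False    # latch off; never "soldier on"
            return None
        return out


def command(channels: List[Channel], reading: float) -> Tuple[str, float]:
    """Use the first channel that still vouches for its own output."""
    for ch in channels:
        out = ch.compute(reading)
        if out is not None:
            return ch.name, out
    raise RuntimeError("no healthy channel left")  # would be a hardware failsafe


primary = Channel("primary", gain=float("nan"))   # simulated corrupted state
backup = Channel("backup", gain=2.0)
print(command([primary, backup], 1.5))
```

The point of the latch is that the primary stays out of the loop even if its state later looks fine again, so the system never runs on a channel that has already failed its own check.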

------
waratuman
Misleading title.

