
Behind the Lion Air Crash, a Trail of Decisions Kept Pilots in the Dark - simonbr
https://www.nytimes.com/2019/02/03/world/asia/lion-air-plane-crash-pilots.html
======
del82
Based only on what I read in this article, I can kind of see Boeing's point?
From what I understand, the procedure/checklist for an uncommanded nose-down
didn't change from the old to the new version, even with the addition of MCAS.
So from the pilot's perspective, there is nothing that they should do
differently in the new vs. the older 737s when this happens: follow the
checklist, which will (eventually) cause you to flip the Stabilizer Trim
Cutout Switches, and that will fix the problem.

So the interface didn't change, and the procedure's the same. Should Boeing
and airlines update training every time they change something "under the
hood", even when the procedure for pilots is the same? How about when they
make software updates to already-flying models?

~~~
rocqua
One interface change was the effect of 'pulling hard back on the stick' in
case of runaway stabilizers. That worked with the old system, but not with
MCAS.

This seems to be exactly the interface change that led to the crash.

~~~
CydeWeys
To use a car analogy ...

You can always override cruise control by stomping hard on the brake (like to
avoid an imminent crash). That's how it's always worked and you've gotten used
to this, and done it on occasion when warranted.

Now imagine that the next generation of adaptive cruise-control/"auto-
pilot"/whatever comes out, and stomping on the brakes no longer does anything.
You have to first disable the cruise control by pressing a button on the
steering wheel, and only then will inputs to the brake pedal do anything.

And then you don't tell drivers about this change.

You can totally see how, right in the lead-up to a fatal accident, a driver is
going to be focused solely on stomping on that brake pedal in increasing
panic, wondering right up to the moment of death why that's not doing
anything. They won't consider the cruise control off button because it's not
their most immediate need (braking is), and they've never needed to use it
before.

~~~
twtw
I do not think this is a good analogy.

Firstly, there is no single equivalent to "slamming on the brakes" for
uncommanded nose-down. This could be caused by a variety of faults, and pilots
are trained to respond in a fashion that will be effective for even those in
which the first thing to try doesn't work. There is a standard procedure in
place to use in this type of situation - arguing that the pilots should only
be expected to do one thing with increasing desperation is essentially arguing
that they will not be able to respond to a whole host of emergencies causing
uncommanded nose-down.

Second,

> Information from the flight data recorder shows that the plane’s nose was
> pitched down more than two dozen times during the brief flight, resisting
> efforts by the pilots to keep it flying level ... _The standard checklist
> for dealing with that sort of emergency on the previous version of the 737
> focuses on flipping the stabilizer trim cutout switches and using the manual
> wheels to adjust the stabilizers._ [emph mine]

your argument essentially hinges on the assumption that pulling hard back on
the stick is a sufficient solution for all the problems that may happen with a
plane with the exception of a fault with MCAS (the new system). I don't think
that is accurate, even prior to the 737 MAX. It sounds like there were a lot of
things that would require further action than pulling back on the stick.

~~~
callmeal
> Firstly, there is no single equivalent to "slamming on the brakes" for
> uncommanded nose-down.

Actually in the "old" version, there is.

From TFA:

> Older 737s had another way of addressing certain problems with the
> stabilizers: Pulling back on the yoke, or control column, one of which sits
> immediately in front of both the captain and the first officer, would cut off
> electronic control of the stabilizers, allowing the pilots to control them
> manually.

Which makes the car analogy apt.

~~~
loeg
Drivers aren't trained to follow checklists and usually don't have dozens of
seconds to respond to mechanical emergencies. Cars also don't fall out of the
sky if they break down. It's not a great metaphor.

~~~
twtw
> have dozens of seconds to respond

12 minutes, in this case.

~~~
loeg
Yeah, I figured it's usually many minutes but didn't want to underplay the
stress and difficulty of following a checklist in an extreme emergency. In
contrast, major mechanical failures in cars generally resolve themselves
extremely quickly.

------
fabian2k
> In designing the 737 Max, Boeing decided to feed M.C.A.S. with data from
> only one of the two angle of attack sensors at a time, depending on which of
> two, redundant flight control computers — one on the captain’s side, one on
> the first officer’s side — happened to be active on that flight.

The one thing that seriously surprised me was that an automated system that is
able to point the airplane towards the ground is intentionally fed by a
single, non-redundant sensor.

Everything else I've read about various safety systems that limit or override
the pilot explains how redundant sensors are used, and how the system
switches itself off if the sensors don't give consistent results.
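To make the contrast concrete, here is a minimal Python sketch comparing a single-sensor design like the one quoted above with a simple two-sensor cross-check. The function names, the averaging, and the 5.5-degree disagreement threshold are hypothetical illustrations, not Boeing's (or anyone's) certified logic.

```python
from typing import Optional

# Hypothetical disagreement limit in degrees; not a certified value.
DISAGREE_THRESHOLD_DEG = 5.5

def single_sensor_aoa(left_deg: float, right_deg: float,
                      left_computer_active: bool) -> float:
    """Use only the sensor on the active flight computer's side.
    A bad reading passes straight through with nothing to compare it to."""
    return left_deg if left_computer_active else right_deg

def cross_checked_aoa(left_deg: float, right_deg: float,
                      threshold_deg: float = DISAGREE_THRESHOLD_DEG) -> Optional[float]:
    """Compare both sensors. Return None (meaning: inhibit the automatic
    function and alert the crew) if they disagree by more than the threshold,
    otherwise return their average."""
    if abs(left_deg - right_deg) > threshold_deg:
        return None
    return (left_deg + right_deg) / 2.0

# Hypothetical readings: left vane faulty and reading high, right vane healthy.
print(single_sensor_aoa(22.0, 4.0, left_computer_active=True))  # 22.0
print(cross_checked_aoa(22.0, 4.0))                              # None
print(cross_checked_aoa(4.0, 5.0))                               # 4.5
```

With two sensors the system can at least detect that something is wrong, even though it cannot tell which of the two to trust.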

~~~
rocqua
There is one more technical surprise in this article for me. Pulling hard back
on the control column would override the stabilizer runaway in old versions,
but not MCAS. That is a major interface difference between the old and new
planes.

It sounds like the old flight manual documented only one of two possible
methods for dealing with runaway stabilizers. Because the second method (hard
back on the control column) wasn't in the manual, changes to it weren't taken
into account. Hence a non-backwards-compatible change slipped through.

~~~
BlackFly
From the SemVer perspective, a change to an undocumented (and therefore not
public) feature is not a major change. The point is that their safety
documentation doesn't change from one system to the next.
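A toy version of the SemVer rule being applied here, as a Python sketch. It assumes a simplified reading of Semantic Versioning 2.0.0 (only the declared public API counts), and the helper names and version numbers are hypothetical:

```python
from enum import Enum

class Bump(Enum):
    MAJOR = "major"
    MINOR = "minor"
    PATCH = "patch"

def required_bump(in_public_api: bool, breaks_compat: bool, adds_feature: bool) -> Bump:
    """Simplified SemVer rule: only the declared public API counts."""
    if in_public_api and breaks_compat:
        return Bump.MAJOR
    if in_public_api and adds_feature:
        return Bump.MINOR
    # Undocumented behaviour is not part of the API, so even a
    # behaviour-breaking change there is treated as a patch.
    return Bump.PATCH

def bump(version: str, kind: Bump) -> str:
    """Apply a bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind is Bump.MAJOR:
        return f"{major + 1}.0.0"
    if kind is Bump.MINOR:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

# A hypothetical change to undocumented yoke behaviour:
kind = required_bump(in_public_api=False, breaks_compat=True, adds_feature=False)
print(kind, bump("1.4.2", kind))  # Bump.PATCH 1.4.3
```

Under this rule a behaviour change outside the declared API never forces a major bump, so anyone who relied on that behaviour gets no version-number warning.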

Anyone who was "doing it by the book" was not pulling up on the stick.

Now, maybe Boeing was suggesting via side channels that there were alternate
ways to solve certain problems and those side channels should qualify as
public documentation... but it may have been intuition earned through
experience overriding standard procedure.

~~~
kelnos
That ignores the fact that people will rely on undocumented behavior anyway,
and a responsible developer should keep that in mind.

With a normal software library, you might make the decision to cause breakage
anyway, even if it inconveniences users of your library. Or you might not,
because you believe the inconvenience will be too great, and instead just
document the behavior and make it a part of the API.

With an airliner control system, you need to be a bit more careful, since a
pilot depending on undocumented behavior may do so in a way that could cost
lives if that behavior is changed. Is the pilot _correct_ to depend on that
behavior? No. But that's irrelevant when lives are at stake.

------
danielvf
Here's a video from a few years ago of two student pilots handling a trim
runaway in a 737 simulator.

[https://www.youtube.com/watch?v=3pPRuFHR1co&t=154](https://www.youtube.com/watch?v=3pPRuFHR1co&t=154)

You'll notice that it's a loud, physical event, with a very simple solution.

This happened over twenty times in the accident flight, and the pilots never
disabled the problematic automatic stabilizer system.

~~~
robocat
This is a great video showing the UI - it makes what is going on much clearer!

------
Someone1234
Makes one wonder if the FAA is too close to Boeing. Not only did they green-
light this but they also put considerable pressure on EASA to do the same. The
FAA's first priority should be safety, not Boeing's bottom line or their
ability to more quickly deploy an aircraft update.

Pilots are pretty unhappy about this M.C.A.S. situation. They're literally
expected to fly an aircraft without even being told how that aircraft
functions. And while the checklist may eventually take care of this, that
isn't a substitute for a professional pilot in the cockpit. Just the lack of
training/simulated failures for this new system is highly irregular; pilots
are used to and expect such training when transitioning to a new major
aircraft version.

The biggest drivers here seem to be cost and Boeing's competitiveness, not
safety. I think it might be time for the EASA to trust the FAA a little less,
at least until they get their house back in order.

~~~
PedroBatista
The revolving door between the FAA and cushy positions at Boeing is not a
secret. The shortcuts Boeing has been making with the blessing (or willful
omission) of the FAA have been discussed, but with many interests in the
middle, like strong competition from Airbus, geopolitics, internal politics,
states outbidding each other to create more jobs, national ego, and straight-up
greed.

"Word on the street" is that Boeing has lost most of its safety reputation and
most people just do their job and try not to get burned when the planes start
to crash. I want to believe most are just blowing off steam, but I don't know...

~~~
dingaling
The 737 models of two generations ago (300/400/500) had several fatal
accidents in the 1990s due to runaway rudders. That dented Boeing's reputation
with users but not with the FAA.

------
danielvf
There's something called the "Swiss Cheese Theory" of accidents.
([https://en.wikipedia.org/wiki/Swiss_cheese_model](https://en.wikipedia.org/wiki/Swiss_cheese_model))

In a mostly-robust system, different layers catch and defend against the
errors of other layers in the system. For a major accident to occur, holes in
multiple layers have to line up that day.
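The arithmetic behind the model can be sketched in a few lines of Python, under the strong (and often wrong) assumption that layers fail independently. The per-layer probabilities below are made-up illustrative numbers, not real accident statistics.

```python
from math import prod
from typing import Sequence

def p_holes_align(hole_probs: Sequence[float]) -> float:
    """Chance that every defensive layer has a hole at the same time,
    assuming the layers fail independently of one another."""
    return prod(hole_probs)

# Four layers, each with a hypothetical 1-in-100 chance of a hole on a given flight.
layers = [0.01, 0.01, 0.01, 0.01]
print(p_holes_align(layers))  # about 1e-08

# Removing one layer entirely (a hole that is always open) makes the
# outcome 100 times more likely.
print(p_holes_align(layers[:3] + [1.0]))  # about 1e-06
```

Independence is the weak point of this toy model: if one hole makes others more likely (say, a culture that tolerates a recurring fault also weakens procedure compliance), the real risk is higher than the simple product.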

In this case we have four holes that lined up that day: a plane model with a
possible rare software bug; an aircraft with a faulty sensor feeding bad
information to the computers; an airline with an internal culture that
continued to fly a specific aircraft that kept trying to point at the ground,
sometimes without even attempting to fix the problem; and finally, on this
fatal day, a crew that did not follow the proper procedures even after
twenty-three nose-down incidents during the flight.

Even without the MCAS system present, the last two holes seem like they would
bring down an airliner eventually, from one cause or another.

Pilots are rightfully mad about not being told about the MCAS system. But it's
just one of many systems on a 737 that can trim the stabilizer to the point
that the plane can't be flown. That's why the procedure for any stabilizer
problem is to disable automatic control of the stabilizer. The training and
checklists that the accident pilots had covered this, and previous pilots
flying the accident aircraft did this and then had uneventful flights.

~~~
FabHK
Still, to have a system on board that, with _one_ sensor malfunctioning,
repeatedly trims you down (unless you switch the cutout switch or physically
arrest the trim-wheel), is pretty tough.

By the way - in small airplanes, you can overcome trim with elevator pressure.
That's not necessarily the case on a passenger jet; and not only because it's
much bigger, but because the trim works differently [1]. I wonder whether that
played a role. I must admit that before I read [1], I had assumed that bad
trim is something I can overpower, when push comes to shove.

[1]
[https://www.skybrary.aero/bookshelf/books/2627.pdf](https://www.skybrary.aero/bookshelf/books/2627.pdf)

~~~
danielvf
Yeah, when the trim is a giant screw changing the angle of the whole
stabilizer, rather than just a little tab, it's a whole different ballgame.

There are plenty of single components on an airliner whose failure can cause a
stabilizer trim runaway. Different airliners handle it differently. On a 737
you can cut out automatic control and use wheels connected to the
stabilizer jackscrew with metal cables. On other airliners, you can cut out
automatic control, then switch to a second electric control system and use it
manually. A 737 stabilizer runaway isn't an instant thing, and is a loud event
in the cockpit.

------
eternalny1
Boeing should face some major fines for this, and additional regulation is
going to be needed to make sure this doesn't happen in the future.

This all seems to come down to the fact they wanted to avoid having to retrain
pilots ($$$), so these automation changes were kept in the dark.

The crew before them dealt with this same problem but they successfully cut
out the trim system. They got lucky and they should have been more vocal in
expressing the fault outside of just a post-flight note about it.

The fact that the 737 can auto-trim itself beyond manual elevator authority,
due to a SINGLE faulty AOA sensor, is mind-boggling and scary.

~~~
extrapickles
From what I can tell, the previous crew did not get lucky, they just followed
the checklist which would have solved the issue in this case.

Auto-trim beyond the elevator authority is not a problem, as the pilots can
take manual control of the trim by grabbing the trim wheel (it's in a very
obvious spot on the 737).

The actual fix is hard, as adding another alarm can get tricky from a UX
perspective during an emergency. Probably the only “fix” is to reinforce the
value of following the checklist.

~~~
kelnos
There already _is_ another alarm: an optional "angle-of-attack disagree"
indicator that Lion Air was apparently too cheap to install.

Now, that wouldn't have _directly_ pointed to what was wrong, but it would
have been pretty suggestive.

(I would suggest, though, that having an optional configuration that lacks
robustness for a system that can automatically point the plane toward the
ground is a really poor choice of options.)

------
cmurf
Actual example: Normal takeoff in instrument meteorological conditions (no
external visual references, flight by reference exclusive to instruments). The
attitude indicator shows proper climb attitude, vertical speed and altimeter
show positive rate of climb, airspeed indicator shows speed increasing above
target speed. Pilot response? Probably nose up and/or power reduction; OK they
do both. Airspeed indication continues to increase. Pilot noses up and powers
down. Airspeed increases. Pilot noses up aggressively. Stall. Crash.

What happened? The pitot tube and drain were clogged. Static port was clear.
This turned the airspeed indicator into an altimeter - it was incapable of
showing correct airspeed from the moment of blockage.
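Why a blocked pitot with a clear static port behaves like an altimeter can be sketched in Python: the airspeed indication comes from the difference between pitot (total) pressure and static pressure, so once the pitot pressure is trapped, falling static pressure in a climb reads as rising airspeed. This sketch assumes the ISA troposphere pressure formula and a simplified incompressible airspeed formula (real indicators use a compressible, calibrated relation); the 70 m/s takeoff speed is an arbitrary illustrative value.

```python
import math

RHO0 = 1.225        # sea-level standard air density, kg/m^3
P0 = 101_325.0      # sea-level standard pressure, Pa

def isa_static_pressure(alt_m: float) -> float:
    """ISA static pressure in the troposphere (valid below about 11 km), Pa."""
    return P0 * (1.0 - 2.25577e-5 * alt_m) ** 5.25588

def indicated_airspeed(pitot_pa: float, static_pa: float) -> float:
    """Simplified (incompressible) airspeed indication from the
    pitot-static pressure difference, in m/s."""
    q = max(pitot_pa - static_pa, 0.0)
    return math.sqrt(2.0 * q / RHO0)

# Pitot inlet and drain blocked at sea level while moving at 70 m/s:
# the total pressure at that moment stays trapped in the line.
trapped_pitot = P0 + 0.5 * RHO0 * 70.0 ** 2

for alt in (0, 250, 500, 1000):
    ias = indicated_airspeed(trapped_pitot, isa_static_pressure(alt))
    print(f"{alt:5d} m  indicated {ias:6.1f} m/s")
```

The indication climbs steadily with altitude even though the aircraft's actual speed plays no part in the calculation once the pitot line is sealed.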

The cause of the crash is pilot error. The pilot is expected to recognize from
other instruments that the airspeed indicator is unreliable, and this is part
of training for instrument rating.

If MCAS in the Lion Air crash made a similar mistake, using a single data
point to determine a stall condition, that is an error. It's functionally
"pilot error" to have no means of determining if the angle of attack sensor is
wrong, and no mechanism for disregarding its data. Further, the corrective
action it took, had the flight condition actually been true, sounds excessive.
If a human pilot did the exact same thing MCAS did, I expect the human would
be blamed: it would be pilot error to nose down so aggressively that you've
exchanged a level-flight, high-speed stall (a rare event indeed) for a
high-speed dive. That is not a competent recovery, particularly since there's
apparently no recognition of the danger of high-speed dives, let alone
recovery from them. It's probably a really good idea if your stall recovery
does not end in a dive!

------
coldcode
If it affects flight stability especially in an emergency, then pilots should
be trained to understand what it affects. Period. Not doing so to save money
or get more sales is beyond stupid. Watch Air Disasters to see what happens
when highly trained pilots fail to do the right thing because they hadn't
trained for what went wrong: the problem was something different from what
they knew. Flying is easy when things are working; pilot training is the
difference between dealing with an emergency and being dead.

------
Animats
From the article: _" In designing the 737 Max, Boeing decided to feed M.C.A.S.
with data from only one of the two angle of attack sensors at a time,
depending on which of two, redundant flight control computers — one on the
captain’s side, one on the first officer’s side — happened to be active on
that flight."_

They created a single point of failure that way. Why?

~~~
danielvf
By having each redundant flight computer hooked to completely different
sensors, in case of a bad sensor the crew can bypass not only the sensor, but
also any computation done with that sensor.

It's not a single point of failure as we think of it: if it starts acting up,
you can easily disable the automatic stabilizer system, per the procedures.
737 stabilizer runaways take several seconds to take effect, and are
recoverable afterward. Later you can switch flight computers and then be using
clean data, though you are supposed to leave the stabilizer system off for the
remainder of the flight.

~~~
Animats
Using only one sensor at a time, there's no "sensors disagree" fault to tell
the pilot there's a problem. Or to tell the flight control system it shouldn't
be taking drastic action based on that sensor.

Airbus uses three angle of attack sensors and compares them. They've had at
least one crash when two sensors failed in a consistent way.[1] The
vulnerability of aircraft flight control systems to bad AOA data is well
known.
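A minimal Python sketch of three-way sensor voting shows both why a third sensor helps and how two consistently failed sensors can defeat it. Taking the median and rejecting readings far from it is one common voting scheme, used here as an assumption; it is not a description of Airbus's actual implementation, and the 3-degree threshold is hypothetical.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical rejection threshold in degrees of angle of attack.
OUTLIER_THRESHOLD_DEG = 3.0

def vote_aoa(readings: List[float],
             threshold_deg: float = OUTLIER_THRESHOLD_DEG) -> Tuple[float, List[int]]:
    """Return the median of the sensor readings and the indices of any
    sensors rejected for straying too far from that median."""
    voted = median(readings)
    rejected = [i for i, r in enumerate(readings) if abs(r - voted) > threshold_deg]
    return voted, rejected

# One sensor fails high: the two healthy sensors outvote it.
print(vote_aoa([4.0, 4.5, 22.0]))   # (4.5, [2])

# Two sensors stuck at the same wrong value: the healthy sensor
# (true AoA 9.0 in this made-up example) is the one that gets voted out.
print(vote_aoa([9.0, 3.0, 3.0]))    # (3.0, [0])
```

Majority voting assumes failures are independent; a common cause that freezes two sensors the same way turns the vote against the one good sensor.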

[1] [https://news.aviation-safety.net/2010/09/17/report-blocked-a...](https://news.aviation-safety.net/2010/09/17/report-blocked-aoa-sensors-caused-loss-of-control-during-a320-check-flight/)

~~~
kirykl
There wasn't a "sensors disagree" alert in the Lion Air plane because they
hadn't installed the optional AOA Disagree indicator.

As a comparison, Southwest had the indicator but has now also installed an
enhanced AOA Disagree indicator as a result of the Lion Air incident.

[https://theaircurrent.com/aviation-safety/southwest-airlines...](https://theaircurrent.com/aviation-safety/southwest-airlines-is-adding-new-angle-of-attack-indicators-to-its-737-max-fleet/)

~~~
Animats
That's not a feature which should be an extra-cost option.

------
dsego
Mentour Pilot did a YouTube podcast about this air crash back in November.
[https://youtu.be/zfQW0upkVus](https://youtu.be/zfQW0upkVus)

------
hlandau
When we first heard this crash was due to a change in computer-controlled
stabilizer behaviour, my question was "why on earth did Boeing do this?".
Perhaps I didn't read deep enough, but the summary explanation that it was to
improve handling was a poor answer.

I guess what really bugged me about it is how un-Boeing-like this behaviour
was; a computer overriding a pilot (even if there is a way for a pilot to
override it in turn). It's fundamentally an Airbus-esque design.

As I read this article though, everything fell into place. As you read it you
start to see, with utter clarity, exactly how this happened organizationally.

It's well known that Airbus uses software flight envelope protection to enable
them to reduce the safety margin applied to the airframe, reducing weight. In
other words, fuel efficiency is improved by making airframes less airworthy
and compensating for it in software. I don't actually disagree with this as
such; it's been demonstrated to be a sound approach, but it has historically
been Airbus's domain.

Essentially, it seems like what happened here is that Boeing finally felt the
need to adopt similar techniques to compete with Airbus on fuel efficiency
(though regarding engine size issues, not airframe safety margins, but still
making a plane's airworthiness more caveated and fixing it in software).
Essentially, we're witnessing the point at which Boeing feels its traditional
user interface philosophy (do what the pilot says) is conflicting with market
pressures.

If this were a new plane with a new type rating, this wouldn't be
unreasonable. Trying to tack this on to an existing plane, and not only that,
but doing everything in your power to minimise the amount of transition
training, is OTOH extraordinarily egregious.

The problem with this change isn't so much that Boeing's reasoning for not
telling pilots about it isn't logical. If anything, the problem is that their
reasoning _is_ utterly logical: the checklist will solve the problem anyway,
no matter the cause. You can see how this decision must have percolated
through different teams at Boeing, through regulators, via this unimpeachable-
seeming logic. The market pressures involved (fuel efficiency and retraining
costs) would have made it particularly hard to contest. It's a completely
logical line of reasoning... yet here we are with fatalities.

I'm very interested to note, though, this new revelation (to me at least) that
the yoke behaviour re: extreme deflection mitigating stabilizer runaway was
removed in the MAX. So what was Boeing's justification for _this_ change? Was
it even mentioned? If not, what on earth were the regulator's justifications
for allowing it to go unmentioned? I want to hear those justifications, since
it seems impossible to justify. I was under the impression that compatibility
of type ratings fundamentally revolved around an absence of differences in how
two planes handle, and how they respond to the yoke.

I should add, the reliance on a single sensor is also remarkable; makes me
wonder if this entire subsystem was really rushed and not given proper design
review, which would make sense given the circumstances (panicking to get a
product to market).

