Behind the Lion Air Crash, a Trail of Decisions Kept Pilots in the Dark (nytimes.com)
104 points by simonbr 15 days ago | 82 comments

Based only on what I read in this article, I can kind of see Boeing's point? From what I understand, the procedure/checklist for an uncommanded nose down didn't change from the old to the new version, even with the addition of MCAS. So from the Pilot's perspective, there is nothing that they should do differently in the new vs. the older 737s when this happens-- follow the checklist, which will (eventually) cause you to flip the Stabilizer Trim Cutout Switches, and that will fix the problem.

So the interface didn't change, and the procedure's the same. Should Boeing and airlines update training every time they change something "under the hood", even when the procedure for pilots is the same? How about when they make software updates to already-flying models?

One interface change was the effect of 'pulling hard back on the stick' in case of runaway stabilizers. That worked with the old system, but not with MCAS.

This seems to be exactly the interface change that led to the crash.

To use a car analogy ...

You can always override cruise control by stomping hard on the brake (like to avoid an imminent crash). That's how it's always worked and you've gotten used to this, and done it on occasion when warranted.

Now imagine that the next generation of adaptive cruise-control/"auto-pilot"/whatever comes out, and stomping on the brakes no longer does anything. You have to first disable the cruise control by pressing a button on the steering wheel, and only then will inputs to the brake pedal do anything.

And then you don't tell drivers about this change.

You can totally see how, right in the lead-up to a fatal accident, a driver is going to be focused solely on stomping on that brake pedal in increasing panic, wondering right up to the moment of death why that's not doing anything. They won't consider the cruise control off button because it's not their most immediate need (braking is), and they've never needed to use it before.

But the difference is not only do the pilots of a commercial airliner have vastly more training and experience than your average driver, but they literally have a list of things to try (and extensive training on following the list) in the case of a malfunction like an uncommanded nose-down.

Imagine that the failure that caused the nose-down wasn't a failed AOA sensor giving bad readings to MCAS, but some other reason that _also_ wouldn't have been solved by pulling back on the yoke, but would have been by completing some later step on the checklist. Suppose the pilots didn't follow the checklist. Would that be enough information to say that Boeing is responsible?

To be sure, in this case the pilots may have followed the checklist! It may well be the case that Boeing is completely responsible! The checklist items might not have worked, or there may have been a good reason that the pilots didn't follow it, or the checklist might have been crazy, or they might not have had time to do what needed to be done, etc. There's still a whole lot that's not known (or hasn't been released) about what happened.

I'm just not sure that the current evidence, _viz_ that Boeing made an internal software change, that they didn't explicitly call it out to pilots, and that there's no difference in the actions prescribed in the event of an uncommanded nose-down pre- and post-change, is enough to say that the fault is entirely Boeing's for this accident.

More training and experience is a problem as previous plane model behaved differently. Pilot trained this situation and plane did X and now does Y - you don't really see the problem? - now pilot has extra burden to quickly determine if something else is wrong as plane behaves differently than pilot was trained to.

And you are writing this after many, many accidents whose root cause was a pilot not knowing exactly what was happening with the plane or its autopilot, or why (e.g. AF 447).

Being an apologist for cost-cutting on safety issues is dumb, as it erodes the culture of safety and encourages others to skimp on safety.

> Pilot trained this situation and plane did X and now does Y - you don't really see the problem? - now pilot has extra burden to quickly determine if something else is wrong as plane behaves differently than pilot was trained to.

The pilots did not train for a specific root cause of a fault. They trained for a symptom (uncommanded nose down), and the procedure for that situation was unchanged.

> extra burden to quickly determine if something else is wrong

This isn't an extra burden. Pilots aren't doing root cause analysis for failures while they are responding to them. They are trained to try actions in a specific order until something works. It's not like the checklist used to have one action on it and now it has two - the solution in this case was a standard action on the standard checklist.

Pilots do not look around, say "ah, electrical short in elevator actuator" or "ah, bad angle of attack sensor" and then take a single action.

They did train the recovery scenario on a simulator and know how the plane should behave (based on their knowledge of the old model) - now it behaves differently. All you wrote is that they should ignore this as an unimportant detail and stick to the checklist.

Unexpected situations are always extra cognitive burden.

> Unexpected situations are always extra cognitive burden.

From my perspective, it is a human factors issue that Boeing failed to consider.

Yes, the emergency procedure remained the same, but even well-trained pilots are still people. And changing the behavior of the system in such situations was not well-advised.

I do not think this is a good analogy.

Firstly, there is no single equivalent to "slamming on the brakes" for uncommanded nose-down. This could be caused by a variety of faults, and pilots are trained to respond in a fashion that will be effective for even those in which the first thing to try doesn't work. There is a standard procedure in place to use in this type of situation - arguing that the pilots should only be expected to do one thing with increasing desperation is essentially arguing that they will not be able to respond to a whole host of emergencies causing uncommanded nose-down.


> Information from the flight data recorder shows that the plane’s nose was pitched down more than two dozen times during the brief flight, resisting efforts by the pilots to keep it flying level ... The standard checklist for dealing with that sort of emergency on the previous version of the 737 focuses on flipping the stabilizer trim cutout switches and using the manual wheels to adjust the stabilizers. [emph mine]

your argument essentially hinges on the assumption that pulling hard back on the stick is a sufficient solution for all the problems that may happen with a plane with the exception of a fault with MCAS (the new system). I don't think that is accurate. Even prior to the 737 Max! It sounds like there were a lot of things that would require further action than pulling back on the stick.

Their response would have worked to return to level flight in non-MAX variants of the plane, though. Regardless of what the manual or checklist says, they had years of experience flying 737s that behaved in a specific way, they developed an unconscious/intuitive mental model of the plane based on those behaviors (that is faster and quicker to react than consulting checklists), and then a significant and deadly change was made to those behaviors and not communicated to them for cost-cutting reasons.

There's clearly a problem on Boeing's end here.

From the article:

"Older 737s had another way of addressing certain problems with the stabilizers: Pulling back on the yoke, or control column, one of which sits immediately in front of both the captain and the first officer, would cut off electronic control of the stabilizers, allowing the pilots to control them manually.

That feature was disabled on the Max when M.C.A.S. was activated — another change that pilots were unlikely to have been aware of. After the crash, Boeing told airlines that when M.C.A.S. is activated, as it appeared to have been on the Lion Air flight, pulling back on the control column will not stop so-called stabilizer runaway."

If the above is true, it's near criminal that Boeing didn't notify pilots about the change. It's also not just about following checklists but understanding the systems of the aircraft.

There's a Faustian bargain made by accepting so much abstraction from the true nature of an airplane by using software, and yet also that abstraction is not absolute. Pilots are made responsible for knowing all of the aircraft's behaviors, with full abstraction (normal mode), as well as partial abstractions (error and failure modes, of which there can be quite a few for a specific made/model, many of which must be inferred rather than being clearly displayed). If pilots get confused, however complicated and unintuitive the system behavior, they are blamed.

The more "piloting" computers are doing, it seems appropriate that they will be increasingly, properly, accused of the equivalent of "pilot error". And yet here is Boeing, taking more and more piloting responsibility, while still blaming the pilots when that doesn't work out. It's a deification of engineering: when things go properly and safely, praise engineering; when things fail, blame the pilots.

The point is that their response may have worked for this particular fault, but would not have worked in general. There is a reason that there are more procedures than "pull back on the stick" - pilots do not know what the fault is.

> they developed an unconscious/intuitive mental model of the plane

If the plane has a fault, it's frequently not going to behave like their unconscious model says it should. In the happy case, rely on your intuition. In the unhappy case, when things aren't working as they should, follow the emergency procedure.

Sorry, but this is a little like saying "follow standard procedure to put the landing gear down. If that procedure doesn't work, repeat your intuitive action over and over until hopefully the landing gear goes down." No - follow the emergency procedure in place for when the landing gear fails to descend.

> not communicated to them for cost-cutting reasons

Take it up with the FAA and the airlines?

> Take it up with the FAA and the airlines?

Sure. It's not just Boeing's fault. I believe the FAA will course correct from this and probably all changes to flight controls will need to be reported to pilots. More lessons written in blood.

It's still more Boeing's fault than anyone else's though. They were the ones who made this change and then tried to hide it to the fullest extent possible, so that it wouldn't trigger mandatory retraining.

> and then tried to hide it to the fullest extent possible

I haven't seen any evidence of this. It's definitely not supported by this NYTimes article.

Do you have a source?

>Firstly, there is no single equivalent to "slamming on the brakes" for uncommanded nose-down.

Actually in the "old" version, there is.

From TFA:

Older 737s had another way of addressing certain problems with the stabilizers: Pulling back on the yoke, or control column, one of which sits immediately in front of both the captain and the first officer, would cut off electronic control of the stabilizers, allowing the pilots to control them manually.

Which makes the car analogy apt.

Indeed, any pilot will respond to nose down by pulling back on the yoke, or on the joystick in other and older aircraft. This is a basic flying control, as basic as steering wheel and brakes, making this an excellent analogy. Any system that complicates this in the way described is a terrible system, for the reasons explained above.

Drivers aren't trained to follow checklists and usually don't have dozens of seconds to respond to mechanical emergencies. Cars also don't fall out of the sky if they break down. It's not a great metaphor.

> have dozens of seconds to respond

12 minutes, in this case.

Yeah, I figured it's usually many minutes but didn't want to underplay the stress and difficulty of following a checklist in an extreme emergency. In contrast, car major mechanical failures generally resolve themselves extremely quickly.

And yet the pilots of the previous flight flipped the cutout switches.

EDIT: ... Presumably because it's literally the second thing on the list of things to do in the case of a runaway stabilizer.

I strongly agree. I've read most of the other comments here, and as an amateur pilot myself, I think most of the responses to you are missing the point.

It's extremely unfortunate, but these pilots had ~10 minutes of flight time between their request to return to the airport noting aircraft control issues, and their crash, during which they completely failed to follow their checklists. Lion Air put those under-trained (if we're being generous) pilots into that cockpit and they are the only ones who carry responsibility for this crash.

It's anecdotal, but most of my older flight instructors have been lamenting the loss of critical pilot skills in the industry, and this appears to be just another example of that.

>your argument essentially hinges on the assumption that pulling hard back on the stick is a sufficient solution for all the problems that may happen with a plane with the exception of a fault with MCAS (the new system).

I don't read it that way. To me, it's specifically based on removal (without notice to pilots) of a function that would absolutely have terminated a trim runaway condition.

In other words, Boeing eliminated one well-known and trained-on trim runaway fix, and didn't tell the pilots. This is different from saying that stick pullback is a universal solution.

I think that's a pretty good analogy. If you do everything right, you'll get this malfunction under control. But it's easy to see how pilots whose plane is acting up unexpectedly can miss it, and it's reasonable to ask whether Boeing could've done better - and the regulators (note that according to the article, FAA and EASA were convinced by Boeing that this did not require re-training, while the Brazilian regulator wasn't).

> If you do everything right, you'll get this malfunction under control. But it's easy to see how pilots whose plane is acting up unexpectedly can miss it, and it's reasonable to ask whether Boeing could've done better

Absolutely reasonable to ask what Boeing could have done better. I'm just not sure the information about Boeing's actions contained in the article gets me all the way to "indefensible", paraphrasing another comment.

> That's how it's always worked and you've gotten used to this

but still, instead of going through the full checklist they stopped and tried repeatedly whatever fixed the issue in the past/on the simulator.

a more complete analogy:

"follow these ten steps to diagnose a bug in the software"

"but last time it was just a compilation flag, I'll check the compilation flags"

"bug persists"

"last time it was just a compilation flag, I'll check compilation flags"

"bug persists"

"last time it was just a compilation flag, I'll check the compilation flags"

"system halts"

From what I understand, part of the problem was that it helped temporarily... and then the system pointed the nose down again.

yeah I'm not siding with Boeing here, what I'm saying is that pilots are given checklists for a reason, and going by the usual hunches instead of following the checklist as they were supposed to is as much a failure in training as it is a failure in the plane user interface.

While the trim is not too far down, you can still counteract it with elevator (pulling the yoke). But then, MCAS kicks in again and pushes the trim further and further down, and if you don't trim up and/or switch the trim cut-out, then you get into the situation where you can't recover anymore using the yoke.

"system halts"

That's one heck of an analogy! Brutal

Or that the system fights your stomping hard on the brake by accelerating. In the Lion Air crash, the system fought the pilots by aggressively nosing down. That it was doing this because of faulty sensor data is different from how other 737s behave in the same situation.

For reference, here is the runaway stabilizer memory item (the "checklist") for 737:

1. Control column ............................. Hold firmly

2. Autopilot (if engaged) ..................... Disengage

3. IF the runaway stops:

------------------------ [done]

4. IF the runaway continues:

STAB TRIM CUTOUT switches (both) ...... CUTOUT

IF the runaway continues:

Stabilizer trim wheel ............ Grasp and hold


It's #4 that's of interest here. People saying that the interface changed are saying that it's fine if pilots stop after #1, even when dealing with runaway stabilizer for 12 minutes.
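To make the ordering concrete, here is a minimal Python sketch of the memory item above as a sequence of steps that stops at the first one that works. It is only an illustration: `run_checklist`, `make_fault` and the idea that each step either ends the runaway or does not are my own assumptions, not anything from Boeing's documentation.

```python
from typing import Callable, List, Tuple

# Hypothetical model: each step is (label, action). The action returns True
# if the runaway has stopped once the step has been performed.
Step = Tuple[str, Callable[[], bool]]

LABELS = [
    "Control column: hold firmly",
    "Autopilot: disengage",
    "STAB TRIM CUTOUT switches: CUTOUT",
    "Stabilizer trim wheel: grasp and hold",
]


def run_checklist(steps: List[Step]) -> List[str]:
    """Perform steps in order, stopping at the first one that ends the runaway."""
    performed = []
    for label, action in steps:
        performed.append(label)
        if action():
            break
    return performed


def make_fault(fixed_by: str) -> List[Step]:
    """Build the checklist for an illustrative fault that only `fixed_by` resolves."""
    return [(label, (lambda current=label: current == fixed_by)) for label in LABELS]


if __name__ == "__main__":
    # A fault that the first two steps do not resolve.
    for label in run_checklist(make_fault("STAB TRIM CUTOUT switches: CUTOUT")):
        print(label)
```

The crew never needs to know which fault they have: whichever step happens to fix it, the loop reaches it as long as they keep going down the list.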

Accurate but irrelevant. This assumes the pilots knew they were dealing with a runaway stabilizer, and considering M.C.A.S. wasn't behaving similarly to their previous training/experience/simulations with runaway stabilizers, it isn't clear they'd know they should follow this checklist. Now had they actually been trained on M.C.A.S., including its faults, they may have known to do exactly this and we wouldn't even be discussing it.

This has been discussed in great detail on other flight forums.

It's not really possible to not know that you are dealing with a runaway stabilizer. MCAS (and every other automatic system to adjust trim) causes large physical wheels at the side of the pilot's knee to spin.

And these large wheels have small bell/clackers on them so you get a distinctly audible signal in addition to the large black wheels with white stripes on them.

Here's a video of the wheels in motion: https://vimeo.com/34501723

Disregard the text in the video description as it's blatantly wrong (describing that the trim tabs still move without the wheels moving; wrong on two counts: the trim doesn't move without the wheels moving and the 737 uses a jackscrew for horizontal stabilizer trim rather than trim tabs [which is why the pilot can't simply override the aerodynamic force as they could with a trim tab])

Jackscrew operation: https://www.youtube.com/watch?v=rxPa9A-k2xY

The 7-3 has balance tabs to make the control forces lighter in the event of a hydraulics failure, but these are not trim tabs in any sense of the word.

As I read the article, it seemed like step 1 would suffice before. If this is not the case with MCAS, it could at the very least throw the pilot's diagnostics. "Is it runaway stabilizers? Well, no response from holding the control column firmly so I guess not. Let's check other things".

Sure, the correct response is to follow the checklist, but should we really rely on pilots always knowing the correct response? Especially when that goes against their previous experience? Not that I am defending these pilots not knowing the checklists, instead I am arguing we should take into account learning from experience rather than theory.

That's fair, but what does the checklist say? I mean, say there are a thousand different malfunctions that could cause an uncommanded nose-down. This one happens to be #359, but the pilots don't know that, they just know that they're pointed at the ground. Maybe it used to be the case that the first item on the checklist (pull back on the stick) fixed problem #359, and now it's the second. But there are several hundred other malfunctions that might have caused the problem that also aren't solved by pulling back on the stick, so the next move is to go to the next item on the checklist, right?

Pull back wasn't in the original manual but was a relied upon method. Boeing fucked up years ago by not documenting the function so it was allowed to be dropped transparently. How is that defensible?

Why was a method that wasn't in the official checklist "relied upon"? Pilots following some undocumented, non-standard procedure sounds like their fault rather than Boeing's.

Who knows? Have you ever been taught an undocumented procedure by the expert and been told to use it regardless of what the manual states? Happens all the time. Is a pilot in a position to affect Boeing bureaucracy?

Sorry but how could Boeing design any plane if they can't assume pilots will follow the manual or any checklists?

By including features in the manual. The alternative is that they don't, people rely on the features, they change the features without notifying anyone, and when shit goes wrong they can say "It wasn't documented: shouldn't do that".

The real question here is how commonly this feature was used. If it was common, then not putting it in the manual is a big problem!

The analogy to code would be if a library had a documented API to do something, but some people were using undocumented behavior in another part of the API to do the same thing, and the undocumented behavior changed. The difference is the consequences and how you prepare for them. With a library, there are many steps at which the change could be noticed by users: issue trackers, mailing list discussions, prerelease builds, integration tests, and test environments. Plus for most software nobody will be killed if the application goes down.
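A minimal Python sketch of that library situation, with entirely made-up function names: `list_names_v1` happens to return sorted output, which is not part of its documented contract, and `list_names_v2` keeps the contract but changes the internals. A caller leaning on the undocumented ordering breaks silently; a caller using only the documented contract does not.

```python
def list_names_v1(names):
    """Documented contract: return every name exactly once (order unspecified)."""
    return sorted(set(names))  # sorted order is an implementation accident


def list_names_v2(names):
    """Same documented contract; only the internals changed."""
    return list(dict.fromkeys(names))  # de-duplicates, keeps first-seen order


def first_alphabetically_fragile(list_names, names):
    # Relies on the undocumented ordering that v1 happened to provide.
    return list_names(names)[0]


def first_alphabetically_robust(list_names, names):
    # Relies only on the documented contract.
    return min(list_names(names))


if __name__ == "__main__":
    names = ["mike", "alpha", "mike", "zulu"]
    for impl in (list_names_v1, list_names_v2):
        print(impl.__name__,
              first_alphabetically_fragile(impl, names),
              first_alphabetically_robust(impl, names))
```

Nothing in the v2 docstring changed, so a changelog that only tracks the documented API would have nothing to report.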

I see Boeing's point, too, but to me it just means both sides are at fault. The pilots are at fault for not following the emergency checklist. Boeing is at fault for abusing rules to slip in a change based on the assumption that pilots never rely on their own understanding of the aircraft, which I'm sure they know to be false. Air safety is all about human factors, and that's a pretty obvious one.

The interface isn’t opaque in airplanes like in software, everyone learns about how the internals of the aircraft work in order to troubleshoot problems. So pilots are reasoning based on how they think it works under the hood.

> In designing the 737 Max, Boeing decided to feed M.C.A.S. with data from only one of the two angle of attack sensors at a time, depending on which of two, redundant flight control computers — one on the captain’s side, one on the first officer’s side — happened to be active on that flight.

The one thing that seriously surprised me was that an automated system that is able to point the airplane towards the ground is intentionally fed by a single, non-redundant sensor.

Everything else I've read about various safety systems that limit or override the pilot has explanations about how redundant sensors are used. And how the system does switch itself off in case the sensors don't give consistent results.
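As a rough illustration of what "switch itself off when the sensors disagree" can look like, here is a minimal Python sketch. The function names and the 5 degree threshold are hypothetical choices of mine for the example, not values from any Boeing or FAA document.

```python
from typing import Optional

# Hypothetical threshold, chosen only for illustration.
MAX_DISAGREEMENT_DEG = 5.0


def validated_aoa(left_deg: float, right_deg: float,
                  max_disagreement_deg: float = MAX_DISAGREEMENT_DEG) -> Optional[float]:
    """Return the averaged angle of attack, or None if the two sensors disagree."""
    if abs(left_deg - right_deg) > max_disagreement_deg:
        return None
    return (left_deg + right_deg) / 2.0


def automation_may_act(left_deg: float, right_deg: float) -> bool:
    """The automatic system stays active only while the cross-check passes."""
    return validated_aoa(left_deg, right_deg) is not None


if __name__ == "__main__":
    print(automation_may_act(4.0, 5.0))   # sensors agree
    print(automation_may_act(2.0, 22.0))  # one sensor is far off
```

With only one sensor feeding the system there is nothing to compare against, so a check like this cannot exist at all.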

There is one more technical surprise in this article for me. Pulling hard back on the control column would override the stabilizer runaway in old versions, but not MCAS. That is a major interface difference between the old and new planes.

It sounds like the old flight manual stated one of two possible methods for dealing with runaway stabilizers. Because the second method (hard back on the control) wasn't in the manual, changes to that way weren't taken into account. Hence a non backwards compatible change slipped through.

One hopes that in the fixed version of the software that goes out, they'll restore that backwards-compatible manual override. It seems like a flat-out mistake that MCAS, which solely takes input from a single non-redundant sensor, overrides manual inputs silently.

From a UX perspective what should have happened is the plane telling the pilots: "I've detected the danger of a stall and corrected for it!" with the pilots being able to answer either "Ooops thank you!" but also "Don't do this for the rest of the flight!"

I'm a bit baffled about how the previous flight had the problem, apparently misdiagnosed but resolved it, and the new crew had the same problem but was unable to resolve it. How's it possible that the plane takes corrective action without telling the pilots? It's not like getting close to a stall is a routine occurrence, is it?

I would guess the previous flight's pilots had the same initial 'pull back' impulse but then reverted to their training and followed the checklist after that.

Yes, there should definitely be visual and auditory warnings as well when MCAS is engaged. When the autopilot is doing something so important as struggling to prevent a stall (and in its mind, failing, because of the faulty AoA sensor), it definitely needs to be raising alarms.

Also, given how important the AoA data apparently is, it may not be displayed prominently enough. In clear weather, level flight flying, it should be really obvious if this is drastically wrong.

When anything in a 737 moves the stabilizers, physical wheels in the cockpit loudly turn right next to the pilots. In the case of a stabilizer runaway, it's really obvious. (See https://www.youtube.com/watch?v=3pPRuFHR1co&t=154)

MCAS is just one of many different systems on a 737 that can automatically adjust the stabilizer. Because there are so many things that could be the cause of the problem, the checklist procedure for any runaway is just to cutout automatic control of the stabilizers.

Here is Boeing's official update to all 737 MAX operators after the crash, which boils down to saying "follow the checklist, dummies".


Thanks. So that blaring and the wheels turning is what we assume happened 24 times in that cockpit before the crash?

If I understand this right the pilots must have known the plane did adjust stabilizers, but they didn't have a way of finding out why it did it. And per checklist they shouldn't have cared for the why, but just cut-out the automatic control.

Southwest has added enhanced AOA indicators to its Max fleet as a result [0].

Lion Air didn't even have the basic (optional) AOA Disagree alert[1]



It makes you wonder why such a seemingly important alert is optional. Cars these days can't come without seat belts, brakes, ABS, traction control, ESC, and more, but apparently seemingly essential safety features on planes are optional??

Having yet one more thing beeping / talking at you in an emergency situation isn't always the right design decision...

From the Semver perspective, change to an undocumented (and therefore not public) feature is not a major change. The point is that their safety documentation doesn't change from one system to the next.

Anyone who was "doing it by the book" was not pulling up on the stick.

Now, maybe Boeing was suggesting via side channels that there were alternate ways to solve certain problems and those side channels should qualify as public documentation... but it may have been intuition earned through experience overriding standard procedure.

That ignores the fact that people will rely on undocumented behavior anyway, and a responsible developer should keep that in mind.

With a normal software library, you might make the decision to cause breakage anyway, even if it inconveniences users of your library. Or you might not, because you believe the inconvenience will be too great, and instead just document the behavior and make it a part of the API.

With an airliner control system, you need to be a bit more careful, since a pilot depending on undocumented behavior may do so in a way that could cost lives if that behavior is changed. Is the pilot correct to depend on that behavior? No. But that's irrelevant when lives are at stake.

This is a classic case of 'work to spec' vs 'work to practice'. Yes, people should go off the official documentation rather than what works in practice, but it is obtuse to presume everyone will do it in the same way.

Normally, I'd say that those who ignored the spec deserve less consideration. However, when that involves giving less consideration to 100+ passengers as well, that changes.

Here's a video from a few years ago of two student pilots handling a trim runaway in a 737 simulator.


You'll notice that it's a loud, physical event, with a very simple solution.

This happened over twenty times in the accident flight, and the pilots never disabled the problematic automatic stabilizer system.

This is a great video showing the UI - it makes what is going on much clearer!

Makes one wonder if the FAA is too close to Boeing. Not only did they green light this but they also put considerable pressure on EASA to do the same. The FAA's first priority should be safety, not Boeing's bottom line or their ability to more quickly deploy an aircraft update.

Pilots are pretty unhappy about this M.C.A.S. situation. They're literally expected to fly an aircraft, and not even being told how that aircraft functions. And while the checklist may eventually take care of this, that isn't a substitute for a professional pilot in the cockpit. Just the lack of training/simulated failures for this new system is highly irregular, pilots are used to and expect such training while transitioning to a new major aircraft version.

The biggest drivers here seem to be cost and Boeing's competitiveness, not safety. I think it might be time for the EASA to trust the FAA a little less, at least until they get their house back in order.

The revolving door between the FAA and cushy positions at Boeing is not a secret. The shortcuts Boeing has been taking with the blessing (or willful omission) of the FAA have been discussed, but there are many interests in the middle, like strong competition from Airbus, geopolitics, internal politics, States outbidding each other to create more jobs, national ego, and straight up greed.

"Word on the street" is that Boeing has lost most of their safety reputation and most people just do their job and try to not get burned when the planes start to crash. I want to believe most are just blowing off steam, but I don't know..

The 737 models of two generations ago ( 300 / 400 / 500 ) had several fatal accidents in the 1990s due to runaway rudders. That dented Boeing's reputation with users but not with the FAA.

There's something called the "Swiss Cheese Theory" of accidents. (https://en.wikipedia.org/wiki/Swiss_cheese_model)

In a mostly-robust system, different layers catch and defend against the errors of other layers in the system. For a major accident to occur, holes in multiple layers have to line up that day.
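A back-of-the-envelope Python sketch of why the layers matter, under the strong assumption that the layers fail independently. The probabilities below are invented for illustration and say nothing about real accident rates.

```python
from math import prod


def accident_probability(hole_probabilities):
    """Chance that a hazard passes every layer, assuming independent layers."""
    return prod(hole_probabilities)


if __name__ == "__main__":
    # Made-up numbers: each layer lets a hazard through one time in ten.
    four_layers = [0.1, 0.1, 0.1, 0.1]
    two_layers = [0.1, 0.1]
    print(accident_probability(four_layers))
    print(accident_probability(two_layers))
```

In this toy model, with every layer at one in ten, losing two layers makes the combined failure a hundred times more likely, which is the argument for not letting any single layer quietly erode.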

In this case we have four holes that lined up that day - a plane model with a possible rare software bug, an aircraft with a faulty sensor feeding bad information to the computers, an airline company with internal culture that continues to fly a specific aircraft that keeps trying to point at the ground, sometimes without even making an attempt at fixing the problem, and finally, on this fatal day, a crew that did not follow the proper procedures even after twenty-three nose down incidents during the flight.

Even without the MCAS system present, the last two holes seem like they would bring down an airliner eventually, from one cause or another.

Pilots are rightfully mad about not being told about the MCAS system. But it's just one of many systems on a 737 that can trim the stabilizer to the point that it can't be flown. That's why the procedure for any stabilizer problem is to disable automatic control of the stabilizer. The training and checklists that the accident pilots had covered this, and previous pilots flying the accident aircraft did this and then had uneventful flights.

Still, to have a system on board that, with one sensor malfunctioning, repeatedly trims you down (unless you switch the cutout switch or physically arrest the trim-wheel), is pretty tough.

By the way - in small airplanes, you can overcome trim with elevator pressure. That's not necessarily the case on a passenger jet; and not only because it's much bigger, but because the trim works differently [1]. I wonder whether that played a role. I must admit that before I read [1], I had assumed that bad trim is something I can overpower, when push comes to shove.

[1] https://www.skybrary.aero/bookshelf/books/2627.pdf

Yeah, when the trim is a giant screw changing the angle of the whole stabilizer, rather than just a little tab, it's a whole different ballgame.

There are plenty of single components on an airliner whose failure can cause a stabilizer trim runaway. Different airliners handle it differently. On a 737 you can cut out automatic control and use wheels connected to the stabilizer jackscrew with metal cables. On other airliners, you can cut out automatic control, then switch to a second electric control system and use it manually. A 737 stabilizer runaway isn't an instant thing, and is a loud event in the cockpit.

Boeing should face some major fines for this, and additional regulation is going to be needed to make sure this doesn't happen in the future.

This all seems to come down to the fact they wanted to avoid having to retrain pilots ($$$), so these automation changes were kept in the dark.

The crew before them dealt with this same problem but they successfully cut out the trim system. They got lucky and they should have been more vocal in expressing the fault outside of just a post-flight note about it.

The fact that the 737 can auto-trim itself beyond manual elevator authority, due to a SINGLE faulty AOA sensor, is mind-boggling and scary.

From what I can tell, the previous crew did not get lucky, they just followed the checklist which would have solved the issue in this case.

Auto-trim beyond the elevator authority is not a problem as the pilots can take manual control of the trim by grabbing the trim wheel (it's in a very obvious spot on the 737).

The actual fix is hard as adding another alarm can get tricky from a UX perspective during an emergency. Probably the only “fix” is to reinforce the value of following the checklist.

There already is another alarm: an optional "angle-of-attack disagree" indicator that Lion Air was apparently too cheap to install.

Now, that wouldn't have directly pointed to what was wrong, but it would have been pretty suggestive.

(I would suggest, though, that having an optional configuration that lacks robustness for a system that can automatically point the plane toward the ground is a really poor choice of options.)

> and additional regulation is going to be needed to make sure this doesn't happen in the future.

Given that the FAA made the decision that it was fine to not retrain the pilots, sounds like there's going to have to be someone to regulate the regulator.

Actual example: Normal takeoff in instrument meteorological conditions (no external visual references, flight exclusively by reference to instruments). The attitude indicator shows proper climb attitude, vertical speed and altimeter show positive rate of climb, airspeed indicator shows speed increasing above target speed. Pilot response? Probably nose up and/or power reduction; OK they do both. Airspeed indication continues to increase. Pilot noses up and powers down. Airspeed increases. Pilot noses up aggressively. Stall. Crash.

What happened? The pitot tube and drain were clogged. Static port was clear. This turned the airspeed indicator into an altimeter - it was incapable of showing correct airspeed from the moment of blockage.
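Why a blocked pitot tube makes the airspeed indicator behave like an altimeter can be shown with a rough calculation. This is only a sketch under stated assumptions: the standard-atmosphere (ISA) pressure formula, the low-speed incompressible airspeed relation, and made-up altitudes and speeds. The function names are hypothetical.

```python
import math

RHO0 = 1.225    # ISA sea-level air density, kg/m^3
P0 = 101325.0   # ISA sea-level static pressure, Pa

def static_pressure(alt_m):
    """ISA static pressure in the troposphere (roughly below 11 km)."""
    return P0 * (1.0 - 2.25577e-5 * alt_m) ** 5.25588

def indicated_airspeed(pitot_pa, static_pa):
    """Low-speed airspeed from the pitot-minus-static pressure difference."""
    q = max(pitot_pa - static_pa, 0.0)
    return math.sqrt(2.0 * q / RHO0)

def blocked_pitot_readings(block_alt_m, ias_at_block_ms, altitudes_m):
    """Indicated speed at each altitude once pitot pressure is trapped.

    The pitot line keeps the total pressure it had at the moment of blockage,
    while the (clear) static port keeps tracking the outside pressure.
    """
    trapped = static_pressure(block_alt_m) + 0.5 * RHO0 * ias_at_block_ms ** 2
    return [indicated_airspeed(trapped, static_pressure(a)) for a in altitudes_m]

# Hypothetical climb: blockage at sea level while indicating 70 m/s.
for alt, ias in zip([0, 500, 1000], blocked_pitot_readings(0, 70.0, [0, 500, 1000])):
    print(f"{alt:5d} m -> {ias:6.1f} m/s indicated")
```

The indicated speed climbs with altitude no matter what the aircraft is actually doing, which matches the sequence described above: pitching up to slow down only makes the number bigger.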

The cause of the crash is pilot error. The pilot is expected to recognize from other instruments that the airspeed indicator is unreliable, and this is part of training for instrument rating.

If the MCAS in the Lion Air crash made a similar mistake (using a single data point to determine a stall condition), that is an error. It's functionally "pilot error" to have no means of determining if the angle of attack sensor is wrong, and no mechanism for disregarding its data. Further, the corrective action it took, had the flight condition actually been true, sounds excessive. If a human pilot did the exact same thing MCAS did, I expect the human would be blamed - it would be pilot error to so aggressively nose down that you've exchanged a level flight high speed stall (a rare event indeed) for a high speed dive. That is not a competent recovery, in particular because there's apparently no recognition of the danger of high speed dives, let alone recovery from them. It's probably a really good idea if your stall recovery does not end in a dive!

If it affects flight stability especially in an emergency, then pilots should be trained to understand what it affects. Period. Not doing so to save money or get more sales is beyond stupid. Watch Air Disasters to see what happens when highly trained pilots fail to do the right thing because they hadn't trained to deal with what went wrong because the problem was something different than what they knew. Flying is easy when things are working, pilot training is the difference between dealing with an emergency or being dead.

From the article: "In designing the 737 Max, Boeing decided to feed M.C.A.S. with data from only one of the two angle of attack sensors at a time, depending on which of two, redundant flight control computers — one on the captain’s side, one on the first officer’s side — happened to be active on that flight."

They created a single point of failure that way. Why?

By having each redundant flight computer hooked to completely different sensors, in case of a bad sensor the crew can bypass not only the sensor, but also any computation done with that sensor.

It's not a single point of failure as we think of it - if it starts acting up, you can easily disable the automatic stabilizer system, per the procedures. 737 stabilizer runaways take several seconds to take effect, and are recoverable afterward. Later you can switch flight computers and then be using clean data, though you are supposed to leave the stabilizer system off for the remainder of the flight.

Using only one sensor at a time, there's no "sensors disagree" fault to tell the pilot there's a problem. Or to tell the flight control system it shouldn't be taking drastic action based on that sensor.
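For illustration, a "sensors disagree" check is conceptually tiny. The sketch below is not Boeing's logic; the function names, the 5 degree threshold, and the policy of blocking automation on disagreement are all assumptions made up for the example.

```python
def aoa_disagree(left_deg, right_deg, threshold_deg=5.0):
    """True when the two angle-of-attack readings differ by more than the threshold."""
    return abs(left_deg - right_deg) > threshold_deg

def automation_may_act(left_deg, right_deg, threshold_deg=5.0):
    """Hypothetical policy: allow automatic trim only when both sensors agree."""
    return not aoa_disagree(left_deg, right_deg, threshold_deg)

# One sensor reads a plausible 2 degrees, the other a wild 22 degrees.
print(aoa_disagree(2.0, 22.0))        # the crew could be alerted here
print(automation_may_act(2.0, 22.0))  # and the automation could stand down
```

With two sensors you can tell that something is wrong, but not which one is wrong, so the most a check like this can do is alert and inhibit.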

Airbus uses three angle of attack sensors and compares them. They've had at least one crash when two sensors failed in a consistent way.[1] The vulnerability of aircraft flight control systems to bad AOA data is well known.

[1] https://news.aviation-safety.net/2010/09/17/report-blocked-a...
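A sketch of why three sensors help, and why they still fail when two go wrong the same way. Taking the median is one common way to vote among three readings; it is used here as an assumption for illustration, not as a description of Airbus's actual logic, and the readings are made up.

```python
from statistics import median

def voted_aoa(a_deg, b_deg, c_deg):
    """Median of three angle-of-attack readings: a single outlier is outvoted."""
    return median([a_deg, b_deg, c_deg])

# One bad sensor: the vote ignores the 40 degree outlier.
print(voted_aoa(2.0, 2.5, 40.0))

# Two sensors stuck at the same wrong value: the vote picks the wrong value.
print(voted_aoa(2.0, 40.0, 40.0))
```

The second case is the failure mode described above: two sensors failing consistently outvote the one that is still correct.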

There wasn't a "sensors disagree" alert in the Lion Air plane because they didn't have the optional AOA Disagree indicator installed.

As a comparison, Southwest had the indicator but has now also installed an enhanced AOA Disagree indicator as a result of the Lion Air incident.


That's not a feature which should be an extra-cost option.

Cost cutting. The article paints that picture quite clearly. They created the whole MCAS system to mitigate problems with the aircraft's design in a very short span of time, and needed the whole thing to be cheap.

Mentour Pilot did a YouTube podcast about this air crash back in November. https://youtu.be/zfQW0upkVus

When we first heard this crash was due to a change in computer-controlled stabilizer behaviour, my question was "why on earth did Boeing do this?". Perhaps I didn't read deep enough, but the summary explanation that it was to improve handling was a poor answer.

I guess what really bugged me about it is how un-Boeing-like this behaviour was; a computer overriding a pilot (even if there is a way for a pilot to override it in turn). It's fundamentally an Airbus-esque design.

As I read this article though, everything fell into place. As you read it you start to see, with utter clarity, exactly how this happened organizationally.

It's well known that Airbus uses software flight envelope protection to enable them to reduce the safety margin applied to the airframe, reducing weight. In other words, fuel efficiency is improved by making airframes less airworthy and compensating for it in software. I don't actually disagree with this as such; it's been demonstrated to be a sound approach, but historically Airbus's domain.

Essentially, it seems like what happened here is that Boeing finally felt the need to adopt similar techniques to compete with Airbus on fuel efficiency (though regarding engine size issues, not airframe safety margins, but still making a plane's airworthiness more caveated and fixing it in software). Essentially, we're witnessing the point at which Boeing feels its traditional user interface philosophy (do what the pilot says) is conflicting with market pressures.

If this were a new plane with a new type rating, this wouldn't be unreasonable. Trying to tack this on to an existing plane, and not only that, but doing everything in your power to minimise the amount of transition training, is OTOH extraordinarily egregious.

The problem with this change isn't so much that Boeing's reasoning for not telling pilots about it isn't logical. If anything, the problem is that their reasoning is utterly logical: the checklist will solve the problem anyway, no matter the cause. You can see how this decision must have percolated through different teams at Boeing, through regulators, via this unimpeachable-seeming logic. The market pressures involved (fuel efficiency and retraining costs) would have made it particularly hard to contest. It's a completely logical line of reasoning... yet here we are with fatalities.

I'm very interested to note, though, this new revelation (to me at least) that the yoke behaviour re: extreme deflection mitigating stabilizer runaway was removed in the MAX. So what was Boeing's justification for this change? Was it even mentioned? If not, what on earth were the regulator's justifications for allowing it to go unmentioned? I want to hear those justifications, since it seems impossible to justify. I was under the impression that compatibility of type ratings fundamentally revolved around an absence of differences in how two planes handle, and how they respond to the yoke.

I should add, the reliance on a single sensor is also remarkable; makes me wonder if this entire subsystem was really rushed and not given proper design review, which would make sense given the circumstances (panicking to get a product to market).
