737 Max Explanation by a Software Engineer (twitter.com)
759 points by paulsutter 36 days ago | 270 comments



I'm an assistant professor of aerospace engineering and I find this analysis quite spot on, in that it treats this as representative of a much larger issue of perverse economic and regulatory incentives, rather than just a "software issue" as some news outlets have reported. What I find downright criminal is this:

> Boeing sells an option package that includes an extra AoA vane, and an AoA disagree light

The fact that the redundancy of a sensor on which a system capable of sudden, large control inputs relies is an optional package to be purchased separately... I simply have no words.

How was this package advertised in the brochure? Pay extra and when the airplane nosedives at high speed, this useful indicator will helpfully warn you it's because of an AoA reading disagreement?


I was amazed at that. Boeing used to be known for overdesign for safety. The B-747 had four redundant hydraulic systems. Here's a 787 doing aerobatics at the Farnborough air show to show that it can operate way outside the normal passenger aircraft flight envelope.[1]

Boeing used to be an engineering-first company. HQ was at Boeing's own airport near Seattle. Then they got new management and moved corporate HQ to Chicago.

[1] https://www.youtube.com/watch?v=vzr313wSY_Y


I have noticed “safety fade” can happen in projects and organisations.

In particular when you are doing something for the first time, uncertainty creates a space for a safety culture to operate.

Design, process and safety emphasis lead to operational success.

Over time the original designers and engineers leave, new staff join. Field experience shows that most of the safety problems that were anticipated in design never happen. The organisation starts to become complacent.

All the safety features and procedures start to be considered, psychologically, as a “safety margin” or a kind of budget that can now be spent by taking more risks. Triple or quadruple redundancy just starts looking like extra weight.

Within the org, pressures to lower costs and deliver faster start to win out...


You have independently discovered a phenomenon that is known as the normalization of deviance. It has been seen in many engineering failures, such as both Space Shuttle crashes, the Deepwater Horizon explosion, etc.

https://flightsafety.org/asw-article/normalization-of-devian...


> Over time the original designers and engineers leave, new staff join. Field experience shows that most of the safety problems that were anticipated in design never happen. The organisation starts to become complacent.

This lifecycle pattern can be found at other organizational scales, including countries, e.g. soldier -> law -> artist -> merchant -> warlord -> soldier ..


Where does this soldier -> lawyer etc formulation come from? Interesting idea!


Inspired by this John Adams quote from 1780, which covers the first three phases, https://www.masshist.org/digitaladams/archive/doc?id=L178005...

> I must study Politicks and War that my sons may have liberty to study Mathematicks and Philosophy. My sons ought to study Mathematicks and Philosophy, Geography, natural History, Naval Architecture, navigation, Commerce and Agriculture, in order to give their Children a right to study Painting, Poetry, Musick, Architecture, Statuary, Tapestry and Porcelaine.


FWIW "Boeing Field" is known as that because it just happens to be named after William Boeing, not because it was owned or operated by Boeing. It's owned by King County. In fact, the real name is King County International Airport.


[flagged]


That has nothing to do with the comment above or the conversation in general


I was at Farnborough in 2014 and the A380 was even more amazing: https://www.youtube.com/watch?v=Ew_kzgqCD2k

And the military A400M -- 4 engine turboprop -- https://www.youtube.com/watch?v=voK2o62KpqI -- quite amazing to see when you are used to those tiny regional turboprops.


That A380 take off is very impressive. Pity that in 10 years' time we will probably feel about it the way we feel about Concorde today. Thanks for posting.


Not quite - they're still manufacturing A380s until 2021. Those will probably still be flying for at least 15 years. It won't be a distant memory like the Concorde for 25-30 years.


Yeah that take off roll is crazy short for such a huge aircraft.

Impressive performance with the plane empty and probably only an hour or two of fuel.


Ignorant here, why will we feel about A380s the same way as the Concorde? What is happening to it?


Airbus have scrapped it.


I was driving past on the motorway and remember nearly crashing in shock at the aggressive maneuvering of something so big - first time seeing either an airliner do a display, or the A380.


Similar video with a test pilot pushing a 707 pretty hard. Includes a barrel roll, which isn't hard on the aircraft, but unusual to see with a passenger aircraft.

https://youtu.be/Ra_khhzuFlE

The 707 was pretty similar to the 737...same main fuselage dimensions, similar pax capacity, etc.


Tex Johnston, the test pilot on that flight, did that off his own bat and left the president of Boeing fuming as it really wasn't part of the plan.

I believe they fired him for it - though then hired him back after 24 hours and the realisation that it was a stunning piece of advertising.

https://www.seattletimes.com/seattle-news/60-years-ago-the-f...


Love the 4k footage.


Ah, well, it is from 1955.


All of that changed when McDonnell Douglas merged with Boeing.


Why do you pin it there?


When Boeing merged with McDonnell Douglas in 1997, essentially McDonnell's management moved into Boeing's c-suite.

What had essentially been a very engineering-driven culture was replaced by finance and marketing people who moved away from this approach (and who moved Boeing's headquarters to Chicago in 2001, in the process moving management away from engineering and construction).

Sadly (and arguably) culminating in the two crashes we read about recently.


> quite spot on

> Nowhere in here is there a software problem.

I worked on the design of the 757 stab trim system, am trained in aerospace engineering, and am a programmer. I'm not a pilot. I am not explicitly familiar with the 737 stab trim system.

It is indeed at least partially a software problem. I had thought of two possible improvements: one is to limit the authority of the MCAS's commands to the trim, and the other is to not issue further commands if the pilot fights the trim commands. I read today that Boeing's proposed software fix includes both of those.
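A minimal sketch of those two ideas in Python. Everything here is made up for illustration (the `TrimGuard` name, the 2.5 degree cap, the latching behaviour); none of it is taken from Boeing's software or its proposed fix.

```python
from dataclasses import dataclass


@dataclass
class TrimGuard:
    """Hypothetical wrapper around automatic nose-down trim commands."""
    max_authority_deg: float = 2.5  # assumed cap on total automatic trim
    used_deg: float = 0.0           # automatic trim applied so far
    inhibited: bool = False         # latched once the pilot fights the trim

    def command(self, requested_deg: float, pilot_opposing: bool) -> float:
        """Return the nose-down trim (degrees) actually applied."""
        # Idea 2: if the pilot is trimming against us, stop issuing commands.
        if pilot_opposing:
            self.inhibited = True
        if self.inhibited:
            return 0.0
        # Idea 1: never exceed the total authority budget.
        remaining = max(0.0, self.max_authority_deg - self.used_deg)
        applied = min(max(requested_deg, 0.0), remaining)
        self.used_deg += applied
        return applied
```

With the default cap, repeated 2.0 degree requests yield 2.0, then 0.5, then nothing. A real design would also need a defined way to reset the inhibit, which this sketch deliberately leaves out.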


In that sense, I agree with you. I meant to say that this wasn't a software bug or programmer's fault like "oh, someone wrote max_authority = 6º instead of 0.5º in line 8745679", but something bigger: the very design and specs of the system and a certification process that didn't challenge those aspects.


I have to agree. If you're saying "the software is working to spec, but the specs are bad," then you're still describing a software problem.


> If you're saying "the software is working to spec, but the specs are bad," then you're still describing a software problem.

This is a pretty bold statement. The spec includes many non-software related features.


In aerospace, lack of proper requirements in the flight software would be a system engineering problem... a semantic difference really, but important nonetheless. System engineering and validation has a metric ton of 'tools' to catch bad system design and improper algorithm implementation...

I really question what the hell happened here:

- Did they just pencil whip the FMEA (failure mode and effects analysis) on this or what?

- What happened with the hardware-in-the-loop flight simulation when they tested the scenario where the AoA sensor gives spurious data, both high and low (but especially high)? I mean... they did test this, right?


I would imagine having not one, not two, but three sensors would help (it would have redundancy and software could determine which one was malfunctioning)
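Roughly what I have in mind, as a Python sketch. The function name and the 5 degree tolerance are invented for illustration; this is plain median voting, not how any real air data system is known to work.

```python
from statistics import median


def vote(readings, tolerance_deg=5.0):
    """Median-vote three AoA readings.

    Returns (voted_value, suspects), where suspects lists the indices of
    sensors that differ from the median by more than tolerance_deg.
    """
    if len(readings) != 3:
        raise ValueError("this sketch expects exactly three readings")
    voted = median(readings)
    suspects = [i for i, r in enumerate(readings)
                if abs(r - voted) > tolerance_deg]
    return voted, suspects
```

With readings of 5.0, 5.5 and 25.0 degrees, the median is 5.5 and the third sensor is flagged. With only two sensors there is no median to fall back on, which is the whole point of the third one.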


What concerns me is the evidence of regulatory capture. Boeing was telling the FAA what the approval schedule was going to be. And the FAA sucked it up. Instead of saying "our review capacity is x pages per day. You will receive our reviews on such-and-such a date."

(And yes I am oversimplifying.)


Boeing has spent more money lobbying the US federal government over the last 20 years than any other company. Last year they had the government kill Bombardier's attempt to break the Airbus/Boeing duopoly. Boeing also prevented the US military from buying an Airbus tanker a few years back. Boeing has definitely captured the US government.


> Boeing has spent more money lobbying the US federal government over the last 20 years than any other company.

This is simply false. Boeing has spent a lot, but less than General Electric, or the National Association of Realtors, or many (many!) healthcare and pharmaceutical companies.

Source: https://www.opensecrets.org/lobby/top.php?indexType=s


While your source seems to suggest that my statement is false, a closer look shows that the only corporation (i.e. not an industry special interest group like the National Association of Realtors) that spent more money than Boeing was General Electric. So I would say your statement is even more misleading than mine. General Electric has fallen off a cliff since 1998. I suppose my "20" years statement is false according to "OpenSecrets", but if you change my statement to "15" years, it is probably correct, since General Electric did most of their lobbying closer to 1998 than 2018.


> I would say your statement is even more misleading than mine.

Fair enough, I guess. I think of groups like Blue Cross/Blue Shield or the NAR as representing corporations, but I can see how you'd disagree on a strict reading. I didn't mean to mislead.

> General Electric has fallen off a cliff since 1998.

In 1998, GE spent $7.28m on lobbying. Nearly every year since then, they have spent more than that, generally staying north of $16m per year, with the exception of the last three years.

Your statement is true only for the last three years.

Same source: https://www.opensecrets.org/lobby/clientsum.php?id=D00000012... for GE, https://www.opensecrets.org/lobby/clientsum.php?id=D00000010... for Boeing.


I appreciate your fact-checking. I do not mean to mislead anyone.


What is the evidence for this claim? It certainly may be true, but I’d like to see evidence.


It's not clear whether you are questioning me or the other responder.


Additional comment:

After regulatory capture, it is not clear to me that any subsystem reviews are valid.

MCAS is not the only subsystem revised (or introduced) for the MAX series.

All "FAA" so-called reviews need to be inspected for validity. (Quotes because Boeing, not the FAA, did the reviews.) And recertification will take months.


I, too, have no words to communicate my astonishment.

A phrase comes to mind: "To screw up this bad you need an MBA." Which is not really intended as a dig on MBAs... but rather to indicate that this debacle has the feeling of business decisions overruling sound engineering design.

This seems like poor system engineering on multiple levels:

- The whole airframe / engine thrust composite flight characteristic requiring an aggressive AoA limiting safety... on a commercial airliner? What?

- The default MCAS relying on a single sensor, which seems to be occasionally quite unreliable, without sensor truth'ing based on other measurements (i.e. dividing ground speed by rate of altitude change to estimate AoA, then also figuring for control surface position feedback).

- The MCAS not having proper situational behavior: i.e. "I'm only 500 feet off the deck... maybe this is not a great time to try to tip the nose down too hard... you know, in case that stupid sensor is wrong. Oh... and the pilot is also pulling back on the yoke trying to correct something I'm doing. Yeah... my bad, MCAS will chill right now and assume the pilot is NOT trying to fly this thing like a stunt plane 500 feet off the deck".


Angle of Attack cannot be inferred from ground speed and vertical speed. It's quite possible to have a massively negative vertical speed (10000 feet per minute descent), and have an Angle of Attack greater than 45 degrees. Angle of Attack is strictly the angle of the airflow compared to the aircraft.


We could argue about this (technically, you are not wrong) but the point is, the aircraft MCAS should be able to do this robustly with available sensor data.*

Without going into a lengthy analysis... you can infer AoA from groundspeed, rate of climb/descent, pitch/yaw/roll, and position feedback of control surfaces. Recent information on these data points can be used to calculate present and near future flight dynamics. The fields of Kalman filtering, adaptive control, and optimal control as applied to aerospace engineering are decades old.

There are many examples where adaptive flight controls can fly an aircraft that has inherently unstable flight dynamics... In fact most modern jet fighters require computer controlled flight controls to maintain stable flight.

Applying these algorithms to an airliner for sensor truth'ing is a cake walk.

* I think you actually can infer situational AoA from ground speed and altitude data. For example, you could build a regression model of standard take-off scenarios using recent groundspeed, altitude change, and the rate of change of (groundspeed/altitude). You could build this off of flight test data and simulation. In practice, there may be no point to doing this because a wealth of additional flight sensor information is available.


This isn't exactly right. The aircraft has two sensors, but the system in question (MCAS) is only ever looking at one of them. That is part of what the software fix is reportedly going to address.


For a system with the responsibilities and impact on the safety-case that MCAS has, not even a 1oo2D design would be sufficient, since there's no clear way to determine which input is correct or incorrect if they diverge, but worse still is that MCAS doesn't seem to leverage even that basic, yet still insufficient, safety design from what I've been able to find on it.

At the end of the day the fail-safe is to throw control back to the human. And, then of course pin all the liability onto the human operator when things go wrong.

This was the problem I had with the Lion Air conclusion. Sure, it may be the case that a pilot can potentially override this system with situational awareness and training. However, the pilot didn't create this band-aid pile in the first place.


> For a system with the responsibilities and impact on the safety-case that MCAS has, not even a 1oo2D design would be sufficient, since there's no clear way to determine which input is correct or incorrect if they diverge

Wouldn't that depend on how often MCAS is actually needed, and how it handles divergent sensors?

My understanding as a layman is that there are situations where a 737MAX is more likely to stall than a "regular" 737 would, but I haven't seen anything on how often those occur. Are they expected to routinely occur, so a fully functioning MCAS is important for safe operation of the plane?

Or are they supposed to be rare, one in several million events, and the MCAS is there to make handling those events the same as they would be handled in a "regular" 737, to minimize the need for retraining?

If the latter, then I'd expect two sensors would be fine, or even just one sensor IF there was a reliable way to tell if it had failed, as long as the MCAS response to a sensor problem is to disable itself and let the pilots know they have to stay out of those "more likely to stall" situations for the rest of the flight.


The problem comes in where the MCAS use should be infrequent, unless the sensor is borked.

In computing, we have a principle stated as "garbage in, garbage out". Circuits and electronics don't think. They compute. There is no error checking except that which is specifically designed and implemented into the system.

If you're getting data from a biased sensor with +/- 10 degrees AoA, a 10 degree actual AoA (well within the safe operating envelope) suddenly appears to MCAS as a 20 degree AoA (oh shit territory).

The system therefore engages doing exactly what it was designed to do.

This is the crux of the matter. The pilot was flying safely while his AoA sensor was telling a safety system he knew nothing about that he was flying dangerously.

The AoA system on earlier models of aircraft that the MAX was based on was a functional luxury/situational awareness aid. The flyability was not impacted by a horked sensor.

That changed once they had to add a software driven mechanism to keep the flight characteristics similar enough with the old airframe to be able to release the aircraft to 737 trained pilots, and not have to worry about retraining. Maintenance and pilot alike both needed to be aware that the AoA sensor became a safety critical component due to a failure or miscalibration jeopardizing the controllability of the airframe.

If they had gone through a full recert of this airframe, and not an expedited self-certify/grandfathering, these tragedies would have had a much smaller likelihood of occurring due to the increased scrutiny. Props on Brazil for doing their own footwork, and not blindly trusting the FAA's delegation to the manufacturer.

Personally, as a software engineer and quality assurance specialist, if I'd seen anything remotely like this come across my desk, I'd be raising hell, even if it meant yanking someone into a VP/C's office and giving them a dressing down for skimping on a cross-cutting safety critical concern, deadlines be damned.


> his AoA sensor was telling a safety system he knew nothing about that he was flying dangerously.

Is that true with the Ethiopian Airlines crash? I can believe it was true with Lion Air, but that crash was so widely publicized that I'm surprised that any passenger airline captain had not heard of the MCAS system.


You need to realize this problem before it is too late. If there is full trim down when you realize it there might not be enough room left to offload the stabilizer and trim back (manually, no less). Afaik if you cut off stab trim you can only trim back manually by moving the trim wheels.

I'd not be surprised if another problem with MCAS is detected on the ET flight, where the pilots had effectively no chance despite knowing of the Lion Air crash.


yeah, I think you're right -- since you don't normally need the MCAS, you could probably get away w/ 2 AoA sensors and just disengage MCAS when they don't agree. you'd only get into trouble when you need MCAS and the AoA sensors are busted, which is a (rare) * (rare) event, so probably low enough odds to tolerate (i.e., probably never over lifetime of plane).

(just speculation of course.)
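To put toy numbers on the (rare) * (rare) argument, here is a small Python sketch. The probabilities and the flight count are invented, and it assumes the two events are independent, which is exactly the assumption that could be wrong.

```python
def expected_events(p_need_mcas, p_sensor_fault, flights):
    """Expected number of flights where MCAS is needed AND the AoA
    sensors are faulty, assuming the two events are independent."""
    return p_need_mcas * p_sensor_fault * flights


# Hypothetical inputs: each event happens on 1 flight in 10,000,
# and the airframe flies 50,000 flights over its lifetime.
lifetime_events = expected_events(1e-4, 1e-4, 50_000)
print(f"{lifetime_events:.1e} expected events per airframe lifetime")
```

If sensor faults turn out to be more likely in the same conditions where MCAS is needed, the simple product understates the risk.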


There are other ways of estimating the AoA. If you know the airspeed and the load factor, you can back out AoA to a reasonable accuracy. That's less precise than a dedicated sensor, but it should be good enough to tell which sensor is the good one if they diverge.
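A rough sketch of that back-out in Python, assuming a linear lift curve (so it only holds well below stall). The lift-curve constants, weight and wing area below are placeholder values I picked for illustration, not figures for any real aircraft.

```python
import math


def estimate_aoa_deg(airspeed_ms, load_factor, weight_n, wing_area_m2,
                     air_density=1.225, cl0=0.2, cl_alpha_per_rad=5.5):
    """Estimate AoA from lift balance: n * W = q * S * (CL0 + CLalpha * alpha)."""
    q = 0.5 * air_density * airspeed_ms ** 2          # dynamic pressure
    cl = load_factor * weight_n / (q * wing_area_m2)  # required lift coefficient
    return math.degrees((cl - cl0) / cl_alpha_per_rad)


def pick_plausible(vane_a_deg, vane_b_deg, estimate_deg):
    """Return whichever vane reading is closer to the independent estimate."""
    if abs(vane_a_deg - estimate_deg) <= abs(vane_b_deg - estimate_deg):
        return vane_a_deg
    return vane_b_deg
```

With these placeholder numbers, level flight at 100 m/s for a 600 kN aircraft with 125 m^2 of wing gives an estimate of around 6 degrees, so a vane reading 6 would be preferred over one reading 26.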


> since there's no clear way to determine which input is correct or incorrect if they diverge

In systems with dual redundant channels you don't try to determine which one is correct (unless it's obvious, like lost comms to one sensor), you just throw an error and stop trusting both sensors.
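Something like this, as a Python sketch. The function name, the `None` convention for lost comms, and the 5 degree tolerance are my own choices for illustration.

```python
def dual_channel(a_deg, b_deg, tolerance_deg=5.0):
    """Combine two redundant readings.

    None for a reading means that channel has obviously failed (e.g. lost
    comms). Returns a usable value, or None if neither reading can be trusted.
    """
    if a_deg is None and b_deg is None:
        return None
    if a_deg is None:
        return b_deg   # obvious failure on A: fall back to B
    if b_deg is None:
        return a_deg   # obvious failure on B: fall back to A
    if abs(a_deg - b_deg) > tolerance_deg:
        return None    # disagreement: do not guess, trust neither
    return (a_deg + b_deg) / 2.0
```

Readings of 4.0 and 6.0 give 5.0, while 5.0 and 25.0 give `None`, and the caller is expected to raise the error and stop acting on the data.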


It depends on whether the system needs to be Fail-Safe or fault-tolerant.

But, yes. Most of these systems are not designed to be resilient to that class of issues, and just send control back to the operator.


That... makes it even worse? Why would someone on Earth disregard an available extra sensor for a system that has unlimited authority to bring the horizontal stabilizer to full deflection?


Yeah I don't disagree. I don't fly, but have sort of been following this out of curiosity. One comment I've seen from some pilots is basically "ehh, runaway trim isn't a new thing, we train for it in the sim, and there's a standard way to deal with it (disengage the automatic system and trim manually)."

So perhaps Boeing felt that this didn't really change anything in that regard. They seem to have been proven wrong.


Technically MCAS activation isn't runaway stabilizer trim, and I'm sure that tripped up the Lion Air pilots. Check out the quick reference for a runaway stabilizer[1]. Note the steps:

* Control airplane pitch attitude manually with control column and main electric trim as needed

* If the runaway stops, stop.

Well, electric trim input on the yoke will stop MCAS temporarily. Of course if these guys had any experience on an NG they're already used to the computers trimming the plane in a counterintuitive manner via the speed trim system (STS). So not only is MCAS not a runaway trim situation, but pilots flying the NG will get used to the computer trimming the stabilizer "at random".

1: http://www.737ng.co.uk/737-800%20Quick%20Reference%20Handboo...


This sort of MCAS failure presents itself to the crew as a trim runaway, and the instructions in the document continue:

  4 If the runaway continues:

      STAB TRIM CUTOUT switches (both)   . . CUTOUT

      If the runaway continues:

        Stabilizer trim wheel    . . . . .   Grasp and hold

  - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

  5  Stabilizer    . . . . . . . . . . . .   Trim manually

(That bit about grasping and holding the trim wheel was a surprise to me.)

This is what the Lion Air crew of the flight before the one that crashed did, in response to the same sort of MCAS failure, and completed the flight without further trouble. While Boeing made a serious error in hiding the differences between this variant and its predecessors, it seems that the prior trim runaway procedure does work for this sort of MCAS failure (unless evidence to the contrary comes out of the Ethiopian Airlines investigation.)


Boeing is "right" in the sense that, if you follow the normal 737 flight manual, you'll be fine, even with MCAS enabled or even MCAS misbehaving due to the sensor readings being bad or whatever.

I think the problem is that the symptoms are different from other 737s, and if you don't KNOW about the MCAS stuff, that's very surprising. So as humans, even though logically the same procedure applies, instinctively you don't think to do it. And given the very short time frames involved, and the margin of error, it's not too surprising that these events have occurred.

Perhaps the need-to-retrain criteria needs to include the following amendment:

If the craft exhibits new symptoms, even if the response to those symptoms is covered correctly by the old operations manual, pilots will need to be retrained to account for the new symptoms.


> This sort of MCAS failure presents itself to the crew as a trim runaway

No, it doesn't and that's the problem. A runaway won't stop when you hit the trim buttons.


> A runaway won't stop when you hit the trim buttons.

Depends on what causes the runaway. If the autopilot goes berserk, the trim buttons will override it (if I recall correctly, it's been a long time).


No simulator accounted for a borked AoA sensor being able to nosedive the plane, or the altered aerodynamics from the different engine configuration.

So while yes, runaway trim is a thing, the circumstances under which an error could happen are substantially different.

For instance, a pilot could check the maintenance log book and see some work or inspection was done on the auto-trim subsystem. This would prime the pilot to be more on the alert for the possibility of trim misbehavior during the flight. Crisis averted, right?

But no pilot would look at an entry for an AoA sensor being off or worked on, and think, "Boy, better look out for that safety system I never knew existed that could cause a trim runaway because my previously airworthiness-agnostic AoA sensor is unreliable."

Flying and complex system diagnosis in time-sensitive conditions require extensive mental model creation ahead of time.


It is weird, though, that this was not caught in simulations (not training simulations, but in design simulations.) I would assume that running simulations to see how the design behaves when sensors fail would be an absolutely standard thing to do. I don't work on airplanes, but we routinely simulate sensor failures and their effect on software systems and you'd think this would not be optional for passenger aircraft certification.
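A toy version of that kind of fault-injection test in Python, with every name and threshold invented for illustration (the "controller" is a stand-in, not a model of MCAS):

```python
def with_bias(sensor, bias_deg):
    """Wrap a sensor function so its reading is offset by bias_deg."""
    return lambda: sensor() + bias_deg


def naive_controller(read_aoa, threshold_deg=15.0):
    """Stand-in control law: True means "command nose-down trim"."""
    return read_aoa() > threshold_deg


def healthy_sensor():
    return 10.0  # actual AoA, well inside the assumed safe envelope


# Nominal case: no command. Injected 20 degree bias: the naive law fires
# even though the actual AoA has not changed.
print(naive_controller(healthy_sensor))                   # False
print(naive_controller(with_bias(healthy_sensor, 20.0)))  # True
```

The point of a test like this is that the second case should be flagged as a design problem long before anything flies.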


When all of your testing is in house, and you don't have people who are willing to put the project in jeopardy for the sake of doing it right, you'll be amazed what can get overlooked.

If anything, the process of engineering is most difficult in that answering most questions tends to be straightforward (not easy, but straightforward once you know the right methods to apply), while figuring out if you've asked all the right questions is the thing that keeps me awake at night. If you don't ask and dig in until you've answered it fully, it's easy to get blind sided.

I have difficulty believing it could have gone so horribly wrong now, but I can't deny that just from the information available, even if MCAS isn't the root cause of the Ethiopian Airlines crash, there are some egregious failures in sound practice going on with Boeing to have slipped up this badly.

That's the thing with sound Engineering, when you do it, it just verks. When you don't...


Key quote is training on a simulator

That's a tall order when the manufacturer doesn't inform their customers out of commercial expediency.


If you only have two sensors and you're taking input from both sensors, assuming each sensor has the same probability of malfunction then you're doubling the chance of processing bad input.

You could choose to select the input that results in the least-worst bad outcome. But that may not bring the total risk below the increased risk of utilizing both sensors. The sensors are doing something purposeful, after all.

So it may be logical what they're doing. OTOH, the logic may make sense only in a path dependent sort of way, after a series of bad decisions that put them in a corner with poor options. There probably really should be 3 or more sensors.


You take input from both sensors only when they don't disagree (by a margin). There may be failure modes where both have the same bad input but I don't think that this doubles your chances of processing bad input.


You have double the chance that one of the sensors is bad. Which one do you choose? The one that says AoA is normal? Maybe that's the one that is bad. Worse, maybe such a bad input is more likely in a stall scenario.

Without any ability to differentiate bad from good input, you may be better off simply relying on a single sensor at any one time and completely ignoring the other, at least as far as controlling MCAS is concerned. Tossing up a warning that says the sensors disagree is another matter.


I meant disabling the system if they disagree.

I agree that one can argue that this doubles the chance of system shutdown compared to properly detecting that one sensor fails. That's obviously only ok if the system is expendable to a certain degree.
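The arithmetic behind "doubles the chance", as a short Python sketch. It assumes independent failures with the same per-flight probability p, and the 1% figure is just an example value.

```python
def p_at_least_one_fails(p, n=2):
    """Chance that at least one of n independent sensors fails."""
    return 1.0 - (1.0 - p) ** n


def p_all_fail(p, n=2):
    """Chance that all n independent sensors fail at once."""
    return p ** n


p = 0.01  # example per-flight failure probability, not a real figure
print(p_at_least_one_fails(p))  # about 2p: how often a disagree-shutdown trips
print(p_all_fail(p))            # about p squared: both bad at the same time
```

So with shutdown-on-disagree you trade roughly twice as many shutdowns for a much smaller chance of acting on bad data, as long as the two sensors do not fail in the same way for the same reason.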


Because there was no available extra sensor. Trevor gets a few other things wrong too.

The elevator feel system (EFS) uses multiple sensors, in fact it has its own pitot probes[1] (two of them) on the elevator. It's not clear if it uses both, but I suspect it also uses both alpha vanes up front like MCAS should be doing. Input from the alpha vane (AoA sensor) is new on the MAX, IIRC. His tweets were a bit ambiguous, but the elevator feel system does not change the trim at all (also EFS was non-op on the Lion Air bird).

It's almost certainly NOT a sensor problem. Look at the graphs from the black box on the Lion Air flight. The angle-of-attack was almost exactly offset by twenty degrees left to right[2]. That sensor was working just fine, but something else on the path was fucked up (bad wiring? bad ADIRU? both have happened before). The fact that the alpha vane was replaced and that didn't fix the angle-of-attack readings supports this. Also, the 737 NG and 737 MAX use identical alpha vanes. While the angle-of-attack data is less important on an NG, we haven't seen people come out of the woodwork talking about how common alpha vane failures are on an NG. I'm inclined to believe that they're not all that common.

The Lion Air flight crew did write up the sensor problems[3].

I saw another one of these "I'm a software engineer so I've got great insight" posts[4] and chuckled. It's a bit of a swing and a miss as well. MCAS is hardly silent or subtle. The same trim wheels his Cessna has are present on the 737 and they're noisy[5]. MCAS doesn't quite override the pilot either, and you can see that on the black box graphs from the Lion Air flight[2]. Twenty times the pilot flying entered opposite trim to counteract MCAS and twenty times MCAS paused for five seconds. On the twenty-first time something changed and the plane crashed. It's not quite that simple and that's why there are active investigations.

1: https://4.bp.blogspot.com/-ZsL1asAUBRU/W-pSdWs2dYI/AAAAAAAAF...

2: https://reports.aviation-safety.net/2018/20181029-0_B38M_PK-...

3: https://theaircurrent.com/wp-content/uploads/2018/10/jt43-jt...

4: https://medium.com/@jpaulreed/the-737max-and-why-software-en...

5: https://www.youtube.com/watch?v=3pPRuFHR1co


I’m the OP Dave BTW. Trevor copied a Facebook post I made a couple of days ago into those tweets. I don’t know all of the details about the EFS system but I’d be pretty sure it doesn’t use data from both sides for anything, because that would be against the brick wall isolation concept in the air data and flight control computers.

You say that the MCAS is neither silent nor subtle, but there is no light or master caution or anything, just the trim wheel turning and clicking. I’m not sure you’d notice it if you were wearing a headset and were being distracted by the stick shaker.

The reason they died on activation #21 and not #1 is that the effects are cumulative. Every time it activates it dials in another 2.5 degrees of nose down trim. Eventually you can’t pull on the yoke hard enough to hold altitude.

I agree that the jury is still out as to whether it’s a sensor issue per se or some sort of ADC or data transmission issue.


> The reason they died on activation #21 and not #1 is that the effects are cumulative.

The stabilizer position was graphed on that preliminary report (see: pitch trim position). There was a slight trend towards nose down pitch, but no, at the time they lost control there's a sharp decrease in pitch that doesn't correlate with MCAS. If you look at the "trim manual" line you'll see that the pilot flying was continuing to input nose up pitch on the switches to no effect. Something changed and it wasn't simply that the pilot was overridden by MCAS. After 21 even MCAS continued to counter the pitch up inputs as the stabilizer continued to contribute additional nose down pitch.

> I agree that the jury is still out as to whether it’s a sensor issue per se or some sort of ADC or data transmission issue.

To be clear I highly doubt the alpha vane was defective. It's a possibility, but look at that graph. The left "angle of attack indicated" mirrors the right one almost exactly except that there's a near constant twenty degree offset. If that's a sensor problem in two separate sensors I'll pull a Werner Herzog and eat my freaking shoe.


I totally agree here. I'm an AME with a MAX licence and I would have to say I'm quite sure this wasn't an AoA sensor. The FCC that's active with MCAS as its function (both have this function) would have been looking at the right vane on one flight and then the left vane on the next flight; however, on the last 3 legs the fault was the same. And trust me on this one: there may have been different indications, but all pointed at the same fault. I suspect the FCC or ADIRU was at fault, or something in the aircraft installation that was inducing the 20 degree difference between the 2.


I don't believe there is a brick wall design here since the FCC outputs are not compared so why would you isolate inputs. I believe it's just a bad design to use the single AOA input.


You say that the MCAS is neither silent or subtle, but there is no light or master caution or anything, just the trim wheel turning and clicking. I’m not sure you’d notice it if you were wearing a headset and were being distracted by the stick shaker.

BTW the trim wheel has little stripes on it, moves quickly when being operated automatically, and is fairly noisy. It will get your attention. However, my suspicion is that pilots experienced on the NG will mistake the trim wheel doing counterintuitive things for normal STS operation (especially after takeoff). I don't know if the Cessna trim wheel(s) are any quieter or more subtle, but the Cessna certainly doesn't have all the band-aids that a 737 NG or MAX would.

To this point here's a post from a 737 NG pilot[1]:

As a long-time 737 driver I'll just chime-in a few points

[...]

(1) Just after takeoff there is a lot going on with trim, power, configuration changes, and as noted above, the darn speed trim is always moving that trim wheel in seemingly random directions to the point that experienced NG pilots would treat its movement as background noise and normal ops. Movement of the trim wheel in awkward amounts and directions would not immediately trigger a memory item response of disconnecting the servos. No way.

(2) The pilots could very reasonably not have noticed the stab trim movement. Movement of the stab trim on the 737 is indicated by very loud clacking as the wheel rotates. On the -200 it was almost shockingly loud. On the NG, much less so. HOWEVER, the 737 cockpit is NOISY. It's one reason I am happy to not be flying it any more. The ergonomics are ridiculous. Especially at high speeds at low altitudes. With the wind noise, they may not have heard the trim wheel moving. The only other way to know it was moving would be yoke feel and to actually look at the trim setting on the center pedestal, which requires looking down and away from the windows and the instruments in a 'leans'-inducing head move. On the 717, for example, Ms. Douglas chimes in with an audible "Stabilizer Motion" warning. There is no such indication on the 737.

[...]

Finally, runaway stab trim is a very, very rare occurrence up until now. We trained it about once every other year in the sim because it is so rare. And when we did it was obvious. The nose was getting steadily heavier or steadily lighter with continuous movement of the trim wheel. That is a VERY different scenario than what these pilots faced.

We also trained for jammed stabilizer, the remedy for which is overcoming it with force. The information they were faced with could very reasonably have been interpreted that way, too.

1: https://www.pprune.org/rumours-news/619272-ethiopian-airline...


One thing I’ve seen kicking around is that MCAS alternates input between the left FCC and right FCC, which might help understand why the JT43 only had to fight the stick shaker the whole flight (!) but JT610 had the automated trim input as well.


serious question: which sensor has failed?

you need at least three sensors. (although the "sensors disagree" option you can purchase is better than nothing)


So, a modern aircraft probably has a few thousand sensors (if not way, way more), not strictly redundant but overlapping in some fashion -- I have to believe there's been some modelling trying to check that sensor input is consistent; if only to have some kind of way for the computer to figure out that a sensor has gone crazy (say your compass suddenly detects a significant change in heading, but no sensor has detected even a degree of roll, a change in speed, heading computed by GPS over time is stable... compass might be borked, whaddya say?)

Why isn't such modelling used to let sensors be disabled when they're clearly sending bogus input?
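
To make the compass example concrete, here is a toy Python sketch of that kind of cross-check. It is not how any real avionics does it: the function name, the three inputs and the 5 degree tolerance are all made up for illustration.

```python
def compass_jump_is_suspect(compass_delta_deg, gps_track_delta_deg,
                            gyro_yaw_delta_deg, tolerance_deg=5.0):
    """Flag the compass when it reports a heading change that two other,
    independent sources do not see.

    All names and the tolerance are hypothetical; a real monitor would
    have to deal with noise, latency, wind and many more edge cases.
    """
    # Do the two independent sources corroborate each other?
    others_agree = abs(gps_track_delta_deg - gyro_yaw_delta_deg) <= tolerance_deg
    # Does the compass disagree with them?
    compass_off = abs(compass_delta_deg - gyro_yaw_delta_deg) > tolerance_deg
    # Only blame the compass when the other two agree with each other.
    return others_agree and compass_off


# Compass jumps by 40 degrees while nothing else moved.
print(compass_jump_is_suspect(40.0, 0.5, 0.2))
# A genuine turn that all three sources see.
print(compass_jump_is_suspect(30.0, 29.0, 31.0))
```

Even in this toy, the answer depends entirely on the tolerance and on the other two sources being right, which is probably part of why this is harder to certify than it looks.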


My guess is that such systems are complex, hard to model for safety assurances, and hard to reason about in the moment.

Yes, it will probably make the system safer, but your guarantees become a lot more fuzzy. This is not the field for fuzzy guarantees.


> Why isn't such modelling used to let sensors be disabled when they're clearly sending bogus input?

You need to crash a lot of planes to train that NN.


there should be at least 3 AOA sensors - so you can not only detect a mismatch, but figure out which sensor is most probably incorrect.
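
A rough Python sketch of what three sensors buy you, assuming a simple median vote (I don't know what voting scheme any real aircraft uses; the sensor names and the 5 degree tolerance are hypothetical):

```python
from statistics import median


def vote_three(left_deg, right_deg, third_deg, tolerance_deg=5.0):
    """Median-vote three AoA readings and name any outlier.

    Sensor names and tolerance are illustrative assumptions only.
    """
    readings = {"left": left_deg, "right": right_deg, "third": third_deg}
    # With three readings the median is always one of the two that agree,
    # provided only one sensor has failed.
    voted = median(readings.values())
    # Anything far from the voted value is the probable bad sensor.
    suspects = [name for name, value in readings.items()
                if abs(value - voted) > tolerance_deg]
    return voted, suspects


print(vote_three(4.8, 5.1, 25.0))
```

With only two readings the same code could tell you they differ, but not which one to drop.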


We will see. Part of the avionics design is that there is a “Brick Wall” between the two sets of sensors, air data computers, and flight control computers. If the FCC uses both for MCAS, that safety concept doesn’t exist anymore. That is certainly the reason they didn’t compare the AoA sensors in the MCAS logic. I think they could probably use pitot/static as a cross check on AoA, but it has some tricky edge cases.


Reports say it alternates each flight between them; if that's correct, then in the case of one good sensor and one bad, you wouldn't see back to back flights with anomalous behavior related to a bad sensor. Otherwise, perhaps the problem isn't with the sensor itself, but with some part of how sensor information is communicated to the interpretive computer.
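
A quick Python sketch of that reasoning, assuming the reports are right and the source strictly alternates every flight (the function names and the "left first" default are my own invention):

```python
def mcas_sensor_for_flight(flight_index, first_side="left"):
    """Which side feeds MCAS on a given flight, assuming strict alternation.

    flight_index counts from 0; first_side is a hypothetical starting point.
    """
    sides = ("left", "right")
    start = sides.index(first_side)
    return sides[(start + flight_index) % 2]


def affected_flights(bad_side, num_flights, first_side="left"):
    """Flights on which MCAS would be reading the bad sensor."""
    return [i for i in range(num_flights)
            if mcas_sensor_for_flight(i, first_side) == bad_side]


# One bad left sensor over four flights: only every other flight is hit.
print(affected_flights("left", 4))
```

So under that assumption a single bad vane can't explain the same MCAS behavior on consecutive flights; something common to both sides, or a break in the alternation, would have to be involved.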


AIUI the entire FCC only uses one set of sensors and which set of sensors is used is switched together with the FCC?


I'd go further.

Use of software to make a plane airworthy when it isn't natively airworthy is a problem. Civilian aircraft should have a different standard than fighter aircraft: one that allows graceful degradation under failure.


Being a total layman in terms of aircraft design -- I've always wondered why they weren't built more akin to distributed systems. Instead of 1-3 large fuel tanks, have hundreds of small ones, so losing one is not a disaster. Similar with hydraulics and electronic wiring -- instead of a single cable running along wings, why not a redundant network that can survive everything short of a wing being sheared off?


The things you've mentioned are likely to have significant downsides specifically in terms of weight and complexity


The A-10 Close Air Support aircraft manages to survive in exceptionally stressful circumstances. The flight control systems are triply redundant: two independent hydraulic systems, and a set of cables if both hydraulic systems fail.

Having a second angle of attack sensor on the Boeing 737 MAX (which is critical to aircraft safety) be an extra cost option is unconscionable.


They are built with extensive redundancy. Airbus A320s have 5 separate computer systems, programmed by 4 different teams, for aircraft control, of which you only need one to fly the plane.

See https://pdfs.semanticscholar.org/d4ac/17fcc0db3396a3219785bd...

It's a question of what design choices you make. For instance Boeing only has three in its comparable 777 fly by wire system.


That's pretty much the case. Fuel tanks are split into multiple smaller tanks in the wings, and most systems have one or multiple redundancy (multiple hydraulics systems, multiple flight computers, multiple black boxes, graceful failure protocols, etc.)


The answer is economics - complex systems are expensive to build and maintain, and the air travel industry is notoriously low-margin. I don't see travelers accepting higher ticket prices because the airline bought (costlier) airframes with more redundancies. Any airline that attempts this will be undercut by those that do not.


How do you fill hundreds of small fuel tanks?


You can link them with one-way seals, make them fireproof, etc. But I don't think you need hundreds. The A-10 example is great, that's an example of a plane that's designed to be shot at from close range and still fly.


Same way, with more pressure. They have to be connected anyway. (Honeycomb pattern, probably.) The real question is how to seal them on failure without huge weight cost.


Yes, was just about to post that exact section before I saw your comment. I've been following this story fairly closely, but this was news to me. I've seen other discussions that were along the lines of "We're not 100% sure why the planes crashed, so now that they've been grounded without a clear root-cause understanding it may take a long time before they are certified to fly again because we don't know exactly what to fix."

I call BS on that line of thinking. At the very least it seems like an easy decision to require redundancy in the AoA vane before letting the planes fly again.


Airbus A320 used to have GPWS (Ground Proximity Warning System) as an optional extra. A budget airline didn't order it and quite possibly encouraged pilots to make excessively fast approaches, which led to a CFIT (Controlled Flight Into Terrain) crash of a perfectly good jet.[0]

Another crash related to the 737 MAX crash is Scandinavian 751 [1]. On an MD-81 airliner, "Automatic Thrust Restoration", a system the pilots were not trained on, caused the thrust to be increased when the pilot tried to throttle back the damaged engines and clear the compressor stall. "Pilot Betrayed" was the Air Crash Investigations episode [2].

[0] https://en.wikipedia.org/wiki/Air_Inter_Flight_148

[1] https://en.wikipedia.org/wiki/Scandinavian_Airlines_Flight_7...

[2] https://www.imdb.com/title/tt1862850/


Not quite as egregious, but Amazon will sell you an Amazon Store Credit Card and then continually pressure you into buying the account protection.

Giving away the liability and then up selling the cure is immoral.


Wait what? Aren’t there consumer credit laws that offer the protection you need?


It's probably insurance that pays your outstanding balance if you are badly injured or get a terminal disease or something. It's usually nearly impossible to claim and therefore a scam.


(I originally wrote what my Brother in law posted on Twitter) I believe that the package is for Cat III hand flown landings. I think that to do that, you need AoA on the HUD, and in the event of an AoA failure you have to go around. It is strange for sure that you need AoA consistency checks for that, but not for MCAS.


For anyone who is old enough to remember when airbags first came onto the scene in cars, they were sold as optional packages.

Most car manufacturers called these "safety packages". These optional packages were sold basically in the same way as you jokingly mentioned above. "Buy this optional package and when you hit a tree at 50mph, this will keep you from flying through the windshield and will probably save your life."

I actually remember a conversation that my parents had with each other when I was young about whether it was "worth it" to buy airbags in their new car. Yes, that was a REAL conversation that millions of people had and a decision they had to make for nearly 25 years before airbags became standard in cars in the late 90s.

Lots of people felt that the optional safety packages weren't worth the money and many people died through the '80s and '90s from car crashes that they would have survived had they splurged for the "safety package" when they purchased their car from the dealer.

It turns out that when people are buying cars, they aren't planning on crashing them. So spending extra money that only helps them when they crash felt unnecessary. These packages could cost in the neighborhood of $2,500 extra. Which on a $20,000 car is over 10% extra! Would you pay 10% extra for a safety package if cars today didn't come with standard safety equipment?

This is what led to the USA making airbags mandatory in 1998, forcing this safety feature on manufacturers in order to save lives.

I imagine the 737 AoA sensors were sold as the same thing. Boeing sold this as an "additional redundancy package" or "safety package" and probably included many other features as part of a larger package.

Just like thrifty car buyers of the 70s, 80s, and 90s who figured they didn't need airbags in their cars and opted out of the safety package, the budget airlines Lion Air and Ethiopian Airlines had a good history with the normal 737 planes and figured this additional safety package wasn't worth the money.


You do understand that a single light in an aircraft doesn’t cost 10% of its total budget?

That’s like saying that car manufacturers decided to add a light saying their ABS is disengaged depending on whether you paid for it.

That’s more like it.


> "The fact that the redundancy of a sensor on which a system capable of sudden, large control inputs relies is an optional package to be purchased separately... I simply have no words."

As far as I'm aware, all Boeing 737 aircraft have 2 AoA sensors. There isn't an option to add a 3rd redundant one like the Twitter poster suggests.

From what I understand, what is optional is the display of the AoA on the main screen and/or HUD, and the "AOA DISAGREE" warning which cross-checks the data from the 2 sensors.

(Airbus A320s have 3 AoA sensors. A380, A350, and Bombardier C-Series, aka A220, have 4.)


I understood this the other way: there is no actual redundancy even with the optional package; the new sensor is only used to compare inputs with the one used by the control system and warn the pilots if they disagree.

Pretty sure I'm misinterpreting things (I hope!).

On the other hand, if there are two sensors, what does the control system do when they disagree and there is no hardware to show this to the pilots?


Right. The optional packages don't add additional sensors and don't add redundancy. But they may improve situational awareness by drawing pilot's attention to a sensor problem faster.

The MCAS system, apparently, always takes its input from one of the two AoA sensors - it alternates each flight. Lots of details here: http://www.b737.org.uk/mcas.htm


I'll also point this out: "Unlike the EFS system, MCAS can make huge nose down trim changes."

That's a potential reason why an unaware pilot wouldn't disengage the auto trim system: he wouldn't think of the trim system as being capable of causing an extreme pitch down.

Also this response.

> Interesting. So how did Boeing get a plane without redundancy in a critical sub system approved?

Answered own question, by hiding it in what is normally a non-critical subsystem.


Can I first correct you on two major errors in your statement? TWO AOA vanes are NOT an option on any aircraft. Every aircraft has two vanes as standard. The option is for a small AOA position indicator icon in the top corner of the main screen on the captain's side. Also, there is no AOA disagree light. It is an amber AOA disagree text that appears on the screen. I can only say that at my airline this is standard, as it was on the 737 NG in our fleet.


One would hope MCAS would not do anything if the AoA input was known bad. This looks really bad for the company. I'd be asking for FMEAs and/or FTAs on this, what the conclusions were, and how they intended to mitigate this exact failure - because it's got to be in there.


If you've only got 2 sensors and you aren't even reading from one of them then "bad" input is hard to know. You can do nothing if they both disagree but you can't actually know which sensor is bad and which is good without a tie breaker.


You don't need to automatically fix the disagreement. You just need to stop taking nose-down action and inform the human in charge.

Problem is, if you need to inform the human, you also need to train the human, which leads to extra costs.
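
In code terms the idea is tiny. A minimal Python sketch, assuming a plain two-sensor comparison (the function name, the 5 degree threshold and the alert string are illustrative, not anything from Boeing):

```python
def auto_trim_permitted(left_aoa_deg, right_aoa_deg, max_disagree_deg=5.0):
    """Return (permitted, alert) for automatic nose-down trim.

    With two sensors we cannot tell which one is wrong, only that they
    disagree, so the safe reaction is to stop acting and tell the pilots.
    The threshold is a hypothetical value.
    """
    if abs(left_aoa_deg - right_aoa_deg) > max_disagree_deg:
        # Inhibit the automatic action and raise an alert instead.
        return False, "AOA DISAGREE"
    return True, None


# The roughly 20 degree offset discussed in this thread would inhibit trim.
print(auto_trim_permitted(5.0, 25.0))
print(auto_trim_permitted(5.0, 6.0))
```

The hard part isn't this comparison, it's everything around it: the alert, the procedure and the training that go with it.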


Who knows, but the chatter makes it sound like the system doesn't have redundant sensors. You have two systems with one sensor each. One system is active and the other is inactive. And no voting logic.


Read somewhere a few days ago that Southwest, operator of maybe the most 737's in the world, didn't like the MCAS on the MAX and that extra "package" was essentially customizations/modifications created specifically to make them happy.


Might as well make the trim wheel or flap control optional packages as well at that point.

Except they can't because all pilots know about those since day 1 of their career as opposed to the MCAS sensor-software system designed to deal with airframe engineering issues.


> What I find downright criminal is this

Not much different than how with most car companies, the only way to get headlights that don't blind oncoming cars is to pay to upgrade to leather seats. C.f.:

https://www.iihs.org/iihs/news/desktopnews/night-vision-head...


Safety as an option is pretty standard in aviation, unfortunately. For example, collision avoidance systems have been in development since the 1950s, but TCAS remains only partially required to this day.


Stuff like this is usually triple redundant


Could it be an ETOPS requirement?


How is this different from extra safety features available as option packages for automobiles?


If my airbag is missing my car won't crash into traffic by itself, if the airplane system fails the airplane points you into the ground.


No, but, say you don't pay for the automatic braking system. You may rear end someone that the car might have otherwise saved you from.


The difference here is that MCAS is an automatic feature that Boeing needed to mimic the existing 737 flight characteristics. They created the need for the automatic system in the first place, then sold the safety redundancy as an option.

To extend (torture?) the automotive analogy, it would be as if the manufacturer substituted a new braking system that required preheating the rotors, installed software to automatically ride the brakes for the first 2 miles of driving in order to get that preheat done, then sold an optional safety feature to verify that the brakes were at the proper temperature.

The old way worked just fine without need for “emulation”. The new airframe required some software to mimic the old one, but they decided to charge you for the add-on to make sure that their band-aid worked reliably.


Again, yes, you may rear end someone, but this is not the correct analogy here. In this situation, the car rear ends someone while you are fully braking because you didn't pay for the automatic braking upgrade. This is a fail.


But here it was: pay to know whether the automatic braking system will speed the car up by itself every time you press the brake and then release it because things seem to be better:

https://www.nytimes.com/interactive/2019/03/13/world/boeing-...

Look at the charts above. Both pilots "fought" with the controls, correcting the plane's course, then believing "it's corrected now", and then the plane sank again and again.


Most of the safety features that keep the car out of an accident (ABS, ESC, rear cameras, TPMS) are mandated by law in the EU and USA.

Yes, lane-keeping and auto-braking are not mandated. I expect they will be within a few years.


Somewhat related to this thread and topic ... I'm driving a rental car right now and it's the first time I've driven a car with a lane-assist feature (sensors help steer car back into lane when drifting out of lane). In France you are supposed to give 1.5m margin when passing a cyclist so I drifted left and out of my lane to pass - and really freaked out when the lane assist feature kicked in and steered me aggressively back toward the bicyclist...

I think I could have avoided this situation by turning on the left-turn signal to disable the lane-assist and generally it's a good idea to use the indicator even when just passing a bike -- but that's not something I normally need to do to avoid this particular failure mode ...


> I think I could have avoided this situation by turning on the left-turn signal

That's exactly what you had to do according to the regulations.


People shouldn't drive drunk, but disabling the airbag when the driver is drunk is not acceptable.


Well, LKA would actually help to stay in lane while drunk, no?

That thing has a good default that is still overridable by the brute force (force that's less than the force needed to steer if the power assist fails altogether).


My point is that unexpected hazards when people break regulations are a problem. Because people will break regulations.


Yeah, that's the way it works on my dad's Subaru.

The adaptive cruise control in my VW actually gets confused sometimes. Drove through a puddle at the beach last year and it jammed on the brakes. Glad nobody rear-ended me.


As soon as an auto safety feature is proven effective for 5+ years, it’s made mandatory.


Rate 1/5 stars. App is crippled without in-flight upgrade.

SMH


I really wish people would wait for the report before drawing conclusions like this. These investigations take a long time, and it's often not the issue that gets circulated on Twitter.

AirAsia 8501 was widely suspected to be caused by a thunderstorm. Wired [1] and WaPo [2] still have articles up blaming the weather. When the investigation came out a year later, it turned out to have nothing to do with weather. The fly-by-wire system malfunctioned and the pilots got confused.

[1] https://www.wired.com/2014/12/airasia-qz8501-thunderstorms/

[2] https://www.washingtonpost.com/news/capital-weather-gang/wp/...


> The fly-by-wire system malfunctioned and the pilots got confused.

The Wikipedia summary of the investigation report sounds quite a bit different.

It says there was an intermittent malfunction that could be cleared by following a procedure, which was done three times during the flight, with no impact on flight safety (AIUI). The fourth time, instead of following procedure, the pilot toggled the flight computer's circuit breaker, which he is not allowed to do in flight, which reset the flight computer completely, disabling various automated systems that they would now have to re-start, which they did not do. Then the plane entered a stall and due to communication issues pilot and co-pilot gave contradictory control inputs which resulted in no control input to the plane.


The analogy does not make much sense because the majority of what is in this twitter thread is neither new information nor disputed. We also know that Boeing is fixing it by a software patch already.


Eh, we still don't really know if MCAS is the cause of the Ethiopian crash, though. Some things point to it (flight fluctuating up and down, jackscrew found with full nose down trim), but some things are different, too, like crazy acceleration and handling issues right from takeoff, when the flaps would still be up and MCAS inactive.


Even if Egyptian Air hadn't crashed their plane, the Lion Air investigation alone exemplifies systemic negligence, not from the software standpoint, but from the top-down executive level and the negligence of the FAA.

So your point that we need to wait for Egyptian Air's investigation is valid, but misplaced because of the aforementioned argument.


*Ethiopian


All we know is that they're issuing a software update. We don't know that this update actually addresses what caused the crashes.


...or that the crashes are related. Too many people are skipping the step where the cause of the second crash is actually determined.

This twitter thread, in particular, is just summarizing information available in news articles, and leaping to the conclusion that crash #2 is the same or related to crash #1.


The report for crash #1 isn't even out yet. Let alone #2, which hasn't even had its black box fully analyzed.


And how exactly does someone reading this twitter thread know that?


What do you mean? The entire thing is conjecture. We just don't know if there was even an issue. It's possible that the two accidents were uncorrelated. It's possible that the software worked perfectly but the pilots made wrong decisions. It's possible that the software worked perfectly but pilots panicked because they were unaware of what was happening. It's possible the software was at fault .. and on and on ...

He also claimed that there was a sensor problem. Are we sure about that? I have a hard time believing that a critical system would rely on one sensor to function properly. These things are built like space shuttles - with multiple redundancies and a fail-over strategy.

At the end of the day, we just don't know. Smart people are trying to figure it out, and we'll get to the bottom of it. Let's just wait a bit.

>Boeing sells an option package that includes an extra AoA vane, and an AoA disagree light, which lets pilots know that this problem was happening. Both 737MAXes that crashed were delivered without this option. No 737MAX with this option has ever crashed.

Jesus. What an atrocious statement.


Wanting to make a better and cheaper jet isn't at all controversial, though the twitter thread seems to try to make it appear so.


We still don't know whether the pilots reacted properly or not. We don't know even if they had the time to react properly. These tweets are written with some assumptions, but correct, in general.


This is a "Twitter sucks" off-topic rant comment, so if you're not interested in that, just move along. At no point in my reading on this topic or any other, did I say to myself "Boy this thing would be great if it were broken up into a series of small brainfarts and served up one at a time on a bloated, slow-as-molasses web platform." I'm embarrassed every time someone tries to express a complex thought on Twitter. It's like a machine that turns your thoughts instantly into listicles. And every time I go view something there, I'm astonished all over again at the dismal user-experience people put up with in exchange for "access" to a "network." (Facebook is worse... it looks like some crap I built for my big-company employer. sic. I'm not much in the front-end department, so yes, my UI sucks balls. But my users have to use my app, and they get paid for doing so. Facebook users, I can only weep at the thought. But I digress. This was supposed to be about Twitter.)


I actually originally posted this on Facebook and then my brother in law broke it up into tweets and posted on twitter.


I'll take a contrary opinion -- forcing each thought into a tweet is a nice constraint that compels people to get to the point. This would probably be less well-written as, say, a Medium article.


I actually agree with that - it's an interesting exercise to communicate concisely within a limit. (Sort of like Vine, which Twitter destroyed, but oops there I go getting smart-assy again.)

It's just, if that's your game, stick to the game, don't cheat by sprawling across 14 of those. It fails as an instance of the Tweet artform because cumulatively it's too long, and it fails as a longer-form piece because it's all broken up.

If I strung together 1780 Vines to make Fellowship of the Ring (and yes, nerdily, the math works out there), what have I proven? My powers of conciseness and economy? My respect for rules and limits? My ability to choose the right tool for the job?


Well yeah, the point of Medium is to sound like you're giving information when really you're just making yourself sound smart. So if this were Medium we'd never actually talk about what caused the crash and instead spend an hour going over why you NEED to use some esoteric library.


https://threadreaderapp.com/thread/1106934362531155974.html

And whilst I'm not a Twitter fan or user, on Mastodon I've found the practice of writing in <500 character chunks, posted publicly (and hence not easily revisable) is an interesting and useful writing vibe. In part because feedback can be specific to a given chunk, letting me see what resonates, or doesn't, what communicates as intended, or not.


Hear hear. This was almost painful to read. Normally I just close twitter streams as soon as I realize it's Twitter, but this one was interesting enough to try reading. I wish LJ hadn't died or people would adopt Dreamwidth or a similar "normal" blog platform. Twitter is simply horrible in UX.


Often times as a Software Developer I encounter a bug which has an obvious two line fix. Rather than implementing that though I often spend another few minutes digging into how and why that bug was introduced. Often times I'm left with a greater understanding of the problem or encounter a requirement that the previous developer was trying to implement that my fix would have broken.

Other developers will simply assume the previous developer was an idiot and bash in the fix.

I feel like in this case a lot of people are assuming the engineering team were idiots, or criminally trying to make an aircraft which didn't pass safety standards, rather than taking a look at what caused the bug in the first place.


I deal with this constantly. Someone gets a bug report for a crash, let's say a null pointer dereference, so often I see:

> if (pPointer == nullptr) { return; }

> Crash is fixed!

I mean sure... but that's not the problem. Why was pPointer null here in the first place? So few people take the time to understand that :(


>So few people take the time to understand that :(

Because "fix this null pointer exception" is ticket number 14 this week, and your PM just wants it checked off. They don't want to hear that you need another week of digging through layers of spaghetti to track down the source; that doesn't bode well for their KPI goals.

This is a systemic issue in the way software companies function.


We can blame management but sometimes the developer just doesn’t give a f* or it doesn’t fit their agenda. I guess both are ultimately management issues, but it’s a shared responsibility.


Ultimately it doesn't matter if your engineers don't give a fuck if management will sabotage their efforts when they do care. Only if you have management who won't forgo quality for ticks can you really blame the dev.


OMG, yes. Every time I see this in a code review I go "why is this receiving a null here?" And 90% of the time I get back "dunno, I saw null pointer exceptions in the logs"


Typical. Fix the symptom instead of the cause.


Sounds like a failure of the type system...


>Often times as a Software Developer I encounter a bug which has an obvious two line fix. Rather than implementing that though I often spend another few minutes digging into how and why that bug was introduced. Often times I'm left with a greater understanding of the problem or encounter a requirement that the previous developer was trying to implement that my fix would have broken. >Other developers will simply assume the previous developer was an idiot and bash in the fix.

That is exactly the case of Chesterton's fence (JFYI):

https://en.wikipedia.org/wiki/Wikipedia:Chesterton's_fence


It sounds additionally like the bug, caused by the issues presented in the tweets, made it past the usual safeguards and into production because of an abnormal certification process: https://www.seattletimes.com/business/boeing-aerospace/faile...

So there were failures at almost every level it seems.


> So there were failures at almost every level it seems.

They aren't failures, they were designed in. There have been forces at work since the '80s that have been trying to kill effective government institutions because they are seen as anti-free market (the exact quote was "My goal is to cut government in half in twenty-five years, to get it down to the size where we can drown it in the bathtub."[1]).

This is the end result, as are E-coli outbreaks, etc. The government is being made ineffective so people can stand and point "see, I told you so, and we need to move this to the private sector where it will be done properly."

[1] https://www.brainyquote.com/quotes/grover_norquist_182534


To be fair there has been plenty of incompetence in the government. I mean as this or the 2008 financial crash shows us - total free market is folly, but there's some back and forth discussion to be had on how much government is good.


Wow... thanks for the link. The details are highly troubling. They assessed the safety of the MCAS based on a statement saying it had max. authority of 0.6°, being thus permissible to rely on a single sensor for this system. But in fact it had an initial authority of 2.5°, which is already pretty high by itself, and then that authority increased after each activation until full stabilizer deflection. That they implemented such a high authority system without triple sensor redundancy and without briefing the pilots is just insane.
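
To put rough numbers on the difference, here is a small Python sketch of how the commanded trim could pile up. It uses the 0.6 and 2.5 degree figures from the article and assumes no pilot counter-trim between activations; the 6.0 degree travel limit is a placeholder I made up, not the real stabilizer range.

```python
def cumulative_trim(per_activation_deg, activations, travel_limit_deg):
    """Total nose-down trim after each activation, clamped at a travel limit.

    Assumes nobody trims back between activations, which the flight data
    discussed elsewhere in this thread suggests is too simple.
    """
    total = 0.0
    history = []
    for _ in range(activations):
        # Each activation adds its full authority until the limit is hit.
        total = min(total + per_activation_deg, travel_limit_deg)
        history.append(total)
    return history


# Assessed authority: a single 0.6 degree input.
print(cumulative_trim(0.6, 1, travel_limit_deg=6.0))
# Authority as described in the article: 2.5 degrees, repeating.
print(cumulative_trim(2.5, 3, travel_limit_deg=6.0))
```

Under those assumptions three uncorrected activations already reach the (hypothetical) limit, which is a very different hazard from the single small input that was assessed.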


Yeesh. If true, it seems like it should have been a 3oo4 system, with added heuristics for handling soundness of the sensor inputs, and with easy override defaults (e.g. control surface override) and auto-disable on disagreement.
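
A rough sketch of what a 3oo4 vote with a soundness check and auto-disable on disagreement could look like. Everything here is hypothetical (the function name, the 2º tolerance, the plausible range); it only shows the shape of the idea, not any real avionics code.

```python
from statistics import median
from typing import Optional, Sequence

# All thresholds below are illustrative assumptions, not real avionics values.
AGREE_TOLERANCE_DEG = 2.0            # how close readings must be to "agree"
PLAUSIBLE_RANGE_DEG = (-20.0, 40.0)  # crude soundness check on each input


def vote_3oo4(readings: Sequence[Optional[float]]) -> Optional[float]:
    """Return a voted AoA value, or None to signal "disable the function".

    A reading of None means the channel reported no data. The vote succeeds
    only if at least 3 of the 4 channels are plausible and lie within the
    tolerance of the median of the plausible channels.
    """
    if len(readings) != 4:
        raise ValueError("expected exactly four channels")
    lo, hi = PLAUSIBLE_RANGE_DEG
    sound = [r for r in readings if r is not None and lo <= r <= hi]
    if len(sound) < 3:
        return None  # too few usable channels: disable
    mid = median(sound)
    agreeing = [r for r in sound if abs(r - mid) <= AGREE_TOLERANCE_DEG]
    if len(agreeing) < 3:
        return None  # channels disagree: disable
    return sum(agreeing) / len(agreeing)
```

The point of returning `None` instead of a best guess is that the caller has to handle the "no trustworthy value" case explicitly, e.g. by disabling the automatic function and alerting the crew.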

What I don't get is how this drifting compounding boundary condition wasn't caught during formal modeling?


We don't do enough adversarial testing.

We are increasingly reliant on systems that have poor closed loop behavior with little to no memory. We need to design systems that are redundant, that have short and long term memories and have domain knowledge to compare their internal understanding of the universe with. And these autonomic systems need to alert the operators on what they are doing, why they are doing it and how to turn it off.


> Other developers will simply assume the previous developer was an idiot and bash in the fix.

I think it’s worth avoiding working in teams with devs that do that. It’s a nightmare.

The only excuse is an inexperienced dev, and this should be picked up during code review, where they should be told why it’s a bad idea to chuck in fixes without considering surrounding code.


I haven't seen anything blaming engineering for these problems, although I'm not saying those comments aren't out there. I think the impression I get is that Executives and Regulators are to blame. Engineering generally isn't in a decision making capacity, they are given problems and provide solutions in a pretty compartmentalized role. Sure they could have resigned if they had known what it was leading to (and perhaps that happened) but that is about the extent of pressure they can exert in these situations.


More information about the MCAS than you probably ever wanted to know: http://www.b737.org.uk/mcas.htm

That page includes this noteworthy and unusual design decision:

"MCAS is implemented within the two Flight Control Computers (FCCs). The Left FCC uses the Left AOA sensor for MCAS and the Right FCC uses the Right AOA sensor for MCAS. Only one FCC operates at a time to provide MCAS commands. With electrical power to the FCCs maintained, the unit that provides MCAS changes between flights. In this manner, the AOA sensor that is used for MCAS changes with each flight."


> the AOA sensor that is used for MCAS changes with each flight.

My first thought was that, in the Lion Air case, it happened both on the crash flight and the one before - but an attempt was made to fix the problem between flights, so the FCC may well have been powered down (alternatively, maybe both sensors were faulty.)
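
A toy model of the alternation described in the quoted page, assuming strict left/right alternation with power maintained between flights. The function names and the flight numbering are made up for illustration.

```python
from typing import List


def mcas_sensor_for_flight(flight_number: int, first_side: str = "L") -> str:
    """Side ("L" or "R") whose AoA sensor feeds MCAS on a given flight.

    Assumes continuous electrical power and strict alternation, which is a
    simplification of the behaviour described in the quote.
    """
    other = "R" if first_side == "L" else "L"
    return first_side if flight_number % 2 == 0 else other


def flights_exposed(faulty_side: str, n_flights: int) -> List[int]:
    """Flights on which MCAS would be fed by the faulty sensor."""
    return [f for f in range(n_flights)
            if mcas_sensor_for_flight(f) == faulty_side]
```

With one bad sensor and uninterrupted power, only every other flight would be exposed in this model, which is why a fault on two consecutive flights points at a power-down in between (or two bad sensors).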


One of the trends I find most disturbing in business over the last few years is the nonchalant passing of the buck on hard business problems, down the food chain to software engineers.

The Silicon Valley mantra of "software can change the world!" has infected every corner of our lives but frequently people misinterpret this as "software can solve anything! (so I don't have to)".

Software engineers also tend to eagerly say "yes" to solving every problem with code, when sometimes a problem just can't be solved with code. Thus compounding the issue.

I'd argue that many of the macro problems in our world right now stem from this cycle.

My PSA to all devs - if someone asks you to patch a major business problem with software, push back. Sometimes a puzzle to solve, isn't your puzzle to solve. Send it back up the food chain. You don't have to say yes to everything.


That's easy enough to say while talking about crashing airplanes. Harder when your H1B or family's dinner relies on you keeping your job.

I think everything keeps pointing to more punishment for management and corporate decisions. I mean management doesn't really do the work, they should at least be responsible. Otherwise it's just a system to attenuate blame.


The failure of the MCAS system does not indict using automatic controls to adjust the flight envelope of the airplane. Lots of systems do that already:

1. The autopilot

2. The feel computer

3. The device that reduces elevator authority at high speeds

4. The stall stick pusher

5. Hydraulically boosted controls

Modern jets would not be flyable without these, and the net effect of them is to make the jet much safer.

The failure of the MCAS system does not indict the purpose of the MCAS system, either. The problem with it was it continued operating with a failed sensor.


"Hey, Bob, we need you to write the software for this system. It's based on one, non-redundant sensor and can move the elevator trim to an extreme position. Sound good?"

"Sure, no skin off my nose."

Isn't software engineering a wonderful field to be in?


Except that's not really how it works

Bob to team: "What should happen when the two AoA sensors disagree?"

Team: "We should alert the pilot"

Manager: "We can't alert the pilot because the manual will need to change. What if we only use one sensor?"

Bob: SMH

Team: "That's not a good idea. We need redundancy."

Manager: "Well, we're not alerting the pilot. Use the one sensor."

Bob: Writes the code to use only one sensor.


This shouldn't happen in a vacuum. Who's writing the requirement? The test case?



Just in case anyone is wondering why more efficient engines are bigger: the energy is quadratic in speed (mv^2/2) while the momentum is linear (mv). For a given amount of energy (which comes from burning fuel) you can choose to push the airplane forward by pushing air back in 2 ways: 1. less mass, more speed, 2. more mass, less speed. It turns out 2 is better, for example you can push 4 times as much mass for half the speed, which results in twice the momentum of the air pushed backwards. Now the amount of air you can push is the amount of air you can get, and that's proportional with the front area of the engine. So, you always want to have as large an engine as possible. Bonus: the larger the engine, the slower the air moves through it, and so the less noise it produces. When you read that engines have become both more efficient and more quiet over the years, the second part was just a nice side-effect of the first.
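
A quick numeric check of the "4 times the mass at half the speed" example, in arbitrary units. The helper names are just for illustration.

```python
def exhaust_energy(mass: float, speed: float) -> float:
    """Kinetic energy given to the air: m * v^2 / 2."""
    return mass * speed ** 2 / 2


def exhaust_momentum(mass: float, speed: float) -> float:
    """Momentum given to the air: m * v."""
    return mass * speed


m, v = 1.0, 100.0            # arbitrary units
small_fast = (m, v)          # option 1: less mass, more speed
big_slow = (4 * m, v / 2)    # option 2: 4x the mass at half the speed

if __name__ == "__main__":
    print(exhaust_energy(*small_fast), exhaust_energy(*big_slow))
    print(exhaust_momentum(*small_fast), exhaust_momentum(*big_slow))
```

Both options cost the same energy, but the big, slow one delivers twice the momentum.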


This is one of the reasons I don't poo poo hybrid electric aircraft. With electric you can drive two or more fans off one turbine. Which allows you to increase the bypass ratio. As you mentioned the gains from that are quadratic where the efficiency penalty is linear.

Notably, using larger-diameter, high-bypass-ratio engines is what led to the 737 MAX design compromises.


Sure, it’s a system failure not strictly a software failure, but I don’t think the Boeing software engineers are off the hook here. Software is where the whole system comes together. Software is what can mitigate sensor failures. Software is the top of the stack that gets certified for reliability.

A good safety culture will not have even a whiff of a “not my job” attitude. The software team should never have signed off if they noticed that a single sensor failure could cause their “correct to spec” program to crash the darn plane (if that’s indeed what happened).


I think when people say software error it is in a general sense. It means the problem is in the software as opposed to hardware. There are different types of software errors. A software error can be a coding error or a bad requirement. In this case the requirements are the issue. In my career we had safety, systems and software engineers. An experienced software engineer might have challenged the requirement in this case but the design safety would fall more on the system and safety engineers.


I think a lot of the discussions are missing the point. The MCAS system itself is indeed just duct tape for a known design defect, ie using a new engine on an old body. It is like you replace a part in your car, find it overheating, and put an ice bag on it. The planned software “fix” is something like changing the volume of ice. I think it is a dead end and it is scary.


>ie using a new engine on an old body.

Would you ever be surprised if an old car got brand new tires? No? Then why do you find it so surprising that engine manufacturers would build new engines for existing airliner designs?


That's more like fitting oversized wheels / tires that will rub into the well / body every time you hit some proper bumps. Sooner or later they will fail, spectacularly.


Obviously the analogy breaks down once you start unpacking it.

Question to you though, what makes you so sure that this is in fact what happened here?


Tires, and a powerplant are two different domains. There's lots more at play with structure & what other parts can withstand. Resonance, materials, & aerodynamics all play a factor in the design process. Combine it with flying through the air in a seat instead of the ground, makes it all even more a factor.

Ever see Mazda Miatas with a Chevy LS motor? Kits are sold to adapt, especially now everything is drive by wire / software.


I understand your point, but this is the reality of modern aviation. New engines are released for existing airframes. In this case though, it wasn't an existing airframe. This was a new model built around new engines. Where the analogy starts to break down is that unlike putting new tires on, any face-lift that is done to an airliner is backed by testing and under the watchful eye of multitudes of regulators.


I would certainly be surprised if after installing the new tires you are required to install a new system to compensate the brake or your car may crash. The MCAS system is a clear indicator that the airframe and the engine are not compatible and yet they decided to do so for profit.


All of this stems from a pointy-haired marketing decision to push the MAX as an upgrade to current technology requiring no new training for airlines. If they had made the ethical decision to seek a new type rating and force every pilot to be trained in the MCAS system, 346 people would still be alive. Those executives have blood on their hands, and they know it.


The ET pilots were trained with the MCAS system.

But to speculate: ET pilots seem to have experienced issues before the MCAS would be active, so it's not unlikely that there was another issue with the plane, compounded by the MCAS kicking in when pilots were already fighting the aircraft...


No new information here, just another person pretending to be some authoritative source on this.

There's 0 proof the software worked correctly or that it's fit for purpose whatsoever.


Fixing an "aerodynamic" problem with a "software" solution is already cutting over to a different problem domain and it will lead to unforeseen circumstances. What can go wrong, will go wrong.

People at Boeing who made decisions for this project whether it is a team lead or a test lead or a project manager or a sales exec or a CEO; are all equally blamed for this. These deaths are on their conscience.


>Fixing an "aerodynamic" problem with a "software" solution is already cutting over to a different problem domain and it will lead to unforeseen circumstances.

I have a hard time parsing this. A modern airliner is a conglomerate of physical aerodynamic design, electronics and software. I am not convinced that something like MCAS is so out of the norm from modern aviation design principles.

>People at Boeing who made decisions for this project whether it is a team lead or a test lead or a project manager or a sales exec or a CEO; are all equally blamed for this.

Maybe. Or maybe there is no actual underlying problem. Or maybe the problem has nothing to do with the MCAS system. Let's wait a little and see how it plays out.


For your own peace of mind, I suggest that you look no further into what modern airliners have between the pilots and the control surfaces.


I really don't appreciate this attempt to shift blame away from any group and onto any other group. It's unprofessional and suggests that working out who we can point the finger at, and convincing people not to point the finger at our group, is more important than the tragedy that happened and trying to work out ways to take responsibility for that. I'm sure all systems involved in the failure could be improved in some way. To emphasize how one system is not responsible is not a very empathetic response.


The group you are complaining about, the software people, are already being falsely blamed. This is a rebuttal to that. Just refuse to be a doormat.


Accepting responsibility is not being a doormat. No matter what systemic faults were in play, the software was a part of it, and if the software engineers had made different choices - such as refusing to allow a flawed system to go forward - then the outcome would have been different.


> if the software engineers had made different choices - such as refusing to allow a flawed system to go forward - then the outcome would have been different.

You're assuming that the software engineers had sufficient information to identify the system as flawed.

The MCAS problems appear to stem from faulty sensor data; we don't yet know much more. However, suppose, for example, that the software engineers were told by the sensor manufacturer that when the sensor had an error, it would shut off entirely and no signal would be sent. If that was the case, it would be difficult for the software developers to foresee and account for incorrect sensor data, rather than just no data.

In something as complex as a commercial airplane, no one person can know all the systems. There has to be information "hand-offs", and it's understandable that the person receiving the information would rely on it.

It's not that different in more prosaic software development. If an API has a bug in it, it's hard to blame the API users for not accounting for the bug. You generally trust that the API does what it says it does.


> You're assuming that the software engineers had sufficient information to identify the system as flawed.

No, I'm assuming that the software engineers had sufficient information to know what the gaps in their knowledge might be.

Following your example, if the engineers were told by the manufacturer that an error in the sensor would result in no data rather than bad data, there should immediately be followup questions: What is the redundant source of data? What is the valid range of data? Is there a positive way to detect and identify errors? How should detected errors be handled? The answers to these questions should be provided by the manufacturer. It may not be the software engineer's responsibility to double-check all the answers, but they do need to check that they were answered in the first place.
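
A minimal sketch of what acting on those follow-up questions might look like in code: "no data" and "bad data" are treated as separate cases, and a redundant source is used as a cross-check when one exists. All names and limits here are hypothetical and not taken from any real sensor spec.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    OK = "ok"
    NO_DATA = "no data"
    OUT_OF_RANGE = "out of range"
    DISAGREE = "disagrees with redundant source"


@dataclass
class SensorSpec:
    """Answers the manufacturer would be asked to provide (hypothetical)."""
    valid_min: float
    valid_max: float
    max_disagreement: float


def check_reading(primary: Optional[float],
                  redundant: Optional[float],
                  spec: SensorSpec) -> Status:
    """Classify a reading instead of trusting it blindly."""
    if primary is None:
        return Status.NO_DATA          # the failure mode the spec promised
    if not (spec.valid_min <= primary <= spec.valid_max):
        return Status.OUT_OF_RANGE     # bad data, which the spec did not promise
    if redundant is not None and abs(primary - redundant) > spec.max_disagreement:
        return Status.DISAGREE         # plausible value, but contradicted
    return Status.OK
```

If any of the fields in `SensorSpec` can't be filled in, that is itself the gap in knowledge worth escalating.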

There absolutely needs to be information hand-offs; blindly accepting such a hand-off does not absolve you of responsibility.


> No, I'm assuming that the software engineers had sufficient information to know what the gaps in their knowledge might be. ... if the engineers were told by the manufacturer that an error in the sensor would result in no data rather than bad data, there should immediately be followup questions ...

Yes, of course there should be due diligence with any hand-off. However, let's assume for the sake of argument that there was, and the engineers using the sensor data received appropriate answers to their questions, and yet still the sensor did not perform as specified.

It's hard to blame someone who did their due diligence, did everything right, and relied on ultimately inaccurate information.


My point (poorly made) is that there is a difference between blame and responsibility. "Blame" is answering the question "who screwed up"; "responsibility" is answering the question "who is going to make this better".

The original tweet-stream post was making the argument that the software (and so naturally the software people) did nothing wrong and thus was not to blame, but also made the argument that since everything else was wrong ("not my fault!") that there was nothing the software people could have done to make it better, i.e., they have no responsibility.


sigh. It feels scary to feel blamed, I know, and whether you think you're a doormat or not, is important but it's not the main point. The point is, you're not the victims. The people who died are the victims. And pointing the finger, playing the blame game (instead of asking how can we do better in the wake of this tragedy) doesn't honour them, and it doesn't help. Just refuse to be the fake victim. That's weak. Don't make it about you, choose to make it about what can be improved.


What's the reason people write long stuff like this on Twitter? Literally unreadable.


With this line of argument pretty much nothing is a software issue since software is mostly there to compensate for something else: speed, errors, efficiency, manual labour, etc?

Highlighting the facts behind the design decisions of 737Max 8 is good for general knowledge but doesn’t help with much else in this context.

To follow this line of argument, I’d claim that this is the fault of old airports that didn’t have jetways so 737 had to be designed with lower body to allow folding stairways and so on...


Yes... and furthermore: it seems that a key problem was:

> MCAS can make huge nose down changes

This, to me, is really odd. All the hardware changes could not, AFAICT, require that. It seems really dumb that the MCAS system was made to be capable, in principle, of completely overpowering pilot input.

Is it a software problem...? Well, if the MCAS was limited to making only small changes in the stabiliser position, that could be counteracted by pilot's input to the elevators, these accidents would not have happened. AFAICT. It does seem that software contributed to the accidents.


I read the entire thread, but the summary is that this is a harsh indictment of Boeing, its handling of this aircraft and the accidents. It describes Boeing as cutting corners in many places, and makes it seem like what has happened was inevitable (in retrospect).


We don't have the official word on what happened yet.


I'm a frequent flier and I'm scared. I try to avoid companies with low standards, reading all the news about incidents related to the use of used or counterfeit parts, lack of maintenance, etc. But now it's clear that, in times of low-cost companies, cheap airplanes are requested and even the redundancy in critical subsystems is sacrificed, both by the producer and by the airline that didn't pay for an "optional" that actually is a lifesaver. How many critical subsystems lack redundancy to reduce the cost of airplanes? Maybe some regulation in this market is needed to avoid other disasters like this one, imposing standards for the critical systems and denying routes to airplanes that do not meet the specifications. I don't think that the market can play with human lives.


This kind of compromise is made all the time. In the past you needed 4 engines to cross oceans, now it is normal to just have two. It saves lots of money, and has been a massive success in terms of safety. The industry generally seems to get these compromises right when you look at the amazing safety record.


Be that as it may, statistically flying has never been safer. I think your fear is misplaced.


"we're ... called on to fix the deficiencies of mechanical or aero or electrical engineering"

As an embedded and firmware developer, I can tell you that this happens almost every day. If you ask how it is even possible to fix mechanical issues with software, know that it is true.

But, you know, this time the electrical engineer screwed up the power supply and there are noisy glitches everywhere; we can just fix it with software, they say. Or the mechanical engineer designed the plastic cover with the wrong material and the LED light comes out ugly: no problem, let's arrange the weirdest PWM sequence with SW so it looks nice.

This time, people died. Don't throw badly designed systems at us so easily because it's just software.


This makes Boeing sound like VW - “We don’t want people to have to refill AdBlue except at oil changes”


Do you mean DEF Blue?


Yes


I think this is perhaps a serious design flaw with the plane.

Boeing wanted to make the 737 more fuel efficient, but they didn't want to re-certify the frame, and design a new body. So, they put bigger engines on the wings. This sounds simple enough.

Except that the engines were too powerful for the frame to handle. So on take off, these extra powerful engines would push the nose of the plane up, to such an extreme angle that it could cause the plane to stall, and risk falling out of the sky.

In order to compensate for this, they introduced software and sensors that would mechanically adjust the flaps of the plane, in order to help "level out" the plane. This is probably ok for inherently unstable fighter jets, but for commercial aviation, a single crash is devastating.

So, this issue is not just a software defect, that can easily be fixed with code. This is a serious design flaw, where the planes are a death trap just waiting to happen. There is a mismatch between the geometric placement of the powerful engines, in relation to where it should be on the plane, in order to achieve balanced flight, without the need for software to auto-correct for an excessive nose-up pitch. It was probably only a matter of time, before sensors start to fail, and the software can no longer handle the situation.


This is incorrect. The engines are not too powerful for the airframe. The problem is that the engines themselves create lift at high Angles of Attack, pitching the nose of the plane up.

"This new location and size of the nacelle causes it to produce lift at high AoA; as the nacelle is ahead of the CofG this causes a pitch-up effect which could in turn further increase the AoA and send the aircraft closer towards the stall."

http://www.b737.org.uk/mcas.htm


If you have 20-ish minutes to spare, I suggest this video about the same topic by Mentour Pilot who is a 737 NG captain: https://www.youtube.com/watch?v=TlinocVHpzk

It's not very technical, but very easy to understand. Assumes you have some basic aviation knowledge e.g. what a stall is and how weight & balance affects flying.


I wonder how this will affect pre-sales and sales of the Boeing 797. Apparently, they're going to pull the trigger on whether to build it this year: https://en.wikipedia.org/wiki/Boeing_New_Midsize_Airplane

I think it would be a good decision to do this. Not only because the 757 design is 50 years old, there are no planes Boeing offers that easily substitute for the 757, and it would fit well alongside the business direction of the 787 (which has proven itself quite well), but also because it would be a completely new plane, with few to no band-aids. I would trust a 797 over a 757 refresh, because Boeing would be much more terrified of a new plane with so much invested capital never achieving market acceptance than an older plane that has already been sold with money in the bank.

I would also hope Boeing's sales/marketing department understands planes falling out of the sky is bad for current and future sales growth, and now appreciates the difference between a properly safe plane and an unsafe plane with lots of band-aids.


The 757 production line doesn't exist anymore, so a 757MAX is completely off the table. Boeing even refused to build more passenger 767s even though the line is still running (for freighters and the KC-46). The 797 as it is currently shown to prospective airlines is closer to a 767 replacement than a 757 replacement.

The real kicker is: if 737MAX becomes a hard case with lots of cancellations, or Boeing simply cannot sell it any further without cutting the price too much, Boeing will have to build a replacement from scratch sooner rather than later. The nickname for this project is NSA (New Single Aisle, I think). Or Boeing could try to build both at the same time (similar to what they did with 757 and 767).

The 797 is in an interesting situation: I believe Boeing is sitting on an incredible plane from a technical perspective, but the business case is hard to close. Of course the engineers want to build it: it's an incredible plane. But in my completely uninformed opinion it would be a mistake: no matter how great an airliner is from a technical perspective, and how alluring it is to engineers, it should not end up being a perfect solution looking for a problem.

Delta really wants the 797, but the design might be a little bit too US-centric, as if I understand correctly the capacity to haul cargo is sacrificed to keep flying costs low. But that makes it a complete no-go in the Asian market, and is arguably not very forward looking (assuming ongoing rise of cargo needs). If the business case is hard to close, Boeing should just move on and build the NSA.

Airbus made that mistake with the A330NEO. They didn't have a clear business case, but a couple of customers and lessors kept pushing because they really wanted it, so eventually Airbus agreed. At least it's a "cheap" mistake, compared to a clean sheet design...


Do you have pointers to A330Neo problems which put it into this fail bucket? I found stuff about delayed delivery, and I found some scuttlebutt about the RR engines, but I can't find something which says it's a fundamentally flawed idea. Bearing in mind that the 787 did not exactly have a stellar launch, having an Airbus A330 in the space feels to me logical: Many airlines have pilots trained in the A330.

Oh wait.. Is that what you mean? That there may be lurking differences in the flight envelope in a NEO to any prior experience on 330?


The problem is very simple: the A330NEO is not selling well at all. It was supposed to be a cheap alternative to the 787: not quite as good, but much cheaper. The problem is that Boeing has managed to reduce the manufacturing cost of the 787 so much that you can essentially buy a 787-9 for the same price as an A330-900.

The A350 is also suffering from the cheap price of the 787: it is too expensive, so Airbus has to work hard to lower the manufacturing cost...


This is declaration by fiat. Do you have pointers to back this up?

Web searches are much more equivocal. Many pro Boeing but not all. Observations that by type and training and flexibility an A330 fleet with a mix of ranges can suit.

Emirates has made big orders.


> If the pilots had correctly and quickly identified the problem and run the stab trim runaway checklist, they would not have crashed.

I'm curious how long it takes to run that checklist, and how much altitude would be lost while doing this? How long does it take to reach sufficient altitude to have time for this?

Also, I have a question about stall recovery and altitude. Are there any altitudes for which it is better to go ahead and stall and fall flat out of the sky than to nose down and risk flying into the ground at above terminal velocity? If so, do any automatic systems on any planes recognize you are in such a "must crash" situation and try to pick the least worst crash?


Video of training for runaway stabilizer trim: https://youtu.be/3pPRuFHR1co (time 2:45)

* The clanking sound is the stabilizer trim "runaway". In the video, it starts while the video is zoomed in; when the video zooms out you can see the trim wheels (next to their legs) spinning.

* The trainer (left hand seat) says "rudder", but he means "stabilizer" (he says it correctly later in the video)

* The pilots in a real plane would likely not hear the noise because they will have noise canceling headsets on, but the manual trim adjustment is the big wheel next to their leg that spins very visibly and they would feel the trim pushing the plane's nose down

* The stabilizer trim adjustment is relatively slow - it takes just under 10 seconds to travel end-to-end, so runaway time is going to be at least five seconds.


URL has a load of junk on the end of it.

Suggest change to: https://twitter.com/trevorsumner/status/1106934369158078470


I used to think the aerospace industry was the most safety-conscious industry because people trust manufacturers with their lives, but now Boeing is selling an essential feature like sensor redundancy as an option to make extra money.


Kind of how Mixpanel used to sell single sign on security at an extra price and free users didn’t get it.

Then they got hacked and shit blew up in their face.

Safety and security aren’t add-ons. It seems that in the name of making a bit of $$$, Boeing cut corners and led to loss of life.


Whenever there's talk about "causes" of things, it makes me wish that more people had studied Aristotle and his four causes:

* https://en.wikipedia.org/wiki/Four_causes

This was a pretty good example of material, formal, efficient, and final causes.


Given the relative shortage of talented software engineers in a world where software is eating the world, I find it worrying that aircraft are increasingly relying on software systems to make them airworthy.


Whilst this is an interesting read and is almost certainly largely true it does not entirely square with the fact that Boeing are working on (and the FAA expect to certify) a software fix by April - as noted in their press release.

Software bugs or not - it does seem a major factor was the lack of an extra "AoA sensor" and a "sensor disagreement indicator". Presumably a very low cost option in reality that Boeing should have made standard fitment at least for the first year or so whilst they worked out any kinks in the MCAS system.


The fact that Boeing is working on a software fix doesn't contradict the thread. It's explicitly mentioned in the thread.

https://mobile.twitter.com/trevorsumner/status/1106934422249...

> Nowhere in here is there a software problem. The computers & software performed their jobs according to spec without error. The specification was just shitty. Now the quickest way for Boeing to solve this mess is to call up the software guys to come up with another band-aid.

(some related follow up tweets)

> I'm a software engineer, and we're sometimes called on to fix the deficiencies of mechanical or aero or electrical engineering, because the metal has already been cut or the molds have already been made or the chip has already been fabed, and so that problem can't be solved.

> But the software can always be pushed to the update server or reflashed. When the software band-aid comes off in a 500mph wind, it's tempting to just blame the band-aid.

> Follow @davekammeyer if you want to dig in.


I don’t get this thinking. If you’re in the bandaid-making business, maybe make sure it doesn’t cause an infection?

In this case the software was developed to compensate for the system characteristic, and it did not fully do that. Of course, it is immensely frustrating that software is always called upon to do the papering over, but that is another issue.


Sure, I'm not necessarily endorsing the thread.

I'm just saying the existence of an in progress software patch in no way contradicts the thread itself.

It's part of the premise of the thread itself.


I was looking at it from the narrow view that it was to do A (the papering over), and it did not do that (fully).

On second thought, it’s more a systems engineering issue not to take that case into account. Software engineering doesn’t get off scot free though, as they are an important voice.

With the presumably tight engineering controls that are practised, I can speculate that it may have fallen into “the pilot disables and takes over control” branch. The gap would then be that they did not think that the airlines would be given the option to “not” install the sensor failure warning.


The software fix is probably further automation to disable the system when some other sensor indicates that it is misbehaving - i.e. it is probably a band aid on top of a band aid.


> Software bugs or not - it does seem a major factor was the lack of an extra "AoA sensor" and a "sensor disagreement indicator".

Per the thread, they were sold as optional extras, but not sold in these cases.



What worries me is if we're going to see some kind of relevant similarity with the 777X which afaik is 'type rated as the same as the existing 777', despite having new mechanical processes to fail (wing fold) and entirely new undercarriage.

Still, the 777X is some way from reaching customers, so maybe Boeing will spend some time contemplating the way they gamed type rating with the MAX before it does.


What do you expect? Boeing is so integrated with the government that it's hardly surprising that poor regulatory decisions influenced the crashes. In fact, that's like every project. Nobody is willing to assert themselves and say no. So they start integrating a bunch of unnecessary systems to compensate for flaws until you get one big, giant mess that you can't control with a deadline looming.


This story keeps getting worse and I was already shocked and stunned beyond belief.

- Boeing recycles the 737 airframe, moving the engines. This seems largely about reducing costs, decreasing time-to-market and (importantly) maintaining a common type rating.

- To compensate for the engines moving, which could cause the nose to dip, they add a software solution (MCAS) that could dip the nose without really telling any pilots or airlines. Worse, it's based on a single input (well, one of two, but it only listens to one at a time), this being the AoA sensor.

- Blaming pilots for the Lion Air disaster. Whatever the truth, that's certainly premature.

- Boeing refusing to ground the aircraft after the second crash.

- The FAA apparently complicit in this until it finally capitulates to the inevitable and grounds the plane after Europe and several others already have.

- The hubris of not wanting to appear wrong or like they're capitulating to public pressure, Boeing sticks to their guns til the bitter end.

- An AoA sensor upgrade as an option for what is arguably a critical system.

What's also fascinating is all the Boeing apologists who have come out of the woodwork (eg [1]). I've seen comments about how the airlines "demanded" the 737 MAX. There might be a demand for a low-cost narrow body passenger jet and I'm sure that's the reason the 737 MAX was developed. Anecdotally, it seems to be terrible for passengers (eg [2]), which would certainly be compatible with the idea that this is a low-cost solution.

It's also worth mentioning the rudder issues of the 737 that were posted here a few days ago [3].

I honestly don't understand how Boeing's management can be so reckless with the hard-earned reputation for safety. They've done so much damage to their brand with this that if it wasn't for the fact that hundreds of people have died here, Airbus would be laughing all the way to the bank (or at least it would take the edge off the giant A380 boondoggle).

As much as pilot error has been a significant cause of air disasters (eg experienced pilots pulling the plane up to cause a stall as in the Air France crash), you get a sense of how hard it would be to fully automate piloting a plane. What I find disturbing is how hard overriding automated systems seems to be. When a plane's automated systems fail, shouldn't a pilot be able to easily take full manual control? I would've thought so. You see examples of this like Qantas Flight 72 [4].

And flying a plane is in some ways a much simpler problem than driving a car. You take off, you fly a predetermined route and you land. There are some adjustments for weather and other factors and occasionally you have to turn around or deviate and make a landing. I'm obviously oversimplifying here, but cars seem to have so many more corner cases. People seem to think autonomous cars are right around the corner. I'm not so sure.

[1] https://news.ycombinator.com/item?id=19389791

[2] https://thepointsguy.com/2017/11/first-look-aa-boeing-737-ma...

[3] https://news.ycombinator.com/item?id=19385980

[4] https://en.wikipedia.org/wiki/Qantas_Flight_72


> To compensate for the engines moving, which could cause the nose to dip

The engines could cause the nose to go up, leading to a higher chance of stalling the plane. The nose would pitch up because the engines sit below the center of gravity, which is the point the plane rotates around when engine power increases. To offset that, they came up with the idea of changing the trim.
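
The lever-arm argument can be put as a back-of-the-envelope calculation: thrust acting some distance below the center of gravity produces a nose-up moment equal to force times offset. This is only a sketch of that one effect, with made-up numbers (the thrust and offset values are assumptions, not 737 MAX data), and it ignores aerodynamic contributions entirely.

```python
def thrust_pitching_moment(thrust_n: float, offset_below_cg_m: float) -> float:
    """Pitching moment in N*m from thrust acting below the center of gravity.

    Positive means nose-up. A thrust line below the CG (positive offset)
    gives a nose-up moment; a thrust line through the CG gives none.
    """
    return thrust_n * offset_below_cg_m


# Hypothetical illustration values, not real aircraft figures.
cruise = thrust_pitching_moment(thrust_n=100_000.0, offset_below_cg_m=1.5)
full_power = thrust_pitching_moment(thrust_n=200_000.0, offset_below_cg_m=1.5)

print(cruise)      # 150000.0
print(full_power)  # 300000.0
```

Doubling the thrust doubles the nose-up moment, which is the tendency the trim change is meant to counter.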


There's some discussion here although the post was mysteriously shadow-banned from Reddit: https://old.reddit.com/r/programming/comments/b1f5zh/the_737...


* Management problem. The senior executive who smooth-talked every department into bending their own rules, using phrases like "working together as a team", "focusing on the solution, not the problem", "agile" and "MVP", was hailed as a hero and financially rewarded.


It's not a software problem. It's a software engineering problem. It's the attitude of "it met the specifications, so I did my job and it's not my fault" that separates this kind of software "engineer" from the likes of William LeMessurier and Bob Ebeling.


Are attitude indicators or gyroscopes used as inputs to automated systems? Or are only external sensors used?


If you found it difficult to read the twitter thread, this is the same thing in blog format. https://threadreaderapp.com/thread/1106934362531155974.html


Yet more deaths because people aren't looking at what the software controlling their lives actually does (in this case, ignoring extra sensor readings that could indicate a failure of one of the sensors.)

I feel like it's getting better in most industries but not in things like aerospace.


Likely, the reason the 737max story receives so much attention is because software devs (consciously or not) feel this could affect our industry.

There may even be some guilt involved (justified or not), if it involves software in any meaningful way.

Various community members have been warning for some time that we'll face regulation sooner or later; all that needs to happen is a sufficiently large disaster. The dependence between life-critical hardware and software will only increase in scale.

Whether or not this begins our "Iron Ring" moment, I think it's something devs implicitly feel, and is culturally resonant for them.

---

> my brother in law @davekammeyer, who’s a pilot, software engineer & deep thinker.

> I'm a software engineer

The thread does feel a little defensive, no?

I'm not saying that software was the cause, or even the main cause; though even if the other causes appear to be the precipitating factors, we should be on the lookout for defensiveness, without knowing the _whole story_.

* I don't care that in this case, the software was not to blame - that is not the main point I am making.


I wouldn't say defensive, but rather informative.

The main point of that thread is not actually to "solve the mystery" of the crash, or even to point fingers at where the fault lies.

The point I took away from the thread is to show that these issues are complex, and there is never one single thing you can point to as "the problem". In most cases, it is really a series of interconnected events.

Our media (and I think many of us - so I'm not singling them out) loves to simplify problems in an effort to make them understandable by the average person, and while that may be necessary for them to get people to pay attention, it does us all a disservice in the long-term I think.


...as long as you aren't pointing at the software engineer, whose products did exactly what they were supposed to do.


Ah yes.. the code meets the requirements. It follows the spec. No further involvement necessary.

And if Boeing does release a 'software upgrade', what is there to say about why the software wasn't required to be that way in the first place?


>Various community members have been warning for some time that we'll face regulation sooner or later; all that needs to happen is a sufficiently large disaster.

I'm reading that line as if it is a bad thing. I don't see why and couldn't disagree more. Software failures should be handled just the same as hardware failures, or more harshly, since software is much easier and faster to update. Bad enough bugs should be handled like they are in hardware, i.e. cars that are pulled off the market if they pollute too much, planes that are grounded, etc. Throwing your hands in the air saying "we don't support version X" shouldn't be an excuse. If a car has a 3-year warranty and it starts going full throttle and killing people, or you can exploit a Jeep to disable its brakes, the manufacturer should be just as liable whether it happens on day one or in year 10, and regardless of whether the error is in hardware or software.

If IoT and router device manufacturers had to pay for the damage their unpatched devices cause, the devices would get fixed, unlike now, where even my consumer-grade managed Cisco switch has had exactly zero firmware patches since I bought it.


Your points precisely match the various community members I alluded to, calling for best practices and safety.

_I_ wasn't implying that regulation was a bad thing, but I do think that the industry, or at the very least, established entities in tech, fear it.

