
How the Boeing 737 Max disaster looks to a software Developer - pross356
https://spectrum.ieee.org/aerospace/aviation/how-the-boeing-737-max-disaster-looks-to-a-software-developer
======
ncmncm
This is a very good analysis, but fatally incomplete.

One really essential reason those planes crashed was that each time the MCAS
triggered, it acted like it was the first time. If it added 1 degree of trim
last time, it adds a second this time, a third next time, up to the five
degrees that runs the trim all the way to the stops.
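This statelessness can be sketched in a few lines of Python. Illustrative only: the constants come from the figures discussed in this thread, and nothing here is Boeing's actual logic.

```python
# Illustrative sketch, not Boeing's code: a stateless trigger that adds a
# fixed nose-down increment on every activation, with no memory of what it
# has already commanded, walks the stabilizer to the stops.
TRIM_STOP = 5.0   # assumed stabilizer travel limit, degrees
INCREMENT = 2.4   # nose-down trim per activation, degrees (post-test figure)

def stateless_mcas(activations):
    """Each trigger acts as if it were the first; increments accumulate."""
    trim = 0.0
    for _ in range(activations):
        trim = min(trim + INCREMENT, TRIM_STOP)
    return trim

# Two activations already put the trim near the stops (4.8 of 5.0 degrees);
# three pin it there.
```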

A second reason is that, under the design still on file at the FAA, it could
only add a maximum of 0.8 degrees (each time). This was raised to 2.4 degrees
after testing, so only two hits could, in principle, put you almost to the
stops.

A third was that the only way to override the MCAS was to turn off power to
the motor that worked the trim. But above 400 knots, the strength needed to
dial back the trim with the hand crank was more than actual live pilots have,
especially if it is taking all their strength to pull back on the yoke.

A fourth was that, with two flight control computers, the pilot could (partly)
turn off a misbehaving one, but there is no way to turn on the other one. You
have to land first, to switch over, even though the other is doing all the
work to be ready to fly the plane.

A fifth was that it ignored that pilots were desperately pulling back on the
yoke, which could have been a clue that it was doing the wrong thing.

A sixth was that, besides comparing redundant sensors, it could have compared
what the other flight computer thought it should be doing.

~~~
kakwa_
This analysis is completely right, but in my opinion, focuses too much on the
technical aspects.

Is MCAS a hack? yes. Is it fixable? yes. Will the 737 MAX continue to fly for
two to three decades after all the items above have been addressed? yes.

But from an engineering perspective, bolting on an additional system to "fix"
another system always feels a bit weird. Sometimes it's not avoidable (e.g.
cooling), but when it is avoidable, something is at least a bit wrong. A few
hacks like that are manageable, but with too many, you dramatically increase
the chances of one of them misbehaving.

And if an organization pushes hard for this kind of hack, as Boeing did, the
issue is not even technical.

The story of MCAS reminds me a little of the MD-11. The DC-10, as a tri-jet,
could not really compete with the new twin-engine airplanes of the late '80s
and early '90s in terms of fuel consumption, but McDonnell Douglas tried
anyway. They optimized the wings, changed the engines, added winglets, and,
most significantly, reduced the size of the horizontal stabilizer. This made
the MD-11 quite hard to land, as it needed to come in at a very high speed for
a wide-body jet. It was a contributing factor in several accidents (FedEx 14,
FedEx 80, Lufthansa Cargo 8460), and pilot training and technical fixes never
fully compensated for the design flaw. In the end, the aircraft also failed to
reach its target fuel consumption. However, it's still flying today, and it's
still a workhorse for cargo companies.

~~~
ncmncm
Ultimately it was not a technical failure: the failure was in allowing the
plane to be sold with such a thoroughly bad design. Thus, a management and
regulatory failure. Regulatory, because FAA signed off on it, obviously
without applying any of the process that would have prevented it. Management,
because the cost of this debacle will be many, many times what they saved by
trying to skate by with a faulty design.

~~~
Cacti
The airlines knew what they were doing when they didn’t pay for the “upgrade”
and they doubly knew when they didn’t pay for it after the first crash.
Airline management is just as culpable in this debacle.

~~~
kadendogthing
Yeah, they willingly bought these planes. Airlines should be held equally
responsible.

~~~
bronson
You’re saying that Boeing intentionally sold unsafe airplanes? And the
airlines, knowing this, still bought them?

~~~
kadendogthing
I'm giving money to the airlines. Not Boeing.

>You’re saying that Boeing intentionally sold unsafe airplanes?

Yes.

>And the airlines, knowing this

You know airlines have their own engineering teams, right? And these planes
were bought explicitly to save money.

------
DuskStar
This article reads well. Unfortunately, it's _filled_ with fundamental
mistakes.

> In the old days, when cables connected the pilot’s controls to the flying
> surfaces, you had to pull up, hard, if the airplane was trimmed to descend.
> You had to push, hard, if the airplane was trimmed to ascend. With computer
> oversight there is a loss of natural sense in the controls. In the 737 Max,
> there is no real “natural feel.”

Whoops. The 737 is one of the few airliners produced today that DOES still
directly connect the pilot's controls to the flying surfaces. There are
literally 12 mm cables connecting the yoke to the control surfaces. So, in
fact, the 737 does have "natural feel". Indeed, that's the whole problem MCAS
was designed to solve - how to add force to the yoke at high AOA in certain
speed regimes, in order to ensure a linear AOA response. (Certification
requires something along the lines of "5 pounds of force -> 5 degrees AOA, 10
pounds of force -> 10 degrees AOA, etc.") If the 737 were fly-by-wire, this
force could be added directly in software. Instead, they added it by changing
the aircraft's trim.
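The certification idea described here can be illustrated with a toy model. All numbers below are invented; `bare_airframe_force` is a made-up curve, not real 737 data.

```python
# Toy model of the stick-force linearity requirement: column force should
# rise roughly in proportion to AOA. Suppose the bare airframe's force curve
# flattens above 10 degrees AOA; something (trim, on the 737 Max) must add
# back the missing apparent back-pressure. All numbers are invented.
def bare_airframe_force(aoa):
    """Assumed stick force in pounds: linear, then flattening at high AOA."""
    return aoa if aoa <= 10 else 10 + 0.2 * (aoa - 10)

def required_augmentation(aoa):
    """Extra force needed to restore the linear force == AOA relationship."""
    return max(0.0, aoa - bare_airframe_force(aoa))

# At 14 degrees AOA the toy airframe supplies only 10.8 lb, so the
# augmentation system must contribute the remaining 3.2 lb.
```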

> In a pinch, a human pilot could just look out the windshield to confirm
> visually and directly that, no, the aircraft is not pitched up dangerously.

No, you can't. There isn't a visual indication for AOA - the author is
confusing this with pitch. You can have a 45 degree AOA while still having the
nose of the plane pointed at the horizon. (The author mentions using the
artificial horizon for this purpose, too) If you could read AOA from sensors
that didn't require being exposed to the airflow, planes would use them.

I'm sure there's more, and it greatly reduces my confidence in the article.

~~~
lisper
> There isn't a visual indication for AOA

Private pilot here. What you say is, strictly speaking, true but that doesn't
mean that you can't tell an awful lot about what's going on by looking out the
window, and, more importantly, at the other flight instruments. If your
attitude, airspeed, and rate of climb are all looking normal, then if the AOA
says you're stalling, it's almost certainly wrong. And if it says that your
AOA is 70 degrees (which the Ethiopian Airlines AOA sensor did), then it is
_definitely_ wrong.

MCAS was designed to blindly trust a single AOA sensor, which was known to be
prone to failures. To call that inexcusable would be quite the understatement.
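The missing cross-check is simple to express. A minimal sketch with assumed thresholds: the 5.5 degree disagree limit is roughly what Boeing's announced fix uses, and the plausibility bound is invented.

```python
# Sketch of the sensor cross-check MCAS lacked: with two AOA vanes, a large
# disagreement (or a physically implausible reading) should inhibit automatic
# trim rather than let the system trust one value blindly.
DISAGREE_LIMIT = 5.5   # degrees; roughly the threshold in Boeing's later fix
PLAUSIBLE_MAX = 30.0   # assumed bound; a 70-degree reading is not plausible

def mcas_may_activate(aoa_left, aoa_right):
    if abs(aoa_left - aoa_right) > DISAGREE_LIMIT:
        return False   # vanes disagree: act on neither
    if max(aoa_left, aoa_right) > PLAUSIBLE_MAX:
        return False   # reading is physically implausible
    return True

# mcas_may_activate(4.0, 4.5) -> True
# mcas_may_activate(70.0, 4.5) -> False (the Ethiopian-flight scenario)
```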

~~~
DuskStar
>If your attitude, airspeed, and rate of climb are all looking normal, then if
the AOA says you're stalling it's almost certainly wrong.

Agreed - you can certainly sanity check the AOA data with other data sources.
Unfortunately, I'm not sure that will always let you pinpoint AOA as the cause
- imagine some nice dirt-loving wasps have built nests in all of your pitot
tubes, or they've iced over, and now those values are locked in place. So your
IAS and altimeter could both be reading normal despite the fact that you're in
a dangerously fast descent at a high angle of attack. (A radar altimeter
wouldn't have this issue, of course) Things like this have happened before. Of
course, this would require your engines to have failed at approximately the
same time as your airspeed and altitude data, but I can imagine failure modes
in which that would be the case. (A particularly stupid autopilot attempting
to reduce speed but not seeing the speed decrease, repeat in loop)

> And if it says that your AOA is 70 degrees (which the Ethiopian airlines AOA
> sensor did) then it is definitely wrong.

Or it's accurate and you're about to die unless you fix it. Air France Flight
447 comes to mind. (though 70 degrees is probably past the point of being
recoverable)

> MCAS was designed to blindly trust a single AOA sensor, which was known to
> be prone to failures. To call that inexcusable would be quite the
> understatement.

No disagreements here.

~~~
lisper
> imagine some nice dirt-loving wasps have built nests in all of your pitot
> tubes

 _All_ of them? And I didn't notice any of them during preflight? And none of
them were there on the previous flight? And I didn't notice that the airspeed
was not alive on the takeoff roll? Not going to happen.

> or they've iced over

Again, pretty freakin' unlikely on takeoff flying out of Addis Ababa. And this
is another thing a pilot can rule out by looking out of the window. If you're
not in clouds, you're not picking up ice.

Also, there are OAT (outside air temperature) sensors.

> your engines to have failed

Another thing that would be pretty apparent to the pilots.

No matter how you slice it, Boeing screwed the pooch bigly.

~~~
DuskStar
> > imagine some nice dirt-loving wasps have built nests in all of your pitot
> tubes

> All of them? And I didn't notice any of them during preflight? And none of
> them were there on the previous flight? And I didn't notice that the
> airspeed was not alive on the takeoff roll? Not going to happen.

You underestimate the abilities of people to fuck things up. Birgenair Flight
301 - one tube blocked by mud dauber wasp, leading to crash. [0] Later that
year was the crash of Aeroperú Flight 603 - the static ports were covered with
tape, which was never realized by the pilots. [1] (They continued flying as if
their altitude and airspeed readings were accurate)

> > or they've iced over

> Again, pretty freakin' unlikely on takeoff flying out of Addis Ababa. And
> this is another thing a pilot can rule out by looking out of the window. If
> you're not in clouds, you're not picking up ice.

> Also, there are OAT (outside air temperature) sensors.

Not applicable in this instance, sure. But "are there clouds around" isn't
something I'd expect the flight computer to be able to reliably detect, and
that's what would have to make the decision here.

> > your engines to have failed

> Another thing that would be pretty apparent to the pilots.

I'd agree that it SHOULD be.

> No matter how you slice it, Boeing screwed the pooch bigly.

Yep. But there's still more complexity here than people seem to want to
acknowledge.

0:
[https://en.wikipedia.org/wiki/Birgenair_Flight_301](https://en.wikipedia.org/wiki/Birgenair_Flight_301)

1:
[https://en.wikipedia.org/wiki/Aeroper%C3%BA_Flight_603](https://en.wikipedia.org/wiki/Aeroper%C3%BA_Flight_603)

~~~
lisper
No technology is ever going to completely protect you against an incompetent
pilot.

------
rkagerer

      I believe the relative ease — not to mention the lack of
      tangible cost — of software updates has created a cultural
      laziness within the software engineering community.
    

-- This --^

As someone who carefully crafts their code and strives for perfection, seeing
sloppy work out there in the wild drives me nuts. I know folks here will
deride me for being "inefficient", but in my experience it's less efficient in
the long term to push out buggy software and try to fix it retrospectively.

~~~
amelius
> As someone who carefully crafts their code to strive for perfection, seeing
> sloppy work out there in the wild drives me nuts.

Do you use formal methods to prove the correctness of your code?

Because that is what Boeing engineers do (I hope).

~~~
Raidion
It's not the logic that's the problem so much as the fact that the systems
were meant to work with clean data, and the possibility that data could be not
just missing but straight-up incorrect was never considered. This was made
worse by the fact that the poorly formulated "solution" was barely
communicated to the pilots. I saw somewhere that the plane could become
unrecoverable in 40 seconds. You need bigger safety margins than that.

~~~
ncmncm
40 seconds is an eternity to a pilot taking off or landing. Even Air France,
taken down by a plugged pitot tube from cruise altitude, was doomed after only
two minutes.

Airbus has a lot to answer for on that one: averaging inputs from the pilot's
and copilot's game controllers? Game controllers? Turning off the stall
warning in deep stall, so that starting to recover sounds it again? Failing to
teach pilots what stalls are, what they feel like, and how to prevent and
recover from them?

There is more than enough blame to go around for that one.

The Boeing software, like the Airbus's, apparently performed as specified, so
there was no problem with execution of the spec. The problem was that it was a
bad spec, in too many ways to count.

------
wpietri
What a stellar example of an article on a complex topic written to be clear
enough for the audience to understand. I especially like the way he brought it
back repeatedly to hands out the window and bitey dogs.

~~~
wpietri
I also heartily agree with him that software's general laxity with regard to
reliability is contagious. I've come to think that calling it all "software"
is dangerous, like thinking of all things made of atoms as the same.

I think we as an industry should get together, divide the work into various
domains, and establish professional and ethical standards for the domains that
matter. Standards with teeth, such that developers who want to do the right
thing in the face of bosses insisting otherwise have the backing of their
peers. And also such that developers who don't care about the right thing fear
the professional consequences.

I honestly think this is something we should do regardless. But as a practical
matter, if we don't governments will do it for us soon enough. Software keeps
getting more important, as do its failures.

~~~
ubertakter
Something like (or exactly) software systems safety methods should be applied
to critical systems (such as aircraft systems). DoD does this for all of their
critical software. And I say "something like" only to indicate there may be
something particular about software in aircraft. I doubt it though. And as the
article indirectly points out, analysis was severely lacking.

DoD Software System Safety handbook:
[https://www.acq.osd.mil/se/docs/Joint-SW-Systems-Safety-Engineering-Handbook.pdf](https://www.acq.osd.mil/se/docs/Joint-SW-Systems-Safety-Engineering-Handbook.pdf)

Full disclosure: The company I work at does this type of work. I don't work in
that group.

------
ReGenGen
The 737 Max keeps getting viewed as an "engineering failure", but we should
consider whether this was really a "management" failure. The unstated goal
with MCAS was to avoid additional pilot type certification. (If you tell
pilots about MCAS, give them an off switch, or if the system switches itself
off... then pilots need to be trained specifically on the MAX.) If management
had given the MCAS project to senior Boeing engineers, they would likely have
pushed back, jeopardizing the unstated goals. Executives likely steered this
project to junior engineers or yes-men... who delivered a solution that
avoided pilot training.

~~~
theclaw
The article states that MCAS was implemented "on the hush-hush," which makes
me wonder if it could even be subject to the same level of quality control as
other features of the software.

It might have had to bypass some of the more stringent parts of Boeing's
development process to avoid appearing in documentation that the customer or
the FAA might see.

~~~
ncmncm
This appears to be what happened.

If so, it amounts to criminal negligence. People should go to jail, but if
anybody does, it will certainly not be the ones ultimately responsible. Most
likely Boeing will pay fines and court judgments, something probably already
factored into their stock price, impacting people holding the stock this year,
not those who might have demanded better management five years ago.

Certainly the whole top tier of management should be fired, but that won't
happen either.

~~~
jeremyjh
How about the fact that one plane full of people going down was not enough to
wake everyone up? No, we needed two planes to go down, and still the FAA was
telling us we had nothing to be concerned about. The biggest problem here is
not the engineering or even the management. It's regulatory capture.

------
_bxg1
"Various hacks (as we would call them in the software industry) were
developed."

Dear God. That's a sentence I never, ever wanted to hear about an aircraft.

~~~
WalterBright
All engineering designs are full of compromises (the article uses "hacks"
meaning compromises), as there are a large number of competing issues at work.
Pretty much none of those issues are ever aligned along the same axis.

For just a taste of this, an airliner flies at high altitude, and at low
altitude. It flies at low speeds, and high speeds. It flies heavily loaded and
empty. Optimizing for any one of these regimes means unacceptable behavior on
the others. So compromises are necessary.

Ever notice the flaps on the wings? They're a compromise (a "hack" if you
will) to change the shape of the wing to make it work better across different
flight regimes. Because metal isn't very flexible, and a long list of other
issues with the machinery that operates the flaps, the shape of them is hardly
anything but compromises.

~~~
_bxg1
I don't even work on life-or-death machinery, only JavaScript UI's, but if a
higher-up requested a new feature from me that would require multiple
cascading changes just to keep other things from breaking, I would fight tooth
and nail to talk them out of it. The compromises you're talking about are
fundamentally unavoidable ones that come directly from the constraints at
hand. A hack is something that degrades the integrity of the overall system
for the sake of a short-term feature. Hacks are the payday loans of technical
debt.

~~~
WalterBright
They're the same thing. BTW, many aircraft crashes were due to a pilot doing
what would have been the right thing on another design they were familiar
with. Boeing's plan to make the MAX behave like other airplanes is a
reasonable plan to improve safety.

The "multiple cascading changes" is always an issue with a complex design, and
in fact Boeing's approach with the Max was choosing a route which minimized
those cascading changes.

~~~
mcny
» They're the same thing. BTW, many aircraft crashes were due to a pilot doing
what would have been the right thing on another design they were familiar
with. Boeing's plan to make the MAX behave like other airplanes is a
reasonable plan to improve safety.

» The "multiple cascading changes" is always an issue with a complex design,
and in fact Boeing's approach with the Max was choosing a route which
minimized those cascading changes.

I think the charge people are bringing is that Boeing wanted to make the new
airplane acceptable to airlines by making the 737 Max the same as the old
plane, so there was no need to certify pilots for the new plane. My vote is
that this is criminal negligence at best, and under an ideal government the
then-CEO and board would be in jail pending charges right now.

~~~
WalterBright
You wouldn't have any planes (trains or automobiles) under such a standard.
Keep in mind that horses killed far more people per mile than planes, trains
or automobiles ever did.

------
laydn
The article states: "When MCAS senses that the angle of attack is too high, it
commands the aircraft’s trim system to lower the nose. _It also does something
else: It pushes the pilot’s control columns downward_ "

No, it does not push the pilot's control columns downward.

~~~
Declanomous
I think the author has confused nose-down trim - which means more force is
required to pull back on the yoke - with the plane actively pushing the yoke
forward.

~~~
cjbprime
Yeah, maybe also some Boeing/Airbus confusion. The author doesn't seem to
understand the extent to which the 737 cockpit is mechanical, with cables
running to control surfaces. It's an important point because it likely
explains why the Ethiopian pilots were unable to regain control after
disabling electric trim, becoming overpowered by aerodynamic load on the
stabilizer.

~~~
mfer
In newer 737 versions it's no longer cables...
[http://www.boeing.com/commercial/aeromagazine/aero_02/textonly/sy01txt.html](http://www.boeing.com/commercial/aeromagazine/aero_02/textonly/sy01txt.html)

~~~
ReGenGen
FYI That doc is about "Propulsion Control Systems" not elevator trim or
control surfaces.

------
raz32dust
Could someone please ELI5 what exactly went wrong that caused the crash? I
read the article but I still don't understand if the crash was caused by

(a) The MCAS doing something it wasn't programmed to do (by specification)

(b) MCAS was working as expected, but the expectations/assumptions were wrong
(excluding pilot mistakes).

(c) MCAS worked as designed, and the design was correct IF the pilots behaved
like Boeing expected them to.

Or has this question not been answered yet? I am talking purely from the
software correctness point of view, irrespective of the fact that using
software to work around this problem was already bad design.

~~~
ncmncm
It is none of the above.

The MCAS worked as specified. The specification was criminally stupid. It
violated at least two bedrock principles of avionics design. It would not have
been approved at all if Boeing had not bent over backwards to draw attention
away from it, for fear that pilots might have needed extra training, or the
plane more examination.

------
wnevets
>Unfortunately, the current implementation of MCAS denies that sovereignty. It
denies the pilots the ability to respond to what’s before their own eyes.

>In the MCAS system, the flight management computer is blind to any other
evidence that it is wrong, including what the pilot sees with his own eyes and
what he does when he desperately tries to pull back on the robotic control
columns that are biting him, and his passengers, to death.

Wow, I had no idea the issue with the 737 Max was so nuts. I can't imagine
what the pilots were going through when this was happening.

~~~
ncmncm
Well, no robotic control columns, but the effect of silently adding
uncommanded trim is the same.

------
NoNameHaveI
So, the linked article is an updated version of an earlier article in EE
Times. In the comments section, a reader wrote: "Frankly, I am astonished that
a single point of failure (AOA) could make it through an FMEA (Failure Mode
Effects Analysis)." For those who are unfamiliar, an FMEA is (basically)
looking at each part of a system and asking "What happens if it breaks?"
Having worked in commercial vehicle software development, I am astonished as
well.

~~~
theclaw
I don't work in this field. Do you have to publish the findings from this
analysis? Is it possible that no such analysis was done because the MCAS
system had to be kept quiet?

~~~
cjbprime
Yes, you do, as part of certification. Boeing misrepresented the degree of
control MCAS has and was able to avoid making it a redundant system as a
result. Layperson's description here:

[https://www.seattletimes.com/business/boeing-aerospace/failed-certification-faa-missed-safety-issues-in-the-737-max-system-implicated-in-the-lion-air-crash/](https://www.seattletimes.com/business/boeing-aerospace/failed-certification-faa-missed-safety-issues-in-the-737-max-system-implicated-in-the-lion-air-crash/)

------
Xcelerate
Just a meta-comment, but I've noticed that for every single article posted
about the 737 Max on HN, the top comments all tend to say that something in
the linked article is dramatically wrong. I'm not sure what this means, but I
find it interesting.

~~~
cjbprime
I think the takeaway is that it's not usually the case that something this
technical makes it to world news. Everyone wants to have an opinion, but even
in this article's case of someone who's both a pilot and programmer, they
apparently know next to nothing about how a 737 MAX is controlled
(mechanically, not with forces artificially applied to inputs) because they
fly a Cessna. And there aren't that many 737 pilots with the time or ability
to write something frank about this without perhaps getting in trouble with
their employers.

------
murkle
> In the 737 Max, only one of the flight management computers is active at a
> time—either the pilot’s computer or the copilot’s computer. And the active
> computer takes inputs only from the sensors on its own side of the aircraft.

~~~
paulopontesm
Also wondered what was the logic behind this decision...

~~~
digikata
I suspect combined reliability of a simple switch and two independent systems
is likely higher than one composite system and pilots or software trying to
estimate and select which combinations of computers & sensors are "good" in
the middle of an emergency.

~~~
airbreather
And on the surface this looks like a reasonable comment, but it is exactly why
there is a whole branch of engineering dedicated to understanding how to build
safer systems. Counter-intuitive results abound.

So many issues - a simple switch usually has poor diagnostics, at least in one
mode of failure, so you don't know it has failed until it is too late. A
continuous measurement device connected to a computer (or computers) will have
a vast array of available diagnostics, most probably leading to fewer
"dangerous undetected failures" than a simple switch, or a combination of
them.

And "independent systems", sounds easy, but in practice full independence is
almost impossible to achieve, and messy unpredictable humans dominate the
common cause failures that overlap these systems.

There is more, much more, but this is why it is hard to write readable
articles about these things; so much of the devil is in the detail that is
hard to explain in bite-sized portions.

~~~
digikata
"A continuous measurement device connected to a computer/s will have a vast
array of available diagnostics, 'most probably leading to less "dangerous
undetected failures" than a simple switch, or combination of."

Isn't this exactly the approach that failed in the MCAS system? And if you had
a switchable independent system, a copilot would have righted the plane and
flown on.

But really I agree with your overall comment, it's very difficult to know why
a given safety design decision was made unless you are well steeped in the
system - there are almost always little corner tradeoffs. That's why I added
the "I suspect" to the front of my comment.

~~~
airbreather
This is the approach that IEC 61508 leads you down by the numbers, but it is
also always better to cover off the unknowns with redundancy (multiple
sensors) and diversity (different kinds of sensors) wherever practical.

However, more instruments mean more potential disagreements, so more
complexity of possible outcomes/actions/diagnostics etc.

It becomes a balance for the best outcome, and surprisingly, when you go
through all the factors there is still quite a bit of subjectivity; sometimes
the numbers for failure rates are so low that the calculations become
extremely sensitive.

Additionally, there is always the beta factor, which allows for common-cause
failures between instruments/systems. Beta factors are often the dominant
factor numerically in a performance calculation, but they are a) essentially
traceable back to issues with humans (design, installation, maintenance), and
b) often vastly underestimated and represented as an average value, when the
worst cases are rare but very severe - one tech installs both instruments
incorrectly, so they both read wrong, but the same.
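The numerical point about beta factors can be made concrete with a simplified IEC 61508-style approximation for a 1oo2 (one-out-of-two) architecture. All values below are invented for illustration.

```python
# Simplified 1oo2 probability-of-failure-on-demand approximation, showing how
# a modest common-cause (beta) fraction dominates the redundancy benefit.
# Illustrative values only.
lambda_du = 1e-6    # dangerous-undetected failure rate per hour (assumed)
tau = 8760.0        # proof-test interval: one year, in hours
beta = 0.05         # fraction of failures assumed common cause

# Independent part: both channels must fail within the same test interval.
pfd_independent = ((1 - beta) * lambda_du * tau) ** 2 / 3

# Common-cause part: one shared fault takes out both channels at once.
pfd_common = beta * lambda_du * tau / 2

# Here the common-cause term (~2.2e-4) is roughly ten times the independent
# term (~2.3e-5): the beta factor, not the redundancy math, sets the result.
```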

------
blunte
I think a fundamental concept that is relatively ignored in the press
(mainstream or non-fringe, at least) is business priorities.

As the essay points out, Boeing made decisions based on market (financial)
factors all along the way. Of course they did - almost all companies do.
Because it's so common, we forget it's not a RULE that cost, shareholder
value, profit, etc. must be the final judgement.

Certainly any company that prioritizes safety (as in the case of a
people-carrier) or some other non-financial focus may not be as attractive to
investors. But in some cases - particularly where human life is concerned -
maybe that's OK.

I know this is a bit sensational of me to suggest, but Walmart could make an
airplane. They have the funds to do so if they wanted (they would just buy a
company; that's the quick way to get rolling). And if the Walmart airline
charged half of what other airlines charged, you can be sure a whole lot of
people would fly it. And sure, after enough flights, the fatalities per flight
would become uncomfortable to most flyers.

The point is, it's a very long game. If people are not willing to consider the
long game, they are basically gambling. I fly a lot. I almost always choose my
flights based on cost/convenience. I didn't previously avoid 737 MAXs. (I
would now, but they're all grounded anyway.) That said, if I know an airline
cuts every corner to lower the price and increase the profits, I will
certainly choose the next more expensive airline that doesn't behave so
poorly. In this case it's the manufacturer, and the consumer has much less
choice where that is involved.

But let's get back to the point. If Boeing were to fall behind Airbus in the
737/A320 race, would that really be such a terrible thing? Would the cost be
human life, or might it be some stock price level? As an investor, do you
really care about your shareholder value more than human life?

I like to fly. It can actually be fun. And I really like to visit lots of
places in the world, eat lots of awesome food, and experience different
cultures. I don't want to die because of some shareholder value goal (fuck
you). I will die, and maybe it's tomorrow, but it shouldn't be for a stupid
reason unless I choose it (here, hold my beer.)

~~~
bumby
I agree that people often seem to miss that this may be rooted in a business
decision. The problem is it isn't a risk-informed decision. I would doubt
Boeing was accurately able to assess the actual risk of MCAS causing a
catastrophic failure or else the decision to rush to market wouldn't have
happened.

I think just as large a problem is misaligned incentives. Management is almost
assuredly not singly focused on the extreme long term. They are graded
quarter-by-quarter, year-by-year. This pushes uncertain risks to the periphery
in favor of short-term profits. I'm worried that unless the incentive
structure is re-evaluated (by jail terms, as an example), management will
continue to make these types of decisions, because schedule and cost will
remain king.

~~~
linuxftw
> I would doubt Boeing was accurately able to assess the actual risk of MCAS
> causing a catastrophic failure or else the decision to rush to market
> wouldn't have happened.

That's most likely because upper management has been cutting staff and not
investing in their people.

They should be sued and fined out of business. Whoever picks up the pieces
will know: "Don't cut corners, or it absolutely ruins you."

~~~
bumby
I have no clue about Boeing's staffing practices but it's often the case in
large organizations that the people trying to prudently hold up a project
because it's not ready are looked at less favorably by those with "go-fever".

I agree they should be held accountable but there's also blowback from
bankrupting one of a nation's major aerospace manufacturers

~~~
linuxftw
> I agree they should be held accountable but there's also blowback from
> bankrupting one of a nation's major aerospace manufacturers

So let there be blowback. Flight costs will go up if they have to, nbd. If
they're too big to fail, then just nationalize them and get on with it.

~~~
bumby
I meant blowback bigger than just consumers' pocketbooks: issues related to
national security, because the nation would have just lost one of its primary
aerospace contractors.

The "too big to fail" issue is an important one, though many Americans tend to
hate the idea of nationalization because in their minds it's a social evil.

~~~
linuxftw
Yeah, it's a tough lesson. We shouldn't put all of our eggs in one basket. We
have other aerospace contractors anyway.

------
sisu2019
Sorry, I don't believe that our perspective on this is especially valuable or
insightful. Anyone with an average IQ can explain in 5 sentences what went
wrong with this plane. Yes, the new engines didn't fit the plane, so it was
fixed with software. Well done.

But why? It's not profits. Boeing has been for profit from the start.

It's something more. We are getting bad at doing hard things. Here in Germany
we can't seem to finish a new airport (BER). Or a new train station (Stuttgart
21). Or the new ICE trains. Or keep the autobahn bridges maintained.

And software? There is an article about problems with Win 10 updates every day
now, it seems. And what about all these new SPA-type websites that pull down 30
megs of JavaScript in order to keep failing at basic tasks in new and exciting
ways? On the latest Android there is a packet fragmentation bug that has gone
unfixed for six months and counting. Oh, it only prevents people from using
Amazon and Netflix, so no big deal.

I grew up in a time when we expected stuff to get better and better, but
lately it seems we'd do well to just hang on to what we have.

~~~
theclaw
It was profits. Airbus did effectively the same thing with the A320neo in
2010, and it became the fastest-selling commercial aircraft in history [0].
Boeing clearly was unhappy with that and needed to do something to compete.

[0]
[https://en.wikipedia.org/wiki/Airbus_A320neo_family#Orders_a...](https://en.wikipedia.org/wiki/Airbus_A320neo_family#Orders_and_deliveries)

~~~
sisu2019
Okay, let's back up and remember that you can't compete with a plane that
crashes all the time. So the reason is clearly not profit; in fact they will
lose a big chunk of money from this, and that wasn't hard to predict at all.

~~~
bumby
I think it's a mistake to imply the managers making this decision had our
hindsight knowledge. Of course if they knew this would happen they would've
taken a different course, for profits and other reasons.

To your larger point, I think as systems get more complex it's much more
difficult for management to make accurate risk/benefit decisions. Take the
Shuttle Challenger disaster. In Feynman's report, management estimated
something along the lines of a 1-in-100,000 chance of catastrophic failure,
while the engineers' estimates were closer to 1-in-100. The exact numbers from
memory may be off, but the point is that as systems get highly complex,
increased interfaces and interactions lead to more failure modes.
Understanding them all is really tough.
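
To make the combinatorial point concrete, here's a toy Python sketch (the
component counts are illustrative, not real Shuttle or 737 numbers): the
number of pairwise interfaces grows quadratically with the number of
components, which is why exhaustively reasoning about interactions gets
intractable fast.

```python
# Toy illustration: pairwise interfaces between n components grow
# quadratically, so the space of interactions a reviewer must reason
# about explodes as systems get more complex.
def pairwise_interfaces(n: int) -> int:
    """Number of distinct component pairs that can interact."""
    return n * (n - 1) // 2

for n in (10, 100, 1000):
    print(n, "components ->", pairwise_interfaces(n), "pairwise interfaces")
```

And that's just pairs; three-way and higher-order interactions make it worse.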

------
rdiddly
Despite the possible negative consequences for our salaries, we need to work
to remove the divide between the software people and the subject-matter
experts. Those who use, but don't necessarily build, software, seem to place
undue trust in "the computer" to be always right, whereas we all know the
computer is just being instructed by ordinary human schmoes like us. And we
(the schmoes) are reliant on what to me seems like a tiny bottleneck of 2-way
communication with the SMEs.

It doesn't need to be binary technical/non-technical; everybody is, to some
degree, technical. If you can use a knife, you're technical. And it doesn't
need to be this rigid specialization, I do computers / I do aviation. If we
removed that divide we might start to widen and deconstrict that bottleneck,
and we might even start to find more people like this author who know both,
and who don't need to have an aviation SME tell them to check more than one
sensor for example, because when they're writing it, and coding the part where
you set or decide the value for AOA, they automatically say to themselves "hey
let's add a cross-check here like I always do when I'm flying." Specialization
seems like a magic bullet but it also creates a big burden of oversight &
communication, I guess is my point. It's like splitting the monolith of the
human world into microservices, and with the same resulting problems.

------
cmurf
The author suggests the 737 is fly-by-wire by saying there's no direct
feedback in the control stick forces as they relate to control surface forces,
and that forces are instead artificially presented by computer. That's simply
not true.

Even in the case of MCAS, the side effect of stabilizer nose-down trim is to
make elevator backpressure on the yoke more forceful. It is not a simulation.
That's presumably what its design goal was, and why it was so simple, without
any safeguards for overcorrection or a failed sensor.

I also don't like the suggestion the plane is longitudinally unstable due to
the engines. That is simply not consistent with FAR 25.173 (a). A central
feature of this stability requirement is that the plane will recover from a
stall merely by releasing back pressure. Not all airplanes do this, because not all
airplanes are designed that way nor are they required to, but FAR 23 (normal
category) and FAR 25 (air transport category) aircraft are required to exhibit
this kind of stability as well as lateral static stability. Yet people keep
saying the airplane isn't stable and MCAS makes it stable.

You can't have a damn switch that instantly makes the airplane unairworthy,
with a damn airworthiness directive that tells pilots to solve one problem
(runaway trim, or MCAS upset) by making the airplane unairworthy. If MCAS is
there to make the plane airworthy, turning off autotrim makes the plane
unairworthy. That's why I don't buy any claim that MCAS is there to make the
plane airworthy, until there's a preponderance of evidence from reputable
sources that includes an explanation how in the world such a thing is not a
violation of FAR 25.

The sensible explanation is it's a stick force moderator, in order to ensure
the FAA didn't require a type certification for this make/model derivative. A
plane with a type certificate triggers a requirement in FAR 61 for the pilot
to obtain a type rating to fly planes with that particular type certification.
I do find the suggestion of conspiracy plausible, among Boeing, the FAA, and
airlines, to avoid 737 MAX type certification different from prior 737s.
Consistent with that is this week's ass-covering by an FAA board saying it
sees no reason for additional 737 simulator training for pilots to fly the 737 MAX,
i.e. paving the way for a software update only solution for the current
problem.

------
_bxg1
"Neither such coders nor their managers are as in touch with the particular
culture and mores of the aviation world as much as the people who are down on
the factory floor, riveting wings on, designing control yokes, and fitting
landing gears. Those people have decades of institutional memory about what
has worked in the past and what has not worked. Software people do not."

I wonder if, once basic programming becomes a part of standard education,
these kinds of problems will be mitigated. Right now if you're a developer,
you're not anything else. So developers come in as outsiders to any given
industry and have to learn about it. But software is applicable to nearly
every industry, so we end up with lots of domain-specific code written by
people who don't know those domains very well. What if the domain experts
could not only speak the language of code, but could do some of the coding
themselves?

~~~
Sharlin
_> What if the domain experts could not only speak the language of code, but
could do some of the coding themselves?_

Some can, especially in academia. It… sort of works, but is certainly not
optimal. The truth is, it is not enough to be a coder who knows something
about the domain, and neither is it enough to be a domain expert who can code.
You have to be an expert at both, like the author of TFA. Or at the very
least, your _team_ has to be.

~~~
_bxg1
Could there not be domain experts who know how to express their general ideas
in terms of code (pseudocode, even), paired with coding experts who can
architect the overall software system? You'd still need software experts, but
you wouldn't have _just_ software experts.

~~~
Sharlin
Yes, as I said: _teams_ are a superorganism, a force multiplier. But a team of
coders isn't enough, you need a close-knit team of people with diverse areas
of expertise, and those people need to be great at communicating.

------
adreamingsoul
Something that stood out to me in this article was the author's opinion of how
software developers are removed from the subject matter.

From my experience I tend to agree but I hope we can change this.

For context, I'm a software engineer and designer with a UX focus. As part of
my process to design a system, interface, feature, or fix I first immerse
myself into the world of the person who is using the software.

What's the environment that these people are working in? What are their
motivations and frustrations? What allows them to be successful in their job?

Over the last couple of years I've had less and less time to answer those
questions. Either because of a high workload or because management didn't see
the value in spending time to research the people using our software. That
trend is alarming and concerning to me. If we as software engineers are not
aware of the people who are using our software, how are we solving the right
problems?

~~~
linuxftw
We're agile now. We don't have a solid view of the problems we need to tackle,
we're just given a little user story that says "Plane noses down with these
inputs" and we code it up, send it over the wall.

The larger the org, the less likely the actual engineers that will design the
system are invited to the planning and arch meetings. So when stupid ideas
come up in those meetings, there's no one to say "no, that's dumb" because
management only knows how to say yes to their boss.

~~~
mch82
The Agile Manifesto calls for frequent conversations and demos with customers
so developers understand customer needs and acceptance criteria. If you’re
given a story card & there is no customer interaction, then there’s a chance
the team is calling itself Agile without actually using Agile.

------
amluto
One thing I really don’t get about MCAS: even disregarding all its bugs, it
seems like a really awful style of envelope protection. I’m neither a pilot
nor an expert, but I can imagine several more reasonable strategies when the
pilot pulls up too hard: ignore the problematic yoke input, offset the
elevators a bit, or apply more force to the yoke. Moving the _stabilizer_ out
of trim seems totally wrong.

As an analogy: almost every car has envelope protection. But no car designer
in their right mind would build an ABS system that responded to wheel lock by
moving the brake pedal or loosening the master cylinder (and not putting it
back!). Similarly, it would be crazy for a stability control system to counter
oversteer by offsetting the steering wheel (and leaving it offset).

~~~
cjbprime
Elevator probably isn't powerful enough to avoid the impending stall once you
get there.

~~~
cjbprime
... and the 737 MAX control surfaces are mechanical. I don't think there _is_
any ability to ignore yoke input! It's connected to the elevator with a cable.
This isn't an Airbus-style fly by wire aircraft where input signals are being
interpreted as suggestions for actions for a computer to take.

------
fogetti
Let me tell one thing: I absolutely hate sloppy engineering. And by sloppy I
also mean rushing. There is a class of engineers and engineering managers who
think that they are rockstars because they can churn out code quickly. I am
absolutely disgusted and sick of this attitude.

And why do I bring it up here? Because I suspect this kind of attitude was
partly to blame for this fatal and tragic accident. Of course this is pure
speculation, but my 12 years of experience in the software industry make me
believe that's what happened.

Also a relevant talk by Uncle Bob:
[https://youtu.be/ecIWPzGEbFc](https://youtu.be/ecIWPzGEbFc)

~~~
robertAngst
>I absolutely hate sloppy engineering.

>Of course this is pure speculation

Cool ideas, but people have already figured out that this was designed into
the system. Sure, more testing can catch things, but how much money are you
allowed to spend on every single feature?

There isn't a 'right' answer to these questions; engineers are literally doing
cutting-edge jobs that have never been done before.

Everyone wants more time, more money, and better suppliers.

You need a call to action, not just a youtube video talking about groups not
being responsible.

------
tlc1970
This article increases my concern that the primary issue here is a failure in
process to ensure the safety of passengers before the 737 Max went to market.

We have world class expertise in the aviation industry, and there is no excuse
for rushing a product to market without fully vetting it for safety.

Boeing, the FAA, and the airlines moved too quickly to place the 737 Max in
the air and too slowly to ground the planes after the second crash. In both
cases it appears that these decisions were influenced by an over-reliance on
the capabilities of technology over people.

------
outworlder
> When MCAS senses that the angle of attack is too high, it commands the
> aircraft’s trim system (the system that makes the plane go up or down) to
> lower the nose. It also does something else: It pushes the pilot’s control
> columns (the things the pilots pull or push on to raise or lower the
> aircraft’s nose) downward.

Wait a minute. Since when does MCAS include a stick pusher? I haven't flown
anything outside simulators yet, but AFAIK 737s do not have stick pushers of
any kind.

~~~
ncmncm
This is an error in the article. 737-MAX is not a fly-by-wire system. The
forces felt by pilots are derived directly from aerodynamic forces on the
control surfaces.

------
HankB99
I'm unclear on how moving the engine up causes the application of power to
pitch the nose up. I would expect the opposite to happen. I think something
else must have changed, such as the center line of the engine relative to the
center line of the plane. Or did moving the engine forward at the same time
cause this effect?

~~~
leoedin
I don't think it does. Other sources I've read suggest the larger nacelle
introduces a pitch up during certain flap configurations and at a high angle
of attack. It's not the engine thrust that's the problem, but the drag due to
the large and further forward nacelle. That's the justification for the MCAS
software.

[https://theaircurrent.com/aviation-safety/what-is-the-
boeing...](https://theaircurrent.com/aviation-safety/what-is-the-
boeing-737-max-maneuvering-characteristics-augmentation-system-mcas-jt610/)

~~~
ncmncm
It's both. The engine thrust center is also farther below the center of
pressure of the airframe. And it can apply a lot more thrust than the original
engines.

The point about lift from the engine nacelles was that it acts as positive
feedback: if you are already pitched too high, it acts to worsen the problem.
And, because it is farther forward, it has a greater lever arm to act.

~~~
HankB99
Thanks, those factors make sense.

------
torgian
“ I believe the relative ease—not to mention the lack of tangible cost—of
software updates has created a cultural laziness within the software
engineering community. Moreover, because more and more of the hardware that we
create is monitored and controlled by software, that cultural laziness is now
creeping into hardware engineering—like building airliners. Less thought is
now given to getting a design correct and simple up front because it’s so easy
to fix what you didn’t get right later.”

This is why I feel the software industry has too many problems with security,
performance, etc.

I’d like to think that engineers care how good and efficient their code is.
But too often, it’s up to managers or customers how quickly software needs to
be completed.

This introduces bugs, incomplete features, and (in the case of Very Important
Things to keep you Alive) potentially dangerous breakage.

It sounds like Big Corp was trying to push that laziness and lack of foresight
into other engineering disciplines. If so, how many of our cars, planes, and
other items that potentially directly affect our lives are affected by
mechanical design flaws and software errors?

Hopefully the industry gets its shit together as a whole. As it stands, if I
ever work on anything that affects a life, I’m damn well blowing a whistle if
I feel like something is off.

------
PaulAJ
One issue about the "bitey dog" MCAS not mentioned was the story of Air France
447
([https://en.wikipedia.org/wiki/Air_France_Flight_447](https://en.wikipedia.org/wiki/Air_France_Flight_447)).
That's the one that stalled into the Atlantic in 2009. The flight crew, plus a
pilot who was riding in the back seat, spent minutes trying to debug the
flight computer, and only realised too late that the copilot was pulling back
on the stick the whole time, keeping the aircraft stalled.

This happened because of mode confusion: the aircraft computer realised it had
compromised sensors and had switched to "alternate law", in which the computer
would not override a stall. I have no doubt that Boeing knew about this
incident and did not want to create a repeat.

------
euske
Slightly off-topic, but I generally find IEEE Spectrum has more interesting
articles than Communications of the ACM. Maybe it's more inclined to the
industrial side, whereas CACM is more academic?

------
zerogvt
"So the FAA said to the airplane manufacturers, “Why don’t you just have your
people tell us if your designs are safe?”" This

------
tlc1970
Frankly, as a traveler, I have lost trust in Boeing, the FAA, and the
airlines.

This article underscores my concerns that the 737 Max was rushed to the market
too quickly, and that there was an unacceptable delay in grounding these
planes after two crashes.

The expertise of the airline and software industries is world class. However,
the process by which the 737 Max was determined to be safe has failed us as
travelers.

------
salawat
Couple pedantic quibbles, forgive me:

>I will leave a discussion of the corporatization of the aviation lexicon for
another article, but let’s just say another term might be the “Cheap way to
prevent a stall when the pilots punch it,” or CWTPASWTPPI, system. Hmm.
Perhaps MCAS is better, after all.

It's not actually just intended for when pilots pour on the thrust; it's for
any time they pour on the AoA. There need not be any throttle change involved
to bring this about. The most readily imagined example, however, is low speed
and a large throttle increase, like you'd have if you were aborting a landing
attempt to go around.

>When MCAS senses that the angle of attack is too high, it commands the
aircraft’s trim system (the system that makes the plane go up or down) to
lower the nose. It also does something else: It pushes the pilot’s control
columns (the things the pilots pull or push on to raise or lower the
aircraft’s nose) downward.

Nothing I've read characterizes MCAS as having an inbuilt stick pusher. I've
mentioned before that MCAS shares a spot with stick-pushers in terms of what
they are trying to do, and problem they are employed to solve but MCAS does
not _actively put force on the stick through an extra mechanism with the
intent to actuate a control surface deflection, and alerting the pilot through
the haptic response_ , which is the defining characteristic of that type of
system as I understand it. All MCAS does is modify the trim, which has the
effect of passively modifying the flight characteristics of the aircraft.

>In the 737 Max, like most modern airliners and most modern cars, everything
is monitored by computer, if not directly controlled by computer. In many
cases, there are no actual mechanical connections (cables, push tubes,
hydraulic lines) between the pilot’s controls and the things on the wings,
rudder, and so forth that actually make the plane move. And, even where there
are mechanical connections, it’s up to the computer to determine if the pilots
are engaged in good decision making (that’s the bitey dog again).

As far as I am aware, Boeing has maintained manual reversion with regards to
the trim system and yoke in the 737 MAX 8. I.e. there are direct connections
to the control surfaces from the pilot's hand actuated controls. Boeing's
concession to FBW is by implementing parallel automation managed control
circuits that allow for electrically driven manipulation of control surfaces
fed back to the pilot through the mechanical linkage. The envelope-protection
-> bitey-dog analogy in this case, is still an accurate characterization.

>But it’s also important that the pilots get physical feedback about what is
going on. In the old days, when cables connected the pilot’s controls to the
flying surfaces, you had to pull up, hard, if the airplane was trimmed to
descend. You had to push, hard, if the airplane was trimmed to ascend. With
computer oversight there is a loss of natural sense in the controls. In the
737 Max, there is no real “natural feel.”

As far as I can ascertain, there still is "natural feel" in regards to pitch
control. There are no hydraulic boosters in place. The problem though, is that
MCAS actually hides the aberrant behavior at high AoA from the pilot by
intentionally down-trimming the plane. This gets the job done (by some
definitions), but does it instead through the trim system (which operates
automatically in enough other circumstances that the pilot may mistake the
behavior for some other system) and without doing anything to the stick (the
primary instinctual control for the plane).

Other than those quibbles, this is a beautiful write-up that is a treat to
read. The Supplemental Type Certification section was also informative for me,
as it illustrates unambiguously that there was a process to handle this exact
type of augmentation which was sidestepped by Boeing for whatever reason.

Bravo!

------
ggm
DNS: changing the engines mid-flight.

We've used this analogy.

------
torgian
But did they make it mobile first?

Apparently not.

------
crimsonalucard
The irony of software is that it is the only discipline in engineering where
results can be proven with logic. Yet we still test it as if it were a black
box.

~~~
bollu
I'm not sure you understand how difficult it is to prove software correct.
I've written a decent amount of Coq code. It's quite bonkers how much proof
one needs to write to get anything done. For reference, the certified compiler
CompCert's code base is something like 10% code and 90% proofs.

~~~
tluyben2
It would ensure people thought about it a lot longer and harder than they
would without those proofs, so the code would have a whole lot more critical
thinking done over it. Seems worth it in some areas, like airplanes.

~~~
mikeash
The trouble with formal proofs is that you can only prove that the code does
what you say it does, not that it does what it needs to do.

MCAS performed as designed. Only reading one AoA sensor, acting on bad
readings, not rejecting values that are clearly out of bounds, operating
continuously without any limits on its pitch authority: all of this was how it
was designed to work. You could have proved MCAS “correct” and ended up with
the exact same result.

A lot of software problems are due to discrepancies between the design and the
implementation, of course. But it’s not a panacea.

~~~
tluyben2
Yes, agreed, but for life-or-death situations, as a programmer, I would prefer
to use all the tools at our disposal; besides the money, it probably does
improve the quality if you spend this time.

~~~
mikeash
I think this is likely to be true, but I do wonder if there might be a better
way to spend those resources. For this particular example, you’d have been
better off putting more money into flight testing with various sensor errors.

~~~
tluyben2
Well, yes. But, and who knows if it is true, you would think you could work
out that sensor errors need to be tested, and maybe figure out that you need
to cover three errors vs. two, for instance, to cover all cases; formal
systems can help you model and reason about that and come up with cases that
are not covered.
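
Even short of full formal proofs, that kind of case analysis can be
mechanized cheaply. A toy Python sketch (the sensor model, fault value, and
voting rule are all invented for illustration) that exhaustively enumerates
every combination of healthy/failed sensors against a 2-of-3 median vote:

```python
from itertools import product

TRUE_AOA = 5.0     # hypothetical true angle of attack, degrees
FAULT_VALUE = 60.0  # a stuck/failed sensor reads absurdly high

def vote(readings):
    """Hypothetical 2-of-3 rule: act on the median of three sensors."""
    return sorted(readings)[1]

# Enumerate every combination of healthy/failed across three sensors
# and check which fault patterns the voting rule actually survives.
for faults in product([False, True], repeat=3):
    readings = [FAULT_VALUE if failed else TRUE_AOA for failed in faults]
    ok = abs(vote(readings) - TRUE_AOA) < 1.0
    print(faults, "tolerated" if ok else "NOT tolerated")
```

Run it and the uncovered cases fall out mechanically: median-of-three
tolerates any single fault but not two simultaneous ones, which is exactly
the "two errors vs. three" question above.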

I think the point, which these disasters show, along with some others in the
past (was there not a recalled Japanese car with a software issue?), is that a
few million more for the right resources will save you a lot more down the
line. The problem is that this is not really a statement we can prove for
formal verification, because we do not have enough to compare with; I have a
very strong feeling it will make quite a significant difference, if not for
the proofs themselves then for the sheer number of hours the proof-writers
spent thinking about it by the time of delivery.

~~~
mikeash
Two sensors is enough if you can shut off the system when they disagree. MCAS
didn’t doom the plane if it shut off, so that was enough. The problem was that
they didn’t do this, and instead used only one sensor at a time. Basic sense
and standard practice tells you not to do that. I don’t know if more formality
would have helped them not do something so boneheaded.
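
The "shut off when they disagree" behavior is simple to state. A hypothetical
sketch (class name, thresholds, and trim rate are all invented; this is the
standard practice being described, not Boeing's implementation):

```python
# Hypothetical "disagree -> disable" pattern: once the two AoA channels
# diverge, latch the augmentation system off rather than act on one sensor.
class Augmentation:
    DISAGREE_LIMIT_DEG = 5.5  # made-up disagreement threshold

    def __init__(self):
        self.enabled = True

    def command(self, aoa_a: float, aoa_b: float) -> float:
        """Return an automatic trim input, or 0.0 if the system is off."""
        if abs(aoa_a - aoa_b) > self.DISAGREE_LIMIT_DEG:
            self.enabled = False  # latch off; don't flip-flop in flight
        if not self.enabled:
            return 0.0  # no automatic trim input once disabled
        # Made-up rule: trim nose-down a little when AoA looks high.
        return 0.1 if max(aoa_a, aoa_b) > 14.0 else 0.0
```

The latch matters: with two sensors you can't tell which one is lying, so the
only safe move is to stand down and leave the pilots flying the plane.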

However, this looks like an unusual case. Based on the fact that they had such
a dumb design for this system, that shouldn’t have passed muster in any
aviation context no matter your software methodology, it’s probably not
representative enough to draw a larger lesson about engineering. The larger
lesson seems to be business and regulatory.

Regarding Japanese cars, Toyota went through a big thing with unintended
acceleration that got quite a few people killed. Software was suspected, but
the cause was ultimately determined to (mostly?) be a combination of drivers
mixing up the accelerator and the brake (happens more often than you might
think) and unsecured floor mats pushing the accelerator down. I bought a
Toyota not too long after this and the dealer made a special point to show me
how the floor mat attached to the floor and that I must be sure it was solidly
connected.

Their software was audited and apparently it was really badly made. It was
described as “spaghetti-like” and had global variable abuse, ignored errors,
failed to restart unresponsive tasks, and had potential memory corruption
problems. It’s quite possible that this really was the cause, and a jury even
found this to be the cause in a civil trial over one of the crashes, but it
was never definitively linked.

