
Lack of redundancies on Boeing 737 MAX baffles some involved in developing it - bobsil1
https://www.seattletimes.com/business/boeing-aerospace/a-lack-of-redundancies-on-737-max-system-has-baffled-even-those-who-worked-on-the-jet/
======
crocal
I am a designer and implementer of vital speed and distance measurement
systems, with triple modular redundancy, used in mass transit applications.

I cannot even imagine what the designers of this thing are going through now.
It must be terrible.

To make it worse, it's a confusing topic. There are two pillars to the design
of such a system.

1. A faulty sensor must be detected with very high probability. The typical
way to achieve that is redundancy and diversification. The exact amount of
redundancy depends on the reliability of the sensor in question. In most
cases, it is sufficient to have 2 sensors, but of different models, in order
to avoid common modes of failure.

2. In case of a failure, the system must degrade gracefully, and in
aeronautics this means a clean handover to the pilot.

So, in the case of this MCAS thing, having two sensors is not necessarily a
bad thing, and what M. Kornecki reports is 100% correct. What looks strange is
the way a single failure was managed by the software, and how the procedure to
recover was quite complicated and, even worse, not exported to the pilots'
training material. In my world, we called these "exported safety requirement
application conditions" (SRACs), and verification of their proper allocation
is a big chunk of the safety case. More than discussing architecture, in my
view, the investigation must explain why the organisation failed to perform
this activity.

The case for a third sensor can be made to decrease the likelihood of having
to bypass the system ("belt and suspenders", as M. Kornacki says), but based
on my experience, it will not be sufficient. As others correctly report here,
it's not a silver bullet.

Ultimately, the degradation scenario and pilot handover is part of the overall
system safety.
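To make the two pillars concrete, here is a minimal sketch (Python, with a
made-up disagreement threshold and function names; nothing here comes from any
real avionics system) of the dual-sensor case: detect a fault via
disagreement, then degrade gracefully by disengaging and alerting the crew
rather than acting on data that cannot be trusted.

```python
# Hypothetical illustration of dual-sensor fault detection with graceful
# degradation. The 5-degree tolerance is an assumption for the example.
DISAGREE_LIMIT_DEG = 5.0

def mcas_input(aoa_left: float, aoa_right: float):
    """Return an angle-of-attack value to act on, or None to disengage.

    With only two sensors we can detect a disagreement but cannot tell
    which sensor failed, so the safe degraded mode is: shut the system
    off and hand control cleanly back to the pilot.
    """
    if abs(aoa_left - aoa_right) > DISAGREE_LIMIT_DEG:
        return None  # pillar 2: graceful degradation, crew takes over
    return (aoa_left + aoa_right) / 2.0  # sensors agree: use the average

assert mcas_input(10.0, 11.0) == 10.5   # agreement: value is usable
assert mcas_input(10.0, 74.5) is None   # disagreement: disengage and alert
```

The point of the sketch is that detection (pillar 1) is only half the design;
what the system does after detection (pillar 2) is the part that was
apparently mishandled here.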

~~~
hbarka
I enjoyed your comment because I’m particularly interested in Tesla’s
Autopilot system and your mention of “graceful degradation and clean handover”
sounds like it should apply to driving autonomy levels as well.

~~~
wjnc
In the way we use cars, there seems to be no time for a graceful handover.
You're about the same time interval away from the next car as the time it
takes to gain situational awareness, probably less. A plane is usually quite
some minutes away from the ground and other planes, or, e.g. when landing on
autopilot, the pilots are not distracted. If you applied this standard, the
bar for autopilot in cars would be very high indeed. (That might be warranted,
although from a risk/reward perspective, humans are terrible drivers too.)

------
js2
Despite the headline, the article ends with:

 _That triple-sensor system isn’t foolproof, however.

In 2008, on a customer-acceptance flight of an Airbus A320, two of the angle-
of-attack sensors froze and those two sensors then outvoted the third. When
the pilots went to demonstrate the stall-prevention system, they were not
aware of the malfunctioning sensors. The plane crashed, killing the seven
people on board.

The same problem arose again on a 2014 Airbus A321 Lufthansa flight leaving
Spain. Eight minutes after takeoff, two of the angle-of-attack sensors froze
at the same pitch. This time, after a drop in altitude, the pilots were able
to regain control and complete the flight._

Any automation system can fail, and when that happens, one can only hope that
through some combination of luck, skill, experience, and training, the pilot
will get the plane safely to the destination.
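The Airbus incidents above are exactly the known weakness of 2-out-of-3
voting: the voter assumes that agreeing sensors are healthy. A toy sketch
(Python; the values, tolerance, and function name are invented for
illustration, not taken from any real flight system) shows how two sensors
frozen at the same wrong value outvote the single healthy one:

```python
# Toy 2-out-of-3 majority voter over analog sensor readings.
# Tolerance and values are illustrative assumptions.

def vote_2oo3(a: float, b: float, c: float, tol: float = 1.0):
    """Return the value agreed on by at least two sensors, else None."""
    for x, y in [(a, b), (a, c), (b, c)]:
        if abs(x - y) <= tol:
            return (x + y) / 2.0
    return None  # no two sensors agree: declare a total sensor failure

# Normal case: one sensor drifts, the other two outvote it.
assert vote_2oo3(4.0, 4.5, 30.0) == 4.25

# The 2008 A320 scenario in miniature: two sensors frozen at the same
# wrong angle outvote the healthy one, so the voter confidently
# returns bad data with no fault declared at all.
assert vote_2oo3(30.0, 30.0, 4.0) == 30.0
```

Voting raises the bar for a bad output, but a common-mode failure (two vanes
freezing the same way) defeats it, which is why diversified sensors and a
clean pilot handover still matter.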

~~~
cletus
Yep, 3 sensors can fail too. Less often however.

These sensors exist to solve a problem with the MAX design. Changing and
moving the engines increased the likelihood of a stall, since at high power
the engines can push the nose up (as I understand it). Fine.

But here's the kicker: this is something pilots should be trained on. They
should be aware of how the MAX is different from the previous 737s and know
what to do to disable this system if it causes problems.

But that might be the end of a common type rating, which is something the
airlines (and apparently Boeing) didn't want.

The whole 737 MAX situation just looks like a giant clusterf---: a kneejerk
reaction to the unexpected success of the A320neo, to which Boeing had no
answer. To avoid years-long development delays and losing more customers to
Airbus, it really looks like they took shortcuts. And that's so damaging to
their brand it really defies belief.

~~~
Aser
The pilots were trained on how to deal with runaway stabiliser trim. The
procedure hasn't changed from the old 737; the only thing that has changed is
that the failure mode is more likely to occur on the 737 MAX.

From the article: “A properly trained pilot should be able to solve an MCAS
anomaly or any uncommanded flight-control input through procedures that are
taught to all 737 pilots,” said Menza, noting that the emergency information
Boeing distributed in December reiterated those procedures.

Here's a video of a competent 737 pilot showing those exact procedures:
[https://www.youtube.com/watch?v=xixM_cwSLcQ](https://www.youtube.com/watch?v=xixM_cwSLcQ)

Obviously the MCAS system is badly designed, but any automation system can
fail and pilots need to know how to deal with it.

~~~
bushido
Per initial reports, the black-box data suggests that the procedure was
followed but that MCAS was re-engaged. Investigations are ongoing into whether
MCAS can re-engage automatically, or whether the trim was too hard to override
manually and the pilots re-engaged the system so they could use the electric
trim to level the plane.

[https://www.wsj.com/articles/ethiopian-airlines-pilots-initially-followed-boeings-required-emergency-steps-to-disable-737-max-system-11554263276](https://www.wsj.com/articles/ethiopian-airlines-pilots-initially-followed-boeings-required-emergency-steps-to-disable-737-max-system-11554263276)

~~~
phire
That actually raises more questions than it answers.

Did the pilots fail to notice the MCAS problem before it put the plane too far
out of trim?

Could they have saved the plane if they stuck to the procedure and kept
adjusting the trim by hand?

Why does MCAS override even direct pilot trim up commands?

Is the runaway trim procedure even viable on older 737s once the trim goes
past a certain point? Or are we just lucky that it never happens?

------
4sak3n
"Two is one and one is none."

That's pretty much been aeronautical gospel since dinosaurs strapped cardboard
to their stubby arms and dreamed of soaring.

~~~
4sak3n
To elaborate, I'm pointing out that redundancy has been an integral part of
aeronautical engineering for ages, so to ignore it in any situation, never
mind one where the pilots are kept out of the decision-making loop, is
criminally negligent.

------
ptidhomme
What I fail to understand is why they did not design an override scheme for
MCAS, such as an above-threshold pull on the stick by the pilot. I recall that
this was a classic Boeing design. Whatever the number of AoA sensors, the
system should be easy and straightforward to disable.

That kind of "let's automatically guess whether the sensor is failing" logic
just pushes the problem further down the road. The real problem is too much
reliance on automatic systems and too little human integration in the overall
system (the human-machine system, that is to say).

~~~
j16sdiz
A new override scheme means more training. The selling point of the 737 MAX is
that any 737 pilot can fly it without new training.

~~~
tomnm
I can't stop wondering whether similar "compromises" were made when
redesigning the airframe. It's engineering, after all. The engineers could
have been pressed by management to make certain changes for the sake of
profit.

~~~
golergka
But managers rely on engineers to let them know when a compromise can be made
and when it is absolutely vital not to make it.

~~~
iSnow
But management also relies on customers to tell them which compromises are
vital to selling a product. And in a market with as much price pressure as
aviation, I can easily see how this would override engineers' concerns.

~~~
golergka
I don't think that any Boeing customer has said or implied that it's OK if a
plane can crash as long as they save on pilot training.

~~~
dsego
Didn't they "vote with their wallets"?

~~~
golergka
They certainly voted with their wallets for a plane that doesn't require
additional training. But they sure as hell did not vote with their wallets for
a plane that achieves that at the expense of crashing. And, to reiterate my
point, I think that the managers who pushed for trade-offs between different
objectives did not push for this particular trade-off either.

~~~
iSnow
Well, that's the law of unintended consequences. Customers didn't want to pay
for retraining, Boeing wanted to get to market faster and not wait for re-
certification, managers wanted a physical problem fixed in software, and in
the end the envelope got pushed too far.

I am sure the company operating the Titanic didn't exactly strive to have a
shipwreck either.

~~~
golergka
This is true, but it is a statement about causality and the thread was about
responsibility. Those are two different things. The fact that your actions
lead to a certain outcome does not automatically mean that you're morally
responsible for this outcome (that's the fallacy that leads to victim blaming,
among other things).

It is always the engineer's responsibility to clearly explain the trade-offs
to the managers. Only a manager who makes the decision with accurate
information can be responsible for it.

------
FabHK
The crux of the matter:

> if the group had built the MCAS in a way that would depend on two sensors,
> and would shut the system off if one fails, he thinks the company would have
> needed to install an alert in the cockpit to make the pilots aware that the
> safety system was off.

> And if that happens, Ludtke said, the pilots would potentially need training
> on the new alert and the underlying system. That could mean simulator time,
> which was off the table.

~~~
LarryDarrell
It seems like we are determined to live in a Gilded Age Fantasy Camp. It's
like the Titanic not having enough lifeboats.

I've been around the block. I know Big Corp will always place value on profits
over safety, but we seem to be entering an age where they feel unconstrained.

------
jussij
From the information that keeps leaking out it appears the 737 MAX has some
serious design faults.

What is going to be interesting is seeing how quickly the FAA certifies the
new software changes this second time around.

The current perception is that the FAA was too quick to certify the original
MCAS as safe.

I suspect that the second time around, the certification process is going to
take a lot more time and effort and is going to be much harder to pass.

And the final nail in the 737 MAX coffin might be that even when that
certification arrives, Boeing then needs to convince paying passengers that it
is now safe to climb aboard the aircraft.

~~~
magduf
>What is going to be interesting is seeing how quickly the FAA certifies the
new software changes this second time around.

With a Trump lackey running the FAA, I'm sure they'll rubber-stamp whatever
Boeing puts out there.

What I want to see is if any of the foreign regulators refuse to certify it.
I'd love to see a show-down in international aviation between the Trump
administration and better-run nations.

------
havkom
I have no idea of the facts in this case, but based on how organizations
function in general, one possibility is that the redundancy issues with MCAS
were raised internally by some engineers, without result.

In that case, the engineers were probably not thanked internally back then and
are probably not being thanked now either.

If their concerns had led to changes in the MCAS design, they would probably
not have been thanked either, but rather seen as the people who unnecessarily
made the project more expensive or less profitable.

In conclusion, people who avert or try to avert risk are seldom proportionally
thanked in organizations.

~~~
linuxftw
> MCAS were raised internally by some engineers

I think it's quite generous to Boeing to imply the MCAS system was developed
in house. I'm leaning towards a 3rd-party contract, built to Boeing's spec.
Whoever wrote the spec might be a 3rd party as well, with Boeing's 'engineers'
just acting as project managers.

------
bb101
It's worth watching Al Jazeera's series on the production of the 787
Dreamliner, and the related cultural issues at Boeing after its merger with
McDonnell Douglas [1].

Personally, I think it's worth taking with a grain of salt as it appears quite
one-sided, but the interviews with Boeing ex-employees and head engineers are
enlightening, and could go some way to explaining the lack of redundant
systems (i.e. cost and time).

[1]
[https://www.youtube.com/watch?v=rvkEpstd9os](https://www.youtube.com/watch?v=rvkEpstd9os)

------
jackson1way
I have had a question for many years that I have been afraid to ask so
directly. There is AF447, and those 2 Boeing MAX flights that I remember,
where people will always talk about redundancy of the sensors, or their
maintenance.

But really, I want to know why the software and hardware of the aircraft fail
to understand that whatever actions it has performed automatically, or
whatever inputs/actions the pilots have made, or any other important event,
are making the damn aircraft lose altitude really, really fast. In the case of
these Boeings, how can your software keep pointing the nose of the aircraft
down WHILE it is CONSISTENTLY losing altitude? How is this not a huge design
flaw? I understand that for stall recovery you actually have to lose some
altitude to regain speed/lift and recover from the stall. But if you STILL
KEEP losing altitude and there is no turning point, then damnit, whatever is
causing the altitude loss should be stopped at 500m AGL at the latest, no?
Switch on a huge yellow light for the pilots ("you are 100% on your own now,
the computer is out of ideas and is shutting down"), and the pilot jumps on
the gas pedal, 200% power to the engines, and starts to figure out how to gain
altitude. Both Boeing flights were in daylight, no? The pilots should have
been able to see the ground/water and pull the thing back up by visibility
alone, without any sensors. It's a bit of a different case with AF447, though.

How did the AoA sensors and MCAS win authority here over the only important
data: altitude change? Who cares if the AoA shows 30 degrees or -271 degrees;
as long as the plane is climbing, we are good, no need to point the nose down
or do any other stall-protection stuff. And that seems to be really what
happened with these two Boeings: they were climbing, but some stupid AoA
sensor failed and some even more stupid MCAS decided to dive. And they kept
diving until they reached the ocean, and MCAS was still confident it was a
good idea to dive?

In some planes my iPhone is able to get a GPS reading, and some free app I got
5 years ago will actually show me ground speed and altitude, which I always
find very exciting. It usually matches what the entertainment system is
showing on the pax screens, give or take <1%. So reliable altitude reading is
a solved problem, no? You have 300 phones on your plane (though only the
window seats have a chance of getting a fix, but still...). Not sure if GPS
would still work if the plane were rolling and spinning like crazy, which I
don't think happened in either of these cases.

I hope I don't make it sound stupid, or like an i-know-it-better, but I really
just want to understand.

~~~
whatshisface
I only know a little bit about aerospace programming, but from what I do know
complicated chains-of-reasoning are avoided like the plague. With every
additional step of program reasoning the probability that you got it right
decays exponentially. As a result, simple controllers with clearly defined
areas of responsibility are elevated above all else. It is the pilot's job to
integrate the information coming to them from all areas of the plane. Yes, a
computer could do it, but once you get past three or so sensors the
programmer's ability to reason about what _might_ happen shrinks to nothing in
comparison to the pilot's ability to see what _is_ happening.

------
OscarTheGrinch
I'm gonna take a punt on the root cause of this mess: cost-cutting,
bonus-seeking managerialism. In my experience, people high up in the decision
chain with no technical expertise often have blind faith in what "technology"
can accomplish.

~~~
cjbprime
For what it's worth, "take a punt on" means "avoid dealing with", which sounds
like the opposite of your intent.

~~~
OscarTheGrinch
I was more going for the "make a general guess / bet" kind of meaning, which
is how the phrase is used today in England / Australia / New Zealand.

[https://english.stackexchange.com/questions/20742/why-did-this-brit-say-took-a-punt](https://english.stackexchange.com/questions/20742/why-did-this-brit-say-took-a-punt)

Thanks for the heads up tho, I will retire it from my rhetorical toolkit in
favor of something more universally understood.

~~~
cjbprime
Oh, thanks for the info! Rare to see a phrase with antonymic regional
differences ("I'll try it / I won't try it") survive in today's globalized
world etc etc.

------
FabHK
> “Our proposed software update incorporates additional limits and safeguards
> to the system and reduces crew workload,” Boeing said in a statement.

... "reduces crew workload" so that they have a chance to figure it out in
time and survive. Nice move.

EDIT to add: I guess I find it a bit annoying that, instead of admitting that
they're fixing a terrible bug, they're cloaking it in some corporate "we're
adding even more cool new features" speak.

------
ken
> "A properly trained pilot should be able to solve an MCAS anomaly or any
> uncommanded flight-control input through procedures that are taught to all
> 737 pilots"

It blows my mind that anyone still talks like this.

My first job was writing (non-life-critical) scientific software for detail-
obsessed university researchers working in a lab with notebooks and
procedures. If "just solve it with training" could work for anyone in the
world, it would have been these people. Yet it was immediately and abundantly
clear that designing usability into the system would have far more impact on
their success rate at any task than documentation or training or even the
reliability of the software itself.

The movie "Apollo 13" showed astronauts working in a simulator, and engineers
giving them faulty sensor inputs to try to trick them. They had to train to be
able to identify this possibility, and react correctly to it. This isn't the
sort of thing you can figure out on the fly (so to speak). If the solution to
flying an aircraft with a faulty MCAS sensor was "the pilot should have read
it in a book last month", it's no wonder they were in trouble.

The software industry does make some programs with random inputs and hidden
internal state that require users to figure out what's going on and solve it
anyway. We call them "games". Perhaps the defining characteristic of a game is
that, even if you've read the manual, you won't succeed on the first try.

> "But, he said, if he were designing the system from scratch, he would
> emphasize the training while also building the plane with three sensors."

Is even three enough? From an earlier version of the "2001" Wikipedia page
[1]:

> "In the story HAL features a design with triple redundancy, so that if one
> of the three modules fails the other two can outvote it. However, there is a
> [[theorem]] in [[computer science]] that proves that for such [[distributed
> systems]] a vote-based [[sanity check]] only works if ''less than'' one
> third of the modules fail. Thus the failure of a single one of HAL's
> redundant modules would be sufficient to compromise the system, as
> apparently happened in the movie."

I don't know what theorem this is referring to. Help?

See also: Segal's Law.

[1]:
[https://en.wikipedia.org/w/index.php?title=2001:_A_Space_Ody...](https://en.wikipedia.org/w/index.php?title=2001:_A_Space_Odyssey_\(film\)&diff=537589&oldid=530391)
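The bound the old Wikipedia text gestures at appears to be the Byzantine
agreement result (Lamport, Shostak and Pease): n replicas can tolerate f
arbitrary ("lying") faults only if n >= 3f + 1, i.e. fewer than one third may
fail. Plain majority voting against fail-silent faults needs only n >= 2f + 1.
A toy sketch of the arithmetic (function names are my own, purely
illustrative):

```python
# Minimum replica counts for tolerating f faults, per the classical bounds.
# Byzantine faults: modules can report arbitrary (malicious-looking) values.
# Crash/silent faults: modules simply stop producing values.

def min_replicas_byzantine(f: int) -> int:
    return 3 * f + 1  # fewer than one third of n may be Byzantine

def min_replicas_crash(f: int) -> int:
    return 2 * f + 1  # a simple majority suffices against silent failures

assert min_replicas_byzantine(1) == 4  # three modules cannot outvote one liar
assert min_replicas_crash(1) == 3      # but can outvote one silent failure
```

On that reading, HAL's three modules were indeed one short of tolerating a
single arbitrarily misbehaving unit.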

~~~
jacobush
Not only that, but the pilots received no training, to uphold the fiction that
the 737 MAX flies the same as the old 737.

------
hairytrog
The only way to be 100% safe and 100% foolproof is not to fly. For a
mechanical system as complex as an airplane, triple redundancy plus a good
pilot is pretty much the best you can hope for.

~~~
nutjob2
A plane can still fall out of the sky and kill you.

