
Boeing 737 Max Crashes: Sensors Vulnerable to Failure - bushido
https://www.bloomberg.com/news/articles/2019-04-11/sensors-linked-to-737-crashes-vulnerable-to-failure-data-show
======
Lind5
Sadly, another "the sensor did it" scenario.
[https://semiengineering.com/dirty-data-is-the-sensor-malfunc...](https://semiengineering.com/dirty-data-is-the-sensor-malfunctioning/)
The tragic example of the Lion Air crash of a new Boeing MAX
8 aircraft, which on Oct. 29 killed all aboard, may be heading toward “the
sensor did it” category. The black box recovered from the flight showed
inconsistent data from one of the two angle-of-attack (AOA) sensors. With one
half of the data apparently incorrect, it was enough to trigger this plane’s
anti-stall system into a nose down action, which the pilots wrestled all the
way into the Java Sea.

~~~
05
MCAS is not an ‘anti-stall system’, it’s a cheap type certification compliance
hack

~~~
Doxin
It’s a cheap type certification compliance hack that prevents stalls.

------
ggm
Root cause analysis: excessive pressure to keep type rating caused lax
engineering to be considered acceptable.

Fix: let the FAA do its job independently.

~~~
zepearl
On a technical level: personally I think that the root cause could be a lack
of "integration" of the system (MCAS) with the rest.

Meaning that e.g.

1) if the pilots' yokes/sticks are being pulled and

2) if the altitude based on gps and/or air pressure and/or radar is decreasing
and

3) if the speed registered by the airspeed probe is high respectively
increasing and

4) if the fuel being used by the engines to keep or even increase the speed is
very low and

5) if the G-force registered by <some sensor, if it exists?> is lower than
usual (neutral or even negative?)

6) etc... (anything else excluding AoA?)

...then the airplane is mooost probably "going down", even if the AoA says the
opposite.

As the MCAS auto-trim has apparently a big potential impact on how the
airplane flies (surfaces on the tail are big and their angle can move a lot),
it should therefore not rely on just 1 (or even 2 duplicate) lonely sensor(s)
but on multiple types that provide multiple points of view of the aircraft's
situation to then abstract/determine/confirm the current high-level situation
(in this case "is the aircraft going up or down?") and only after such result
to react/take countermeasures to save the aircraft respectively help the
pilots and/or lower costs for the flight.
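
A rough Python sketch of that cross-check idea: count how many other signals say the aircraft is going down, and only allow automatic nose-down trim when few of them contradict the AoA reading. Every name and threshold here is hypothetical and chosen for illustration. It does not describe how MCAS or any real flight control computer works, and it takes no position on whether these signals are valid stand-ins for AoA.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical, simplified set of signals; names and units are illustrative."""
    column_pulled_back: bool     # pilots are pulling on the yoke
    vertical_speed_fpm: float    # negative means descending
    airspeed_trend_kt_s: float   # positive means accelerating
    load_factor_g: float         # 1.0 in steady level flight


def descent_votes(s: Snapshot) -> int:
    """Count how many signals suggest the aircraft is going down."""
    checks = [
        s.column_pulled_back,
        s.vertical_speed_fpm < -500.0,  # assumed threshold
        s.airspeed_trend_kt_s > 0.0,
        s.load_factor_g < 1.0,
    ]
    return sum(checks)


def nose_down_trim_plausible(aoa_says_nose_high: bool, s: Snapshot,
                             max_votes: int = 2) -> bool:
    """Permit nose-down trim only if AoA says nose-high and few signals disagree."""
    return aoa_says_nose_high and descent_votes(s) <= max_votes


diving = Snapshot(True, -3000.0, 2.0, 0.6)
climbing = Snapshot(False, 1500.0, -0.5, 1.1)
print(nose_down_trim_plausible(True, diving))    # False: all four signals disagree
print(nose_down_trim_plausible(True, climbing))  # True: nothing contradicts AoA
```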

If this is not totally wrong, is this (the total integration of all sensors)
how Airbus did it with their super-automated systems like e.g. their side-
sticks/joysticks just telling the airplane what pilots "want"? (meaning, did
they start already from scratch creating a totally integrated system, if they
did create such an integrated system?) (not sure about how automated the new
Boeing airplanes are, so I did not mention them...)

~~~
bbojan
Angle of attack is not related to any of the items that you list. You could be
approaching stall even while your altitude is increasing (2), while you are at
a high speed (3), at less than 1 G (5) etc.

I'm not sure you quite understand what a stall is. A great resource I can
recommend is a book called See How it Flies
([http://www.av8n.com/how/](http://www.av8n.com/how/)).

~~~
avionicsguy
Actually from the link you provided, airspeed is absolutely a quantitative
indicator of a stall.

"The airspeed indicator provides quantitative information about angle of
attack, when the airspeed is not too low. Correction factors must be applied
to correct for nonstandard weight and/or load factors.3"

~~~
bbojan
Absolutely quantitative except when airspeed is low, when weight is non-
standard and when load factor is not 1.
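
To put numbers on those correction factors, here is a small sketch of the textbook relation: at the critical angle of attack lift equals load factor times weight, so stall speed scales with the square root of load factor times the weight ratio. The function name and the example figures are made up for illustration.

```python
import math


def stall_speed(vs_ref_kt: float, weight: float, ref_weight: float,
                load_factor: float = 1.0) -> float:
    """Scale a reference 1 g stall speed for actual weight and load factor.

    Lift at the critical angle of attack must equal load_factor * weight,
    and lift grows with airspeed squared, hence the square root.
    """
    return vs_ref_kt * math.sqrt(load_factor * weight / ref_weight)


# Same weight, but pulling 2 g (e.g. a level 60 degree banked turn):
print(round(stall_speed(100.0, 1.0, 1.0, 2.0), 1))  # 141.4
```

So the same indicated airspeed can be comfortably above the stall at 1 g and below it at 2 g, which is why airspeed alone only stands in for AoA after those corrections.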

I guess you and I disagree on the definition of "absolutely".

------
jngreenlee
Realize some pilots lurk here...was the angle of attack sensor ever associated
with an automatic system prior to MCAS?

Seems like previous errors would have been less impactful due to human
observation and synthesis with other instruments, window observation, etc.

If so, it shows how GIGO works in automatic systems...this should have been
scrutinized as AOA sensor was not above the 1:1,000,000 threshold for single-
input systems w/o failover.

~~~
PaulHoule
Airbus learned this the hard way in this incident:

[https://en.wikipedia.org/wiki/XL_Airways_Germany_Flight_888T](https://en.wikipedia.org/wiki/XL_Airways_Germany_Flight_888T)

which I think had to do with New Zealand regulators following up on

[https://en.wikipedia.org/wiki/Qantas_Flight_72](https://en.wikipedia.org/wiki/Qantas_Flight_72)

A group of New Zealand pilots and regulators pressured a German pilot to test
the envelope protection systems during a simulated emergency landing (so well
simulated that the pilot did not seem to know where he was going ahead of
time)

The AoA vanes were out of commission with this aircraft, the flight protection
system malfunctioned and the plane flew into the ground.

That is, the danger got caught on a test flight and only 7 were killed.
Compare that to the problem happening on not one but two revenue flights.

Had the FAA required that pilots ride out an MCAS failure in a simulator they
probably would have learned that pilots could not ride out an MCAS failure
before passengers were put at risk.

~~~
prolepunk
The AoA vanes didn't just fail; they failed because of an incorrect
maintenance procedure.

>One of the contributing causes was incorrect maintenance procedures which
allowed water to enter the angle of attack (AOA) sensors. During fuselage
rinsing with water before painting, three days before the flight, the AOA
sensors were unprotected. As specified in the Structure Repair Manual by
Airbus, it is mandatory to fit a protection device on AOA sensors before these
tasks.

In this case several failures all happening at the same time caused the crash:
software, pilots following protocols and maintenance.

~~~
alexis_fr
Wow, that still happens? Not the first accident caused by incorrectly covering
and uncovering probes during maintenance. I think once the Pitot tubes were
still covered with their caps.

------
cmurf
Too much is being made about the sensors. From my pilot's perspective it's far
worse that:

a.) MCAS can mistrim the airplane faster than pilots can even recognize what's
going on.

b.) Boeing and FAA directives still don't account for a.), and therefore
they're still giving pilots inadequate advice. And it's sufficiently bad that
I think it's intentional because the advice is consistent with prior Boeing
737 models: runaway stabilizer. If they came up with a better checklist to run
that only applies to the MAX, that is suggestive that at least difference
training is needed, and might even suggest a different type certificate is
needed.

Much of this is discussed here which I found to be decent:
[https://www.satcom.guru/2019/04/what-happened-on-et302.html](https://www.satcom.guru/2019/04/what-happened-on-et302.html)

 _incredibly it just dumps the stab and leaves it to the pilot to get it
back._

 _Electric trim must be used to neutralize control column pitch forces before
cutout!_

------
gordon_freeman
So if the sensors have a higher rate of failure then it looks like changing
MCAS software to rely on both sensors may not be enough as there is a higher
probability of both sensors malfunctioning and sending bad data to MCAS at the
same moment. In that case, I am not sure how would MCAS behave.

~~~
VBprogrammer
The other part of Boeing's fix is to prevent MCAS from being reactivated
repeatedly. If I remember correctly they are not allowing it to reactivate
until the AoA returns to normal.

I'm hopeful that they have analysed what happens if the AoA returns more or
less random data.

~~~
zepearl
> _...they are not allowing it to reactivate until the AoA returns to normal._

Puah, but what is "normal"? This is probably kind of tricky to define if you
cannot potentially rely on the sensor(s). Additionally I understand that the
auto-trim of MCAS was created to actually react to "abnormal" situations (nose
too high), which brings us here into a conflict :)

~~~
VBprogrammer
By normal I mean below its threshold for activation. According to Boeing this
depends on speed and attitude.

Of course this leaves the problem of what happens if the sensor data was wrong
but fluctuating wildly. Which is what I was getting at in the other paragraph.
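
A toy Python sketch of that worry: a latch that fires once when AoA crosses the activation threshold and only re-arms after the reading drops back below it. The class name, the fixed 15 degree threshold and the sample data are all assumptions for illustration; nothing here is taken from Boeing's actual fix.

```python
class ActivationLatch:
    """Fire once when AoA crosses the threshold; re-arm only after it drops below."""

    def __init__(self, threshold_deg: float):
        self.threshold_deg = threshold_deg
        self.armed = True

    def update(self, aoa_deg: float) -> bool:
        """Return True if an activation should be commanded for this sample."""
        if aoa_deg >= self.threshold_deg:
            if self.armed:
                self.armed = False
                return True
            return False
        self.armed = True  # reading is back below threshold, so re-arm
        return False


def count_activations(samples, threshold_deg=15.0):
    """Count how many activations a sequence of AoA samples would trigger."""
    latch = ActivationLatch(threshold_deg)
    return sum(latch.update(a) for a in samples)


print(count_activations([2, 3, 70, 70, 70, 70]))  # 1: stuck high fires once
print(count_activations([70, 2, 70, 2, 70]))      # 3: a jumpy sensor keeps re-arming
```

With a reading that sticks high the latch does its job, but a bad sensor that bounces across the threshold re-arms it every time.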

In both the Lion Air and the Ethiopian incident the sensor data went to some
very high value and stayed fixed basically until the aircraft hit the ground
so perhaps they know that is the only failure mode or maybe they are
accounting for other failures but don't feel the need to expand upon that in
public.

------
velox_io
I'm just amazed by how poorly this was implemented. Software engineers expect
hardware to fail. This wasn't just the work of a newb (or I hope it wasn't).
This was planned; both the requirements and the failure modes. Then
implemented by another team, and the quality verified by another team, even a
3rd party. There wasn't even a cutoff if it started operating outside its
expected range. Surely one of those people would have noticed the shortfalls.
This software should have the same level of engineering as other systems/
parts of the plane.

Every phone I've had in the last decade has come with GPS (which includes
altitude), gyros and even magnetic compasses. So they can work out the angle
of attack without the need for additional sensors, or at the very least a
backup!
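
A minimal sketch of the relation such an estimate would rest on: with wings level, no sideslip and still air, angle of attack is roughly pitch attitude minus flight path angle. The function name and inputs are illustrative. The big caveat is that GPS gives velocity over the ground while AoA is about the airflow, so wind and up/downdrafts corrupt this estimate.

```python
import math


def estimate_aoa_deg(pitch_deg: float, vertical_speed_mps: float,
                     airspeed_mps: float) -> float:
    """Estimate AoA as pitch attitude minus flight path angle.

    Assumes wings level, no sideslip and still air, so that the path
    through the air and the path over the ground point the same way.
    """
    if airspeed_mps <= 0:
        raise ValueError("airspeed must be positive")
    ratio = max(-1.0, min(1.0, vertical_speed_mps / airspeed_mps))
    flight_path_deg = math.degrees(math.asin(ratio))
    return pitch_deg - flight_path_deg


# Level flight with the nose 5 degrees up: AoA is about 5 degrees.
print(estimate_aoa_deg(5.0, 0.0, 100.0))  # 5.0
```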

Is this a wider issue with software development in general? My guess is that
many businesses are moving away from the waterfall model, which can often go
overboard with planning (not really a problem in aerospace..), to 'Agile',
which I don't think many people truly understand (especially some managers).
Planning and documentation often get sidestepped in favour of software that
appears to work fine, and senior management are overjoyed with the progress
and become complacent.

This software wasn't even a crucial feature. It was only needed when high
power is applied to the engines (like during takeoff) to stop the nose feeling
too light to existing 737 pilots. This wasn't a jet-fighter, which would
become unstable without the aid of computers. It's a shame 346 people died because of
what is effectively a 'schoolboy error'.

~~~
inflatableDodo
Here's a version of how it occurred:

"Instead, engineers were able to add just a few inches to the front landing
gear and shift the engines farther forward on the wing. The engines fit, but
the Max sat at a slightly uneven angle when parked.

While that design solved one problem, it created another. The larger size and
new location of the engines gave the Max the tendency to tilt up during
certain flight maneuvers, potentially to a dangerous angle.

To compensate, Boeing engineers created the automated anti-stall system,
called MCAS, that pushed the jet’s nose down if it was lifting too high. The
software was intended to operate in the background so that the Max flew just
like its predecessor. Boeing didn’t mention the system in its training
materials for the Max.

Boeing also designed the system to rely on a single sensor — a rarity in
aviation, where redundancy is common. Several former Boeing engineers who were
not directly involved in the system’s design said their colleagues most likely
opted for such an approach since relying on two sensors could still create
issues. If one of two sensors malfunctioned, the system could struggle to know
which was right.

Airbus addressed this potential problem on some of its planes by installing
three or more such sensors. Former Max engineers, including one who worked on
the sensors, said adding a third sensor to the Max was a nonstarter. Previous
737s, they said, had used two and managers wanted to limit changes."

[https://www.nytimes.com/2019/04/08/business/boeing-737-max-....](https://www.nytimes.com/2019/04/08/business/boeing-737-max-.html)

~~~
Datenstrom
> Boeing also designed the system to rely on a single sensor — a rarity in
> aviation, where redundancy is common.

I worked on the C-2 Greyhound aircraft in a past life. Part of an upgrade they
got, only because it was an absolute requirement to fly in European airspace,
was a magnetometer. The documentation on replacing and testing the new sensor,
of which there was only one, was non-existent. The documentation said that the
sensor could not and would never fail.

While in another country one failed. In disbelief my Chief had me read out
every wire in the system even though I knew it was the sensor immediately. The
bird was down for weeks until we got a new one because supply didn't exist and
I had to make up a suitable plan for testing the replacement on the fly.

I wonder what kind of practices lead to a "this system can never fail"
mentality. It should be assumed everything can and will fail.

~~~
teraflop
This is the second time I've had occasion to post this quote here in the last
week:

"The major difference between a thing that might go wrong and a thing that
cannot possibly go wrong is that when a thing that cannot possibly go wrong
goes wrong it usually turns out to be impossible to get at and repair." --
Douglas Adams

------
newsoul2019
One thing I haven't seen fully explained (maybe it was but I missed it).
Were these vanes damaged on both of these downed flights? Why didn't all the
other 737 MAX 8s around the world crash? Was it something about the flight
profile or the weather? Can the plane be flown as-is with a sufficiently
conservative flight profile?

~~~
VBprogrammer
The speculation on the Ethiopian flight was a bird strike. The data was normal
until a few seconds after take off.

Another issue could be the failure mode, if it failed at or near 0 AoA it
wouldn't cause too many safety of flight issues on a normal flight.

~~~
linuxftw
Well, unless you need the MCAS to kick in, and the plane plummets to the earth
in a hard stall because it didn't.

~~~
VBprogrammer
If a pilot goes anywhere near a stall on a normal commercial flight then they
have fucked something up.

MCAS is only required to maintain a positive slope to AoA - stick force curve.

MCAS resulted in 2 crashes because the implementation was completely brain
dead. Not because the aircraft is impossible to fly without it.

~~~
linuxftw
> MCAS is only required to maintain a positive slope to AoA - stick force
> curve.

You and others keep parroting this line, but I don't believe it to be true. MCAS
is an anti-stall device; show me one source that says otherwise.

> Not because the aircraft is impossible to fly without it.

Not impossible, but possibly also not safe. The system was implemented to
prevent stalls in the MAX 8 planes due to its flight characteristics.

~~~
webdevatlurk
It's well documented in numerous reports that:

1. The position and size of the upgraded engines for the 737 MAX caused the
plane to tend to pitch upwards, which could cause a stall.

2. Boeing was concerned and designed MCAS to automatically push the plane's
nose down to prevent stalls.

Every single piece of reporting I've seen on the matter refers to MCAS as an
anti-stall device.

Pilots of 737 who have talked to reporters refer to it as an anti-stall
device.

I have a hard time believing that this information hasn't been fact checked to
hell and back yet.

~~~
VBprogrammer
> Every single piece of reporting I've seen on the matter refers to MCAS as an
> anti-stall device.

Assuming you are a developer, have you ever seen some reporting of something
technical in the news? Does it not make you cringe?

> Pilots of 737 who have talked to reporters refer to it as an anti-stall
> device.

Even Boeing themselves do; so who can blame them. The reason that the
certification item exists is to prevent certification of aircraft which
demonstrate increasingly lighter control forces as the aircraft approaches a
stall. The reason being that it makes it easier for an inattentive pilot to
accidentally fly the aircraft into a stall. So if you want to shorten that to
anti-stall then I'm fine with that.

What I don't really like is the rhetoric about how these aircraft would fall
out of the sky without MCAS "controlling" the plane. It isn't a closed loop
control system implementing PID control to account for some crazy instability
in the aircraft.

------
cmurf
Cessna 150/152/172 stall warning "horn" works by negative pressure.
[https://www.youtube.com/watch?v=Q1JRTCBkWKQ](https://www.youtube.com/watch?v=Q1JRTCBkWKQ)

And then the Cessna 182 has an aerodynamic switch that activates an electric
"horn". And in either case they can get jammed up with bugs. But we do a lot
of training with slow flight, approach to stall, full stall, and recoveries
including recovery from the ensuing dive and secondary stalls.

------
ansible
I'm wondering now if MCAS should be an active system. Perhaps it should
instead just warn the pilots that it thinks a stall is imminent instead, like
when you get the stick shaking.

~~~
Someone1234
They cannot do that, because then the Max loses its shared type rating due to
the larger engines. MCAS exists to give the aircraft more similar flight
characteristics to its predecessors.

~~~
tzs
MCAS apparently exists for more than that. A few days ago someone posted a
link and some quotes from the FAA regulations on pitch stability. They require
that the control force must be positive up to and throughout a stall.

Without MCAS there would be a point on that MAX where you no longer needed
positive control force to continue pitching up. Without MCAS or some other
system to automatically counter that, so that the pilot needs positive control
force to continue pitching up, the plane could not be certified even if they
did go for a new type rating.

~~~
chipsa
It's not just that the force must be positive. It's that the force must also
be consistently increasing up to the stall. Without MCAS, the problem is
that the force does not continue to increase to the stall point. It does stay
positive at all times.
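
The two conditions can be told apart with a small check like the sketch below: the pull force has to stay positive, and it also has to keep rising as AoA increases. The function name and the force numbers are invented for illustration; they are not measured data for any aircraft.

```python
def force_gradient_ok(aoa_deg, pull_force):
    """True if pull force is positive and strictly increasing with AoA."""
    pairs = sorted(zip(aoa_deg, pull_force))
    forces = [f for _, f in pairs]
    all_positive = all(f > 0 for f in forces)
    increasing = all(b > a for a, b in zip(forces, forces[1:]))
    return all_positive and increasing


aoa = [2, 6, 10, 14]
print(force_gradient_ok(aoa, [5, 10, 14, 17]))  # True: keeps getting heavier
print(force_gradient_ok(aoa, [5, 10, 14, 13]))  # False: positive, but lightens near the stall
```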

------
AceyMan
I wonder if it's feasible to build a guard to cover the AoA sensor to mitigate
the ability of foreign objects damaging the vane.

Clearly it'd need to be very transparent to airflow—something like the face
guards of an NFL player—a heavy gauge titanium cage or something like that.

Btw, the Wright Bros knew the importance of AoA and included one on the Wright
Flyer. Implementation? — a length of straight stick parallel to the wing chord
with a _bit of ribbon_ affixed to the end :-D (KISS at work!)

~~~
HeyLaughingBoy
My general practice is to assume that the engineers in a highly regulated
field are competent and have good reasons for the things they do/don't do. If
there isn't a guard, then it's likely that having one would cause more
problems than leaving it off or it would be ineffective. e.g., hitting a bird
at 100+ kts would damage both the guard and the sensor anyway, rendering the
guard useless.

~~~
noir_lord
Given they test aircraft canopies by firing frozen (after defrosting) chickens
at them it's usually a safe assumption.

Air travel is very safe because of literal generations of engineers working
hard to make it safe, in this case the failure is one of procedure and people
being people (cutting close to the wind on what was allowed).

Things usually tighten up after this kind of thing and then gradually loosen
until the next disaster that could have been preventable.

------
digikata
All equipment on the plane has a potential for failure... the question is what
systematic design and pilot training is in place to take action in the face of
failures.

~~~
peteradio
What? Does the degree of failure not matter at all? That makes no sense. The
question is whether all these parts are actually meeting the specified failure
tolerance rate.

~~~
digikata
It matters, but if the rate of failure is high enough to be expected to be
encountered within an all-fleet, constant flight environment for the lifetime
of the aircraft, then you plan for failures as well as having the pilot know
what to do when there is a failure.

With in-air sensors like this, the failure rate is higher than one you can
ignore. They fail on their own, they get bumped and bent on the ground,
they ice over in the air, or have objects striking them.

------
dreamcompiler
This thing is a small fin on a swivel. It's a wind vane about 6 inches long,
sticking out into a 500 mph airstream. It's a heavy-duty piece of metal, but
that swivel joint makes it vulnerable to bird strikes, freezing, etc.

~~~
dboreham
Hmm...if it works that way it seems quite easy to determine if it is working
-- if it isn't swiveling then it is probably broken. Perhaps also you could
make an active version of this sensor where there's a stepper that pushes it a
bit either side of the current neutral position, then measure resistance to
that push with a strain gauge. It might be possible to thereby know that
you're pushing against flowing air, not bird guts jammed in the mechanism, for
example.

~~~
dreamcompiler
Probably could be done but it would require significant reengineering because
it introduces new failure modes. Best approach is what everybody else does:
Assume the AoA sensor can fail and design the software accordingly. Boeing
didn't do that and the results were fatal.

~~~
noir_lord
Use an odd number > 1 and sound an alert if they can't reach a quorum.

Triple redundancy is pretty common in safety critical systems.
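
A minimal sketch of that kind of voter in Python: take the median of an odd number of readings, check that a majority agree with it within a tolerance, and raise an alert otherwise. The function name and the 2 degree tolerance are assumptions for illustration, not values from any real system.

```python
from statistics import median


def vote(readings_deg, tolerance_deg=2.0):
    """Return (value, alert) for an odd number of redundant readings.

    value is the median when a majority of readings lie within
    tolerance_deg of it, otherwise None (no quorum). alert is True when
    any reading was outvoted or when there is no quorum.
    """
    if len(readings_deg) < 3 or len(readings_deg) % 2 == 0:
        raise ValueError("need an odd number of readings greater than 1")
    mid = median(readings_deg)
    agreeing = [r for r in readings_deg if abs(r - mid) <= tolerance_deg]
    if len(agreeing) * 2 > len(readings_deg):
        return mid, len(agreeing) < len(readings_deg)
    return None, True


print(vote([4.8, 5.1, 5.3]))    # (5.1, False): all three agree
print(vote([4.8, 5.1, 74.5]))   # (5.1, True): one vane outvoted
print(vote([1.0, 20.0, 74.5]))  # (None, True): no quorum
```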

------
bookofjoe
The elephant in the room, which I've not yet seen alluded to: Boeing wants at
all costs (which will be very high) to avoid having pilots be required to
requalify with new MCAS software in flight simulators.

------
jpm_sd
I am surprised that there is no ultrasonic method of measuring AoA.

~~~
CamperBob2
Exactly, using mechanical vanes for this sounds pretty stupid, given what can
be done with acoustical signal processing. It must be a more difficult problem
than it seems.

~~~
HeyLaughingBoy
Not necessarily. It could be that after doing the safety analysis, the
mechanical method is less likely to fail.

Remember that there are usually many ways to skin a cat. Often the one chosen
hits the sweet spot of reliability, manufacturability, cost (both initial
purchase and operational), maintainability (e.g., how easy is it to
test/inspect), etc.

When you have decades of experience with something and it works well enough,
there's generally no reason to improve it when your engineering resources can
be focused on more pressing problems.

I was a team lead on a new system that was based on the IBM ISA bus in 2013.
The reasons were simply that it worked well enough and we had tons of
experience with it and could reuse a lot of custom-built hardware and code.
The only downside (and reason we switched), was that it was getting harder to
find blade PCs that still had an ISA bus and the system was expected to be in
the field for 15+ years.

~~~
jpm_sd
I was hardware lead on a system that was the 2013 revision to an existing ISA-
bus blade PC architecture. We threw it all out and switched to an off-the-
shelf DIN rail PC from Beckhoff with snap-on EtherCAT modules. It took a
little extra development time but it was 100% worth it.

Sometimes you gotta throw it out.

------
salawat
This is not even remotely surprising, and by all standards should be taken
into account in your implementation of dependent systems' gradual degradation
fault modes.

------
mansoor_
_Everything is vulnerable to failure_

------
markphip
Shouldn’t it be easy to detect a stuck sensor and then stop using its data
unless it recovers?
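
One naive way to sketch it in Python: keep a sliding window of readings and distrust the sensor while the spread across the window is close to zero. The class name, window size and spread threshold are hypothetical. It also shows why "easy" is doubtful: a healthy vane in smooth air may barely move, and a vane that jumps to a wrong value is not flagged until the window fills with the stuck reading.

```python
from collections import deque


class StuckDetector:
    """Flag a sensor as stuck when its recent readings barely change."""

    def __init__(self, window: int = 50, min_spread_deg: float = 0.05):
        self.samples = deque(maxlen=window)
        self.min_spread_deg = min_spread_deg

    def update(self, reading_deg: float) -> bool:
        """Add a reading; return True if the sensor currently looks usable."""
        self.samples.append(reading_deg)
        if len(self.samples) < self.samples.maxlen:
            return True  # not enough history yet to judge
        spread = max(self.samples) - min(self.samples)
        return spread >= self.min_spread_deg


det = StuckDetector(window=5)
healthy = [det.update(r) for r in [5.0, 5.1, 4.9, 5.2, 5.0]]
stuck = [det.update(74.5) for _ in range(5)]
print(healthy[-1], stuck[-1], det.update(5.0))  # True False True
```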

------
ngcc_hk
3 is the minimum for any critical system.

------
lurker0094
So, 140 failures in 30 years of aviation... ~4 failures per year FOR ALL
FLIGHTS IN A YEAR. That's unreliable?

~~~
peteradio
140 incidents. Untold number of averted failures.

------
shadykiller
Shouldn't the headline say "Boeing 737 Max"? There's a big difference
compared to regular 737s

~~~
oliveshell
I think it’s well-enough implied by ‘crashes’. Regular 737s haven’t been
crashing recently.

