
Flight QF72: What happens when automation leaves pilots powerless? - korethr
http://www.theherald.com.au/story/4659526/the-untold-story-of-qf72-what-happens-when-psycho-automation-leaves-pilots-powerless/
======
contingencies
This was already submitted and discussed at
[https://news.ycombinator.com/item?id=14336897](https://news.ycombinator.com/item?id=14336897)
and
[https://news.ycombinator.com/item?id=14328294](https://news.ycombinator.com/item?id=14328294).

My summary at
[https://news.ycombinator.com/item?id=14329082](https://news.ycombinator.com/item?id=14329082)
is still the most deeply researched as far as I can tell (I actually read a
lot of the air safety bureau report).

Basically, Northrop Grumman's ADIRU units had sporadic software bugs (off-
by-one errors, typical of software written in needlessly low-level
languages), and the Airbus control system that combined the output of the
three redundant ADIRUs didn't properly handle the failure of one of the
three (even when the other two agreed with GPS!), didn't give pilots
transparent training and resolution procedures for this class of situation,
and ended up trusting the broken unit instead of the other two or the GPS,
which were operating fine. Even calling the major airline's ground crisis
center for assistance couldn't resolve the situation. So... problems in
both companies, but Airbus is mostly to blame, even if testing could not be
expected to tease out all bugs in third-party components: primarily because
they were not logging and resolving issues from the millions of flight
hours these units already had, and because they had not tested their
fly-by-wire system with worst-case input. I.e., what's the point of
redundancy if you trust the _broken_ unit?
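
The 2-of-3 voting idea at the heart of this can be sketched in a few
lines. This is a hypothetical illustration of median voting with an
agreement check, not the actual Airbus algorithm:

```python
def vote(a: float, b: float, c: float, tolerance: float = 1.0):
    """Return the median of three redundant readings, plus any
    readings that disagree with the median by more than `tolerance`.

    The median can never be the single wildly wrong unit: it always
    lies between the two units that agree with each other.
    """
    readings = sorted([a, b, c])
    median = readings[1]
    # units that stray from the consensus get flagged for logging
    # and exclusion, rather than trusted
    outliers = [r for r in readings if abs(r - median) > tolerance]
    return median, outliers
```

With inputs like `vote(2.0, 2.1, 50.3)` the spiking unit is outvoted and
flagged; the failure mode in this incident was, in effect, the opposite.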

~~~
mathgenius
After the Intel FDIV bug [1] they started using formal verification to make
sure the cpu's did what they said they would. I don't understand why they
don't do the same for these avionic systems. Maybe it's just not worth it to
the company.

[1]
[https://en.wikipedia.org/wiki/Pentium_FDIV_bug](https://en.wikipedia.org/wiki/Pentium_FDIV_bug)

~~~
flashmob
Would you say that this has limitations and isn't a silver bullet? E.g.,
suppose your software added numbers, and formal verification proved that
your program was indeed correct because it added the numbers. However, it
missed the requirement that you also had to do subtraction, which for some
reason was omitted from the specification.

~~~
mannykannot
You would be no worse off in this situation. Writing a formal spec. arguably
makes it harder to overlook such issues. Formal verification is good at
catching things like off-by-one errors, where the problem is not in the spec.
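
A toy illustration of that division of labor, with a deliberately buggy
function and a tiny exhaustive checker standing in for a prover (all names
here are made up):

```python
def count_items(items):
    """Buggy implementation with a classic off-by-one."""
    total = 0
    for _ in range(len(items) - 1):  # skips one element
        total += 1
    return total

def satisfies_spec(f, max_n=8):
    """Toy stand-in for a verifier: check f against the spec
    `f(items) == len(items)` over a small exhaustive input space."""
    return all(f(list(range(n))) == n for n in range(max_n))

print(satisfies_spec(count_items))  # False: the off-by-one is caught
print(satisfies_spec(len))          # True: len meets the spec
```

The checker nails the off-by-one, but it can say nothing about a
subtraction requirement the spec never mentions.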

------
cbanek
They mention it once in the article, but this has basically been true since
fly-by-wire, which was quite a while ago.

While it may seem timely to blame automation for this one, you could just as
easily blame faulty sensors (like the iced up sensors they mention) or faulty
mechanicals, such as pumps, hydraulics, wiring, electrical shorts, etc.

The sad thing is, as we go faster (both in terms of speed and technology),
things will only get more complex. We're also far past the point where human
pilots and crew might be able to do something about it.

Let's say it wasn't fly-by-wire, and it was hydraulics tied to the stick. If
you lost power, you still probably wouldn't be able to move the stick, or it
wouldn't do anything. Either way you are in deep trouble.

The real solution is redundancy. They mention the triple flight computers in
the Airbus, but in the end, there will always be possible problems of all
types. In computers, there are also hardware (not software bugs) errors that
may occur at any time.

One of the things devs love to joke about is heisenbugs. If you have debugged
these, and only found them on one machine, it's probably your memory doing a
bit-flip. Solar radiation can do that, especially at higher altitudes. Just
because we're behind our magnetic field doesn't make it go away, and at higher
altitudes you do have a slightly higher error rate. Now we're just back to
byzantine general territory where you have to figure out which one is lying.
Again, coming down to probability that you didn't lose two at the same time.
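
To see how dramatic a single upset can be, here's a quick stdlib-only
illustration of one flipped bit in a float's 64-bit IEEE-754
representation (the altitude value is, of course, made up):

```python
import struct

def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of a float's 64-bit IEEE-754 representation."""
    (pattern,) = struct.unpack("<Q", struct.pack("<d", x))
    (result,) = struct.unpack("<d", struct.pack("<Q", pattern ^ (1 << bit)))
    return result

altitude = 37000.0
corrupted = flip_bit(altitude, 62)  # flip the top exponent bit
# one upset turns a sane reading into a number vanishingly close to
# zero; flipping the same bit again restores the original exactly
restored = flip_bit(corrupted, 62)
```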

While it's easy to say that the system should have just done "the right
thing", when you learn more about how the system works, and how complicated
it is, you realize how hard that is. And problems in handling errors or
trouble are some of the hardest to find, and have the most damaging
consequences.

Spacecraft (manned or otherwise) have these exact same problems. (Source: I've
worked on aerospace systems)

[https://en.wikipedia.org/wiki/Fly-by-wire](https://en.wikipedia.org/wiki/Fly-by-wire)

~~~
pdonis
_> We're also far past the point where human pilots and crew might be able to
do something about it._

I disagree. In the incident described in the article, the pilot could have
prevented all of the damage--if only the airplane had responded to his control
inputs. This was not a situation that would have been hard for a human to
handle--if the human had had the ability to tell the airplane: stop listening
to the computers and just listen to me.

 _> The real solution is redundancy._

There was plenty of redundancy in this case. It didn't help.

I think the real solution is, as I said above, to give the human pilot a last-
ditch way of regaining complete manual control of the plane. After all, that's
what the human is there for: to intervene if it is clear that the automation
is doing the wrong thing. That was clearly the case here, but the humans had
no way to stop it. That is not a good idea.

~~~
averagewall
That assumes the human won't mistakenly misuse the manual override and cause a
crash when the computer would otherwise have been fine on its own.

Computers and humans are both fallible and it's not obvious that the human
should be given higher authority than the computer. There isn't a simple
answer to this kind of problem. Ultimately someone or something has to make
decisions based on sensor data/vision/etc and that something or someone might
always get it wrong.

~~~
belovedeagle
This reminds me of some dev tools (can't remember which ones) which require
the user to do something like set an environment variable to
"YES_THIS_IS_A_TERRIBLE_IDEA". If the switch to disable aircraft automation
involves such cognitive barriers, it's possible that pilots could be relied
upon not to use it except when truly necessary. Heck, the barrier could be
that pilots aren't told of its existence until contacting maintenance. It
sounds silly and useless but it would have offered greater assurance in this
situation that landing could be done safely.
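
The pattern from those dev tools is trivial to implement; a hypothetical
sketch (the variable name is the joke, not any real tool's flag):

```python
ACK_VAR = "YES_THIS_IS_A_TERRIBLE_IDEA"

def override_permitted(env: dict) -> bool:
    """Hypothetical cognitive barrier: the dangerous switch engages
    only when the operator has explicitly spelled out the risk
    (pass os.environ in real use)."""
    return env.get(ACK_VAR) == "1"

assert not override_permitted({})          # refuses by default
assert override_permitted({ACK_VAR: "1"})  # explicit acknowledgement
```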

~~~
averagewall
I don't know how hard it was to do, but here's a case of some Apollo
astronauts who injured and nearly killed themselves by unnecessarily using
manual override and doing it wrong. I suspect they knew full well what it
meant but still decided that a special situation required it. Not so much
like a careless oversight as in programming.

[https://youtu.be/V-jRjyl7Fg4](https://youtu.be/V-jRjyl7Fg4)

The story starts at 47:50. I can't quite get the link to jump straight to it.

------
WalterBright
> control over the horizontal tail – 3000 pounds per square inch of pressure
> that can be moved at the speed of light.

Not quite. If I recall correctly, the tail will move about 1/2 degree per
second. Hardly the speed of light. The elevators are hydraulically operated
and can go much faster, but hydraulics don't move at the speed of light,
either. The 3000 psi refers to the hydraulic pressure in the system, which
has kinda nothing to do with how fast things move. Boeing considered going
to a higher pressure to save weight, but rejected the idea because a
pinhole leak at such pressures would act like a knife on human skin.

Source: I worked on the horizontal tail design of the 757.

~~~
PhantomGremlin
Yeah, that particular sentence really grated on me.

I also didn't like: _Booooom. A crashing sound tears through the cabin. In a
split second, the galley floor disappears beneath Maiava's feet, momentarily
giving him a sense of floating in space._

I initially thought there was some sort of structural failure of the cabin
floor.

~~~
WalterBright
Most articles written by journalists about technical issues read like the
journalist took the facts and ran them through a Mixmaster before printing
them.

One nice anomaly is the "Aviation Disasters" TV series, which gets the tech
right as far as I can tell.

------
_acme
Similarly, Captain Sullenberger said that in respect of US Airways Flight
1549, computer-imposed limits prevented him from achieving the optimum landing
flare for the ditching, which would have softened the impact.

------
wahern
What I found most interesting was how unnerved the pilot (a bona fide Top
Gun graduate) became by his newfound sense of lack of control. It ruined
flying for him and effectively ended his career.

I would have assumed that coming to terms with a lack of control was one of
the first things a fighter pilot would have to learn to deal with. But maybe
you can't be a good fighter pilot unless you can fool yourself into thinking
you're always in control, and that lack of control is really just a personal
failing (didn't train hard enough, not thinking clearly enough, etc).

Or maybe with age you just lose the ability to handle that kind of stress.
AFAIU the older you get the more likely you are to become overwhelmed by
anxiety. Which seemed totally counterintuitive until I watched others get
older--including myself--and saw how that works.

I guess that was sort of the plot to Top Gun, too. ;)

~~~
pdonis
_> I would have assumed that coming to terms with a lack of control was one of
the first things a fighter pilot would have to learn to deal with._

I think it wasn't just the lack of control; it was the inability to do
anything to _regain_ control. My father was a test pilot; there were plenty of
times where his airplane did something unexpected and he lost control. But he
always had some option for regaining it. (In the last extremity, he could
eject--which he actually had to in one test flight. I noticed that the lack of
that option in an airliner was mentioned in the article.)

Also, this was a situation where, as the pilot said, he could easily have
kept control of the plane, if only the plane had let him. But the plane was
listening to the computers instead of him. To me, if you're going to have
highly trained humans in the cockpit, it doesn't make sense not to give
them a last-ditch way of telling the airplane to stop listening to the
computers and start listening to them.

~~~
wahern

      > there were plenty of times where his airplane did
      > something unexpected and he lost control. But he always
      > had some option for regaining it. (In the last extremity,
      > he could eject [....])
    

But that's not really a lack of control. In those cases you're merely being
challenged to regain control of the aircraft. You just need to fight harder,
think smarter, or exit. In any event you still have control (or at least a
sense of control) over your life.

I understand why this issue was so unnerving for the pilot. Heck, any
passenger could sympathize: it's why flying can be so scary--total and
complete lack of control; little to no knowledge about what's happening and
why; you're completely at the mercy of somebody else, or of nothing at all.

So, yeah, I get it. I was just thinking that of all sorts of pilots, a fighter
pilot would be the last to be unnerved by losing control; unnerved to the
point of not wanting to fly anymore. a) Fighter jets are super powerful and
super sophisticated and when something goes wrong there may not even be any
time to regain control no matter your training or experience; you're always
flying at that envelope where the unexpected can and often does happen, so I
just assumed you invariably accept that you're flipping a coin every time you
take-off. b) I presume combat is one of those things where you either resolve
yourself to a lack of control or realize you're not able to cope and move on
to something else (if you can, and assuming you can move on emotionally). But
I guess I shouldn't presume a modern fighter pilot (esp. from the 1980s) to
have experienced the same sort of peril as someone on the ground during a war.

That said, I'm not trying to suggest he lost his mind or something. He
responded appropriately in the moment. He was able to let his training and
experience dictate his responses. He consciously understands that he was
unnerved after the fact because of that lack of control, as opposed to not
understanding those feelings and fears. I'm just surprised that those feelings
(or at least their severity) were novel to him and changed his perception of
flying, not that he was unnerved, per se.

~~~
pdonis
_> But that's not really a lack of control._

Not in the sense that the pilot had lack of control of the airliner in this
incident, yes. That's my point.

 _> I presume combat is one of those things where you resolve yourself to a
lack of control or learn you're not able to cope._

Lack of control to a certain extent, yes. No matter how good a pilot you are,
you can still get shot down. (My dad flew in Vietnam before he became a test
pilot, and was shot down once. Luckily, he was recovered.) But you can still
fly your own airplane--often even if it gets hit. (My dad got part of his
right wing shot off during one mission, but he still was able to make it back
to the carrier.)

I think the reason this incident was so unnerving to the pilot was the lack of
control over the one thing pilots are supposed to always have control over:
that when they move the stick or the rudder, _something_ happens.

------
outworlder
> His reasoning is simple: how can the plane stall and over-speed at the same
> time?

It can. It's called the "coffin corner".

~~~
rzzzwilson
I remember reading comments by a U-2 pilot that turns had to be very shallow
at height due to the possibility of the inside wing stalling and/or the
outside wing entering mach buffet.


------
korethr
Also found these, the official investigation reports from the incident.

[https://www.atsb.gov.au/publications/investigation_reports/2...](https://www.atsb.gov.au/publications/investigation_reports/2008/aair/ao-2008-070.aspx)

~~~
turbohedgehog
It's disconcerting that they still don't know for sure what caused this
issue, or whether it could happen again.

~~~
korethr
It is disconcerting. I do take some heart in the fact that there are
thousands of safe operational hours on the flight computers in question.
That said, I'd hope that behind the scenes there are engineers poring over
data and test units to find exactly why it happened and what needs to be
fixed to ensure it can never happen again.

~~~
dtech
They did that for the investigation and couldn't find any probable cause. At
some point you have to admit defeat and accept that the cause will not be
found.

------
ams6110
No system, human or mechanical or computer, is perfect. Human pilots have
crashed more airplanes and are more likely to crash airplanes than automated
systems. Human pilots have also saved airplanes when the systems fail. For the
foreseeable future, it's probably best to have both, but nothing in life can
ever be perfectly safe.

~~~
wahern
Is there a site somewhere that attempts to catalog

1) instances where the flight computer prevented an incident due to pilot
error

2) instances where the flight computer caused an incident

I understand it's not always (if ever) so black & white, and I'm sure there's
a better way to break things down. But it'd be interesting to read something
that attempts to keep some kind of score.

~~~
colechristensen
It's a question that can't really be answered. It's sort of like asking about

1) instances where your alarm clock prevented you from being late to work

2) instances where your alarm clock failed to wake you up

How could you possibly determine how often your alarm clock prevented you
from being late? You can certainly tell how often it fails to do its job,
but you can't determine how often its success is necessary, because there
aren't any near misses to detect. With no control group there can be no
comparison.

You could count incidents which the flight computer is helpful, but you cannot
count incidents that never happened in the first place because of the flight
computer.

~~~
wahern
OTOH, Airbus and Boeing use different systems. Some alarm clocks kick you
out of bed, shower you, and drive you to work. Other alarm clocks just pull
the covers off and pour cold water on you, but still require you to do the
rest.

So you could analyze instances where the latter proved insufficient and try to
guess whether the former would have been sufficient.

The bigger problem is probably that there just aren't enough useful extreme
incidents to draw any conclusions. Or maybe there are. That's why I'm most
curious whether anybody has even tried. All I ever hear is that air safety
has increased along with the increase in computerization. But that tells me
very little: everything about the aircraft and its ground support has been
getting better, not to mention the training of the humans.

------
cmurf
I am a pilot, and my complaint is not the lack of a last-ditch emergency
manual mode, but that there are numerous modes (or alternate laws) in the
software abstraction of fly-by-wire aircraft: the plane responds
differently to inputs in these different modes, the mode the system is in
has to be inferred rather than clearly announced, and the pilot is expected
to know all of these things.

I think it's hubris and incompetence in engineering that a system would be
designed this way. And in fact not all of them have these same deficient
behaviors, so this is not inherent to fly by wire or automation. It's a choice
made by the software abstraction designers.

The idea that there'd be a mode where advancing a throttle literally does
nothing, with no increase in power and no notification that the input was
sensed but isn't going to do what you expect (which would at least give you
a clue that some other mode is in effect), is so totally batshit insane to
me. It's a huge betrayal of the physics of flight, which is something
pilots understand rather well, almost like muscle memory.

In a conventional plane, throttle advance always increases power. If that
doesn't happen, it's an emergency. In a fly by wire plane if it doesn't
happen, it almost certainly means the pilot is confused. And that to me is
crap design.

------
marze
What a poor design. Pilots should have a hard "manual control" switch that
turns control completely to the pilots.

A friend of mine from childhood became a top expert in formal verification,
and got a contract many years back to help Airbus perform formal verification
on their control software. I don't know many details, but according to my
friend's father, he would never fly on an Airbus jet after this experience.

A pilot friend told a story of some pilot friends of his who were piloting an
Airbus in Canada, many years back, who couldn't get the jet to give control
back and get out of a holding pattern to land. They had to wake up engineers
in France in the middle of the night, who told them to take a hammer to
certain fuses or breakers, and let them regain control.

From a UI point of view, the control system and displays in the Airbus are
a disaster. For instance, the pilots of the Air France plane that stalled
over the Atlantic couldn't figure out that the plane was stalled for
several minutes, until it was too late to correct, with all of the displays
in front of them.

~~~
hlandau
...Actually, while I'm criticising Airbus, it would be remiss not to
mention the A400M incident [1] [2]. Unfortunately, as a military accident,
the full report was never made available to the public, but based on the
available information, a failure to load calibration data for the engine
computers during manufacturing resulted in the engines shutting down...
once they got to the altitude where that calibration data was needed, and
found it to be absent.

The idea that Airbus would design engine computers not to check the
validity of vital calibration data until the aeroplane is in the air, and
then respond to that contingency, when it is identified, by _shutting down
the engines_, is so obscene that I've usually tried to assume there _must_
be something amiss in the (very limited) reporting of the details of the
issue.

But considering your words, and the general lack of consideration that
Airbus seems to give to these things the more I look at accidents involving
Airbus craft, maybe they really are just truly bad at this stuff... it's
quite disturbing.

[1]
[https://en.wikipedia.org/wiki/Airbus_A400M#Accidents](https://en.wikipedia.org/wiki/Airbus_A400M#Accidents)
[2] [https://arstechnica.com/information-technology/2015/06/repor...](https://arstechnica.com/information-technology/2015/06/report-airbus-transport-crash-caused-by-wipe-of-critical-engine-control-data/)

~~~
Avalyst
> Safety officials are still investigating how safety checks failed to spot
> that the calibration data had been deleted.

So I guess the checks for checking the checks of the data failed? You can
never have too much redundancy!

------
robbiep
Read this last weekend on the beach. Interesting.

That poor bastard who put the life jacket around his neck at 30,000 feet and
inflated it...

~~~
codys
To be fair, all the safety briefings I've seen tell you not to do that. ("Do
not inflate while inside the airplane" is the line, I believe).

~~~
knz
Most people assume that's just a precaution to make it easier to move around
the cabin and not a choking hazard!

~~~
fphhotchips
A massive part of the reason is that if the airplane ditches into the
water, an inflated life jacket can make it very difficult for you to escape
[1].
[https://en.wikipedia.org/wiki/Ethiopian_Airlines_Flight_961](https://en.wikipedia.org/wiki/Ethiopian_Airlines_Flight_961)

~~~
robbiep
I assume that the pressure differential between sea level and 30,000 feet
is going to change how inflated it gets as well.

------
amluto
As I understand it, Boeing planes physically move the cockpit controls to
reflect computer control inputs. Would that approach have made a difference in
this event?

~~~
mjevans
Maybe; more so if the pilots resisting that intended change in control had
been given priority.

I very much like how the redundant sticks are /forced/ to stay in sync. What's
done to one IS done to the other as well. That's a critical safety feature in
the human interface.

I think a complete human override of 'higher level' functions (disabling
the autopilot) should require 'two keys', like the nuke silo in War Games.
The lower-level systems still have to exist, and there should be a way of
troubleshooting their inputs and outputs over time (graphs, probably).

~~~
hlandau
IIRC, Boeing 777s (which are FBW) will automatically disengage the autopilot
if you start forcing the stick to a different position. The flight envelope
protections are implemented by adding a physical resistance to the stick, but
can be overridden by using force, which seems a highly intuitive and
deferential implementation. It certainly seems a lot more responsible than
Airbus's design, both for the above reasons and for the mechanical linkage you
mention.
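
That force-breakout behavior can be sketched as a control-law fragment. All
numbers and names here are invented for illustration; the real 777 control
laws are far more involved:

```python
def effective_command(pilot_input: float, protection_limit: float,
                      pilot_force_lbs: float,
                      breakout_force_lbs: float = 25.0) -> float:
    """Envelope protection that yields to force: commands are clamped
    to the protection limit unless the pilot pushes through the
    artificial resistance, in which case the pilot's input wins."""
    if abs(pilot_force_lbs) >= breakout_force_lbs:
        return pilot_input  # pilot overrides the protection
    return max(-protection_limit, min(protection_limit, pilot_input))
```

A gentle push stays clamped to the limit; a determined one goes straight
through.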

------
Avalyst
Reminds me of SK751 [1] where an automated system prevented the pilots from
powering down a broken engine, resulting in both engines failing and the plane
crash landing in a field north of Stockholm.

[1]
[https://en.wikipedia.org/wiki/Scandinavian_Airlines_Flight_7...](https://en.wikipedia.org/wiki/Scandinavian_Airlines_Flight_751)

------
nikcub
This happened to another Qantas flight in the area, QF71; hence the theory
that the high-power VLF transmission station in the area was responsible:

[https://en.wikipedia.org/wiki/Naval_Communication_Station_Ha...](https://en.wikipedia.org/wiki/Naval_Communication_Station_Harold_E._Holt#Aircraft_interference_controversy)

~~~
harshbutfair
I think the second event you're thinking of was a Malaysia Airlines flight
from Perth to Kuala Lumpur.
[https://www.atsb.gov.au/publications/investigation_reports/2...](https://www.atsb.gov.au/publications/investigation_reports/2005/AAIR/aair200503722.aspx)

~~~
madeofpalk
No, there was an incident several months later with QF71 flying the same route
in the opposite direction.

[https://en.m.wikipedia.org/wiki/Qantas_Flight_72](https://en.m.wikipedia.org/wiki/Qantas_Flight_72)
See the subheading in the Final Report section

~~~
madeofpalk
direct link:
[https://en.wikipedia.org/wiki/Qantas_Flight_72#Subsequent_Qa...](https://en.wikipedia.org/wiki/Qantas_Flight_72#Subsequent_Qantas_Flight_71_incident)

------
ptero
This is unquestionably a very scary situation. Kudos to the pilots who got
the plane down without loss of life.

However painful this state of affairs is (broken autopilot, no full human
override), I suspect fly-by-wire actually prevents many more accidents than
it causes.

------
Figs
This story reminded me of the adage: "They say that to err is human, but to
_really_ fuck up, you need a computer..."

------
sceew
Aren't we always going to have two pilots?

A captain in case the computers fail, another in case the captain fails.

Automation (autopilot) has already been around...?

------
DuckConference
Don't the pilots have the option of engaging alternate flight rules in this
case? Or is that not how it works?

------
boznz
On the whole I would still prefer well tested software over a human anytime.

------
dtech
Jesus, that article contains an ungodly amount of annoying padding.

TL;DR: The plane made two steep dives within 3 minutes, without apparent
mechanical failure or pilot input, injuring a number of passengers. The
plane then managed a safe emergency landing.

The dives were initiated by a flight computer, which could override pilot
input and was reacting to faulty data. No one died, but some people have
lasting physical injuries.

~~~
johansch
Yeah, that was annoying to read.

[https://en.wikipedia.org/wiki/Qantas_Flight_72](https://en.wikipedia.org/wiki/Qantas_Flight_72)

Root cause: a fault with one of the plane's three "Air Data Inertial Reference
Units":

[https://en.wikipedia.org/wiki/Air_data_inertial_reference_un...](https://en.wikipedia.org/wiki/Air_data_inertial_reference_unit)

This error condition then exposed a bug in the aircraft's software which
caused the dives.

------
korethr
Note to mods: I modified the title slightly, as the original was too long.
I dropped the word 'psycho' from 'psycho automation' as I felt it had the
least impact of any word to substitute or drop from the title.

