
Blackout Bug: Boeing 737 cockpit screens go blank if it lands on specific runways - DemiGuru
https://www.theregister.co.uk/2020/01/08/boeing_737_ng_cockpit_screen_blank_bug/
======
jperras
Is someone dividing by zero somewhere?

"All six display units (DUs) blanked with a selected instrument approach to a
runway with a 270-degree true heading, and all six DUs stayed blank until a
different runway was selected."

cos(270°) = 0
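A quick way to probe the division-by-zero hypothesis, assuming some routine divides by cos(heading) in double precision (the helper below is hypothetical, not anything known from the avionics). In floating point, cos(270°) is tiny but not exactly zero, so a naive division yields an enormous finite value rather than a trap, and that value could propagate into downstream display math:

```python
import math

def naive_secant(heading_deg: float) -> float:
    """Hypothetical helper: 1 / cos(heading), with no guard for cos == 0."""
    return 1.0 / math.cos(math.radians(heading_deg))

# radians(270) is not exactly 3*pi/2, so the cosine is tiny but nonzero.
c = math.cos(math.radians(270.0))
print(c)                    # on the order of 1e-16, not 0.0
print(naive_secant(270.0))  # magnitude above 1e15: no exception, just a huge value
```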

~~~
FriendlyNormie
angle = arctan(dy/dx) where dx=0.

I guarantee this is the mistake they made.

In some languages you can use Math.atan2(dy,dx) to easily avoid this.
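A minimal sketch of the difference, using Python's math module (the function names are illustrative). arctan(dy/dx) fails outright when dx is 0 and also cannot distinguish opposite quadrants, while atan2 uses the signs of both arguments:

```python
import math

def angle_naive(dx: float, dy: float) -> float:
    # Fails for dx == 0 and folds opposite quadrants onto each other.
    return math.atan(dy / dx)

def angle_atan2(dx: float, dy: float) -> float:
    # atan2 takes the signs of both components into account and handles dx == 0.
    return math.atan2(dy, dx)

try:
    angle_naive(0.0, -1.0)
except ZeroDivisionError:
    print("naive arctan(dy/dx): division by zero")

print(round(math.degrees(angle_atan2(0.0, -1.0)), 9))   # -90.0
print(round(math.degrees(angle_naive(-1.0, -1.0)), 9))  # 45.0 (wrong quadrant)
print(round(math.degrees(angle_atan2(-1.0, -1.0)), 9))  # -135.0
```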

~~~
phkahler
>> I guarantee this is the mistake they made.

No. East-West runways are very common.

If I may speculate... the blanking of 5 displays is probably not a bug. Maybe
it's a (misguided) feature meant to keep people from landing at certain
restricted airports and they coded the logic incorrectly?

~~~
cameldrv
My guess is that it's not just any old "Runway 27", but it's runways where the
coordinates of the endpoints of the runways have precisely the same latitude
down to whatever precision is in the database. It's probably quite rare to
have a runway that's this perfectly east-west.

~~~
cameldrv
Going by the coordinates of the runways in the FAA database, they are all
perfectly east/west, with at most about 2 feet of N/S deviation. This would round to
zero if the calculations were done with lat/long stored as single-precision numbers.
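To get a feel for how coarse single precision is at these magnitudes, here is a sketch that assumes latitudes are stored as float32 degrees (an assumption; the thread doesn't say how the avionics actually represent coordinates). Near 35° the float32 grid spacing is about 0.4 m, and near 71° about 0.85 m, the same order of magnitude as the ~2 ft deviations mentioned above:

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def float32_spacing(x: float) -> float:
    """Gap between x (as float32) and the next float32 toward +infinity; x > 0 assumed."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - to_float32(x)

METERS_PER_DEG_LAT = 111_320.0  # rough length of one degree of latitude

for lat in (35.0, 71.0):
    step = float32_spacing(lat)
    print(f"lat {lat}: float32 step {step:.3e} deg ~ {step * METERS_PER_DEG_LAT:.2f} m")

# Two latitudes roughly half a foot apart collapse to the same float32 value near 35 deg.
print(to_float32(35.0) == to_float32(35.0000015))  # True
```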

~~~
egdod
Are there other runways that are so dead-on?

------
nradov
That reminds me of the C-130 avionics failure when flying at negative
altitude.

[https://www.popularmechanics.com/military/aviation/a26598/c-...](https://www.popularmechanics.com/military/aviation/a26598/c-130-sea-level-dead-sea/)

~~~
rob74
> The lowest airfield in North America, for the record, is Furnace Creek
> Airport, Death Valley at minus 210 feet below sea level.

So, then I guess it's 210 ft _above_ sea level? (SCNR)

~~~
reaperducer
Been there. Almost nobody uses it. But from what I read in the newspapers, the
few times it does get used, it seems to have more than its share of crashes.

------
mcsoft
While it is easy to blame Boeing these days, I suspect we have survivorship
bias here. Dig into Airbus software and all sorts of similar scary bugs will
pop up. For instance, one may remember inconsistencies between airspeed
measurements that resulted in the loss of AF 447 on June 1, 2009. Overall,
it's great that at least some airliners' closed software is finally getting more
eyes on it.

~~~
umvi
I seem to recall Airbus having different engineering teams across the EU using
different versions of the CAD tool and then when they went to build the plane
the wires were too short because one version of the CAD tool accounted for the
radius of wires turning a corner while the other didn't.

[https://www.nytimes.com/2006/12/11/business/worldbusiness/11...](https://www.nytimes.com/2006/12/11/business/worldbusiness/11iht-airbus.3860198.html)

~~~
zaphirplane
Had only the version that didn’t account for turning a corner been used, would
things have worked out? I don’t follow.

~~~
Piskvorrr
The measurements would have been the same in both cases: therefore it would
have been a) designed correctly and implemented correctly, or b) designed
incorrectly, but caught in the design phase, looping back into a). This way,
you get c) designed correctly, but implemented incorrectly or d) designed
incorrectly and the design implemented as specified, in both cases leading to
an unworkable implementation.

------
awinter-py
The first F-22 they delivered to Japan had serious trouble when it crossed the
International Date Line.

Sounds like nobody was hurt & they didn't lose any planes, but there was some
period of time where the onboard computers were not working properly.

[https://www.defenseindustrydaily.com/f22-squadron-shot-down-...](https://www.defenseindustrydaily.com/f22-squadron-shot-down-by-the-international-date-line-03087/)

Yet another reason to exercise low-level math functions with a wide range of
inputs -- maybe not on every run, but at least in pre-release integration
tests.

There are only 4 billion floats.

[https://randomascii.wordpress.com/2014/01/27/theres-only-fou...](https://randomascii.wordpress.com/2014/01/27/theres-only-four-billion-floatsso-test-them-all/)
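In the spirit of the linked article, a sketch that exhaustively walks every float32 in an input range by iterating over bit patterns. The routine under test (heading to unit vector and back via atan2) is a hypothetical stand-in, and because Python is far too slow for all 2^32 patterns, the range here is limited to headings from 269° to 271°:

```python
import math
import struct

def f32_from_bits(bits: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bits_from_f32(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def unit_vector(heading_deg: float) -> tuple[float, float]:
    """Hypothetical routine under test: heading (deg true) -> (east, north)."""
    rad = math.radians(heading_deg)
    return math.sin(rad), math.cos(rad)

def heading_from_vector(east: float, north: float) -> float:
    """Inverse via atan2 (compass convention), normalised to [0, 360)."""
    return math.degrees(math.atan2(east, north)) % 360.0

def check_every_float32(lo: float, hi: float) -> int:
    """Test every float32 in [lo, hi] (positive bounds assumed); return the count."""
    count = 0
    for bits in range(bits_from_f32(lo), bits_from_f32(hi) + 1):
        h = f32_from_bits(bits)
        back = heading_from_vector(*unit_vector(h))
        assert math.isfinite(back), h
        assert abs(back - h) < 1e-6, (h, back)
        count += 1
    return count

print(check_every_float32(269.0, 271.0))  # 65537 values checked
```

For positive floats, consecutive bit patterns are consecutive representable values, which is what makes the integer loop an exhaustive walk of the range.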

~~~
sudosysgen
F-22s weren't exported; it was probably just at Okinawa.

------
taylodl
I'd like to learn how Boeing's software delivery practice has changed. Sounds
like there may be some lessons we can all learn from.

~~~
Frost1x
I would hazard a guess: their process became "agile."

~~~
eskaytwo
You should clarify what you mean by “agile”. It means so many different things
in different contexts.

A large part of this is likely the cost-driven pressure that lost the CEO his
job. Agile can be abused certainly.

~~~
nybble41
> Agile can be abused certainly.

... where "abused" appears to cover most good-faith efforts to implement Agile
in practice. When it works it's Agile—despite the fact that these
implementations have very little in common. When it fails the mantra is always
"Agile wasn't implemented properly". Agile has every characteristic of a
utopian ideal that cannot survive contact with reality, or at least the
realities of safety-critical software.

~~~
Jtsummers
Big-A Agile is often described in Utopian terms. Which is unfortunate, but
little-a agile is understood to be a thing you grow towards with constant
improvements and changes to both the process and the system being developed
based on frequent, ideally continuous, feedback. Boeing, in my experience,
didn't want to change their processes, they wanted to run the same processes
faster. Which is decidedly _not_ agile (and I hate Scrum for this).

The form that I've seen them adopt on programs I've been involved with is
heavily inspired by Scrum. So they have sprints, they accept product releases
from the subs and will, ostensibly, test them (since they run the integration lab,
they have the more complete test infrastructure, in theory). But, in practice, the
released code isn't tested for months, but development continues, monthly or
bi-weekly releases continue, and no feedback is ever provided to the
developers. Finally, 6-12 months later most of the builds are discarded, the
most recent one is tested, and it's discovered that, true-to-Waterfall, the
wrong thing was built. Patches are quickly developed and applied, tests pass,
and something is released to the customers.

Months or years later, the software is found to be defective, the test suite
is found to be insufficient (again), and the process repeats.

~~~
nybble41
> ... little-a agile is understood to be a thing you grow towards with
> constant improvements and changes to both the process and the system being
> developed based on frequent, ideally continuous, feedback.

Agreed, and I have no objection to the feedback and continuous improvement
aspects of agile development. Or anything in the Agile Manifesto, for that
matter. In my experience, however, when projects "go Agile" they tend to
cargo-cult simplistic, visible changes like Scrum instead.

> Boeing, in my experience, didn't want to change their processes...

Yes, this is generally the issue. On the other hand, those processes are
strongly influenced by industry standards such as DO-178, which codify "tried-and-true"
practices established in the early days of software engineering; it
isn't just a matter of corporate culture. It's not _mandatory_ to follow these
processes, but if you do then certification is basically assured, which is
much simpler and more reliable than arguing the case for a brand-new process
with no established history in the industry. For the most part these standards
treat software as just another "component" to be integrated into the hardware,
like an extra-complicated combinatorial circuit, rather than what it is: the
design of an iterative, stateful process to be carried out _by_ the hardware
during flight. Novel development and certification processes are a huge risk;
understandably, no one wants to go first, even when they agree that change is
needed.

------
Scoundreller
Edit: the airports noted aren’t along the same magnetic declination, so maybe
not that.

Given that this applies to a few US airports and some S. American ones, maybe
it has to do with magnetic declination calculations gone awry.

Maybe only airports along one of these lines get impacted:

[https://en.m.wikipedia.org/wiki/Magnetic_declination#/media/...](https://en.m.wikipedia.org/wiki/Magnetic_declination#/media/File%3AWorld_Magnetic_Declination_2015.pdf)

And magnetic declination changes all the time. Maybe different systems are
using different sources of truth?

Maybe when they did initial testing, they included some airports at the
extremes, but when the declinations got updated, they were no longer good test
cases?

------
pintxo
I guess the automated test suite just got larger. Testing all approaches on
all runways on all airports worldwide...

~~~
jojo2000
In the automotive and aerospace world, of course, tests in the real world
cannot cover all cases (cost and time make it impossible).

Even most (not all, but almost none for meaningful applications) computer
programs cannot be proved to come to a halt [0], so complete testing is
impossible in principle. We can only use more restrictive rules for programming
but cannot formally guarantee anything.

As those systems are tied to the physical world, a whole lot of complexity is
added by uncontrolled parameters.

Yet we love testing things. So a lot of techniques exist, such as SIL [1] and
HIL [2].

So you could imagine using a real dashboard hooked up to a plane simulator.
Which would enable testing the device in a wide array of conditions.

[0] [https://en.wikipedia.org/wiki/Halting_problem](https://en.wikipedia.org/wiki/Halting_problem)
[1] [https://www.quora.com/What-is-Software-in-the-Loop-SIL](https://www.quora.com/What-is-Software-in-the-Loop-SIL)
[2] [https://en.wikipedia.org/wiki/Hardware-in-the-loop_simulatio...](https://en.wikipedia.org/wiki/Hardware-in-the-loop_simulation)

~~~
jcranmer
> Even most (not all, but almost none for meaningful applications) computer
> programs cannot be proved to come to a halt

That is not true. If you can demonstrate that every loop or recursion is
bounded, then the program must halt as a necessity. Loops of the form for (i =
0; i < N; i++) are trivially bounded, unless you're resetting i in the loop.
If you have containers that are not being modified in the loop, the finiteness
of your data structures is usually sufficient to form the lemma that the loop
will terminate.

Recursive data structures (such as cyclic graphs) are much more challenging to
prove, and most challenging are fixed point algorithms (do { } while
(changed);), as shown by how often they cause actual infinite loops in my
experience. But if you had mandatory annotations to declare lemmas for
termination, it is doable. With that feature, a programming environment that
forbade you from writing code that couldn't be proven to terminate is probably
feasible enough to let you write large, complex applications.
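A sketch of what such a termination lemma looks like for a do { } while (changed) fixed point, here graph reachability on a possibly cyclic graph. Python has no termination annotations, so the variant is checked at runtime with an assert; a verifier such as Dafny would instead take it as a `decreases` clause and prove it statically. The graph is made up for illustration:

```python
def reachable(graph: dict[str, list[str]], start: str) -> set[str]:
    """Nodes reachable from `start`, computed as a fixed point.

    Termination lemma: `seen` only ever grows and is a subset of the finite set
    `nodes`, so the variant len(nodes) - len(seen) is a non-negative integer
    that strictly decreases on every iteration that reports a change.
    """
    nodes = {start} | set(graph) | {m for succs in graph.values() for m in succs}
    seen = {start}
    changed = True
    while changed:
        variant_before = len(nodes) - len(seen)
        new = {m for n in seen for m in graph.get(n, [])} - seen
        seen |= new
        changed = bool(new)
        if changed:
            # Runtime check of the lemma (a verifier would prove this statically).
            assert 0 <= len(nodes) - len(seen) < variant_before
    return seen

# A cyclic graph: a -> b -> c -> a, plus d -> a.
g = {"a": ["b"], "b": ["c", "a"], "c": ["a"], "d": ["a"]}
print(sorted(reachable(g, "a")))  # ['a', 'b', 'c']
print(sorted(reachable(g, "d")))  # ['a', 'b', 'c', 'd']
```

The cycle is harmless because termination rests on the shrinking variant, not on the graph being acyclic.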

~~~
jojo2000
My POV here was very cautious, as I didn't exclude programs which can be
proved. The problem is that even if control flow is "provable", a lot of
algorithms use numerical computations, which are much harder to deal with from a
formal POV:

"In cooperation with the University of Iowa and Rockwell Collins, this
research focuses on the verification of safety properties on Lustre programs.
SAT or SMT, based verification approaches such as k-induction give good
results on programs with a mostly discrete state space (boolean, bounded
integers). However, when numerical computations are involved (real/float
computations) the formalization of the property to be proved often needs to be
strengthened using auxiliary lemmas to make it inductive with respect to the
system’s transition relation. When attempted manually the discovery of such
lemmas is time consuming and hinders the efficiency and scalability of formal
verification. Automating lemma discovery hence appears crucial to allowing
end-users to apply formal verification on industrial cases."

Taken from [0]

The seminars I attended, given by the creators of Coq (a proof assistant),
didn't disagree with this point of view. Of course, formal
verification is not the only thing we can do [0].

In any case, what you propose seems interesting, if the halting problem were the
only obstacle to having a formally proven system.

[0] [http://www.aerospacelab-journal.org/sites/www.aerospacelab-j...](http://www.aerospacelab-journal.org/sites/www.aerospacelab-journal.org/files/AL04-10_1.pdf)
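To illustrate the quoted point that a property often has to be strengthened with an auxiliary lemma before it becomes inductive, here is a toy 1-induction check (the k = 1 case of k-induction) over a tiny bounded-integer transition system. It enumerates states by brute force instead of calling a SAT/SMT solver, and the system itself is made up for illustration:

```python
from typing import Callable

STATES = range(8)          # a bounded integer state variable x in 0..7

def init(x: int) -> bool:  # initial states
    return x == 0

def step(x: int) -> int:   # transition relation: x' = (x + 2) mod 8
    return (x + 2) % 8

def is_inductive(prop: Callable[[int], bool]) -> bool:
    """1-induction: prop holds initially and is preserved by every transition
    from any state (reachable or not) that satisfies it."""
    base = all(prop(x) for x in STATES if init(x))
    preserved = all(prop(step(x)) for x in STATES if prop(x))
    return base and preserved

def safe(x: int) -> bool:  # property to prove: x never equals 3
    return x != 3

def even(x: int) -> bool:  # auxiliary lemma
    return x % 2 == 0

def strengthened(x: int) -> bool:
    return safe(x) and even(x)

print(is_inductive(safe))          # False: unreachable x = 1 steps to 3
print(is_inductive(strengthened))  # True, so safe holds in every reachable state
```

`safe` is true in every reachable state (0, 2, 4, 6) yet fails the inductive step because of the unreachable state x = 1; adding the lemma rules that state out. Finding such lemmas by hand is the manual effort the quote describes.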

------
izzydata
Apparently another plane went down in Ukraine according to this article. That
seems like a much bigger story. Maybe I happened to miss that one yesterday?

Edit: A Ukrainian airliner in Iran.

~~~
DemiGuru
It’s a different model: a 737-800, which is still within the 737 family.

~~~
izzydata
When I say "another" I was referring to that previous incident where a
passenger plane was shot down in Ukraine and not the Max plane problem. Any
passenger plane crashing seems like a significant story. I hope we aren't numb
to that already.

~~~
SmellyGeekBoy
What makes you think it's being treated as insignificant? It's all over the
news.

~~~
izzydata
Now that I think about it I'm not sure where I got that impression. You are
probably correct.

------
ravenstine
Isn't this something that would be discovered through flight simulation? Does
Boeing run their software and hardware through automated testing?

------
mcv
That is an awfully specific bug. I can see how it went undetected. But what
could trigger this?

~~~
francisofascii
From reading the article, it seems like it occurs when the runway is precisely
running east to west and the airplane is coming in westward. And also the
latitude/longitude is somehow involved. You can see one of the airports here.
Notice how the runway is exactly horizontal, i.e. parallel to a line of latitude.
[https://www.google.com/maps/place/Wiley+Post-Will+Rogers+Mem...](https://www.google.com/maps/place/Wiley+Post-Will+Rogers+Memorial+Airport/@71.2835378,-156.7756877,14.08z/data=!4m5!3m4!1s0x50c327fff620aae7:0x936de8c8fd2d094b!8m2!3d71.2874212!4d-156.7802701)
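A sketch of how a heading might be derived from two runway threshold coordinates, assuming a flat-earth (equirectangular) approximation, which is adequate over a runway's length; this is not a geodesic calculation and not Boeing's or Smiths' code, and the coordinates below are hypothetical. With identical latitudes the north component is exactly zero, which is precisely the input a naive dy/dx or dx/dy formula would divide by:

```python
import math

def approx_true_heading(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Heading (degrees true, in [0, 360)) from point 1 to point 2.

    Flat-earth approximation: longitude difference is scaled by cos(mean latitude).
    """
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    east = math.radians(lon2 - lon1) * math.cos(mean_lat)
    north = math.radians(lat2 - lat1)   # exactly 0.0 when the latitudes match
    # Compass convention: atan2(east, north), measured clockwise from north.
    return math.degrees(math.atan2(east, north)) % 360.0

# Hypothetical thresholds at identical latitude, landing westward.
print(approx_true_heading(71.285, -156.76, 71.285, -156.79))  # ~270.0
```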

~~~
Scoundreller
My thought is some combination of that and magnetic declination.

Runway headings are always magnetic headings, so what the runway says and what
GPS calculates (at the lowest levels) are two different things.

So maybe the issue only occurs at airports where you’re landing on 27 AND the
magnetic declination is some specific value?

Would explain why it happens at only some US and some S. American airports:

[https://en.m.wikipedia.org/wiki/Magnetic_declination#/media/...](https://en.m.wikipedia.org/wiki/Magnetic_declination#/media/File%3AWorld_Magnetic_Declination_2015.pdf)
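To make the true-versus-magnetic distinction concrete, a simplified sketch of how a runway number follows from a true heading and the local declination. Assumed conventions: declination is positive when magnetic north lies east of true north, and the number is the magnetic heading rounded to the nearest 10°, divided by 10, with 0 written as 36. Real-world numbering has exceptions (parallel runways, for instance):

```python
import math

def runway_designator(true_heading_deg: float, declination_deg: float) -> int:
    """Simplified runway number from a true heading and magnetic declination.

    magnetic = true - declination (east declination positive), then rounded
    half-up to the nearest 10 degrees and divided by 10; 0 becomes 36.
    """
    magnetic = (true_heading_deg - declination_deg) % 360.0
    number = math.floor(magnetic / 10.0 + 0.5) % 36
    return 36 if number == 0 else number

print(runway_designator(270.0, 0.0))   # 27
print(runway_designator(270.0, 10.0))  # 26 (10 deg east declination)
print(runway_designator(270.0, -8.0))  # 28 (8 deg west declination)
```

The same 270° true heading can carry different runway numbers depending on declination, so a bug keyed to true heading would not line up neatly with "runway 27".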

~~~
theideaofcoffee
> Runway headings are always magnetic headings...

This is incorrect. While most runways are aligned with the magnetic heading,
that changes when you have a field with multiple parallel runways. Atlanta
Hartsfield comes to mind first, with its five parallels in the East-West
directions: 8L/26R, 8R/26L, 9L/27R, 9R/27L, 10/28. One would naively expect
these not to be parallel just from reading the names, while in fact they are all
parallel.

~~~
Scoundreller
Okkkk, good point.

And they're only magnetic until they get updated because the declination
changes over time.

Other exceptions: near the poles, where magnetic declination changes
rapidly, runways are numbered by true heading instead.

------
vmchale
don't land on those runways!

------
mikl
I’m getting happier and happier that my usual airline (Swiss) has a mostly-
Airbus fleet.

~~~
coredog64
To quote Han Solo, “Don’t get cocky, kid”

Boeing and Airbus both source avionics from the same third parties. The 737NG
uses Smiths, but everything else in the Boeing fleet (and all newer Airbus
airframes that I’m aware of) comes from Honeywell. While you can fault Boeing
for not finding this, the vast majority of the fault lies with the OEM.

~~~
mikl
Even if this particular problem is not Boeing’s fault, but Smiths’, the whole
737 MAX saga still puts Boeing in a pretty bad light – cutting corners on
safety, outsourcing their software to dubious vendors, etc.

~~~
elliekelly
The Daily podcast (NYTimes) did an episode last week called “Boeing’s Broken
Dreams” where they interviewed a former Boeing safety manager turned
whistleblower. Definitely sounds like there was a major cultural shift over
the past decade or so.

~~~
thephyber
That podcast ep re-airing was based on this story[1].

[1] [https://www.nytimes.com/2019/04/20/business/boeing-dreamline...](https://www.nytimes.com/2019/04/20/business/boeing-dreamliner-production-problems.amp.html)

------
_bxg1
I for one will never ride on a 737 again, even if they manage to get them back
off the ground. For every problem that's come to light there are probably two
more that haven't been discovered (or disclosed).

~~~
johannes1234321
Fact is: there are around 10,000 737s in service, each of them taking off and landing
multiple times a day, overwhelmingly without problems. The risk is similar to you
slipping while getting out of bed in the morning or being run over while
crossing a street. (Excluding the MAX versions, which are grounded.)

~~~
ta999999171
Maybe if I have a heart attack or seizure during those activities...

I have 0 past history of slipping. Also, can't avoid walking.

I can avoid flying on something I know could be fucked.

~~~
some_random
Are you saying you have literally never slipped before? I'm quite impressed if
so.

~~~
ta999999171
Straying from point, haha, but only once, in a new set of footwear - jumping
over a car.

------
fit2rule
Oh, it is going to be fun to see them get re-certified.

Did someone forget to run the fuzzer, maybe?
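For what it's worth, a fuzzer alone might not catch this class of bug: uniformly random doubles almost never produce two exactly equal coordinates. A minimal sketch of a fuzz harness that seeds hand-picked edge cases alongside random inputs; the routine under test is a hypothetical stand-in using the naive arctan(dy/dx) formula:

```python
import math
import random

def naive_angle(x1: float, y1: float, x2: float, y2: float) -> float:
    """Hypothetical routine under test: slope angle with no guard for x1 == x2."""
    return math.degrees(math.atan((y2 - y1) / (x2 - x1)))

def fuzz(fn, trials: int = 10_000, seed: int = 1) -> list[tuple[float, ...]]:
    """Run fn on hand-picked edge cases plus random inputs; return inputs that raised."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    # Random doubles almost never give x1 == x2 exactly, so add those cases explicitly.
    edge_cases = [(0.0, 0.0, 0.0, 1.0), (5.0, 2.0, 5.0, -3.0)]
    random_cases = [tuple(rng.uniform(-180.0, 180.0) for _ in range(4)) for _ in range(trials)]
    failures = []
    for args in edge_cases + random_cases:
        try:
            fn(*args)
        except (ZeroDivisionError, ValueError, OverflowError):
            failures.append(args)
    return failures

print(fuzz(naive_angle)[:2])
```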

