
Half of European flights delayed due to system failure - el_duderino
http://www.bbc.com/news/world-europe-43633094
======
tim333
Apparently the Enhanced Tactical Flow Management System (ETFMS) packed up.

>“ETFMS facilitates improvements in flight management from the pre-planning
stage to the arrival of the flight. It maximises the updating of flight-
related data and thus improves the real picture of a given flight, thereby
contributing to the Gate-to-Gate Concept,” Eurocontrol explains on its
website.

>The agency initially reported that contingency procedures were immediately
put in place which reduced the capacity of the European network by around 10
per cent. [http://www.airtrafficmanagement.net/2018/04/eurocontrol-give...](http://www.airtrafficmanagement.net/2018/04/eurocontrol-gives-allclear-for-normal-efms-operations-to-resume/)

They don't seem to say what went wrong, but they do say:

>In over 20 years of operation, the ETFMS has only had one other outage which
occurred in 2001. The system currently manages up to 36,000 flights a day.

Technical details from the French Wikipedia (translated):

Written in Ada and running on HP-UX, the system is based on an exchange of
messages between the airlines (which file, change and update flight plans),
the air traffic control bodies and the CFMU. The messages are written in
ADEXP format.
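To give a feel for the message format, here is a minimal Python sketch that reads a flat ADEXP-style message into keyword/value pairs. It assumes only simple fields of the form `-KEYWORD value`; real ADEXP also defines structured fields (for example `-BEGIN`/`-END` lists) that this sketch does not handle, and the sample message and its values are made up for illustration.

```python
def parse_adexp(message: str) -> dict:
    """Parse a flat ADEXP-style message into {keyword: value}.

    Simplifying assumption: every field is '-KEYWORD value ...' with no
    nested -BEGIN/-END blocks. A repeated keyword keeps its last value.
    """
    fields = {}
    current_key = None
    values = []
    for token in message.split():
        name = token[1:]
        # A new field starts with '-' followed by an upper-case alphanumeric keyword.
        if token.startswith("-") and name.isalnum() and name.isupper():
            if current_key is not None:
                fields[current_key] = " ".join(values)
            current_key, values = name, []
        else:
            values.append(token)
    if current_key is not None:
        fields[current_key] = " ".join(values)
    return fields


# Hypothetical flight plan message, for illustration only.
sample = "-TITLE IFPL -ARCID ABC123 -ADEP EGLL -ADES LFPG -EOBT 0930"
print(parse_adexp(sample))
```

Multi-word values (such as a route) are kept together until the next keyword token appears.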

ETFMS uses at least 5 fundamental notions:

- Flight plan: describes the 4D trajectory of an aircraft.
- Regulation: an aircraft rate applied to a "traffic volume". Example: 50
  aircraft/hour.
- Traffic volume: the association of a geographical reference (air sector,
  waypoint, airport, etc.) with a set of aircraft flows.
- Slot list: the list of take-off slots. Example: if the regulation rate is 30
  aircraft/hour, there will be a slot every 2 minutes: 10:00, 10:02, etc.
- Delay: the difference between the take-off time requested by the airline and
  the time calculated by ETFMS.
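The slot arithmetic above (30 aircraft/hour means one slot every 2 minutes) and the definition of delay can be sketched in a few lines of Python. The greedy first-come, first-served assignment, the function names and the flight identifiers are assumptions for illustration; this is not how ETFMS itself is implemented.

```python
from datetime import datetime, timedelta


def slot_times(start: datetime, end: datetime, rate_per_hour: int) -> list:
    """Evenly spaced take-off slots for a regulation of rate_per_hour aircraft/hour."""
    spacing = timedelta(hours=1) / rate_per_hour  # 30/h -> one slot every 2 minutes
    slots = []
    t = start
    while t < end:
        slots.append(t)
        t += spacing
    return slots


def allocate(requests: dict, slots: list) -> dict:
    """Give each flight the earliest free slot at or after its requested take-off time.

    Flights are served in order of requested time (an assumed, simplified policy).
    Returns {flight: (slot, delay)}, where delay = slot - requested time.
    Flights that find no slot left in the window are simply omitted.
    """
    result = {}
    i = 0
    for flight, wanted in sorted(requests.items(), key=lambda kv: kv[1]):
        # Skip slots that are earlier than this flight can use.
        while i < len(slots) and slots[i] < wanted:
            i += 1
        if i == len(slots):
            break
        result[flight] = (slots[i], slots[i] - wanted)
        i += 1
    return result


day = datetime(2018, 4, 3)
slots = slot_times(day.replace(hour=10), day.replace(hour=10, minute=10), 30)
requests = {
    "FLT1": day.replace(hour=10),
    "FLT2": day.replace(hour=10),
    "FLT3": day.replace(hour=10, minute=3),
}
for flight, (slot, delay) in allocate(requests, slots).items():
    print(flight, slot.strftime("%H:%M"), delay)
```

With slots at 10:00, 10:02, 10:04 and so on, the two flights that both ask for 10:00 get 10:00 and 10:02 (a 2-minute delay for the second), and the flight asking for 10:03 gets 10:04 (a 1-minute delay).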

~~~
makmanalp
It would be fascinating to read about what bug took down a system with one
outage in 20 years. I remember reading in the Chubby paper (from Google) that
user error was the cause of more than half the downtime incidents. I wonder if
that's the case here too.

~~~
dx034
If I remember correctly, a similar outage in the UK was due to increased
flight volumes. The system had a hardcoded limit somewhere which caused issues
as flight volume increased. The software was old enough that they weren't
aware of that issue beforehand. I could imagine something similar happening
here.

It definitely shows how well software can work with the right practices. Only
one outage in 20 years and that one only caused a reduction of 10% in
capacity. Don't think many companies can match that.

~~~
carlmr
>It definitely shows how well software can work with the right practices. Only
one outage in 20 years and that one only caused a reduction of 10% in
capacity. Don't think many companies can match that.

This is what happens when you build software with the same meticulousness as
other engineering disciplines. However, a lot of software is so much more
complex than what other disciplines can build (because you can build anything
you can imagine), coupled with early deadlines and profit pressure, that it's
unrealistic for most software to be developed this way. You easily have a cost
factor of 100x in time as well as money.

~~~
xcvbxzas
I see this sentiment a lot. Intuitively it seems like it should be true, but I
don't think the case is really quite so clear cut.

The costs involve way more than just the initial development. Maintenance eats
up a huge share, perhaps even the majority, of the total cost. And outages or
other failures can be very expensive too.

It's also important to keep in mind that this isn't an all or nothing
situation. We can have software that is more reliable without asking that it
chug away without issue for a decade, or anywhere near as long as we expect
bridges or buildings to last.

The process of developing more reliable software isn't necessarily more
expensive than less reliable software. It can even be cheaper. I'm struggling
to find the links (maybe somebody else has them handy, or I'll edit them in if
I find them), but there have been a few case studies done a few years back by
companies that moved to using Ada. In addition to the benefits of more
reliable software, they also found development costs were better or at least
no worse than C. I know that isn't exactly the language to compare to these
days, but as I said these were done some time ago.

This is just my own argument, but I suspect that's because the same problems
that ultimately cause problems after release also cause problems during
development. With a more reliable programming system/environment, problems
that might show up later during development are shown to be an issue
immediately. This means the issue doesn't need to be tracked down, which can
take some serious time. The developers are also still fresh on the problem area.

Personally speaking, I've been totally won over by Ada. It ain't perfect, but
it's a hell of a lot better than anything else I've seen - and I've looked a
lot. In my own projects (mainly personal or for school admittedly) development
is much easier and ultimately quicker. I don't have to spend a day tracking
down a weird bug because the compiler lets me know about the issue as soon as
I try to cause it.

~~~
carlmr
>The process of developing more reliable software isn't necessarily more
expensive than less reliable software. It can even be cheaper. I'm struggling
to find the links (maybe somebody else has them handy, or I'll edit them in if
I find them), but there have been a few case studies done a few years back by
companies that moved to using Ada. In addition to the benefits of more
reliable software, they also found development costs were better or at least
no worse than C. I know that isn't exactly the language to compare to these
days, but as I said these were done some time ago.

I can believe that. Ada catches at compile time a lot of errors you would
normally only notice through extensive testing. You're preaching to the
strong-static-typing choir here. I believe Ada and Rust could solve a lot of
problems for companies working with C/C++ and make development cheaper. You can
properly model your domain and abstract without sacrificing safety.

I'm also a strong believer that TDD makes you much faster and safer in the
long run.

My experience tells me that most tools, languages or methods that catch
errors earlier will save money.

Ada also has the best tested compiler I can think of.

However, my larger point was about the engineering processes, not the language
itself. I think with languages and tools you can make it easier to make good
software. The 100x time and cost is more in the sense of process changes when
you're working on safety critical systems. How everything has to be traceable
from requirement to test, how there are mandatory reviews before any code
change that need to be documented, how there are qualification criteria for
the toolchain, etc. All these things cost a lot of time and manpower, with
arguably very bad cost-benefit analysis, which is only really worth it when
human lives are at stake.

~~~
xcvbxzas
> The 100x time and cost is more in the sense of process changes when you're
> working on safety critical systems. How everything has to be traceable from
> requirement to test, how there are mandatory reviews before any code change
> that need to be documented, how there are qualification criteria for the
> toolchain, etc. All these things cost a lot of time and manpower, with
> arguably very bad cost-benefit analysis, which is only really worth it when
> human lives are at stake.

Absolutely. That's part of what I was getting at by mentioning all of this
exists on a continuum. We don't need to, and really shouldn't, treat a SaaS
startup exactly the same as a military aviation project.

We can, however, draw from the lessons learned on those safety critical
projects and use parts of the process that make sense for the nature of
whatever we're actually working on.

You're right that in general I suspect that comes down to strong static
typing, particularly for the sorts of projects common to the HN crowd. When
dealing with very large enterprise projects the balance might start to shift
to more than just typing, though it would probably take a lot of real-world
data that nobody is keen to supply to figure out where the tipping points are.

And I'd argue about how well Rust actually helps with these things, but that
would really be going off the rails. Unfortunately.

------
isostatic
> Eurocontrol announced the system restart later in the day, after what it
> called extensive testing.

So they turned it off and on again.

~~~
Numberwang
Bastards.

~~~
jwbensley
It's not brain surgery!

~~~
jwbensley
What's with the downvotes? That guy's username and my comment are both
references to a TV show.

~~~
AndrewDucker
That may be part of the problem.

Silly jokes don't go down well on HN, as they're distractions from the
conversation. It's considered "Reddit-like" behaviour, and discouraged through
downvotes.

~~~
jwbensley
Are you saying humour isn't allowed on HN?

~~~
nkurz
Humour is allowed, and so is downvoting. In practice, the humour has to be
really good to avoid being mercilessly downvoted. Meta-humour based on
usernames and TV shows usually doesn't cut it. Many of us view this as a good
thing.

------
tnolet
5hr delay on a 1,5hr flight from Berlin to Paris. If they'd use Docker with
React this would have never happened!

~~~
reificator
> _If they'd use Docker with React this would have never happened!_

It's probably because they wrote it in Go. Did you know Go doesn't even have
generics?

~~~
jfktrey
What do you mean? Go has generics. Kind of.

[https://www.reddit.com/r/rust/comments/5penft/parallelizing_...](https://www.reddit.com/r/rust/comments/5penft/parallelizing_enjarify_in_go_and_rust/dcsgk7n/)

~~~
Pxtl
Wait, is that string using Inuktitut characters as fake angle brackets? That
is monstrously evil.

~~~
nkurz
Yup, that's even evil enough to quote in full:

    
    
      pftbest 114 points 1 year ago
      can you please explain this go syntax to me?
      type ImmutableTreeListᐸElementTᐳ struct {
      I thought go doesn't have generics.
    
      Uncaffeinated[S] 239 points 1 year ago
      It doesn't. That's just a "template" file, 
      which I use search and replace in order to
      generate the three monomorphized go files.
      If you look closely, those aren't angle brackets,
      they're characters from the Canadian Aboriginal
      Syllabics block, which are allowed in Go identifiers.
      From Go's perspective, that's just one long identifier.
    

[https://www.reddit.com/r/rust/comments/5penft/parallelizing_...](https://www.reddit.com/r/rust/comments/5penft/parallelizing_enjarify_in_go_and_rust/dcsgk7n/)
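The trick described in the quote (one template file, plain search and replace, several generated Go files) can be sketched in Python. The template body and the concrete type names below are hypothetical; only the `ImmutableTreeListᐸElementTᐳ` name comes from the quote. Python's `str.isidentifier()` serves as a stand-in check here: like Go, Python accepts Unicode letters such as these syllabics in identifiers, though the two languages' rules are not identical.

```python
# Hypothetical template: the "type parameter" ElementT lives inside identifiers
# whose "angle brackets" are really Canadian Aboriginal Syllabics letters.
TEMPLATE = """type ImmutableTreeListᐸElementTᐳ struct {
    items []ElementT
}

func (l ImmutableTreeListᐸElementTᐳ) Get(i int) ElementT {
    return l.items[i]
}
"""


def monomorphize(template: str, concrete_type: str) -> str:
    """Plain text substitution: Go itself never sees a type parameter."""
    return template.replace("ElementT", concrete_type)


# Hypothetical element types; one generated file per type.
for t in ["int32", "string", "uint16"]:
    source = monomorphize(TEMPLATE, t)
    print(source.splitlines()[0])
```

Each generated file declares an ordinary struct whose name merely looks generic; with real ASCII angle brackets the name would not be a legal identifier at all.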

------
emersonrsantos
For those curious, Eurocontrol MUAC (Maastricht Upper Area Control Center)
migrated to 50 virtual SUSE Linux Enterprise servers running under the IBM z/VM
hypervisor on an IBM z196 mainframe system in 2013.

~~~
justadudeama
Does any of this affect the nature of the failure? Have they said what went
wrong?

~~~
dx034
No, but the fact that this is the first outage in 20 years could indicate that
mainframes still have their place if you need 100% reliability.

Which makes plans of banks to move core systems from mainframes to the cloud
even more worrying.

------
sleavey
I was on a plane just about to push back at Heathrow, and the pilot informed
us we'd be delayed 15 minutes due to this failure. In the end it was 10
minutes, and we landed only 5 minutes late at my destination. Doesn't appear
to have been a big deal, at least for me.

~~~
dghughes
I wonder if that is due to Heathrow using Time-Based Separation software.

[https://www.airport-technology.com/news/optimised-runway-too...](https://www.airport-technology.com/news/optimised-runway-tool-improve-efficiency-heathrow/)

~~~
hightowk
Technically, Time Based Separation helps only arrivals since it tries to
negate the effects of headwinds on final approach, but it does add a lot of
resilience to the airport to absorb delays that would normally ripple to
departures.
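The headwind effect is basic kinematics: for a fixed distance between two arrivals, a lower ground speed means a longer time gap, so a distance-based rule loses runway throughput when the wind is strong. A time-based rule instead holds the time gap constant and lets the distance shrink. The sketch below assumes both aircraft fly the final approach at the same constant ground speed and uses made-up round numbers (3 NM, 150 kt approach speed, 25 kt headwind); real separation rules also depend on wake turbulence categories and wind profiles, which it ignores.

```python
SECONDS_PER_HOUR = 3600


def time_gap_s(distance_nm: float, ground_speed_kt: float) -> float:
    """Seconds between two aircraft separated by distance_nm at the same ground speed."""
    return distance_nm / ground_speed_kt * SECONDS_PER_HOUR


def time_based_distance_nm(target_gap_s: float, ground_speed_kt: float) -> float:
    """Distance that keeps the time gap at target_gap_s for the given ground speed."""
    return target_gap_s / SECONDS_PER_HOUR * ground_speed_kt


airspeed_kt = 150.0  # assumed approach speed
headwind_kt = 25.0   # assumed headwind
calm_gap = time_gap_s(3.0, airspeed_kt)                 # about 72 s in calm air
windy_gap = time_gap_s(3.0, airspeed_kt - headwind_kt)  # about 86 s: slower arrivals
tbs_distance = time_based_distance_nm(calm_gap, airspeed_kt - headwind_kt)  # about 2.5 NM
print(round(calm_gap), round(windy_gap), round(tbs_distance, 2))
```

Under a fixed 3 NM rule these numbers work out to 3600 / 72 = 50 arrivals per hour in calm air versus about 41.7 with the headwind; keeping the 72-second gap (about 2.5 NM apart) restores the calm-air rate.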

~~~
dghughes
I'm pretty sure it doesn't. I was reading about it, and it mentioned that even
the time when the pilot can start the engines is calculated. Having aircraft
wait just a few minutes before starting up saves millions of liters of fuel.

There is a digital display facing the pilot showing him when he can depart.
It even shows which aircraft sizes are allowed to depart. It's extremely well
organized.

------
icebraining
"departures are now limited to 10/hour at #brusselsairport"

Impressive that they maintain this rhythm even during a once-in-a-decade
unexpected system malfunction.

~~~
ttul
I guess that’s the “paper and pencil” speed.

------
njitbew
Yes, this sucked. I just had a 1-hour delay on a 1-hour flight (AMS-ZRH).
Unfortunately, there's no compensation until it's a 2-hour delay (but thank
god it was only one hour!). Passengers who had a layover were noticeably less
happy.

~~~
CaptainZapp
You probably wouldn't have been eligible for compensation anyway, since this
delay was definitely beyond the airline's control.

While a lot of airlines try to weasel out of their obligations (mechanical
failure, for example, which is nonetheless the airline's responsibility), I
would think such a case is pretty clear-cut.

------
tvanantwerp
I was at a conference in Northern Virginia some months ago and saw a
presentation from the folks at Upside, a startup specializing in booking
business travel. They described the legacy system which handles pretty much
all booking in the US, a system called SABRE. They described it as an ancient
6-bit computer system in Texas with no modern API. Everything they do tech-
wise is a modern wrapper around that system. So I'm not at all surprised by
any air travel computer failures if tech like that is central to the system.

~~~
userbinator
I've heard of enough misguided "modernisations" (and failures thereof) that I
think the "legacy system" was the part that stayed working throughout, and
it's the newer stuff added around it that failed. The old stuff may be old but
there's a reason it's old... it's outlasted any attempts at replacing it.

~~~
AceyMan
>it's outlasted any attempt to replace it

See,
[https://en.wikipedia.org/wiki/Lindy_effect](https://en.wikipedia.org/wiki/Lindy_effect)

(my fave citation from everyone's fave source for non-academic citations)

~~~
mseebach
The counteracting force to that is that the complexity of the surrounding
environment increases and becomes more brittle, or the old system is simply
in the way, as constantly increasing demands for new features drift further
from the capabilities of the old system in the middle.

A few years ago, SAS introduced a new status tier. It took 18 months to
introduce into the system (Amadeus). The system may be stable, but those kinds
of turnaround times for a minor customer service change simply aren't
feasible. I don't have numbers, but I wouldn't be surprised if one of the
reasons upstart airlines such as EasyJet are competitive is that their IT is
comparatively modern and can actively support the organisation while IT is
more of a millstone around the legacy airlines' necks.

------
DavidAdams
Can confirm. My flight from Zurich was delayed by over an hour today, causing
my family and me to need to run, OJ Simpson-style, through the Philadelphia
airport to make our connection.

------
candiodari
No worries.

Everyone at that organisation is paid boatloads of money. [1]

They don't pay taxes on it. (~10%, "for solidarity", which means they get to
enjoy healthcare paid by ~55% taxed nationals)

And 90% of the organisation (especially the management) has absolutely
nothing, nothing whatsoever, to do with guiding planes anywhere. In fact,
those departments are severely understaffed. The department doing "regulatory
support" is about 2/3rds of the organisation (tldr: making sure half the local
government officials don't have to get their own coffee - and before you say
it, no, Eurocontrol employees don't get them coffee, they're merely in charge
of making sure someone's there to get them coffee, and steak, cake, and ...
The coffees, I might add, are baffling. Done from a steam boiler machine in
front of you, with fair trade beans, sweetened not with sugar, but with
expensive imported bars of chocolate melted in milk that's frothed in front of
you (they put in the chocolate somehow while they're frothing the milk with
steam, melting it in while not getting the steam on the chocolate somehow),
and you get the rest of the case of that (expensive) chocolate to take home
for the kids. No, not when you ask, they'll ask you if you want that. Btw,
it's not really the rest of the case they give you, you get a fresh case. Oh
and of course, of the bar they opened they prepare just one coffee (about
1/4th of the bar). The rest gets thrown into the trashcan, they don't use the
same bar for the next coffee. As for the steaks ... oh my God)

And in case you're wondering: the odds of 2 planes colliding with zero
guidance outside of the ATC zones around airports (which aren't covered by
Eurocontrol) over even a region as big as Europe are more than 10 billion to
1, against, per year. So if Eurocontrol didn't exist at all, and we just
allowed every plane to fly wherever ... nothing would go wrong at all.

So ... what is the problem here ? Disruption of millions of travelers for no
reason whatsoever ? Let's please not pretend anyone at Eurocontrol cares
(well, they care about not being interfered with, and that will make them care
NOW, but if one thing's guaranteed it's that the Software/ATC departments will
remain the same size, and only the bribery departments will grow)

[1]
[http://www.eurocontrol.int/sites/default/files/content/docum...](http://www.eurocontrol.int/sites/default/files/content/documents/jobs-section/scale-basic-monthly-salaries-officials-annex3-staff-regulations.pdf)

------
kzrdude
Just as there are no bugs without security impact, is it realistic to say that
this has no impact on flight safety? Any error can be a contributing factor.

~~~
webreac
Safety is ensured by ATC controllers. ETFMS is there to ensure that traffic
does not increase beyond the controllers' capacity. I have read that without
ETFMS, the traffic is reduced by 10%. I have been involved in ATC simulations
where controllers had to land about 40 flights per hour using new procedures
and tools. We tried with 38 flights per hour, and it was too easy for the
controllers: their work was perfect even without tools. With 42 flights,
controllers were getting angry because the traffic could not be managed. At
40, we could see the benefits of our new procedures and tools (more regular
separation of flights). IMHO, 10% less traffic gives more than enough capacity
margin to ensure safety.
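The margin argument is simple arithmetic. A quick Python check using the figures from that simulation (comfortable at 38 flights/hour, the new tools designed for 40, overload at 42) and the roughly 10% capacity reduction reported for the contingency procedures is below. Treating one simulation's thresholds as if they applied to every sector is the commenter's assumption, not something the sketch establishes.

```python
def reduced_capacity(flights_per_hour: int, reduction_pct: int) -> int:
    """Capacity after a percentage cut, rounded down (integer arithmetic avoids float noise)."""
    return flights_per_hour * (100 - reduction_pct) // 100


COMFORTABLE = 38  # "too easy" level in the simulation
NOMINAL = 40      # level the new procedures and tools were designed for

contingency = reduced_capacity(NOMINAL, 10)
print(contingency, contingency <= COMFORTABLE)
```

A 10% cut from 40 flights/hour leaves 36, which is below the level at which controllers found the work easy even without tools.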

------
ehudla
Is that the Ada code base?

------
maartn
So Trump and Putin finally hooked up. No need to get all SuSe over it... nerds

------
dang
URL changed from [https://www.theverge.com/2018/4/3/17193814/eu-eurocontrol-br...](https://www.theverge.com/2018/4/3/17193814/eu-eurocontrol-brussels-system-failure-flight-plans-airlines-delays-grounded), which points to this.

------
jumelles
I fear this sort of thing is going to become more and more common at airports.

~~~
isostatic
Why?

Sure, the skies are more and more crowded, meaning more and more people will
be affected by a once-in-20-years failure, but why would it become more and
more common?

~~~
SteveNuts
Well the archaic systems are getting older and older. From what I've heard
it's almost impossible to get replacement parts for some of the computer
systems that run airports/airlines.

~~~
icebraining
Not sure about aviation, but in other industries they virtualize the old
hardware on a new architecture, and keep the software going.

------
bluedino
Twenty years from now... _uber car service system failure affecting 90% of
North America_

~~~
jlgaddis
The only reason this will never happen is that there's only a snowball's
chance in hell that Uber will still be around in 20 years.

------
matte_black
In situations like this I’m glad I book my flights with Chase Sapphire
Reserve. It comes with a trip delay reimbursement for up to $500 per ticket if
a flight is delayed more than 6 hours. No sweat!

~~~
joezydeco
...for a $450 annual fee. If you plan to get kicked off a flight more than
once a year, perhaps it's worth it.

~~~
matte_black
The rental car insurance benefit already covered $2600 worth of damage when a
rental car I had got banged up a bit.

I also get a $300 travel credit a year.

~~~
greglindahl
Your normal car insurance covers rental cars.

~~~
rconti
Not overseas, and it has a deductible. I used a generic travel credit card and
it covered my rental damage once; now I make sure to use such a card whenever
I rent.

