
Delta Computer Glitches Force Flight Halts Third Year in a Row - danso
https://www.bloomberg.com/news/articles/2018-09-26/delta-air-halts-services-to-address-systems-technology-issue
======
danso
The end of the article provides this comparison for context:

> _Still, Delta isn’t the only U.S. carrier to have suffered from technical
> glitches. In December 2017, a fire at Atlanta’s airport, the world’s busiest
> hub, caused a major electrical disruption, crippling services and stranding
> thousands of passengers of Delta as well as rivals including Southwest
> Airlines Co._

It seems to me that Southwest's systems being brought down by fire and
electrical disruption is a different "technical glitch" than the kind of
systems problem Delta just had. A better comparison would be July 2016, when a
router failure created a "chokepoint that crippled hundreds of the company's
software applications":

[https://www.dallasnews.com/business/southwest-airlines/2016/...](https://www.dallasnews.com/business/southwest-airlines/2016/07/30/southwest-ceo-router-failure-grounded-flights-equated-thousand-year-flood)

~~~
Twirrim
> It seems to me that Southwest's systems being brought down by fire and
> electrical disruption is a different "technical glitch" than the kind of
> systems problem Delta just had.

To some degree, sure. But they really should be capable of a straight failover
(forced, if necessary) to geographically separate components.

~~~
txcwpalpha
> To some degree, sure. But they really should be capable of a straight
> failover (forced, if necessary) to geographically separate components.

AFAIK the failure at ATL had nothing to do with the IT network's ability to
fail over. It was because the power at the airport was off, and therefore the
airport was unable to fly planes. And when the world's largest airport goes
dark, it causes ripples in delays across the entire country because of
misplaced planes/staff/passengers. And you can't just "failover" hundreds of
planes and thousands of passengers to a different geographic location.

We're really talking about two completely different types of failure here, and
the other comment is right in that it's weird to compare the ATL fire to last
night's failure.

~~~
AnimalMuppet
It's weird to those who know. But to those who are just looking at how badly it
messes things up from a passenger's view (i.e., non-aviation-knowledgeable
people reading the article), it's reasonable to compare things of different
causes that have similar effects.

~~~
txcwpalpha
True, but if that's the case then why doesn't the article bring up all of the
adverse weather events that have had the same impact?

Even for non-aviation-or-tech-knowledgeable people, I think most still care
who is at fault for such disruptions. The article frames the ATL fire as
Delta/Southwest's fault. It puts the ATL fire in the same category as other,
airline-specific issues that were caused by Delta's IT systems. But that just
isn't the case. The ATL fire affected _every_ airline at ATL (but for some
reason the article only calls out SW and DL), and was the fault of the
airport, not the airlines. It's more similar to bad weather shutting down an
airport than it is to the IT issues that Delta faced last night.

It just feels like bad reporting to include that comparison.

------
anon4lol
I just have to comment about this.

I had the weirdest job interview for a contracting job at Delta. I passed two
technical phone interviews and was scheduled for an in-person interview. The
night before it dumped a ton of snow. I tried to cancel, but didn't have the
manager's contact info. I left an hour early to get there, and still I was 5
minutes late.

They let me sit in the lobby for another twenty minutes, then the Manager's
"administrative assistant" came and fetched me. The Manager then asked me some
of the most condescending questions, like "what operating system do you use?
Have you ever used Linux?" He then finished with a snide comment and that our
time was up and he had a meeting to go to.

I called the recruiter from my car. They decided to pass, "because the Manager
said I didn't apologize enough for being late." That was the last I heard from
the recruiting company. They ghosted me. Wouldn't talk to me or respond to my
email. It was the most unprofessional situation I've run into in 25 years of
contract/consulting. Just plain weird.

Note, this job was not to code, but write documentation about the code. They
wanted a C++ programmer to document the existing system that schedules and
reroutes flights during irregular ops, that was written by some guys from Bell
Labs running on some fault-tolerant hardware. He had sold the management on
converting everything into Java running on commodity/cloud servers.

Every time I see their system melt down... I'm reminded of this.

~~~
pavel_lishin
Your story reminds me of the LinkedIn article about employees ghosting
employers and recruiters.

[https://www.linkedin.com/pulse/people-ghosting-work-its-driv...](https://www.linkedin.com/pulse/people-ghosting-work-its-driving-companies-crazy-chip-cutter/)

~~~
mikestew
And the article waits until late to point out that it is the companies
themselves that have explicitly demonstrated that that's how they want it to
go: once you lose interest, quit answering the phone.

~~~
pavel_lishin
I wonder how many people reading the article thought to themselves, "Well, no
shit" vs. "Oh, wow!"

------
AceyMan
Once again, TFA is unclear about exactly what DL systems became unavailable.
As usual, it's implied that it was ticketing and reservations stuff
("Deltamatic" for DL) which is a nightmare, yes. But back in the day (of paper
tickets) you could resort to manual processes and still get flights out. The
backpressure would accumulate quite fast, and you couldn't operate in 'manual
mode' for very long, but you could bust your butt and do it for a
departure/arrival push or two.

The _real showstopper_ is if the Operations Center can't deliver the dispatch
release; that's the legal document prepared by the airline dispatcher that
grants the flight permission to initiate the flight. Without that, it's a no-
go. Even if Deltamatic were down, Flight Ops could fax the releases over, as
the dispatch planning system is merely tied into Deltamatic to make local
printout easier; it doesn't actually run on Deltamatic.

These days, with e-ticketing and more stringent DOT/Customs standards on
accurate pax manifests, etc., having Deltamatic (ticketing/res) down is a
showstopper. But it wasn't always that way, and the way these articles are
written leaves open, in my mind, the question of which parts were actually
fubared.

(me: ex-aircraft dispatcher for a DL-owned regional carrier.)

------
rectang
Which airlines, if any, run outstanding software departments?

The impression I get from these repeated fiascos is the whole industry still
runs on mainframes and java applets held together with duct tape and
bubblegum, and that their executives are dinosaurs with no appreciation for
what it takes to build robust failure-tolerant systems. But maybe it's just a
skew in the reporting.

~~~
pc86
Running on mainframes and java applets is not inherently a negative thing. Not
everything needs to use npm or be written in Rust.

~~~
rectang
Sure, so emphasize the "duct tape and bubblegum", then. There are a number of
modern software development practices that make software systems more robust
as they are evolved over time. These are the kinds of things that companies
whose leadership lacks individuals with tech backgrounds do not seem to
appreciate enough to budget for.

* Sharding and clustering.

* Queuing between services.

* Geographic distribution across multiple datacenters.

* Continuous restoration from backups.

* Unit testing.

* Source control.

* ...

Those practices are not language or system specific, but they may not be easy
to follow when using older technologies.
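
To make "queuing between services" concrete, here is a minimal Python sketch
using only the standard library. Everything in it is hypothetical (the
`run_demo` function, the `rebooking-N` message names, the queue size); it says
nothing about how any airline system is actually built. It only shows the idea:
the upstream side keeps accepting work while the downstream worker is
unavailable, and the backlog drains in order once the worker comes up.

```python
import queue
import threading


def run_demo(n_requests=5):
    """Buffer requests while the downstream worker is down, then drain them.

    Hypothetical illustration only. n_requests must stay below the queue's
    maxsize, otherwise put() would block before the worker has started.
    """
    work = queue.Queue(maxsize=100)  # bounded, so a long outage is noticed
    processed = []

    def worker():
        # Downstream service: handle items until the None sentinel arrives.
        while True:
            item = work.get()
            try:
                if item is None:
                    return
                processed.append(item)
            finally:
                work.task_done()

    # Upstream keeps accepting requests even though the worker is not running.
    for i in range(n_requests):
        work.put(f"rebooking-{i}")

    # The worker comes up later and drains the backlog in FIFO order.
    t = threading.Thread(target=worker)
    t.start()
    work.put(None)  # sentinel: tell the worker to stop after the backlog
    work.join()     # wait until every queued item has been handled
    t.join()
    return processed


if __name__ == "__main__":
    print(run_demo())
```

The bounded queue is a deliberate choice in this sketch: an unbounded buffer
would hide a long downstream outage until memory ran out.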

~~~
ogn3rd
Are you from the cloud? ;)

~~~
rectang
Heh. I'm not, and all of those practices are compatible with running your own
data centers.

------
rb808
Incidentally Delta has the same market cap as Square at ~$40B. It also owns
881 airplanes. Plus it has to write critical software to keep them all running
smoothly. Non-tech world is tough.

~~~
KMag
Does Delta really own its planes, or lease them long-term from a holding
company?

~~~
rb808
They own most, see p34
[https://s1.q4cdn.com/231238688/files/doc_financials/quarterl...](https://s1.q4cdn.com/231238688/files/doc_financials/quarterly/2018/Q2/DAL-6.30.2018-10Q-Filed.pdf)

------
Sharlin
As an aside, the title is a garden path sentence if I’ve ever seen one. Had to
backtrack at least twice to parse it correctly.

------
chisleu
At least the "glitch" let everyone get home from defcon before "happening"
this year.

~~~
acct1771
Has there been strife between the community and Delta?

------
exabrial
I'm curious if they're still using their mainframe-based systems? They've been
workhorses over a number of years but show their age when it comes to making
quick changes.

~~~
newsDerp
From Wikipedia:

    
    
      With SABRE up and running, IBM offered 
      its expertise to other airlines, and 
      soon developed Deltamatic for Delta Air 
      Lines on the IBM 7074, and PANAMAC for 
      Pan American World Airways using an IBM 
      7080.
    

[https://en.wikipedia.org/wiki/Sabre_(computer_system)](https://en.wikipedia.org/wiki/Sabre_\(computer_system\))

------
rbanffy
Do they publish detailed postmortems?

~~~
snaky
> The recent Delta Airlines system outage and the prior Southwest outage are
> pointing people to blame their antiquated technology and infrastructure. In
> particular, a lot of these so-called technology experts point their fingers
> at airlines’ IBM mainframes running z/TPF as part of the cause of their
> troubles. The problem is that none of these “experts” seem to have ever done
> any work on a mainframe and only have a passing understanding of z/TPF if
> they have any understanding of it at all.

[https://www.linkedin.com/pulse/mainframes-problem-solution-j...](https://www.linkedin.com/pulse/mainframes-problem-solution-jeff-hall)

~~~
rectang
This link is a gem. It is very revealing of the dynamics of arguments over
software in the airline industry.

> _A number of financial institutions have discovered and acknowledged this
> fact and are now taking the first steps to address this situation by making
> the decision to go back to z/TPF, z/OS and UNIX because of their
> bulletproof reliability and transaction processing capabilities. This to the
> chagrin of the bulk of their application development staffs who are bolting
> out of these organizations for the more sexy technology environments. Yes,
> IBM Assembler, COBOL and other antique languages are making a resurgence at
> these brave organizations as they supplant and even replace Java, .NET,
> Python and other development environments._

Back to the good old days of _IBM Assembler_?!

Are they ditching source control and unit testing too?

~~~
pc86
I used to work at a healthcare company where about 1/3 of the development
staff worked in RPG and COBOL day-to-day. A consultant came in to pitch them
on a COBOL source control system, and from what I heard they had to spend most
of the time explaining to these 60+ year olds what source control was before
pitching the management on a 7-figure implementation for what was basically
timestamped folders with copies of code inside.

When I left that company a few years after that, there was still no source
control for the RPG or COBOL code.

~~~
Xixi
I have to chime in and display my ignorance: what makes RPG or COBOL so
special that git or mercurial could not simply be used for source control?

~~~
snaky
Technically, nothing.

"Using Visual COBOL in Modern Application Development"
[https://www.microfocus.com/documentation/visual-cobol/visual...](https://www.microfocus.com/documentation/visual-cobol/visualcobol30/VC_for_VS_in_Modern_App_Dev.pdf)

------
CaliforniaKarl
Suggest changing the title to "Delta Resumes U.S. Domestic Flights After
Computer Glitches", as the headline has changed, since the issue has been
resolved.

~~~
dang
We've changed the title from "Delta Grounds U.S. Domestic Flights to Fix
Computer Glitches" to the article's current title.

------
senorsmile
I'm currently sitting in an airport in Salt Lake City. Was supposed to be in
St. Louis, MO last night for the preconf day of Strange Loop. Instead I get to
miss the first day because they didn't hold my connecting flight. And my
first flight didn't leave until almost 2 hours after it was supposed to,
because of a computer glitch.

~~~
pc86
I thought it was standard practice not to hold connecting flights, especially
for 2 hours? I certainly would not want to be stuck at my gate (and
potentially miss my connections) waiting for another passenger for that plane.
It seems pretty obvious that the least impactful route is that planes take off
when they are ready and do not wait around.

~~~
gav
Almost no flights are held back for connections.

There's probably the odd flight that is held back if there's a large number of
passengers who will miss it due to a late connection--and there are no later
flights with seats available. I would imagine the cost of holding the 11pm
LAX-SYD flight because 20 people are 10 minutes late on the NYC-LAX connection
is a lot less than paying for hotels, especially when the next flight may not
have 20 seats available anyway. Though then again, most carriers have gate
pressure at LAX and they might not be able to hold the flight anyway.

It's an interesting optimization problem to solve!
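
A toy version of that trade-off, as a rough Python sketch. Every number and
name here is made up for illustration (the per-minute hold cost, the hotel
cost, the extra cost for passengers who can't be seated on the next flight);
none of it comes from an airline, and a real model would also have to account
for gate pressure, crew duty limits, and knock-on delays.

```python
from dataclasses import dataclass


@dataclass
class HoldDecision:
    """Inputs for a toy hold-or-depart comparison. All costs are hypothetical."""
    late_passengers: int
    delay_minutes: int
    seats_on_next_flight: int
    hold_cost_per_minute: float          # assumed cost of keeping the plane at the gate
    hotel_cost_per_passenger: float      # assumed overnight cost per stranded passenger
    unseated_cost_per_passenger: float   # assumed extra cost when the next flight is full


def cost_of_holding(d: HoldDecision) -> float:
    # Holding costs scale with how long the departure is pushed back.
    return d.delay_minutes * d.hold_cost_per_minute


def cost_of_departing(d: HoldDecision) -> float:
    # Everyone left behind needs a hotel; those who don't fit on the
    # next flight cost extra on top of that.
    hotels = d.late_passengers * d.hotel_cost_per_passenger
    unseated = max(0, d.late_passengers - d.seats_on_next_flight)
    return hotels + unseated * d.unseated_cost_per_passenger


def should_hold(d: HoldDecision) -> bool:
    return cost_of_holding(d) < cost_of_departing(d)


if __name__ == "__main__":
    # 20 people, 10 minutes late, only 5 seats on the next flight.
    example = HoldDecision(20, 10, 5, 100.0, 150.0, 400.0)
    print(cost_of_holding(example), cost_of_departing(example), should_hold(example))
```

With these invented numbers, holding wins easily. Flip it to two passengers who
are 45 minutes late with plenty of seats on the next flight, and departing on
time wins instead.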

