
United Airlines System-Wide Computer Problem - SanjeevSharma
http://www.flyertalk.com/forum/united-airlines-mileageplus/1693331-shares-down.html
======
chinathrow
Nice reminder about a "glitch" happening in one of the datacenters used at the
airline where I used to work.

Went like this: the guy showing the new datacenter/ops guy around demonstrated
how the emergency power off works by lifting the protection plate. Protection
plate unhinges suddenly and drops onto the emergency power off button. Hilarity
ensues.

~~~
bitwize
A data center where I work had a naked big red switch without a protection
plate. Until I flagged up this lack and let the IT people know about it, it
was a disaster waiting to happen.

There's a plate on the switch now, praise Eris.

~~~
philipw
I've seen many places where "the solution" to the unprotected EPO button was
just a simple plastic cup!

------
lectrick
I have a nagging (intuition? feeling?) that software
safety/reliability/security needs are going to explode soon (because
unreliabilities multiply in non-resilient systems interacting with each other)
and that these are simply foreshocks.

(yeah, I know security is already a huge deal, but as we come to trust
software systems more and more, the safety/reliability factor will come more
into play)
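
To put rough numbers on the "unreliabilities multiply" point, here is a minimal sketch. It assumes each system fails independently and that the whole chain is down whenever any one link is down. The 99.9% figure and the function name `chain_availability` are made up for illustration; they are not measurements of any real airline system.

```python
from math import prod


def chain_availability(availabilities):
    """Availability of systems in series: all must be up, so probabilities multiply.

    Assumes independent failures, which interconnected systems often violate.
    """
    return prod(availabilities)


# Hypothetical figures: every system is up 99.9% of the time.
one = chain_availability([0.999])
ten = chain_availability([0.999] * 10)

hours_per_year = 24 * 365
print(f"1 system:   {(1 - one) * hours_per_year:.1f} hours/year down")
print(f"10 chained: {(1 - ten) * hours_per_year:.1f} hours/year down")
```

If failures are correlated instead (one outage triggering another), the independence assumption breaks and this simple product no longer describes the chain.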

EDIT: This is also part of the reason I've been learning Elixir
([http://elixir-lang.org/](http://elixir-lang.org/)) since it's based on the
highly-resilient Erlang and is designed to embrace failure. This was also
informed by me reading Nassim Taleb's book "Antifragile" as well as "Thinking
in Systems: A Primer" by the (late) Donella Meadows.

~~~
yAnonymous
I doubt it. Nobody wants to pay for that.

~~~
DougWebb
You might be right, unfortunately. If it's cheaper to buy insurance that will
cover the (expected) losses caused by outages, most organizations will choose
to do that instead of making the software more failure-resistant. The problem
is that insurance only works well for isolated incidents, but a software
failure can cause a cascading failure with a huge impact. Insurance companies
generally aren't prepared for that and don't have the resources to pay out to
everyone.

~~~
sopooneo
But aren't the insurance companies smart enough to figure this out and start
correcting their rates to be much higher? And if they actually have their acts
together, wouldn't those same insurance companies start insisting on basic
audits of their clients' systems?

I actually don't know about this stuff, so any correction of my thoughts is
appreciated.

~~~
TeMPOraL
The cynic in me feels that someone will figure out the problem with the cascade
case and sell insurance against insurance companies failing to pay out. Just
like during the 2008 financial crisis.

~~~
coderjames
That's what existing reinsurance companies do, if I understand correctly. They
insure the insurance companies.

------
carnesen
I worked on United's computer systems for a year (never that one though), and
so I get nervous when I see a headline like that. True story: one of their
systems still runs on a mainframe that has 9 bits in a byte!

~~~
themartorana
I did not know that was ever a thing. I knew that bit counts had been lower -
5 or 6 - but assumed once we hit 8, the whole power-of-2 thing was too
comfortable in a binary system to ever be anything besides a power of 2 again
- and sure enough, we get 16 bit, 32 bit, and 64 bit systems, and double-byte
and triple-byte char sets, etc. Need more space? Take another byte.

Was the 9th bit special? Or just a standard bit in the byte?

~~~
Someone
36 bit was popular for scientific computing because 35 gives you a sign bit
and 10 decimal digits (yes, that's weird, but that's the argument I read
everywhere, including at
[https://en.m.wikipedia.org/wiki/36-bit](https://en.m.wikipedia.org/wiki/36-bit)).
35 likely was skipped because its only divisors are 5 and 7, and limiting
instructions to 7 bits was felt to be too restrictive even then. For some
architectures, the DoD had a say in this, too. See
[https://en.m.wikipedia.org/wiki/Unisys_2200_Series_system_ar...](https://en.m.wikipedia.org/wiki/Unisys_2200_Series_system_architecture).

36 bits got us the 6-bit character (10 digits, 26 letters, and punctuation)
with six characters in a word. Because of that, some OSes had six-character
file names.

If you want to get upper- and lowercase, you need more than 6 bits. 9 is the
smallest divisor larger than 6 of 36, so nine-bit characters made sense.

On such systems, file names could still use 6-bit characters, while
applications used 9-bit ones. Also, some instructions could work on words,
half words, quarter words, or sixth words.
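
The arithmetic above is easy to check. A small sketch, with variable names of my own choosing, that only verifies the counting argument and says nothing about how any particular machine was actually designed:

```python
def divisors(n):
    """All positive divisors of n, in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


# Bits needed to hold the largest 10-digit decimal number, plus one sign bit.
magnitude_bits = (10**10 - 1).bit_length()
signed_bits = magnitude_bits + 1

# Field sizes that divide a 35-bit word evenly (excluding 1 and 35 itself).
proper_35 = [d for d in divisors(35) if 1 < d < 35]

# A 6-bit code covers digits plus one case of letters, with room for punctuation.
six_bit_codes = 2**6
one_case = 10 + 26
both_cases = 10 + 26 + 26

# Smallest character size above 6 bits that still divides a 36-bit word.
smallest_above_6 = min(d for d in divisors(36) if d > 6)

print("bits for sign + 10 digits:", signed_bits)
print("divisors of 35:", proper_35)
print("6-bit codes:", six_bit_codes, "one case needs", one_case, "both need", both_cases)
print("6-bit chars per 36-bit word:", 36 // 6)
print("next character size:", smallest_above_6, "bits,", 36 // smallest_above_6, "per word")
```

Both cases plus digits would technically squeeze into 64 codes, but that leaves only two codes for punctuation, which is why 6 bits stops being enough in practice.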

------
jaybna
I knew someone at United that once offered to give me a tour of one of their
data centers: "It is like a computer museum - we have one of everything." Hard
to imagine that they would have problems as a result. United is a really,
really bad airline.

------
jonawesomegreen
I bet this is an issue with an old mainframe used somewhere in the booking
system, something that has worked well but is difficult to fix when things go
wrong.

I think there is / will be a lot of money to be made trying to solve the
problem of software security and reliability. This is obviously an extremely
difficult problem; however, given the number of ancient systems that we
currently have interconnected, I think more large-scale outages like this are
inevitable.

~~~
igrekel
Truth is that most reservation systems are built on TPF and it isn't really
easy to replace.

~~~
waqf
just in case we have readers to whom TPF isn't a household name:
[https://en.wikipedia.org/wiki/Transaction_Processing_Facilit...](https://en.wikipedia.org/wiki/Transaction_Processing_Facility)

~~~
pjc50
That sounds like a heck of a system.

------
DangerousPie
Official status page:
[http://www.fly.faa.gov/ois/jsp/summary_sys.jsp](http://www.fly.faa.gov/ois/jsp/summary_sys.jsp)

Currently says:

    
    
        ATCSCC  ADVZY 027 DCC 07/08/15 UAL GROUND STOP REVISION
        DESTINATION AIRPORT: ALL AIRPORTS
        FACILITIES INCLUDED: ALL FACILITIES
        GROUND STOP PERIOD: 08/1200Z - 08/1315Z
        REASON: USER REQUEST AUTOMATION ISSUES

~~~
kieranelby
Good to see the Sun Microsystems favicon on the FAA status page - not seen one
of those for a while!

~~~
fnordfnordfnord
Last one I noticed was the FCC's comment system.

------
peterjmag
Ouch. And only a month after another major systems outage:
[http://www.wired.com/2015/06/united-flights-grounded-mysteri...](http://www.wired.com/2015/06/united-flights-grounded-mysterious-problem/)

------
kendallpark
Was reading this on HN and heard it on NPR simultaneously.

I have a sneaking suspicion that booking systems for most airlines run atop
legacyware. It just seems like the type of thing that would've been put in
place long ago and then be very expensive to migrate/upgrade.

~~~
joezydeco
Oh, it's not a suspicion.

[http://www.pacbiztimes.com/2012/04/06/united-takes-a-step-ba...](http://www.pacbiztimes.com/2012/04/06/united-takes-a-step-back-in-time-by-adopting-outdated-system/)

~~~
kendallpark
> The biggest problem, one that would drive any tech-savvy user crazy, is that
> United junked an award-winning, state-of-the-art reservation system and
> adopted the Continental Airlines model based on older technology known as
> System One.

Well, that answers that.

------
caractacus
Not a word about it on United's web site. Flight status page doesn't load
correctly. "Today's Operations" gives an error message. United's Twitter is
silent.

Meanwhile news articles and twitter complaints abound.
[http://mashable.com/2015/07/08/united-computer-problems-flig...](http://mashable.com/2015/07/08/united-computer-problems-flights-grounded)

------
imgabe
United is a mess. I had the misfortune of flying with them a couple months
ago. I ended up in a city a 2 hour drive from my actual destination and had to
rent a car on my own to get to where I was going.

That was the worst of it, but almost every flight I saw on the way (both ones
I was on and other flights at nearby gates) was delayed or overbooked or
otherwise messed up in some way.

~~~
boken
If I ever listened to horror stories like these, there wouldn't be an airline
left I would fly with. This is textbook; replace United with Delta, Southwest,
Air Canada, etc., at will. The only company I've used that I haven't heard
exactly this type of complaint about is Widerøe, a tiny regional line in
Norway with a fleet of prop planes. This is unfortunate, as I live in
Pennsylvania and get motion sickness on those little aircraft. And I imagine
that if I spoke Norwegian—it seemed to me that while many Norwegians speak
English, they don't do much complaining in it when the native language would
do—I wouldn't even have Widerøe left.

~~~
driverdan
Overall JetBlue and Virgin have been great. AA and US Air have been good.
United is the worst.

~~~
briandear
AA destroyed a piece of baggage of mine. They denied responsibility. I sued
and won. Then I couldn't collect because that judgement was essentially
nullified because AA was going through bankruptcy.

I have 1k status with United -- I fly internationally with them almost
monthly. The problems I've had with United have been when bags have to be
interlined to Brussels Airlines and occasionally (surprisingly) Lufthansa.
I've also had Lufthansa somehow think it was a good idea for 2 toddlers to sit
in scattered seats rows away from their parents (who were also rows apart as
well, despite having checked in almost 24 hours before the flight and being
Star Alliance Gold.) I've had Brussels Airlines say they were going to "gate
check" a stroller only to have it show up days later. I've been stranded in
Detroit back in the Northwest Airlines days when aircrews hadn't shown up to
work. I've been stuck in Paris when the Air France pilots decided that salaries
up to $300,000 per year just aren't enough. On a recent United trip from
Hartford to Marseille, I was stuck in Hartford for % extra hours for an
airplane that was stuck in Newark (just a <50 minute flight away.) I then
missed a cascade of connections leaving me rather miserable. However, United
sorted the problem and got me on my way as quickly as possible. Let's not
forget JetBlue's antics on multiple occasions a few years ago: a 10.5 hour
tarmac delay, a 7 hour tarmac delay among several other extremely long tarmac
delays. AA had 14 long tarmac (over 3 hours) delays in February. United had
zero long tarmac delays during the same period. Envoy/American Eagle was in
last place for on-time arrivals last year.

I'm not defending United. I'm not disparaging the others. The fact is that the
air transport industry is extremely complex and perceptions of quality are as
varied as there are passengers in the sky.

Every airline sucks and every airline is great. Pick a day, pick a destination
and roll the dice. When you fly often enough it seems like it all averages out
to just one level of melancholic service; unless you're flying on Singapore
Airlines -- then it just becomes sublime.

~~~
juliangregorian
That's not accurate about the Air France pilots -- they were striking because
the airline was moving to replace them with cheaper pilots.

------
Sukotto
What does "all airports" mean? Global or just US domestic?

~~~
fnordfnordfnord
US domestic would be my guess, i.e. all airports under FAA authority.

------
danso
Very little news about this on Google News, but heard over the local Chicago
ABC affiliate that the FAA attributed this to an "automation error"

Edit: And its Twitter account has been relatively inactive, with more than 30
minutes since the last reply-to or general tweet...presumably a lot of
complaining tweets have come in in the last 30 minutes
[https://twitter.com/united/with_replies](https://twitter.com/united/with_replies)

~~~
lectrick
Just a minute ago they tweeted about it:
[https://twitter.com/united/status/618769524544942081](https://twitter.com/united/status/618769524544942081)

~~~
ceejayoz
Took them until just now to go "oh, we should probably post something that
isn't a @reply".

[https://twitter.com/united/status/618777538865799168](https://twitter.com/united/status/618777538865799168)

~~~
fnordfnordfnord
I bet they could have saved themselves untold numbers of telephone calls in
the support queue, and thousands of in-person queries to ticket agents and
other airport staff with one or two tweets and a facebook post or two.

------
gesman
Interesting bits:

"Departing DEN; taxied and then returned to gate. Pilot says nationwide
failure of "three or four" computer systems. Only information from airport
staff is that since the computers are down UAL can't book pax onto any other
airline ..."

"Systemwide Ground Stop posted at FAA: Due to USER REQUEST DUE TO AUTOMATION
ISSUES. UAL AND SUBS ONLY., departure traffic destined to ALL airport will not
be allowed to depart until at or after 13:15 UTC."

------
vaadu
How do you know it's a glitch (whatever that means) and not a big problem such
as a data center outage or something hacker-caused?

~~~
oaktowner
Yeah, "glitch" seems to imply something minor, while this seems anything but.

------
DannyBee
Considering just yesterday their flight system didn't believe they flew from
SFO for 2+ hours in the morning (I have screenshots), I'm not all that
shocked.

------
bst287
Currently in the air on a United flight, EWR -> SFO. Took off at 7:30ish. No
mention of this in the airport or on the plane. Yikes

------
pavel_lishin
"Automation error" rings a funny bell in my head, since I'm currently re-
reading "A Fire Upon The Deep".

------
aaronkrolik
Is anyone familiar with the UA tech stack? I'd be curious to see what they're
running.

~~~
imroot
They're running SHARES internally; most everything else is specific
applications with hooks into shares for data processing and return.

~~~
coldcode
Lol HP system. I used to work for SABRE, it's far more dependable despite its
antiquity. HP is a marginal player in this market which is dominated by SABRE
and Amadeus.

~~~
yarper
Were they a good employer? Sane codebase?

------
TWAndrews
I was able to get checked in via the United Android app around 9am ET.

------
rwestergren
Perhaps someone wasn't following the bug bounty rules?

------
raus22
Off-topic: Please use ISO 8601 format (YYYY-MM-DD) for dates in titles. The US
date format hurts my poor logical soul.

[https://en.wikipedia.org/wiki/ISO_8601](https://en.wikipedia.org/wiki/ISO_8601)
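
For what it's worth, converting is a one-liner in most languages. A small Python sketch, assuming the "07/08/15" in the FAA advisory quoted elsewhere in this thread is month/day/two-digit-year; the other two dates in the list are arbitrary examples:

```python
from datetime import date, datetime

# US-style date as it appears in the advisory (assumed to be month/day/year).
us_style = "07/08/15"
parsed = datetime.strptime(us_style, "%m/%d/%y").date()
iso = parsed.isoformat()
print(us_style, "->", iso)

# A practical upside: ISO 8601 date strings sort chronologically as plain text.
dates = [date(2015, 7, 8), date(2015, 6, 2), date(2014, 12, 31)]
iso_strings = sorted(d.isoformat() for d in dates)
print(iso_strings)
```

The same plain-text sort on MM/DD/YY strings would put 12/31/14 last, even though it is the earliest of the three.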

~~~
snarfy
There's little endian, big endian, and the US date format, which I like to
call middle endian.

~~~
chinathrow
Well, to attribute that to SHARES, we should go with big endian ;)

------
johnrydell
Let's just be thankful that we haven't had a major air-based catastrophe due
to an outage or hacking!

~~~
userbinator
Planes with pilots in them are mostly autonomous and can avoid each other as
well as the ground.

[https://en.wikipedia.org/wiki/Ground_proximity_warning_syste...](https://en.wikipedia.org/wiki/Ground_proximity_warning_system)

[https://en.wikipedia.org/wiki/Traffic_Collision_Avoidance_Sy...](https://en.wikipedia.org/wiki/Traffic_Collision_Avoidance_System)

[https://en.wikipedia.org/wiki/%C3%9Cberlingen_mid-air_collis...](https://en.wikipedia.org/wiki/%C3%9Cberlingen_mid-air_collision#TCAS_and_conflicting_orders)

~~~
chinathrow
Yes, mostly.

[https://en.wikipedia.org/wiki/Vitaly_Kaloyev](https://en.wikipedia.org/wiki/Vitaly_Kaloyev)

~~~
drzaiusapelord
I wouldn't get on a Russian autonomous flight for a million dollars. Between
the corruption, hacky engineering, and complete disregard for safety
standards, Russian accidents surprise no one. The Tu-154 accident list is long
and scary. Hell, look at all the fires that broke out.

[https://en.wikipedia.org/wiki/Tupolev_Tu-154#Incidents_and_a...](https://en.wikipedia.org/wiki/Tupolev_Tu-154#Incidents_and_accidents)

------
scrumper
That's twice in a short time now - 'coincidence' on Ian Fleming's scale. A
third time is enemy action. Airlines do seem like a pretty juicy target for
cyber war operations - you can cause a gigantic amount of disruption with a
successful attack on a single system.

~~~
eli
That seems like quite a leap given how often complex computer systems fail
without any malicious act.

~~~
mnw21cam
Indeed. Never attribute to malice that which is adequately explained by
stupidity.
[https://en.wikipedia.org/wiki/Hanlon's_razor](https://en.wikipedia.org/wiki/Hanlon's_razor)

