
911 emergency services go down across the US after CenturyLink outage - LopRabbit
https://techcrunch.com/2018/12/28/911-service-outage-centurylink/
======
ergothus
I'm unable to read the article on my phone (ads?), so I apologize if it is
answered there, but why is this a long-lasting problem? My past two or three
(not sure) workplaces have had to fail over to alternate providers for office
traffic when something broke in the primary, and both had contracts in place
and the technical infrastructure to do so in relatively short order. (I don't
want to ignore the pre- and intra-incident effort by ops; I'm just saying it paid off.)

Is this outage just breaking those places that didn't have such things (as a
result of insufficient budget or skill)? Is there some now-regrettable
contract(s) that prevented that from being in place?

Why is a life-saving system running without the kind of backup that far less
critical systems have? While I understand that government has both budget and
hiring issues, I've worked in state government and there are definitely some
highly competent admins to be found, so I wouldn't assume that is the problem
without word from someone with more detailed knowledge of this particular incident.

~~~
jpollock
Software is hard, and backbones tend to be monocultures. This means that
there will be bugs which only become apparent once the full fleet has been
deployed.

This is why large phone networks have really long rollout periods - to
hopefully catch these things before they affect everything.

This is also why emergency services should not be using cell phones for
reliable service during emergencies.

Here's another example from 1990, where the phone network got stuck in a reset
loop.

[https://catless.ncl.ac.uk/Risks/9/62#subj2](https://catless.ncl.ac.uk/Risks/9/62#subj2)

Corporations make similar mistakes with their disaster planning. It doesn't
matter if you purchase network connections from two suppliers if they go
across the same bridge.

~~~
flexer2
My dad spent his career as a natural gas pipeline technician. For a long time
they maintained their own radio towers and a VHF or UHF (I forget which) radio
network. Each district had its own comms guy who maintained the network, and
it was very resilient and reliable, allowing the entire company to be in
contact essentially anywhere.

In the early 2000s, as a cost-cutting measure, the company decided to nix the
radios in favor of satellite phones in each truck. Of course this proved to be
problematic, as the sat phones often had reception issues. They relied on cell
phones as backups, which was also quite foolish, as the remote compressor
stations had very poor reception. They also had issues where someone would
leave a voicemail and they wouldn’t get notified of it for days or weeks due
to some issue with AT&T.

After a couple of emergencies in which communication was identified as a big
issue, they proposed moving back to the old radio system, but the company had
already sold off the frequencies and dismantled the infrastructure. My dad
retired not long after this, but the “corporate bean counter” trope rang quite
true here, and in the long run we were all a little less safe because some
executive with no field experience wanted to make a name for themself by
saving a little money on something that proved to be mission-critical.

~~~
julianlam
That makes me sad to hear, because not only was this company left with an
inferior solution, the man-hours and effort that went into the original
project (including purchasing part of the spectrum!) were essentially
nullified.

It's an alarming trend among new grads as well, who see existing systems as
bloated and in need of rewrites. Who would've thought the bloat was kind of
important?

~~~
scruffyherder
But then we wouldn't have single-threaded wonders like node.js.

It's part of our culture that the young kids dismantle the 'old people stuff'
and set up their new stuff, only to reinvent the wheel again and again,
running into the same problems that were solved decades ago.

Remember how Tcl was going to save us all from K&R C? Then it was Perl to save
us from Tcl? Then Ruby? Python? JavaScript, Go, or whatever is the flavour of
the month?

~~~
XorNot
Sure except you've managed to bitch about most major programming languages
here without proposing any alternatives.

~~~
salawat
The poster's entire point was that all of them came from a desire to find
another option when a reasonable alternative already existed.

~~~
setpatchaddress
The logical conclusion is that Ruby shouldn’t exist because _shakes fist_
those darn kids should write all programs in C instead.

------
Animats
From CenturyLink's Twitter feed:

 _" Update: On December 27, 2018 at 02:40 GMT, CenturyLink identified a
service impact in New Orleans, LA. The NOC is engaged and investigating in
order to isolate the cause. Field Operations were engaged and dispatched for
additional investigations. Tier IV Equipment Vendor Support was later engaged.
During cooperative troubleshooting a device in San Antonio, TX was isolated
from the network as it was seeming to broadcast traffic consuming capacity,
which seemed to alleviate some impact. Investigations remained ongoing.
Following the isolation of the San Antonio, TX device troubleshooting efforts
focused on additional sites that teams were remotely unable to troubleshoot.
Field Operations were dispatched to sites in Kansas City, MO, Atlanta, GA, New
Orleans, LA and Chicago, IL Tier IV Equipment Vendor Support continued to
investigate the equipment logs to further assist with isolation. Once
visibility was restored to the site in Kansas City, MO a filter was applied to
the equipment to further alleviate the impact observed. All of the necessary
troubleshooting teams in cooperation with Tier IV Equipment Vendor Support are
working to restore remote visibility to the remaining sites at this time. We
understand how important these services are to our clients and the issue has
been escalated to the highest levels within CenturyLink Service Assurance
Leadership."_

~~~
walrus01
This info is related to their much earlier DWDM/transport system outage, which
took down a huge number of 10Gbps lit transport circuits starting around
0700-0800 Pacific time on the 27th. The 911 outage for WA began 10+ hours
later.

see outages mailing list archives for more details:

[https://puck.nether.net/mailman/listinfo/outages](https://puck.nether.net/mailman/listinfo/outages)

~~~
ct520
This affected Washington, and it's still very much ongoing 24 hours later.
Geez, is it a pain.
[https://downdetector.com/status/centurylink/map/](https://downdetector.com/status/centurylink/map/)

~~~
FireBeyond
Thurston County came back online about 20 minutes ago (0900 PST).

------
mekaj
There was an incident with similar consequences on April 10, 2014. The cause
was a programmed threshold being breached, and the impact was six hours of downtime.

Source: "The Coming Software Apocalypse" published by The Atlantic
([https://www.theatlantic.com/technology/archive/2017/09/savin...](https://www.theatlantic.com/technology/archive/2017/09/saving-
the-world-from-code/540393/))

walrus01 also linked to
[https://www.fcc.gov/document/april-2014-multistate-911-outag...](https://www.fcc.gov/document/april-2014-multistate-911-outage-
report) in another comment.

~~~
PhasmaFelis
> _Operated by a systems provider named Intrado, the server kept a running
> counter of how many calls it had routed to 911 dispatchers around the
> country. Intrado programmers had set a threshold for how high the counter
> could go. They picked a number in the millions._

I'm really curious if there's some explanation that makes this sound less
catastrophically stupid, particularly the part where they picked a threshold
less than INT_MAX.
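
For illustration only (the FCC report doesn't publish Intrado's code), here's a minimal sketch of the failure mode being described: a bookkeeping counter with a hard ceiling that is allowed to gate call routing, next to a variant where counting never blocks traffic. All names and the threshold value here are invented.

```python
# Hypothetical sketch of the reported failure mode: an accounting
# counter with a hard cap that is allowed to gate call routing.

CALL_CAP = 40_000_000  # invented stand-in for the "number in the millions"

def dispatch(call):
    # Stub for handing the call to a 911 dispatcher.
    return f"routed:{call}"

class BrokenRouter:
    def __init__(self):
        self.calls_routed = 0

    def route(self, call):
        if self.calls_routed >= CALL_CAP:
            # The bug: a metrics limit rejects real emergency traffic.
            raise RuntimeError("call counter threshold exceeded")
        self.calls_routed += 1
        return dispatch(call)

class SaferRouter:
    """Counting is for metrics; it should never block routing."""
    def __init__(self):
        self.calls_routed = 0

    def route(self, call):
        self.calls_routed = (self.calls_routed + 1) % 2**64  # wrap, don't stop
        return dispatch(call)
```

The point of the contrast: whatever the counter was for, exceeding it should at worst lose a statistic, never a call.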

~~~
swiftcoder
What do you mean, VARCHAR(8) is a perfectly valid way of storing counters...

~~~
ineedasername
Much better to future-proof it by using a BLOB. You know, just in case you
ever need to shove something else in there.

~~~
AdamJacobMuller
If you're going to use a BLOB, how can you know the type and how to decode it?
Make sure to serialize that data with something like protobuf.

~~~
oneplane
Nah, describe it using some custom binary format and have a 1000 page spec to
document it!

~~~
justinclift
Yay, serialised bluetooth! ;)

------
_eht
This will be a post-mortem worth reading. No details yet, but I’m wondering
where the broken links are. Is it a VOIP glitch or more than that?

Edit: VOIP indeed. Centurylink outage:
[https://komonews.com/news/local/centurylink-outage-knocks-
ou...](https://komonews.com/news/local/centurylink-outage-knocks-
out-911-calls-in-parts-of-w-wash)

~~~
paganel
Today I’ve learned that vast swathes of the emergency call network depend on
just one ISP. I wonder if the old, analogue and copper-only network wasn’t
more resilient.

~~~
Animats
It was. In the entire history of electromechanical switching in the Bell
System, no central office was down for more than 30 minutes for any reason
other than a natural disaster or major fire. There are books about Number Five
Crossbar and how it worked, and they're worth reading if you design high-
reliability systems.

The only central office downtime from a major fire was in 1975 in New York
City, resulting in 23 days of downtime.[1] All the incoming cables and the
main distributing frame had to be replaced. The switching equipment, on upper
floors, survived and just needed some maintenance and cleaning. That was still
a crossbar office; it hadn't been converted to electronic switching yet. Worst
disaster in the history of the Bell System. Bell switched to less flammable
cable coverings, like plenum cable, after that.

Widespread failure of 911 service suggests an overcentralized architecture.
911 requires a phone number to address lookup, so there's a database involved.
Widespread failure indicates this was implemented as a remote query service
("in the cloud") rather than read-only database copies of the directory at
each central office.

[1] [https://youtu.be/f_AWAmGi-g8?t=110](https://youtu.be/f_AWAmGi-g8?t=110)
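
To make the architectural point concrete, here's a hedged sketch (not CenturyLink's or anyone's actual design) of the two approaches to the number-to-address lookup: a read-only replica held at each office versus a remote query that dies with the backbone. The sample number and address are invented.

```python
# Sketch: address lookup against a local read-only replica, with the
# centralized "cloud" query as the only alternative. During a backbone
# outage the replica keeps answering; the remote service does not.

LOCAL_REPLICA = {  # periodically synced, read-only copy at this office
    "+12065550100": "123 Example St, Seattle WA",  # invented sample entry
}

def remote_lookup(number):
    # Stand-in for the centralized query service during the outage.
    raise ConnectionError("backbone unreachable")

def resolve_address(number):
    if number in LOCAL_REPLICA:
        return LOCAL_REPLICA[number]
    return remote_lookup(number)  # only replica misses pay the outage penalty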

~~~
tjohns
There's an old AT&T video showing the careful coordination and huge amount of
manpower needed to upgrade a single telco switch. They had rows of workers
with cable cutters, in order to literally cut the entire service over to the
new equipment. Total downtime was just 47 seconds.

(And even during the cutover, they're careful not to interrupt any emergency
calls.)

[https://www.youtube.com/watch?v=saRir95iIWk](https://www.youtube.com/watch?v=saRir95iIWk)

~~~
the_duke
So, what exactly happened there?

Where were the signals going after the cables had been cut?

~~~
Animats
The new switch was connected to each line in parallel with the old switch, but
it was able to ignore incoming signals until told to take over. The old switch
was hard wired enough that it could not be remotely commanded to have no
effect on incoming signals. The line relay attached to each line had to be
physically cut out of circuit.

------
armchairanalyst
CenturyLink is critical to 9-1-1 services throughout the U.S. They have a
9-1-1 division to which a very large number of municipalities outsource. Even
military installations outsource parts or all of their 9-1-1 operations to
CenturyLink.

This CenturyLink division has historically acted very carelessly with regard
to securing these systems. Getting them to take ownership of their contractual
obligations can be difficult.

Criticism of their operations is very sensitive. Pointing out any
vulnerabilities can be prosecuted as an act of terrorism. A post as simple as
this one comes with significant risk.

~~~
stingraycharles
While I'm not refuting any of your claims, do you have a source for them? It
sounds like typical big-corp behavior, albeit worse because of the importance
of emergency services.

~~~
nickpsecurity
I at least found the 911 unit:

[https://www.centurylink.com/wholesale/pcat/911.html](https://www.centurylink.com/wholesale/pcat/911.html)

~~~
Operyl
To the best of my understanding, they aren’t providing the 911 call center
here.

------
rococode
Interestingly the University of Washington put out an email notice nearly an
hour before AlertSeattle sent one. You'd think the city would make it a higher
priority to let everyone know that 911 is down. Not sure if they did a text
message earlier though, currently overseas w/o service.

[https://i.imgur.com/2rlxi7f.png](https://i.imgur.com/2rlxi7f.png)

More info on the outage here:
[https://www.king5.com/article/news/local/nationwide-
centuryl...](https://www.king5.com/article/news/local/nationwide-centurylink-
outage-affecting-911-calls-in-western-
washington/281-1e461abb-8b3c-49e2-8a57-764e39466213)

~~~
reaperducer
_Interestingly the University of Washington put out an email notice nearly an
hour before AlertSeattle sent one. You'd think the city would make it a
higher priority to let everyone know that 911 is down._

How long do you think it takes to send out a few million alerts versus sending
a few thousand?

~~~
pbhjpbhj
Isn't there an emergency alert system -- should take as long as sending a
radio signal?

~~~
reaperducer
EAS notification would have been virtually instant. But it's supposed to be
used only for immediate life-and-death situations (tornadoes, nuclear war,
chemical plant explosions).

Amber/cell phone alerts have a lower threshold for pulling the trigger.

~~~
roywiggins
I think Seattle eventually activated the local EAS over the radio during the
911 outage; my parents mentioned hearing it, though it was very scratchy and
not very intelligible.

------
vegardx
And this is why I've made it a habit to save the local number of the police
station where I live, and after moving around a little I seem to have
accumulated a few. Always nice to have a backup should you need it.

~~~
riskable
Not just the police department... Fire, ambulance, poison control, dig safely,
local traffic control tower (they're about ~9 miles away from me), pharmacy,
and suicide prevention (every fridge needs one of those magnets).

~~~
lukeschlather
Seems like the government could very cheaply offer an app that is basically
just a geolocated lookup for these sorts of numbers. The app could be a
SpatiaLite database generated from a git repo; it would only have to be
updated periodically, and would never go down.
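
A minimal sketch of that idea, using plain SQLite instead of SpatiaLite (a brute-force distance sort is plenty for a table this small); the table layout and both sample entries are made up:

```python
import math
import sqlite3

# Read-only database of local emergency/non-emergency numbers, keyed by
# coordinates. In the real app this file would ship with the app and be
# regenerated periodically from the git repo.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE numbers (agency TEXT, phone TEXT, lat REAL, lon REAL)")
db.executemany("INSERT INTO numbers VALUES (?, ?, ?, ?)", [
    ("Example City PD non-emergency", "+15555550100", 47.61, -122.33),
    ("Example County dispatch",       "+15555550199", 47.04, -122.90),
])

def nearest_numbers(lat, lon, limit=3):
    # Equirectangular approximation: fine at city scale, and it avoids
    # needing the SpatiaLite extension at all.
    rows = db.execute("SELECT agency, phone, lat, lon FROM numbers").fetchall()
    rows.sort(key=lambda r: math.hypot(
        r[2] - lat, (r[3] - lon) * math.cos(math.radians(lat))))
    return [(agency, phone) for agency, phone, _, _ in rows[:limit]]
```

Since the data changes rarely, the whole database can be bundled with the app and queried entirely offline, which is what makes it immune to an outage like this one.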

~~~
otachack
They should, but I think it would be a great community project. I'd be
interested in helping out; I've been working on apps for ~3 years.

------
tracker1
> Ajit Pai, chairman of the Federal Communications Commission, which regulates
> and monitors 911 services, said the commission is investigating the outage.

Until they "resolve" the issue without disclosing any details...

~~~
josteink
Maybe they will open the case for public comments and insights, then “ignore”
those comments because of a “DDOS attack”, and then finally not investigate
the alleged attack despite earlier on outlining how it was able to cripple
public infrastructure.

Could there be a more mismanaged agency than the FCC?

~~~
clubm8
>Could there be a more mismanaged agency than the FCC?

or maybe they're all just as mismangaged, but you noticed them in partiular
becaue you are passionate about the FCC's mission?

I'd be willing to bet if the average American interacted with fincen, FDA, IRS
etc they'd be shocked at the ineptitude. America has been strangling their
enforcement agencies for a long time - one could argue there's a direct path
from a poorly funded IRS only taking on "slam dunk" cases and ignoring complex
money laundering schemes to Trump's election.

~~~
iscrewyou
The fact that it takes a special prosecutor (and the subsequent reporting) to
dig into these big money laundering schemes only cements your point even more.

------
tarellel
Could this have been part of the massive CenturyLink/Verizon outage that's
been rolling across the Midwest the last day or two? I know it's been a major
pain in the butt for me, but I've been super productive at work not having to
worry about answering calls.

~~~
walrus01
Network engineer for a Pacific NW ISP here: yes, this is absolutely
CenturyLink-related, based on what I can observe it's taken down, and I have
my suspicions about Intrado. I think CenturyLink broke things further while
trying to fix this morning's DWDM/transport system outage.

See the PDF file linked here:

[https://www.fcc.gov/document/april-2014-multistate-911-outag...](https://www.fcc.gov/document/april-2014-multistate-911-outage-
report)

~~~
mmmBacon
Was this 10G DWDM gear? Any idea whose equipment it was and what the root
cause was? I’m just curious because I used to work for a vendor that supplied
DWDM gear to CenturyLink, and I wonder if this is equipment I designed. 10G
DWDM is pretty robust, so it must have been a pretty massive failure: a line
amplifier, a fiber cut, a software upgrade, or something.

~~~
walrus01
Nothing confirmed yet; it's all rumors and conjecture.

[https://fuckingcenturylink.com/](https://fuckingcenturylink.com/)

------
tzs
The systems to notify people of the outage sure worked well, at least for me.

Got an EAS alert while watching TV and two emergency alerts on my cell phone
(one generic telling me to call local police or fire, and one telling me a
specific number to call for my county).

Furthermore, I now think that a call that came in on my landline at 10:42 PM
was probably also a notification. The caller ID just said "wireless caller"
and I didn't recognize the number so I ignored it, but now Googling that
number turns it up as a cell contact number for my county's Department of
Emergency Management.

------
BucketSort
The incompetence of the Roman government once became so great that they lost
the ability to provide basic public utilities like the keeping of time. Is
this that for us? How could 911 possibly go down? Crazy.

~~~
StudentStuff
911 is a creaky old system put in place by a Chicago mayor looking to score
some points with the electorate. There are thousands of (poorly run,
uncoordinated) Public Safety Answering Points across the country, and most
tend to contract with just the local incumbent telecom, ensuring any time
there is an issue with said incumbent, every call to said PSAP fails.

In the context of most of these PSAPs having already moved to VOIP, there
isn't a coherent reason why these calls shouldn't go straight from the carrier
originating the call to the PSAP, without middlemen like Centurylink standing
by to misroute those calls.

~~~
JadeNB
> 911 is a creaky old system put in place by a Chicago mayor looking to score
> some points with the electorate.

I'm always interested in Chicago history, so I went looking, but couldn't find
anything at, for example,
[https://www.nena.org/page/911overviewfacts](https://www.nena.org/page/911overviewfacts)
. Do you know anywhere this history is written up?

~~~
JadeNB
Ah, I may have seen what you mean. According to
[https://www.countyofunion.org/site/cpage.asp?cpage_id=180009...](https://www.countyofunion.org/site/cpage.asp?cpage_id=180009766&sec_id=180003667)
:

> 1976 Chicago claims to have had "the first enhanced 911 system of any major
> city" in the United States.

… so I guess in

> > 911 is a creaky old system put in place by a Chicago mayor looking to
> score some points with the electorate.

you meant E911 (which, as the linked page points out, is a nebulously defined
and variously implemented technology), not necessarily 911?

------
jrockway
I'm surprised cell towers can't tell connected phones what the local emergency
number is, and then dial that instead when 911 isn't working correctly.

~~~
g_p
Cell networks (but not towers) effectively do this already - when you dial 911
(or 112), the phone doesn't actually dial the number "911" or "112"; instead
it requests a call of type emergency. This is routed by the mobile network
itself, according to how it handles emergency calling.

The phone also enters a specific "access class" (access class 10) for
emergency calling mode, which lets it join other masts on networks it wouldn't
normally operate on. (This is where the message "emergency calls only"
originates.)

But if the onward system this "un-numbered" emergency call would be passed
to is unavailable (as it sounds like is the case here), then emergency calls
will fail. It might be possible to dynamically reconfigure the mobile network
core, but you'd need to configure this on the core, for each and every cell
tower or group of cell towers, to route the call to the right alternative
lines/number. Whether that system has the capacity to handle all these calls
is another question.
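
A toy sketch of the reconfiguration problem being described (everything here, region names and fallback numbers included, is invented; real cores are configured nothing like this): the core maps an emergency *call type* to an onward route per region, and rerouting around a failed downstream system means touching that mapping for every affected region.

```python
# Toy model: the handset signals an emergency call type, not digits; the
# core picks the onward route. Fallbacks must be configured per region.

EMERGENCY_ROUTES = {          # normal onward trunks toward the 911 system
    "region-a": "trunk-911-a",
    "region-b": "trunk-911-b",
}
FALLBACK_NUMBERS = {          # manually maintained alternates, per region
    "region-a": "+15555550911",
    # region-b has no fallback configured - its calls simply fail
}

def route_emergency_call(region, trunk_is_up):
    if trunk_is_up:
        return EMERGENCY_ROUTES[region]
    fallback = FALLBACK_NUMBERS.get(region)
    if fallback is None:
        raise ConnectionError(f"no emergency route available in {region}")
    return fallback
```

The manual, per-region fallback table is exactly the slow part: someone has to populate and activate it for every affected area while the outage is in progress.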

~~~
jsjohnst
> for each and every cell tower or group of cell towers, to route the call to
> the right alternative lines/number

Cell towers have virtually nothing to do with routing calls. That’s all
centralized in a region, with all the towers backhauling.

------
Trisell
In my experience, a lot of these 911 centers are organized around counties
(not all, but a lot). Back when they were first being set up with internet
capacity there were two issues: DSL was the only available service within the
area, and the centers couldn’t afford large enough pipes. We're talking early
2000s here. So a lot of these centers got pretty large grants from the feds.

They then hired one of the companies that at the time were doing these
upgrades in bulk around the country, when the money was flying because the FCC
wanted all of the emergency services to move from wideband to narrowband.

These companies came in, updated everything to modern equipment, and set up
the contracts with the DSL providers of the day (in the rural town I grew up
in, they just got 80Mb cable internet in 2016; prior to that, 10Mb DSL was the
best you could get there). In the West that was Qwest, who became CenturyLink.

But government being what it is, a lot of these 911 centers have languished,
because there is no desire to invest in updating these systems: they are
expensive and have a difficult upgrade process. So a lot of these centers are
just paying the bills and trying to scrape by on minimal funding, and upgrades
and redundancy aren’t even considered. Until something like this happens, and
the feds might throw some money at the problem. All the centers will get
grants to have failover, and then will go back into dormancy of just working
day to day with no budget.

------
petee
Mass says they restored 911 by 9am, but I just got an alert at 12 noon saying
it's down now...

------
ConcernedCoder
I was in a grocery store when my cellphone, in unison with all the other
phones in the store, gave the clarion call for attention... I thought Russia
had finally launched a first strike...

------
zipwitch
It wasn't just emergency calls. Most Centurylink service around the US was
down intermittently yesterday.

~~~
aviv
A lot of Centurylink circuits are still down in Arizona, and if not down
completely, have significant packet loss.

~~~
reaperducer
A friend texted me that his POS systems that rely on CenturyLink in Nevada
have been down for two days. The stores can only do transactions in cash.

Makes you think twice about the cashless utopia the HN crowd imagines.

~~~
smileysteve
Or switch to a cryptocurrency that is distributed and eventually consistent.

~~~
dvtrn
I swear the swiftness with which people swoop into these kinds of threads to
suggest crypto/blockchain is bordering on becoming a meme.

If the network is fully down, I don't see how the scenario for this business
owner changes with blockchain, they'd have the exact same problem with a
register that can't close an ACH: backlogged transactions that need to catch
up eventually, meaning some card types are going to decline if the POS can't
reach a clearing house utilized by the card vendor.

~~~
smileysteve
Even if you wait for 15 confirmations, that's easily fulfilled by the others
in the market.

Similarly, for it to make it to the chain, only 1 of those 15 confirmations
has to connect to any larger network sometime in the future.

------
anticensor
Why does a congressman not step up and draft a law stipulating:

    
    
        * US telecom, an office of US federal government shall
          be founded and it shall serve all federal and state 
          governments' institutions. US telecom shall be funded
          by federal sales tax. Government institutions
          themselves shall not be billed.
        * Emergency services number shall be 911.
        * 911 shall be operated by US telecom and US telecom
          shall pay private communication providers in
          international rate every time 911 is called.
        * Private providers shall not bill
          911 calls to their subscribers.

------
JustSomeNobody
Is it me, or did TechCrunch recently ruin their mobile experience? I tried to
read the article but got so frustrated and distracted by the wonky scrolling
that I gave up. It’s just awful.

------
tybit
Is there any information available about how these systems are typically
architected and run as well as what sort of availability they generally
achieve?

~~~
Aloha
I can tell you what 911 call-taking systems look like: they are mostly call
presentation systems, with some call handling features. This issue is upstream
of any 911 center, however.

------
mirimir
context:
[https://news.ycombinator.com/item?id=18775352](https://news.ycombinator.com/item?id=18775352)

~~~
dang
Since the current post adds information, we'll leave it up instead of marking
it as a dupe, and have moved most of the non-time-sensitive comments here.

------
walrus01
Notes from their massive DWDM system outage earlier on the 27th, which
preceded the 911 outage:

    
    
        2018-12-28 13:35:00 GMT - Efforts by the Equipment Vendor and CenturyLink engineers to apply the filters and remove the secondary communication channels in the network continue. The previously provided ETR of 09:00 GMT remains.
        2018-12-28 13:27:30 GMT - The Equipment Vendor and CenturyLink engineers continue work to apply the filters and remove the secondary communication channels. Field Operations and Equipment Vendor dispatches to recover nodes locally remain underway. Services continue to restore in a steady manner as troubleshooting progresses following the recovery of nodes. CenturyLink NOC management remains in contact with the equipment vendor to obtain updates as restoration efforts continue.
        2018-12-28 11:04:24 GMT - CenturyLink continues to work with the Equipment Vendor to apply the filters and remove the secondary communication channels. Field Operations and Equipment Vendor dispatches to recover nodes locally remain underway. Client services continue to restore in a steady manner as troubleshooting progresses following the recovery of nodes.
        2018-12-28 10:05:18 GMT - CenturyLink NOC Management reports steady progression of node recovery and restoral of client services. In addition to the remote node recovery process, Field Operations continue to dispatch and assist the Equipment Vendor with local equipment login.
        2018-12-28 08:51:29 GMT - CenturyLink NOC Management has advised that repair efforts are steadily progressing, and services are incrementally restoring. The Equipment Vendor and CenturyLink engineers continue work to apply the filters and remove the secondary communication channels at this time. There have been additional restoration steps identified for certain nodes, which includes either line card resets or Field Operations dispatches for local equipment login, that have impeded the restoration process. Various repair teams are working in tandem on these actions to ensure that services are restored in the most expeditious method available. Restoration efforts are ongoing.
        2018-12-28 07:12:32 GMT - Efforts by the Equipment Vendor and CenturyLink engineers to apply the filters and remove the secondary communication channels in the network continue. Additional information on repair progress will be available from the Equipment Vendor by 07:30 GMT. Information will be relayed as soon as it is obtained.
        2018-12-28 06:00:01 GMT - Efforts by the Equipment Vendor and CenturyLink engineers to apply the filters and remove the secondary communication channels in the network continue. The previously provided ETR of 09:00 GMT remains.
        2018-12-28 04:58:44 GMT - CenturyLink engineers in conjunction with the Equipment Vendor's Tier IV Technical Support team have identified the elements causing the impact to customer services. Through the filters being applied and the removal of the secondary communication channels, it is anticipated services will be fully restored within four hours. We apologize for any inconvenience this caused our customers. Additional details regarding details of the underlying cause will be relayed as available.
        2018-12-28 04:09:31 GMT - The Equipment Vendor's Tier IV Technical Support team in conjunction with CenturyLink Tier III Technical Support continues to remotely work to remove the secondary communication channel tunnels across the network until full visibility can be restored, as well as applying the necessary polling filter to each of the reachable nodes.
        2018-12-28 02:53:38 GMT - The Transport NOC has confirmed that cooperative efforts remain ongoing to remove the secondary communication channel tunnel across the network until full visibility can be restored, as well as applying the necessary filter to each of the reachable nodes. It has been confirmed that both of these actions are being performed remotely, but an estimated time to complete the activities is not available at this time.
        2018-12-28 01:58:56 GMT - Once the card was removed in Denver, CO it was confirmed that there was no significant improvement. Additional packet captures, and logs will be pulled from the device with the card removed to further isolate the root cause. The Equipment vendor continues to work with CenturyLink Field Operations at multiple sites to remove the secondary communication channel tunnel across the network until full visibility can be restored. The equipment vendor has identified a number of additional nodes that visibility has been restored to, and their engineers are currently working to apply the necessary filter to each of the reachable nodes.
        2018-12-28 00:59:04 GMT - Following the review of the logs and packet captures, the Equipment Vendor's Tier IV Support team has identified a suspected card issue in Denver, CO. Field Operations has arrived on site and are working in cooperation with the Equipment Vendor to remove the card.
        2018-12-27 23:57:16 GMT - The Equipment Vendor is currently reviewing the logs and packet captures from devices that have been completed, while logs and packet captures continue to be pulled from additional devices. The necessary teams continue to remove a secondary communication channel tunnel across the network until visibility can be restored. All technical teams continue to diligently work to review the information obtained in an effort to isolate the root cause.
        2018-12-27 22:52:43 GMT - Multiple teams continue work to pull additional logs and packet captures on devices that have had visibility restored, which will be scrutinized during root cause analysis. The Tier IV Equipment Vendor Technical Support team in conjunction with Field Operations are working to remove a secondary communication channel tunnel across the network until visibility can be restored. The Equipment Vendor Support team has dispatched their Field Operations team to the site in Chicago, IL and has been obtaining data directly from the equipment.
        2018-12-27 21:35:55 GMT - It has been advised that visibility has been restored to both the Chicago, IL and Atlanta, GA sites. Engineering and Tier IV Equipment Vendor Technical Support are currently working to obtain additional logs from devices across multiple sites including Chicago and Atlanta to further isolate the root cause.
        2018-12-27 21:01:26 GMT - On December 27, 2018 at 02:40 GMT, CenturyLink identified a service impact in New Orleans, LA. The NOC was engaged and investigating in order to isolate the cause. Field Operations were engaged and dispatched for additional investigations. Tier IV Equipment Vendor Support was later engaged. During cooperative troubleshooting a device in San Antonio, TX was isolated from the network as it was seeming to broadcast traffic consuming capacity, which seemed to alleviate some impact. Investigations remained ongoing. Following the isolation of the San Antonio, TX device troubleshooting efforts focused on additional sites that teams were remotely unable to troubleshoot. Field Operations were dispatched to sites in Kansas City, MO, Atlanta, GA, New Orleans, LA and Chicago, IL. Tier IV Equipment Vendor Support continued to investigate the equipment logs to further assist with isolation. Once visibility was restored to the site in Kansas City, MO and New Orleans, LA a filter was applied to the equipment to further alleviate the impact observed. All of the necessary troubleshooting teams in cooperation with Tier IV Equipment Vendor Support are working to restore remote visibility to the remaining sites at this time. Tier IV Equipment Vendor Technical Support continues to review equipment logs from the sites where visibility was previously restored. We understand how important these services are to our clients and the issue has been escalated to the highest levels within CenturyLink Service Assurance Leadership.

~~~
kyrra
Sysadmins and networking subreddits were tracking this as well.

[https://www.reddit.com/r/networking/comments/a9z6tb/centuryl...](https://www.reddit.com/r/networking/comments/a9z6tb/centurylink_outage_west_coast/)

Some areas were fully out, while others (like Denver) saw partial
outages and packet loss, as exit nodes were having problems depending on the
destination.

~~~
emeraldd
That thread has some very interesting tidbits. (like this one:
[https://www.reddit.com/r/networking/comments/a9z6tb/centuryl...](https://www.reddit.com/r/networking/comments/a9z6tb/centurylink_outage_west_coast/ecoolf8/)
)

------
zw123456
My two cents... When a company lays off their most experienced engineers year
over year over year... this is what happens. Just my humble opinion of the
real root cause.

------
shostack
Given the past and recent news about ongoing infrastructure attacks the US has
been seeing, reminiscent of Ukraine, is there any possibility this is related?

------
IronWolve
Not sure how common the problem still is, but some custom ROMs for Android
phones could not use shortcodes, and so could not dial 911.

I'd recommend having the full emergency numbers programmed into your phone if
you're playing with new ROMs.

Also, federal alerts did not go out for the 911 outage, but alerts from major
cities did. So subscribe to your local alert systems.

------
StillBored
"In this case, the outage affected only cellular calls to 911, and not
landline calls."

Which means it's basically the system that determines which 911 center to
route a cell-phone-based 911 call to. That this is still centralized, rather
than distributed to the geographically local towers, is crazy.
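The distributed alternative this comment imagines could, in principle, be as simple as each tower carrying its own mapping to the local answering point. A toy sketch of that idea follows; all tower IDs and PSAP names are invented for illustration, and real selective routers are far more involved:

```python
# Toy sketch of tower-local 911 routing: each tower ships with a mapping
# to the PSAP (public-safety answering point) serving its area, so no
# central routing database becomes a single point of failure. All IDs
# and names below are made up.

TOWER_TO_PSAP = {
    "tower-denver-12": "Denver Metro PSAP",
    "tower-nola-04": "New Orleans PSAP",
}

def route_911(tower_id: str) -> str:
    """Return the PSAP serving the originating tower, with a fallback."""
    return TOWER_TO_PSAP.get(tower_id, "statewide fallback PSAP")

print(route_911("tower-denver-12"))   # Denver Metro PSAP
print(route_911("tower-unknown-99"))  # statewide fallback PSAP
```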

------
drdeadringer
After reading about this earlier today I had the amusing experience of
tweeting my local PD asking if "our" 911 was still operational given the
situation; apparently it is given their reply tweet.

I do still have the actual "direct" local phone numbers of dispatches for the
areas I most frequent.

------
clubm8
Is this outage for both mobile (cell phones) and landlines?

Does dialing the direct number (XXX-XXX-X911) work?

I've had issues on both when there was no widespread outage to the point I've
given up on the utility of 911 and try to put the local police department's
number in my phone for emergencies.

------
iheartpotatoes
From 1990, these guys were so ahead of their time (yea, I know, institutional
racism != IT infrastructure, but still...):

[https://www.youtube.com/watch?v=CPNK0VspQ0M](https://www.youtube.com/watch?v=CPNK0VspQ0M)

------
Waterluvian
Is 911 in the U.S. still that wildly disparate patchwork of networks and
operations centres?

~~~
gwright
You say that as if it is a bad thing. Decentralizing key infrastructure and
emergency response systems seems like a good idea.

~~~
wpietri
Absolutely. Especially given this widespread failure, the last thing I want is
more central control.

~~~
minikites
There needs to be some central or regional oversight body to ensure standards
are being met, equitable access is being maintained, etc. A lot of poor areas
are underserved by these patchwork networks, and several infrastructure
deficiencies can compound. Example: poorly maintained roads lead to potholes,
which lead to drivers swerving or getting otherwise surprised, leading to
injuries, which are insufficiently addressed by local healthcare. Imagine if
we had a tyrannical electric utility who cut your power for a bill paid one
day late.

~~~
gwright
> There needs to be some central or regional oversight body to ensure
> standards are being met

You mean something like your local, county, or state governments and their
agencies?

------
iask
So, now there is proof that our national emergency 911 system has a possible
vulnerability. Hope this will be addressed with some priority.

------
Schmazo
Should cell phone users be plugging in 360-693-3111 as an emergency backup in
the event 911 goes down in the future?

------
smaili
I'm quite surprised no fallback/failover was put in place for a critical tier
service like this.
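The failover commenters upthread describe is conceptually simple: probe providers in priority order and use the first healthy one. A minimal sketch, with hypothetical provider names and a stubbed health check:

```python
# Toy prioritized failover: try each provider's health check in order
# and return the first one that responds. Provider names are invented;
# a real setup would probe actual circuits or BGP sessions.

def pick_provider(providers, is_healthy):
    """Return the first healthy provider, or None if all are down."""
    for name in providers:
        if is_healthy(name):
            return name
    return None

# Example: the primary is down, so traffic fails over to the secondary.
providers = ["centurylink", "backup-carrier"]
down = {"centurylink"}
print(pick_provider(providers, lambda p: p not in down))  # backup-carrier
```

As jpollock notes upthread, the hard part isn't the selection logic but making sure the "redundant" paths don't share physical infrastructure.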

------
epynonymous
time to switch to aws

------
herostratus101
lol, The Purge (2018)

------
redleggedfrog
“When an emergency strikes, it’s critical that Americans are able to use 911
to reach those who can help,” said Pai in a statement. “The CenturyLink
service outage is therefore completely unacceptable, and its breadth and
duration are particularly troubling.”

In response CenturyLink is now offering a new service, called "911 Fastlane",
that guarantees that your 911 calls will get through and will be prioritized
ahead of calls from standard CenturyLink accounts. This service is only an
additional $10 a month.

Of course, satire.

~~~
ryanlol
Why did you feel the need to post this weird piece of NN propaganda in a
completely unrelated thread?

~~~
icedchai
It seems related to me.

~~~
ryanlol
Could you help explain why? I really don't see any connection between NN and
this story.

I understand that NN is an important issue for the BigCos employing many HN
users, but I'm not really convinced we need their talking points to be brought
up in every single vaguely network related thread.

~~~
icedchai
Ok, so it's tangentially related. Even though it was meant in satire, it seems
like something that a big telco _would_ try to do.

------
EGreg
Why is it centralized??

~~~
JeremyBanks
Put it on the blockchain.

------
reasonablemann
Strange that Apple doesn't push their own tech for this. They could include it
in Messages and release it on Android. For Android there would be a store
button where you could see all the new stuff from Apple and buy an iPhone. It
would do a service to the 50% of the world's population whose entire community
communication network is owned and mined by FB.

~~~
emeraldd
Apple doesn't do this kind of infrastructure. Their stuff only works if the
rest of the network works, which seems to be what's falling over here.

