
“An ISP in Asia is leaking routes to a Tier 1 transit provider” - nothingbutslide
https://www.cloudflarestatus.com/incidents/bzknm1t91kjq
======
amalcon
nanog thread:
[http://seclists.org/nanog/2015/Jun/586](http://seclists.org/nanog/2015/Jun/586)

Basically what's happened here is that Telecom Malaysia told one of Level3's
networks (AS3549 - Global Crossing) that it was capable of delivering traffic
to... anywhere on the Internet. Global Crossing apparently didn't have the
usual sanity checks in place.

Because of how BGP works, once GBLX decided the route seemed reasonable, it
immediately proceeded to dump huge amounts of traffic on the tiny Telecom
Malaysia. This isn't incorrect behavior on GBLX's part (accepting the routes
was incorrect, but sending the traffic was correct once they were accepted).
This is _way_ more traffic than Telecom Malaysia was prepared to handle.

Every tier 1 network gets rid of traffic as soon as possible (because it
reduces their costs), completely ignoring performance or whether a route seems
sensible. Telecom Malaysia therefore claimed to be the most attractive route from
anywhere near Malaysia to most of the world.
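
To make the "once accepted, the traffic follows" point concrete, here is a minimal Python sketch of route selection. It assumes a common policy (not confirmed for GBLX) in which routes learned from customers get a higher local preference than routes learned from peers, and local preference is compared before AS-path length. The `LOCAL_PREF` values, the `Route` class, the prefix, and AS64500/AS64501 (documentation ASNs) are made up for illustration.

```python
from dataclasses import dataclass

# Assumed local-preference values; real networks choose their own numbers.
LOCAL_PREF = {"customer": 200, "peer": 100, "provider": 50}


@dataclass(frozen=True)
class Route:
    prefix: str
    learned_from: str  # "customer", "peer", or "provider"
    as_path: tuple     # AS numbers, nearest neighbour first


def best_route(candidates):
    """Pick the highest local preference, then the shortest AS path."""
    return max(
        candidates,
        key=lambda r: (LOCAL_PREF[r.learned_from], -len(r.as_path)),
    )


# The same prefix heard twice: directly from a peer, and via the leak.
legit = Route("203.0.113.0/24", "peer", (64500,))
leaked = Route("203.0.113.0/24", "customer", (4788, 64501, 64500))

chosen = best_route([legit, leaked])
print(chosen.learned_from)  # prints "customer"
```

Under these assumptions the leaked route wins even though its AS path is longer, which is why filtering has to happen before the route is accepted.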

Level3 is one of the biggest telecoms in the world -- and in fact the biggest
by far in that region. This means that most of the Internet probably stopped
working for anyone anywhere near Malaysia.

~~~
tomjen3
Do we know that this happened by accident? Because if all it takes is a few
routing packets, that is a very attractive option for any state that wants to
attack another.

~~~
MichaelGG
Hard to say, but accidents do happen.

However, it'd be a good idea for CAs to look at any "domain validated" certs
issued during this time and re-confirm. Really, any service that does insecure
account resets or takeovers should be a bit more wary than usual.

------
peterwaller
This is rather comic:
[https://twitter.com/TMCorp/status/609167065300271104](https://twitter.com/TMCorp/status/609167065300271104)

(That's the twitter account for the currently implicated provider which messed
things up, and for the record it has a minion in a Hawaiian grass dress saying
"Happy Friday!", posted about 10 hours ago.)

~~~
nathas
[http://devopsreactions.tumblr.com/post/87284390953/friday-de...](http://devopsreactions.tumblr.com/post/87284390953/friday-deployments-and-leaving-afterwards)

------
hhw
Telecom Malaysia (AS4788) was leaking a full routing table to a Tier 1
network, Global Crossing (AS3549), who in turn was advertising the prefixes to
its peers like Level3 (AS3356). Large portions of the Internet would have been
affected.
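
A rough way to see why this counts as a leak is the "valley-free" rule of thumb: an announcement can go up through providers, cross at most one peering link, and after that only go down to customers. The Python sketch below checks a path against that rule. The relationship table is an assumption for illustration: AS64500 and AS64501 are documentation ASNs, and only the AS4788/AS3549 customer link and the AS3549/AS3356 peering come from the description above.

```python
def is_valley_free(path, rel):
    """Check an announcement's path, listed in propagation order (origin first).

    rel[(a, b)] describes the hop where a announces to b:
      "up"   - a is b's customer
      "peer" - a and b are peers
      "down" - a is b's provider
    A valley-free path is some "up" hops, at most one "peer" hop,
    then only "down" hops.
    """
    descended = False  # True once the path has crossed a peer or gone down
    for a, b in zip(path, path[1:]):
        step = rel[(a, b)]
        if descended and step != "down":
            return False
        if step in ("peer", "down"):
            descended = True
    return True


# Hypothetical relationships for illustration only.
rel = {
    (64500, 64501): "up",    # AS64500 is a customer of AS64501
    (64501, 4788): "down",   # AS64501 is assumed to be a provider of AS4788
    (4788, 3549): "up",      # AS4788 is a customer of AS3549
    (3549, 3356): "peer",    # AS3549 peers with AS3356
    (64501, 3356): "peer",   # assumed direct peering
}

leaked = [64500, 64501, 4788, 3549, 3356]
normal = [64500, 64501, 3356]
print(is_valley_free(leaked, rel), is_valley_free(normal, rel))  # prints "False True"
```

The leaked path goes down into AS4788 and then back up to AS3549, which is the "valley" that the rule forbids.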

This is a double fail, both for Telecom Malaysia for leaking a full routing
table, and for GBLX who apparently isn't filtering prefixes from their
downstream customers or even restricting to a max number of prefixes.
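
As a sketch of the two safeguards mentioned here, the following Python function applies a max-prefix limit and a per-customer prefix allow-list to a batch of announcements. It is not how any particular router implements this; the function name, the limit, and the prefixes (taken from the documentation ranges) are assumptions for illustration.

```python
import ipaddress


def accept_announcements(announced, allowed, max_prefixes):
    """Return the announced prefixes that fall inside the allow-list.

    Raises RuntimeError if the customer announces more prefixes than
    max_prefixes, mimicking a session being shut down on a limit breach.
    """
    if len(announced) > max_prefixes:
        raise RuntimeError("max-prefix limit exceeded; session would be torn down")

    allowed_nets = [ipaddress.ip_network(p) for p in allowed]
    accepted = []
    for prefix in announced:
        net = ipaddress.ip_network(prefix)
        # subnet_of() requires both networks to be the same IP version.
        if any(net.version == a.version and net.subnet_of(a) for a in allowed_nets):
            accepted.append(prefix)
    return accepted


# The customer is only registered for 192.0.2.0/24 in this made-up example.
print(accept_announcements(
    ["192.0.2.0/25", "198.51.100.0/24"],
    allowed=["192.0.2.0/24"],
    max_prefixes=10,
))  # prints ['192.0.2.0/25']
```

Either check alone would have limited the damage from a full-table leak: the limit trips on sheer volume, and the allow-list drops anything the customer is not expected to originate.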

~~~
namecast
I just remembered, GBLX was bought by L3 a few years back - I'm guessing
someone has a "tighten up route maps for GBLX ASN" ticket open in their
backlog at the Level 3 NOC.

The people managing peering for AS 3356 and AS 3549 should be the same group,
no? ("That DB we don't share the URL of" seems to imply as much.)

------
jameswyse
Looks like it's party day at Telecom Malaysia.
[https://twitter.com/TMCorp/status/609167065300271104](https://twitter.com/TMCorp/status/609167065300271104)

~~~
johansch
They added the route at 16:44 local KL time on a Friday afternoon. The network
team started partying early? :)

~~~
ars
Malaysia is an Islamic country, so Friday is the equivalent of the American
Saturday to them, i.e. Friday is the first day of the weekend, and Sunday is a
normal workday.

~~~
atomengine
I'm an American living in KL. Malaysia follows the Monday through Friday work
week. The Twitter message was posted in the AM and this event didn't occur
until the evening.

------
bbrazil
Nanog discussion:
[http://mailman.nanog.org/pipermail/nanog/2015-June/076187.ht...](http://mailman.nanog.org/pipermail/nanog/2015-June/076187.html)

------
sbarre
What is the impact of this problem? Any TL;DR from an expert on here?

~~~
bbrazil
People cannot access some internet sites, depending on who they have internet
connectivity with.

[https://blog.cloudflare.com/route-leak-incident-on-october-2...](https://blog.cloudflare.com/route-leak-incident-on-october-2-2014/)
has more information from a previous time this happened. A
key idea is that internet routing depends a lot on trust, and it's possible
for a single misconfigured site to cause serious issues across the internet.

~~~
knodi123
I thought the point of the internet was that it was fault tolerant enough to
be deployed on the battlefield?

~~~
voidlogic
This wasn't battle damage, this was operator error. Building a tank to survive
a war and building it to work when driven in a ditch are not quite the same.

------
_jomo
Apparently this is AS4788 (Telekom Malaysia) leaking routes.

[https://twitter.com/rrbone_net/status/609282081420652544](https://twitter.com/rrbone_net/status/609282081420652544)

------
sumedh
I was able to open reddit but I was not able to open wikipedia from Australia.
So I did a tracert.

6 29 ms syd-sot-ken-int2-be-20.tpgi.com.au [203.219.35.68]

7 197 ms 10ge3-7.core1.sjc1.he.net [72.52.66.21]

8 199 ms 10ge1-4.core1.pao1.he.net [72.52.92.113]

9 * Request timed out.

~~~
ryan-c
Looks like it made it to California (San Jose and Palo Alto) with what appears
to be a reasonable round trip time. Weird.

Edit: It seems like that must have come through the Southern Cross Cable[0]
which is nowhere near Malaysia.

0\.
[https://en.wikipedia.org/wiki/Southern_Cross_Cable](https://en.wikipedia.org/wiki/Southern_Cross_Cable)

------
adamlj
I can't access a couple of US sites (like github, linkedin etc.) from my
Swedish ISP Bahnhof.

~~~
radiospiel
Had the same issue from Berlin. (and probably unrelated to the original post
anyways.) But now we are back online!

------
deepnet
Can confirm UK ISP was unable to reach arxiv, imgur, mit or nytimes.

Ycombinator and reddit were A OK though.

~~~
pyvpx
That is because Cloudflare has direct peering with many access networks. Most
providers see Cloudflare routes directly and not through a transit provider
such as Level3.

------
ckvamme
Here is an interactive snapshot of this outage:

Capital One uses Level 3 GBLX as a primary ISP:
[https://oxqyi.share.thousandeyes.com](https://oxqyi.share.thousandeyes.com)

(Jump to BGP route visualization and you can see AS 3549)

On the routing plane, you can see that the London GBLX monitor had issues
reaching many services and that AS4788 Malaysia Telecom was advertised in the
route.

Here is an example from a LinkedIn snapshot:
[https://wbpkq.share.thousandeyes.com](https://wbpkq.share.thousandeyes.com)

------
peterwaller
Could someone elaborate on the scope of the problem? Is it sensible to ask how
many routes were leaked, and how big the prefixes of the leaked networks were?

~~~
elktea
Full table was leaked

------
h43k3r
I am very much interested in learning about such issues. Are there any links
or blogs that shed light on them?

~~~
georgerobinson
There is a great chapter on BGP by Hari Balakrishnan. I believe the chapter is
available on MIT Open Courseware. It should be under the Computer Networks
class.

Edit: found it. See Lecture 4:
[http://ocw.mit.edu/courses/electrical-engineering-and-comput...](http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-829-computer-networks-fall-2002/lecture-notes/)

------
piyushpr134
Did this happen last night too (about 15 hours from when this post was made)?
I am in India and was not able to reach my servers in Singapore or do a git
push. All other sites worked okay. I scratched my head and changed my DNS to
8.8.8.8. It still did not work.

~~~
davidgerard
It's not DNS - it's how the actual packets are supposed to get between you and
the server, that got messed up.
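
One way to tell the two cases apart from a client is to separate the name lookup from the connection attempt. The Python sketch below does that with the standard `socket` module. The `diagnose` function and its return strings are made up for illustration, and a failed connection only suggests a reachability problem: it cannot tell a routing issue apart from a firewall or a server that is down.

```python
import socket


def diagnose(host, port=443, resolve=socket.getaddrinfo,
             connect=socket.create_connection, timeout=5):
    """Report whether a failure happens at name lookup or at connection time.

    resolve and connect are injectable so the logic can be exercised
    without touching the network.
    """
    try:
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns-failure"

    # Each getaddrinfo entry ends with a sockaddr whose first item is the IP.
    address = infos[0][4][0]
    try:
        conn = connect((address, port), timeout=timeout)
    except OSError:
        return "resolved-but-unreachable"
    conn.close()
    return "ok"
```

Calling `diagnose("example.com")` during an incident like this one would most likely report `"resolved-but-unreachable"`, which is why switching resolvers does not help.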

------
gordon_gecco
[https://www.tm.com.my/OnlineHelp/Announcement/Pages/INTERNET...](https://www.tm.com.my/OnlineHelp/Announcement/Pages/INTERNET-SERVICES-DISRUPTION-12-June-2015.aspx)

------
lucb1e
I see only this:

> Your IP address 5.79.68.161 has been flagged as a scanner. Scanners are not
> permitted. If you are seeing this message in error, please contact
> security@statuspage.io.

Guess I better send an e-mail to see this status page..?

~~~
decasteve
After I switched Tor circuits it worked fine for me.

------
prusswan
Just want to ask if there's anything a home user could do to detect unusual
routing behavior. I'm envisioning something like a browser plugin that
logs/monitors outgoing connections and traceroute data.

With that, instead of just looking at 404s, we could make more informative
observations, like this request got stuck at this node, or that request was
routed through a new node which has not been seen recently.
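
The "never seen this node before" part could be sketched in a few lines of Python. This assumes the hop addresses have already been collected by some traceroute tool; the `new_hops` function and the in-memory `history` dict are hypothetical, and a real tool would persist the history and expire old entries.

```python
def new_hops(history, destination, hops):
    """Return hops not seen before for this destination, then record them.

    history maps a destination name to the set of hop addresses
    observed on earlier traces.
    """
    seen = history.setdefault(destination, set())
    unseen = [hop for hop in hops if hop not in seen]
    seen.update(hops)
    return unseen


history = {}

# First trace: every hop is new, so this only builds the baseline.
new_hops(history, "wikipedia.org", ["203.219.35.68", "72.52.66.21"])

# A later trace that takes a detour through an unfamiliar address.
print(new_hops(history, "wikipedia.org", ["203.219.35.68", "192.0.2.1"]))
# prints ['192.0.2.1']
```

Paths change for ordinary reasons all the time, so output like this would be a hint to look closer rather than proof of a leak.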

------
ryanlol
Fuckups like this should result in criminal charges (and immediate depeering).
DoS attacks are illegal in most countries and this is definitely gross
negligence.

~~~
AndrewDMcG
It's actually interesting why this idea is so totally wrong.

So Telecom Malaysia messed up a config, and Global Crossing accepted their
updates automatically.

Global Crossing didn't have to accept the bad update. They generally trust
updates from other organisations that are generally trustworthy. They apply
checks and restrictions proportionate to the risks involved.

These mistakes happen rarely. If they were to happen more often, major
operators would apply more checks and restrictions. If they were to stop
happening, operators would apply fewer checks and restrictions, because they
have a cost in manpower, complexity, and loss of flexibility.

That's how the internet works. You could almost say that's what the internet
is--the idea of being actively managed by people who know what they're doing
and are not bound by exhaustive predefined policies is defining of how the
internet came about and how it came to be dominant.

If you want a network guaranteed to be resistant to this kind of f---up, build
one. The internet is that network which does not work that way, which is
flexible, expandable, mostly "good enough" but not ever designed for absolute
reliability.

~~~
tomjen3
>It's actually interesting why this idea is so totally wrong.

No it is not. You accept their claims of "mistakes". I see no evidence of that.
How, exactly, do you leak a full table by accident? And this is too big a
security hole to leave to hackers. Leak a table, shut down an entire
country.

~~~
ryanlol
Even if it was a mistake, that doesn't suddenly make it OK. People make
mistakes, yeah. But this isn't a simple mistake; in fact this incident consists
of multiple mistakes.

1) Someone wrote an incorrect config

2) They did not test it

3) They pushed it to production systems without testing it

4) They did not monitor their systems after pushing new configs

5) They took ages to fix the problem after it was detected.

That definitely isn't a single mistake.

~~~
laumars
And how would you propose they test it?

It's a little difficult to test this kind of config without emulating the
entire internet - which is quite clearly beyond the scope of all bar a very
small number of organisations.

