
BGP leaks causing internet outages in Japan and beyond - zakki
https://bgpmon.net/bgp-leak-causing-internet-outages-in-japan-and-beyond/
======
notyourday
That's because, for all the talk about Google, its network engineering drives
networks like a buzzed 19-year-old drives his father's Porsche.

Google does not believe in BGP filtering. They just don't. When someone brings
up a BGP peering with Google, that someone can announce _any_ prefixes to
Google without first registering them. When asked "Huh? How do you ensure that
I do not announce someone else's address space to you?", Google's response is
something akin to "We are Google, we have a very complicated system that
prevents that from happening. It will detect the issue and address it
automatically. We will build your filter lists based on those announcements."
At the same time, the same people say that prefixes advertised to Google over
PNIs take _hours_ to propagate across the entire Google network.

Filtering BGP prefixes down to the address space registered to the peer is
basic hygiene, something that Google simply does not believe it has to do.

~~~
fach
Based on the article, you should probably s/Google/Verizon here. Yes, Google's
export policy was misconfigured towards Verizon, but Verizon blindly accepted
and propagated these prefixes.

~~~
notyourday
Since Google does not register or filter the routes it receives, it is
impossible for VZ or anyone else to know which routes Google legitimately
advertises vs. which should be filtered.

Look, we have been through this before, in 1994, 1995, 1996, 1997, etc.

Sprint (1239) used to filter based on AS_PATH. They stopped after the FLIX
incident.

~~~
fach
They don't?: [https://pastebin.com/5ngq1cJi](https://pastebin.com/5ngq1cJi)

~~~
notyourday
Right now Google announces slightly more than 440 IPv4 routes over PNIs.

The list in radb is:

whois -h whois.radb.net '!oMAINT-AS15169' | grep ^route | awk '{print $2}' | grep -v "::" | wc -l

6180

I picked four prefixes. 3 were in RADB. 1 was not.

~~~
fach
Doing a 1:1 comparison between what you're currently receiving on import and
RADB isn't completely fair, given that most folks who preallocate address
blocks will register route objects long before they are actually used.

What is preventing you from crafting your import policies based on this data?
Google is clearly creating route objects for most of their prefixes, and
rejecting a handful of prefixes vs. the risk of accepting, at worst, a full
table seems like a reasonable tradeoff. This is something Verizon could have
done, and something other folks like Level3 have done for some time.
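The import policy being described here can be sketched in a few lines: build an allowlist from IRR route objects, then accept an announcement only if it sits inside a registered block. A minimal Python illustration (the prefixes are documentation ranges, not Google's actual route objects):

```python
import ipaddress

def build_allowlist(route_objects):
    """Collect IRR-style 'route:' lines into a list of allowed networks."""
    allowed = []
    for line in route_objects.splitlines():
        if line.startswith("route:"):
            allowed.append(ipaddress.ip_network(line.split()[1]))
    return allowed

def accept(prefix, allowed):
    """Accept an announcement only if it falls within a registered route object."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(registered) for registered in allowed)

# Documentation prefixes standing in for real route objects:
radb_dump = "route: 203.0.113.0/24\nroute: 198.51.100.0/24\n"
allowed = build_allowlist(radb_dump)

print(accept("203.0.113.0/25", allowed))  # more-specific of a registered block: True
print(accept("192.0.2.0/24", allowed))    # unregistered: False
```

Note that accepting more-specifics of registered blocks (as this sketch does) is itself a policy choice; some operators require an exact match.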

~~~
notyourday
1) It is missing routes that they are advertising.

2) When asked "Are you using RADB/Altdb entries to filter routes/should we use
those?", we were told "No".

If Google practiced that basic hygiene, it would not be announcing routes for
networks it does not provide transit to.

~~~
fach
There's an important aspect of BGP you're overlooking: mutual acceptance. If
one party exports a prefix, the other party can choose to either reject or
accept it. If the former party does not advertise the prefix, or the latter
party does not accept it, no forwarding path is established in that direction.
Yes, Google could have derived their export policy from their RADB entries,
which would have prevented this issue. But Verizon could also have derived
their import policy from Google's RADB entries, which would have prevented it
just as well. While Google is to blame for fucking up their export policy,
Verizon is to blame for blindly accepting these prefixes.

~~~
notyourday
This is 2017. We already had this debate in 1994.

We also had this debate when smd proxy-aggregated routes because a certain
network was announcing every /24 instead of /12s, causing certain routers to
run out of memory (I'm pretty sure those were AGS+). It became known as "you
will aggregate, or I will aggregate it for you and you won't like it". While
it was done for just a few hours, the consequences were rather unforeseen.

Right around that time it was determined that no one outside an AS knows why
that AS chooses to announce routes in a specific way, and those outside it
were better off not trying to be "smart" about it. That was also around the
time it was decided that one simply registered everything _correctly_,
announced only what was registered, and announced it the way it was registered.
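For reference, the registration in question is just a small RPSL record in an IRR database; a minimal sketch with illustrative values (not anyone's real registration):

```
route:      203.0.113.0/24
descr:      Example network block
origin:     AS64500
mnt-by:     MAINT-EXAMPLE
source:     RADB
```

The `origin` attribute is what lets a peer mechanically answer "is this AS allowed to announce this prefix?" when building filters.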

------
justinjlynn
Whatever happened to network ops implementing RFC7454/BCP38?

> Google is not a transit provider and traffic for 3rd party networks should
> never go through the Google network.

Why would you _ever_ purposely configure your router to transit traffic via
them?

~~~
toast0
Google runs a couple of different ASNs, so you might need to allow one of them
as transit for the others. BGP is notoriously easy to configure too openly,
but prefix limits might help; OTOH, dropping sessions to Google is going to
cause a lot of headaches, so extra caution is required.

~~~
justinjlynn
That's certainly true. Generally speaking, though, it's up to one's own
network to know which ASes Google is routing for, and to filter ranges not
owned by any of those ASes. At worst you should route out via your known
transit networks and (for the time Google is being stupid) simply get
non-optimal routing. Ideally your PNI agreement would include cost recovery
for bad acts like advertising non-owned space causing session drops and thus
making your transit bill go up (but it probably won't). Google being stupid
(or any PNI, non-transit peer being stupid) should never cause your network to
drop customer packets on the floor due to completely invalid routing.

~~~
notyourday
They can't. Google refuses to register its routes!

~~~
justinjlynn
If true, WTF Google...

~~~
trapperkeeper74
That's the point. Google is trying to be special instead of playing nicely
with the other kids.

~~~
justinjlynn
Hmm, looking into radb via the following:

whois -h whois.radb.net '!iAS-GOOGLE,1' | head -n2 | tail -n1 | sed 's/ /\n/g' | xargs -n1 -i{} whois -h whois.radb.net '!oMAINT-{}' | tee google-radb

cat google-radb | grep '^route:' | grep -v '::' | sort | uniq | awk '{print $2}' | wc -l

yields 7762 announced unique IPv4 routes, at least 10% of which (100% of
checked routes) have a suitable reverse route query listed in radb via:

cat google-radb | grep '^route:' | grep -v '::' | awk '{print $2}' | shuf | head -n770 | xargs -n1 -i{} whois -h whois.radb.net '!r{},o' | tee google-radb.routesample.lookup | grep '^C' | wc -l

What are they announcing that's missing from information gathered there?

Edit: Hrmn... looking at
[http://thyme.rand.apnic.net](http://thyme.rand.apnic.net) data and comparing
it with what Google has registered in radb (among their various AS
numbers)... they do indeed have a number of advertisements that are
unregistered, or at least more specific than their registration (no exact
match).

Results of the comparison at
[https://pastebin.com/raw/P9KMG0ri](https://pastebin.com/raw/P9KMG0ri)

~~~
packetslave
Nice analysis. Passed along to the folks working on the (internal) postmortem
for this outage.

~~~
justinjlynn
Since I've had a moment to clean this up, here's a more accurate comparison
using grepcidr (an amazing tool for this -- aggregation is an awful hack and
actually adds more noise than it takes away; sorry to your postmortem team)
and some analysis/histograms -- this is much clearer:

[https://pastebin.com/raw/96sXXEe1](https://pastebin.com/raw/96sXXEe1)

~~~
justinjlynn
You may want to change the grep on the raw table data to grep -wF '{}' ... the
other form might miss a Google AS in the middle of a multi-hop route.

------
redm
A really interesting read. I agree BGP leaks are a serious risk to stability,
but that could be said about any glitches affecting major backbones like NTT
and Google (not a backbone, but so well connected...). BGP routing issues have
happened numerous times and will likely continue to. Last year Telia, too, had
4 or 5 "glitches" as they upgraded their network. [1] I talked to them about
it, and the mitigation is always the same: be more careful, peer review,
additional filters, etc.

Since each ISP implements BGP/routing tables/topology in their own way, I'm
not sure what you would do about this, other than choosing your peers
carefully and filtering any crazy route changes.

[1]
[http://www.theregister.co.uk/2016/06/20/telia_engineer_blame...](http://www.theregister.co.uk/2016/06/20/telia_engineer_blamed_massive_net_outage/)

~~~
scurvy
The NTT in the article isn't the NTT America that you're thinking of. NTT
America is the "backbone" company formed from the Verio purchase, then left to
do their own thing largely outside the control of NTT Japan.

------
imdsm
What is BGP?

~~~
geofft
Let's say you have three ISPs, Red, Green, and Blue. Red and Green have a
cable connecting them somewhere, and Green and Blue have a cable connecting
them somewhere. Each of them sends notices using BGP to the others saying,
"Hi, this is my subnet; if you want to route to that subnet you should send
packets to me." But Red and Blue also want to communicate with each other, so
Green will _also_ send a message to Red saying "Hey, I know how to reach
Blue's subnet, I can indirectly route packets there too," and to Blue saying
the same thing about Red's subnet.

Now if Red and Blue get a cable connecting the two of them, they'll start
speaking BGP to each other. Green will continue advertising the indirect path,
but Red's routers and Blue's routers will see that they have a more direct
path to each other, and not go through Green. So while network engineers need
to tell their routers about their direct connections, BGP will automatically
help distant ISPs figure out the best indirect paths.
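The "more direct path" preference above can be sketched as a toy path-selection function. This is illustrative only: real BGP best-path selection has many more tie-breakers, but shortest AS path is one of the core rules:

```python
# Toy model of one BGP best-path rule: among candidate AS paths to the same
# prefix, prefer the shortest. (Real BGP has many more tie-breakers.)

def best_path(candidates):
    """candidates: a list of AS paths, each a tuple of AS names."""
    return min(candidates, key=len)

# Before the Red-Blue cable exists, Red's only path to Blue is through Green:
print(best_path([("Green", "Blue")]))             # ('Green', 'Blue')

# Once Red and Blue peer directly, the one-hop path wins:
print(best_path([("Green", "Blue"), ("Blue",)]))  # ('Blue',)
```

This is also why a leak is so disruptive: a leaked route with a shorter-looking AS path can beat the legitimate one everywhere it propagates.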

It looks like what happened is that Google, which is _not_ an ISP but peers
directly with a bunch of ISPs (so they get better performance than being
behind a single ISP, as most smaller companies do), started advertising routes
that looked more efficient than the actual routes between various ISPs.
Technically, those packets _can_ flow over Google, although Google doesn't
have the capacity to route traffic on behalf of the internet at large (it's
not an ISP), and probably has routing rules configured to not actually accept
packets that aren't destined for Google sites.

So two things happened. The first is that internet connectivity was disrupted.
The second is that we got to see, a bit, what Google's peering relationships
look like, because of the routes that Google advertised.

~~~
exikyut
Thanks for that explanation. I'm not 100% on BGP yet, although I do want to
learn more about it.

It sounds like Google's SDN (software-defined networking) stack had a glitch
then. I read an overview of that a while ago, I think I got it from HN.
[https://www.nextplatform.com/2017/07/17/google-wants-rewire-internet/](https://www.nextplatform.com/2017/07/17/google-wants-rewire-internet/)

On a side note, I'm aware that BGP and similar "super-large-scale"
infrastructure can be examined using publicly-accessible resources, but the
sites out there are mostly geared toward people doing lookups for specific
info, not people who want to learn. That's completely understandable, but as
someone who learns more easily if something is tangible and "tinkerable", it's
difficult for me to look at these sites and make the effort to [get oriented
and then] piece everything together. How can I go "oooh, I get it" looking at
the data from these sites?

I'm also vaguely aware you can take the TCP/IP stacks on Linux (and probably
Windows) completely to bits and put everything back together so the OS speaks
BGP over Ethernet (or something equivalent) instead. It would be kind of cool
if I could do that and then actually _use_ the connection, with BGP setup as a
replacement for standard IP addresses... if I could do that I'd actually leave
everything configured that way, and thus be able to tinker with it. Sure,
people don't generally connect their systems together via BGP - but BGP is
used to transfer data (in the sense of "link-layer protocol"), right? I'm
going to learn _something_.

~~~
geofft
BGP runs over TCP/IP, as a normal TCP service (port 179): it just tells you
how to route packets. On 99% of machines you'll run into (your laptop, your
servers at work, cloud VMs, etc.), they have exactly one network connection to
one network provider, and so the routes are static, or at best provided by
DHCP. For instance, at home my laptop knows that everything in 192.168.0.* can
go directly over wifi, and everything else can be sent to 192.168.0.1. My
router knows that everything should be sent to the ISP. And so forth.

BGP is for network devices that have multiple connections to the Internet and
need to decide which connection to use. They'll set up static IP routing to
their immediate neighbors, make that TCP connection between their BGP daemons,
and then let their BGP daemons configure the rest of the routing table.
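The routing-table behavior described above, whether the entries come from DHCP, static config, or a BGP daemon, boils down to longest-prefix match. A minimal Python sketch (interface names and addresses are illustrative, mirroring the laptop example):

```python
import ipaddress

# A two-entry routing table: 192.168.0.* is reachable directly over wifi,
# everything else goes to the default gateway.
routes = {
    ipaddress.ip_network("192.168.0.0/24"): "wlan0 (direct)",
    ipaddress.ip_network("0.0.0.0/0"): "via 192.168.0.1",
}

def lookup(dest):
    """Return the next hop for dest using longest-prefix match."""
    addr = ipaddress.ip_address(dest)
    matches = [net for net in routes if addr in net]
    return routes[max(matches, key=lambda net: net.prefixlen)]  # most specific wins

print(lookup("192.168.0.42"))  # wlan0 (direct)
print(lookup("8.8.8.8"))       # via 192.168.0.1
```

BGP's job is simply to keep populating and repopulating a (much larger) table like `routes` on boxes with more than one way out.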

Most of the time these devices are backbone routers run by ISPs, but you can
totally run BGP on your random Linux server. You'll need someone to peer with,
though. I've used Quagga
[http://www.nongnu.org/quagga/](http://www.nongnu.org/quagga/) (although with
OSPF, another routing protocol optimized for smaller-scale use) for
implementing two separate web servers sharing the same IP address - that is,
two servers claiming "Hey, I know how to reach this address, you can route
through me," such that the service would stay up if either web server died -
and it was mostly just `apt-get install quagga`, a small bit of config, and
lots of help from the networking team doing some config on our intranet's
routers to peer with my two servers and trust them.
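For flavor, the OSPF side of such a setup in Quagga is only a few lines; a hypothetical ospfd.conf sketch (all addresses made up; the shared service IP would live on a loopback alias on both servers):

```
! Sketch of ospfd.conf on one of the two web servers.
! 10.0.0.0/24 is the intranet segment; 192.0.2.1/32 is the shared service IP.
router ospf
 ospf router-id 10.0.0.11
 network 10.0.0.0/24 area 0.0.0.0
 network 192.0.2.1/32 area 0.0.0.0
```

Each server advertises the same /32; when one dies, its OSPF adjacency drops and the intranet routers converge onto the survivor.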

~~~
atmosx
Sounds a lot like what FreeBSD CARP[1] is supposed to do.

[1]
[https://www.freebsd.org/doc/handbook/carp.html](https://www.freebsd.org/doc/handbook/carp.html)

~~~
gerdesj
Not quite.

CARP (VRRP in Ciscoese) has two (or more) systems capable of presenting the
same IP address, and they agree amongst themselves which gets to be the master
through a multicast-based advertising system. Generally the systems will be
near clones of each other and capable of doing the full job themselves. I have
a pair of rather large pfSense systems running CARP as my office routers. I
can update the secondary, test it, and fail over the CARP addresses, and fifty
IPSEC tunnels, five internet connections, 10 OpenVPN servers along with rather
a lot of clients, five OpenVPN clients, 30-odd internal VLANs, and all the
states for the above, along with many inbound and outbound NAT sessions, will
seamlessly switch over thanks to PFSYNC. Even voice calls don't drop - you
might notice a sub-second wobble if someone is talking at the wrong moment.
Rinse/repeat for the other one.

BGP, RIPx, OSPF and co. are routing protocols that deal with the route your
connections take through a network.

There are also things like LACP/LAG for layer two redundancy. You can even
play games with DNS round robin and of course reverse proxies.

CARP is one tool in the box for coping with failures and maintenance - there
are lots of others, each with advantages and disadvantages. That's what makes
the job interesting 8)

------
thomas_howland
Every time I see basic internet infrastructure experiencing issues, my default
is to think about who is testing a new cyberweapon or censorship mechanism.
BGP has been known as trust-reliant for a while.

~~~
simondedalus
(: these aren't the droids you're looking for. taking down the internet (or
specific pockets of it) via BGP is not a question of method, it's a question
of access. BGP hijacking can be easily done by computer science grad
students... state actors would not need to test a "cyberweapon" that messes
with BGP routes.

edit: the point being, _how_ to do it is not an issue. they're not in the
position to do it. ...but if you were thinking of censorship or outright
disruptive terrorism via BGP, you'd be looking for infiltrating network
operator jobs, not developing an attack. the attacks themselves are trivial,
well-documented, and often happen accidentally.

edit2: "sorry we broke the internet"
[http://seclists.org/nanog/1997/Apr/444](http://seclists.org/nanog/1997/Apr/444)

~~~
notyourday
Internet Truth #33, as told by Mr. V: "If the interstate highway system ran
the way we run the internet, then in the middle of rush hour a giant sinkhole
would open in the middle of I-495, swallow twenty thousand cars without a
trace, the sinkhole would close, and we would still call it a good day."

------
blinkingled
> necessity to have filters on both sides of an EBGP session. In this case it
> appears Verizon had little or no filters, and accepted most if not all BGP
> announcements from Google which lead to widespread service disruptions. At
> the minimum Verizon should probably have a maximum-prefix limit on their
> side and perhaps some as-path filters which would have prevented the wide
> spread impact.

Wow, that's just stupid on Verizon's part given the magnitude of the potential
impact.
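For reference, the maximum-prefix safeguard the article mentions is a one-line knob on most platforms; a hypothetical IOS-style sketch (ASNs, address, and limit are illustrative):

```
! Cap the number of prefixes accepted from this peer; the session is torn
! down if the limit is exceeded, with a warning logged at 90% of the limit.
router bgp 64500
 neighbor 192.0.2.1 remote-as 15169
 neighbor 192.0.2.1 maximum-prefix 1000 90
```

A non-transit peer normally announces a fairly stable number of prefixes, so a limit set comfortably above that would have shut the session down instead of accepting a leaked chunk of a full table.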

~~~
akvadrako
It's not that stupid; it's an intentional tradeoff. It's also quite common,
especially with a customer like Google.

~~~
blinkingled
Everybody (including Google as we find) makes mistakes and not having
safeguards doesn't sound like a good trade-off to me. (What are they trading
off here again?)

~~~
rnxrx
Operational complexity. For a carrier the size of Verizon to build an
authoritative list of prefixes to accept, they would have to have a
registration process for every single customer, peer, etc. that was both
reliable and up to the minute. That list would have to be compiled and pushed
on a near-constant basis to many tens of thousands of routers and potentially
hundreds of thousands of BGP sessions.

It's not that this isn't doable, but it touches far more process in far more
places than might be obvious and potentially requires additional
capacity/capability on a lot of hardware. For a giant network this all means a
great deal of time and money. Should they do it? Absolutely. Is it cheap/easy?
Absolutely not. More cynically, does the potential cost and disruption of
doing the right thing map favorably against their view of the risk of NOT
doing it? Unknown.

~~~
jlgaddis
Fortunately, this problem was solved ages ago by central routing registries. I
register my prefixes and have a single object that upstreams and peers can use
to build their filters for BGP sessions with me. Why can't Google do the same?
("But they're soooo big" is not an acceptable answer.)

~~~
rnxrx
It was "solved" in some general sense but was never implemented widely or
consistently. It was traditionally a really clunky and fragile system (e-mail
forms) that ended up being really limited in its utility because it wasn't
widely used.

That said, the point isn't that this is a technically difficult problem (it
isn't) but rather one of human scale: altering the behavior of tens of
thousands of more-or-less autonomous networks spanning the globe is non-
trivial.

------
fundabulousrIII
When I was doing routing and switching, Cisco was still the undisputed king
and I never had to do anything more than IGP routing. Still, at the time,
using two protocol stacks (IPX/SPX and TCP/IP) and redistribution into IS-IS
using NLSP and RIP was painful. I'd still rather do that than deal with BGP.

