
BGP 768K day, and whether it will cause internet outages - fraqed
https://blog.thousandeyes.com/what-is-768k-day/
======
tyingq
Here's the growth over time: [http://www.cidr-report.org/cgi-
bin/plota?file=%2fvar%2fdata%...](http://www.cidr-report.org/cgi-
bin/plota?file=%2fvar%2fdata%2fbgp%2fas2.0%2fbgp-
active%2etxt&descr=Active%20BGP%20entries%20%28FIB%29&ylabel=Active%20BGP%20entries%20%28FIB%29&with=step)

And here's the past week. I suspect the big dip is where things actually
broke...a little higher than 768k: [http://www.cidr-report.org/cgi-
bin/plota?file=%2Fvar%2Fdata%...](http://www.cidr-report.org/cgi-
bin/plota?file=%2Fvar%2Fdata%2Fbgp%2Fas2.0%2Fbgp-
active.txt&descr=Active+BGP+entries+%28FIB%29&ylabel=Active+BGP+entries+%28FIB%29&range=Week&StartDate=&EndDate=&yrange=Auto&ymin=&ymax=&Width=1&Height=1&with=Step&color=auto&logscale=linear)

I believe there's another hardcoded hurdle at 1M IPv4 routes on some
existing routers, like the ASR1001.

Guess IPv6 adoption isn't slowing IPv4 growth much.

~~~
zamadatix
Right now IPv6 adoption is only supplementing v4 deployments. Even in the rare
"IPv6 only" deployment there is a NAT64 gateway at the internet edge, and at
least some v4 is still being advertised to the world. When this becomes
commonplace, the v4 table will either shrink or at least stop growing, and
it'll pave the way to finally turning off v4 interop completely, which is when
the table will finally start to empty. This is still many years out.

~~~
telesilla
My ISP puts the local router on IPv6 by default (resolved with a phone call
fortunately), which I imagine is tunnelled through a number of IPv4 gateways
as port forwarding doesn't work. It's the first time I've seen this.

~~~
zamadatix
Port forwarding on v6 or v4? You shouldn't be port forwarding on v6, but if it
was v4 it's likely they are using CG-NAT, which basically results in double
NAT and breaks such things. NAT64 would break this as well, but I've only ever
seen it in corporate networks or mobile networks.

------
icedchai
I remember setting up BGP for a company, back in 1998. We had 2 T1s for about
5000 employees. There were about 50K routes total (maybe fewer?). How times
have changed...

------
walrus01
The number of people running sup720-3bxl or similar on the Cisco 6500/7600,
with total FIB capacity of 1 million, is still way too high. Way too many of
those things out there taking a full table. We ran into this with people who
had not adjusted the balance between RAM usage on IPv4 vs IPv6 when the global
routing table hit 512k distinct v4 routes, causing many people's 6500/7600s to
lock up.

You might say "okay, but the v6 table is not really big right now, so adjust
the balance to 900k v4 and 100k v6 routes". But in reality on these ancient
platforms each v6 route takes up a great deal more RAM than a v4 route.
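
As a rough sketch of why that split does not fit, assume a shared pool of 1 million FIB slots (the figure above) and a hypothetical cost of 2 slots per v6 route. The real per-route cost depends on the platform and is not stated here, so `V6_SLOT_COST` is only a placeholder:

```python
# Illustrative only: a shared FIB pool where a v6 route costs more slots
# than a v4 route. V6_SLOT_COST is an assumed placeholder, not a measured value.
TOTAL_FIB_SLOTS = 1_000_000
V6_SLOT_COST = 2  # hypothetical; the actual ratio is platform-specific


def fib_slots_used(v4_routes: int, v6_routes: int, v6_cost: int = V6_SLOT_COST) -> int:
    """Slots consumed when each v4 route takes 1 slot and each v6 route takes v6_cost."""
    return v4_routes + v6_routes * v6_cost


def fits(v4_routes: int, v6_routes: int, total: int = TOTAL_FIB_SLOTS) -> bool:
    """True if the requested split fits in the shared pool."""
    return fib_slots_used(v4_routes, v6_routes) <= total


def max_v4_routes(v6_routes: int, total: int = TOTAL_FIB_SLOTS,
                  v6_cost: int = V6_SLOT_COST) -> int:
    """Largest v4 allocation left over after reserving room for v6_routes."""
    return max(0, total - v6_routes * v6_cost)


if __name__ == "__main__":
    print(fits(900_000, 100_000))    # the "900k v4 / 100k v6" split
    print(max_v4_routes(100_000))    # v4 room left under the assumed cost
```

Under that assumed cost the 900k/100k split would need 1.1 million slots, so it would not fit in a 1 million slot pool.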

If you have a router with 1 million FIB capacity, the time to replace it was
five years ago. If you still have one running _now_, it's time to hit the
panic button and replace it urgently with something like a Juniper MX80,
MX104, MX204, etc.

------
zakk
> Many ISP and other organizations had provisioned the size of the memory for
> their router TCAMs for a limit of 512K route entries, and some older routers
> suffered memory overflows that caused their CPUs to crash.

> Engineers and network administrators scrambled to apply emergency firmware
> patches to set it to a new upper limit. In many cases, that upper limit was
> 768k entries.

Is there some technical reason for the emergency patch not to have increased
the limit to a much higher and future-proof threshold?

In 2014 it shouldn't have been hard to predict that the new 768k limit would
be hit in just a few years.

~~~
mitchs
There are trade-offs in the decision. TCAMs generically are rows of
"key," "mask" and "result." A bank of TCAM can, at great power expense, answer
"which is the result from the first row where my input ANDed with the row's
mask equals the row's key?" Some chips let you swap banks between different
functions. Internet carriers tend to need a lot of L3 route lookups.
Enterprise customers tend to want more access control list entries. From the
outside, it looks like the initial firmware aimed to be a middle ground, and
carriers needed space shifted around. However, at every nice round power of
two, some hardware is going to hit limits on what can be done, even if the
only priority is L3 routing. Often there are fixed-function banks, and even if
they were all flexible, at least some ACL features are needed on most (if not
all) routers.
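
The key/mask/result lookup described above can be sketched in software. This is only a model of the matching rule: real TCAM compares every row in parallel in hardware, while this loop walks the rows in order. The names `TcamRow`, `build_table`, and `tcam_lookup` are made up for illustration, and sorting rows by prefix length (so that "first match" behaves like longest-prefix match) is an assumption about how a route table would be laid out, not something stated in the comment:

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class TcamRow:
    key: int      # network address as an integer
    mask: int     # netmask as an integer
    result: str   # e.g. a next hop or an ACL action


def build_table(routes: Iterable[Tuple[str, str]]) -> List[TcamRow]:
    """Turn (prefix, result) pairs into rows, longest prefixes first."""
    rows = []
    for prefix, result in routes:
        net = ipaddress.ip_network(prefix)
        rows.append(TcamRow(int(net.network_address), int(net.netmask), result))
    # Assumed layout: more specific prefixes sit earlier, so first match wins.
    return sorted(rows, key=lambda r: bin(r.mask).count("1"), reverse=True)


def tcam_lookup(rows: List[TcamRow], address: str) -> Optional[str]:
    """Return the result of the first row where (input AND mask) == key."""
    value = int(ipaddress.ip_address(address))
    for row in rows:
        if (value & row.mask) == row.key:
            return row.result
    return None


if __name__ == "__main__":
    table = build_table([
        ("0.0.0.0/0", "default"),
        ("10.0.0.0/8", "core"),
        ("10.1.0.0/16", "edge"),
    ])
    for addr in ("10.1.2.3", "10.2.0.1", "8.8.8.8"):
        print(addr, "->", tcam_lookup(table, addr))
```

Every route occupies one row here, which is why the row count of a bank acts as a hard ceiling on the number of routes it can hold.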

------
kiernanmcgowan
It feels so strange that 768k entries is large enough to break
infrastructure.

In my day to day I’ll work on tables with billions of rows with no issue.

~~~
azernik
As your latency requirements go down, feasible storage sizes also go down. In
this case the latency requirement is quite low: any time delay on lookup is
directly added to packet latency. So the better comparison would be L1 or L2
cache, rather than on-disk storage for a database.
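
One way to put a rough number on that: at line rate, a router has a fixed time budget per packet. The sketch below assumes minimum-size 64-byte Ethernet frames plus 20 bytes of preamble and inter-frame gap on the wire. The link speeds are just examples, and `per_packet_budget_ns` is a made-up helper name:

```python
def per_packet_budget_ns(link_bps: float, frame_bytes: int = 64,
                         wire_overhead_bytes: int = 20) -> float:
    """Nanoseconds available per packet when a link is full of same-size frames.

    wire_overhead_bytes covers the Ethernet preamble/SFD (8) and the
    inter-frame gap (12), which occupy the wire but are not part of the frame.
    """
    bits_on_wire = (frame_bytes + wire_overhead_bytes) * 8
    packets_per_second = link_bps / bits_on_wire
    return 1e9 / packets_per_second


if __name__ == "__main__":
    for name, bps in (("10G", 10e9), ("100G", 100e9)):
        print(f"{name}: {per_packet_budget_ns(bps):.2f} ns per minimum-size packet")
```

Tens of nanoseconds per packet, or less, is on the scale of cache access times rather than disk or even main-memory database lookups, which is the comparison being made here.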

------
omani
The number of popups on this site won't let you just bloody read the damn
article. What were you thinking by posting that link? Are you trying to troll
us?

~~~
tyingq
I'm not seeing any popups. Android/Chrome, no adblocker.

~~~
zamadatix
On desktop I got one "thanks for reading" one halfway through and that's after
ublock blocked 26 things. Their mobile site might not have the same design.

~~~
iforgotpassword
Got two popups on mobile, second one was "subscribe to our newsletter". I
hadn't even started reading at that point. I didn't know that site before, but
they immediately made it onto my personal list of unprofessional wannabe news
websites to avoid.

