
Do we really need a new BGP? - okket
https://blog.apnic.net/2018/01/16/really-need-new-bgp/
======
jgowdy
I have to disagree with the primary point being offered up in this post. In my
career I’ve met quite a few network engineers who don’t think sufficiently
outside the box. They typically say “this is how it’s done, this is how it
works.” The idea that we can’t come up with a better protocol than BGP to
manage the interconnections of the Internet is absurd. We don’t say “well
security isn’t a programming problem, it’s a human problem, so there’s no
point in making safer programming languages.” BGP is a ridiculous system of
complete trust just like many older protocols.

If we sat down and started designing a routing protocol to handle the
advertisement of routing tables on the internet, would we come up with BGP
again? Would the new protocol potentially offer up fixes for some of the most
common and obvious problems with BGP? Then there’s potential for improvement.

The network industry resists and slows the horribly needed disruption in
several key areas through the dogmatic defense of the status quo that we
typically hear from its constituents.

This is why so many of us pray for SDN. It’s the only way to wrest control of
networking away from those who typically have dogmatic vendor based training
rather than a “network science and theory” education that leads to innovation.

~~~
ra1n85
>This is why so many of us pray for SDN. It’s the only way to wrest control of
networking

Why not take the standards approach and draft a protocol spec? At some point,
you'll need to interoperate with those horribly dogmatic vendor hugging
networks that serve the majority of traffic on the internet.

>who typically have dogmatic vendor based training rather than a “network
science and theory” education that leads to innovation.

I've seen this as well, and it can be frustrating. Part of me believes this is
a defense mechanism used to prevent developers from pushing problems into the
network. The other part believes that network engineering has drawn
individuals to the field based on earnings potential rather than interest in
the work, and the talent just isn't there. Both are anecdotal, so grain of
salt and all.

~~~
jgowdy
> At some point, you'll need to interoperate with those horribly dogmatic
> vendor hugging networks that serve the majority of traffic on the internet.

And this has been their defense all along. It serves the status quo to imply
that transitions must be difficult. IPv6 being the most obvious example today.
Of course as DJB pointed out, backwards compatible IPv6 would have already
rolled out by now, at the cost of a single /64. Dual stack was literally the
most difficult way to accomplish IPv6.

Now let’s look at the BGP situation. As long as the outcome is the same (same
routing tables result in nominal situations), any two ISPs could choose to use
a different protocol for their interconnection. Same thing that has happened
with a lot of supplemental networking protocols.

There are BGP alternatives on deck. What we don’t need is high profile network
engineers speaking out against them and in favor of the status quo. The truth
is, networking is the one part of the industry that isn’t improving at the
same pace as the rest. And all the improvements that are happening seem to be
related to single links and switching, whereas the defense of networking dogma
is related to routing. According to the holy bible of networking, switching is
easy, routing is hard. Routers must be memory restricted with small memory
footprints to artificially create SKUs. Routers must operate on multi-minute
timers rather than seconds. Building routing tables is an insanely intense and
time consuming process. Blah blah blah.

The reality is, almost all of these problems are contrived to maintain the
status quo and so network vendors can ignore the pace of hardware and instead
set SKUs at resource levels that force you to use the “appropriate” piece of
hardware for the appropriate task.

Another smell that should make this obvious to everyone is that almost every
major company has had high level engineers make some statement of how network
companies are ripping you off, and explaining how they bought white box
network solutions that route layer 3 at wire speed, converge rapidly, and have
tons of memory. Not to mention the vast improvement something like well
designed SDN can provide.

The network companies rely on the fact that most companies don’t have the
labor or expertise to use white label solutions. But that doesn’t disprove the
point. That proves the point that the state of hardware is “super cheap
components provide all the switching and routing performance and memory needed
for almost 99% of networking purposes.” Companies like Cisco are playing the
Intel +25Mhz game. Supposedly disrupting companies do so at an awfully
measured pace. And the network staff generally will open up to off brand
switches but start to shy away at routing, especially edge routing.

As an industry, networking needs to be disrupted and it needs a severe model
change. People involved in networking need to get on board, and those who
don’t and insist on fighting for the status quo need to be left behind.

~~~
whatupmd
> Routers must operate on multi-minute timers rather than seconds.

s/operate/interoperate/g

> The network companies rely on the fact that most companies don’t have the
> labor or expertise to use white label solutions.

Or, white-box solutions are too complex for the enterprise to operate.

> And the network staff generally will open up to off brand switches but start
> to shy away at routing, especially edge routing.

That's because the routing usually doesn't work!

> As an industry, networking needs to be disrupted and it needs a severe model
> change.

[https://www.youtube.com/watch?v=oNXwxl2Q1tQ](https://www.youtube.com/watch?v=oNXwxl2Q1tQ)

------
smcleod
Slightly aside of BGP on an internet scale - We’ve recently switched to using
BGP internally within our network for kubernetes + load balancer (nginx)
routing, (announcement / discovery) amongst other things and I can tell you -
it’s one of the best moves we’ve ever made. The simplicity, performance and
reliability is brilliant.

[https://www.projectcalico.org](https://www.projectcalico.org)

~~~
merb
OT: is this really "simple" to set up, or do you mean that it's simple once
it's running? Or what do you mean by simplicity? I'm actually looking forward
to using k8s in our network, too. But I think the simplest solution for
HA k8s is using keepalived paired with nginx/haproxy on the master level and
add a keepalived-vip for every service.

(at least for our 3 node master, 3 node worker setup)

(I evaluated the Calico + BGP option too, but it looked way too complex for a
k8s starter.)

~~~
mrmondo
Yes, it was very simple to set up, deploy and understand on an ongoing basis.
BGP is incredibly simple by design.

I’m heading to bed now (AEDST), but check out:

- [https://docs.projectcalico.org/latest/getting-started/kubern...](https://docs.projectcalico.org/latest/getting-started/kubernetes/tutorials/simple-policy)

- [https://kubernetes.io/docs/tasks/administer-cluster/calico-n...](https://kubernetes.io/docs/tasks/administer-cluster/calico-network-policy/)

And there’s plenty of other resources out there. IMO if you’re designing a
highly available platform, BGP is one of the simplest parts of that complex
system.

In my experience you’re likely to run into much more finicky problems with
higher layer systems including failover delays, reliance on casting systems
and state drift.

------
tialaramex
Technology doesn't magically fix human problems, but good uses of technology
can play to human strengths so that humans make fewer mistakes, or the
mistakes have fewer bad consequences. BGP can and should be implemented so
that the human operators are making the fewest possible mistakes and those
mistakes have the least bad consequences. I doubt that protocol changes are
what is needed, but that doesn't mean sitting on our hands is the right
choice.

~~~
csours
As I understand BGP, any updates published by any AS may arbitrarily
re-route/disrupt traffic, i.e. it is not sufficient for me to update my
systems; everyone must update all systems for the overall routing
infrastructure to be protected. That sounds a lot like a protocol change to me.

~~~
lima
> any updates published by any AS may arbitrarily re-route/disrupt traffic

Fortunately, most ISPs are doing strict filtering on announcements on customer
ASes. It's when mid-to-large ISPs fuck up (or neglect the filtering part) that
it becomes a problem.
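
To make "strict filtering" concrete, here is a minimal Python sketch of the idea, not how any real router implements it. The ASNs, prefixes, the `ALLOWED` table and the /24 length limit are all made-up assumptions for illustration (documentation ranges only); a real ISP would build the table from IRR or RPKI data.

```python
import ipaddress

# Hypothetical registry: prefixes each customer AS is permitted to announce.
ALLOWED = {
    64500: [ipaddress.ip_network("203.0.113.0/24")],
    64501: [ipaddress.ip_network("198.51.100.0/24")],
}

# Assumed policy: reject anything more specific than a /24.
MAX_PREFIX_LEN = 24


def accept_announcement(customer_asn: int, prefix: str) -> bool:
    """Return True if this customer may announce this prefix."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > MAX_PREFIX_LEN:
        return False
    # Accept the registered prefix itself or a subnet of it.
    return any(
        net.version == allowed.version and net.subnet_of(allowed)
        for allowed in ALLOWED.get(customer_asn, [])
    )


print(accept_announcement(64500, "203.0.113.0/24"))   # customer's own prefix
print(accept_announcement(64500, "198.51.100.0/24"))  # someone else's prefix
```

The point is that the check is cheap and purely local, which is why the failures that matter are the ones where a large network skips it.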

------
ninegunpi
Yes we do, there are problems which beg to be fixed. Most of the current
complaints are about security, and the fixes in one way or another require
plenty of CPU horsepower (which most routers don't have - CPU peaks while
suddenly switching uplinks and retrieving a full view are still a typical
picture in a modern ISP). What does it take to prevent something like NLRI
spoofing? Plenty of layers of data security.

And it still won't fix the human problem underneath it.

Security-aware BGP versions like S-BGP, soBGP and psBGP address most of the
security concerns, add plenty of router load and still... don't solve some of
the human problems.

But the current publicly made suggestions to replace it with something
completely different lack even a proper understanding of the design goals of
the original BGP standard.

~~~
linsomniac
Many years ago I ran a small hosting ISP, ~10 cabinets. One of the choices I
made and still stand by was I used Linux-based routers.

We could throw tons of CPU horsepower at it and the fast path and the slow
path were the same thing. We had deep Linux expertise, but only passing Cisco
expertise, so staying with the Linux networking stack required less
specialized knowledge.

But, while most people were worrying about the size of the BGP tables, or
trimming the announcements they were receiving, we could get full feeds and
never had memory problems. We could run strong filtering, to make sure we
weren't sending or receiving any junk like bogons or our users weren't sending
with spoofed addresses, without worrying about adding one rule running past
the ASIC memory and causing everything to go slow path. We had high
availability. All for around $2500 in hardware.

It always seemed crazy to me that they would put such small CPUs in the high
end routers. I know they tried to have that CPU do almost nothing, and the
ASICs do all the heavy lifting. But for our needs, having an insanely fast CPU
and dumb interfaces was perfectly adequate, and let us do full BGP feeds on a
$1200 router.

The only real trick we did was we used one network interface and VLANs, and
had the multiple network connections terminated at the switch. We were able to
get feeds via fiber that we could just terminate in an SFP module in the
switch. We had 4 core CPUs, so this allowed us plenty of horsepower to manage
packet storms or DDoS instead of livelocking the kernel with too many
interrupts (it took a while for the kernel to switch from interrupts to
polling, and if packets ramped up too quickly the kernel couldn't switch if
you didn't have more CPUs than interfaces).

~~~
hhw
I run a not too small hosting ISP today, and ten years ago I used routers
based on commodity hardware running OpenBSD. Back when we were at Gbps scale
and most customers were on 100Mb links, this worked fine. However, it did not
scale to 10Gbps+, let alone the 100Gbps+ of capacity we're at today.

Besides the obvious case of performance not keeping up in aggregate,
performance for individual customers on 1Gbps links was also not as good. We
found individual TCP session throughput routed through commodity hardware was
noticeably lower than using a layer 3 ASIC-based switch, despite various
attempts at tweaking kernel sysctl variables.

Also, because nobody operates commodity based hardware routers at carrier
scale, you cannot trust any open source BGP daemon's implementation of BGP
confederations if they even have it at all. At that, I wouldn't really trust
any vendor other than Cisco or Juniper for carrier scale routing.

So even if someone were to create a popular, open source routing project
making use of DPDK to increase performance scalability on commodity hardware,
I'd still stick to Cisco or Juniper solutions like CSR 1000V or vMX for the
foreseeable future if I really wanted to use commodity hardware, as the full
feature set is going to be proven and mature. I'd want to see any new product
stand up well for 5+ years at carrier scale production use before I'd give it
any serious consideration.

Even then, I'm not sure how well DPDK will stand up to the volumetric UDP
reflection attacks which now dominate DDoS. The top packets/s levels
achieved are barely sufficient for simple routing, and would likely drop
precipitously with a modest ACL in place, in comparison to ASIC-based routers
being able to still do full line rate with ACLs that have a large number
(hundreds) of terms.

~~~
linsomniac
So what is our option for securing BGP-like functionality? The thing that
triggered me to tell this story in reply was the part about how most routers
don't have the CPU to do the work necessary for enhanced security.

At the time I was deploying routers with Pentium D CPUs in the multi-GHz
range, the hotshot Cisco routers were running 100MHz MIPS CPUs, IIRC.

I know they put a ton of emphasis on the ASICs, but seems like they maybe need
to splurge a bit on the CPUs. :-) That being said, I haven't really been doing
much networking these days, I now just work for a single org, we let the
facility do the heavy routing.

~~~
hhw
Not sure how far back you're going, or which model of Cisco router you're
referring to. I first cut my teeth on networking working at a Tier 2 carrier
back in 2005 on Cisco GSR's (12000 series) which use a 200MHz R5000 MIPS CPU,
but they were already quite long in the tooth at the time and we were one of
the few remaining networks still running them. And I do recall implementing ACL's
to be an issue. Not sure if you were referring to ACL's or BGP policies
themselves when referring to securing BGP.

These days, a full BGP capable router starts with the ASR1001 for Cisco which
starts with a 32bit 1.5GHz CPU + 4GB RAM on the RP1 and goes up to a 64bit
quad-core 2.2GHz CPU + 8-64GB on the RP3. Juniper land, the lowest end non-EOS
router would be the MX80 which comes with a 1.33GHz PPC CPU and 2GB RAM, but
unofficially Juniper will always steer you towards the MX104 with a 1.8GHz CPU
and 4GB of RAM. I can't comment on the ASR's, but we have MX80's and we can do
line rate ACL's with hundreds of terms in ASICs without issue. BGP policies
themselves are still handled by the CPU though, and are indeed quite slow on
the MX80's with full convergence taking up to 20 minutes or so. We've mostly
relegated them to core switching duty at this point though, and are waiting
for the new MX204's (800Gbps in 1U) to become mature enough before replacing
them.

Moving up a bit, we use MX480's as well, currently with routing engines that
have 2GHz Intel CPU's and 4GB of RAM. I believe these are already EOS, but
that's not enough of an issue for us to buy upgraded routing engines for this
platform. These do full table re-convergence in about a minute or
two. I believe quad-core 1.8GHz is standard now though, with up to hex-core
2GHz available. I don't think CPU's are really much of an issue anymore,
although obviously still not as fast as what you can find easily enough in
commodity hardware. The ASICs handle most things flawlessly though.

~~~
PatchMonkey
Well... As someone outside of that arm of the industry, I have to wonder about
what exposure there is to spectre, or what kind of patches are coming out for
affected machines.

~~~
hhw
JunOS is based on FreeBSD, which doesn't have a fix yet. Not sure about the
different variants of Cisco IOS. You wouldn't run untrusted code on a router
though, so spectre would not be a concern. In fact, it's probably better not
to patch for it given the performance degradation.

------
chisleu
If we don't replace BGP, people with BGP access are going to keep
"accidentally" redirecting compute resources from AWS and other data centers
to themselves so they can man in the middle crypto mining, and steal millions.

------
arca_vorago
I don't think we need a new bgp, we just need to fix the old bgp. I have yet
to see a practical and realistic alternative. Also, don't forget MPLS is very
interdependent in its current state on BGP, and MPLS is really awesome.

Yes, BGP has some weaknesses, but perhaps calling for a new bgp is throwing
the baby out with the bathwater.

You know what I think is more of a problem is vendor market dominance that
uses closed source proprietary boxen on proprietary platforms designed for
vendor lock in.

Also, some have referenced the power of asics in these black boxen over cots
hardware, but these days that's not true because there are open source systems
with asics in them.

Not exactly related to BGP, but I did an entire infrastructure upgrade for a
company. I talked to every vendor in existence, extreme, juniper, cisco,
brocade, etc etc. I ended up going with Ubiquiti and saving over $50k... These
vendors are gouging customers who already have problems with IT infrastructure
budgets, and I'm tired of dealing with them. (and their often non-spec
implementations of protocols)

On top of that, you would be amazed at how many "network engineers" don't know
anything but cisco, and couldn't route their way out of a paper bag in
anything else, this result of the vendor lock-in is much more of a problem.

Let's also not forget to mention the NSA-cisco, etc backdooring systems.

------
aplorbust
"We've somehow forgotten the old maxim that a protocol is not done until we
have removed everything that is not needed."

Would software and web developers ever consider adopting this maxim?

As a user, I would enthusiastically support the experiment.

(For example, results might be smaller program size, and less code for
interested users and others to audit.)

~~~
rdtsc
A bit off topic obviously here. But that's how overall I feel about Apple's UI
experience or product design in general. The UI interaction, if you think about
it, is also a protocol of sorts - between user and the device.

Say the user scans the preferences window, finds the checkboxes they were
looking for, picks one, clicks on it, it reacts back and is marked as such
and so on. Now try to reduce the number of needed actions and decisions needed
by the user there. Maybe they don't have to click "Apply" button when done. Or
maybe there are only 2 checkboxes. Or they are not even in the preference
menus at all.

------
maltalex
The relevance of "you cannot solve people problems with technology" point is
debatable. Sure, some problems you can't solve, but a properly designed system
can definitely reduce the risk of human error - we've seen this in fields such
as aviation and medicine (to some extent).

But what isn't debatable is the fact that BGP is a security nightmare, and
there's really no reason for internet routing to be so vulnerable. It's
amazingly simple to divert traffic using BGP, and it has been so for years -
[https://www.youtube.com/watch?v=S0BM6aB90n8](https://www.youtube.com/watch?v=S0BM6aB90n8).
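
One reason diverting traffic is so simple: routers forward on the longest matching prefix, so a more-specific announcement wins over the legitimate covering route. The Python sketch below illustrates only that selection rule. The `best_route` helper, the routing tables and the AS labels are hypothetical, and real BGP best-path selection involves many more attributes.

```python
import ipaddress


def best_route(table, destination):
    """Return (prefix, next_hop) for the most specific covering route, or None."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), hop)
        for prefix, hop in table.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    # Longest prefix match: the largest prefix length wins.
    net, hop = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), hop


# Legitimate state: the owner announces its /24 (documentation range).
legit = {"203.0.113.0/24": "AS64500"}
# Assumed hijack: another AS announces a more-specific /25 inside it.
hijacked = {**legit, "203.0.113.0/25": "AS64501"}

print(best_route(legit, "203.0.113.10"))
print(best_route(hijacked, "203.0.113.10"))
```

With the extra /25 in the table, traffic for addresses in the lower half of the /24 follows the hijacker's route, even though the legitimate /24 is still present.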

------
bogomipz
The author states:

>"From time to time, I run across (yet another) article about why Border
Gateway Protocol (BGP) is so bad, and how it needs to be replaced. This one,
for instance, is a recent example."

Yet if you read that linked article this author references in the last
sentence, that article does not suggest replacing BGP at all. That referenced
post even states:

>"How do we fix this? Well, aside from making sure that anyone touching BGP
knows exactly what they’re doing? Not much."

And then the referenced article goes on to mention RPKI and the MANRS
frameworks which are not routing protocols nor are they meant to replace BGP.

------
zero_intp
Operator and designer mistakes happen, the goals should be to make simple
systems easy to build. Complex networks will need thoughtful solutions, and
better designs will have fewer operator interventions.

I could be petty and pick apart the article for mistakes, but I agree with the
gist- If you compose your needs and environment and use the right design you
might not hate the protocol(s) you end up with?

------
aplorbust
Reminded me of this:
[https://media.ccc.de/v/34c3-9072-bgp_and_the_rule_of_custom](https://media.ccc.de/v/34c3-9072-bgp_and_the_rule_of_custom)

One of the better talks I watched from this year's conference.

------
no_identd
We wouldn't need BGP at all if they hadn't ditched flow routing from IPv6
before it even became a proper draft, and if IP in general wouldn't assume we
still use thicknet everywhere.

------
andersonmvd
I wonder how it relates to SCION
([https://www.scion-architecture.net/](https://www.scion-architecture.net/))

~~~
worxli
It states: You become immune to current BGP-level attacks, such as prefix
hijacking.

------
troligare
Well isn't this just a case of companies hiring the wrong people...

And there should be a better commit and audit system before changing peer
routers on an ISP.

~~~
zero_intp
costs, costs, costs. As your IP service becomes commoditized fewer, not more,
engineers make bigger changes. Interestingly, expert systems reduce, not
improve, the quality of the operating engineers:
[http://journals.sagepub.com/doi/full/10.1177/000183921775169...](http://journals.sagepub.com/doi/full/10.1177/0001839217751692)

------
exabrial
Does BGP have the equivalent of DNSSEC?

~~~
madsushi
Yes, RPKI. Not that widely deployed, <20% of ASs, last time I looked.
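
For anyone wondering what RPKI actually checks, here is a rough Python sketch of route origin validation in the style of RFC 6811. The `ROA` tuple, the `validate_origin` name and the sample data are my own illustrative assumptions, not a real validator's API. A route is "valid" if some covering ROA matches both the origin AS and the maximum length, "invalid" if ROAs cover it but none match, and "not-found" if nothing covers it.

```python
import ipaddress
from typing import NamedTuple


class ROA(NamedTuple):
    prefix: str      # prefix the holder authorised
    max_length: int  # longest prefix length allowed under it
    origin_asn: int  # AS authorised to originate it


def validate_origin(roas, prefix, origin_asn):
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue  # this ROA does not cover the route
        covered = True
        if route.prefixlen <= roa.max_length and origin_asn == roa.origin_asn:
            return "valid"
    return "invalid" if covered else "not-found"


# Hypothetical ROA set using documentation prefixes and ASNs.
roas = [ROA("203.0.113.0/24", 24, 64500)]
print(validate_origin(roas, "203.0.113.0/24", 64500))
print(validate_origin(roas, "203.0.113.0/24", 64501))
print(validate_origin(roas, "198.51.100.0/24", 64500))
```

Note that this only validates the origin AS, not the rest of the path, and with low deployment most routes simply come back "not-found".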

------
leecarraher
I would think BGP is needed now more than ever considering the move to IPv6.

~~~
tialaramex
Mostly this makes no sense because the things BGP cares about are orthogonal
to mere IPv4 or IPv6 addressing.

But if there is a difference, IPv6 means fewer prefixes are needed for any
given provider, since the IPv6 prefixes are huge and unfragmented, we can
usually give a provider enough address space in their initial allocation to
last them forever, and if they exceptionally out-grow it, the larger space is
available, whereas with IPv4 there's no chance anybody can give you a /16 even
if you warrant one.

~~~
Duckeh
It means less prefixes until people start announcing each /48 in their /32
separately...
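
To put a number on that: a /32 contains 2^(48-32) = 65536 possible /48s. A quick Python check, using the IPv6 documentation prefix 2001:db8::/32 as a stand-in allocation (the `count_subnets` helper is just for illustration):

```python
import ipaddress
from itertools import islice

# Stand-in allocation: the IPv6 documentation prefix.
allocation = ipaddress.ip_network("2001:db8::/32")


def count_subnets(network, new_prefix):
    """Number of subnets of length new_prefix inside network."""
    return 2 ** (new_prefix - network.prefixlen)


print(count_subnets(allocation, 48))  # 65536

# The first few /48s that could each be announced separately.
for subnet in islice(allocation.subnets(new_prefix=48), 3):
    print(subnet)
```

So one deaggregated /32 could in the worst case contribute tens of thousands of routes by itself.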

~~~
jandrese
Oh yeah, nothing makes a mess of your network like someone coming in and
saying "we need to remove these IPs from the middle of our advertised blocks
for security reasons".

~~~
snuxoll
I think we're going to see less of this due to IPv6's design, renumbering is
less of a pain than it was with IPv4 so instead of splitting off a route in
the middle of your address space and bloating the BGP table you can just move
to a different subnet entirely.

