Hacker News new | past | comments | ask | show | jobs | submit login
BBR, the new kid on the TCP block (apnic.net)
360 points by pjf on May 9, 2017 | hide | past | favorite | 47 comments

It's almost irresponsible to write an article on this topic in 2017 without explicitly mentioning bufferbloat or network-scheduling algorithms like CoDel designed to address it. If you really want to understand this article, read up on those first.


CoDel is different from the packet scheduling algorithm even though both fight bufferbloat in different ways. CoDel is a congestion control algorithm for controlling what happens when outgoing buffers start overflowing. This is on a lower level than TCP and happens to any type of packet. The scheduling algorithm, like VEGAS or BBR, controls the transmit rate of only the TCP protocol.

When packets are being sent over the wires, the TCP scheduling algorithm (usually CUBIC, VEGAS, RENO, or now BBR) will send out packets until the parameters they monitor indicate the downstream device is about to overload. Then they will back off slightly to prevent packets from being lost. These TCP transmit strategies tend to either monitor packet loss rate or round trip time, sometimes both. What they do with these two parameters determines the biggest differences between the packet sending algorithms.

CODEL comes into affect when the scheduling algorithm decides it can't send out packets quickly enough without losing them, and they build up on local buffers. This can happen with TCP but also other internet protocols.

Something most people don't know is that without a scheduling algorithm like BBR,VEGAS, or RENO, you can send out packets at interface speed. In simpler protocols like UDP you need to do your own packet scheduling. Otherwise your machine will send out packets at interface speed until they are mostly dropped by the first slower link. This is why TCP has scheduling algorithms, they're all an attempt to monitoring the end to end link speed from A to B you can achieve without losing data.

Edit: BBR is a new TCP scheduling algorithm to fight buffer-bloat at the TCP level. Since the majority of internet traffic is TCP, wide adoption would cause a big improvement. TCP scheduling only affects outgoing packets, so its important to get this into Windows and Linux so we can get the full benefit of having buffer-bloat reduction on both ends. I'm looking at MS here because they're the last major OS running an aggressive and buffer-bloat causing TCP algorithm.

Your terminology seems a bit strange to me, the way I've seen these discussed:

CoDel is an AQM (Active Queue Management) algorithm - it decides what packets to drop and when to drop them as a packet queue fills. This is most relevant at the hop with a bottle neck link.

BBR and reno etc are TCP congestion control algorithms or less frequently called congestion avoidance algorithms - they manipulate the window size and attempt to avoid congestion using different takes on congestion signaling (i.e.loss based reno, delay based for vegas, model based for bbr). This is most relevant from the sender side.

Scheduling would be more advanced queuing disciplines (for QoS or flow based fair queuing) and packet pacers. Queuing disciplines can apply to any bottle neck link, pacing is most useful sender side.

CoDel is packet scheduling. It just dynamically creates "classes" for each flow to determine which ones are overfilling the buckets and selectively drops from those streams.

I try to talk about these things in a way that's easy for lay people to understand. Even among programmers QoS and packet scheduling are a niche topic

right you are! My apologies, my familiarly with CoDel is limited to the Linux implementation

> CoDel is different from the packet scheduling algorithm

I didn't say they were the same. However, these two classes of algorithms interact in such complex and significant ways that it's practically impossible to understand one without understanding the other. An article about the history of one must IMO address how it has co-evolved with the other to be useful.

They are different beasts :), and yes it's very important to understand both because they work in different ways.

Most people that know of CoDel think it's a panacea for buffer-bloat on their LAN but BBR is actually much more likely to be effective. The important distinction is that CoDel only prevents buffer-bloat on the hop where it's running. CoDel manipulates packet order in outgoing local buffers to signal the congestion control protocols at higher levels such as TCP that buffers are overflowing. The drawback of CoDel is that it only works if your router is the slowest link.

If the bottleneck link is on your modem (most are), CoDel will only help if you artificially make your router the slowest link by limiting outgoing bandwidth to slightly lower than your modem upload speed. In practice most people don't know to do this.

BBR in comparison reduces buffer-bloat along the entire path of the packet, but has its own disadvantages. First, it only works on TCP connections. Second, it helps the most if machines on both ends are using it. Each end of the TCP connection uses its own congestion control. Thirdly, it can only be applied at a machine level. If you put CoDel on your slowest link it will help all the traffic going through it, but only if the bottleneck is at that link. BBR will help prevent bufferbloat from end-to-end through all links a TCP connection flows, but only prevents TCP buffer-bloat completely on your router if every machine on the LAN is using it.

Ideally you want to use both in every place you can.

Do you have any information on Window's TCP algorithms and why they are so bad, or what kinds of problems they cause?

By default, Windows machines use the algorithm NewReno. Older algorithms like NewReno are known for causing horrible buffer-bloat in comparison to algorithms like VEGAS which is used in Linux. The reason is pretty simple. The two main metrics used for determining when to send a packet in TCP are round trip delay and packet loss.

Reno/NewReno slow their send rate mainly when they detect lost packets, whereas VEGAS and similar mainly detect round trip delay but also respond to loss. The problem with primarily loss-based strategies is that most routers won't drop packets until their buffers are full. In modern times these buffers can be very large (many seconds). TCP Reno will keep sending packets until downstream routers have full buffers and drop packets, which could be when they're already holding seconds worth of data. This is buffer-bloat. Any packets that makes it to these routers will spend seconds waiting to be put on the wire.

VEGAS on the other hand will try to maintain a constant round trip delay. It uses TCP's ACK packets to gauge how long it takes packets to go back and forth and reduces send rate when this starts to rise. This keeps router buffers empty, delay low, and packet loss near zero. BBR is a further enhancement of the VEGAS delay sensing strategy.

As mentioned in this chain, CoDel is one strategy for "fixing' loss based aggressive protocols like TCP Reno. It detects "flows" (IP/port pair combinations) that are causing the local outbound buffers to overflow and starts selectively dropping packets on them. If all internet protocols were delay based like TCP VEGAS CoDel would not be needed to keep traffic flows "fair". Without using something like CoDel on your routers to punish aggressive strategies, scheduling algorithms like NewReno will cause your routers outgoing buffers to always be full and out-compete friendlier traffic like VEGAS that cuts itself back when buffers fill.

little side node here: the windows 10 creator's update introduced experimental support for CUBIC https://www.ietf.org/proceedings/98/slides/slides-98-tcpm-tc... which is currently the default CC algorithm used in linux. However, the slides state that in the absence of AQM qdelay gets worse when CUBIC is used

Note that in most cases TCP congestion control only really matters from the sender side. Windows has a fairly nice algorithm called Compound TCP if you are sending from Windows https://en.wikipedia.org/wiki/Compound_TCP

So, to be clear: BBR only applies to the edge nodes (the client and the server) whereas CoDel and other AQM are (mostly) for the routers in between, especially bottle-neck points, right?

Yes. CoDel/AQM can only help bufferbloat if you control the bottleneck. However it's possible to artificially make your router the bottleneck if you get really steady bandwidth.

BBR only helps TCP but will help prevent bufferbloat along the whole path


One of the nice things about BBR is that it tries to avoid buffer bloat by measuring buffer bloat at the bottleneck link and avoiding contributing to it. I'm not a protocol expert, but our protocol team has done an implementation of BBR, and I've been in plenty of meetings where I've heard it described. Let me take a crack at trying to explain:

As I understand it, BBR goes through periodic bandwidth probing cycles. When it ramps up and sends at higher bandwidth, it sees if the RTT increases without a corresponding increase in bandwidth or packet loss. If so, it assumes that a queue is building at the bottleneck, and ramps down to a rate below the expected max bandwidth on the flow, thereby draining the queue. When the RTT has bottomed out, it ramps back up to the expected bandwidth on the flow. This keeps queues small.

The bad thing about BBR is that it is much more expensive than other commonly used TCP congestion control algorithms due to the periodic ramp up/down. It also does not lend itself to hardware packet pacing either. Early versions of our implementation are considerably more expensive than the default congestion control. Eg, a server that is ~50% idle at 90Gb/s will be CPU maxed at less than 90Gb/s with BBR. But this is improving daily.

It does mention buffer bloat:

> Larger buffers can lead Reno into a “buffer bloat” state, where the buffer never quite drains. The residual queue occupancy time of this constant lower bound of the queue size is a delay overhead imposed on the entire session.

Yes, it mentions bufferbloat once, but IMO the subject requires far more than that. I would suggest at least introducing the concept of active queue management, maybe set some of the developments in that closely related field - e.g. RED, SFB, CoDel - on the same timeline with Reno, Vegas, and BIC. Otherwise it's like trying to teach someone driving without mentioning stop signs and traffic lights.

First saw it at the morning paper: https://blog.acolyer.org/2017/03/31/bbr-congestion-based-con...

This is the story of how members of Google’s make-tcp-fast project developed and deployed a new congestion control algorithm for TCP called BBR (for Bandwidth Bottleneck and Round-trip propagation time), leading to 2-25x throughput improvement over the previous loss-based congestion control CUBIC algorithm.

Network performance across national borders within China has been abysmal since the censorship got much more serious. BBR seems promising, so more and more people (that includes me) who bypass GFW with their own VPS has been deploying BBR, and seen marvelous results.

Any data on BBR vs Reno and Vegas sharing?

Link capacity estimation is easy. It's the co-existing gracefully with all other flow control options that's tricky.

There's a 'Sharing' section near the end where two scenarios are compared. Doesn't look like an exhaustive test, rather the opposite.

What about BBR sharing with BBR? Really, what is the reason this shouldn't be rolled out as the default everywhere?

Sure, I read the article. That section covers just CUBIC.

It also doesn't discuss competition between BBR users. Playing nicely with your neighbors is important for the health of the Internet.

Several BBR flows do actually converge quite nicely to a fair share of bandwidth. Take a look at the presentation the guys from google gave at the IETF 97 in Seoul: https://www.ietf.org/proceedings/97/slides/slides-97-iccrg-b...

I think there is also a recording of that session somewhere on youtube.

Also, the complete paper can be downloaded for free at http://queue.acm.org/detail.cfm?id=3022184

It doesn't discuss competition with either many users, with things like webrtc, or with those evil dial-a-speed schemes.

The internet is complex.

However, since the dial-a-speed evildoers haven't caused much controversy in the grand scheme of things, I don't think BPR will do real harm in that case.

I'm more concerned about unloaded parts of a circuit. Suppose some BPR streams fill one hop, and share other, higher-capacity, hops with Reno. Does Reno do as well then as it would if the BPR streams were Reno?

There is also no mention of how it handles things like wireless links with high jitter and intermittant not related to flow rate packet loss.

Well it does mention that BBR uses "The RTT is the minimum of all RTT measurements over some time window that is described as “tens of seconds to minutes”." This is a much larger measuring window than previous RTT tracking TCP. At 10s of seconds it is also hopefully enough to tackle the most common forms of jitter and intermittent flows.

At truly bad flow characteristics you need to start some truly weird line coding methods with stream data split over multiple packets with redundant coding to rebuild dropped packets etc, but this kind of thing just kills latency.

Not be confused with BBR enhancing the Mazda MX-5: https://www.pistonheads.com/news/ph-japanesecars/mazda-mx-5-...

Also significantly reduces latency and increases throughput :-)

It even has a flow-control mechanism (wastegate).

This article is not only a great intro to BBR, but an excellent introduction the history of flow control.

Congrats to Geoff and his team.

How can we use it today! Is it in Linux code already and easy to enable ?

It's available in Ubuntu 17.04. Add these lines to /etc/sysctl.conf:

I got 400Mbps (on a 1Gbps link) from Seattle to New York with a single TCP-BBR stream, vs ~50Mbps before :-).

You could probably saturate that link with a single TCP stream -- even without BBR -- by simply choosing the correct send and receive buffer sizes in your test program. (I assume you're using iperf or a similar program to perform your tests.)

BBR and other congestion-control algorithms come into play when the underlying IP network is congested. To test them, you have to have control of the congestion rate of the underlying circuit, which you might not have if you're testing this on an ordinary public IP network.

Thank you! This is helpful also about fq vs fq_codel - https://groups.google.com/forum/m/#!topic/bbr-dev/4jL4ropdOV...

How are you measuring this?

To get this on Ubuntu 16.04 LTS, get and install the following:

  wget kernel.ubuntu.com/~kernel-ppa/mainline/v4.9.26/linux-headers-4.9.26-040926_4.9.26-040926.201705031231_all.deb kernel.ubuntu.com/~kernel-ppa/mainline/v4.9.26/linux-image-4.9.26-040926-generic_4.9.26-040926.201705031231_i386.deb kernel.ubuntu.com/~kernel-ppa/mainline/v4.9.26/linux-image-4.9.26-040926-generic_4.9.26-040926.201705031231_amd64.deb
  sudo dpkg -i linux-headers-4.9*.deb linux-image-4.9*.deb

It's in kernel 4.9

The code is already in net-next. This means you currently have to build your own kernel if you want to use it. This should be enough to get you going: https://github.com/google/bbr/blob/master/Documentation/bbr-...

> ... the startup procedure must rapidly converge to the available bandwidth irrespective of its capacity

It seems to me that you'd be able to make a rough guesstimate by noting the ip address; whether it's on the same LAN, or continent/AS.

It wouldn't matter if you got it very wrong as long as you converged quickly to a better one (as you have to do anyway)

It seems like the best way to handle this situation is to assume that all other algorithms are hostile, and to seize as much bandwidth as you can without causing queue delay. That would reduce the problem set to a basic resource competition problem, which could then be solved with genetic algorithms.

For those dying to know what it stands for: Bottleneck Bandwidth and Round-trip time

Would adding this only to the http reverse proxy machines provide most of the benefit without have to patch all servers.

This seem to have the greatest effect over wan links.

Sounds like they adapted LEDBAT delay measuring tricks.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact