
TCP BBR congestion control comes to GCP
https://cloudplatform.googleblog.com/2017/07/TCP-BBR-congestion-control-comes-to-GCP-your-Internet-just-got-faster.html
======
atomt
A warning if you want to try out BBR yourself:

Because BBR relies on pacing in the network stack, make sure you do not
combine BBR with any qdisc ("packet scheduler") other than FQ. You will get
very bad performance, lots of retransmits, and in general not very neighbourly
behaviour if you use it with any of the other schedulers.

This requirement is going away in Linux 4.13, but until then blindly selecting
BBR can be quite damaging.

Easiest way to ensure fq is used: set the net.core.default_qdisc sysctl
parameter to "fq" using /etc/sysctl.d/ or /etc/sysctl.conf, then reboot. Check
by running "tc qdisc show".
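
For concreteness, a minimal sketch (the file name below is an arbitrary
choice, not a requirement):

    # /etc/sysctl.d/90-bbr.conf
    net.core.default_qdisc = fq

    # apply without waiting for a reboot, then verify:
    sudo sysctl --system
    tc qdisc show

Note that default_qdisc only takes effect when a qdisc is (re)created, hence
the reboot; alternatively you can attach fq to a live interface with "tc qdisc
replace dev eth0 root fq".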

Source: bottom note of
[https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0f8782ea14974ce992618b55f0c041ef43ed0b78)

~~~
jsolson
A related warning: your NIC driver can break the qdisc layer (or at least
mislead it).

Prior to 4.12 the virtio-net driver _always_ orphans skbs when they're
transmitted (see start_xmit()). This causes the qdisc layer to believe packets
are leaving the NIC immediately, until you've completely filled your Tx queue
(at which point you will be paced at line rate, but with a queue-depth delay
between the kernel's view of when the packet hit the wire and when the packet
actually hit the wire).

After looking at the code -- even in 4.12 enabling Tx NAPI still seems to be a
module parameter.

(I'm not sure which other drivers might have the same issue -- my day job is
limited to a handful of devices, and mostly on the device side rather than the
driver side.)
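
If you want to check whether your guest is using Tx NAPI, something like the
following should work. The parameter name napi_tx is my reading of the 4.12
virtio_net driver, so treat it as an assumption:

    # current value, if your kernel exposes the parameter
    cat /sys/module/virtio_net/parameters/napi_tx

    # opt in at module load time
    echo "options virtio_net napi_tx=1" | sudo tee /etc/modprobe.d/virtio-net.conf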

~~~
atomt
That is good to know. I just deployed BBR on some pilot virtio-backed VMs
yesterday and I missed this.

As far as I can tell, the Actual Hardware I'm running my other BBR pilots on
is doing the right thing.

File under: BBR - still a gotcha or two ;-)

------
nh2
BBR is, in my opinion, one of the most significant improvements to networking
in recent years.

I've started using it on a couple of long-range routes (e.g. Switzerland to
Ireland, Frankfurt to Singapore) with Gigabit servers on the Internet, and it
turns unreliable ~200 Mbit/s transfer rates into reliable > 850 Mbit/s.

And all that's needed is `sudo modprobe tcp_bbr && sysctl -w
net.ipv4.tcp_congestion_control=bbr`.

Great job really!
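
If anyone wants to verify it took effect, roughly (exact output varies by
distro and ss version):

    sysctl net.ipv4.tcp_congestion_control   # should report bbr
    ss -ti                                   # new connections should show bbr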

~~~
morecoffee
Dumb question: the remote side need to enable it too, right?

~~~
manigandham
Doesn't seem like it:
[https://news.ycombinator.com/item?id=14814206](https://news.ycombinator.com/item?id=14814206)

------
nealmueller
Today’s Internet is not moving data as well as it should. TCP sends data at
lower bandwidth because the 1980s-era algorithm assumes that packet loss means
network congestion.

BBR models the network to send as fast as the available bandwidth allows, and
it achieves 2700x the throughput of loss-based TCP on a 10 Gb/s, 100 ms link
with 1% loss. BBR powers google.com, youtube.com, and apps using Google Cloud
Platform services.
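
As a rough sanity check on that figure, the classic Mathis et al. bound for
loss-based TCP throughput (my numbers below, not from the post) gives:

    \text{rate} \approx \frac{\text{MSS}}{\text{RTT}} \cdot \frac{C}{\sqrt{p}}
                = \frac{1500 \cdot 8\ \text{bit}}{0.1\ \text{s}} \cdot \frac{1.22}{\sqrt{0.01}}
                \approx 1.5\ \text{Mbit/s}

That is roughly 0.015% of the 10 Gb/s link, which is the kind of headroom a
model-based sender can reclaim when the 1% loss is not actually congestion.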

Unlike prior advancements such as QUIC, which requires a special browser, BBR
is a server-side-only improvement, meaning you may already be benefiting from
BBR without knowing it. BBR requires no changes from end users. This is
especially relevant in the developing world, where many users are on older
mobile platforms with limited bandwidth.

~~~
YZF
There have been a lot of modifications to TCP since the 1980s to allow it to
push a lot more bandwidth on faster networks, most notably perhaps window
scaling.

How does BBR avoid killing other streams that happen to share the same pipe?
It seems it would consume more than its fair share if the other TCP streams
are using older algorithms.

p.s. presumably if you get 1% loss with no congestion there's wireless/mobile
involved?

~~~
wbl
BBR uses mode switching to learn what the path's latency is and what its
fair-share bandwidth is.

~~~
YZF
Do you know if any experimental results of sharing with the other congestion
avoidance flavors are available somewhere? Historically this requirement for
backwards compatibility has been a big problem. Maybe YouTube is getting
better but other web traffic is getting hosed?

~~~
wmf
BBR yields to CUBIC or Reno somewhat.
[https://www.ietf.org/proceedings/97/slides/slides-97-iccrg-b...](https://www.ietf.org/proceedings/97/slides/slides-97-iccrg-bbr-congestion-control-02.pdf) slide 23
[https://www.ietf.org/proceedings/98/slides/slides-98-iccrg-a...](https://www.ietf.org/proceedings/98/slides/slides-98-iccrg-an-update-on-bbr-congestion-control-00.pdf) slides 16-18

~~~
YZF
Cool. Thanks! I worked on a UDP congestion avoidance algorithm that had
bandwidth/latency feedback built into the protocol and had to deal with some
of the same issues.

------
signa11
here is the acm-queue article/paper on the same thing:

[https://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-...](https://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-based-congestion-control/fulltext)

edit-01:

some more sources of information

ietf drafts on the same topic available here:

[https://tools.ietf.org/html/draft-cardwell-iccrg-bbr-congest...](https://tools.ietf.org/html/draft-cardwell-iccrg-bbr-congestion-control-00)

[https://tools.ietf.org/html/draft-cheng-iccrg-delivery-rate-...](https://tools.ietf.org/html/draft-cheng-iccrg-delivery-rate-estimation-00)

and a blog post giving a detailed history of various congestion control
mechanisms, and bbr as well:

[https://blog.apnic.net/2017/05/09/bbr-new-kid-tcp-block/](https://blog.apnic.net/2017/05/09/bbr-new-kid-tcp-block/)

~~~
kyrra
Link is broken. Is it this?

[http://queue.acm.org/detail.cfm?id=3022184](http://queue.acm.org/detail.cfm?id=3022184)

or this

[https://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-...](https://cacm.acm.org/magazines/2017/2/212428-bbr-congestion-based-congestion-control/fulltext)

~~~
signa11
thanks for the heads up, it is the latter. i have updated the info.

------
voltagex_
Looks like it's being tested inside Netflix on FreeBSD:
[https://wiki.freebsd.org/TransportProtocols/26Jan17](https://wiki.freebsd.org/TransportProtocols/26Jan17)

~~~
floatboth
Nice. Came into this thread thinking "hmm I wish someone ported this to
FreeBSD"… of course it's Netflix :D

------
eikenberry
For those interested in a simple guide on how to try it on your servers, this
is the most to-the-point one I've found.

[https://www.admon.org/networking/update-linux-kernel-to-enab...](https://www.admon.org/networking/update-linux-kernel-to-enable-google-tcp_bbr/)

~~~
theandrewbailey
CentOS 7 only.

------
netheril96
As a Chinese user whose international bandwidth has been abysmal, I've found
BBR a godsend. The difference in speed when I turn on BBR on my shadowsocks
server is astronomical.

------
Veratyr
> When a GCP customer uses Google Cloud Load Balancing or Google Cloud CDN to
> serve and load balance traffic for their website, the content is sent to
> users' browsers using BBR. This means faster webpage downloads for users of
> your site.

This makes it sound like BBR is only available for Google-managed services on
GCP; is that correct? Can I use BBR on GCE servers (which can install the
kernel module)? Seems like an odd thing to leave out.

~~~
nealcardwell
Yes, you can use BBR inside a VM in GCE. Here is a quick-start guide if you
are interested in doing that:

[https://github.com/google/bbr/blob/master/Documentation/bbr-...](https://github.com/google/bbr/blob/master/Documentation/bbr-quick-start.md)

------
orware
Over the weekend I set up one of the new consumer mesh products that's
available, the Linksys Velop, with 9 nodes covering a good-sized area between
two homes.

One thing I've been noticing though is that there is considerable
latency/packet loss at the moment (there is only one wired backhaul at the
primary node and all of the other nodes are connected to each other
wirelessly).

I've been running PingPlotter against all of the nodes and there seems to be
considerable packet loss (a few percent) and spikes in latency (the average
for the two closest nodes to my laptop is about 15 ms, the middle ones out a
ways are about 30-40 ms, and the furthest ones are at about 60 ms), but the
spikes can be in the hundreds or even thousands of ms.

The area covered is about a 500 ft by 120 ft rectangle more or less (with my
house on the bottom left of that rectangle and the other home on the bottom
right of that rectangle).

My question would be: would this BBR algorithm help in some way to reduce the
latency/packet loss in a situation like this? Or does it only apply to the
other situations that Google would normally be encountering/dealing with?

Thanks for the input!

~~~
mangix
BBR solves a different problem. The problem you have is wifi being terrible.
There's massive buffering in the firmware which you can't get rid of.

~~~
orware
In this case do you mean the firmware provided by Linksys for the hardware, or
an additional layer of firmware embedded into the wireless hardware in some
way? From what I can see on the router's sysinfo page, the Velop appears to be
running some form of OpenWrt, so it seems they would have the ability to
customize/tweak the buffering settings in some way. (I took Georgia Tech's
networking class about two years ago, and it was pretty neat to learn about
the bufferbloat problem in that course and how bigger buffers weren't
necessarily better for performance.)

~~~
wtallis
The radio's firmware, not the Linux OS running on the application processor.
All 802.11ac radios have closed-source firmware, even when there are open-
source Linux drivers to communicate with the NIC. The 802.11n chipsets by
Atheros, by contrast, didn't use proprietary firmware and exposed a fairly
low-level interface to the host system. This led to the open-source ath9k
Linux driver becoming the platform of choice for people trying to fix WiFi in
general or improve the Linux WiFi stack in particular.

------
Animats
If you have significant page load delays with Wordpress sites, it's probably
not a TCP-level problem.

~~~
losvedir
Ha, as soon as I saw this I was hoping you were going to chime in!

May I ask if you have any thoughts on BBR? In what ways is networking
different from when you published yours that might warrant (or not!) another
congestion control algorithm?

------
kuschku
So, what would be required to use this outside of GCP? All documentation on
BBR only discusses GCP.

~~~
est
1. install 4.10+ kernel

2. echo "bbr" > /proc/sys/net/ipv4/tcp_congestion_control

3. save it to sysctl.conf (see the sketch below)

4. restart and you are done.
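
A minimal persistent config, assuming a distro that reads /etc/sysctl.d/ (the
fq line matters on kernels before 4.13, per the qdisc warning at the top of
the thread):

    # /etc/sysctl.d/10-bbr.conf -- example file name
    net.core.default_qdisc = fq
    net.ipv4.tcp_congestion_control = bbr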

~~~
MertsA
> echo "bbr" > /proc/sys/net/ipv4/tcp_congestion_control

[...]

> restart and you are done.

You only need one or the other. Also, you could just reload your sysctl.conf
instead.
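
For example:

    sudo sysctl -p            # reloads /etc/sysctl.conf
    sudo sysctl --system      # also picks up /etc/sysctl.d/*.conf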

------
QUFB
I'd much prefer to see GCP add IPv6 support, which is sorely lacking.

~~~
wsh91
Have a look at [https://cloud.google.com/compute/docs/load-balancing/ipv6](https://cloud.google.com/compute/docs/load-balancing/ipv6).

(Disclaimer: I work on GCP, albeit not on networking stuff.)

~~~
daurnimator
That's not helpful for a wide range of use cases. The most recent one I ran
into was running an irc server that would be compatible with the matrix irc
bridge, see [https://github.com/matrix-org/matrix-appservice-irc/issues/2...](https://github.com/matrix-org/matrix-appservice-irc/issues/208)

------
gafferongames
This is great. Now please solve head-of-line blocking for time-critical data.

~~~
lern_too_spel
That's solved in HTTP/2 and other connection-multiplexing protocols.

~~~
manigandham
HTTP/2 runs on top of a single TCP connection, so it's still subject to TCP's
ordering requirements.

TCP will probably never have a mainstream solution to this; better to switch
to UDP or QUIC instead.

~~~
morecoffee
While I agree QUIC is a better long-term solution [1], saying TCP ordering
affects HTTP/2 is misleading. It is true, but it is quite easy to avoid the
bad behavior and HoL blocking by using TCP_NOTSENT_LOWAT (also created by
Google). For example, SPDY had a similar problem, which was ameliorated by
only sending when the watermark is low enough:

[https://insouciant.org/tech/prioritization-only-works-when-t...](https://insouciant.org/tech/prioritization-only-works-when-theres-pending-data-to-prioritize/)

[1]
[https://news.ycombinator.com/item?id=12282898](https://news.ycombinator.com/item?id=12282898)
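
For reference, on Linux the knob exists both per-socket (via setsockopt) and
system-wide; a minimal sketch of the sysctl form, with 16384 as an arbitrary
illustrative value:

    # cap un-sent data buffered per TCP socket at 16 KB (Linux 3.12+)
    sudo sysctl -w net.ipv4.tcp_notsent_lowat=16384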

------
anonymousDan
Sounds like it would be great for wireless ad hoc networks.

------
abpavel
A smack in the face of net neutrality, because the protocol hogs the bandwidth
at the expense of all other traffic.

It's like putting a tank on a standard regulation road and boasting about how
well it performs in ordinary city congestion, because you can simply roll over
other cars and trucks and run the red lights.

The beauty of classic TCP (i.e. Reno) is its ability to scale by fairly
sharing bandwidth among flows. When one party aggressively tries to take as
much of the bandwidth as possible, it is no longer fair, and it will simply
force netadmins to classify Google's protocols into more aggressive queues in
private while supplying fuel against net neutrality in public.

~~~
manigandham
That's a stretch - how do you know this? Especially since BBR has already been
live on google/youtube for a while.

This thread has some comments about it:
[https://news.ycombinator.com/item?id=14814616](https://news.ycombinator.com/item?id=14814616)

The kernel commit also has a note:
[https://github.com/torvalds/linux/commit/0f8782ea14974ce9926...](https://github.com/torvalds/linux/commit/0f8782ea14974ce992618b55f0c041ef43ed0b78#diff-f0cecf8927a4a24ebb8e8a4fe2983704)

"It can operate over LAN, WAN, cellular, wifi, or cable modem links. It can
coexist with flows that use loss-based congestion control, and can operate
with shallow buffers, deep buffers, bufferbloat, policers, or AQM schemes that
do not provide a delay signal."

