Because BBR relies on pacing in the network stack, make sure you do not combine BBR with any qdisc ("packet scheduler") other than fq. You will get very poor performance, lots of retransmits, and in general not very neighbourly behaviour if you use it with any of the other schedulers.
This requirement is going away in Linux 4.13, but until then blindly selecting BBR can be quite damaging.
The easiest way to ensure fq is used is to set the net.core.default_qdisc sysctl parameter to "fq" via /etc/sysctl.d/ or /etc/sysctl.conf, then reboot. Check by running `tc qdisc show`.
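For example, something along these lines (a rough sketch -- the file name under /etc/sysctl.d/ is arbitrary):

    # /etc/sysctl.d/90-fq.conf (example file name)
    net.core.default_qdisc = fq

    # after rebooting, verify that fq is now the root qdisc on your interfaces
    tc qdisc show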
A related warning: your NIC driver can break the qdisc layer (or at least mislead it).
Prior to 4.12, the virtio_net driver always orphaned skbs as they were transmitted (see start_xmit()). This causes the qdisc layer to believe packets are leaving the NIC immediately, until you've completely filled your Tx queue (at which point you will finally be paced at line rate, but with a queue-depth delay between the kernel's view of when a packet hit the wire and when it actually did).
After looking at the code -- even in 4.12, enabling Tx NAPI still seems to be gated behind a module parameter (napi_tx; a quick way to check is below).
(I'm not sure which other drivers might have the same issue -- my day job is limited to a handful of devices, and mostly on the device side rather than the driver side)
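If you want to see whether the virtio_net build on your machine exposes that parameter at all, something like this should tell you (assuming a reasonably standard distro kernel):

    # list virtio_net's module parameters and look for napi_tx
    modinfo -p virtio_net

    # if the module is currently loaded, the active value shows up here
    cat /sys/module/virtio_net/parameters/napi_tx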
Just loading the sysctl values will not switch the packet scheduler on already-existing network interfaces, but new sockets will start using BBR.
Switching the scheduler at runtime with tc qdisc replace is possible, but then you need to take extra care depending on whether the device is multi-queue (rough sketch below). Rather than explain it all here: just rebooting is probably simpler.
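For the curious, the runtime switch looks roughly like this (eth0 is just an example; on a multi-queue NIC the root qdisc is mq with one child per Tx queue, which is what makes it fiddly):

    # single-queue device: just replace the root qdisc
    tc qdisc replace dev eth0 root fq

    # multi-queue device: attach mq at the root with an explicit handle,
    # then hang an fq instance under each Tx queue (repeat for every queue)
    tc qdisc replace dev eth0 root handle 1: mq
    tc qdisc replace dev eth0 parent 1:1 fq
    tc qdisc replace dev eth0 parent 1:2 fq

As said above, setting net.core.default_qdisc and rebooting is the simpler route.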
BBR is, in my opinion, one of the most significant improvements to networking in recent years.
I've started using it on a couple of long-range routes (e.g. Switzerland to Ireland, Frankfurt to Singapore) with Gigabit servers on the Internet, and it turns unreliable ~200 Mbit/s transfer rates into reliable > 850 Mbit/s.
And all that's needed is `sudo modprobe tcp_bbr && sysctl -w net.ipv4.tcp_congestion_control=bbr`.
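To make that stick across reboots, something like this works on systemd-based distros (file names are just examples):

    # load the module at boot
    echo tcp_bbr | sudo tee /etc/modules-load.d/tcp_bbr.conf

    # make fq the default qdisc (pre-4.13 kernels need it for BBR)
    # and BBR the default congestion control
    printf 'net.core.default_qdisc = fq\nnet.ipv4.tcp_congestion_control = bbr\n' | sudo tee /etc/sysctl.d/90-bbr.conf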
Today’s Internet is not moving data as well as it should. TCP sends data at lower bandwidth because the 1980s-era algorithm assumes that packet loss means network congestion.
BBR models the network to send as fast as the available bandwidth and is 2700x faster than previous TCPs on a 10Gb, 100ms link with 1% loss. BBR powers google.com, youtube.com, and apps using Google Cloud Platform services.
Unlike prior protocol advancements such as QUIC, which required browser support, BBR is a server-side-only improvement, meaning you may already be benefiting from BBR without knowing it. BBR requires no changes from end users. This is especially relevant in the developing world, where older mobile platforms and limited bandwidth are common.
There have been a lot of modifications to TCP since the 1980s to let it push a lot more bandwidth on faster networks -- perhaps most notably window scaling.
How does BBR avoid killing other streams that happen to share the same pipe? It seems it would consume more than its fair share if the other TCP streams are using older algorithms.
p.s. presumably if you get 1% loss with no congestion there's wireless/mobile involved?
Do you know if any experimental results of sharing with the other congestion avoidance flavors are available somewhere? Historically this requirement for backwards compatibility has been a big problem. Maybe YouTube is getting better but other web traffic is getting hosed?
Cool. Thanks! I worked on a UDP congestion avoidance algorithm that had bandwidth/latency feedback built into the protocol and had to deal with some of the same issues.
As a Chinese user whose international bandwidth has been abysmal, BBR has been a godsend. The difference in speed when I turn on BBR on my shadowsocks server is astronomical.
> When a GCP customer uses Google Cloud Load Balancing or Google Cloud CDN to serve and load balance traffic for their website, the content is sent to users' browsers using BBR. This means faster webpage downloads for users of your site.
This makes it sound like BBR is only available for Google-managed services on GCP -- is that correct? Can I use BBR on GCE servers (where I can install the kernel module)? Seems like an odd thing to leave out.
Note that, in addition to Neal's instructions, you may want to load virtio_net with napi_tx=true
This makes virtio_net play more nicely with the qdisc layer. GCE requests moderately deep Tx queues (4096 descriptors) -- without the module param you can have up to a 4096-packet delay between actual and as-seen-by-qdisc Tx times.
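One way to make that persistent, for anyone who wants it (the file name is arbitrary, and if virtio_net is loaded from your initramfs you'll also need to regenerate it before the option takes effect at boot):

    # persist the module option
    echo 'options virtio_net napi_tx=1' | sudo tee /etc/modprobe.d/virtio_net.conf

    # after a reboot, confirm it took
    cat /sys/module/virtio_net/parameters/napi_tx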
Over the weekend I set up one of the new consumer mesh products that's available, the Linksys Velop, with 9 nodes covering a good-sized area between two homes.
One thing I've been noticing though is that there is considerable latency/packet loss at the moment (there is only one wired backhaul at the primary node and all of the other nodes are connected to each other wirelessly).
I've been running PingPlotter against all of the nodes, and there seems to be considerable packet loss (a few percent) and spikes in latency. The average for the two closest nodes to my laptop is about 15 ms, the middle ones a ways out are about 30-40 ms, and the furthest ones are at about 60 ms, but the spikes can be in the hundreds or even thousands of ms.
The area covered is about a 500 ft by 120 ft rectangle more or less (with my house on the bottom left of that rectangle and the other home on the bottom right of that rectangle).
My question would be...would this BBR algorithm help in some way to reduce the latency/packet loss in a situation like this? Or does it only apply for these other situations that Google would normally be encountering/dealing with?
Sounds like just bad physical-layer connectivity, nothing to do with TCP.
Most of my experience with wifi mesh is from years ago, with pure 2.4ghz stuff, back when it basically didn't work at all. How close are the nodes? Are there any long multi-hop chains? (Repeater talking to repeater talking to root. The more hops, the worse it works)
That doesn't seem like all that different of a problem. I'm certainly not an expert on BBR, but from reading the description, the design goals seem to explicitly include dealing with buffers better (by making efforts to not fill them to the max) and being less skittish about packet loss.
Specifically, the description (in the git commit) says it has a "congestion control algorithm that reacts to actual congestion, not packet loss or transient queue delay" and that it estimates the size of the queue it probably created and paces packets in order to "utilize the pipe without creating excess queue". (And that last part addresses large buffers. Even if they can't be turned off, they only fill up if you send enough data to fill them.)
Obviously it isn't magic and there is only so much any algorithm can do in the face of a cruddy physical network layer, but the traditional algorithm makes a bad situation much worse than it has to be, so there is still the potential for a newer algorithm (like BBR) to make a big improvement.
Anyway, more to the point, cruddy wifi is what a lot of people use to browse the web, so it's not surprising to me if Google tried to account for that in their design.
In this case do you mean the firmware provided by Linksys for the hardware, or an additional layer of firmware embedded in the wireless hardware in some way? From what I can see on the router's sysinfo page, the Velop appears to be running some form of OpenWrt, so it seems like they would have the ability to customize/tweak the buffering settings in some way. (I took Georgia Tech's networking class about two years ago; it was pretty neat to learn about the bufferbloat problem in that course and how bigger buffers weren't necessarily better for performance.)
The radio's firmware, not the Linux OS running on the application processor. All 802.11ac radios have closed-source firmware even when there are open-source Linux drivers to communicate with the NIC. The 802.11n chipsets by Atheros didn't use proprietary firmware and exposed a fairly low level interface to the host system. This led to the open-source ath9k Linux driver being the platform of choice for people trying to fix WiFi in general or improve the Linux WiFi stack in specific.
Ha, as soon as I saw this I was hoping you were going to chime in!
May I ask if you have any thoughts on BBR? In what ways is networking different from when you published yours that might warrant (or not!) another congestion control algorithm?
While I agree QUIC is a better long-term solution[1], saying TCP ordering affects HTTP/2 is misleading. It is true, but it is quite easy to avoid the bad behavior (HoL blocking) by using TCP_NOTSENT_LOWAT (also created by Google). For example, SPDY had a similar problem, which was ameliorated by only sending once the watermark is low enough.
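For reference, the watermark can be set system-wide as well as per socket; a minimal sketch (16384 bytes is purely an illustrative value, not a recommendation):

    # system-wide default for the amount of unsent data allowed to sit
    # in a socket's write queue, in bytes
    sudo sysctl -w net.ipv4.tcp_notsent_lowat=16384

    # per connection, an application sets the same limit with
    # setsockopt(fd, IPPROTO_TCP, TCP_NOTSENT_LOWAT, &lowat, sizeof(lowat))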
Only sort of. You still have some HOL blocking problems with HTTP2 because you're still running on top of TCP. I recall this explicitly being one of the selling points of QUIC.
A smack in the face of net neutrality, because the protocol hogs the bandwidth at expense of all other traffic.
It's like putting a tank on a standard regulation road, and boasting about how well it performs in the standard city congestion environment, because you simply can roll over other cars and trucks and run the red lights.
The beauty of classic TCP (i.e. Reno) is its ability to scale by fairly sharing bandwidth among flows. When one party aggressively tries to utilize as much of the bandwidth as possible, it is no longer fair, and it will simply force netadmins to classify Google's protocols into more aggressive queues in private, and supply fuel against net neutrality in public.
"It can operate over LAN, WAN, cellular, wifi, or cable modem links. It can coexist with flows that use loss-based congestion control, and can operate with shallow buffers, deep buffers, bufferbloat, policers, or AQM schemes that do not provide a delay signal."
Correct me if I'm wrong, but I think it would work like this:
Say client A (loss-based) and client B (BBR) are on the same congested network:
A would fill the bottleneck buffer until the buffer overflows, then back off quickly due to the high number of dropped packets. This creates a sawtooth-like pattern of gradual ramp-up and sharp falloff.
B would detect the bottleneck bandwidth and the RTT, so it knows to back off before the bottleneck buffer overflows. Then, while A is slowly ramping up again, B would detect that there's no congestion and send more traffic. B would then gradually back off as A fills the queue again, and so on.
If this is right, then BBR would co-exist well enough with connections with loss-based algorithms.
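If anyone wants to eyeball this themselves, iperf3 lets you pick the congestion control per flow on Linux, so a crude coexistence test is easy to set up (the host name and ports below are made up; run the two client commands at the same time, e.g. in two shells):

    # on the receiver
    iperf3 -s -p 5201 &
    iperf3 -s -p 5202 &

    # two senders sharing the same bottleneck: one loss-based (cubic), one BBR
    iperf3 -c receiver.example -p 5201 -t 60 -C cubic
    iperf3 -c receiver.example -p 5202 -t 60 -C bbr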
Source for the fq/qdisc warning at the top of the thread: bottom note of https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...