Informally, the basic problem is that pure datagram networks with no backpressure, which describes the Internet, handle congestion badly. A big first-in, first-out queue at a choke point works especially badly. That's "bufferbloat".
Back in 1985, I proposed "fair queuing" (a term I coined) as a step to a solution.
Fair queuing is simply identifying "flows" (packets with the same endpoints, which may be IP addresses or TCP/UDP ports), giving them individual queues, and servicing the queues fairly. I also proposed making TCP congestion-aware, a new idea at the time. That was enough to deal with the problems of the 1980s and 1990s. I did not foresee a future where people would be trying to run Netflix, Fortnite, and VoIP on the same cable modem connection at the same time.
The Internet works only because most of the congestion is at the edges, where the user's LAN feeds an ISP connection with less bandwidth than the LAN. Most packets are lost there. If they're lost further upstream (say at the cable headend), the problems are much worse. We still can't deal with congestion in the middle of the network. Fortunately, fiber optic bulk bandwidth is cheap enough to prevent that from being the big bottleneck.
Most of the "bufferbloat" aftermarket fixes work by assuming the ISP connection has a fixed data capacity. So the user-side gear does rate-limiting, reordering, and dropping packets to handle congestion locally, to prevent the dumb FIFO queue in the ISP's edge router from building up. This can work if the ISP connection has constant outgoing bandwidth. If that varies, as on an overloaded cable segment, there's going to be trouble. And, of course, the ISP connection doesn't tell the user side nodes it's congested. So there's a lot of guessing and tweaking involved, which is why none of these fixes Just Work.
There are two levels of troublesome FIFO queue - within each host on the LAN, and at the router that connects to the ISP link. Each host has to decide what to send first. The default is FIFO, which, as noted, sucks. Then the router has to decide which packets from which local nodes to send up the ISP link first. Most ISP-provided routers are still FIFO, although some are more intelligent. A basic property of FIFO queues is that the one who sends the most wins. The nice guys who aren't blasting stuff up the pipe get squeezed out. This is why your VoIP stutters.
So there's a trend towards front-ending the ISP's router with another box to do traffic-shaping, which means reordering and dropping packets. That's what this article is about.
There are commercial "gamer routers" which do this, and firmware for various routers.
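In practice those shaper boxes mostly boil down to a few tc commands. A minimal sketch of the idea on a Linux box (the interface name and rates are assumptions; the point is to shape to slightly below your real uplink so the queue builds where you can manage it):

```shell
# Assumes eth0 is the WAN-facing interface and the uplink is ~18 Mbit.
# Shape egress to just under the ISP rate so packets queue here, in a
# fair-queued qdisc, instead of in the modem's dumb FIFO.
tc qdisc replace dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 17mbit ceil 17mbit
tc qdisc add dev eth0 parent 1:10 fq_codel
```

This is the guessing the parent comments describe: the 17mbit number has to be tuned by hand, and it goes wrong when the real uplink rate varies.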
Now, with the ability to shape the traffic, the question is what to do. Basic fair queuing prevents a big stream from squeezing out a small stream. That's step one, and that's what the parent article is talking about. He's stopped Speed Test from squeezing out his pings.
But that may not be enough. If one node is frantically making large numbers of short HTTP connections, the usual case for an ad-heavy and tracker-heavy web page, those may all look like separate flows to the router and get a big fraction of the bandwidth. That's no good.
Now you have to start defining policy rules, which is a huge pain.
Some of the "gamer routers" come with policy rules that know too much about specific games. Move and shoot packets get priority over texture updates. It's often enough to prioritize UDP packets over TCP until a UDP flow hits some relatively low bandwidth limit. If you can give a game low latency for the most important 5% of its traffic, it will often play well.
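That "prioritize a little UDP" idea can be sketched on a Linux router without game-specific rules. Everything here is a placeholder (interface, port, rate), not a recommendation:

```shell
# Sketch only: eth0, UDP port 3074, and the 18 Mbit rate are assumptions.
# Mark the game's UDP traffic with a higher-priority DSCP class:
iptables -t mangle -A POSTROUTING -o eth0 -p udp --dport 3074 \
         -j DSCP --set-dscp-class CS4
# cake's diffserv4 mode honors DSCP marks but limits how much bandwidth
# each priority tin can claim, so the "low bandwidth limit" comes built in:
tc qdisc replace dev eth0 root cake bandwidth 18mbit diffserv4
```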
Arguably, each host on the local network should prioritize its own outgoing traffic, leaving the next router upstream to deal with prioritization between nodes. But that requires each node to know something about what the next router upstream is doing. There's no mechanism for this. All players are guessing what the other players are doing by observing round trip time and latency.
They don't talk to each other about this.
And that is why this area is still a mess.
If we could get fq_codel on every router, modem and switch, then the problem would be gone and we wouldn't need any of the fancier TCP congestion control algorithms (but ECN-capable TCPs would still be nice to have).
If we could get BBR on every server and client device, then that would mostly solve bufferbloat (for TCP traffic), but probably wouldn't be as good as having fq_codel throughout the network.
Having partial deployments of both is sub-optimal, but even the potential negative interactions between delay-based TCP congestion control and delay-eliminating AQM should still be better than not having either and suffering the full effects of bufferbloat.
Even on current hardware, anything that's currently handling packets with a CPU instead of fixed-function hardware should be able to add CoDel without sacrificing too much throughput. Home routers running SQM-style QoS run into performance problems not because of CoDel or fq_codel but because of the traffic shaping. When everything has AQM, you no longer need a traffic shaper on your home gateway router, just AQM, and the CPU requirements for that are vastly lower.
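On a Linux box, "just AQM, no shaper" is a couple of commands (interface name assumed):

```shell
# Make fq_codel the default qdisc for new interfaces, and apply it now:
sysctl -w net.core.default_qdisc=fq_codel
tc qdisc replace dev eth0 root fq_codel
# The statistics show drops and ECN marks, i.e. the AQM actually working:
tc -s qdisc show dev eth0
```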
1) If we had sufficient backpressure in the ISP-provided router's network driver, with "BQL" to manage the ring buffers and fq_codel to do fq + aqm, we're mostly done, at least on the uplink.
The hope has been, since most home routers run linux, that within 5 years of fq_codel (RFC 8290) entering mainline linux it would appear in ISP gear. fq_codel is now the default on most linux distros, but without the backpressure from BQL, or running below line rate, it's not effective unless further configured or shaped.
2) I wish having your own shaper was not "a trend". bql and fq_codel are lightweight compared to shaping.
3) sch_cake solves a bunch of remaining problems: A) doing both flow- and host-based fairness at the same time, so the host frantically issuing http requests only gets a 1/(number of hosts) share of the bandwidth. B) using a deficit shaper that consistently runs at a rate slightly less than the (ubiquitous) token bucket shaper ISPs use, so it can share the same setpoint but control the queue.
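For reference, a sketch of what that looks like with sch_cake on a router's WAN interface (interface and rate are assumptions):

```shell
# Egress: per-host plus per-flow fairness, shaped just under the ISP's
# token bucket rate. 'dual-srchost' splits bandwidth fairly between LAN
# hosts (the senders on egress); 'nat' lets cake see pre-NAT addresses
# so that per-host fairness still works behind the NAT.
tc qdisc replace dev eth0 root cake bandwidth 18mbit dual-srchost nat
```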
4) we have a lot of wifi devices now doing fq_codel by default. Very happy with the results.
Perhaps on RFC 970's 40th anniversary, we can view these problems as solved!
Perhaps the right question to ask is "What do you really need to know from the next node upstream?" Suppose you could query an upstream router for congestion info, and get back:
- In the last N0 seconds, this node received N1 packets from your IP address, dropped N2 for congestion reasons, dropped N3 for other reasons, and forwarded N4.
- Total bytes from you, N5. Total bytes forwarded, N6.
- Maximum link capacity, N7 bytes/sec. Your upload limit, N8 bytes/sec.
- For forwarded packets, min/max/avg delay time, N9.
- If QoS is supported (DSCP field meaningful), this info is repeated for each QoS level that does anything.
That's basically the information you need to tune a "bufferbloat" algorithm and evaluate how well it is working. The latter is important. In the real world, everybody is guessing about this. If you know, you can tune.
Some simple mechanism for that, perhaps a new ICMP message type, would be useful.
As long as the query packet contains a nonce, so you can match queries with replies, there's no real security issue. Anything which sees that packet sees the traffic anyway. One packet in, one packet out has no DDOS amplification potential. If a router doesn't understand the query, there's no reply, and the bufferbloat controller has to guess.
1. Enable BBR with the fq qdisc:

cat << EOF >> /etc/sysctl.conf
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
EOF
sudo sysctl -p
2. net.ipv4.tcp_congestion_control=bbr applies to IPv6 too.
3. You must set the queuing discipline to fq or it won't work.
net.core.default_qdisc = fq_codel
net.ipv4.tcp_congestion_control = cubic
BBR will run well with fq_codel only after linux-4.13 (not yet released)
Before linux-4.13, BBR _needs_ fq packet scheduler.
fq is not fq_codel.
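Assuming Linux, you can check which combination you're actually running (eth0 is a placeholder):

```shell
# What congestion control and default qdisc are configured:
sysctl net.ipv4.tcp_congestion_control net.core.default_qdisc
# What qdisc is actually attached to the interface right now.
# Before 4.13, BBR needs this to say 'fq', not 'fq_codel'.
tc qdisc show dev eth0
```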
The way these congestion algorithms work, you've got to let the algorithm limit your bandwidth to ~95% of what's actually available. If I always got exactly 110 Mbps, I'd be okay with that. The problem is that I've got to limit it to 80 Mbps or so for it to be useful during the day. And there's no way I'm going to do that and lose 40-50 Mbps of my bandwidth every night.
Widely varying connection speeds are common enough that I'm surprised many people have found these algorithms useful. I wish there was one intelligent enough to respond dynamically to however much bandwidth is available.
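The dynamic version can be sketched: re-measure capacity periodically and re-point the shaper at ~95% of it. Everything here is an assumption (interface name, the measured numbers, even that you're using cake); note that sch_cake's autorate-ingress option attempts something similar automatically for the download direction.

```shell
# Sketch, not a turnkey tool. IFACE and the capacity figure are assumptions;
# in real use the number would come from a periodic capacity probe.
shape_cmd() {
    # $1 = measured capacity in kbit/s; shape to 95% so the queue stays
    # in our own fair-queued box rather than in the ISP's FIFO.
    setpoint=$(( $1 * 95 / 100 ))
    echo "tc qdisc change dev ${IFACE} root cake bandwidth ${setpoint}kbit"
}
IFACE=eth0
# Re-run whenever a capacity probe (say, a quiet-hour speed test) finishes:
shape_cmd 110000
```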
I just now experimented with creating only the outgoing queue on my external interface. That's the slow direction. I went from a D in bufferbloat to an A, with just that one line addition.
Can you live with reducing just your upload speed? I sure can, I rarely upload much. But even if I were doing large amounts of cloud backup, it might not matter. If I go from 5000K to 4500K, is that really such a loss? Am I going to cry over 10%?
And if there is congestion such that my actual upload speed falls below 4500K, it's possible, maybe even likely, that I'm no worse off than before I created my upload queue.
Unfortunately the DSL Reports speedtest only tests one direction at a time. So maybe my fix doesn't work well if there is significant bidirectional simultaneous traffic?
Fortunately I'm running OpenBSD on my firewall, so it was very easy to experiment with this.
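For anyone curious what that "one line" looks like: on OpenBSD it's a pf.conf queue rule. The interface name and the 4500K rate below are assumptions matching the numbers above, and the flows keyword is what turns on flow queuing:

```
# pf.conf sketch: shape egress on the external interface to just under
# the real upload rate, with flow queuing so no one flow hogs the queue.
queue outq on em0 bandwidth 4500K max 4500K flows 1024 qlimit 1024 default
```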
If so, explore your cable modem and see if it's possible to manually configure the link speed to 100 Mbps (leave duplex set to auto-negotiation, though). This should cause your link to always auto-negotiate to 100/full and, at that point, bufferbloat (in that direction, at least) should (in theory) be much less of an issue (or potentially even non-existent!).
Unfortunately, some cable modems have very few configuration options that can be controlled or modified by the end user (they download their configurations via TFTP at startup, as you probably know) -- especially when it comes to ISP-provided (or, worse, "ISP-customized") cable modems.
Anyone using Ubiquiti Edge X routers can apply this too (Linux under the hood) in almost the exact same fashion. Works great even on slow DSL links.
With 2 lines added to conf, I managed to go from C, D grades to A almost instantly.
Consumers shop for routers and ISPs based on things besides QoS.
It supports Realtek USB adapters, and I remember how easy it was to enable traffic shaping with pf and always have lag-free ssh consoles!
So for many consumers, it doesn't seem bufferbloat will be fixed anytime soon.
One thing we could do is educate the customer and hope that Speedtest and Netflix's Fast.com include a bufferbloat grade like the one in the DSLReports test.
I'm going to experiment a bit with my openbsd firewall.