Fixing bufferbloat on your home network with OpenBSD 6.2 or newer (pauladamsmith.com)
163 points by paulsmith 8 months ago | 48 comments

It's discouraging to me that, 33 years after I first identified this problem [1], there are no good solutions in wide use.

Informally, the basic problem is that pure datagram networks with no backpressure, which describes the Internet, handle congestion badly. A big first-in, first-out queue at a choke point works especially badly. That's "bufferbloat".

Back in 1985, I proposed "fair queuing" (a term I coined) as a step to a solution. Fair queuing is simply identifying "flows" (packets with the same endpoints, which may be IP addresses or TCP/UDP ports), giving them individual queues, and servicing the queues fairly. I also proposed making TCP congestion-aware, a new idea at the time.[2] That was enough to deal with the problems of the 1980s and 1990s. I did not foresee a future where people would be trying to run Netflix, Fortnite, and VoIP on the same cable modem connection at the same time.
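Mechanically, the scheme is simple. A toy Python sketch of the idea, purely illustrative (real implementations hash the 5-tuple into a fixed number of buckets rather than keeping a dict of exact flows):

```python
from collections import deque

class FairQueue:
    """Toy fair queuing: one queue per flow, serviced round-robin."""
    def __init__(self):
        self.flows = {}        # flow 5-tuple -> deque of packets
        self.order = deque()   # round-robin order of active flows

    def enqueue(self, src, dst, sport, dport, proto, pkt):
        flow = (src, dst, sport, dport, proto)
        if flow not in self.flows:
            self.flows[flow] = deque()
            self.order.append(flow)
        self.flows[flow].append(pkt)

    def dequeue(self):
        if not self.order:
            return None
        flow = self.order.popleft()
        q = self.flows[flow]
        pkt = q.popleft()
        if q:                      # still backlogged: back of the line
            self.order.append(flow)
        else:
            del self.flows[flow]
        return pkt
```

With a bulk flow holding three queued packets and a VoIP flow holding one, dequeue order interleaves the flows instead of making the VoIP packet wait behind the whole bulk backlog, which is the FIFO behavior.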

The Internet works only because most of the congestion is at the edges, where the user's LAN feeds an ISP connection with less bandwidth than LAN. Most packets are lost there. If they're lost further upstream (say at the cable headend), the problems are much worse. We still can't deal with congestion in the middle of the network. Fortunately, fiber optic bulk bandwidth is cheap enough to prevent that from being the big bottleneck.

Most of the "bufferbloat" aftermarket fixes work by assuming the ISP connection has a fixed data capacity. So the user-side gear does rate-limiting, reordering, and dropping packets to handle congestion locally, to prevent the dumb FIFO queue in the ISP's edge router from building up. This can work if the ISP connection has constant outgoing bandwidth. If that varies, as on an overloaded cable segment, there's going to be trouble. And, of course, the ISP connection doesn't tell the user side nodes it's congested. So there's a lot of guessing and tweaking involved, which is why none of these fixes Just Work.
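To make the guessing concrete: these aftermarket fixes boil down to a token bucket pacing traffic slightly below the assumed ISP rate, so the queue builds locally where smart scheduling can act on it. A minimal sketch; `rate_bps` is the user's guess at uplink capacity minus headroom, and that guess is exactly what breaks when cable bandwidth varies:

```python
class TokenBucketShaper:
    """Pace output below the assumed ISP uplink rate so the queue builds
    here (where we control it) rather than in the ISP's dumb FIFO.
    rate_bps is a user-supplied guess, not anything the ISP signals."""
    def __init__(self, rate_bps, burst_bytes, now):
        self.rate = rate_bps / 8.0      # bytes per second
        self.burst = burst_bytes
        self.tokens = burst_bytes
        self.last = now

    def try_send(self, size, now):
        # refill tokens for elapsed time, capped at the burst size
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False   # caller holds the packet in its own smart queue
```

When `try_send` returns False, the packet waits in the local (fair/AQM-managed) queue instead of piling up in the ISP's FIFO.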

There are two levels of troublesome FIFO queue - within each host on the LAN, and at the router that connects to the ISP link. Each host has to decide what to send first. The default is FIFO, which, as noted, sucks. Then the router has to decide which packets from which local nodes to send up the ISP link first. Most ISP-provided routers are still FIFO, although some are more intelligent. A basic property of FIFO queues is that the one who sends the most wins. The nice guys who aren't blasting stuff up the pipe get squeezed out. This is why your VoIP stutters.

So there's a trend towards front-ending the ISP's router with another box to do traffic-shaping, which means reordering and dropping packets. That's what this article is about. There are commercial "gamer routers" which do this, and firmware for various routers.

Now, with the ability to shape the traffic, the question is what to do. Basic fair queuing prevents a big stream from squeezing out a small stream. That's step one, and that's what the parent article is talking about. He's stopped Speed Test from squeezing out his pings.

But that may not be enough. If one node is frantically making large numbers of short HTTP connections, the usual case for an ad-heavy and tracker-heavy web page, those may all look like separate flows to the router and get a big fraction of the bandwidth. That's no good. Now you have to start defining policy rules, which is a huge pain.

Some of the "gamer routers" come with policy rules that know too much about specific games. Move and shoot packets get priority over texture updates. It's often enough to prioritize UDP packets over TCP until a UDP flow hits some relatively low bandwidth limit. If you can give a game low latency for the most important 5% of its traffic, it will often play well.
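That UDP-up-to-a-cap policy is only a few lines of classifier logic. A hypothetical sketch, where the 1 Mbit/s cap and the one-second accounting window are my assumptions, not anyone's shipping defaults:

```python
UDP_PRIORITY_CAP_BPS = 1_000_000   # assumed per-flow cap, ~1 Mbit/s

class GamePolicy:
    """Hypothetical classifier: UDP flows ride the priority queue until
    they exceed a low rate cap, then fall to best-effort, so bulk UDP
    can't hog the link. Byte counts assume a one-second accounting
    window; the periodic reset is omitted for brevity."""
    def __init__(self):
        self.udp_bytes = {}   # flow id -> bytes seen this window

    def queue_for(self, proto, flow, size):
        if proto != "udp":
            return "bulk"
        used = self.udp_bytes.get(flow, 0) + size
        self.udp_bytes[flow] = used
        return "priority" if used * 8 <= UDP_PRIORITY_CAP_BPS else "bulk"
```

The point is that the game's small, latency-critical packets stay under the cap and keep priority, while anything blasting UDP at bulk rates gets demoted.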

Arguably, each host on the local network should prioritize its own outgoing traffic, leaving the next router upstream to deal with prioritization between nodes. But that requires each node to know something about what the next router upstream is doing. There's no mechanism for this. All players are guessing what the other players are doing by observing round trip time and latency. They don't talk to each other about this.

And that is why this area is still a mess.

John Nagle

[1] https://tools.ietf.org/html/rfc970 [2] https://tools.ietf.org/html/rfc896

Would you opine on algorithms to control queues (e.g. RED, CoDel), and to respond to congestion (e.g. CUBIC, BBR; at the TCP level, correct?)? Will they get anywhere? Or is a totally new approach needed?


I don't think a totally new approach is needed. We already have what we need to completely solve bufferbloat. What we're missing is universal deployment.

If we could get fq_codel on every router, modem and switch, then the problem would be gone and we wouldn't need any of the fancier TCP congestion control algorithms (but ECN-capable TCPs would still be nice to have).

If we could get BBR on every server and client device, then that would mostly solve bufferbloat (for TCP traffic), but probably wouldn't be as good as having fq_codel throughout the network.

Having partial deployments of both is sub-optimal, but even the potential negative interactions between delay-based TCP congestion control and delay-eliminating AQM should still be better than not having either and suffering the full effects of bufferbloat.

How do you handle CoDel at multigigabit transfer rates on limited hardware?

The ideal solution is to put CoDel in hardware where it can be cheap. FQ-CoDel would be better, but would require many times more silicon. It's probably worth it, but it's a harder proposition to sell to the ASIC designers.

Even on current hardware, anything that's currently handling packets with a CPU instead of fixed-function hardware should be able to add CoDel without sacrificing too much throughput. Home routers running SQM-style QoS run into performance problems not because of CoDel or fq_codel but because of the traffic shaping. When everything has AQM, you no longer need a traffic shaper on your home gateway router, just AQM, and the CPU requirements for that are vastly lower.
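For reference, the control law CoDel applies at dequeue time is genuinely cheap. A loose Python sketch, simplified from the RFC 8289 state machine (TARGET and INTERVAL are the standard 5 ms / 100 ms defaults; real implementations also re-check sojourn time on every drop):

```python
import math

TARGET = 0.005     # 5 ms: acceptable standing queue delay
INTERVAL = 0.100   # 100 ms: delay must persist this long before dropping

class CoDel:
    """Simplified CoDel drop decision, evaluated per dequeued packet."""
    def __init__(self):
        self.first_above = None   # deadline set when delay first exceeds TARGET
        self.dropping = False
        self.count = 0            # drops in the current dropping state
        self.drop_next = 0.0

    def should_drop(self, sojourn, now):
        if sojourn < TARGET:
            # queue has drained: leave dropping state
            self.first_above = None
            self.dropping = False
            return False
        if self.first_above is None:
            self.first_above = now + INTERVAL
            return False
        if not self.dropping and now >= self.first_above:
            # delay persisted for a full interval: start dropping
            self.dropping = True
            self.count = 1
            self.drop_next = now + INTERVAL / math.sqrt(self.count)
            return True
        if self.dropping and now >= self.drop_next:
            # drop faster and faster until delay comes down
            self.count += 1
            self.drop_next = now + INTERVAL / math.sqrt(self.count)
            return True
        return False
```

No per-flow state, no configuration knobs, just a couple of timestamps and a square root, which is why it's plausible even in an ASIC.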

I'm not current on this and don't have the test setups to collect data, so I can't say much. All those feedback algorithms interacting implies not much predictability.

That RFC is older than me. I've spent the last few years working full time on TCP at a CDN. Giant, shoulders, et al. Thanks!

That RFC is like meeting at a Pink Floyd concert. You meet three generations! And it's not done yet!

Always nice to see you, john.

1) If we had sufficient backpressure in the ISP-provided router's network driver, with "BQL" to manage the ring buffers and fq_codel to do fq + aqm, we're mostly done, at least on the uplink.

The hope has been, since most home routers run Linux, that 5 years after fq_codel (now RFC 8290) entered mainline Linux, it would appear in ISP gear. fq_codel is now the default on most Linux distros, but without the backpressure from BQL, or running below line rate, it's not effective unless further configured or shaped.

2) I wish having your own shaper was not "a trend". bql and fq_codel are lightweight compared to shaping.

3) sch_cake solves a bunch of remaining problems: A) doing both flow- and host-based fairness at the same time, so your host frantically issuing http requests only gets 1/(number of hosts) of the bandwidth. B) using a deficit shaper that consistently runs at a rate slightly less than the (ubiquitous) token bucket shaper ISPs use, so it can share the same setpoint but control the queue.


4) we have a lot of wifi devices now doing fq_codel by default. Very happy with the results. https://arxiv.org/pdf/1703.00064.pdf

Perhaps on rfc970's 40th anniversary, we can view these problems as solved!

Do you happen to have an opinion on NDN [0]? I think that they have a very good plan, but their current output seems to consist mostly of graduate students pushing overly-complex convolutions rather than searching for simple solutions.

[0] https://named-data.net/

It is a very good observation to correlate John's comment about upstream/downstream information exchange with NDN. NDN can indeed do that. Have a look at this way to get NDN-like networks using IPv6: https://datatracker.ietf.org/doc/draft-muscariello-intarea-h... https://wiki.fd.io/view/cicn

NDN addresses a different issue. It's more like a design for a content-delivery network. How would that help with upward bottlenecks on asymmetrical home connections?

Perhaps the right question to ask is "What do you really need to know from the next node upstream?" Suppose you could query an upstream router for congestion info, and get back:

- In the last N0 seconds, this node received packets from your IP address; of those, it dropped N1 for congestion reasons, dropped N2 for other reasons, and forwarded N3.

- Total bytes from you, N4. Total bytes forwarded, N5.

- Maximum link capacity bytes/sec, N6. Your upload limit is N7 bytes/sec.

- For forwarded packets, min/max/avg delay time N8.

- If QoS is supported (DSCP field meaningful), this info is repeated for each QoS level that does anything.

That's basically the information you need to tune a "bufferbloat" algorithm and evaluate how well it is working. The latter is important. In the real world, everybody is guessing about this. If you know, you can tune.

Some simple mechanism for that, perhaps a new ICMP message type, would be useful. As long as the query packet contains a nonce, so you can match queries with replies, there's no real security issue. Anything which sees that packet sees the traffic anyway. One packet in, one packet out has no DDOS amplification potential. If a router doesn't understand the query, there's no reply, and the bufferbloat controller has to guess.
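To illustrate, the counters above fit in one small fixed-size reply. A purely hypothetical encoding (no such message type exists in any RFC; field layout and sizes are invented for this sketch):

```python
# Hypothetical wire format for the congestion-report reply floated
# above. Fields: nonce, window seconds, packets dropped-for-congestion /
# dropped-other / forwarded, bytes from you / bytes forwarded, link
# capacity / your upload limit (bytes per sec), min/max/avg delay (us).
import os
import struct

REPORT = struct.Struct("!Q I 3I 2Q 2Q 3I")   # 68 bytes, network order

def make_query():
    return os.urandom(8)   # 8-byte nonce to match replies to queries

def make_report(nonce, window_s, pkts, byte_counts, caps, delays_us):
    return REPORT.pack(struct.unpack("!Q", nonce)[0], window_s,
                       *pkts, *byte_counts, *caps, *delays_us)

def parse_report(buf):
    f = REPORT.unpack(buf)
    return {"nonce": f[0], "window_s": f[1],
            "pkts": f[2:5],        # dropped-congestion, dropped-other, fwd
            "bytes_io": f[5:7],    # from you, forwarded
            "caps": f[7:9],        # link capacity, your limit
            "delay_us": f[9:12]}   # min, max, avg
```

One query packet in, one 68-byte reply out, and the nonce ties them together, matching the no-amplification property described above.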

NDN has a request/reply model with symmetric forwarding. This means that an upstream network provides feedback to the downstream one. NDN has nothing to do with content delivery.

Fix it on Linux by enabling BBR TCP congestion control.

  cat << EOF | sudo tee -a /etc/sysctl.conf
  net.ipv4.tcp_congestion_control=bbr
  net.core.default_qdisc=fq
  EOF
  sudo sysctl -p
1. You will need a recent kernel.

2. net.ipv4.tcp_congestion_control=bbr applies to IPv6 too.

3. You must set the queuing discipline to fq or it won't work.

Running 30mbps fibre and getting marginally better results (upload especially) with these settings:

    net.core.default_qdisc = fq_model
    net.ipv4.tcp_congestion_control = cubic

CUBIC tends to produce funny sawtooth transfer-rate patterns on an uneven connection, especially in the presence of packet loss and unexpected congestion.

Guessing you meant fq_codel instead of fq_model?

Yeah, good catch.

Doesn't this have to be done on the sending computer? Doing it on your computer would make a difference only when uploading; when downloading, the server you're downloading from would also need this setting.

300 mbit connection, ubuntu 18.04, linux-4.15.0 and I still get an F after the above tweaks.

On newer kernels it's no longer required to install the "fq" qdisc to use BBR.

More specifically, it appears to require this commit[1] that's in kernels 4.13 and later.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-ne...

Interesting! From a related discussion in 2017 on the bbr-dev mailing list:

  BBR will run well with fq_codel only after linux-4.13 (not yet released)

  Before linux-4.13, BBR _needs_  fq packet scheduler.

  fq is not fq_codel.
See: https://groups.google.com/forum/#!topic/bbr-dev/4jL4ropdOV8

Does this work for linux routers or just the endpoint machine?

Just the endpoint machine.

I'm quite aware that I have bad bufferbloat problems. Unfortunately suggestions like these are kind of useless to me, and I'm surprised that they're not useless for more people. My (cable) residential internet averages about 110mbps. However, sometimes during the day it will go down to 85mbps or so, and later at night it's usually about 120mbps, and I've seen up to 135.

The way these congestion algorithms work, you've got to let the algorithm limit your bandwidth to ~95% of what's actually available. If I always got exactly 110mbps, I'd be okay with that. The problem is that I've got to limit it to 80 mbps or so for it to be useful during the day. And there's no way I'm going to do that and lose 40-50 mbps of my bandwidth every night.

Widely varying connection speeds are common enough that I'm surprised many people have found these algorithms useful. I wish there was one intelligent enough to respond dynamically to however much bandwidth is available.

My cable is very asymmetric: 150/5 nominal, 170/6 best case.

I just now experimented with creating only the outgoing queue on my external interface. That's the slow direction. I went from a D in bufferbloat to an A, with just that one line addition.

Can you live with reducing just your upload speed? I sure can, I rarely upload much. But even if I were doing large amounts of cloud backup, it might not matter. If I go from 5000K to 4500K, is that really such a loss? Am I going to cry over 10%?

And if there is congestion such that my actual upload speed falls below 4500K, it's possible, maybe even likely, that I'm no worse off than before I created my upload queue.

Unfortunately the DSL Reports speedtest only tests one direction at a time. So maybe my fix doesn't work well if there is significant bidirectional simultaneous traffic?

Fortunately I'm running OpenBSD on my firewall, so it was very easy to experiment with this.

I'm running OPNsense here, so it's also fairly easy to manage. If I interpreted the DSL Reports test correctly, it looks like I see bufferbloat (median ~1.5 seconds) when downloading, but not a significant amount when uploading. So I don't think shaping only the outgoing traffic would help me. But maybe I'm misunderstanding.

we designed flent and the rrul series of tests to be able to look at these issues hard and scientifically. See flent.org for the tool.

we also added ack filtering to "sch_cake" recently. do hope someone ports it to bsd.

Would 100 Mbps (down) be acceptable?

If so, explore your cable modem and see if it's possible to manually configure the link speed to 100 Mbps (leave duplex set to auto-negotiation, though). This should cause your link to always auto-negotiate to 100/full and, at that point, bufferbloat (in that direction, at least) should (in theory) be much less of an issue (or potentially even non-existent!).

Unfortunately, some cable modems have very few configuration options that can be controlled or modified by the end user (they download their configurations via TFTP at startup, as you probably know) -- especially when it comes to ISP-provided (or, worse, "ISP-customized") cable modems.

You're correct. Even though I have a decent third-party modem that I own, it lacks any real configurability because most of the options get locked away by TWC.

Great article. OpenBSD makes it so simple to do this.

Anyone using Ubiquiti Edge X routers can apply this too (Linux under the hood) in almost the exact same fashion. Works great even on slow DSL links.

Also very easy on OpenWRT - just add luci-app-sqm and sqm-scripts and then turn it on, select your codel and apply.

what exactly would you do in EdgerouterX ? Do you mind sharing the commands or point to some link?

Just enable smart queue QoS in the web interface. The ER-X should be able to handle links up to about 150-180Mbit with Smart Queue QoS enabled.

I have a new edgerouter 6p, would love to know what you did to improve bufferbloat.

Just enable smart queue QoS in the web interface. The ER 6P should be able to handle links up to about 300Mbit with Smart Queue QoS enabled.

Didn't know about this issue until I saw this article.

With 2 lines added to conf, I managed to go from C, D grades to A almost instantly.

Same, why haven’t the defaults changed after all these years?

The last thing an ISP wants is for all its users to move to a slower internet connection because a good QoS setup lets them avoid the lag/stutter of VoIP, video calls, gaming, etc.

Consumers shop for routers and ISPs based on things besides QoS.

Because you need to specify the bandwidth, which the router maker cannot know?

Not really, fq_codel + bbr can work based on RTT changes alone.

I have 1000/1000 fiber and I can’t make a video call over Slack. People claim my video is freezing and they’re only getting every other word. I wonder if bufferbloat could be an explanation.

Unlikely unless another host on your LAN is saturating all 1000Mbit. My first guess would be poor wifi then funky shaping on the ISP side. Try it wired and if that doesn't help try it over a VPN.

The first question which came to my mind was if OpenBSD already supports fast enough WiFi devices to work as a proper access point – looks like last time I tried was quite a while ago: https://undeadly.org/cgi?action=article&sid=20101216231634

It supports Realtek USB adapters and I remember how easy it was to enable traffic shaping with pf and always have lag free ssh-consoles!

More and more ISPs are providing the router / modem / ONT by default, which is one of the reasons Apple exited the router market. The problem is their WiFi sucks; they have no incentive to provide you better WiFi, or any sort of QoS.

So for many consumers, it doesn't seem bufferbloat will be fixed anytime soon.

One thing we could do is educate the customer and hope that Speedtest and Netflix's Fast.com include a bufferbloat grade like the one in the DSLReports test.

I got a B in bufferbloat and 512 mbit down and 517 mbit up on my 500/500 fiber link. It seems that I could do better when downloading though, the bufferbloat on that graph is 50 ms.

I'm going to experiment a bit with my openbsd firewall.
