Hacker News

What is bufferbloat



When your buffers are too big for the connection, they induce latency: your computer sends 100 Mbps of traffic, yet the modem is capped at, say, 5 Mbps. It's better to drop that traffic, since TCP (and well-behaved UDP applications) will throttle themselves, than to let 500 ms to a second of latency be induced by holding the data in a buffer, resulting in a jerky user experience.

Edit: Most Comcast/Xfinity modems and converged gateways are Intel-based and have this and other issues; pure garbage devices.
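The latency figure falls out of simple arithmetic: a FIFO buffer drains only as fast as the bottleneck link, so its worst-case queuing delay is buffer size divided by link rate. A back-of-the-envelope sketch (the 256 KiB buffer size is an illustrative assumption, not a measured value):

```python
def queue_delay_ms(buffer_bytes: int, link_bps: int) -> float:
    """Worst-case time to drain a full FIFO buffer at the bottleneck rate."""
    return buffer_bytes * 8 / link_bps * 1000

# A hypothetical 256 KiB modem buffer in front of a 5 Mbps uplink:
delay = queue_delay_ms(256 * 1024, 5_000_000)
print(f"{delay:.0f} ms")  # ~419 ms -- the half-second lag described above
```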


This is the main reference I used when shopping for a cable modem a couple months ago:

https://badmodems.com/Forum/viewtopic.php?t=65000

You'd have to scroll down a bit on the page (I'm copying the factual data in case the remote link goes down at a later date).

The bad news is that a lookup table is literally required to know what chipset is in use inside of the device - just like with all of the WiFi adapters :(

(PS, this table looks like garbage, but many users on mobile will cry if I prefix with spaces to make it look OK... I'd really rather every newline were a <br> element.)

Motorola/Arris Modems
Motorola SB 6121 4x4 (Intel Puma 5)
Motorola 6180 8x4 (Intel Puma 5)
Arris SB 6183 16x4 and Motorola MB7420 16x4 (both Broadcom)

NetGear Modems
NetGear CM1100 32x8 (Broadcom)
NetGear CM1000 32x8 (Broadcom)
NetGear Orbi CBK40 32x8 (Intel Puma 7) - Note: I tested this model and was told that the modem built into the Orbi does not have the same issues as other Puma 6/7 modems. I haven't seen any issues with it since using it. The Orbi modem is based on NetGear's CM700.

TP-Link
TP-Link TC-7610 8x4 (Broadcom)

Routers that work with zero issues with the above cable modems in my current collection:
Asus - RT-AC66U and GT-AC5300 (OEM and Merlin FW)
D-Link - many router models tested, including COVR models
Linksys - WRT1900AC v1 and WRT32x v1
NetDuma - R1, current firmware version (1.03.6i)
Netgear - Orbi CBK40, R7800, XR450 and XR500

Forum User Modem and Router Experiences
Arris SB 6141 8x4 (Intel Puma 5) and D-Link DIR-890L and ASUS RT-AC5300
Arris SB 6141 8x4 (Intel Puma 5) and Asus RT-AC66U
Arris SB 6183 16x4 (Broadcom) and Linksys WRT1900ACM, WRT32x and NetGear XR500
Arris SB 6183 16x4 (Broadcom) and NetGear XR500
Cisco DPQ3212 (Broadcom) and Asus RT-AC66r, D-Link DGL-4500, NetDuma R1 and NetGear R7000
Motorola MB 7220 (Broadcom) and Asus RT-AC66r, D-Link DGL-4500, NetDuma R1 and NetGear R7000
TP-Link TC-7610 8x4 (Broadcom) and NetDuma R1


> (I'm copying the factual data in case the remote link goes down at a later date).

That's what the Internet Archive is for! The URL above has now been archived [0].

[0]: https://web.archive.org/web/20190417095456/https://badmodems...


Now we have a second reference in case, god forbid, something happens to the Internet Archive. Something happening is always possible!


Put your (cable/DSL/fiber) 'modem' [it is a router] in bridge mode and be done with it. Anything would work then: Ubiquiti gear (which I use), but also different (more open source) stuff like the Turris Omnia, Turris Mox, or a lovely PC Engines APU2.


The fault isn't just with cheapo Intel edge hardware - a lot of ISP infrastructure is built with the old telco mentality of "we never drop data". Which, as you correctly point out, is precisely the wrong thing to do for an overcongested IP network.

EDIT: And the resulting problem isn't just the resulting end-user latency. TCP's congestion control mechanisms (i.e. the ones that let the endpoints push as much traffic as the network can bear and no more) rely on quick feedback from the network when they push too much traffic. The traditional, quickest, and most widely implemented methods of feedback are packet drops - when those are replaced with wildly varying latency, it's hard to set a clear time-limit for "this packet was dropped", and Long-Fat-Network detection is a lot harder.


So, with TCP the speed should depend on the bandwidth-delay product (which depends on the full peer to peer round-trip latency, because it needs ACKs coming in faster than it empties the window, otherwise the sending peer just waits).
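That bandwidth-delay product is just rate times RTT: a sender needs at least that many bytes in flight to keep the pipe full, and with a fixed window, throughput is capped at window / RTT. A quick sketch (the link speed and RTT figures are illustrative):

```python
def bdp_bytes(link_bps: int, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to saturate the link."""
    return link_bps * rtt_s / 8

def max_throughput_bps(window_bytes: int, rtt_s: float) -> float:
    """With a fixed window, throughput is capped at window / RTT."""
    return window_bytes * 8 / rtt_s

# 100 Mbps link, 40 ms round trip: ~500 KB must be in flight.
print(bdp_bytes(100_000_000, 0.040))         # 500000.0 bytes
# A 64 KB window on that same path caps you well below line rate:
print(max_throughput_bps(64 * 1024, 0.040))  # ~13.1 Mbps
```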

Whereas most UDP applications are constant rate, with some kind of control channel.

Bufferbloat should not matter for your home connection. (Unless it is constantly in use by more than one client.)

However, when congestion occurs, the data you sent that is sitting in those buffers may already be stale and irrelevant, but there's no way to invalidate it on the middleboxes. That hurts performance: stale data clogs the pipes exactly when they are full, preventing them from unclogging quickly. The result is a jerk in TCP, which scales back more than it should have, thanks to the unnecessary wait while the network transmits the stale data.


> Bufferbloat should not matter for your home connection. (Unless it is constantly in use by more than one client.)

That is wrong. A single client can saturate the connection easily (eg. while downloading a software update or uploading a photo you just took to the cloud). Once the buffers are full, all other simultaneous connections suffer from a multi-second delay.

The result is that the internet becomes unusably slow as soon as you start uploading a file.


You can see this effect by going to fast.com.

Using my smartphone, it induces and measures > 700ms latency on my cable modem connection. That’s worse than old-fashioned high-orbit satellite internet!


I'd encourage you to get a non-Intel modem


The problem with bufferbloat is not necessarily excess retransmissions or stale data (although that does happen), it is primarily that delay significantly increases in general, and that delay in competing streams or intermittently active streams is highly variable.

Traditional TCP congestion control in an environment where buffers are oversized will keep expanding the congestion window until it covers the whole buffer or the advertised receive window, even if the buffer holds several seconds of packets. There may be some delay-based retransmission, but traditional stacks will also adapt, assume the network changed, and expect the peer to be 8 seconds away.
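That failure mode can be sketched numerically: once the congestion window exceeds the bandwidth-delay product, every extra byte just sits in the bottleneck buffer and adds to the RTT the sender observes. The figures below are illustrative, not from any real stack:

```python
def observed_rtt_s(cwnd_bytes: float, link_bps: int, base_rtt_s: float) -> float:
    """RTT seen by the sender: base path delay plus queuing delay from
    whatever part of the window exceeds the bandwidth-delay product."""
    bdp = link_bps * base_rtt_s / 8
    excess = max(0.0, cwnd_bytes - bdp)
    return base_rtt_s + excess * 8 / link_bps

# 5 Mbps bottleneck, 40 ms base RTT: the BDP is only 25 KB.
# Let the window grow to 5 MB against a huge, never-dropping buffer:
print(observed_rtt_s(5_000_000, 5_000_000, 0.040))  # ~8.0 -- the peer now looks 8 seconds away
```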


I have a 4G modem. Whenever I watch a video and skip forward a bunch of times, the connection hangs and I have to wait for about a minute before it resumes normal operation.

Is this bufferbloat? I guess what happens is that a bunch of packets get queued up and I have to wait until all of them are delivered?


Yes, that sort of jerky behavior is symptomatic of bufferbloat. Multiple 4G and 5G devices have now been measured as having up to 1.6 seconds of buffering in them. They are terribly bloated. It was my hope that the algorithms we used to fix wifi ( https://www.usenix.org/system/files/conference/atc17/atc17-h... ) - where we cut latency under load by 25x and sped up performance with a slow station present by 2.5x - would begin to be applied against the bufferbloat problem there. Recently Google published how much the fq_codel and ATF algorithms improved their wifi stack, here:

http://flent-newark.bufferbloat.net/~d/Airtime%20based%20que...

Ericsson, at least, published a paper showing they recognized the problem: https://www.ericsson.com/en/ericsson-technology-review/archi...

and I do hope that shows up in something, however the chipsets on the handsets themselves also need rational buffer management.


That's probably something else. The server rate limits your client, or the ISP rate limits due to too many bursts, or the client needs to buffer more of the video.

To rule out the other cases you'd need to watch the network traffic with something like Wireshark and look at retransmissions. If retransmissions suddenly shoot up and then packets start to trickle in later, but very slowly, that could be bufferbloat.

But one minute seems too long.


The whole connection hangs - it's not the server or buffering and I doubt it's the ISP.

Reading more about it, you are correct about 1min being too long, therefore it's probably not (just) bufferbloat.


Probably not. It's just crappy software.


Any cable modem brands known to not use bufferbloat-ing NICs?


The DOCSIS 3.1 standard introduced a good-but-not-great Active Queue Management scheme called PIE. But upgrading your modem only helps with traffic you're sending; your ISP needs to upgrade their equipment to manage the buffers at their end of the bottleneck in order to prevent your downloads from causing excessive induced latency.
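PIE's core idea is a drop probability that is periodically nudged up or down based on how far the estimated queuing delay sits from a target, plus whether it is trending up or down. A heavily simplified sketch of that control law (the constants and scaling here are illustrative, not the exact RFC 8033 values):

```python
def pie_update(p: float, qdelay_s: float, qdelay_old_s: float,
               target_s: float = 0.015, alpha: float = 0.125,
               beta: float = 1.25) -> float:
    """One update of a PIE-style drop probability: the proportional term
    pulls delay toward the target, the derivative term damps the trend."""
    p += alpha * (qdelay_s - target_s) + beta * (qdelay_s - qdelay_old_s)
    return min(max(p, 0.0), 1.0)  # clamp to a valid probability

# Queue delay sitting at 100 ms and rising: drop probability climbs.
p = pie_update(0.0, qdelay_s=0.100, qdelay_old_s=0.080)
print(round(p, 3))  # about 0.036
```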


The bufferbloat project introduced a great (IMHO) fq + AQM scheme called "cake", which smokes the DOCSIS 3.1 pie in every way, especially with its new DOCSIS shaper mode in place. It's readily available in a ton of home routers now, notably OpenWrt, which took it up 3 years ago. It's also in the Linux mainline as of 4.19. The first of several papers on it is here: https://arxiv.org/abs/1804.07617

I hope to have a document comparing it to docsis 3.1 pie at some point in the next few months, in the meantime, I hope more (especially ISPs in their default gear) give cake a try! It's open source, like everything else we do at bufferbloat.net and teklibre.


Use a decent router on your side and configure it to rate-limit slightly below the modem's limits. This avoids ever building a queue in their boxes. You can run a ping while tweaking your router's rate-limit settings to find the point where it is just about to queue but not quite, optimizing both throughput and latency.
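That tune-by-ping procedure is really a search: walk the shaper rate down from the nominal link rate until loaded latency stops climbing. A toy sketch with the measurement stubbed out (in real life `measure_loaded_latency_ms` is your ping-under-load test; it's a placeholder here, not a real API, and the 92 Mbps "true" bottleneck is invented):

```python
def find_shaper_rate(link_mbps: float, measure_loaded_latency_ms,
                     acceptable_ms: float = 30.0, step: float = 0.05) -> float:
    """Walk the shaping rate down from the nominal link rate (5% steps)
    until latency under load drops to an acceptable level."""
    rate = float(link_mbps)
    while rate > 0 and measure_loaded_latency_ms(rate) > acceptable_ms:
        rate -= link_mbps * step
    return rate

# Toy model: queuing kicks in once the shaper exceeds the true 92 Mbps bottleneck.
toy = lambda rate: 15.0 if rate <= 92 else 15.0 + (rate - 92) * 60
print(find_shaper_rate(100, toy))  # 90.0 -- shape a bit below the real link rate
```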


Depending on your speed, you may need a bit more than just a decent router. Many routers can't hardware-accelerate QoS traffic shaping, which you'll need in order to limit the speed at higher rates.

My Netgear R7000 can't handle my 400 Mbps connection using QoS throttling. I will probably need at least a mid-range Ubiquiti router to handle it.


Ubiquiti routers won't help you; they're even more reliant on hardware acceleration than typical consumer brands, and nobody has put the best modern AQM algorithms into silicon yet. What you really need is a CPU fast enough to perform traffic shaping and AQM in software, which ironically means x86 and Intel are the safest choices.


Well, bufferbloat is at its worst on slow connections (<100 Mbit), and $50 worth of router can fix it there in software.


Only if the firmware implements the algorithms. OpenWRT is your best bet for this: I have it running on a TL-WDR3600 quite well.


> Any cable modem brands known to not use bufferbloat-ing NICs?

Avoid modems with Intel, specifically the various "Puma" chipsets. Best to double-check the spec sheet on whatever you buy.

The main alternative seems to be Broadcom-based modems: TP-Link TC7650 DOCSIS 3.0 modem and Technicolor TC4400 DOCSIS 3.1 modem (of which there are a few revisions now).


A $45 router is enough to de-bufferbloat connections up to several hundred megabits, and past that bufferbloat is less of a concern (in part because it's difficult to saturate).


My $120+ Netgear R7000 can't quite handle my 400 Mbps connection when QoS filtering is turned on, if anyone wants a reference.


There are cheap and good routers.

A $69 MikroTik hAP ac2 will easily push 1 Gbps+ with QoS rules - https://mikrotik.com/product/hap_ac2#fndtn-testresults (it's a bit more tricky to set up and you need to make sure you don't expose the interface to the internet)


It's not the hardware that's at fault, it's the software and/or configuration. See here: https://news.ycombinator.com/item?id=17448022


From https://en.wikipedia.org/wiki/Bufferbloat

_Some communications equipment manufacturers designed unnecessarily large buffers into some of their network products. In such equipment, bufferbloat occurs when a network link becomes congested, causing packets to become queued for long periods in these oversized buffers. In a first-in first-out queuing system, overly large buffers result in longer queues and higher latency, and do not improve network throughput._

I hope I get this right, please correct if needed: so basically Intel's chipsets created what looked like a fat network pipe that accepted packets from the host OS really fast, but was in fact just a big buffer with a garden hose connecting it to the network. The result is that applications can write fast bursts and misjudge transmission timing, causing timing problems in media streams like an IP call, leading to choppy audio and delay. The packets flow in fast and quickly back up, and the IP stack along with your application now have to wait. (Edit: I believe the proper thing to say is that the packets should be dropped, but the big buffer just holds them, keeping them "alive in the mind" of the IP stack. The proper thing to do is to reject them, not hoard them.) The buffer empties erratically as the network bandwidth varies and might not accept more packets until n packets have been transmitted. Then the process repeats as the IP stack rams another load into the buffer and, again, log jam.

A small buffer fills fast and will allow the software to "track" the sporadic bandwidth availability of crowded wireless networks. At that point the transmission rate becomes more even and predictable leading to accurate timing. That's important for judging the bitrate needed for that particular connection so packets arrive at the destination fast enough.

Bottom line: don't fool upstream connections into thinking that you're able to transmit data faster than you actually can.


It's also a problem because a few protocols you may use from time to time (like, say, TCP) rely on packet drops to discover and detect network throughput. TCP's basic logic is to push more and more traffic until packets start to drop, and then back off until they stop dropping. It keeps doing this in a continuous cycle so that it can detect changes in available throughput. If the feedback is delayed, this detection strategy results in wild swings in the amount of traffic the TCP stack tries to push, usually with little relation to the actual network throughput.

Buffering is layer 4's job. Do it on layer 2[a] and the whole stack gets wonky.

[a] Except on very small scales in certain pathological(ly unreliable) cases like Wi-Fi and cellular.
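The push-until-drop cycle described above is classic AIMD (additive increase, multiplicative decrease). A minimal sketch of the resulting sawtooth, with the loss signal idealized as instant, which is exactly what a bloated buffer breaks (the capacity figure is illustrative):

```python
def aimd(rounds: int, capacity_pkts: float, cwnd: float = 1.0,
         incr: float = 1.0, decr: float = 0.5) -> list:
    """Idealized AIMD: grow the window one packet per RTT, halve on loss.
    Loss here means 'window exceeded path capacity', with instant feedback;
    a bloated buffer delays that signal, so real windows overshoot far more."""
    history = []
    for _ in range(rounds):
        if cwnd > capacity_pkts:   # drop detected -> multiplicative decrease
            cwnd *= decr
        else:                      # no drop -> additive increase
            cwnd += incr
        history.append(cwnd)
    return history

print(aimd(8, capacity_pkts=5.0))  # sawtooth: 2, 3, 4, 5, 6, 3, 4, 5
```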


Is there no way to limit how much of the buffer is used via some config?


Usually not. There may be an undocumented switch somewhere in the firmware that a good driver could tweak, depending on the exact hardware. But end-user termination boxes, whether delivered by the ISP or purchased by the end user, are built as cheaply as possible and ship with whatever under-the-hood software configuration the manufacturer thought was a good idea. Margins are just too narrow to pay good engineers to do the testing and digging to fix performance issues. (I used to work at a company that sold high-margin enterprise edge equipment, and even there we were hard-pressed to get the buggy drivers and firmware working in even-slightly-non-standard configurations. Though 802.11 was most of the problem there.)

And in the case of telco equipment, that's a tradition-minded and misguided conscious policy decision.


Your analysis is correct.

Smaller buffers are in general better. However advanced AQM algorithms and fair queueing make for an even better network experience. Being one of the authors of fq_codel (RFC8290), it has generally been my hope to see that widely deployed. It is, actually - it's essentially the default qdisc in Linux now, and it is widely used in quite a few QoS (SQM) systems today. The hard part nowadays (since it's now in almost every home router) is convincing people (and ISPs) to do the right measurement and turn it on.

https://www.bufferbloat.net/projects/bloat/wiki/What_can_I_d...
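For the curious, the CoDel half of fq_codel has a famously small control law: once packets' sojourn time has stayed above a target (5 ms by default) for a full interval (100 ms by default), it drops a packet, then schedules the next drop at interval divided by the square root of the drop count, so pressure ramps up gently while bloat persists. A sketch of just that scheduling rule, not the full RFC 8290 state machine:

```python
from math import sqrt

def next_drop_interval_s(drop_count: int, interval_s: float = 0.100) -> float:
    """CoDel spaces successive drops at interval / sqrt(count): gentle at
    first, ramping up while the queue stays persistently above target."""
    return interval_s / sqrt(drop_count)

# Successive drops arrive faster and faster while the queue stays bloated:
print([round(next_drop_interval_s(n), 3) for n in (1, 2, 4, 16)])
# [0.1, 0.071, 0.05, 0.025]
```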



It’s when you’re gaming and your ping jumps to 500 ms because someone is watching Netflix. It’s one of the main flaws in the currently deployed internet for end users. There are a lot of novel solutions (CoDel and so on), but they're still not widely deployed.

More generally it refers to any hardware or system with large buffers: needed to handle high throughput, but liable to cause poor latency due to head-of-queue blocking.



