Researchers discover major roadblock in alleviating network congestion (news.mit.edu)
89 points by rntn on Aug 4, 2022 | 28 comments


The paper is about delay-based congestion control algorithms such as BBR, Vegas, etc. From their conclusion:

"We offer three key conclusions for CCA designers, who should model (or better estimate) non-congestive delays explicitly in delay-convergent CCAs. First, to utilize the link efficiently, a CCA must maintain a queue that is larger than the non-congestive delay on the path; second, this alone is not enough to avoid starvation, but in addition the variation in the queueing delay in steady state must also be greater than one-half of the delay jitter; and third, if we have a prior upper bound on sending rate, we may be able to avoid starvation while also reducing the queueing delay variation."

So what non-congestive delays are the problem? A key one is the way modern WiFi aggregates multiple packets into AMPDUs for transmission to reduce the number of medium acquisitions it needs to perform. When WiFi is under heavy load this gives good throughput, at the expense of per-packet jitter. If I understand correctly, their conclusions mean delay-based congestion control loses the signal it needs to converge when the target queue size is similar to the non-congestive jitter. Many of us working on congestion control have wondered about this in the context of BBR, and it's great that this paper formalizes the issue.

The implication is that such delay-based schemes need to add perhaps a few tens of milliseconds to their queue delay targets to avoid potential starvation. That doesn't seem like the real roadblock the article title suggests, but the desire to reduce latency further does perhaps increase the incentive for more ECN/AQM research. It's perfectly possible to return ECN congestion marking before a queue starts to build, so this latency cost isn't fundamental.
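To make the quoted conditions concrete, here's a back-of-the-envelope sketch of how I read them (my own illustration, not from the paper; all numbers are made up):

    # Rough sanity check of the paper's two conditions, as I read them.
    # All values in milliseconds; the example numbers are hypothetical.

    def check_cca_target(non_congestive_delay_ms, jitter_ms,
                         queue_target_ms, queue_variation_ms):
        """Return (ok, reasons) for a delay-convergent CCA's steady state."""
        reasons = []
        # 1. The standing queue must exceed the non-congestive delay
        #    on the path, or the link can't be kept fully utilized.
        if queue_target_ms <= non_congestive_delay_ms:
            reasons.append("queue target too small to keep the link utilized")
        # 2. The steady-state variation in queueing delay must exceed
        #    half the delay jitter, or starvation becomes possible.
        if queue_variation_ms <= jitter_ms / 2:
            reasons.append("queue-delay variation below half the jitter")
        return (not reasons), reasons

    # Example: WiFi-style aggregation jitter of ~20 ms with a 5 ms target.
    ok, why = check_cca_target(non_congestive_delay_ms=10, jitter_ms=20,
                               queue_target_ms=5, queue_variation_ms=3)
    print(ok, why)   # False, both conditions violated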


Flow control and QoS were critical 20 years ago. I helped build a global VoIP network, and some of our patents covered dynamic routing across multiple ASes... they were critical.

Now, we (a different company) have similar real-time algorithms, and they see far fewer problems (mainly across the backbones of AWS, Azure, Oracle, IBM, Alibaba).

I suspect this is due to more bandwidth, more routes, and better network optimization from the ISPs (we still see last-mile issues, but those are often the result of problems that better flow control algorithms usually can't completely solve).

Curious whether ISP engineers can give a more expert view on the current state of the need for, or impact of, better flow control in middle-mile and/or last-mile situations?


Honestly, QoS at the internet level was never really a big thing outside of special cases like VoIP. Network gear vendors tried like hell, desperately, from around 1998 onward to convince everyone to do DiffServ, QoS, etc., plus DPI, because they thought they could charge more by having complex features and "be more than a dumb pipe."

The situation now is that bandwidth is plentiful. A lot has changed in 20 years.

100G and 400G are now quite cheap - most of the COGS for a given router is the optics, not the chassis, control plane or NPU, and optics has been, on and off, a pretty competitive space.

Plus, almost all the traffic growth has been in cache-friendly content: growth in everything other than video, audio, and software images has been modest and vastly outpaced by those. Not just cache-friendly but layered-cache-friendly. Of the modern traffic types, only realtime audio/video like Zoom is both high volume and sensitive to latency and congestion. That's a small component, is often (but not always) either hosted or has a dedicated network of PoPs that do early handoff, and so on, so your typical backbone is now mostly carrying CDN cache misses.


Not a networking guy. I'm curious whether packets have priorities, and if so, does everyone get greedy and claim to be high priority? They talk about delay reduction in the article, but a lot of internet bandwidth today seems to be video, which doesn't need low latency once it has a bit in the receive buffer. It just seems like gamers' packets should be prioritized for delay while streaming stuff should (maybe) be prioritized for bandwidth, possibly with changing priority depending on how far ahead of the viewer the buffer is. Not sure where regular web traffic would fit in this - probably low delay?


Packet priorities are not respected on the public internet, but organizations do make use of them internally.

Public clouds that operate global networks can typically send video streams at low/mid priority all the way to the "last mile" ISP by peering with so many networks and just running massive WANs internally. So they can get most of the benefits of prioritization even though the internet doesn't support it.


"The internet" is "best effort". What this really means is all of the networks that make up the internet don't pay attention to the DSCP marks[0] in the IP header.

In reality almost all large networks (internal and internet traffic handling) use MPLS[1] or some variant to tunnel/encapsulate different types of traffic and handle priority that way while not paying attention to whatever DSCP markings users can arbitrarily set. MPLS (in most cases) is invisible to the end user so the carrier can do their own QoS while not allowing customer configuration to impact it.

If "the internet" cared about DSCP you would definitely see the situation you're describing where everyone would just mark their traffic highest priority. Note you can still mark it, it's just that no one cares or respects it.

On your own network and queues you can definitely use DSCP and 802.1p[2] (layer 2 - most commonly Ethernet) to prioritize traffic. The thing is, you need equipment end to end (every router, switch, etc.) that's capable of parsing these headers and adjusting queueing accordingly.

As if this isn't complicated enough, in the case of the typical edge connection (a circuit from an ISP) you don't have direct control of inbound traffic - when it gets to you is just when it gets to you.

Unless you use something like ifb[3], in which case you can kind of fake ingress queuing by wrapping it through another interface that effectively makes the traffic look like egress traffic. All you can really do here is introduce delay and/or drop packets, which for TCP traffic will most commonly trigger TCP congestion control, causing the transmitting side to back off because it thinks it's sending data too fast for your link.
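Roughly what an ifb-based ingress shaper looks like, if it helps. This is a sketch only: the interface names and the 50mbit rate are placeholders, it needs root, and it assumes the ifb module is available.

    import subprocess

    # Redirect ingress traffic from eth0 through ifb0, then shape it there
    # with HTB + fq_codel as if it were egress traffic.
    commands = [
        "ip link add ifb0 type ifb",
        "ip link set dev ifb0 up",
        "tc qdisc add dev eth0 ingress",
        "tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 "
        "action mirred egress redirect dev ifb0",
        "tc qdisc add dev ifb0 root handle 1: htb default 10",
        "tc class add dev ifb0 parent 1: classid 1:10 htb rate 50mbit",
        "tc qdisc add dev ifb0 parent 1:10 fq_codel",
    ]

    for cmd in commands:
        subprocess.run(cmd.split(), check=True)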

UDP doesn't have congestion control, but in practice that just means it's implemented higher in the stack. Protocols like QUIC have their own congestion control that in many cases effectively behaves like TCP's. The difference is that the behavior is dictated by the implementation rather than being at the mercy of the kernel/C lib/wherever else TCP is implemented.
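As an illustration of "congestion control higher in the stack," here's what the core of a loss-based controller looks like when an application maintains it itself on top of UDP. This is a toy AIMD sketch of the general idea, not QUIC's actual algorithm (that lives in RFC 9002); the MSS value is arbitrary.

    # Toy AIMD (additive-increase, multiplicative-decrease) congestion
    # window, maintained entirely in userspace on top of UDP.

    MSS = 1200  # bytes per datagram, a QUIC-ish placeholder size

    class ToyCongestionControl:
        def __init__(self):
            self.cwnd = 10 * MSS          # congestion window, in bytes
            self.ssthresh = float("inf")

        def on_ack(self, acked_bytes):
            if self.cwnd < self.ssthresh:
                self.cwnd += acked_bytes                    # slow start
            else:
                self.cwnd += MSS * acked_bytes / self.cwnd  # congestion avoidance

        def on_loss(self):
            self.ssthresh = self.cwnd / 2                   # multiplicative decrease
            self.cwnd = max(self.ssthresh, 2 * MSS)

        def can_send(self, bytes_in_flight):
            return bytes_in_flight < self.cwnd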

Clear as mud, right?

The good news is that many modern end-user routers just kind of handle this with things like FQ-CoDel, etc.

[0] - https://en.wikipedia.org/wiki/Differentiated_services

[1] - https://en.wikipedia.org/wiki/Multiprotocol_Label_Switching

[2] - https://en.wikipedia.org/wiki/IEEE_P802.1p

[3] - https://wiki.linuxfoundation.org/networking/ifb


Thank you! I learned a ton.


No problem. I'm happy I've had experience with all of this and learned it. I'm happier I don't have to deal with it on a daily basis anymore!


And 20 years of properly tuning congestion control algorithms. Don't underestimate how much BBR and the fight against bufferbloat in the early 2010s did to improve the quality of TCP stacks.


Spoiler: they have only shown that existing algorithms can't always avoid starvation, not that such algorithms don't exist.


From the article:

> While Alizadeh and his co-authors weren’t able to find a traditional congestion control algorithm that could avoid starvation, there may be algorithms in a different class that could prevent this problem. Their analysis also suggests that changing how these algorithms work, so that they allow for larger variations in delay, could help prevent starvation in some network situations.


Yes. The title is misleading, though: these are not roadblocks, just imperfections of existing solutions.


Roadblocks can be removed...


Roadblocks don't block hypothetical flying cars either though.


Proudly, I thought I had identified a pun.


If you cannot avoid starvation, can you detect that starvation has happened to a particular user? I suspect the answer is no, because if you could detect which user is starved, surely you could take bandwidth from those with a surfeit, etc.


Like GI Joe says, Knowing is half the battle. Now that we know about the problem, hopefully someone(s) can devise a solution.


Does anyone have a link to the paper? I’ve been working with various congestion control / QoS algorithms over the past two years as a hobby, and there are plenty of new developments going on in recent years. I’m curious which algorithms they studied, and what the actual roadblock is, because I’m sceptical they weren’t just looking for a great punch line for an article (e.g. perhaps the problem is more theoretical than practical).



Given that the root cause of the problem with current CC algorithms is their inability to discriminate between congestion-induced delays and jitter, the obvious solution would be to implement some method of reporting jitter state back to the source. Maybe some kind of ICMP packet or IP option.


Someone should coin this as a law: "All problems on the internet will be solved with more bandwidth."

Everything else will turn out to be a configuration burden, a control-plane load problem, a security issue, hard to debug (and often impossible to debug after the fact), and so on.

Bandwidth and NPUs with designed-in minimal latency are easy to measure, easy to deploy, easy to implement, and so on. They have very predictable behaviors.

The reality is that we are entering a phase where networks can be vastly simplified. MPLS is going, SR is here for the time being, QoS is dying, multicast is dead, SD-WAN is going to be a thing for a few more years and then dead, and so on.


Isn't this already somewhat there with the ECN bits in the IP/TCP headers?


At least in TCP, can't jitter be detected and measured using ACKs?

I thought some algorithms even do that already.


You can measure jitter, but you don't know how much of the delay was due to congestion, and how much was due to other factors.
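For concreteness, a TCP sender already tracks something like this: RFC 6298's smoothed RTT and RTT-variance estimators (a rough sketch with the standard gains). The variance tells you the delay is moving around, not why it's moving.

    # RFC 6298-style smoothed RTT and RTT variance from ACK-derived samples.
    # RTTVAR captures "jitter" in the loose sense, but a big RTTVAR can come
    # from queueing, WiFi aggregation, scheduling slots, etc. - the estimator
    # can't tell you which.

    ALPHA = 1 / 8   # gain for SRTT
    BETA = 1 / 4    # gain for RTTVAR

    def update_rtt(srtt, rttvar, sample):
        """Fold one RTT sample (seconds) into the running estimators."""
        if srtt is None:                  # first measurement
            return sample, sample / 2
        rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
        srtt = (1 - ALPHA) * srtt + ALPHA * sample
        return srtt, rttvar

    srtt = rttvar = None
    for sample in [0.050, 0.048, 0.072, 0.051, 0.090]:   # made-up samples
        srtt, rttvar = update_rtt(srtt, rttvar, sample)
    print(srtt, rttvar)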

Something mentioned elsewhere in the thread is that WiFi physical layers may hold small packets until they have a few to aggregate, or until a timeout expires. Other systems that aggregate packets may do something similar.

On time division multiplexed systems, it may take measurable time to wait for an assigned slot, even if the channel isn't congested. Some packets would get lucky and have a short wait and others would have a longer wait.

This would be challenging to signal as the delay is added at a hop-by-hop level, but whether the delay is significant enough to signal is unknowable at that level. Maybe you could ask all hops to add to a field indicating milliseconds of delay (or some other fixed increments), but I don't think there's room for that in existing fields and you'd have a heck of a time getting meaningful support. ECN took how long to be usable on the internet, because some middleboxes would drop connections with it enabled, to say nothing of devices actually flagging congestion.


RFC 4689 (IIRC) defines jitter as the fluctuation in forwarding delay between two consecutive received packets in a stream.

Not sure how this would work for TCP ACKs, which can be cumulative. Moreover, any approach that does this measurement must account for both delayed and lost/corrupted packets.

IMHO, only a true realtime jitter measurement would do; anything else would be a crude approximation and might result in the same flaws as before...

[edit]: examples of "...anything else..." mentioned above might be an inter-arrival histogram, where the receiver relies on packets being transmitted at a fixed cadence (lost/corrupted packets would badly skew the computed numbers), or post-processing after packet capture, where limited buffer space might prove to be the Achilles heel, etc.


In VoIP applications correct jitter estimation is very important, and the RTP/RTCP protocols do it pretty well. Of course, each RTP packet carries a timestamp, which simplifies the task.
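For reference, the RTP interarrival jitter estimator from RFC 3550 is just a smoothed absolute difference of transit times, roughly like this (assuming send and receive timestamps are already in the same units; the example values are made up):

    # RFC 3550 interarrival jitter: J(i) = J(i-1) + (|D(i-1,i)| - J(i-1)) / 16,
    # where D compares the spacing of receive times against send timestamps.

    def update_jitter(jitter, prev_send, prev_recv, send_ts, recv_ts):
        """Fold one packet's (send_ts, recv_ts) pair into the running jitter."""
        d = (recv_ts - prev_recv) - (send_ts - prev_send)
        return jitter + (abs(d) - jitter) / 16

    # Toy example: packets sent every 20 ms, received with variable delay.
    sends = [0, 20, 40, 60, 80]
    recvs = [5, 27, 49, 66, 95]          # made-up arrival times
    jitter = 0.0
    for i in range(1, len(sends)):
        jitter = update_jitter(jitter, sends[i - 1], recvs[i - 1],
                               sends[i], recvs[i])
    print(jitter)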


Not sure that will work in real world scenarios.

How would you know they're honestly reporting jitter?


Instead of using a TCP connection as the unit of bandwidth, ISPs use a "link/circuit". Think of it like configuring a guaranteed minimum bandwidth for a VNet.



