One of the solutions they mention is underutilizing links. This is probably a good time to mention my thesis work, where we showed that streaming video traffic (which is the majority of the traffic on the internet) can pretty readily underutilize links on the internet today, without a downside to video QoE! https://sammy.brucespang.com
Packet switching won over circuit switching because the cost-per-capacity was so much lower; but if you end up having to over-provision/under-utilize links anyway, why not use circuit switching?
TFA suggests 900% overcapacity, not a few percent. I just skimmed GP's article, but it seems to suggest ~100% overcapacity for streaming video specifically.
A physical circuit costs a lot, so much more that it's not even funny.
You can deploy a 24-fiber optical cable and run many thousands of virtual circuits on it in parallel using packet switching. Usually orders of magnitude more when they share bandwidth opportunistically, because the streams of packets are not constant intensity.
Running thousands of separate fibers / wires would be much more expensive, and having thousands of narrow-band splitters / transceivers would also be massively expensive.
Phone networks tried all of that, and gladly jumped off the physical-circuit ship as soon as they could.
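As a back-of-the-envelope illustration of that multiplexing gain, here is a minimal sketch (the flow count, per-flow peak rate, and activity probability are made-up numbers, not measurements) comparing the capacity needed to reserve every flow's peak, as one circuit per flow would, against the 99.9th-percentile aggregate demand of bursty flows sharing one link:

    # Minimal sketch of statistical multiplexing gain; all parameters are
    # illustrative assumptions: N bursty flows, each sending at peak rate r
    # Mbit/s, but only active a fraction p of the time.
    from math import sqrt
    from statistics import NormalDist

    N = 10_000    # flows sharing the link (assumed)
    p = 0.05      # fraction of time a flow is actually sending (assumed)
    r = 10        # per-flow peak rate in Mbit/s (assumed)

    sum_of_peaks = N * r  # what per-flow circuits would have to reserve
    # Simultaneously active flows ~ Binomial(N, p); use a normal
    # approximation for the 99.9th percentile of concurrent senders.
    k_999 = N * p + NormalDist().inv_cdf(0.999) * sqrt(N * p * (1 - p))
    shared_capacity = k_999 * r

    print(f"sum of peaks:      {sum_of_peaks:,.0f} Mbit/s")
    print(f"99.9th pct demand: {shared_capacity:,.0f} Mbit/s")
    print(f"multiplexing gain: {sum_of_peaks / shared_capacity:.1f}x")

With these illustrative numbers the shared link needs roughly 18x less capacity than one circuit per flow; burstier traffic pushes the gain further.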
> Anyway, circuits are more expensive than just running a packet-switched network lightly loaded.
This was undoubtedly true (and not even close) 20 years ago. As technology changes, it can be worth revisiting some of these axioms to see if they still hold. Since virtual circuits require smart switches for the entire shared path, there are literal network effects making it hard to adopt.
The old and new standard ways to do virtual circuit switching are ATM (heavily optimized for low-latency voice - 53-byte cells!) and MPLS (seems to be a sort of flow-labeling extension to "host" protocols such as IP - clever!).
Both are technologies that one rarely has any contact with as an end user.
Sources: Things I've read a long time ago + Wikipedia for ATM, Wikipedia for MPLS.
> my thesis work, where we showed that streaming video traffic [...] can pretty readily underutilize links on the internet today, without a downside to video QoE!
was slightly at a loss as to what exactly needed to be shown here until i clicked the link and came to the conclusion that you re-invented(?) pacing.
I would definitely not say that we re-invented pacing! One version of the question we looked at was: how low a pace rate can you pick for an ABR algorithm, without reducing video QoE? The part which takes work is this "without reducing video QoE" requirement. If you're interested, check out the paper!
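For readers wondering what picking a pace rate even looks like on the sending side, here is a minimal sketch using Linux's per-socket pacing cap, SO_MAX_PACING_RATE. The 1 Mbit/s figure is purely illustrative, and the numeric fallback 47 is the Linux constant for Python builds that don't expose the name.

    import socket

    # Minimal sketch: cap one connection's send rate from userspace on Linux.
    # SO_MAX_PACING_RATE takes bytes per second; 47 is its value in
    # asm-generic/socket.h, used as a fallback if the constant isn't exposed.
    SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)

    PACE_BYTES_PER_SEC = 1_000_000 // 8  # illustrative: ~1 Mbit/s

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, PACE_BYTES_PER_SEC)
    # ...connect and send video segments as usual; the fq qdisc (or TCP's
    # internal pacing) spaces the packets out at no more than this rate.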
> One version of the question we looked at was: how low a pace rate can you pick for an ABR algorithm, without reducing video QoE?
that is certainly an interesting economic optimization problem to reason about, though imho somewhat beyond technical merit, as simply letting the client choose the quality and sending the data at full speed works well enough.
addition:
i totally agree that things have to look economical in order to work and that there are technical edge-cases that need to be handled for good ux, but i don't quite see how client-side buffer occupancy in the seconds range is in the user's interest.
i had not read that paper, with its focus on adaptive bitrate selection for video streaming services, which came out 8 years after the pacing implementation hit the kernel. thx though
Can you comment on latency-sensitive video (Meet, Zoom) versus latency-insensitive video (YouTube, Netflix)? Is only the latter “streaming video traffic”?
We looked at latency-insensitive video like YouTube and Netflix (which was a bit more than 50% of internet traffic last year [1]).
I'd bet you could do something similar with Meet and Zoom–my understanding is video bitrates for those services are lower than for e.g. Netflix which we showed are much lower than network capacities. But it might be tricky because of the latency-sensitivity angle, and we did not look into it in our paper.
> Meet and Zoom–my understanding is video bitrates for those services are lower than for e.g. Netflix
For a given quality, bitrate will generally be higher in RTC apps (though quality may be lower in general depending on the context and network conditions obviously) because of tradeoffs between encoding latency and efficiency. However, RTC apps generally already try to underutilize links because queuing is bad for latency and latency matters a lot for the RTC case.
the term "streaming video" usually refers to the fact that the data is sent slower than the link capacity (but intermittently faster than the content bitrate)
op presumably used the term to describe "live content", e.g. source material that is not available as a whole (because the recording is not finished); that can be considered a subset of "streaming video"
the sensitivity with regard to transport characteristics stems from the fact that "live content" places an upper bound on the time available for processing and transferring the content bits to the clients (for it to be considered "live")
Sounds like a good argument for using a CDN. Or to phrase it more generally, an intermediary that is as close as possible to the host experiencing the fluctuating bandwidth bottleneck, while still being on the other side of the bottleneck. That way it can detect bandwidth drops quickly and handle them more intelligently than by dropping packets - for instance, by switching to a lower quality video stream (or even re-encoding on the fly).
Section 6.2 of the paper says Google does traffic engineering (i.e. designating some traffic to be latency tolerant). That's just what solution 3 is in the article. For a single company, of course it can prioritize different traffic differently. It would be more troublesome for an ISP to do the same.
Can someone explain this to a layman? Because it seems to me the four solutions proposed are:
1. Seeing the future
2. Building a ten times higher capacity network
3. Breaking Net neutrality by deprioritizing traffic that someone deems “not latency sensitive”
4. Flood the network with more redundant packets
Hi! I’m also a layman who doesn’t really know what he’s talking about.
The article ends with “ I will leave you with a question: Are we trying to make end-to-end congestion control work for cases where it can’t possibly work?”
So, it seems to me that there may not be any good solutions to latency spikes. The article is basically pointing out that you either pursue one of the unfortunate solutions mentioned, or resign yourself to accepting that no congestion control mechanism will ever be sufficient to eliminate the spikes. This seems a valuable message to the people who might be involved in developing the sort of congestion control mechanisms they're talking about.
I think the author is saying that there ARE solutions, but none of them are really viable.
Seeing the future obviously can't happen. Building a higher-capacity network is just wasted money. Breaking NN is going to be unpopular, not to mention that determining what is "not latency sensitive" is going to be difficult to impossible unless there's a "not latency sensitive" flag on TCP packets that people actually use in good faith. And flooding the network with more redundant packets is just going to be a colossal waste of bandwidth and could easily make congestion issues worse.
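For what it's worth, "redundant packets" in schemes like this typically means some form of forward error correction rather than blind duplication. A toy sketch of the simplest flavor, XOR parity over a group of equal-length packets (the helper names here are made up for illustration):

    # Toy sketch of packet-level redundancy via XOR parity: for every group
    # of k equal-length data packets, send one extra parity packet, so any
    # single loss in the group can be rebuilt without waiting for a
    # retransmission round trip. The bandwidth cost is the extra 1/k.
    from functools import reduce

    def xor_parity(packets):
        """Parity packet for a group of equal-length packets."""
        return bytes(reduce(lambda a, b: [x ^ y for x, y in zip(a, b)], packets))

    def recover_one(received, parity):
        """Rebuild the single missing packet (marked None) in a group."""
        present = [p for p in received if p is not None]
        missing = received.index(None)
        rebuilt = xor_parity(present + [parity])
        return received[:missing] + [rebuilt] + received[missing + 1:]

    group = [b"pkt0....", b"pkt1....", b"pkt2...."]
    parity = xor_parity(group)
    damaged = [group[0], None, group[2]]   # packet 1 lost in transit
    assert recover_one(damaged, parity)[1] == group[1]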
DSCP already exists, but there's never been an Internet-wide incentive to use it correctly.
It is theoretically plausible that end-users could mark their packets 'latency/jitter sensitive' or 'high throughput'. If there was an Internet-wide consensus that those two options made useful trade-offs then there would be an incentive to use it correctly. That would be as NN-safe as any other scheme.
For example, maybe 'latency/jitter sensitive' buffers less, but at a 25+% throughput penalty. 'High throughput' would then be faster, but have much higher latency/jitter.
The trick would be ensuring fairness without requiring too much hardware at scale.
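For reference, marking is already a one-liner at the application layer; the hard part is the Internet-wide consensus. A minimal sketch of stamping a DSCP codepoint on a UDP socket (the choice of EF, the address, and the port are illustrative):

    import socket

    # Minimal sketch: mark a UDP socket's packets as latency-sensitive via
    # DSCP. The IPv4 TOS byte carries the 6-bit DSCP value shifted left by 2;
    # EF (Expedited Forwarding) is codepoint 46. Whether any network along the
    # path honors the mark is exactly the incentive problem discussed above.
    DSCP_EF = 46

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, DSCP_EF << 2)
    sock.sendto(b"latency-sensitive payload", ("192.0.2.10", 5004))  # illustrative endpoint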
It's a political debate between industry and 'the people' whether control over network utilization is exercised by the network components (bottom-up) or by the endpoints (top-down).
both sides claim fairness/net-neutrality and performance, but one is deployed and working while the other is standardized and looking for an application.
presumably there is room for a compromise, but defining "queue-building flow" w/o regard to link capacity seems like a step in the wrong direction
> That is not true for networks where link capacity can change rapidly, such as Wi-Fi and 5G.
Is this problem almost exclusively to do with the "last-mile" leg of the connection to the user? (Or the two legs, in the case of peer-to-peer video chats.) I would expect any content provider or Internet backbone connection to be much more stable (and generally over-provisioned too). In particular, there may be occasional routing changes, but a given link should be a fiber line with fixed total capacity. Or are changes in how much other traffic is contending for that capacity effectively the same problem?
I don't agree. When I think about my experience at say Google, there are quite a variety of ways we'd avoid graphs hitting 100% (or say 90% if that's where the user experience takes a marked turn for the worse).
* We absolutely would spend the money to avoid globally hitting that number.
* We'd use ToS so that user-facing TCP traffic wouldn't get dropped; some less latency-sensitive and more loss-resistant transfer-protocol traffic would get dropped instead.
* We'd have several levels/types of load balancing from DNS-based as they come into the network (typically we'd direct to the closest relevant datacenter but less so as it gets overloaded) to route advertising to Maglev<->GFE balancing to GFE<->application balancing and so on.
* etc.
I would expect that'd be true to some extent for any content provider. There are surely some problematic hops in the network (I've seen alleged leaked graphs of Comcast backbone traffic flattening out at 100% at the same time every day) but the entire network is oversubscribed...and running out of capacity regularly in practice? No way.
It's oversubscribed, as many users share the same link at some point and that link is not big enough to allow all users to use all their bandwidth at the same time. ISPs will oversubscribe and add more capacity when needed to avoid congestion, around 70-95% utilization depending on link size. American ISPs seem to not care as much though.
It's really not interesting to say that a major ISP doesn't have capacity for all of their customers to use their advertised bandwidth at once. That extreme just doesn't happen, and so the cost/benefit of preparing for it just isn't there. Some oversubscription is normal. And when I said typically overprovisioned, I meant relative to actual observed/projected load rather than theoretical worst-case.
For their upstream links to actually be maxed out (thus "experiencing congestion" as Hikikomori put it) with any regularity is more remarkable—suggests they screwed up their capacity planning or just don't care. I kind of expect that from Comcast but not ISPs in general.
For those links to be of varying capacity (like the 5G/Wifi networks the article mentions) would be truly surprising to me.
It's not helpful to say the term oversubscribed if you mean something different than the existing meaning. Just make up your own word for what you mean or use a different word.
> It's oversubscribed, as many users share the same link at some point and that link is not big enough to allow all users to use all their bandwidth at the same time.
that's a definition of oversubscription.
> around 70-95% utilization depending on link size.
~2/3 with dumb queues, <100% with the computational tradeoff of SQM (smart queue management)
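one way to rationalize a ~2/3 rule of thumb: in the textbook M/M/1 queue, time in system scales as 1/(1-rho), so delay is already ~3x the unloaded service time at 2/3 utilization and blows up as you approach 100%. tiny sketch (M/M/1 is of course a big simplification of real traffic):

    # sketch: queueing-delay multiplier vs. link utilization under the M/M/1
    # model, where mean time in system T = S / (1 - rho) for service time S.
    # real traffic is burstier than Poisson, so treat this as a lower bound.
    for rho in (0.50, 0.67, 0.80, 0.90, 0.95, 0.99):
        print(f"utilization {rho:4.0%}: delay ~ {1 / (1 - rho):5.1f}x service time")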
I think this thread has gotten quite deep without anyone engaging with my original point: capacity-varying links are really only a thing right next to the user, and even actual drops are mostly next to the user too. I've explained why within Google user-facing packets rarely drop and suggested other content providers have similar mechanisms.
I said this:
> the entire network is oversubscribed...and running out of capacity regularly in practice? No way.
and you took that to mean "the network is not oversubscribed"? and this is what you're focusing on? No, the "...and" was the important part. Forget the word oversubscribed. It's a word you introduced to the conversation, and it's a distraction. I don't care about the theoretical potential for congestion; I care about where congestion mostly happens in practice.
It's fun to think about the theoretical maximum in a multi-link scenario. One thing that pops out of the analysis -- there are diminishing returns to capacity from adding more links. Which means at some point an additional link can begin to reduce capacity as it starts to eat more into maintenance and uptime budget than it offers in capacity.
There are multiple SD-WAN vendors with active-active multipath functionality. Typically the number of parallel active paths is capped at something like 2 or 4. A few esoteric vendors do high numbers (12-16). Fundamentally your premise is correct, but the overhead % is a single digit as I understand it. Slightly different than Amdahl's law in my eyes (transmission of data v computation).
What do you mean “this purpose”? The text linked in OP says
> Congestion signalling methods cannot work around this problem either, so our analysis is also valid for Explicit Congestion Notification methods such as Low Latency Low Loss Scalable Throughput (L4S).
L4S (or at least the DualQ implementation) uses a shallow buffer. You can't get 1000 ms of latency if your buffer is only 15 ms. Although the title claims there's no way around latency spikes, their paper [1] explores the case where latency spikes are traded off against allowing packet loss, which is what you're going to get if you fill a shallow buffer. For real-time traffic, packet dropping is usually preferable.
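To put the 15 ms figure in perspective, buffer depth in time is just queued bytes divided by drain rate, so a shallow target is a small absolute buffer at any plausible rate. A quick sketch (the link rates are illustrative):

    # Sketch: bytes of queue that correspond to a 15 ms target vs. the
    # ~1000 ms of bloat discussed above, at a few illustrative link rates.
    # buffer_bytes = rate_bits_per_sec * seconds / 8
    def buffer_bytes(rate_mbps: float, target_ms: float) -> int:
        return int(rate_mbps * 1e6 * (target_ms / 1e3) / 8)

    for mbps in (10, 100, 1000):
        print(f"{mbps:5} Mbit/s: 15 ms = {buffer_bytes(mbps, 15):>10,} B, "
              f"1000 ms = {buffer_bytes(mbps, 1000):>12,} B")

At 100 Mbit/s, for example, 15 ms is only about 190 kB of queue versus roughly 12.5 MB for a full second of buffering.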