Hacker News new | past | comments | ask | show | jobs | submit login
PCC: Re-architecting Congestion Control for Consistent High Performance [pdf] (arxiv.org)
32 points by adamnemecek on Sept 29, 2014 | hide | past | web | favorite | 13 comments

Thank you all for your interests!

We just updated a full-length version of PCC to arXiv, but it takes some time for arXiv to process and post online. The full version discussed our approach in a more detail and, I believe, clearer way. We have more evaluation for more challenging network scenarios and also evaluated PCC with different utility functions expressing different data transfer objectives.

So here is a preview: http://web.engr.illinois.edu/~modong2/pcc_fullpaper.pdf During this two weeks, we are writing some simple docs and making the file-transfer SOFTWARE more user-friendly. We also plan to release our experiments/demo CODE so one can easily play with and compare PCC with existing TCP variants.

Code is available at https://github.com/modong/pcc, but currently it is under some refactoring and some compile/run instructions are outdated. We will have a more formal release in two weeks.



Thanks for the preview Mo, I am reading through it now. Would there be any further advantage to measuring one-way-delay as opposed to measuring RTT?

That's not an unreasonable approach. When I came up with the original approach to congestion control in TCP 30 years ago (see Internet RFCs 896 and 970), TCP behavior under load was so awful that connections would stall out and fail. The ad-hoc solutions we put in back then at least made TCP well behaved under load.

In the early days of TCP/IP, it was clear that there was no known solution to congestion in the middle of a big datagram network. There still isn't. Out near the edges, we can use fair queuing, which I invented and named, and which is offered in most Cisco routers for links below Ethernet speeds. We still don't know what to do in the middle of the network. "Random Early Drop" is an ugly hack, but it's what we have in the big routers. What saved us for years is that fiber backbones had so much more bandwidth than the last mile consumer connections that the congestion stayed near the ends. Then came streaming HD video.

The treatment of congestion went off in a different direction than I'd expected. Originally, ICMP Source Quench messages were used for congestion control. Routers were expected to tell endpoints when to slow down. But they weren't implemented in Berkeley's networking stack, and gradually were phased out. (They would have been a denial of service attack vector, anyway.) Instead, congestion was inferred from packet loss. Even worse, congestion is inferred from not having a packet acknowledged, so you don't know which direction is congested. That's not a particularly good approach, but it's what we've got.

I originally took the position that, as an endpoint, you were entitled to one packet in flight. That is, the network should be provisioned so that every connection can have one packet in transit without packet loss. If you have that much capacity, and you have fair queuing in routers, you only lose packets if you have more than one in flight. You're thus competing with yourself for bandwidth, and an "increase slowly, back off big on packet loss" strategy works fine.

In a world with Random Early Drop, not enough fair queuing, and big FIFO buffers (someone called this "bufferbloat"), you get both packet loss and excessive transit time. It's hard for the endpoints to adapt to that no matter what they do.

Anyway, there's nothing wrong with using a smarter algorithm to calculate optimal window sizes and retransmission times. As long as it evaluates how well it's doing and feeds that back into the control algorithm in a rational way, that's fine. Today, you could probably afford to run machine learning algorithms in routers. So go for it!

If you're going to do that, it's a good time to fix delayed ACK as well. As is well known, the interaction between delayed ACKs and the Nagle algorithm is awful. This is a historical accident; I implemented one, the Berkeley guys implemented the other, and by the time I found out I was out of networking and involved with a small startup called Autodesk. So that never got fixed. Here's how to think about that. A delayed ACK is a bet. You're betting that there will be a reply, upon with an ACK can be piggybacked, before the fixed timer runs out. If the fixed timer runs out, and you have to send an ACK as a separate message, you lost the bet. Current TCP implementations will happily lose that bet forever without turning off the ACK delay. That's just wrong.

The right answer is to track wins and losses on delayed and non-delayed ACKs. Don't turn on ACK delay unless you're sending a lot of non-delayed ACKs closely followed by packets on which the ACK could have been piggybacked. Turn it off when a delayed ACK has to be sent.

I should have pushed for this in the 1980s.

John Nagle

John, if you care to get back in touch with Jim Gettys and I, we can update you on where things stand on the whole bufferbloat effort.

FQ on modern cisco routers is generally quite crippled, and even that much is not available elsewhere.

However, on the whole, the fq + aqm situation along the edge of the internet is looking up, with nearly every third party home router firmware incorporating fq_codel (notably openwrt and derivatives), and it being integral to streamboost, and other QoS systems like those in the netgear X4.

Some examples of the kind of results we get now from these new algos:



I really liked the PCM paper (wanted code tho) for clearly identifying the parameter space modern tcps need to operate in. FQ everywhere, or nearly everywhere, would make the tcp's job far easier.

There is an aqm/packet scheduling working group in the ietf in the process of standardization, etc.

They lost me when it became obvious that PCC doesn't interoperate very well with "normal" TCP (let's call it reno-tcp[1]). TCP is not the optimal solution and PCC may very well be better but there is just no way to use a new and unfair-to-reno and ignore all consequences. There must be some safeguards to handle all the older systems (be they routers, embedded systems, IoT or just plain old Win XP machines).

It is an interesting approach and an interesting research that drives us forward but it is in no way a practical approach.

They implemented this thing in UDT, if they can come up with a Linux patch that would be awesome for real world testing and implementation verification.

[1] Reno is the original TCP congestion control, New-Reno was the most common adaptation for a long time. These days CUBIC is the default in Linux, Windows has their own algorithm. All of these behave in a very similar way to the old Reno, they all have the basic structure of AIMD - Additional Increase, Multiplicative Decrease (slowly increasing on no signal and fast decrease on loss signal). in the case of CUBIC and H-TCP the slow increase is not very slow and the fast decrease was slowed down to handle very high speed links and high latency (high bandwidth delay product).

baruch -- I'll add a bit to what Mo said. There are many systems, including major services from large companies that we all use every day, that use alternate or unfriendly TCPs behind the scenes (and sometimes in the public Internet). So while we wouldn't recommend deploying PCC Internet-wide without a good deal more development and testing, it's quite practical and useful today for many systems such as where PCC can be isolated from TCP.

Further on in section 4.3 they actually discuss TCP friendliness in more detail and show that 1 PPC flow (with often more than 10x throughput) is more friendly to 1 unselfish TCP flow than 10 selfish TCP flows. Also, PPC is not too far ahead of SABUL (UDT), and SABUL has been shown to have excellent fairness when compared with TCP.

They only compared with multi-TCP, and while this is a method that can be used to utilize high BDP with AIMD-TCP it is not a common use case. In most cases you just have a browser with independent streams or a bittorrent client with quite a few somewhat short-lived connections to a wide range of locations and also just a plain single transfer from a remote server to upgrade your Debian system.

These are all not quite multi-TCP and there was no real mention of fairness of PCC against a single TCP stream, even of the CUBIC form, not to mention just plain old NewReno.

I can't tell if it's really unfair as I didn't test it myself, but they hint that there is a wide gap between PCC and NewReno or even CUBIC. The algorithm as I understand it also seems to imply that PCC will dominate a NewReno stream with ease, especially when PCC is an established connection and the NewReno is only starting up as the PCC will feel a small bump of losses, probably not do much about it (i.e. not reduce rate by much) and the NewReno generated packet losses will fly under the radar of PCC. At least that's the feeling I get from reading the paper, again, no idea about actual behavior but I'm no longer paid to look at packet traces of congestion control algorithms. (Shame, it was fun while it lasted :-)

I think in practice it will work well, given that bufferbloat is often the cause of congestion on a home network. TCP itself is usually quadratically impaired when it comes to reacting and backing off in the face of increasing latency caused by buffers (uploading a large file through Dropbox or Google Drive will most likely cause ping times on other machines on the same network to degrade from milliseconds to seconds).

When it comes to bufferbloat, a single TCP flow can dominate all others and be highly unfair because TCP's congestion algorithms are so slow to react (they often only react to packet loss and not latency).

LEDBAT and delay-based congestion algorithms were designed to handle bufferbloat well, and PCC is essentially a superset of these, taking into account loss rate and throughput in addition to just delay.

This actually can lead to a very interesting discussion about TCP friendliness, which I didn't have space to elaborate in paper:

1. The Trend of Selfish Applications: TCP's poor performance is not only an object of research study. More and more applications have chosen to ignore the notion of TCP friendliness and use selfish application-level optimization techniques such as opening many parallel connections (as in modern web browsers and GridFTP), contacting multiple servers in parallel (in BitTorrent), or modifying TCP to use a very large initial window size. In fact, some CDN providers use selfish optimization of TCP to accelerate data transfer from CDN servers to end users on the public Internet. This trend indicates that often, applications prefer higher performance regardless of potential TCP unfriendliness. Although we recommend use of PCC only when it is isolated from legacy TCP, it is likely that due to PCC's robust high performance, some or many users (especially those suffering in network environments that are challenging for TCP) will use PCC as another selfish performance optimization technique. Legacy TCP users, on the other hand, will unfortunately suffer from performance degradation when competing with PCC and other selfish optimization techniques under FIFO queue based network.

What happens next in this tussle? One possibility is a slow arms race towards selfish high performance rate control. If done through adoption of PCC using our safe utility function, it will actually avoid a ``tragedy of the commons'' situation as we have shown. But more immediately, network owners (ISPs) will likely take action to avoid complaints.

2.ISP response to selfishness:

o maintain legacy TCP users' satisfaction, network operators can respond to selfishness by deploying flow-level or customer-level resource isolation at the point of congestion. Of course this requires hardware support, but it is feasible for two reasons. First, unlike protocols like XCP and RCP, an ISP can deploy resource isolation as a local change at an individual congested device, requiring no signaling or packet header changes or explicit feedback to end-hosts. Second, the belief that per-flow FQ is infeasible for high speed switching hardware is now a misconception, with merchant silicon supporting hundreds of thousands of flows at hundred-gigabit speeds. Even by 1999, early implementations used limited hardware (less than 33Mhz FPGA or CMOS) to support FQ on multi-gigabit links. Today, an off-the-shelf EZchip NP-4 supports 256K flows with 100 Gbit throughput, VSC2202 supports 500K flows with Gigabit Ethernet throughput, and so on. We obtained a quote of $25 for the VSC2202, of which the cost of the FQ logic itself is only a fraction.} In certain cases resource isolation is already deployed; cellular networks for example use separate queues for each user.

3. Towards a new architecture?

The above trends suggest that in-network resource isolation may be increasingly deployed as a consequence of the continuous tussle between applications optimising selfishly and ISPs protecting their infrastructure with means that are feasible to deploy. We argue that this process is actually (slowly) evolving the Internet towards a fundamentally better architecture: end-hosts focus on optimising their desired performance, while the network's job is to isolate resources across users. This architecture offers a clean separation of concerns. End-hosts in this architecture deal with what they can directly observe and control, leading to simple, clean designs and the flexibility to express their heterogenous objectives and even safely evolve to new end-to-end protocols. Similarly, the network deals with what it directly observes and cares about, i.e., competition between users or flows. Our evaluation has already demonstrated the flexibility that results from this architecture.

While there are users of selfish transports and there is space for them in controlled networks it will be hard to get something selfish widely used considering the slow adoption of congestion control.

It would be nice to see quantification of NewReno TCP behavior against PCC, both short and long lived connections and how they fair in such an environment. If you want to convince the world to switch to your approach you'll need to do that.

At the end most people will just use the default of their OS and will never even consider what happens underneath so you'll need to convince the Linux kernel and BSDs that your approach is better overall and viable for the wide internet.

One thing I failed to take account before is that most transfers are affected by the congestion control on the server and less so on the client (besides uploads) so you really need to focus on the OS vendors to implement PCC in their OS and possibly make it a default. Linux has a configurable congestion control mechanism and I added H-TCP to it back at the time. Wasn't too much work either.

I don't really buy the ISP response argument, they will fail to do it early enough and will only get to it when things break severely enough and then their response will be overly aggressive.

It would be useful however to maybe add a mode detection to PCC that will throttle it somewhat to not take full bandwidth in order to leave some space for legacy TCP. It will add complexity to PCC and this needs to be balanced but at the end you'll need to do what it takes to get PCC beyond closed networks and into the internet and you have many gatekeepers to convince.

I think this is a really good idea. Where TCP will often just multiplicatively decrease sending rate at every dropped packet (which happens 0.5% to 1% of the time if the network is normal), PCC does not react immediately but rather feeds this into a learning algorithm, which also learns from doing frequent experiments increasing/decreasing the sending rate and measuring the impact on loss rate, throughput and latency.

Essentially, if I understand it correctly, it's just measuring loss rate, throughput and latency and iteratively adjusting sending rate in response. I think it's very similar to LEDBAT and delay-based congestion control except that it's not just taking delay from bufferbloat into account, but also loss rate and throughput, which makes sense if the goal is to optimize these.

Thank you!

PCC measures these performance metrics and combine them via a utility function describing the data delivery goal, which can be "high throughput, low loss rate, don't care latency" or "high throughput, low latency" as we evaluated in Sec. 4.4 in the full paper

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact