We just uploaded a full-length version of PCC to arXiv, but it takes some time for arXiv to process and post it online. The full version discusses our approach in more detail and, I believe, in a clearer way. We added evaluations on more challenging network scenarios and also evaluated PCC with different utility functions expressing different data-transfer objectives.
So here is a preview:
During these two weeks, we are writing some simple docs and making the file-transfer SOFTWARE more user-friendly. We also plan to release our experiment/demo CODE so one can easily play with PCC and compare it with existing TCP variants.
Code is available at https://github.com/modong/pcc, but it is currently under some refactoring and some of the compile/run instructions are outdated. We will have a more formal release in two weeks.
In the early days of TCP/IP, it was clear that there was no known solution to congestion in the middle of a big datagram network. There still isn't. Out near the edges, we can use fair queuing, which I invented and named, and which is offered in most Cisco routers for links below Ethernet speeds. We still don't know what to do in the middle of the network. "Random Early Drop" is an ugly hack, but it's what we have in the big routers. What saved us for years is that fiber backbones had so much more bandwidth than the last mile consumer connections that the congestion stayed near the ends. Then came streaming HD video.
The treatment of congestion went off in a different direction than I'd expected. Originally, ICMP Source Quench messages were used for congestion control. Routers were expected to tell endpoints when to slow down. But they weren't implemented in Berkeley's networking stack, and gradually were phased out. (They would have been a denial of service attack vector, anyway.) Instead, congestion was inferred from packet loss. Even worse, congestion is inferred from not having a packet acknowledged, so you don't know which direction is congested. That's not a particularly good approach, but it's what we've got.
I originally took the position that, as an endpoint, you were entitled to one packet in flight. That is, the network should be provisioned so that every connection can have one packet in transit without packet loss. If you have that much capacity, and you have fair queuing in routers, you only lose packets if you have more than one in flight. You're thus competing with yourself for bandwidth, and an "increase slowly, back off big on packet loss" strategy works fine.
In a world with Random Early Drop, not enough fair queuing, and big FIFO buffers (someone called this "bufferbloat"), you get both packet loss and excessive transit time. It's hard for the endpoints to adapt to that no matter what they do.
Anyway, there's nothing wrong with using a smarter algorithm to calculate optimal window sizes and retransmission times. As long as it evaluates how well it's doing and feeds that back into the control algorithm in a rational way, that's fine. Today, you could probably afford to run machine learning algorithms in routers. So go for it!
If you're going to do that, it's a good time to fix delayed ACK as well. As is well known, the interaction between delayed ACKs and the Nagle algorithm is awful. This is a historical accident; I implemented one, the Berkeley guys implemented the other, and by the time I found out I was out of networking and involved with a small startup called Autodesk. So that never got fixed. Here's how to think about it. A delayed ACK is a bet. You're betting that there will be a reply, upon which an ACK can be piggybacked, before the fixed timer runs out. If the fixed timer runs out, and you have to send an ACK as a separate message, you lost the bet. Current TCP implementations will happily lose that bet forever without turning off the ACK delay. That's just wrong.
The right answer is to track wins and losses on delayed and non-delayed ACKs. Don't turn on ACK delay unless you're sending a lot of non-delayed ACKs closely followed by packets on which the ACK could have been piggybacked. Turn it off when a delayed ACK has to be sent.
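The win/loss tracking idea above could be sketched roughly like this. This is a hypothetical illustration only; the class name, counters, and threshold are all made up, not from any real TCP stack:

```python
# Toy sketch of the "delayed ACK as a bet" heuristic described above.
# All names and the 0.8 threshold are illustrative assumptions.

class DelayedAckPolicy:
    def __init__(self, enable_threshold=0.8):
        self.wins = 0            # ACK piggybacked on outgoing data in time
        self.losses = 0          # delay timer expired, bare ACK had to be sent
        self.delay_enabled = False
        self.enable_threshold = enable_threshold

    def on_piggybacked_ack(self):
        """We won the bet: an outgoing segment carried the ACK."""
        self.wins += 1
        self._maybe_enable()

    def on_timer_expired_ack(self):
        """We lost the bet: turn delaying off immediately, as argued above."""
        self.losses += 1
        self.delay_enabled = False

    def _maybe_enable(self):
        total = self.wins + self.losses
        # Re-enable delaying only once most recent bets are being won.
        if total >= 10 and self.wins / total >= self.enable_threshold:
            self.delay_enabled = True
```

A real implementation would decay or window the counters rather than keep lifetime totals, but the shape of the policy is the same: win the bet often, keep delaying; lose it, stop.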
I should have pushed for this in the 1980s.
FQ on modern Cisco routers is generally quite crippled, and even that much is not available elsewhere.
However, on the whole, the FQ + AQM situation along the edge of the internet is looking up: nearly every third-party home-router firmware incorporates fq_codel (notably OpenWrt and its derivatives), and it is integral to Streamboost and other QoS systems like those in the Netgear X4.
Some examples of the kind of results we get now from these new algos:
I really liked the PCC paper (wanted code tho) for clearly identifying the parameter space modern TCPs need to operate in. FQ everywhere, or nearly everywhere, would make TCP's job far easier.
There is an AQM/packet-scheduling working group in the IETF in the process of standardization, etc.
It is an interesting approach and interesting research that drives us forward, but it is in no way a practical approach yet.
They implemented this in UDT; if they can come up with a Linux kernel patch, that would be awesome for real-world testing and implementation verification.
Reno is the original TCP congestion control; New Reno was the most common adaptation for a long time. These days CUBIC is the default in Linux, and Windows has its own algorithm. All of these behave in a way very similar to the old Reno: they all have the basic AIMD structure - Additive Increase, Multiplicative Decrease (slowly increasing when there is no signal, decreasing fast on a loss signal). In the case of CUBIC and H-TCP the slow increase is not very slow, and the fast decrease was slowed down, to handle very-high-speed, high-latency links (high bandwidth-delay product).
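The basic AIMD structure described here fits in a few lines. This is a toy sketch with the classic Reno-style constants (increase by one segment per RTT, halve on loss); real stacks add slow start, fast recovery, and much more:

```python
# Minimal sketch of Reno-style AIMD: additive increase per loss-free RTT,
# multiplicative decrease on a loss signal. alpha/beta are the classic
# Reno values; CUBIC and H-TCP use a steeper increase and a gentler decrease.

def aimd_update(cwnd, loss_detected, alpha=1.0, beta=0.5):
    """Return the new congestion window, in segments."""
    if loss_detected:
        return max(1.0, cwnd * beta)   # multiplicative decrease, floor of 1
    return cwnd + alpha                # additive increase (once per RTT)
```

For example, a window of 10 segments grows to 11 after a loss-free RTT, but drops to 5 on a loss; that sawtooth is the "slow up, fast down" behavior the comment describes.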
These are all not quite multi-TCP, and there was no real mention of fairness of PCC against a single TCP stream, even of the CUBIC form, not to mention plain old NewReno.
I can't tell if it's really unfair since I didn't test it myself, but they hint that there is a wide gap between PCC and NewReno, or even CUBIC. As I understand the algorithm, it also seems to imply that PCC will dominate a NewReno stream with ease, especially when PCC is an established connection and the NewReno stream is only starting up: PCC will feel a small bump of losses and probably not do much about it (i.e., not reduce its rate by much), and the NewReno-induced packet losses will fly under PCC's radar. At least that's the feeling I get from reading the paper; again, no idea about actual behavior, but I'm no longer paid to look at packet traces of congestion control algorithms. (Shame, it was fun while it lasted :-)
When it comes to bufferbloat, a single TCP flow can dominate all others and be highly unfair because TCP's congestion algorithms are so slow to react (they often only react to packet loss and not latency).
LEDBAT and delay-based congestion algorithms were designed to handle bufferbloat well, and PCC is essentially a superset of these, taking loss rate and throughput into account in addition to delay.
1. The Trend of Selfish Applications:
TCP's poor performance is not only an object of research study. More and more applications have chosen to ignore the notion of TCP friendliness and use selfish application-level optimization techniques such as opening many parallel connections (as in modern web browsers and GridFTP), contacting multiple servers in parallel (as in BitTorrent), or modifying TCP to use a very large initial window size. In fact, some CDN providers use selfish optimization of TCP to accelerate data transfer from CDN servers to end users on the public Internet. This trend indicates that applications often prefer higher performance regardless of potential TCP unfriendliness. Although we recommend use of PCC only when it is isolated from legacy TCP, it is likely that due to PCC's robust high performance, some or many users (especially those suffering in network environments that are challenging for TCP) will use PCC as another selfish performance optimization technique. Legacy TCP users, on the other hand, will unfortunately suffer performance degradation when competing with PCC and other selfish optimization techniques in FIFO-queue-based networks.
What happens next in this tussle? One possibility is a slow arms race towards selfish high-performance rate control. If done through adoption of PCC using our safe utility function, it will actually avoid a "tragedy of the commons" situation, as we have shown. But more immediately, network owners (ISPs) will likely take action to avoid complaints.
2. ISP response to selfishness:
To maintain legacy TCP users' satisfaction, network operators can respond to selfishness by deploying flow-level or customer-level resource isolation at the point of congestion. Of course this requires hardware support, but it is feasible for two reasons. First, unlike protocols like XCP and RCP, an ISP can deploy resource isolation as a local change at an individual congested device, requiring no signaling, packet header changes, or explicit feedback to end-hosts. Second, the belief that per-flow FQ is infeasible for high-speed switching hardware is now a misconception, with merchant silicon supporting hundreds of thousands of flows at hundred-gigabit speeds. Even by 1999, early implementations used limited hardware (less than 33 MHz FPGA or CMOS) to support FQ on multi-gigabit links. Today, an off-the-shelf EZchip NP-4 supports 256K flows at 100 Gbit/s throughput, the VSC2202 supports 500K flows at Gigabit Ethernet throughput, and so on. We obtained a quote of $25 for the VSC2202, of which the cost of the FQ logic itself is only a fraction. In certain cases resource isolation is already deployed; cellular networks, for example, use separate queues for each user.
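For readers unfamiliar with how per-flow fair queuing works, here is a toy deficit round robin (DRR) scheduler, the classic algorithm behind the kind of hardware FQ discussed above. Everything here (class name, quantum value, byte-counted packets) is an illustrative sketch, not a model of any particular chip:

```python
# Toy deficit round robin (DRR) scheduler: each active flow gets its own
# queue; a flow may transmit only when its accumulated byte credit
# ("deficit") covers the head packet, and credit is granted one quantum
# per round. This bounds any one flow's share regardless of how
# aggressively its endpoints send.
from collections import deque

class DRRScheduler:
    def __init__(self, quantum=1500):
        self.quantum = quantum
        self.queues = {}     # flow_id -> deque of packet sizes (bytes)
        self.deficit = {}    # flow_id -> accumulated byte credit
        self.active = deque()  # round-robin order of backlogged flows

    def enqueue(self, flow_id, pkt_size):
        if flow_id not in self.queues:
            self.queues[flow_id] = deque()
            self.deficit[flow_id] = 0
            self.active.append(flow_id)
        self.queues[flow_id].append(pkt_size)

    def dequeue(self):
        """Return (flow_id, pkt_size) for the next packet, or None if idle."""
        while self.active:
            fid = self.active[0]
            q = self.queues[fid]
            if q and q[0] <= self.deficit[fid]:
                self.deficit[fid] -= q[0]
                return fid, q.popleft()
            if not q:
                # Flow went idle: drop its state entirely.
                self.active.popleft()
                del self.queues[fid], self.deficit[fid]
                continue
            # Head packet doesn't fit yet: grant a quantum and rotate.
            self.deficit[fid] += self.quantum
            self.active.rotate(-1)
        return None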
3. Towards a new architecture?
The above trends suggest that in-network resource isolation may be increasingly deployed as a consequence of the continuous tussle between applications optimising selfishly and ISPs protecting their infrastructure by means that are feasible to deploy. We argue that this process is actually (slowly) evolving the Internet towards a fundamentally better architecture: end-hosts focus on optimising their desired performance, while the network's job is to isolate resources across users. This architecture offers a clean separation of concerns. End-hosts deal with what they can directly observe and control, leading to simple, clean designs and the flexibility to express their heterogeneous objectives and even safely evolve to new end-to-end protocols. Similarly, the network deals with what it directly observes and cares about, i.e., competition between users or flows. Our evaluation has already demonstrated the flexibility that results from this architecture.
It would be nice to see quantification of NewReno TCP behavior against PCC, both for short- and long-lived connections, and how they fare in such an environment. If you want to convince the world to switch to your approach, you'll need to do that.
In the end most people will just use the default of their OS and never even consider what happens underneath, so you'll need to convince the Linux kernel and the BSDs that your approach is better overall and viable for the wide internet.
One thing I failed to take into account before is that most transfers are governed by the congestion control on the server and less so on the client (uploads aside), so you really need to focus on the OS vendors to implement PCC in their OS and possibly make it a default. Linux has a pluggable congestion control mechanism, and I added H-TCP to it back in the day. Wasn't too much work either.
I don't really buy the ISP response argument; they will fail to do it early enough and will only get to it when things break severely enough, and then their response will be overly aggressive.
It would be useful, however, to maybe add a mode detection to PCC that throttles it somewhat so it doesn't take the full bandwidth, leaving some space for legacy TCP. That will add complexity to PCC, and this needs to be balanced, but in the end you'll need to do what it takes to get PCC beyond closed networks and onto the internet, and you have many gatekeepers to convince.
Essentially, if I understand it correctly, it's just measuring loss rate, throughput, and latency and iteratively adjusting the sending rate in response. I think it's very similar to LEDBAT and delay-based congestion control, except that it doesn't just take delay from bufferbloat into account but also loss rate and throughput, which makes sense if the goal is to optimize these.
PCC measures these performance metrics and combines them via a utility function describing the data-delivery goal, which can be "high throughput, low loss rate, don't care about latency" or "high throughput, low latency", as we evaluated in Sec. 4.4 of the full paper.
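As a rough illustration of that loop: measure throughput, loss, and latency over an interval, score them with a pluggable utility function, and move the sending rate in whichever direction scores better. The utility shape and the probing step below are my own simplification, not the exact functions or decision procedure from the paper:

```python
# Hypothetical sketch of a PCC-style control loop. `measure` stands in for
# a real measurement interval and is assumed to return
# (throughput, loss_rate, rtt) at a given sending rate.

def utility_low_loss(throughput, loss_rate, rtt):
    """Illustrative 'high throughput, low loss, don't care about latency'
    utility: reward delivered data, heavily penalize losses; rtt is ignored."""
    return throughput * (1 - loss_rate) - 10.0 * throughput * loss_rate

def pick_rate(rate, step, measure, utility):
    """One simplified decision step: probe rate +/- step, keep whichever
    direction yields higher utility."""
    u_up = utility(*measure(rate + step))
    u_down = utility(*measure(rate - step))
    return rate + step if u_up >= u_down else rate - step
```

With a bottleneck of capacity C, probing above C shows rising loss and flat throughput, so utility falls and the rate backs off; probing below C shows loss-free throughput gains, so the rate climbs. Swapping in a utility that also penalizes rtt would express the "high throughput, low latency" objective instead.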