

PCC: Re-architecting Congestion Control for Consistent High Performance [pdf] - adamnemecek
http://arxiv.org/pdf/1409.7092v1.pdf

======
Animats
That's not an unreasonable approach. When I came up with the original approach
to congestion control in TCP 30 years ago (see Internet RFCs 896 and 970), TCP
behavior under load was so awful that connections would stall out and fail.
The ad-hoc solutions we put in back then at least made TCP well behaved under
load.

In the early days of TCP/IP, it was clear that there was no known solution to
congestion in the middle of a big datagram network. There still isn't. Out
near the edges, we can use fair queuing, which I invented and named, and which
is offered in most Cisco routers for links below Ethernet speeds. We still
don't know what to do in the middle of the network. "Random Early Drop" is an
ugly hack, but it's what we have in the big routers. What saved us for years
is that fiber backbones had so much more bandwidth than the last mile consumer
connections that the congestion stayed near the ends. Then came streaming HD
video.

The treatment of congestion went off in a different direction than I'd
expected. Originally, ICMP Source Quench messages were used for congestion
control. Routers were expected to tell endpoints when to slow down. But they
weren't implemented in Berkeley's networking stack, and gradually were phased
out. (They would have been a denial of service attack vector, anyway.)
Instead, congestion was inferred from packet loss. Even worse, congestion is
inferred from not having a packet acknowledged, so you don't know which
direction is congested. That's not a particularly good approach, but it's what
we've got.

I originally took the position that, as an endpoint, you were entitled to one
packet in flight. That is, the network should be provisioned so that every
connection can have one packet in transit without packet loss. If you have
that much capacity, and you have fair queuing in routers, you only lose
packets if you have more than one in flight. You're thus competing with
yourself for bandwidth, and an "increase slowly, back off big on packet loss"
strategy works fine.

In a world with Random Early Drop, not enough fair queuing, and big FIFO
buffers (someone called this "bufferbloat"), you get both packet loss and
excessive transit time. It's hard for the endpoints to adapt to that no matter
what they do.

Anyway, there's nothing wrong with using a smarter algorithm to calculate
optimal window sizes and retransmission times. As long as it evaluates how
well it's doing and feeds that back into the control algorithm in a rational
way, that's fine. Today, you could probably afford to run machine learning
algorithms in routers. So go for it!

If you're going to do that, it's a good time to fix delayed ACK as well. As is
well known, the interaction between delayed ACKs and the Nagle algorithm is
awful. This is a historical accident; I implemented one, the Berkeley guys
implemented the other, and by the time I found out I was out of networking and
involved with a small startup called Autodesk. So that never got fixed. Here's
how to think about that. A delayed ACK is a bet. You're betting that there
will be a reply, upon which an ACK can be piggybacked, before the fixed timer
runs out. If the fixed timer runs out, and you have to send an ACK as a
separate message, you lost the bet. Current TCP implementations will happily
lose that bet forever without turning off the ACK delay. That's just wrong.

The right answer is to track wins and losses on delayed and non-delayed ACKs.
Don't turn on ACK delay unless you're sending a lot of non-delayed ACKs
closely followed by packets on which the ACK could have been piggybacked. Turn
it off when a delayed ACK has to be sent.
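
To make the bookkeeping concrete, here is a minimal sketch of that policy in C. The names, the win threshold, and the 200 ms window are illustrative assumptions, not code from any real TCP stack:

```c
#include <stdbool.h>
#include <stdint.h>

#define ACK_DELAY_USEC  200000  /* classic 200 ms delayed-ACK timer */
#define WINS_TO_ENABLE  8       /* "would-have-won" bets before delaying again */

struct dack_state {
    bool     delaying;       /* current policy: delay ACKs or not        */
    uint64_t last_ack_usec;  /* when the last immediate ACK went out     */
    unsigned would_win;      /* immediate ACKs that data could have carried */
};

/* Called when an immediate (non-delayed) ACK is sent. */
static void on_immediate_ack(struct dack_state *s, uint64_t now)
{
    s->last_ack_usec = now;
}

/* Called when a data segment is sent. If a bare ACK went out within
 * the delay window just before, a delayed ACK would have won its bet:
 * it could have ridden on this segment. */
static void on_data_sent(struct dack_state *s, uint64_t now)
{
    if (!s->delaying && now - s->last_ack_usec < ACK_DELAY_USEC) {
        if (++s->would_win >= WINS_TO_ENABLE)
            s->delaying = true;  /* a pattern of wins: start delaying */
    }
}

/* Called when the delayed-ACK timer expires and a bare ACK must be
 * sent: the bet was lost, so stop delaying immediately rather than
 * losing the same bet forever. */
static void on_delay_timeout(struct dack_state *s)
{
    s->would_win = 0;
    s->delaying  = false;
}
```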

I should have pushed for this in the 1980s.

John Nagle

~~~
dtaht3
John, if you care to get back in touch with Jim Gettys and me, we can update
you on where things stand on the whole bufferbloat effort.

FQ on modern Cisco routers is generally quite crippled, and even that much is
not available elsewhere.

However, on the whole, the FQ + AQM situation along the edge of the internet
is looking up: nearly every third-party home router firmware incorporates
fq_codel (notably OpenWrt and its derivatives), and it is integral to
Streamboost and other QoS systems like the one in the Netgear X4.

Some examples of the kind of results we get now from these new algos:

[http://snapon.lab.bufferbloat.net/~cero2/jimreisert/results....](http://snapon.lab.bufferbloat.net/~cero2/jimreisert/results.html)

[http://burntchrome.blogspot.com/2014/05/fixing-bufferbloat-o...](http://burntchrome.blogspot.com/2014/05/fixing-bufferbloat-on-comcasts-blast.html)

I really liked the PCC paper (wanted code, though) for clearly identifying
the parameter space modern TCPs need to operate in. FQ everywhere, or nearly
everywhere, would make TCP's job far easier.

There is an AQM/packet-scheduling working group in the IETF in the process of
standardizing these algorithms, etc.

------
modong
Thank you all for your interest!

We just uploaded a full-length version of PCC to arXiv, but it takes some time
for arXiv to process and post it online. The full version discusses our
approach in more detail and, I believe, in a clearer way. It has more
evaluation on more challenging network scenarios, and also evaluates PCC with
different utility functions expressing different data-transfer objectives.

So here is a preview:
[http://web.engr.illinois.edu/~modong2/pcc_fullpaper.pdf](http://web.engr.illinois.edu/~modong2/pcc_fullpaper.pdf)
Over these two weeks, we are writing some simple docs and making the
file-transfer SOFTWARE more user-friendly. We also plan to release our
experiments/demo CODE so one can easily play with PCC and compare it with
existing TCP variants.

Code is available at
[https://github.com/modong/pcc](https://github.com/modong/pcc), but it is
currently undergoing some refactoring and some compile/run instructions are
outdated.
We will have a more formal release in two weeks.

Thanks

-Mo

~~~
jorangreef
Thanks for the preview, Mo, I am reading through it now. Would there be any
further advantage to measuring one-way delay as opposed to measuring RTT?

------
baruch
They lost me when it became obvious that PCC doesn't interoperate very well
with "normal" TCP (let's call it reno-tcp[1]). TCP is not the optimal solution
and PCC may very well be better, but there is just no way to deploy a new,
unfair-to-Reno protocol and ignore all the consequences. There must be some safeguards to
handle all the older systems (be they routers, embedded systems, IoT or just
plain old Win XP machines).

It is an interesting approach, and interesting research that drives us
forward, but it is in no way practical.

They implemented this thing in UDT; if they can come up with a Linux patch,
that would be awesome for real-world testing and implementation verification.

[1] Reno is the original TCP congestion control; NewReno was the most common
adaptation for a long time. These days CUBIC is the default in Linux, and
Windows has its own algorithm. All of these behave in a very similar way to
the old Reno: they all have the basic structure of AIMD, Additive Increase /
Multiplicative Decrease (slowly increasing on no signal, quickly decreasing on
a loss signal), as in the sketch below. In the case of CUBIC and H-TCP the
slow increase is not very slow, and the fast decrease was slowed down, to
handle very-high-speed links with high latency (a high bandwidth-delay
product).
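
For concreteness, here is a bare-bones sketch of that AIMD structure in C, with Reno-style constants (grow by one segment per RTT, halve on loss); CUBIC and H-TCP effectively replace these two update rules with a faster growth curve and a gentler backoff:

```c
/* Bare-bones AIMD sketch with classic Reno constants; values and
 * structure are illustrative, not taken from any real stack. */

static double cwnd = 1.0;   /* congestion window, in segments */

/* An RTT passed with every outstanding segment acknowledged:
 * additive increase, one segment per RTT. */
static void on_rtt_without_loss(void)
{
    cwnd += 1.0;
}

/* A loss signal arrived (e.g. triple duplicate ACK):
 * multiplicative decrease, halving the window. */
static void on_loss_signal(void)
{
    cwnd /= 2.0;
    if (cwnd < 1.0)
        cwnd = 1.0;
}
```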

~~~
modong
This actually can lead to a very interesting discussion about TCP
friendliness, which I didn't have space to elaborate on in the paper:

1. The Trend of Selfish Applications: TCP's poor performance is not only an
object of research study. More and more applications have chosen to ignore the
notion of TCP friendliness and use selfish application-level optimization
techniques such as opening many parallel connections (as in modern web
browsers and GridFTP), contacting multiple servers in parallel (in
BitTorrent), or modifying TCP to use a very large initial window size. In
fact, some CDN providers use selfish optimization of TCP to accelerate data
transfer from CDN servers to end users on the public Internet. This trend
indicates that often, applications prefer higher performance regardless of
potential TCP unfriendliness. Although we recommend use of PCC only when it is
isolated from legacy TCP, it is likely that due to PCC's robust high
performance, some or many users (especially those suffering in network
environments that are challenging for TCP) will use PCC as another selfish
performance optimization technique. Legacy TCP users, on the other hand, will
unfortunately suffer performance degradation when competing with PCC and
other selfish optimization techniques in FIFO-queue-based networks.

What happens next in this tussle? One possibility is a slow arms race towards
selfish, high-performance rate control. If done through adoption of PCC using
our safe utility function, it will actually avoid a "tragedy of the commons"
situation, as we have shown. But more immediately, network owners (ISPs) will
likely take action to avoid complaints.

2. ISP response to selfishness:

To maintain legacy TCP users' satisfaction, network operators can respond to
selfishness by deploying flow-level or customer-level resource isolation at
the point of congestion. Of course this requires hardware support, but it is
feasible for two reasons. First, unlike protocols like XCP and RCP, an ISP can
deploy resource isolation as a local change at an individual congested device,
requiring no signaling or packet header changes or explicit feedback to end-
hosts. Second, the belief that per-flow FQ is infeasible for high speed
switching hardware is now a misconception, with merchant silicon supporting
hundreds of thousands of flows at hundred-gigabit speeds. Even by 1999, early
implementations used limited hardware (sub-33 MHz FPGA or CMOS) to support FQ
on multi-gigabit links. Today, an off-the-shelf EZchip NP-4 supports 256K
flows at 100 Gbit/s throughput, the VSC2202 supports 500K flows at Gigabit
Ethernet throughput, and so on. We obtained a quote of $25 for the VSC2202,
of which the cost of the FQ logic itself is only a fraction. In
certain cases resource isolation is already deployed; cellular networks for
example use separate queues for each user.

3. Towards a new architecture?

The above trends suggest that in-network resource isolation may be
increasingly deployed as a consequence of the continuous tussle between
applications optimising selfishly and ISPs protecting their infrastructure
with means that are feasible to deploy. We argue that this process is actually
(slowly) evolving the Internet towards a fundamentally better architecture:
end-hosts focus on optimising their desired performance, while the network's
job is to isolate resources across users. This architecture offers a clean
separation of concerns. End-hosts in this architecture deal with what they can
directly observe and control, leading to simple, clean designs and the
flexibility to express their heterogeneous objectives and even safely evolve to
new end-to-end protocols. Similarly, the network deals with what it directly
observes and cares about, i.e., competition between users or flows. Our
evaluation has already demonstrated the flexibility that results from this
architecture.

~~~
baruch
While there are users of selfish transports, and there is space for them in
controlled networks, it will be hard to get something selfish widely used,
considering how slowly new congestion-control algorithms get adopted.

It would be nice to see a quantification of NewReno TCP behavior against PCC,
for both short- and long-lived connections, and how they fare in such an
environment. If you want to convince the world to switch to your approach,
you'll need to do that.

In the end, most people will just use their OS's default and will never even
consider what happens underneath, so you'll need to convince the Linux kernel
and the BSDs that your approach is better overall and viable for the wider
internet.

One thing I failed to take into account before is that most transfers are
governed by the congestion control on the server and less so on the client
(uploads aside), so you really need to focus on getting the OS vendors to
implement PCC and possibly make it a default. Linux has a pluggable
congestion-control mechanism, and I added H-TCP to it back in the day. Wasn't
too much work, either.

I don't really buy the ISP-response argument: they will fail to act early
enough, will only get to it when things break severely, and then their
response will be overly aggressive.

It would be useful, however, to add some mode detection to PCC that throttles
it somewhat, so it doesn't take the full bandwidth and leaves some space for
legacy TCP. That will add complexity to PCC, and this needs to be balanced,
but in the end you'll need to do what it takes to get PCC beyond closed
networks and onto the internet, and you have many gatekeepers to convince.

------
jorangreef
I think this is a really good idea. Where TCP will often just multiplicatively
decrease its sending rate at every dropped packet (which happens 0.5% to 1% of
the time on a normal network), PCC does not react immediately but rather feeds
this into a learning algorithm, which also learns by running frequent
experiments, increasing/decreasing the sending rate and measuring the impact
on loss rate, throughput and latency.

Essentially, if I understand it correctly, it's just measuring loss rate,
throughput and latency and iteratively adjusting the sending rate in response. I
think it's very similar to LEDBAT and delay-based congestion control except
that it's not just taking delay from bufferbloat into account, but also loss
rate and throughput, which makes sense if the goal is to optimize these.
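
As a rough sketch of how I understand that control loop (the names, probe size and decision rule here are my own illustration, not the paper's code):

```c
/* Illustrative PCC-style control step: probe a slightly higher and a
 * slightly lower rate, keep whichever yields higher utility. */

#define EPSILON 0.05   /* +/- 5% rate probes; an assumed value */

/* Stand-in for sending at `rate` for roughly one RTT and folding the
 * measured throughput, loss rate and latency into a single utility
 * number. A real sender measures this from live traffic; this stub
 * only makes the sketch self-contained. */
static double measure_utility(double rate)
{
    (void)rate;
    return 0.0;  /* placeholder measurement */
}

/* One control-loop iteration: returns the new sending rate. A real
 * implementation would randomize probe order and repeat the
 * experiment before committing, to filter out measurement noise. */
static double pcc_step(double rate)
{
    double up   = measure_utility(rate * (1.0 + EPSILON));
    double down = measure_utility(rate * (1.0 - EPSILON));

    return (up > down) ? rate * (1.0 + EPSILON)
                       : rate * (1.0 - EPSILON);
}
```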

~~~
modong
Thank you!

PCC measures these performance metrics and combines them via a utility
function describing the data-delivery goal, which can be "high throughput,
low loss rate, don't care about latency" or "high throughput, low latency",
as we evaluated in Sec. 4.4 of the full paper.
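
For a rough flavor of what these look like, here are two illustrative utility shapes in C; the exact functions and constants in the paper differ, so treat these as simplified stand-ins:

```c
#include <math.h>

/* "High throughput, low loss rate, don't care about latency":
 * reward delivered goodput, with a steep penalty once loss exceeds
 * a small threshold (5% here, as an assumed cutoff). */
double utility_loss_based(double rate, double loss)
{
    double goodput = rate * (1.0 - loss);
    double penalty = 1.0 / (1.0 + exp(100.0 * (loss - 0.05)));
    return goodput * penalty - rate * loss;
}

/* "High throughput, low latency": additionally penalize growth in
 * RTT relative to the base RTT, i.e. queueing delay. */
double utility_latency_aware(double rate, double loss,
                             double rtt, double base_rtt)
{
    double goodput  = rate * (1.0 - loss);
    double queueing = (rtt - base_rtt) / base_rtt;
    return goodput - rate * loss - goodput * queueing;
}
```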

