

DCTCP: TCP optimized for lower latency in Data Centers - xtacy
http://www.stanford.edu/~alizade/Site/DCTCP.html

======
vrv
This work is really cool and is attacking an important problem in a lot of
datacenter networks. It's also great that they've released the source code.
As someone who's worked on this problem, I encourage those interested to read
up more on it.

Here are a few papers on the problem in chronological order:

1\. Original paper briefly talking about the "incast" TCP problem in storage
environments: <http://portal.acm.org/citation.cfm?id=1049998>

2\. Our follow up work on that problem from a few years back: Measurement
paper ( <http://portal.acm.org/citation.cfm?id=1364825> ) and our initial
solution ( <http://portal.acm.org/citation.cfm?id=1592604> ) with a
microsecond retransmission Linux patch here:
<https://github.com/vrv/linux-microsecondrto>

3\. Another paper talking about Incast in Datacenter environments, focusing on
a different form of the workload:
<http://portal.acm.org/citation.cfm?id=1592693>

4\. RAMCloud - a project that briefly talks about the need for low-latency
transports: <http://www.stanford.edu/~ouster/cgi-bin/papers/ramcloud.pdf>

5\. ICTCP - paper from last year that tries to solve the Incast problem using
receiver advertised window algorithms:
<http://conferences.sigcomm.org/co-next/2010/CoNEXT_papers/13-Wu.pdf>

DCTCP tries to go beyond solving the Incast problem and focuses on trying to
control buffer occupancy in datacenter environments that contain both long
flows and bursty flows.
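To make the buffer-occupancy idea concrete, here is a minimal sketch of the sender-side window update described in the DCTCP paper: switches mark packets with ECN once the queue exceeds a threshold, and the sender keeps an EWMA (alpha) of the fraction of marked packets per RTT, cutting its window in proportion to alpha rather than halving it. The numbers below are plain floats for illustration, not a real TCP stack.

```python
# Sketch of DCTCP's sender-side reaction to ECN marks (per the paper),
# simplified to plain arithmetic rather than a real TCP implementation.

G = 1 / 16.0  # EWMA gain for alpha; the paper suggests g = 1/16

def update_alpha(alpha, marked, total, g=G):
    """Once per RTT: fold the fraction F of ECN-marked packets into alpha."""
    f = marked / total if total else 0.0
    return (1 - g) * alpha + g * f

def reduce_cwnd(cwnd, alpha):
    """On congestion, cut the window in proportion to alpha, not by half."""
    return cwnd * (1 - alpha / 2.0)

# Example: a lightly congested RTT (2 of 100 packets ECN-marked)
alpha = update_alpha(0.0, marked=2, total=100)
cwnd = reduce_cwnd(100.0, alpha)  # a gentle cut, vs. 50.0 for plain TCP
```

The key point is that a few marks cause only a small window reduction, which is how DCTCP keeps queues short without sacrificing the throughput of long flows.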

I think these papers all assume lossy link layers (Ethernet). There are
standards and other technologies (Data Center Ethernet, Myrinet, InfiniBand)
that aim for lossless link layers to make the transport problem easier, but
they come with various other drawbacks today (cost, compatibility, etc.). In
the meantime, I hope DCTCP or the microsecond TCP patch proves useful in
solving some of these problems.

~~~
xtacy
I would like to hear more from application programmers here about their
experience dealing with incast. For example, Facebook uses UDP to throttle
messages at the application layer to combat incast:
<https://www.facebook.com/note.php?note_id=39391378919>.

Also, a few catches with DCTCP:

1\. It's not purely an end-host solution: it requires ECN support from
switches, though that should be widely available. Can someone pitch in about
the availability of ECN in their networks?

2\. DCTCP and plain TCP don't mix well, and hence it is not incrementally
deployable. DCTCP and TCP+ECN don't mix well either!

~~~
vrv
I'll let others chime in, but the DCTCP paper mentions "our application
reduces the amount of data each worker sends and employs jitter. Facebook,
reportedly, has gone to the extent of developing their own UDP-based
congestion control [29]."

I recently talked with someone on the Facebook memcached team about this
problem, and they mentioned that moving to UDP has been useful, but if I'm not
mistaken, they gave up on 100% in-order reliability in return for low latency.

------
antihero
I've always wondered if you could incorporate damping into TCP with something
like a PID controller: a transfer function that takes the error rate as input
and adjusts the transfer rate accordingly. Perhaps more CPU overhead, but we
have amazing CPUs now.
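The idea can be sketched as a textbook PID loop driven by the observed loss/mark rate. Everything here (the gains, the target loss rate, the class name) is invented for illustration; a real stack would have to tune these carefully for stability, as the replies below note.

```python
# Toy sketch of a PID controller nudging a sending rate toward a target
# loss rate. All gains and names are hypothetical, chosen only to show
# the control-loop shape, not tuned for any real network.

class PidRateController:
    def __init__(self, target_loss=0.01, kp=50.0, ki=5.0, kd=10.0):
        self.target = target_loss        # desired loss/mark fraction
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, loss_rate, dt):
        """Return a rate adjustment from the observed loss rate."""
        error = self.target - loss_rate  # positive => room to speed up
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return (self.kp * error +
                self.ki * self.integral +
                self.kd * derivative)

# Example: feed the controller a few simulated per-interval loss rates.
ctrl = PidRateController()
rate = 100.0                             # arbitrary starting rate
for loss in (0.0, 0.02, 0.05, 0.02):
    rate = max(1.0, rate + ctrl.update(loss, dt=1.0))
```

The derivative term is what supplies the "damping": it pushes back against rapid swings in the error signal, which is exactly the oscillation problem sawtooth-style congestion control suffers from.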

~~~
xtacy
It's an interesting thought, and one could always make good use of more
feedback to make a control loop behave nicely with respect to throughput,
fairness, convergence time and stability.

Many improved congestion control algorithms have been proposed along this line
of thought: XCP [1] and RCP [2], to name a few. One of the reasons they have
not caught on is that they require a _lot_ of support from the network.

[1] XCP: <http://www.isi.edu/isi-xcp/>

[2] RCP: <http://yuba.stanford.edu/rcp/>

