
Data Center TCP: TCP Congestion Control for Data Centers - okket
https://tools.ietf.org/html/rfc8257
======
vitus
Is this the same as the SIGCOMM '10 DCTCP paper [0]? I see that the author
lists of both draw heavily from Microsoft, but otherwise they don't overlap.
If so, what's changed in the past 7 years? Why publish the RFC now, especially
since it's not on track to become an Internet standard?

Something called DCTCP has been in the Linux kernel since 2014 [1]; that
commit even cites what looks like an earlier draft of this RFC that dates back
to 2014, although that was on the standards track back then [2]. Why was that
effort seemingly abandoned?

[0] https://people.csail.mit.edu/alizadeh/papers/dctcp-sigcomm10.pdf

[1] https://git.kernel.org/linus/e3118e8359bb7c59555aca60c725106e6d78c5ce

[2] https://tools.ietf.org/html/draft-bensley-tcpm-dctcp-00

~~~
otterley
The RFC clearly states:

> This document describes DCTCP as implemented in Microsoft Windows Server
> 2012 [WINDOWS]. The Linux [LINUX] and FreeBSD [FREEBSD] operating systems
> have also implemented support for DCTCP in a way that is believed to follow
> this document. Deployment experiences with DCTCP have been documented in
> [MORGANSTANLEY].

> Why publish the RFC now, especially since it's not on track to become an
> Internet standard?

Presumably it's to guide future implementors who wish to attain compatibility
with the existing implementations.

------
paralelogram
Is this better than Google's BBR?

https://cloudplatform.googleblog.com/2017/07/TCP-BBR-congestion-control-comes-to-GCP-your-Internet-just-got-faster.html

~~~
unmole
Entirely different use cases.

~~~
jchw
Really? Google appears to claim to be using BBR for intra-data-center
communication.

~~~
kev009
DCTCP is designed for data center networks, where you have first-party control
and can ensure ECN works end to end on your equipment.

BBR is designed for the "hostile Internet," where you can't rely on ECN
marking and where tons of people are, willingly or unwillingly, plotting
against you: middleboxes that do policing, shaping, and just plain bizarre
things; routers that clear options; other worse or unfair congestion controls;
extreme variation in buffer sizes; etc.

------
lwheelock
How is this practical, considering the known incompatibilities with
traditional ECN, and that most enterprise environments include
interconnectivity with systems outside the "controlled environment"?

The demand for this type of congestion control seems to be driven by the
top-of-rack topology used in cloud architectures.

It would then stand to reason that hardware manufacturers could better solve
variable queuing requirements in the switch than by developing protocol
support that is known to have a major incompatibility and could introduce
interoperability problems between vendors in mixed networks.

~~~
jsnell
First, DCTCP degrades gracefully if ECN isn't working. You lose the early
warning about congestion, but congestion will eventually cause packet loss,
and DCTCP will react to that appropriately.

Second, Linux allows setting the congestion control algorithm per route. So
you could use DCTCP for communicating with the IPs in the same data center,
and the default CC algorithm for everything else. And what if you can't or
don't want to use per-route settings? Well, you'll generally have two classes
of machines anyway: frontends that can communicate with the outside world, and
backends that can't. So you could set the congestion control based on the role
of the machine.
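
As a sketch of the per-route approach, assuming iproute2's `congctl` route
attribute and the kernel's `tcp_dctcp` module (the addresses and interface
name here are purely illustrative):

```shell
# System-wide default stays a loss-based algorithm for external traffic.
sysctl -w net.ipv4.tcp_congestion_control=cubic

# Use DCTCP only toward the data center's private range
# (10.0.0.0/8, 10.0.0.1, and eth0 are illustrative).
ip route add 10.0.0.0/8 via 10.0.0.1 dev eth0 congctl dctcp
```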

Solving this in the switches seems tricky. Sure, per-flow rather than global
or per-port queues could be used to solve the mice vs. elephants problem. But
it does not help with TCP incast unless you also add huge buffers. You want
switches to be simple, fast and cheap. A switch with per-flow queueing and
huge buffers seems like the opposite.

------
enos
How is this different from Explicit Congestion Notification (ECN)? ECN has
been in the kernel for many years and is half enabled (responder-only) by
default.

~~~
YouKnowBetter
DCTCP improves congestion control by using ECN markings to estimate the
extent of congestion rather than just its presence. Given congestion markings
on all of the packets in a window, DCTCP will halve the window, just as
traditional TCP does when it detects packet loss or as TCP+ECN does when it
sees a single ECN marking. With fewer markings, DCTCP backs off
proportionally less.

Link: https://www.soe.ucsc.edu/sites/default/files/technical-reports/UCSC-SOE-14-14.pdf

~~~
tonykarkats
Yes, that is the idea: DCTCP uses the ECN markings to adapt the congestion
window proportionally to the amount of congestion.

