
Let's make TCP faster - flardinois
http://googlecode.blogspot.com/2012/01/lets-make-tcp-faster.html
======
Karellen
Hmmmm....the article prelude, and points 1 and 3, and the rationale document
linked for point 2, all seem to be about optimising TCP for HTTP/the Web.

The thing is, a heck of a lot more runs over the Internet/TCP than just
HTTP/the web. Also, it can very well be argued that a lot of the "end-user"
perceived problems they are trying to fix (e.g. HTTP total request-response
round trip latency) are acutally problems with HTTP, rather than TCP - notably
the fact that for "small" web requests all HTTP effectively does is re-
implement a datagram protocol (albeit with larger packets than UDP) on top of
TCP, with all the consequent overhead of setting up and tearing down a TCP
connection.

It's an interesting set of fixes. But are they the right fixes, at the right
level? Would moving to SPDY instead of HTTP fix the problems better, at a more
appropriate level? With less chance of impacting all the other protocols that
run (and are yet to run) over TCP?

~~~
forcefsck
Furthermore, points 1, 2 and 3 seem to have an impact only when establishing
the initial connection, which is not that much time: even a 40% latency
decrease would be less than half a second gained in most situations. And if I
read it right, the article states in point 3 that only 33% of HTTP traffic is
preceded by a new connection establishment.

~~~
Flenser
I'm guessing googlebot creates a lot of initial connections and if this was
implemented widely it would speed up crawling.

------
giulivo
I found this part to be the real great news:

 _All our work on TCP is open-source and publicly available. We disseminate
our innovations through the Linux kernel, IETF standards proposals, and
research publications._

------
ajross
OK, dumb question which I'm too lazy to look up for myself: what is TCP Fast
Open, and how is it different from T/TCP? My vague memory is that the latter
was dropped because allocating port numbers without requiring an explicit
round trip simply could not be made robust vs. DDOS attacks. What tricks is
TFO using that T/TCP didn't?

( _edit: Not so lazy after all I guess. The draft RFC is here:
<http://tools.ietf.org/html/draft-cheng-tcpm-fastopen-00> and after a very
quick perusal I don't see an attempt to solve the DOS problem either. It seems
like it just requires apps to handle the transactions really fast and then
close the connection?_)

~~~
nzmsv
The "Fast Open Cookies" are designed to protect against the DDoS attacks.
These are acquired by the client on the first TFO connection. Subsequent SYN
packets reuse the cookie. The number of outstanding TFO cookies is limited.
The paper on TCP Fast Open explains this in more detail than the RFC
(<http://research.google.com/pubs/pub37517.html>)
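A sketch of the cookie idea (my own illustration, loosely following the draft:
the server derives the cookie from the client's IP under a secret key, so a
SYN carrying data can't claim a valid cookie from a spoofed address):

```python
import hmac
import hashlib

SERVER_SECRET = b"per-server key, rotated periodically"  # hypothetical key

def tfo_cookie(client_ip: str) -> bytes:
    """Derive a fast-open cookie from the client address. The draft
    specifies an encryption of the client IP under a server key; a
    keyed MAC truncated into the 4-16 byte cookie range illustrates
    the same property."""
    mac = hmac.new(SERVER_SECRET, client_ip.encode(), hashlib.sha256)
    return mac.digest()[:8]

def cookie_ok(client_ip: str, cookie: bytes) -> bool:
    """Server-side check on a SYN that carries data plus a cookie."""
    return hmac.compare_digest(tfo_cookie(client_ip), cookie)
```

A spoofed source address yields a cookie mismatch, so the server falls back to
a normal 3-way handshake instead of handing the SYN's payload to the
application.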

------
JoeAltmaier
Lots of things about TCP&Co are stale, and don't work well in a modern
network. That paper covers connection establishment. Other issues include
network address establishment, device discovery and LAN broadcast.

In my last job creating mobile wireless drivers, we had a problem with
wireless roaming. TCP/DHCP are set up assuming IP address establishment is a
very infrequent operation. Typically it could take several seconds, which is
fine if it only happens at boot or when a human trips over a cable and plugs
it back in.

But wireless devices 'plug back in' each time they roam to a new AP. In an
industrial environment (warehouse, 60 APs installed over several acres,
forklift driving 20MPH) you may need to roam every second or so.

It's time to examine every aspect of TCP for large (huge) installations, very
frequent device discovery (power-save in handheld devices), rapidly changing
network topologies and so on.

~~~
IgorPartola
Apparently Mac OS has a somewhat non-standard way of joining networks that
makes DHCP address assignment very fast. Otherwise, I agree.

~~~
modeless
I don't understand why LAN DHCP address assignment is so slow. RTT to the DHCP
server is almost always in the single-digit millisecond range, so why does
DHCP often take multiple seconds? Can someone explain this?

~~~
w1nk
It doesn't have to be slow. It can be (especially in larger networks) due to
spanning tree on the switches. Check out:
<http://serverfault.com/questions/102346/dhcp-server-slow-to-give-out-ips>

------
tmcw
I hope this actually helps everyone. SPDY has been in Chrome & on Google Maps
and such for a long time, but not elsewhere: it's disabled in Firefox,
unavailable in Safari and the like. And it's not implemented elsewhere:
node-spdy is getting awesome but has taken a while to get there. Working for a
place that could really benefit from something like SPDY, it seemed a bummer
that an open protocol only worked between one company's own products, for lack
of documentation, interest, or what-have-you.

~~~
modeless
SPDY seems poised for widespread adoption. It's only a matter of time before
Firefox enables it, and the combined share of Chrome and Firefox is now over
50%. That should spur server adoption, and once it starts affecting benchmark
scores the other browsers will be scrambling to implement it.

------
wazoox
Of course I don't know much about this, but I find the first call to action a
bit surprising:

 _1\. Increase TCP initial congestion window to 10 (IW10)._

It seems contradictory with the general concept that too much buffering harms
latency and may actually be aggravating congestion:
<http://queue.acm.org/detail.cfm?id=2071893>

~~~
wmf
Bufferbloat is caused by buffering hundreds of packets, not 10. Bandwidth-
delay product has increased so much that the initial congestion window also
needs to increase so that TCP can ramp up in a reasonable time.
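The ramp-up argument is easy to see with idealised slow start (a sketch of my
own: window doubles each round trip, no loss, delayed ACKs and real cwnd
accounting ignored):

```python
def rtts_to_send(total_segments, initial_window):
    """Count round trips needed to deliver total_segments when the
    congestion window starts at initial_window and doubles each RTT."""
    sent, cwnd, rtts = 0, initial_window, 0
    while sent < total_segments:
        sent += cwnd   # one window's worth of segments per round trip
        cwnd *= 2      # exponential growth during slow start
        rtts += 1
    return rtts

# A ~64 KB response is about 45 segments at a 1460-byte MSS:
print(rtts_to_send(45, 3))   # 4 round trips with IW3
print(rtts_to_send(45, 10))  # 3 round trips with IW10
```

With IW10 a response of up to ~14 KB also fits entirely in the first window,
so it completes in a single round trip after the handshake.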

~~~
ck2
Their own paper shows that it's still a controversial issue and after a
certain point decreases performance.

~~~
sp332
This paper seems unreservedly in favor of larger windows.
<http://research.google.com/pubs/pub36640.html>

_Based on our large scale experiments, we are pursuing efforts in the IETF to
standardize TCP's initial congestion window to at least ten segments.
Preliminary experiments with even higher initial windows show indications of
benefiting latency further while keeping any costs to a modest level. Future
work should focus on eliminating the initial congestion window as a manifest
constant to scale to even larger network speeds and Web page sizes._

~~~
ck2
I could be reading it wrong, but I think I see some issues:

[https://docs.google.com/gview?url=http://www.cs.helsinki.fi/...](https://docs.google.com/gview?url=http://www.cs.helsinki.fi/u/ijjarvin/ietf79/slides.pdf)

    
    
       IW10, while improving elapsed times, imposes higher queuing delay than IW3
       However, if self-congesting, IW3 is more aggressive in terms of queuing delay
       AQM (RED) failed to control the increase in the queuing delay

------
necro
2 years ago we were discussing a few of the direct advantages of this in a
comment here <http://news.ycombinator.com/item?id=1143317> including
tcp_slow_start_after_idle which also interacts with icwnd.

Also, it's much easier as of late to get the benefit of a larger initial
cwnd. Back then you needed to recompile the kernel with source tweaks; now
you just use a backport, or depending on your distro version you may already
have the benefit, since kernel 2.6.39 includes the change...
<http://kernelnewbies.org/Linux_2_6_39>
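On a 2.6.39+ kernel the initial window can also be tuned per route with
iproute2 (a sketch; the gateway address and interface name below are
placeholders for your own setup):

```shell
# Show the current default route first, so the via/dev values can be
# copied into the change command.
ip route show default

# Raise the initial congestion window on that route to 10 segments.
# (192.0.2.1 and eth0 are placeholders; initrwnd similarly raises the
# initial receive window advertised to peers.)
ip route change default via 192.0.2.1 dev eth0 initcwnd 10
```

The setting applies per destination route, so it can be trialled on specific
prefixes before touching the default route.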

~~~
youngtaff
If you need it, IW10 can now be implemented on Windows Server 2008 R2; see
<http://www.andysnotebook.com/2011/11/increasing-the-tcp-initial-congestion-window-on-windows-2008-server-r2.html>

Jim Gettys' article "IW10 Considered Harmful" is worth a read too:
<http://tools.ietf.org/html/draft-gettys-iw10-considered-harmful-00>

------
vy8vWJlco
TCP fast open (TFO) effectively fires data in the blind in the establishment
phase and then handles the timeout gracefully. That sounds like vanilla UDP
(or your favorite best-effort protocol) to me.

~~~
X-Istence
Except that it is handled by the kernel, rather than by the program itself.

~~~
vy8vWJlco
Do you mean that being ring-0 makes TCP faster, or insulates existing code?

~~~
humbledrone
I think grandparent means that the kernel's TCP implementation handles
subsequent retransmits, etc, whereas with UDP that's all up to the
application. Maybe a TFO SYN is somewhat equivalent to a single UDP packet,
but every packet after that gets to take advantage of TCP's reliability, which
is obviously not handled by UDP.

~~~
vy8vWJlco
I guess my point is: doesn't demoting the front-end of TCP constitute
admission that it should have been carried by UDP in the first place?

TFO basically says the handshake is really just UDP, and the TCP connection
doesn't really exist except as a byproduct of an ongoing UDP-based exchange.
The 3-way handshake is just the first 3 messages in that chain, and the TCP
channel doesn't exist until that many have occurred, but the unreliable
"phantom" UDP channel doesn't go away once reliability is established. The
head/outstanding link in the chain is always unreliable.

I think TCP is a strange mental error: nobody ever needed to make TCP a real
transport protocol next to ICMP and UDP, etc. It didn't need an IP protocol
number of its own. TCP is just the idea of "reliability" and can exist
entirely in software (and for that reason should, since it's one less thing to
maintain in the kernel). UDP is enough. (And ICMP, for example, addresses a
different problem: out-of-band network feedback.)

Existing code would work the same. I could still ask for a "TCP" connection,
and start sending with the real data carried by UDP and benefit from 1 round
trip if I don't need to send more.

TFO does that too -- allows some of the unreliability to creep in in the hope
that the system is reliable enough that it's worth it -- but it also adds
complexity to the existing name "TCP", and I'm not convinced that's good or
worth it. TFO solves the right problem in the wrong place IMHO.

~~~
X-Istence
I don't see how it adds complexity. Instead of sending just 3 packets we now
send 10. Now we can fit more data into the initial startup window.

There is no additional complexity. This is baked into the kernel.

TCP being in user land in software would be absolutely terrible. There would
be many different implementations, it wouldn't be standardised and the fact of
the matter is that I as an application developer don't want to have to create
TCP on top of UDP. I want to be able to say connect here and establish a
connection and make sure my data makes it.

~~~
vy8vWJlco
Sorry if I wasn't clear. I'm not talking about the window size, just TFO.

I agree, you as an application developer shouldn't have to recreate TCP. The
code already exists, I'm just suggesting that it shouldn't live in the
kernel/OS. There's no difference to users or developers at the application
layer. (I think evolutionary pressure is a good thing, but there's no reason
not to preserve the interfaces for compatibility.)

<?ego_rant("on")?> That said, since we build towers up -- and TCP has already
been working for a long time -- it may be against the grain to redirect growth
towards the perimeter. It feels retrograde and less snazzy. But if we don't
take advantage of the land below us too, the building topples/the goal
suffers. Examples would include redundant encapsulation of frames, unnecessary
round trips, etc. Start imagining tunneling TCP over TCP (if you've ever
forwarded X11 connections over SSH over a 56k modem, you probably know what
that would be like). It begins to feel like we're base64-encoding everything.

I think there's an even more important example to think about though.

People jump through major hoops to make their webservers incredibly fast, and
able to handle hundreds of thousands of connections per second. Worker thread
pools, I/O completion ports, you name it. Unfortunately, webservers are
serving up TCP connections, and TCP needs state to be reliable (otherwise it's
just UDP). And since TCP is being used to transfer HTTP, which is supposed to
be stateless, these goals work against each other.

Imagine how fast a webserver might be if it didn't have to hold onto
connection data at all... TFO alone doesn't get you there, it just gets you
back to 1 round trip. <?ego_rant("off")?>

I am saying that we wouldn't need to "invent" TFO at this late date if we had
started from there (no time like the present). TFO is like digging up though.
:)

------
newman314
I can't seem to find kernel patches for #2 or #3. Has anyone else had better
luck?

Also, I would like to see more emphasis given to research on mobile networks,
which is my area of interest. Perf for large stable networks is not the same
as for choppy 3G-ish mobile networks.

~~~
newman314
Ninja commenting here...

After more digging, I was able to find this. If you scroll all the way to the
end, there is some verbiage about just setting TCP_RTO_MIN to 1. However, the
author claims this causes issues with delayed ACK unless another (missing)
patch is applied.

<https://github.com/vrv/linux-microsecondrto>
<http://www.pdl.cmu.edu/PDL-FTP/Storage/sigcomm147-vasudevan.pdf>

------
DrCatbox
Will this affect uses of TCP other than HTTP? Like IRC or SSH?

~~~
lorax
Most of these changes seem focused on improving small, short-lived
connections. IRC and SSH are mostly long-lived and I don't think they will
see a noticeable impact. For large bulk data transfers (like scp or ftp) the
Proportional Rate Reduction for TCP (PRR) should help.
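The core of PRR is a proportionality rule; here is my own sketch of the
headline equation from the PRR work (leaving out the reduction-bound
variants): during recovery, the sender clocks out new data in proportion to
what the receiver reports delivered, so cwnd glides down to ssthresh instead
of collapsing.

```python
import math

def prr_allowance(prr_delivered, prr_out, ssthresh, recover_fs, pipe):
    """Segments the sender may transmit now under the PRR rule:
    sndcnt = ceil(prr_delivered * ssthresh / RecoverFS) - prr_out
    while pipe > ssthresh. Sketch only; the full algorithm adds
    slow-start-like bounds once pipe drops to or below ssthresh."""
    if pipe > ssthresh:
        # Proportional phase: pace the reduction against delivered data.
        sndcnt = math.ceil(prr_delivered * ssthresh / recover_fs) - prr_out
        return max(sndcnt, 0)
    # Below ssthresh: grow back toward ssthresh (simplified).
    return max(min(ssthresh - pipe, prr_delivered - prr_out), 0)
```

The effect is that a loss episode reduces the sending rate gradually over the
recovery round trip rather than stalling transmission outright.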

------
exor
Why do we small business owners care about optimizing TCP?

Why does Google? Because web search is behind billions of dollars of revenue.
Micro-optimizations matter to them.

