
Why upgrading your Linux Kernel will make your customers much happier - sams99
http://samsaffron.com/archive/2012/03/01/why-upgrading-your-linux-kernel-will-make-your-customers-much-happier
======
ChuckMcM
TL;DR version - another new guy discovers TCP slow start, and not doing it,
wow major speedup! Now, imagine the whole Internet doing that, whoops.

I once peevishly pointed out that if you weren't required to stop at stop
signs your commute would go faster, but if _nobody_ was required to stop at
stop signs it would be slower because every other intersection would have an
accident blocking the way.

This is also very much true of TCP congestion control algorithms. A few
people not using them can get away with it, but if everyone stops using them
you will find your network latency goes from a median with a low standard
deviation to a slightly lower median with a HUGE standard deviation.

One of the things that slow start does is it spreads the change in median
latency over a longer period of time. You can think of this intuitively where
each new connection starts slow and then gradually gets faster, until it is as
fast as it can be, and as more people start connections they start slow and
get faster, while the current connections get slightly slower to accommodate
the new traffic. The result is a non-chaotic adjustment of the network flow.
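
To make the ramp-up concrete, here is a rough sketch of how the congestion window grows per round trip. It is a toy model, not any kernel's implementation: it assumes no packet loss, a window that doubles each round trip until a threshold, and then one extra segment per round trip. The function name and the `ssthresh` value of 32 are illustrative assumptions.

```python
def slow_start_windows(initial_window, ssthresh, round_trips):
    """Return the congestion window (in segments) at the start of each round trip.

    Toy model (assumption): the window doubles every round trip until it
    reaches ssthresh, then grows by one segment per round trip. Loss and
    back-off are not modelled.
    """
    windows = []
    cwnd = initial_window
    for _ in range(round_trips):
        windows.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # exponential phase (slow start)
        else:
            cwnd += 1  # linear phase (congestion avoidance)
    return windows


print(slow_start_windows(initial_window=2, ssthresh=32, round_trips=8))
# prints [2, 4, 8, 16, 32, 33, 34, 35]
```

A larger initial window simply starts the same curve a few steps further along.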

The converse is that everyone starts out going as fast as they can. They not
only overwhelm the node, the node ends up massively congested for a moment
while it tries to sort things out. And of course IP doesn't care if you lose a
packet; TCP will just resend it eventually. So now during this massive
congestion the retransmits are causing more congestion. You get lots of pushback and
finally everyone is back to a level where the network is doing ok with it and
wham! a new connection opens up and everyone gets hosed again and backs off
again, and then ramps up again.

Moral of the story, if only _you_ don't do slow start you can be fast, if
_everyone_ doesn't start slowly, the network latency gets really unpredictable
and poor.

~~~
barrkel
I object to your metaphor; stop signs do slow down traffic. If every 4-way
stop sign junction had a roundabout (either full-sized or mini, depending on
available space), traffic wouldn't need to stop very often, overall throughput
would be much higher, and accidents would probably be even fewer.

4-way stop sign junctions were probably the most asinine, time-wasting, fuel-
wasting road control I found when I drove in the US.

~~~
vacri
Discussing this with a US friend, I have come to the conclusion that for
single-lane roads, a roundabout is superior to a 4-way stop as you only have
to watch one direction for traffic.

Once you get to multiple lanes, the answer is simple: both roundabouts and
4-way stops are inferior...

~~~
khafra
Yes, cloverleafs are necessary for >1 lane per direction.

~~~
Nick_C
Nah, we have plenty of 2 lane roundabouts here in Australia, and I'm sure the
UK does too. It's really a matter of what you're used to. They don't seem to
get used much in the US from what I see.

------
lloeki
This article gives the IW status on Windows and Linux. What is the status on
other systems (e.g. Mac OS X, FreeBSD, Solaris...)?

Does it matter only server side, or do clients benefit from having this window
increased too?

Also, a comment of the article mentions this:

    
    
        Why are you talking about upgrading the kernel, when you can simply do:
        
            ip route change default via MYGATEWAY dev MYDEVICE initcwnd 10
    

which would be similar to the netsh tunable on Windows. So upgrading the
kernel is only needed to have it set to 10 _by default_.
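
For anyone scripting that, here is a small sketch that only builds the argument list for the command quoted above. It does not run anything (changing routes needs root), and the helper name, gateway, and device values are placeholders I made up for illustration.

```python
def build_initcwnd_command(gateway, device, initcwnd=10):
    """Build the argv list for the `ip route change ... initcwnd N` command
    quoted above. Hypothetical helper: it only assembles arguments and
    never executes them."""
    if initcwnd < 1:
        raise ValueError("initcwnd must be a positive number of segments")
    return [
        "ip", "route", "change", "default",
        "via", gateway,
        "dev", device,
        "initcwnd", str(initcwnd),
    ]


# 192.0.2.1 and eth0 are placeholder values, not taken from the article.
print(" ".join(build_initcwnd_command("192.0.2.1", "eth0")))
# prints ip route change default via 192.0.2.1 dev eth0 initcwnd 10
```

The list could be passed to `subprocess.run` by someone with the right privileges; whether the setting survives a reboot depends on how the route is configured on the machine.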

EDIT:

It seems Mac OS X is using either NewReno or LEDBAT instead of the mentioned
CUBIC or Vegas. Look for _tcp_ledbat_cwnd_init_ in [1] which looks quite
simple, or _tcp_newreno_cwnd_init_or_reset_ in [0] which looks a bit more
involved:

    
    
        /* Calculate initial cwnd according to RFC3390,
         * - On a standard link, this will result in a higher cwnd
         * and improve initial transfer rate.
         * - Keep the old ss_fltsz sysctl for ABI compabitility issues.
         * but it will be overriden if tcp_do_rfc3390 sysctl is set.
         */
    

PS: xnu-1699.24.23 is Lion 10.7.3

[0] http://opensource.apple.com/source/xnu/xnu-1699.24.23/bsd/netinet/tcp_newreno.c

[1] http://opensource.apple.com/source/xnu/xnu-1699.24.23/bsd/netinet/tcp_ledbat.c

~~~
The_Fox
The initcwnd change is helpful on any host that has more than 2 segments worth
of data ready to send at the beginning of the connection. So a client that
wants to send lots of data would benefit from the change.

For 99% of web browsing, the client's request fits in one or two segments and
so would not benefit from the change.
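
A back-of-the-envelope sketch of that point: count the round trips needed to push a payload out, assuming the window doubles every round trip, no loss, and a 1460-byte segment size. Those assumptions, the function name, and the 20000-byte and 500-byte payloads are mine, chosen only for illustration.

```python
import math


def round_trips_to_send(payload_bytes, initial_window, mss=1460):
    """Count round trips needed to send payload_bytes from a fresh connection.

    Simplified model (assumption): the sender transmits a full window each
    round trip, the window doubles every round trip, and nothing is lost.
    """
    segments = math.ceil(payload_bytes / mss)
    cwnd = initial_window
    trips = 0
    while segments > 0:
        segments -= cwnd  # send one window's worth of segments
        cwnd *= 2         # slow start doubles the window each round trip
        trips += 1
    return trips


# A 20000-byte response (14 segments): fewer round trips with the larger window.
print(round_trips_to_send(20000, initial_window=3))   # prints 3
print(round_trips_to_send(20000, initial_window=10))  # prints 2

# A 500-byte request fits in one segment, so the initial window makes no difference.
print(round_trips_to_send(500, initial_window=3))     # prints 1
print(round_trips_to_send(500, initial_window=10))    # prints 1
```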

------
sgt
In OS X all you need to do is set a sysctl setting. No need for even a
restart. Check out net.inet.tcp.slowstart_flightsize (set it to 10) if you are
interested.

------
snissn
If upgrading my Linux Kernel will solve all of my problems, why is the
experimental comparison between a Linux box and a windows box? Just saying..

~~~
sams99
Mainly because I did not have a chance to set up an old Linux VM. The numbers
hold though: the initial congestion window is 2-3 on the 2.x line of kernels.

~~~
snissn
I don't disagree, but it makes it an apples and oranges comparison. It
introduces the variable of how Linux vs Windows deals with TCP
(notwithstanding that Linux 2.x vs 3.x might have some internal IPv4 changes,
but your recommendation is to upgrade anyway, so that's fine) and also changes
in the webserver. It seems like the changes are hard coded into the compiled
kernel, so is there no way to simply change configuration flags?

That said, thanks for the post, and I'll definitely be tcpdumping in the
upcoming week and reading some more about slowstart!

Maybe testing with net.ipv4.tcp_slow_start_after_idle 0 vs 1 would make a
cleaner comparison?

~~~
sams99
I totally agree with the concern, but the only way for a clean comparison here
would be for me to spin up a new VM. I observe the exact same patterns as I
get from the windows VM on our Linux 2.x prod box so assume they are the same.

There were a slew of TCP changes leading up to the 3.x branch which included
changing the default congestion control algorithm to cubic.

Slow start after idle does not really play a part here. The test is for a
clean/new connection.

I am no expert but it is possible I could lower my IW on my 3.2 box to 3 to
demonstrate the same pattern, however that too is not a clean comparison.

If my sys admins push me I may set up another VM to demonstrate this.

~~~
snissn
Thanks again for your blog post and comments in this thread! Experiments where
you are already really confident about the conclusions are pretty silly, but I
feel that, despite that, being skeptical in general of posts on the internet
has value. I haven't investigated yet, but if I do investigate clear benefits of
slow start and can make a corroborating case, I'll be happy to correspond and
write it up in a blog post. No promises though :)

------
yorhel
Not really a solution for the short term, but CCNx[1] looks like it'd solve a
lot of problems that TCP currently has for both large file transfers and short
web browsing.

[1] http://www.ccnx.org/

~~~
obtu
That site is terribly vague, what are the specific problems and how does CCNx
address them?

~~~
yorhel
It would indeed be nice if the project had a proper introductory page or
something. Either way, the paper "Networking Named Content" is pretty much the
best introduction you can get, and a very interesting read at the same time.
Googling gave me a PDF at the following URL:
http://conferences.sigcomm.org/co-next/2009/papers/Jacobson.pdf

------
pkh80
I'm sure there is a chance I am missing some bits here, but I set up a 3.0.18 /
Ubuntu 11.10 / Apache 2.2 server to compare against a 2.6.39 / Apache 2.2
server, and in all tests they are basically identical.

~~~
sams99
Yeah ... the change was introduced in 2.6.39 ... but it's a pretty rare kernel
to have afaik, not even in Debian backports anymore.

------
ilaksh
What versions of Ubuntu have this larger (faster) setting?

~~~
mixmastamyk
Looks like Oneiric and Precise have 3.0+ kernels.

~~~
jcastro
And the Oneiric kernel is backported to 10.04LTS:

Installing "linux-image-generic-lts-backport-oneiric" oughta do it.

------
alpb
I wonder if Apple has implemented this in OS X kernel.

~~~
Flow
sudo sysctl -w net.inet.tcp.slowstart_flightsize=10

~~~
alpb
Wow, it changed from 1 to 10. Should I put that in a boot script, or what do you
recommend?

------
Drbble
Linkbait title. Editors, please fix.

