
Linux Kernel Tuning for C500k - superjared
http://blog.urbanairship.com/blog/2010/09/29/linux-kernel-tuning-for-c500k/
======
evgen
Some other tricks that were not touched upon in the article, but which may
apply depending on the nature of your traffic:

1) If you have lots of short connections and you want to tune the amount of
time that the kernel will keep half-closed connections around then you can
play around with changing the values of net.ipv4.tcp_fin_timeout,
net.ipv4.tcp_tw_reuse, net.ipv4.tcp_tw_recycle, and
net.ipv4.tcp_max_tw_buckets.

2) If you have a modern NIC then you probably need to tweak the txqueuelen in
your ifconfig options.

3) If you get hits from a large number of random browsers, then setting
net.ipv4.tcp_no_metrics_save and net.ipv4.tcp_moderate_rcvbuf to turn off
caching of flow metrics sometimes helps.

4) Increase net.core.somaxconn to increase your listen queue size.

5) If you have a local firewall like iptables in place, make sure you increase
net.ipv4.ip_conntrack_max, direct your high-traffic ports to the NOTRACK
table, and play around with all of the various
net.ipv4.netfilter.ip_conntrack_tcp_timeout_* settings.
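Taken together, points 1-5 might look like the following `/etc/sysctl.conf` sketch. The values here are illustrative assumptions of mine, not recommendations from the article or the comment above; test before deploying, and note that `tcp_tw_recycle` in particular is known to break clients behind NAT:

```shell
# Hypothetical /etc/sysctl.conf fragment -- example values only.

# 1) Recycle half-closed / TIME_WAIT state faster for short connections
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_tw_buckets = 400000

# 2) txqueuelen is per-interface, not a sysctl:
#    e.g. "ifconfig eth0 txqueuelen 10000"

# 3) Don't cache per-destination flow metrics for one-off clients
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1

# 4) Larger listen queue
net.core.somaxconn = 4096

# 5) Bigger conntrack table if iptables is in the path
net.ipv4.ip_conntrack_max = 1048576
```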

~~~
bnoordhuis
Good tips. The only thing I would recommend against is setting
net.core.somaxconn too high - too large a backlog at a time when your server
is already resource-constrained might just push it over the brink.
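One detail worth knowing when weighing this trade-off (my own illustration, not from the thread): Linux silently caps the backlog an application passes to `listen()` at `net.core.somaxconn`, so the sysctl and the application value must be raised together:

```python
def effective_backlog(app_backlog: int, somaxconn: int) -> int:
    """Linux silently caps listen(backlog) at net.core.somaxconn."""
    return min(app_backlog, somaxconn)

# Raising somaxconn alone does nothing if the app asks for less:
print(effective_backlog(128, 65535))   # 128
# Both must be raised for a larger accept queue:
print(effective_backlog(4096, 65535))  # 4096
```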

------
metabrew
I tested one server up to 1 million concurrent connections a couple of years
ago, search for sysctl in this article to see the settings I used:
http://www.metabrew.com/article/a-million-user-comet-application-with-mochiweb-part-3

------
sophacles
Just a technical note on the 64K myth section. My understanding is that TCP
connections are tracked by the tuple (remote_host, remote_port, local_host,
local_port), so a single client can have 64k unique connections to each port
on a remote machine.

If that is actually the case, the document gets its myth correction wrong (by
a lot) :)

Can anyone clarify this?

~~~
superjared
You are right. The part I didn't really make clear is that we only serve on a
single external port. Were we to use multiple ports, then yes, we could have
64k * 64k connections per IP pair.
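The 4-tuple point can be demonstrated with a couple of loopback sockets (a sketch of my own, not from the thread): two connections from the same client host to the same server ip:port are distinguished only by the client's ephemeral port.

```python
import socket

# A listening server socket on an OS-assigned loopback port.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(8)
server_addr = srv.getsockname()

# Two client sockets from the same host, connecting to the SAME ip:port.
c1 = socket.socket()
c1.connect(server_addr)
c2 = socket.socket()
c2.connect(server_addr)

# Same (local_host, local_port) on the server side for both connections...
assert c1.getpeername() == c2.getpeername() == server_addr
# ...so the 4-tuples differ only in the client-side ephemeral port.
assert c1.getsockname()[1] != c2.getsockname()[1]
```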

~~~
sophacles
Ahh -- that makes sense. Thanks for clarifying.

------
nwmcsween
This isn't as relevant on newer kernels: since roughly 2.6.26 these settings
are sized dynamically based on memory, so the kernel will set them based on
usage and there's no need to tweak them. The only real issue is making sure
you buy a high-end network card that offloads as much as possible, to avoid
excessive context switches per second (I don't know what the exact figure is
with netpoll).

------
chrisbolt
My, how things have changed since the C10k problem...

<http://www.kegel.com/c10k.html>

~~~
superjared
The C10k solutions are effectively the same as for C500k, those being epoll
(Linux), kqueue (BSD), etc. Our Java NIO server utilizes epoll to handle
C500k.
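A minimal readiness-notification sketch using Python's standard `selectors` module, which picks epoll on Linux and kqueue on BSD (my own illustration of the mechanism, not Urban Airship's Java NIO code):

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on BSD/macOS

# A connected socket pair stands in for a client connection.
a, b = socket.socketpair()
sel.register(b, selectors.EVENT_READ)

a.send(b"ping")
events = sel.select(timeout=1)      # returns once b is readable
ready = events[0][0].fileobj
data = ready.recv(4)
print(data)                         # b'ping'
```

One selector can watch hundreds of thousands of mostly idle sockets this way, which is exactly why epoll/kqueue scale where per-connection threads do not.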

~~~
c00p3r
What Java NIO server utilizes for EINTR? ^_^

------
metachris
Interesting post, thanks for sharing!

About the suggested sysctl.conf settings: I think you'd also need to adjust
_net.core.rmem_max_ and _net.core.wmem_max_ in order for the
_net.ipv4.tcp_rmem_ and _net.ipv4.tcp_wmem_ settings to be effective.

Furthermore, it couldn't hurt to increase _net.core.netdev_max_backlog_,
which is the maximum number of packets queued on the input side when the
interface receives packets faster than the kernel can process them.
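A hypothetical sysctl.conf fragment combining both suggestions; the numbers are placeholders of mine, not values from the post:

```shell
# Ceilings for SO_RCVBUF / SO_SNDBUF requested by applications
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# min / default / max (bytes) for TCP's autotuned buffers
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Input-side packet queue for when the NIC outruns the kernel
net.core.netdev_max_backlog = 2500
```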

~~~
superjared
Regarding the `net.core` parameters. We do modify those, but my assumption
(probably wrong) was that the `net.ipv4` changes would override the core
configs. I'll take a look and update the post. Good point about
`netdev_max_backlog`, I need to read up on that one too.

Thanks for the feedback!

~~~
metachris
I'm playing around with the exact same things -- maximum concurrent tcp
sockets on Amazon EC2 large instances.

How many did you get running?

~~~
robotadam
We got over 500k/instance:
http://blog.urbanairship.com/blog/2010/08/24/c500k-in-action-at-urban-airship/

------
JoachimSchipper
Linking this with the IPv6 stuff currently on the front page: note that none
of this would be necessary if the clients were running IPv6 (or were
otherwise un-NAT-ed) - the server could simply send them a UDP packet or even
open a TCP connection.

------
ashish01
This is interesting stuff. I jumped into node.js programming a while ago and
would like to run similar tests on node.js. Can anyone tell me how a
client-side load of 500K long-lived connections is achieved? Is there a
standard set of programs for this, or some custom scripts?
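In the simplest case a load generator just opens many sockets and holds them idle. A minimal sketch (my own, pointed at a local listener for demonstration; real tests aim at the server under test and run the loop from several client boxes or source IPs, since one source IP tops out around 64k ephemeral ports per destination):

```python
import socket

# Stand-in for the server under test.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(128)
target = srv.getsockname()

# Open long-lived idle connections and simply keep them referenced.
conns = []
for _ in range(100):          # scale this number up per client machine
    s = socket.socket()
    s.connect(target)
    conns.append(s)

print(len(conns), "connections held open")
```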

------
nivertech
Has somebody solved the C1000K (or C1M) problem?

<http://news.ycombinator.com/item?id=1755575>

Is the maximum number of connections that you can reach on the largest EC2
instance the same as on a physical server?

------
plainOldText
I'm wondering what the major side effects of this are. Hmm?

~~~
robotadam
A good question. Shrinking TCP buffer sizes can have a negative performance
impact when sending large amounts of data; our use case was keeping track of a
large number of mostly silent connections, and so we benefit from the smaller
memory footprint.
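The memory trade-off is easy to see with back-of-the-envelope numbers (mine, not Urban Airship's): per-socket buffer memory scales linearly with connection count, so shrinking buffers is what makes half a million connections fit in RAM.

```python
# Hypothetical figures: 500k connections, buffers shrunk to 4 KiB each way.
connections = 500_000
rcv_buf = 4096                 # bytes per socket, receive side
snd_buf = 4096                 # bytes per socket, send side

total_bytes = connections * (rcv_buf + snd_buf)
print(total_bytes // 2**20, "MiB")   # 3906 MiB

# With a throughput-friendly 64 KiB per side instead, the same count
# would need 16x that, which is why the buffers were shrunk.
```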

------
c00p3r
btw, the most common source of high load is (surprise!) disk I/O.

So, moving /var/log (not just /var) onto a separate device connected to a
distinct controller port is a big deal.

If you're running, say, a mail server, you should separate /var/spool,
/var/log, and /var/db/mysql if any.

Partitioning, a serious network card (think Broadcom) and big CPU caches are
good things to begin with.
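The split described above might look like the following `/etc/fstab` sketch; the device names and filesystem choices are placeholders of mine, not from the comment:

```shell
# Hypothetical /etc/fstab entries -- one spindle/controller port each.
/dev/sdb1   /var/log        ext4   defaults,noatime   0 2
/dev/sdc1   /var/spool      ext4   defaults,noatime   0 2
/dev/sdd1   /var/db/mysql   ext4   defaults,noatime   0 2
```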

~~~
ciupicri
Broadcom on Linux?! I was under the impression that their drivers aren't too
good or open source friendly.

------
c00p3r
LOL! This is called tuning nowadays?

Even Oracle provides much better advice, let alone some individual pros.

Good starting point: <http://www.puschitz.com/InstallingOracle10g.shtml>

Update: Oh, yes, I understand now. Newcomers don't know what Oracle is.
MySQL = RDBMS, I see. ^_^

