Linux Kernel Tuning for C500k (urbanairship.com)
158 points by superjared on Sept 29, 2010 | hide | past | favorite | 25 comments



Some other tricks that were not touched upon in the article, but which may apply depending on the nature of your traffic:

1) If you have lots of short connections and you want to tune the amount of time that the kernel will keep half-closed connections around then you can play around with changing the values of net.ipv4.tcp_fin_timeout, net.ipv4.tcp_tw_reuse, net.ipv4.tcp_tw_recycle, and net.ipv4.tcp_max_tw_buckets.

2) If you have a modern NIC then you probably need to tweak the txqueuelen in your ifconfig options.

3) If you get hits from a large number of random browsers, then sometimes setting net.ipv4.tcp_no_metrics_save and net.ipv4.tcp_moderate_rcvbuf to turn off caching of flow metrics helps.

4) Increase net.core.somaxconn to increase your listen queue size.

5) If you have a local firewall like iptables in place make sure you increase net.ipv4.ip_conntrack_max, direct your high-traffic ports to the NOTRACK table, and play around with all of the various net.ipv4.netfilter.ip_conntrack_tcp_timeout_* settings.
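A sketch of what tips 1–5 might look like applied with sysctl and iptables. All values below are illustrative guesses, not recommendations from the article or this comment; benchmark before and after on your own traffic (tcp_tw_recycle is left out because it's widely reported to break clients behind NAT):

```shell
# Illustrative values only -- measure before and after changing any of these.

# 1) Short-connection churn: spend less time holding half-closed state.
sysctl -w net.ipv4.tcp_fin_timeout=15
sysctl -w net.ipv4.tcp_tw_reuse=1
sysctl -w net.ipv4.tcp_max_tw_buckets=400000

# 2) Deeper transmit queue on a fast NIC (eth0 is a placeholder).
ifconfig eth0 txqueuelen 10000

# 3) Random one-off clients: don't cache per-destination flow metrics.
sysctl -w net.ipv4.tcp_no_metrics_save=1
sysctl -w net.ipv4.tcp_moderate_rcvbuf=1

# 4) Deeper accept backlog (the application must also pass a large
#    backlog argument to listen()).
sysctl -w net.core.somaxconn=4096

# 5) With iptables in the path: grow the conntrack table and skip
#    connection tracking on the busy port (8080 is a placeholder).
sysctl -w net.ipv4.ip_conntrack_max=262144
iptables -t raw -A PREROUTING -p tcp --dport 8080 -j NOTRACK
```

To persist across reboots, the sysctl lines go into /etc/sysctl.conf as key = value pairs and get loaded with sysctl -p.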


Good tips. The only thing I would recommend against is setting net.core.somaxconn too high - a too large backlog at a time when your server is already resource constrained might just push it over the brink.


I tested one server up to 1 million concurrent connections a couple of years ago, search for sysctl in this article to see the settings I used: http://www.metabrew.com/article/a-million-user-comet-applica...


Just a technical note on the 64K myth section. My understanding is that TCP connections are tracked by the tuple (remote_host, remote_port, local_host, local_port), so a single client can have 64k unique connections to each port on a remote machine.

If that is actually the case, the document gets its myth correction wrong (by a lot) :)

Can anyone clarify this?


You are right. The part I didn't really make clear is that we only serve on the single external port. Were we to use multiple, then yes, we could have 64k * 64k per IP pair.
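The back-of-envelope arithmetic, assuming the full 16-bit port space were usable on both sides (in practice the client's ephemeral range is smaller):

```shell
ports=65536
# Server listening on a single (addr, port): each client IP is limited
# by its own source-port space.
echo "per client IP, one server port:  $ports"
# The full (raddr, rport, laddr, lport) tuple: a server listening on
# every port could distinguish ports * ports connections per IP pair.
echo "per client IP, all server ports: $((ports * ports))"
```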


Ahh -- that makes sense. Thanks for clarifying.


This isn't relevant to newer kernels, as these settings have been dynamic based on memory size since around 2.6.26 - the kernel will set them based on usage, so there's no need to tweak. The only real issue is making sure you buy a high-end network card that will offload as much as possible, to avoid a huge number of context switches per second (I don't know what the rate is exactly with netpoll).


My, how things have changed since the C10k problem...

http://www.kegel.com/c10k.html


The C10k solutions are effectively the same as for C500k, those being epoll (Linux), kqueue (BSD), etc. Our Java NIO server utilizes epoll to handle C500k.


What does a Java NIO server use for EINTR? ^_^


The approaches stayed pretty much the same:

1. Serve many clients with each thread

2. Serve one client with each server thread

3. Build the server code into the kernel


It's really #1. #2 will barely get you to 10K and definitely not to 500K, while #3 is too brittle (although if you're feeling lazy, I noticed that Solaris has SSL and HTTP reverse proxies in the kernel).


Interesting post, thanks for sharing!

About the suggested sysctl.conf settings: I think you'd also need to adjust net.core.rmem_max and net.core.wmem_max in order for the net.ipv4.tcp_rmem and net.ipv4.tcp_wmem settings to be effective.

Furthermore, it couldn't hurt to increase net.core.netdev_max_backlog, which is the maximum number of packets queued on the input side when the interface receives packets faster than the kernel can process them.
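A sketch of the combined settings this comment suggests (numbers are illustrative; net.core.rmem_max and net.core.wmem_max cap explicitly requested socket buffer sizes, so they should be at least as large as the tcp_rmem/tcp_wmem maxima):

```shell
# Raise the global socket-buffer ceilings first.
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216
# Per-socket TCP buffers: min / default / max, in bytes.
sysctl -w net.ipv4.tcp_rmem='4096 87380 16777216'
sysctl -w net.ipv4.tcp_wmem='4096 65536 16777216'
# Deeper input queue for bursts the kernel hasn't drained yet.
sysctl -w net.core.netdev_max_backlog=30000
```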


Regarding the `net.core` parameters. We do modify those, but my assumption (probably wrong) was that the `net.ipv4` changes would override the core configs. I'll take a look and update the post. Good point about `netdev_max_backlog`, I need to read up on that one too.

Thanks for the feedback!


I'm playing around with the exact same things -- maximum concurrent tcp sockets on Amazon EC2 large instances.

How many did you get running?



On an EC2 Large we hit 520K.


Linking this with the IPv6 stuff currently on the front page: note that none of this would be necessary if the clients were running IPv6 (or otherwise un-NAT-ed) - the server could simply send them a UDP packet or even open a TCP connection.


This is interesting stuff. I jumped into node.js programming a while ago and would like to run similar tests on node.js. Can anyone tell me how a client-side load of 500K long-lived connections is achieved? Is there a standard set of programs for this, or just custom scripts?
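One common approach, sketched under made-up names (interface and 192.0.2.x addresses are placeholders): a single source IP tops out around 64K connections to one server (addr, port), so load generators usually bind several extra source addresses on the client box and spread connections across them:

```shell
# Requires root.  Add extra source addresses to the load-generating box
# so it can exceed the per-IP ephemeral-port limit.
for i in $(seq 1 10); do
    ip addr add 192.0.2.$i/24 dev eth0
done

# Let a single process hold on the order of 500K sockets.
ulimit -n 600000
```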


Has anybody solved the C1000K (or C1M) problem?

http://news.ycombinator.com/item?id=1755575

Is the maximum number of connections you can reach on the largest EC2 instance the same as on a physical server?


I'm wondering what the major side effects of this are. Hmm?


A good question. Shrinking TCP buffer sizes can have a negative performance impact when sending large amounts of data; our use case was keeping track of a large number of mostly silent connections, and so we benefit from the smaller memory footprint.
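For the many-mostly-idle-connections case described here, that trade-off might look like this (numbers are guesses, not the article's actual settings):

```shell
# Small per-socket TCP buffers: min / default / max, in bytes.
# Fine for mostly silent connections; will hurt bulk-transfer throughput.
sysctl -w net.ipv4.tcp_rmem='4096 4096 16384'
sysctl -w net.ipv4.tcp_wmem='4096 4096 16384'
```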


By the way, the most common source of high load is (surprise!) disk I/O.

So moving /var/log (not just /var) to a separate device connected to a distinct controller port is a big deal.

If you're running, say, a mail server, you should separate /var/spool, /var/log, and /var/db/mysql if any.

Partitioning, a serious network card (think Broadcom), and big CPU caches are good things to begin with.
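A hedged sketch of the /var/log split, assuming a spare disk at /dev/sdb1 (hypothetical device name) and a quiet moment to remount:

```shell
# Give /var/log its own spindle so log writes stop competing with the
# rest of the system's I/O (run with logging daemons stopped).
mkfs.ext4 /dev/sdb1
mkdir -p /mnt/newlog
mount /dev/sdb1 /mnt/newlog
cp -a /var/log/. /mnt/newlog/
umount /mnt/newlog
mount /dev/sdb1 /var/log
echo '/dev/sdb1  /var/log  ext4  defaults,noatime  0 2' >> /etc/fstab
```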


Broadcom on Linux?! I was under the impression that their drivers aren't very good or open-source friendly.


LOL! This is called tuning nowadays?

Even Oracle provides much better advice, let alone some individual pros.

Good starting point: http://www.puschitz.com/InstallingOracle10g.shtml

Update: Oh, yes, I understand. Newcomers don't know what Oracle is. MySQL = RDBMS, I see. ^_^



