Some other tricks that were not touched upon in the article, but which may apply depending on the nature of your traffic:
1) If you have lots of short connections and you want to tune the amount of time that the kernel will keep half-closed connections around then you can play around with changing the values of net.ipv4.tcp_fin_timeout, net.ipv4.tcp_tw_reuse, net.ipv4.tcp_tw_recycle, and net.ipv4.tcp_max_tw_buckets.
2) If you have a modern NIC then you probably need to tweak the txqueuelen in your ifconfig options.
3) If you get hits from a large number of random browsers then sometimes setting net.ipv4.tcp_no_metrics_save and net.ipv4.tcp_moderate_rcvbuf to turn off caching of flow metrics helps.
4) Increase net.core.somaxconn to increase your listen queue size.
5) If you have a local firewall like iptables in place, make sure you increase net.ipv4.ip_conntrack_max, direct your high-traffic ports to the NOTRACK target in the raw table, and play around with all of the various net.ipv4.netfilter.ip_conntrack_tcp_timeout_* settings.
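For reference, the knobs from points 1, 4, and 5 above could be collected in /etc/sysctl.conf along these lines. The values here are purely illustrative assumptions, not recommendations; the right numbers depend on your traffic:

```
# Point 1: reclaim half-closed / TIME_WAIT sockets faster
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_tw_buckets = 400000

# Point 4: larger listen backlog
net.core.somaxconn = 4096

# Point 5: more room for conntrack entries, shorter established timeout
net.ipv4.ip_conntrack_max = 262144
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 3600
```

Apply with `sysctl -p` and watch your actual queue/table usage before committing to any of these.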
Good tips. The only thing I would recommend against is setting net.core.somaxconn too high - a too large backlog at a time when your server is already resource constrained might just push it over the brink.
Just a technical note on the 64K myth section. My understanding is that TCP connections are tracked by the tuple (remote_host, remote_port, local_host, local_port), so a single client can have 64k unique connections to each port on a remote machine.
If that is actually the case, the document gets its myth correction wrong (by a lot) :)
You are right. The part I didn't really make clear is that we only serve on the single external port. Were we to use multiple, then yes, we could have 64k * 64k per IP pair.
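The tuple arithmetic can be sanity-checked directly. 65536 is the theoretical port count per side; real ephemeral port ranges are smaller, so this is an upper bound:

```shell
# Each TCP connection is identified by the 4-tuple
# (remote_host, remote_port, local_host, local_port).
# For a single client/server IP pair, distinct connections are
# bounded by (client ports) x (server ports):
ports=65536                 # 2^16 possible port numbers
echo $(( ports * ports ))   # upper bound on tuples per IP pair
```

With the server listening on a single external port, the bound collapses back to one factor of ~64k per client IP, which is the case the post describes.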
This isn't relevant to newer kernels, as these settings have been dynamic based on memory size since roughly 2.6.26 - the kernel sizes them based on usage, so there's no need to tweak them. The only real issue is making sure you buy a high-end network card that offloads as much as possible, to avoid excessive context switches per second (I don't know what the exact figure is with netpoll).
The C10k solutions are effectively the same as for C500k, those being epoll (Linux), kqueue (BSD), etc. Our Java NIO server utilizes epoll to handle C500k.
It's really #1. #2 will barely get you to 10K and definitely not to 500K, while #3 is too brittle (although if you're feeling lazy, I noticed that Solaris has SSL and HTTP reverse proxies in the kernel).
About the suggested sysctl.conf settings: I think you'd also need to adjust net.core.rmem_max and net.core.wmem_max in order for the net.ipv4.tcp_rmem and net.ipv4.tcp_wmem settings to be effective.
Furthermore, it couldn't hurt to increase net.core.netdev_max_backlog, which is the maximum number of packets queued on the input side when the interface receives packets faster than the kernel can process them.
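Put together, a sysctl.conf fragment pairing the hard caps with the per-socket ranges might look like this (sizes are placeholder assumptions for illustration, not values from the original post):

```
# Raise the hard caps so the per-socket maximums below can take effect
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216

# min / default / max per-socket buffer sizes in bytes
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Allow more packets to queue on the input side during bursts
net.core.netdev_max_backlog = 30000
```

The tcp_rmem/tcp_wmem maximums are clamped by rmem_max/wmem_max for buffers set via setsockopt, which is why raising only the tcp_* values may not behave as expected.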
Regarding the `net.core` parameters. We do modify those, but my assumption (probably wrong) was that the `net.ipv4` changes would override the core configs. I'll take a look and update the post. Good point about `netdev_max_backlog`, I need to read up on that one too.
Linking this with the IPv6 stuff currently on the front page: note that none of this would be necessary if the clients were running IPv6 (or otherwise un-NAT-ed) - the server could simply send them a UDP packet or even open a TCP connection.
This is interesting stuff. I jumped into node.js programming a while ago and would like to run similar tests on node.js. Can anyone tell me how a client-side load of 500K long-lived connections is achieved? Is there a standard set of programs for this, or some custom scripts?
A good question. Shrinking TCP buffer sizes can have a negative performance impact when sending large amounts of data; our use case was keeping track of a large number of mostly silent connections, and so we benefit from the smaller memory footprint.
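As an illustration of that trade-off (these numbers are assumptions, not the post's actual settings), shrinking the per-socket defaults for a mostly-idle workload might look like:

```
# Small min/default/max per-socket buffers (bytes) to cut the memory
# footprint of hundreds of thousands of mostly idle connections.
# This would hurt throughput badly on bulk-transfer workloads.
net.ipv4.tcp_rmem = 4096 4096 16384
net.ipv4.tcp_wmem = 4096 4096 16384
```

At 500k sockets, even the difference between an 87 KB and a 4 KB default receive buffer is tens of gigabytes of potential kernel memory, which is the footprint the post is optimizing for.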