If you're running nginx in production on Ubuntu or Debian, you'd probably be well-served by using nginx's own packages rather than the distribution default ones.
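For what it's worth, a rough sketch of what pulling in nginx's own repo looks like on Ubuntu - the key URL and deb line are taken from nginx.org's install docs, so double-check them there and swap in your release codename for "precise":

wget -qO - http://nginx.org/keys/nginx_signing.key | sudo apt-key add -
echo "deb http://nginx.org/packages/ubuntu/ precise nginx" | sudo tee /etc/apt/sources.list.d/nginx.list
sudo apt-get update && sudo apt-get install nginx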
In order for a collision to take place, we’d have to get a new connection from an existing client, AND that client would have to use the same port number that it used for the earlier connection, AND our server would have to assign the same port number to this connection as it did before.
Ephemeral ports aren't assigned to inbound connections; they're used for outbound connections. So, for the client-to-nginx connection, both the server IP and port are fixed (the port will be either 80 or 443) - only the client IP and port change, so for a collision all you need is for a client to re-use the same port on its side quickly.
For the nginx to node connection, both IPs and the server port are fixed, leaving only the ephemeral port used by nginx to vary. You don't have to worry about out-of-order packets here though, since the connection is loopback.
Note that only the side of the connection that initiates the close goes into TIME_WAIT - the other side goes into a much shorter LAST_ACK state.
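If you want to see how close you are to exhaustion, the two things to look at are the ephemeral port range and the number of sockets sitting in TIME_WAIT. A quick sketch (the widened range is just an example, not a recommendation):

# default range is usually 32768-61000, i.e. roughly 28k outbound ports
sysctl net.ipv4.ip_local_port_range
# widen it, e.g.:
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"
# count sockets currently in TIME_WAIT
ss -tan state time-wait | wc -l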
So just to be sure, some noob questions: the 'ephemeral ports' and 'TIME_WAIT state' tricks are here to handle the connections from nginx to Node.js (not the client-to-nginx connections)?
Sockets from the client to nginx are identified by the client IP and the client port. On each client request, does nginx create a new socket to Node.js?
There can be more than one Node.js instance running? Is that the main goal of nginx here, or are there additional benefits?
> Edit, ok: "nginx is used for almost everything: gzip encoding, static file serving, HTTP caching, SSL handling, load balancing and spoon feeding clients" http://blog.argteam.com/coding/hardening-node-js-for-product...
Really surprised this article doesn't mention tcp_tw_reuse or tcp_tw_recycle. These have a more substantial impact than simply adjusting TIME_WAIT, as those ports will still be in a FIN_WAIT state for a long time before reuse as well.
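(For reference, those are plain sysctls too - but see the warnings further down before touching recycle:)

# reuse TIME_WAIT sockets for new outbound connections when safe from a protocol viewpoint
net.ipv4.tcp_tw_reuse = 1
# aggressively recycle TIME_WAIT sockets - known to break clients behind NAT
net.ipv4.tcp_tw_recycle = 1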
I've been playing around with these settings on very loaded machines:
# Retry SYN/ACK only three times, instead of five
net.ipv4.tcp_synack_retries = 3
# Try to close things only twice
net.ipv4.tcp_orphan_retries = 2
# FIN-WAIT-2 for only 5 seconds
net.ipv4.tcp_fin_timeout = 5
# Increase syn socket queue size (default: 512)
net.ipv4.tcp_max_syn_backlog = 2048
# One hour keepalive with fewer probes (default: 7200 & 9)
net.ipv4.tcp_keepalive_time = 3600
net.ipv4.tcp_keepalive_probes = 5
# Max packets the input can queue
net.core.netdev_max_backlog = 2500
# Keep fragments for 15 sec (default: 30)
net.ipv4.ipfrag_time = 15
# Use H-TCP congestion control
net.ipv4.tcp_congestion_control = htcp
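If anyone wants to try these, they go in /etc/sysctl.conf and get reloaded with sysctl -p, or you can set one on the fly to test:

# reload /etc/sysctl.conf
sudo sysctl -p
# or apply a single setting immediately
sudo sysctl -w net.ipv4.tcp_fin_timeout=5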
On 2.6.3x kernels, someone posted to one of the Linux mailing lists a year or two ago demonstrating an IPv6 stack hang under high traffic when tcp_tw_recycle is set to true.
tcp_tw_reuse and tcp_tw_recycle are dangerous. We have seen a significant number of connections from clients behind a NAT gateway being dropped with tcp_tw_recycle = 1.
Absolutely use with caution and test extensively. Every one of these tweaks is tuning things at a very granular level and may cause more problems than it helps.
Btw, the dropped clients have to do with recycle -- reuse is far 'safer', protocol-wise.
_"A large part of this is due to the fact that nginx only uses HTTP/1.0 when it proxies requests to a back end server, and that means it opens a new connection on every request rather than using a persistent connection"_
As for node.js, core only ever holds a connection open for one pass through the event loop, and even then, only if there are requests queued. If you have any kind of high-volume TCP client in node, this will also cause issues with ephemeral port exhaustion and thus TCP memory loading. Check out https://github.com/TBEDP/agentkeepalive in that case. Related to TCP memory load issues in general, this is a helpful paper: http://www.isi.edu/touch/pubs/infocomm99/infocomm99-web/
There's some good info in here. We ran a flash hotel sale a while back - it only lasted 60 seconds but peaked at about 800 booking req/second. Discovered many of the same issues, but I never quite got iptables stable (hash table flooding, then other issues), so I ended up getting it to ignore some of the traffic. Will try out the solutions in here next time to see how it goes.
Yeah that was a good take-away from the article. I was not aware of that either.
I would guess it's to allow long connections for ssh or similar without timeouts, but there are other ways to prevent timeouts without it eating all those resources.
There are ~32k ephemeral ports. Typical servers have ~32G of memory. It's certainly not hard to imagine a request architecture where a single request can be handled in less than 1MB of per-request memory.
In this case, they were reverse proxying with an old nginx that only supported HTTP/1.0 for backend connections. So they do need an ephemeral port for each request.
so we've made a few changes since I'd initially done the ephemeral port tuning, the most important being switching to unix domain sockets rather than TCP. with that, we probably no longer need the ephemeral port setting.
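For anyone curious, the unix-socket setup is just a different upstream target on the nginx side - something like the sketch below, with a made-up socket path that your node app would have to listen() on:

upstream node_app {
    # node must be listening on this same socket file
    server unix:/var/run/node_app.sock;
}

server {
    listen 80;
    location / {
        proxy_pass http://node_app;
    }
}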
No more:
http://www.quora.com/Why-doesnt-Nginx-talk-HTTP-1-1-to-upstr...
http://nginx.org/en/docs/http/ngx_http_upstream_module.html#...
http://mailman.nginx.org/pipermail/nginx/2011-August/028324....
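i.e. with a recent enough nginx you can hold persistent connections to the backend; per the upstream module docs linked above it looks roughly like this (the backend address and pool size are just examples):

upstream node_backend {
    server 127.0.0.1:3000;
    # pool of idle keepalive connections kept open per worker
    keepalive 64;
}

server {
    location / {
        proxy_pass http://node_backend;
        # speak HTTP/1.1 to the upstream and clear the Connection header
        # so nginx doesn't ask the backend to close after each request
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}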