
Coping with the TCP TIME-WAIT state on busy Linux servers - arunc
http://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux.html
======
ars
Seems like if you are in the situation described at the top - behind a load
balancer with a limited number of possible connections per minute - you also
_don't_ have to deal with NAT, so you can safely enable the options that
don't work with it.

So it's one or the other: either you have to deal with NAT, but then you also
have plenty of remote IPs and no issue of running out, or you have only one
remote IP and no NAT to deal with.

i.e. you'll never have both problems at once.

~~~
sounds
Wouldn't it be that clients behind a NAT might end up attempting to connect
from the same source address:port (the NAT box), or that a load balancer
would make all its connections from the same address?

i.e. when you have to deal with NAT, there's a nonzero possibility of a
customer getting an error simply because your app is popular and their ISP
has run out of IPs and ports.

Or when you don't have to deal with NAT directly, you're limited to ~60k
incoming connections per minute (at most ~64k ports per remote address, each
tied up for the 60-second TIME-WAIT), which may not be enough.

The article suggests listening on more ports so incoming connections are not
all on the same port. I have my doubts that will work. (It's user-hostile to
expect users to add a port number to the URL, and non-standard ports used
only for automated connections limit the scope of the solution.)

If behind a load balancer, assigning more IPs to the server is easy since
they're not public IPs, so that seems like a good solution.
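A minimal sketch of how extra addresses widen the tuple space, assuming the
box already has them configured: bind the outgoing socket to a chosen local
IP with port 0 (so the kernel picks the ephemeral port) before connecting.
The connect_from() helper and its arguments are illustrative, not from the
article.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Connect to a backend from a specific local address so each
     * configured local IP contributes its own ~64k ephemeral ports. */
    int connect_from(const char *local_ip,
                     const struct sockaddr_in *backend)
    {
        struct sockaddr_in src;
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        if (fd < 0)
            return -1;
        memset(&src, 0, sizeof(src));
        src.sin_family = AF_INET;
        src.sin_port = 0;                 /* kernel picks the port */
        inet_pton(AF_INET, local_ip, &src.sin_addr);
        if (bind(fd, (struct sockaddr *)&src, sizeof(src)) != 0 ||
            connect(fd, (const struct sockaddr *)backend,
                    sizeof(*backend)) != 0) {
            close(fd);
            return -1;
        }
        return fd;
    }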

On the other hand, the article hints at two really good solutions:

1. End the connection with a RST instead of a clean close() (see the sketch
after this list). Yes, the other side will see an error at that point, but I
wonder if browsers wouldn't just silently ignore that after receiving the
entire response?

2. Definitely won't work for browsers, but if the client does the close()
call, then the client gets to handle the TIME-WAIT and the server can recycle
ports as fast as it wants.
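For option 1, a minimal sketch of the usual trick, assuming a POSIX sockets
environment: SO_LINGER with a zero timeout turns close() into an abortive
close, so a RST is sent and no TIME-WAIT entry is created on this side.

    #include <sys/socket.h>
    #include <unistd.h>

    /* Abortive close: with l_onoff=1 and l_linger=0, close()
     * drops any unsent data, sends a RST instead of a FIN, and
     * the socket skips TIME-WAIT entirely. */
    static void close_with_rst(int fd)
    {
        struct linger lin = { .l_onoff = 1, .l_linger = 0 };

        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lin, sizeof(lin));
        close(fd);
    }

For option 2 no special code is needed: whichever side sends the first FIN is
the one that ends up in TIME-WAIT, so a protocol where the client always
initiates the close keeps the server's tables clean.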

P.S. With IPv6, assigning more public IPs is a great solution.

~~~
ars
> Wouldn't it be that clients behind a NAT might end up attempting to connect
> from the same source address:port (the NAT box)

They shouldn't. The NAT should know not to recycle ports so quickly.

> or a load balancer would make connections all from the same address?

It would, but then your load balancer is not behind a NAT.

> Or when you don't have to deal with NAT directly, you're limited to ~60k
> incoming connections per minute which may not be enough.

But then you can enable the options that don't work well with NAT.

> The article suggests listening on more ports .. User-hostile ....

No, that's only for the load balancer to do _internally_. It's not for
external use.

------
takeda
NAT is probably the most common way to get bitten; at my company we ran into
strange issues because some idiot enabled tw_recycle.

In our setup we used a hardware load balancer (an F5 Viprion) running in
active-active mode, and tw_recycle was enabled on the server nodes being
load balanced.

Long story short, everything worked fine until some traffic was applied, then
some connections started to hang for a few seconds. Our first assumption was
that the load balancer, or the switches it was connected to, had issues. It
took many hours and packet captures to realize that the problem was due to
differences in TCP timestamps (the blades don't have exactly the same clock,
and per the RFCs they don't need to), and then to trace the dependence on
timestamps to this setting.

So please don't enable this setting. Everything might work fine in your setup
for months, and then one day you will observe strange behavior and start
pulling your hair out trying to figure out what's going on.
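For anyone auditing a fleet, a minimal sketch (assuming a Linux kernel that
still exposes the knob under /proc/sys) of checking whether tcp_tw_recycle is
enabled; only the /proc path comes from the kernel, the rest is illustrative.

    #include <stdio.h>

    /* Print the value of net.ipv4.tcp_tw_recycle; 1 means it is
     * enabled, which is dangerous behind NAT or a load balancer. */
    int main(void)
    {
        FILE *f = fopen("/proc/sys/net/ipv4/tcp_tw_recycle", "r");
        int value = -1;

        if (f == NULL) {
            puts("tcp_tw_recycle: not present on this kernel");
            return 0;
        }
        if (fscanf(f, "%d", &value) != 1)
            value = -1;
        fclose(f);
        printf("tcp_tw_recycle = %d\n", value);
        return 0;
    }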

------
penguindev
This is an _amazing_ writeup. I have run into this before (running out of
sockets) while stress-testing a web proxy, and when I did, I cursed the TCP
designers for choosing only 32-bit sequence numbers and 16-bit port numbers.

That said, maybe there is still a need for TIME_WAIT in a distributed
protocol that can't guarantee sequence number uniqueness 100% of the time.
I'm glad the article provided detailed CPU and memory measurements showing
their costs, which don't seem too bad. The killer is running out of tuples
due to a practically arbitrary and short-sighted limit.

Also liked the interesting notes about socket linger.

------
joosters
If your servers are handling HTTP traffic, another big win is to make sure
HTTP keep-alives are enabled on them. This will cause connections to be
reused, so fewer connections will be opened and closed.

~~~
msantos
Unless of course you have a high bounce rate (intended or otherwise) - in
that case, never use HTTP keep-alive.

~~~
acdha
Can you provide some background on when and why you believe this is good
advice? Otherwise this is exactly the kind of technical folklore the original
post was complaining about – it's a broad assertion with no supporting theory
to help people understand whether it's applicable to their situation.

~~~
msantos
I meant do NOT use keep-alive if you have a high bounce rate.

High traffic plus a high bounce rate will leave a lot of sockets in TIME_WAIT
hanging around for precious seconds, and soon enough your server won't be
able to accept new connections.

Obviously this applies only if you have both: high traffic and a high bounce
rate - i.e. any high-traffic service over TCP where a client connects for a
very brief moment and doesn't come back for hours or days, or ever.

~~~
acdha
These are exactly the missing details, modulo the qualifier that "high
traffic" generally means at least tens of thousands of persistent connections
on a modern web server.

------
feld
net.inet.tcp.nolocaltimewait on FreeBSD is pretty nice if you're connecting
to things locally (proxy, nginx, varnish, etc.):

net.inet.tcp.nolocaltimewait: Do not create compressed TCP TIME_WAIT entries
for local connection
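
A minimal sketch of flipping that knob programmatically with sysctlbyname()
(FreeBSD-specific, needs root; the shell equivalent is just
sysctl net.inet.tcp.nolocaltimewait=1):

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <stdio.h>

    /* Enable nolocaltimewait so the kernel stops creating
     * TIME_WAIT entries for connections between local addresses. */
    int main(void)
    {
        int on = 1;

        if (sysctlbyname("net.inet.tcp.nolocaltimewait",
                         NULL, NULL, &on, sizeof(on)) != 0) {
            perror("sysctlbyname");
            return 1;
        }
        puts("nolocaltimewait enabled");
        return 0;
    }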

