The first issue mentioned, the distinction between maxconn as a global setting and as a backend setting, needs some more clarification. First, the article mentions that if the global maxconn is reached, new connections remain queued in the socket's listening queue. The length of this queue is finite and is specified by the listen() syscall's backlog parameter. For recent versions of Linux this value appears to default to 128, though in the past I have seen it much lower (on the order of 5). This means that if you set your maxconn to, say, 256 and don't change the default backlog value, you'll get 384 (256 + 128) connections connected or queued before the next client is refused.
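As a minimal sketch of that arithmetic in haproxy.cfg terms (the numbers are just the ones from the example above, and the section names are made up):

    global
        maxconn 256          # HAProxy itself accepts at most 256 concurrent connections

    frontend www
        bind :80
        backlog 128          # size of the kernel listen() queue; connections arriving
                             # past maxconn wait here until a slot frees up
        default_backend app

So roughly 256 connections handled by HAProxy plus 128 pending in the kernel's accept queue before a client actually sees a refusal.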
To me it makes sense to set this value fairly high. There is not a great cost to holding a connection open: typically a few dozen bytes, but this way you can simply have your service working slower and catching up once the number of requests decreases. Say you set maxconn to 4096 globally and say the backend can process only 32 requests at a time. In this case you essentially buffer the client connections in the HAProxy (or whatever you use on the front-end) queue instead of outright refusing them. You still get all the benefits from the backend's maxconn so the backend doesn't start thrashing, but you accept a lot more connections before your users start seeing "Server refused connection" errors. Of course if you routinely need to process more than 32 concurrent requests your backend will never catch up, so in that case you want to increase its performance or add more backends.
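As a rough sketch of that setup (server names and addresses are made up), the relevant haproxy.cfg lines would be something like:

    global
        maxconn 4096                       # buffer up to 4096 client connections in HAProxy

    backend app
        balance roundrobin
        # each server processes at most 32 requests at a time; anything beyond
        # that waits in HAProxy's per-server queue rather than hitting the backend
        server app1 10.0.0.11:8080 check maxconn 32
        server app2 10.0.0.12:8080 check maxconn 32

The backend never sees more than 32 concurrent requests per server, while thousands of clients sit in the queue instead of being refused outright.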
Additionally, HAProxy is, AFAIK, the only HTTP/TCP server that has the option to log when a connection is first established rather than when it finishes, which makes it a lot easier to debug certain types of problems, as well as to detect Slowloris attacks.
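If I remember correctly, the option in question is logasap; a sketch:

    defaults
        mode http
        option httplog
        option logasap    # emit the log line as early as possible instead of waiting
                          # for the session to finish (timers and byte counts in the
                          # log will therefore be incomplete)

Because the line shows up before the session ends, long-running or never-finishing sessions are visible in the logs instead of only appearing when (if) they close.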
Large queues can have problems. Perhaps your queue is effectively 30 seconds deep at times, but your clients time out around 10 seconds. Now every request in the queue is useless. If your clients retry, you now have a feedback loop generating uselessness. You'd be better off with a short queue, rejecting requests much faster.
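HAProxy can express exactly that policy with timeout queue: a request that has waited too long for a connection slot is rejected rather than served stale. A sketch (values made up):

    backend app
        timeout queue 5s                   # give up on a queued request after 5s
                                           # instead of letting it sit past the
                                           # client's own timeout
        server app1 10.0.0.11:8080 maxconn 32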
Browsers don't automatically retry AFAIK, and users are not likely to hit refresh after seeing a hard error like "connection refused". If your queue gets to be 30 seconds long, you most likely need more (or more powerful) backends.
I can pick up two $300 Foundry/Brocade load balancers on eBay that will handle 1,000,000 connections at wire speed (each) through custom ASICs, do full health checks, layer 7 header hashing, layer 7 switching, run multiple load balancing algorithms, hand out DHCP, serve as a VRRP gateway with its peer, and have totally stateless failover on top of it.
It's really weird to have settings whose purpose is to cripple your load balancer, which is already an idea that gives me an involuntary shudder, and to cap it off, the claim is that this is somehow a performance hack.
I posit a different idea. Trust the fact that web servers are written by folks who actually realized that their code would be on the internet, serving up web pages, and that some thought has gone into optimizing things already. Artificially shaping traffic and guessing that things run faster because you just figured it would is why ops people will not give you access to machines in production.
Crank the load balancer up to 11. If your code can't handle traffic, don't break the rest of the network to protect it from traffic. Fix the backend problems.
The workaround for source port exhaustion seems obvious: bind lots of IPv6 addresses on the interface that HAProxy uses for communication with the backend servers. There are enough of them available :-). (Or if you use a private network, IPv4 addresses work too, of course.)
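In HAProxy that could be done with a per-server source address, something like this (the addresses are documentation-range placeholders):

    backend app
        # two entries for the same physical server, each using a different local
        # address, so each gets its own full range of ephemeral source ports
        server app1-a [2001:db8::100]:8080 source 2001:db8::1
        server app1-b [2001:db8::100]:8080 source 2001:db8::2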
For what it's worth, if the remote server does the active close (i.e. it is the one that initiates the TCP FIN), the client IP:port doesn't enter the TIME_WAIT state.