

Some notes playing with HAProxy - ymichael
https://ymichael.com/2015/06/06/some-notes-playing-with-haproxy.html

======
IgorPartola
The first issue mentioned is the disparity between maxconn as a global setting
and as a backend setting, which deserves some more clarification. First, the
article mentions that if the global maxconn is reached, new connections will
remain queued in the socket's listening queue. The length of this queue is
finite and is specified by the listen() syscall's backlog parameter. It
appears that for recent versions of Linux this value defaults to 128, though
in the past I have seen it much lower (on the order of 5). This means that if
you set your maxconn to, say, 256 and don't change the default backlog
value, you'll get 384 connections connected or queued before the next client
is refused.
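
To make the two knobs concrete, here is a minimal haproxy.cfg sketch using
the numbers from the example above (illustrative values, not recommendations;
note that on Linux the effective backlog is also capped by the
net.core.somaxconn sysctl):

```
global
    maxconn 256        # refuse new sessions once 256 connections are in flight

frontend www
    bind :80
    backlog 1024       # hint for the listen() backlog (kernel accept queue)
    default_backend app
```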

To me it makes sense to set this value fairly high. There is not a great cost
to holding a connection open: typically a few dozen bytes. This way your
service simply works slower and catches up once the number of requests
decreases. Say you set maxconn to 4096 globally and the backend can process
only 32 requests at a time. In this case you essentially buffer the client
connections in the HAProxy (or whatever you use on the front-end) queue
instead of outright refusing them. You still get all the benefits of the
backend's maxconn, so the backend doesn't start thrashing, but you accept a
lot more connections before your users start seeing "Server refused
connection" errors. Of course, if you routinely need to process more than 32
concurrent requests, your backend will never catch up, so in that case you
want to increase its performance or add more backends.
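
A sketch of that buffering setup in HAProxy terms (server address and names
hypothetical; `timeout queue` bounds how long a request may wait for a free
server slot):

```
global
    maxconn 4096                           # accept up to 4096 client connections

backend app
    timeout queue 30s                      # max time a request sits in the queue
    server app1 10.0.0.1:8080 maxconn 32   # backend handles 32 requests at a time
```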

Additionally, HAProxy is, AFAIK, the only HTTP/TCP server that has the option
to log when a connection is first established rather than when it is closed,
making it a lot easier to debug certain types of problems, as well as to
detect Slowloris attacks.
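
If I'm reading the docs right, this refers to `option logasap`, which emits
the log line as soon as the session is established instead of waiting for the
transfer to finish — a sketch:

```
frontend www
    bind :80
    option httplog
    option logasap    # log at session establishment, not at session end
```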

~~~
fizx
Large queues can have problems. Perhaps your queue is effectively 30 seconds
deep at times. But your clients time out around 10 seconds. Now every request
in the queue is useless. If your clients retry, then now you have a feedback
loop generating uselessness. You'd be better off with a short queue, and
rejecting requests much faster.
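
In HAProxy, one way to get that short-queue behavior is to bound the queue
time so stale requests are rejected (with a 503) instead of being served after
the client has given up. A sketch, with illustrative values and a hypothetical
server address:

```
backend app
    timeout queue 5s                       # reject requests queued longer than 5s
    server app1 10.0.0.1:8080 maxconn 32
```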

~~~
IgorPartola
Browsers don't automatically retry AFAIK, and users are not likely to hit
refresh after seeing a hard error like connection refused. If you reach a
queue that gets to be 30 seconds long, you most likely need more/more powerful
backends.

------
tcannon
I can pick up two $300 Foundry/Brocade load balancers on eBay that will handle
1,000,000 connections at wire speed (each) through custom ASICs, do full health
checks, layer 7 header hashing, layer 7 switching, run multiple load balancing
algorithms, hand out DHCP, serve as a VRRP gateway with its peer, and have
totally stateless failover on top of it.

------
tcannon
I'm double posting, sorry.

It's really weird to have these settings that cripple your load balancer (an
idea that already gives me an involuntary shudder), and to cap it off, the
idea is that this is somehow a performance hack.

I posit a different idea. Trust the fact that web servers are written by folks
who actually realized that their code would be on the internet, serving up web
pages, and that some thought has gone into optimizing things already.
Artificially shaping traffic and guessing that things run faster because you
just figured it would is why ops people will not give you access to machines
in production.

Crank the load balancer up to 11. If your code can't handle traffic, don't
break the rest of the network to protect it from traffic. Fix the back end
problems.

------
perlgeek
The workaround for source port exhaustion seems obvious: bind lots of IPv6
addresses on the interface that HAProxy uses for communication with the
backend servers. There are enough of them available :-). (Or, if you use a
private network, IPv4 addresses work too, of course.)
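
In HAProxy terms, this might look like giving each backend server its own
source address, so each connection pool gets a full ephemeral port range to
itself (addresses hypothetical):

```
backend app
    server app1 10.0.0.1:8080 source 10.0.1.1
    server app2 10.0.0.2:8080 source 10.0.1.2
```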

------
paraboul
For what it's worth, if the remote server is the "active close" side (the one
that initiates the TCP FIN), the client ip:port doesn't enter the TIME_WAIT
state.

