
Keepalives Considered Harmful (Sometimes) - jgrahamc
https://blog.cloudflare.com/keepalives-considered-harmful/
======
ciprian_craciun
For me the main take-away of the article is: on the backend, between two HTTP
services running on the same host (either over TCP on loopback or via UNIX
domain sockets), or between two hosts with a good dedicated network, it is
"better" to disable HTTP keep-alives, especially when there is connection
stickiness end-to-end.

(Or, if the two services treat subsequent requests coming from the same TCP
connection as independent and distribute them round-robin to their workers,
then perhaps keep-alive is OK.)

My experience with this is through HAProxy, which up to version 2.(1?) tied one
client's inbound connection to one backend's outbound connection; thus I always
forced the backend side to disable keep-alive. (However, lately HAProxy is able
to pool connections if asked to, so this doesn't apply anymore.)
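
A hypothetical sketch of what I mean, assuming a typical HAProxy setup (names
and addresses are made up, not anything from the article): `option
http-server-close` closes the server-side connection after each response while
leaving the client side keep-alive, and newer versions can pool idle backend
connections via `http-reuse` instead.

```
defaults
    mode http

frontend fe_main
    bind :80
    default_backend be_app

backend be_app
    # Close the connection to the server after each response;
    # the client-side connection stays keep-alive.
    option http-server-close
    # On newer HAProxy, idle server connections can instead be
    # pooled and shared across client connections:
    # http-reuse safe
    server app1 127.0.0.1:8080
```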

\----

I wish the article had made it clear that keep-alive on the client side of the
TCP connection is still useful, i.e.:

[ Client(with-KA) -> internet -> load-balancer(with-KA) ] -->> [ routing(without-KA) -> backend(without-KA) ]

------
mwcampbell
> Due to how Linux works (see the first link in this blog post), most of the
> load goes to a few workers out of the pool.

I'm a little disappointed that the author apparently decided to accept and
work around brokenness in the OS, though I admit that sometimes I do this
myself. I read the linked blog post, and I wonder if Cloudflare uses any of
the proposed kernel patches for fixing the broken LIFO behavior of epoll.

~~~
ciprian_craciun
The main issue here is not the `accept()` syscall unfairness, but the fact
that the client connection is tied to the backend connection, so an imbalance
can easily be created if one client sends requests that fall outside the
"average" behaviour.
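
A toy simulation of that imbalance (hypothetical numbers, not anyone's real
setup): when each keep-alive connection is pinned to one worker, a single
heavier-than-average client skews the load; dispatching per request keeps it
even.

```python
import random
from collections import Counter

random.seed(0)

WORKERS = 4
CLIENTS = 8
REQUESTS = 1000

def simulate(sticky):
    """Return a Counter of requests handled per worker.

    sticky=True:  each client's keep-alive connection is pinned to one
                  worker, so all of that client's requests land there.
    sticky=False: each request is dispatched round-robin, regardless of
                  which connection it arrived on.
    """
    load = Counter()
    pinned = {c: c % WORKERS for c in range(CLIENTS)}  # connection -> worker
    rr = 0
    for _ in range(REQUESTS):
        # One "heavy" client (client 0) sends half of all traffic.
        client = 0 if random.random() < 0.5 else random.randrange(1, CLIENTS)
        if sticky:
            worker = pinned[client]
        else:
            worker = rr % WORKERS
            rr += 1
        load[worker] += 1
    return load

print("sticky     :", dict(simulate(True)))
print("per-request:", dict(simulate(False)))
```

With stickiness, the worker holding the heavy client's connection ends up
with well over its fair share; with per-request dispatch every worker handles
exactly REQUESTS / WORKERS.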

