
Stop Wasting Connections, Use HTTP Keep-Alive - mgartner
https://lob.com/blog/use-http-keep-alive
======
hkolk
Better yet.. switch to http2 (where keep-alive is deprecated):
[https://developer.mozilla.org/en-
US/docs/Web/HTTP/Headers/Ke...](https://developer.mozilla.org/en-
US/docs/Web/HTTP/Headers/Keep-Alive)

~~~
baroffoos
It's amazing that not everyone is on http2 yet when it's basically free speed.

~~~
Tharkun
It's hardly a surprise. It takes time for people to migrate to new protocols.
Not everyone can just leave 20 years of engineering effort behind and switch
to HTTP2 because it's a bit faster in some situations.

~~~
robocat
HTTP2 has multiple optimisations, e.g. I only just realised it compresses
XMLHttpRequest _requests_, not just responses.

We use CloudFlare, so most of our users get HTTP2 even though our own
infrastructure is still HTTP1.1 (however some corporate customers have
proxies, which usually downgrade the browser connection to HTTP1.1).

We log whether the browser used HTTP2 or HTTP1.1 via JavaScript reading
`window.performance.getEntries()[0].nextHopProtocol`, which is supported by
most modern browsers.
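
A minimal sketch of that logging, assuming a hypothetical `/log/protocol`
endpoint:

    // Report the negotiated protocol for the page's own navigation.
    // nextHopProtocol is "h2" for HTTP2 and "http/1.1" for HTTP1.1.
    if (window.performance && performance.getEntries) {
      const entry = performance.getEntries()[0];
      if (entry && entry.nextHopProtocol) {
        navigator.sendBeacon('/log/protocol', entry.nextHopProtocol);
      }
    }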

------
toast0
The hidden danger, mentioned in the article, is the client sending a second
request just as the server closes an idle connection. Before http/2, the
client can't tell whether the server closed the connection before or after it
received the request. Many servers send a hint about the idle timeout, but few
client libraries process it (that I've seen). The larger the latency between
server and client, the bigger a deal this is.
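
The hint in question is the `Keep-Alive` response header (e.g. `Keep-Alive:
timeout=5`). A rough Node sketch of a client that actually honors it; the
parsing is simplistic and example.com is a placeholder:

    const http = require('http');
    const agent = new http.Agent({ keepAlive: true });

    http.get({ host: 'example.com', agent }, (res) => {
      // e.g. "timeout=5, max=100" -- seconds the server will keep the
      // idle socket open before closing it.
      const hint = /timeout=(\d+)/.exec(res.headers['keep-alive'] || '');
      if (hint) {
        const idleDeadline = Date.now() + Number(hint[1]) * 1000;
        // Past idleDeadline, open a fresh connection instead of reusing.
      }
      res.resume();
    });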

~~~
beagle3
This is always an issue: you send an HTTP POST request (even on http/0.9),
and the connection closes before you see a response. Did the server receive
it? You don't know.

Pipelining might amplify it, but it is always there, especially with
unreliable mobile connections.

~~~
toast0
If it's the first request on a connection, and it appears that the server
closes the connection, I have a reasonable expectation that the server doesn't
care for my request. When the request times out, who knows -- most TCP stacks
won't tell me if the server acked it, but many networks will fake acks these
days anyway.

On pipelined requests it's not too bad, since you're not supposed to pipeline
requests that aren't safe to retry. But pipelining ends up being somewhat rare
in practice. Reusing an inactive connection is actually pretty risky: the
server may be about to shut it down, or your network may have silently dropped
the connection already (some NAT timeouts are really short -- I've seen cases
in real mobile networks where the timeout was under a minute!).

I'm not thrilled with multiplexing in http/2, but the sensible stream closure
would be really nice to have. If you see a GOAWAY, you know whether the server
saw your request or not, so you can resend it with a clear conscience.

~~~
beagle3
“Reasonable” to a human, maybe.

If you are trying to build a robust system, in which requests don’t get lost,
the difference is in quantity but not in quality - you must robustly handle
the uncertainty in both cases.

~~~
toast0
Sure -- when I build a system, I make sure I can always retry all the
requests, because there's never a guarantee that the client got the response
or stored the results successfully. It could have made a follow-up request but
lost power or been killed or whatever before the results were stored, and will
start over next time. Sometimes the server failed to store, but told the client it
did -- that's fun too, but thankfully I control the servers and can usually
limit the damage of that.

But most people don't realize the byzantine hell we all inhabit; http(s)
client library defaults for retrying apparently idempotent requests will often
work well enough, but a server idle timeout configured lower than the client's
is much easier to trip over.
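
A minimal Node sketch of avoiding that trap, assuming Node 8+: raise the
server's idle timeout above whatever the clients (or the load balancer in
front) use, so the server is never the side that closes a socket a client is
about to reuse:

    const http = require('http');

    const server = http.createServer((req, res) => res.end('ok'));

    // Default is 5 seconds; if clients reuse sockets idle for up to 60s,
    // give the server more slack so it never wins the close race.
    server.keepAliveTimeout = 65 * 1000;
    server.listen(8080);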

------
berkut
So I've been writing a web server recently (mostly to re-learn HTTP and web
stuff, as I've been out of that field for over a decade), and I've discovered
that Firefox seems to be the only browser I've tried which still really
utilises it fully and respects the headers: Safari sort of does, but only ever
once, regardless of the header params, and Chrome never does, again regardless
of what the Connection header says in the first response from the server.

[https://www.chromium.org/developers/design-
documents/network...](https://www.chromium.org/developers/design-
documents/network-stack/http-pipelining)

shows they removed it in Chrome due to issues, and weirdly this Firefox
ticket:

[https://bugzilla.mozilla.org/show_bug.cgi?id=264354](https://bugzilla.mozilla.org/show_bug.cgi?id=264354)

(last updated 5 years ago) seems to show they're not going to enable
keep-alive for HTTP 1.1, but Firefox is most definitely utilising it.

~~~
iforgotpassword
These tickets talk about pipelining, not keep-alive. The chrome link
specifically points out why they decided to disable it. Chrome and Firefox
still support regular keep-alive and do reuse connections on HTTP 1.1.

~~~
berkut
Fair point; however, I've seen no evidence of Chrome doing keep-alive on Linux
or Mac OS - it always seems to send a FIN ACK after the response from the
server to close the connection.

Firefox most definitely _is_ utilising it fully, and obeys the Keep-Alive
params specified, which I can't see any of the other browsers doing.

~~~
pvg
_I've seen no evidence Chrome is doing keep-alive on Linux or Mac OS_

That sounds like a bug with your server implementation. The web would melt
down if Chrome's keep-alive support didn't work.

~~~
berkut
Possibly, and that's what I assumed at first, but I don't really see how:
after sending the response the socket waits for more data in the form of
another request from the user agent (I've tried with and without a poll /
socket timeout).

Firefox does this perfectly. Chrome sends a FIN ACK in response to the
response from the server, closing the connection on its side and triggering
the next recv() within the server to return 0 (or poll() to fail, depending on
how I do it). But then it opens another connection after that, with another
request which could have been sent within the previous connection.

I also can't see Chrome doing it on non-HTTPS websites in general when
monitoring through Wireshark, which is why I don't believe it's an issue with
my server's implementation: I haven't actually observed Chrome utilise it at
all on several computers on both Linux and Mac OS.

~~~
pvg
I don't know why you're seeing what you're seeing, but your testing/monitoring
setup sounds like it's a bit too low-level for the thing you're trying to
figure out. A few basic things you might want (or should, really, if you're
implementing HTTP from scratch) to try:

\- Serve real content, say, a basic HTML page with a couple of external
resources.

\- Serve same and monitor (turn on all the logs) with Apache.

\- Use the Chrome dev tools. In the network tab, right click on the column
headers in the request list and enable the 'Connection ID' column. Keep in
mind any modern browser will open concurrent connections in the base HTTP 1.1
case.

~~~
berkut
Thanks, but I've done this (although I used Nginx instead).

The web server I've written is now serving several websites I'm hosting, some
of which have thousands of images, so there's plenty of scope for connection
re-use.

Each connection ID is different within dev tools in Chrome - although errors
show 0 - often sequential, sometimes with huge gaps. And this is the same with
HTTP websites on the net (I just tried www.briscoes.co.nz and am still seeing
the same thing in Chrome 72).

Using Wireshark I can see that Firefox is doing the correct thing, so I'm not
sure why you think what I'm looking at is too low-level: I think it's pretty
clearly something the client's doing at the TCP level to close the connection
on its end after it's received the content length.

It may well be that I'm not sending a header Chrome's looking for, but I'm not
sure what it is, as in its request it's sending "Connection: keep-alive", and
the response has the same, and Firefox works fine.

It's almost like it's a client configuration or something, but I don't know
what that would be, as I've tried on several machines.

~~~
pvg
'Too low level' is Wireshark before Chrome dev tools. I didn't know if you'd
tried the dev tools since you didn't mention it. As to the other stuff, I'm
not sure what portable Chrome-breaking field you emit - www.briscoes.co.nz
instantly shows connection reuse on my end.

[https://i.imgur.com/LJIaRac.png](https://i.imgur.com/LJIaRac.png)

------
nvarsj
People seem to always get confused by this - http/1.1 is persistent by default
and the vast majority of servers/clients use it.

The "Keep-Alive" header was something tacked onto http/1.0 and doesn't really
mean anything these days.
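
In other words, the defaults flipped between versions:

    HTTP/1.0: connections close by default; "Connection: keep-alive" opts in.
    HTTP/1.1: connections persist by default; "Connection: close" opts out.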

~~~
tyingq
Unfortunately, depends on the server. Node says this:

 _" Sending a 'Connection: keep-alive' will notify Node.js that the connection
to the server should be persisted until the next request."_

The article seems to confirm this behavior.

So clients have to account for non-RFC-compliant servers.

~~~
nvarsj
Are you sure about that? That seems to violate the http/1.1 RFC. I think the
node.js docs are talking about http/1.0 there. http/1.1 uses the "Connection"
header to indicate whether to persist the connection.

------
cetra3
Sure, if Apple have fixed their issues with Keep-Alive for Safari:

[https://stackoverflow.com/questions/25372318/error-domain-
ns...](https://stackoverflow.com/questions/25372318/error-domain-
nsurlerrordomain-code-1005-the-network-connection-was-lost/25996971)

~~~
eridius
According to
[https://tools.ietf.org/html/rfc2068#section-19.7.1.1](https://tools.ietf.org/html/rfc2068#section-19.7.1.1)
HTTP/1.1 defines the "Keep-Alive" header for use with keep-alive parameters
but does not actually define any keep-alive parameters. I don't see a
definition of this header in any of the RFCs that obsolete this one (just some
mentions of issues with persistent connections and HTTP/1.0 servers, and the
fact that Keep-Alive is a hop-by-hop header).

MDN's documentation on this header references
[https://tools.ietf.org/id/draft-thomson-hybi-http-
timeout-01...](https://tools.ietf.org/id/draft-thomson-hybi-http-
timeout-01.html#rfc.section.2) for the parameters, but this is an experimental
draft that expired in 2012.

Which is to say, I can't really fault Safari for not respecting keep-alive
parameters that never made it out of the experimental draft phase.

------
iforgotpassword
A common mistake I see when people use libcurl is to create a new handle for
each request, which pretty much guarantees no connection reuse. Reuse your
handles for profit.

------
baybal2
One more reason to do so: if you use client side TLS auth with a cert (or
moreover a slow smartcard,) reauthenticating every connection will grind your
performance to pieces.

------
dmarlow
What's the advice these days for when things are behind a reverse proxy or a
load balancer, or both? I ran into issues where load ended up balancing
unevenly when multiple layers were mixed.

------
swizzler
Be aware of the maximum connection bottlenecks in your stack, especially if
you have an interactive app.

I've had Apache refuse new requests because old connections were holding
slots.

------
paulddraper
I switched servers to Keep-Alive, but then I found out that it would introduce
race conditions, at least with the Apache HTTP Client.

The client would be in the middle of sending a new request, but the server
would have already decided to close the connection, and the request would
fail.

I believe this is a common problem, and yet the spec has nothing to address
this obvious race condition.

Right?
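
The usual client-side mitigation, and what RFC 7230 section 6.3.1 prescribes,
is to retry a request that died before any response arrived, but only if it
is idempotent. A hedged Node sketch, with a placeholder host:

    const http = require('http');
    const agent = new http.Agent({ keepAlive: true });

    function getWithRetry(path, cb, retried) {
      http.get({ host: 'example.com', path, agent }, (res) => cb(null, res))
        .on('error', (err) => {
          // A reused socket the server just closed surfaces as ECONNRESET
          // ("socket hang up") before any response bytes arrive.
          if (!retried && err.code === 'ECONNRESET') {
            getWithRetry(path, cb, true); // safe only for idempotent requests
          } else {
            cb(err);
          }
        });
    }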

------
TYPE_FASTER
Assuming the client and server HTTP implementations don't have any bugs, and
there aren't any network devices (proxies, etc.) in between, sure. Are modern
clients able to start at HTTP/2, and gracefully degrade through HTTP/1.1 with
keep-alive down to HTTP/1.0 if necessary? That would be really cool if so.

------
spockz
My understanding was that with http/1.1 all connections are kept alive until a
close is emitted. How is this different?

~~~
mgartner
Based on
[https://en.wikipedia.org/wiki/HTTP_persistent_connection#HTT...](https://en.wikipedia.org/wiki/HTTP_persistent_connection#HTTP_1.1)
it sounds like your statement is correct.

But the fact that the underlying HTTP connection is kept alive by default
doesn't necessarily mean that the client is going to actually re-use that
connection for multiple HTTP requests. And, in fact, in Node.js the connection
is not reused by default.
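
A minimal sketch of opting in, assuming the plain `https` module and a
placeholder host; without the keepAlive agent, each request below would get
its own TCP and TLS handshake:

    const https = require('https');
    const agent = new https.Agent({ keepAlive: true });

    https.get('https://example.com/a', { agent }, (res) => {
      res.resume(); // drain the body so the socket can be released
      res.on('end', () => {
        // This second request reuses the pooled socket from the first.
        https.get('https://example.com/b', { agent }, (res2) => res2.resume());
      });
    });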

------
dana321
Or better yet, use gzip and inline all images base64-encoded. The file size
is very similar to the raw data, and the number of requests with their
associated HTTP headers is reduced.
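
A build-time Node sketch of that inlining, with a placeholder filename;
base64 adds roughly 33% before gzip:

    const fs = require('fs');

    // Embed the image directly in the generated HTML as a data URI,
    // trading one extra request for a larger (but gzippable) page.
    const b64 = fs.readFileSync('logo.png').toString('base64');
    const img = `<img src="data:image/png;base64,${b64}">`;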

~~~
mr_toad
Might be a good idea for statically generated pages, or the statically
generated parts of pages, but I’d be wary of adding any more load to dynamic
pages.

Do any static site generators rewrite image links as data URIs?

~~~
flomei
> Do any static site generators rewrite image links as data URIs?

I'm using Hugo and this is at least not the default behaviour; not sure,
though, whether there is a switch for it.

------
snek
you don't need that npm library. just `new Agent({ keepAlive: true })` works

------
hidiegomariani
I believed keep-alive was enabled by default in browsers with HTTP 1.1

------
jrochkind1
The benchmarks say "to make 1000 HTTP requests", but I wanna know for sure if
it was `https` (SSL) or not.

The SSL connection init costs are real, although SSL session re-use can help
there even without keep-alive.

~~~
mgartner
These metrics were collected for HTTPS specifically. I've omitted the URL I
used in the benchmark scripts, but am using an HTTPS agent. Scripts live here:
[https://github.com/mgartner/node-keep-alive-
benchmark](https://github.com/mgartner/node-keep-alive-benchmark)

------
superkuh
Unfortunately most people's primary computing devices are smartphones, and
smartphone radios cannot keep a TCP connection open - or won't, because of
power usage.

~~~
sigjuice
Why does the radio have to be on just because there is an open TCP connection?

~~~
dmlittle
Are you asking why the radio has to be on, or implying there are other
reasons the radio could be on? If you have a TCP connection open but the radio
is off, how are you going to receive incoming packets?

~~~
toast0
If it's a keep-alive connection, you can just get the packets when you next
turn on the radio. It's not a big deal if you don't get the close right away;
not that there's a good way to tell the system that. Anyway, you probably have
a TCP connection open for the system push channel; if that requires the radio
to stay on, it will be on.

------
fabioyy
gRPC

------
forreal1126
too bad 99% of routers drop keepalives

~~~
mh-
TCP keepalives != HTTP keepalives.

~~~
tonyarkles
Also, in my experience at least, it's not necessarily that routers drop TCP
keep-alives, but rather that the keep-alive interval for most OSes is way
longer than the router's connection timeout for idle entries in the NAT table.

I was burned hard by this in Azure. It seems that the default expiry time is
around 4 minutes for the TCP load balancers. You can bump it to 30 min, but if
I recall the default interval on Linux is 2 hours. Any long-standing idle TCP
connections would get into a state where both sides believed they were
connected, but the packets would get dropped to the floor. When the LB timed
out, it didn't emit any FIN or RST packets, so neither side knew the
connection had been torn down.

Fun debugging on that one. During the day there was enough activity to keep
the connections alive, but at night they'd break. The overall behaviour was
that the service worked great all day, but the first few actions out-of-
business-hours would fail due to application-layer timeouts, and then
everything would work great again until it had sat idle for a while.
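
A Node sketch of the usual workaround, with a placeholder host: enable TCP
keepalives with a probe delay well under the NAT/LB idle timeout, rather
than relying on the OS default (~2 hours on Linux):

    const net = require('net');

    const sock = net.connect(443, 'example.com', () => {
      // First keepalive probe after 60s idle -- comfortably under the
      // 4-minute Azure LB timeout described above. Later probe spacing
      // is still controlled by the OS.
      sock.setKeepAlive(true, 60 * 1000);
    });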

