Hacker News
Stop Wasting Connections, Use HTTP Keep-Alive (lob.com)
215 points by mgartner 47 days ago | 110 comments

Better yet, switch to HTTP/2 (where the Keep-Alive header is prohibited): https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ke...

It's amazing that not everyone is on HTTP/2 yet when it's basically free speed.

It's hardly a surprise. It takes time for people to migrate to new protocols. Not everyone can just leave 20 years of engineering effort behind and switch to HTTP2 because it's a bit faster in some situations.

HTTP2 has multiple optimisations, e.g. I only just realised it compresses the headers of XMLHttpRequest requests, not just of responses.

We use CloudFlare, so most of our users get HTTP2 even though our own infrastructure is still HTTP1.1 (however some corporate customers have proxies, which usually downgrade the browser connection to HTTP1.1).

We log whether the browser uses HTTP2 or HTTP1.1 by reading `window.performance.getEntries()[0].nextHopProtocol` in JavaScript, which is supported by most modern browsers.
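A minimal sketch of that logging, assuming the page's navigation entry is the one you want: `getEntries()[0]` is normally that entry, and `getEntriesByType("navigation")` just makes it explicit. The `/log/protocol` endpoint is hypothetical and the label mapping is only illustrative.

```javascript
// Map the ALPN protocol ID exposed as nextHopProtocol to a readable label.
// Typical values are "http/1.1", "h2" and "h3"; browsers may also report an
// empty string when the protocol is not exposed.
function describeProtocol(nextHopProtocol) {
  const id = nextHopProtocol || "";
  if (id === "h2") return "HTTP/2";
  if (id === "h3" || id.startsWith("h3-")) return "HTTP/3"; // "h3-29" etc. were draft IDs
  if (id === "http/1.1") return "HTTP/1.1";
  if (id === "http/1.0") return "HTTP/1.0";
  return "unknown";
}

// Browser-only part: skipped when run outside a page (e.g. under Node.js).
if (typeof window !== "undefined" && window.performance && navigator.sendBeacon) {
  // The navigation entry describes the request for the page itself.
  const [nav] = window.performance.getEntriesByType("navigation");
  const label = describeProtocol(nav && nav.nextHopProtocol);
  // "/log/protocol" is a hypothetical endpoint; substitute your own logging call.
  navigator.sendBeacon("/log/protocol", label);
}
```

Logging the raw `nextHopProtocol` string alongside the label also makes it easy to spot clients whose corporate proxy downgrades them to HTTP/1.1.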

It’s not quite; HTTP/2 is not in fact uniformly superior to HTTP/1.1. Search around and you’ll find the reasons; all I’ll mention here are the two biggest keywords: WebSockets and head-of-line blocking.

The end result is that HTTP/2 is an improvement for most common workloads, but not all; especially in app-type scenarios with lots of mobile users with suboptimal connections and comparatively few requests (e.g. because you already do batching rather than sending zillions of requests), HTTP/2 can regress typical performance.

WebSockets over HTTP/2 is now specified in RFC 8441; not sure what the implementation status of that is. That solves one of the main problems.

My understanding is that HTTP/3 (with UDP-based QUIC instead of TCP) then resolves all remaining known systemic regressions between HTTP/1.1 and HTTP/2. So yeah, HTTP/1.1 to HTTP/3 should be pretty close to “free speed”.

But even then, it changes performance and load characteristics, and requires the appropriate software support, and that means that many users will need to be very careful about the upgrade, so that they don’t break things. So it’s not quite free after all.

H2 is nearly always better than HTTP/1, BUT it also turns some HTTP/1-specific performance optimization techniques (e.g. domain sharding, sprites...) into anti-patterns.

I’m not amazed. While many stacks support it, most organizations still have lift on their end to implement this behind the other “priority” customer change requests.

Of all the micro-optimizations I could think of for web apps, the one with the highest cost and the least benefit would probably be supporting http2 (or *quic). In almost all cases, there is a fix that will speed up http1.1 to acceptable levels.

What's the "cost" to supporting HTTP 2 as an app developer, though? As far as I know, adding support to nginx requires changing one line of code. That's about as close to free as you can get.

For a tiny startup, you might be able to just add http2 support in 10 minutes and everything might be fine, but most of the time it's more complicated. It's a bit like if I said, can I change your app libraries to bleeding edge? It's just a one line change.

Could you be more specific, though? What's more complicated? I'm legitimately curious because I know very little about HTTP 2, but at work (not a tiny startup) we recently enabled it and it turned out to be a trivial change. Unless you're implementing the networking layer of your backend yourself, it seems like a change with practically no cost or tradeoff, as long as your server software supports it.

I haven't implemented it myself, but here's some example scenarios:

Policy: What is allowed architecturally and what isn't? Are there regulatory requirements? Do you have strict enforcement mechanisms?

Instrumentation: Do you need to watch traffic going over the wire? Will your network filters flag it? Do you have application proxies that route traffic based on payload? How is it going to handle multiplexing if existing solutions don't take it into account? Are you using any proprietary stuff?

QA: Every client, server and intermediary may be using different implementations, and that means bugs. Have you certified all the devices in the chain to make sure they operate correctly? (It doesn't matter, until it really matters)

Operation: Each implementation needs to be upgraded one at a time, so the extent of your technology will determine how long and potentially error-prone all this will be. It will be different for each org, but definitely take a long time for really big ones.

This all makes sense. I guess ultimately, the more moving parts you have, the more things can go wrong with a change like this. Thanks!

I imagine it's more bureaucratic complexity than technical. This change would require lots of committee meetings, reviews, meetings, discussions, etc. at my company. It would probably take a year to decide to do it and 5 days to actually do it (Get all IT groups into a large war room, make the change on dev servers and then everyone has to completely test all their apps and sign off on it. Then do it again on staging. Then again on prod. It would probably require a bunch of all-nighters. I wish I was joking.)

How do you get your work done when a one-line change takes so long?

Changing a version number is a special type of one line change because it only appears to be a one line change. In reality, that could end up being potentially millions of lines of code being changed in dependencies.

We had nginx running as a proxy for one of the apps. It runs on RHEL 7 because that's the standard for the enterprise. The stock nginx package available for RHEL did not support HTTP/2.

* There is no chance someone will approve this server to run an nginx instance someone compiled themselves.

* There is no chance someone will approve this server to run anything but nginx, as that's the company standard for proxy servers.

* There is no chance someone will approve this server to install software from a third-party yum repository. (And that's still a much bigger chance than someone allowing the firewall in front of that server to permit outgoing connections to the internet, which installing from third-party repos would require in the first place.)

In the end there were likely two ways to get HTTP/2 support for that service:

* Pay some third party to make it happen and be responsible for that server.

* Wait until nginx in RHEL (the EPEL repository, which was approved and mirrored internally) supported HTTP/2.

We did the latter, which happened many months later.

One thing I’ve run into is misconfigured native apps that accidentally treat headers as case-sensitive. In particular, the usual HTTP client on iOS handles header case-insensitivity for you, unless you use newer versions of Swift, where it converts the custom header dictionary into a vanilla Swift one that doesn’t treat header lookups as case-insensitive :/

Supporting http2 at an nginx reverse proxy doesn't help the problem in the original post, which is mostly about internal connections between microservices, e.g. going from your nginx proxy to your node or rails server.

Putting http2 here is a pain because you probably don't want https. You'd have to have nginx decrypt and re-encrypt all the traffic, and you'd have to deal with certificates etc.

Server push can’t be free. You need the web server to somehow know what resources will be required by the page and I still don’t understand how it doesn’t defeat browser caching but I presume it must involve some non trivial configuration.

Server push also isn’t required to reap most of the benefits. In the places where I’ve tried it I’ve not seen any benefit over link preload headers + HTTP2 without push. Many CDNs that support HTTP2 haven’t bothered to support server push at all, I suspect due to the limited advantages compared to the extra complexity.

Pretty sure server push is being deprecated - current implementation is 'only half a feature', as clients lack the ability to tell the server what's already in cache.

In theory the client can cancel the response for a resource it's already got, but by the time the response bytes reach the client it's really too late.

Yeah and if the client volunteers the list of all it has in cache it would result in some massive requests and a kind of quasi-cookie.

I always found this feature interesting but weird.

Random thought. But isn't this a potential use for a bloom filter?

Yes, Bloom filters were a candidate for Cache Digests. The original prototype for Cache Digests used a Golomb-coded set, as its memory representation is smaller than a Bloom filter's.


But then Cache Digests moved onto Cuckoo Filters - https://github.com/httpwg/http-extensions/pull/413
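To make the Bloom filter idea concrete, here is a toy Bloom filter. It is not the Cache Digests wire format (which, as noted, moved from Golomb-coded sets to Cuckoo filters); the filter size, the seeded FNV-1a hash and the idea of the client sending these bits to the server are assumptions for illustration.

```javascript
// Toy Bloom filter: a client could send the bit array instead of the full list of
// cached URLs, and a server would skip pushing anything that "might" be cached.
// False positives are possible; false negatives are not.
class BloomFilter {
  constructor(sizeInBits, hashCount) {
    this.size = sizeInBits;
    this.hashCount = hashCount;
    this.bits = new Uint8Array(Math.ceil(sizeInBits / 8));
  }

  // 32-bit FNV-1a with the seed mixed into the offset basis, so each of the
  // hashCount rounds maps the same string to a different bit position.
  hash(value, seed) {
    let h = (0x811c9dc5 ^ seed) >>> 0;
    for (let i = 0; i < value.length; i++) {
      h ^= value.charCodeAt(i);
      h = Math.imul(h, 0x01000193) >>> 0;
    }
    return h % this.size;
  }

  add(value) {
    for (let k = 0; k < this.hashCount; k++) {
      const bit = this.hash(value, k);
      this.bits[bit >> 3] |= 1 << (bit & 7);
    }
  }

  mightContain(value) {
    for (let k = 0; k < this.hashCount; k++) {
      const bit = this.hash(value, k);
      if ((this.bits[bit >> 3] & (1 << (bit & 7))) === 0) return false;
    }
    return true;
  }
}

const cached = new BloomFilter(1024, 3);
cached.add("/app.css");
cached.add("/app.js");
console.log(cached.mightContain("/app.css")); // true: added items are always found
console.log(cached.mightContain("/hero.jpg")); // usually false, but may be a false positive
```

The 1024-bit filter here is 128 bytes regardless of how many URLs go in, which is the appeal over listing every cached URL; the price is a false-positive rate that grows as the filter fills.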

Your webapp needs to start the push, browsers will interrupt it if they find the resource is already in cache after it has parsed the HTML.

It's also a huge amount of complexity.

Well, mod_http2 is incompatible with mpm-itk, and while it’s possible to run nginx as a front-end proxy, such a solution has its own complexities making it not really worth it in most cases where speed is not a top requirement.

Not when you consider that the majority of linux distributions haven't picked up support for it yet in their versions of nginx, apache et al.

Like which?

RHEL7/CentOS 7 is one of the more significant distributions out there, you can yum install version 1.12 of nginx. It wasn't until 1.13 shipped that nginx picked up any support for http/2.

On the Apache front, mod_http2 didn't ship until 2.4.17; again, CentOS 7 and other RHEL7-based distributions lag behind on 2.4.6.

Sure, that doesn't mean you couldn't compile / install your own version, but for a lot of people that's just not likely to happen. Sticking with the distribution version keeps you within any support contracts, gets you security patches etc. and all the information you need to keep auditors and the like happy.

nginx is not shipped by RHEL, so you are probably pulling from EPEL. You can also pull the latest stable version directly from nginx's repo http://nginx.org/en/linux_packages.html#RHEL-CentOS which has supported HTTP/2 since RHEL 7.4, when Red Hat released ALPN support in OpenSSL.


You can also install the latest version of Apache from Red Hat's Software Collections repo, which supports HTTP/2, but it throws everything into /opt/rh/rh-httpd24/, which is a bit weird.

Is it easy for Django/Rails/ExpressJS to implement Http2 in their next release?

In many cases there's nothing frameworks need to do; it's just a matter of swapping out the HTTP server (e.g. https://github.com/expressjs/express/issues/2364#issuecommen...) or maybe even just sticking an HTTP/2-compatible reverse proxy close to the app servers.

Now, if you want to take advantage of HTTP/2 features like server push that's another story.
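For Node.js, a sketch of what swapping out the HTTP server can look like, using the core `http2` module's compatibility API directly rather than Express. It assumes a recent Node.js, runs cleartext HTTP/2 (h2c) on loopback purely for the demo, and fires several requests at once to show them multiplexed over one session.

```javascript
const http2 = require("http2");

// Serve a plain (req, res) handler over HTTP/2 and send several concurrent
// requests through a single client session.
// Cleartext h2c is used for a local demo only; browsers speak HTTP/2 only over TLS,
// so a real deployment would use http2.createSecureServer with a certificate.
function multiplexDemo(requestCount) {
  return new Promise((resolve, reject) => {
    let sessions = 0;
    const server = http2.createServer((req, res) => {
      // req.stream is the underlying Http2Stream; client-initiated stream IDs are odd.
      res.end(`stream ${req.stream.id}`);
    });
    server.on("session", () => { sessions += 1; });
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address();
      const client = http2.connect(`http://127.0.0.1:${port}`);
      client.on("error", reject);
      // All requests are issued at once and share the same session (connection).
      const bodies = Array.from({ length: requestCount }, () =>
        new Promise((done, fail) => {
          const req = client.request({ ":path": "/" });
          let body = "";
          req.setEncoding("utf8");
          req.on("data", (chunk) => { body += chunk; });
          req.on("end", () => done(body));
          req.on("error", fail);
          req.end();
        })
      );
      Promise.all(bodies).then((results) => {
        client.close();
        server.close();
        resolve({ sessions, results });
      }, reject);
    });
  });
}

multiplexDemo(3).then(({ sessions, results }) => {
  console.log(`${results.length} responses over ${sessions} session(s):`, results);
});
```

Each response names a different stream ID while the server sees one session, which is the multiplexing that makes per-request connection reuse (and the Keep-Alive header) moot in HTTP/2.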

Django doesn't implement HTTP (except for the development server), that's up to whatever's handling WSGI, like uWSGI or Gunicorn or something.

Unfortunately, neither of those currently supports HTTP2.

For serving Python, I think your best bet right now is uWSGI behind NGINX.

Better yet, switch to HTTP/3: head-of-line blocking is no fun.

We don't need http, we need a better way of distributing and caching data.

Do you have a concrete suggestion? Or, a link to a page with a concrete suggestion?

It's easy to complain about almost anything - it tends to be a lot harder to make a proposal for its replacement.

A proposed solution shouldn't be required to complain about something. Just as submitting a patch isn't required when filing a bug report.

A justification for the complaint is necessary to be taken seriously, though. And the OP didn't provide one.

Of course, because HTTP/2 is stateful, and whenever the server or client wishes, it can send a close message.

But stateful connection management comes with a cost, especially on TCP.

HTTP2 is stateless. It follows the same semantics as HTTP to be a generic stateless protocol.

TCP has stateful connections, but both HTTP versions are being sent over TCP anyway so in that sense the transport was always stateful.

> HTTP2 is stateless. It follows the same semantics as HTTP to be a generic stateless protocol.

It is not currently. HTTP/2 header compression is stateful.

HTTP/2 is a protocol that appears stateless to the end user.
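A toy model of why header compression makes the connection stateful even though each request looks stateless to the application. This is deliberately not real HPACK (no static table, no Huffman coding, no eviction), and the function names are made up for illustration.

```javascript
// Toy model only: NOT the HPACK wire format.
// Each side of a connection keeps its own "dynamic table" of header lines seen so far.

// Encoder: send a header in full the first time, afterwards just its table index.
function encodeHeaders(table, headers) {
  return Object.entries(headers).map(([name, value]) => {
    const line = `${name}: ${value}`;
    const index = table.indexOf(line);
    if (index !== -1) return { index };
    table.push(line);
    return { literal: line };
  });
}

// Decoder: literals are added to the table and indexes are looked up in it, so
// decoding only works if this decoder has seen every earlier block on the connection.
function decodeHeaders(table, block) {
  return block.map((field) => {
    if ("literal" in field) {
      table.push(field.literal);
      return field.literal;
    }
    const line = table[field.index];
    if (line === undefined) {
      throw new Error(`unknown index ${field.index}: decoder state out of sync`);
    }
    return line;
  });
}

const encoderTable = [];
const decoderTable = [];
const headers = { ":authority": "example.com", "user-agent": "demo" };
const first = encodeHeaders(encoderTable, headers);  // two literals
const second = encodeHeaders(encoderTable, headers); // two small index references
decodeHeaders(decoderTable, first);
console.log(decodeHeaders(decoderTable, second));
// A fresh decoder that missed the first block would throw: decodeHeaders([], second)
```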

What do you mean by stateful in this context?

Http/2 is multiplexed, unlike http/0.9-1.1, and while that has some overhead, it being a binary protocol probably makes up for it.

The whole protocol has state, because that's the only way to multiplex multiple data streams over a single connection.

See the stream states in https://http2.github.io/http2-spec/#StreamStates. Of course, the "user layer" is stateless, but the whole connection handling is a state machine (which HTTP/0.9-1.1 actually wasn't).

> but stateful connection management comes with a cost, especially on tcp.

Keep-alive isn't any better. In Apache and nginx, keep-alive and HTTP/2 parallel requests are handled in a separate thread and hardly add any noticeable load.

> Better

No, http2 is not better. We actually did the tests. We're not quite Google-scale, but having to handle tens of thousands of requests per second put us in the 'high load' camp.

What was the issue? Why was it worse?

The extra CPU load for our balancers (nginx) didn't translate into any benefits for the user in performance or user experience. We ended up spending CPU cycles for no benefit.

Basically, HTTP/2 is tuned for a very specific case of Google traffic which pretty much never happens in places that aren't Google.

this is not a useful statement without context or data.

Neither is the parent comment.

The hidden danger, mentioned in the article, is if the client sends a second request while the server closes an idle connection. Until HTTP/2, the client can't tell if the server closed the connection before or after it received the request. Many servers send a hint about the idle timeout, but few client libraries process it (that I've seen). The larger the latency between server and client, the bigger deal this is.

This is always an issue: you send an HTTP POST request (even on HTTP/1.0) and the connection closes before you see a response. Did the server receive it? You don’t know.

Pipelining might amplify it, but it is always there, especially with unreliable mobile connections.

If it's the first request on a connection, and it appears that the server closes the connection, I have a reasonable expectation that the server doesn't care for my request. When the request times out, who knows -- most TCP stacks won't tell me if the server acked it, but many networks will fake acks these days anyway.

On pipelined requests it's not too bad; you're not supposed to pipeline requests that aren't safe to retry. But pipelining ends up being somewhat rare in practice. Reusing an inactive connection is actually pretty risky: the server may shut it down at any moment, and your network may have silently dropped the connection already (some NAT timeouts are really short; I've seen cases in real mobile networks where the timeout was under a minute!).

I'm not thrilled with multiplexing in HTTP/2, but the sensible stream closure would be really nice to have. If you see a GOAWAY, you know whether the server saw your request or not, so you can resend it with a clear conscience.

“Reasonable” to a human, maybe.

If you are trying to build a robust system, in which requests don’t get lost, the difference is in quantity but not in quality - you must robustly handle the uncertainty in both cases.

Sure -- when I build a system, I make sure I can always retry all the requests, because there's never a guarantee that the client got the response, or stored the results successfully. It could have made a follow-up request but lost power or been killed or whatever before the results were stored, and next time started over. Sometimes the server failed to store, but told the client it did -- that's fun too, but thankfully I control the servers and can usually limit the damage of that.

But most people don't realize the Byzantine hell we all inhabit, and HTTP(S) client library defaults for retrying apparently idempotent requests will often work well enough; but a server idle timeout configured shorter than the client idle timeout is much easier to trip over.
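A sketch of the retry rule discussed above, for a request whose (possibly reused) connection failed before any response arrived. The method list follows the idempotent methods in RFC 7231; the `idempotencyKey` field is a hypothetical application-level convention, not part of HTTP.

```javascript
// Methods defined as idempotent in RFC 7231, section 4.2.2.
const IDEMPOTENT_METHODS = new Set(["GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"]);

// Decide whether to resend a request after the connection failed before a
// response arrived (e.g. a reused keep-alive connection the server had just closed).
function canRetry({ method, gotResponse = false, idempotencyKey = null }) {
  // If a response arrived, this was not a transport failure; don't resend blindly.
  if (gotResponse) return false;
  // Idempotent methods can be repeated without changing the outcome.
  if (IDEMPOTENT_METHODS.has(method.toUpperCase())) return true;
  // A POST may or may not have been processed. Only resend it if the server can
  // detect duplicates, e.g. via an application-level idempotency key
  // (a hypothetical convention here, not part of HTTP itself).
  return idempotencyKey !== null;
}

console.log(canRetry({ method: "GET" }));                              // true
console.log(canRetry({ method: "POST" }));                             // false
console.log(canRetry({ method: "POST", idempotencyKey: "order-42" })); // true
```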

I wonder if it might be a minor performance issue in the worst case.

Without keepalive, you create a new connection and pay the latency costs of doing so. With keepalive, there's a chance you try to reuse an old connection, and it fails, which requires round trips to learn about, and you still have the latency of creating a new connection. So more total latency in that case.

It seems it would improve the average case but make the worst case slightly worse. If your keepalive timeout is short, maybe it would come up often enough to matter.
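A back-of-the-envelope model of that trade-off. Every input is an assumption: the round-trip time, the number of round trips to set up a connection (1 for TCP alone, about 3 for TCP plus a full TLS 1.2 handshake), and the chance that a reused idle connection is already dead. It also assumes detecting a dead connection wastes about one round trip.

```javascript
// Latency of one request on a brand-new connection: setup round trips + 1 for the request.
function latencyWithoutKeepAlive(rttMs, setupRtts) {
  return (setupRtts + 1) * rttMs;
}

// Latency when reusing an idle connection that is dead with probability staleChance.
function latencyWithKeepAlive(rttMs, setupRtts, staleChance) {
  const reused = rttMs; // connection already open: just the request round trip
  // Stale case: roughly one round trip wasted discovering the connection is gone,
  // then a fresh connection is opened and the request resent.
  const stale = rttMs + latencyWithoutKeepAlive(rttMs, setupRtts);
  return {
    average: (1 - staleChance) * reused + staleChance * stale,
    worst: stale,
  };
}

// Example: 50 ms RTT, TCP + TLS 1.2 setup, 1% of reused connections stale.
console.log(latencyWithoutKeepAlive(50, 3));    // 200
console.log(latencyWithKeepAlive(50, 3, 0.01)); // average about 52, worst 250
```

Under these assumptions the average case improves a lot (about 52 ms versus 200 ms), while the worst case is one round trip slower than never reusing connections, which matches the intuition in the comment.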

This is only partially true. http/1.1 has well defined semantics for persistent connections. The server can send the header "Connection: Close" to indicate to the client it is closing the idle connection. All http/1.1 clients should respect that since it's in the RFC.

The problem is many servers don't send this header when closing idle connections. nginx is a notorious example. But well behaving servers should be sending that header if they intend to close the connection after a request.

The server can certainly send that header on a response when it intends to close the connection immediately after the response. I wouldn't consider that connection to be idle.

However, when the server holds the connection open for some amount of time and then decides to close it, it's not permitted for the server to send a response header, because there's no request to respond to. I would love to be wrong, but I don't think I am, because this scenario is mentioned in the RFC, "For example, a client might have started to send a new request at the same time that the server has decided to close the "idle" connection. From the server's point of view, the connection is being closed while it was idle, but from the client's point of view, a request is in progress." [1]

An example chain of events is:

t0 client opens connection (syn)

t1 server accepts connection (syn+ack)

t2 client sends first request

t3 server sends response and keeps connection open


t63 client sends second request

t63 (simultaneously within a margin of the one way trip time), server closes connection because it's been idle for 60 seconds

t64 client receives FIN

t64 server receives data on closed socket and sends RST

t65 client receives RST

http/2 improves this greatly because in this example, a compliant server will send goaway with last-stream-id 1 prior to closing the connection, and the client will know the second request was not processed and should be retried. It still suffers a latency penalty because it has to start a new connection, and it already wasted somewhere between a one way trip and a round trip.

[1] https://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8....
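For the HTTP/2 case in that timeline, a sketch of how a client could split its in-flight streams after receiving GOAWAY. It only models the rule from the HTTP/2 spec (streams with IDs above last-stream-id were not processed); wiring it into a real client library is left out.

```javascript
// After an HTTP/2 GOAWAY frame: streams with an ID above lastStreamId were not
// processed by the server and can be retried on a new connection, even for
// non-idempotent methods. Streams at or below it may have been (or may still be) processed.
function splitAfterGoaway(inFlightStreamIds, lastStreamId) {
  const retry = [];
  const mayHaveBeenProcessed = [];
  for (const id of inFlightStreamIds) {
    (id > lastStreamId ? retry : mayHaveBeenProcessed).push(id);
  }
  return { retry, mayHaveBeenProcessed };
}

// The timeline above: stream 1 was answered, the second request went out as
// stream 3, and the server sent GOAWAY with last-stream-id 1 before closing.
console.log(splitAfterGoaway([3], 1)); // retry: [3], mayHaveBeenProcessed: []
```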

Why do you think so? The server should report a Status Code for your 2nd request. What header would carry this keepalive hint?

I think this because I've seen the pcap traces. The server closed the connection before it received the 2nd request -- it can't possibly send a status code. See my timeline in a sibling comment.

The Keep-Alive header [1] is optional, but has parameters timeout, indicating the idle timeout, and max, indicating the number of allowed requests. Max is useful for pipelining, to avoid sending requests that won't be processed; timeout is very helpful for avoiding sending requests when the server is about to close the socket.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ke...
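A sketch of a client using that hint: parse `timeout` and `max` from the Keep-Alive header and expire idle connections a little before the server does. The one-second margin and the fallback value are assumptions; on the server side in Node.js, for example, the corresponding knob is `server.keepAliveTimeout`.

```javascript
// Parse a Keep-Alive response header such as "timeout=5, max=100".
// Unknown or malformed parameters are ignored.
function parseKeepAlive(headerValue) {
  const params = {};
  for (const part of (headerValue || "").split(",")) {
    const [rawName, rawValue] = part.split("=");
    const name = rawName.trim().toLowerCase();
    const value = Number.parseInt((rawValue || "").trim(), 10);
    if ((name === "timeout" || name === "max") && Number.isFinite(value)) {
      params[name] = value;
    }
  }
  return params;
}

// Pick a client idle timeout that expires before the server's (timeout is in seconds),
// so the client stops reusing a connection the server is about to close.
function clientIdleTimeoutMs(headerValue, fallbackMs, marginMs = 1000) {
  const { timeout } = parseKeepAlive(headerValue);
  if (timeout === undefined) return fallbackMs;
  return Math.max(0, timeout * 1000 - marginMs);
}

console.log(parseKeepAlive("timeout=5, max=100"));             // { timeout: 5, max: 100 }
console.log(clientIdleTimeoutMs("timeout=5, max=100", 30000)); // 4000
```

The margin has to cover at least the one-way latency to the server; otherwise the race described above can still happen right at the boundary.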

Regarding the first question, IMO the web server should half-close the socket, so that in-flight requests are rejected at the TCP level while the final response is being delivered. Of course, the client needs to deal with socket errors in addition to HTTP status codes.

For the keep-alive header, you are right, I wasn't aware of it.

I'm not sure what you mean by half close.

In my scenario at time N, the connection is idle -- both sides have received all data the other has sent, all requests have received a response.

If the server half-closes (through shutdown) and sends a FIN, simultaneously with the client sending a new request; that enables the server to read the request, but not respond to it, so I don't see how that is helpful?

The problem from the client side is that it has sent a request, and seemingly in response the socket is closed. That could indicate the server crashed on the request, or that the server closed the socket because it was idle. If you have a request that you know or suspect shouldn't be made more than once, you shouldn't retry it on a new connection. If you have the TCP packets from the server, you can actually take a good guess at causality, because the ACK number and TCP timestamp indicate whether the server saw your last transmission, but that information isn't exposed through the normal socket API; you could maybe guess based on round-trip time too, but it is nicer in HTTP/2 (or other protocols), where there is an explicit close message.

So I've been writing a web server recently (mostly for learning purposes of HTTP and web stuff again as I've been out of that field for over a decade), and I've discovered that Firefox seems to be the only browser I've tried which still really utilises it fully and respects the headers: Safari sort of does, but only ever once, regardless of the header params, and Chrome never does, again, regardless of what the Connection header item says in the first response from the server.


shows they removed it in Chrome due to issues, and weirdly this Firefox ticket:


(last updated 5 years ago), seems to show they're not going to enable keep alive for HTTP 1.1, but Firefox is most definitely utilising it.

These tickets talk about pipelining, not keep-alive. The chrome link specifically points out why they decided to disable it. Chrome and Firefox still support regular keep-alive and do reuse connections on http 1.1

Fair point, however I've seen no evidence Chrome is doing keep-alive on Linux or Mac OS - it always seems to send a FIN ACK after the response from the server to close the connection.

Firefox most definitely is utilising it fully, and obeys the Keep-Alive params specified, which I can't see any of the other browsers doing.

Just checked with Chromium 71 on Linux against lighttpd - keepalive works just fine, seeing html, css and js requested using the same connection. I'd also suspect something's up with your server's implementation.

FWIW, this is the reply header lighttpd sends:

    HTTP/1.1 200 OK
    Vary: Accept-Encoding
    Content-Encoding: gzip
    Last-Modified: Wed, 16 Sep 2015 08:50:41 GMT
    ETag: "713114043"
    Content-Type: application/javascript
    Accept-Ranges: bytes
    Content-Length: 9743
    Date: Fri, 08 Mar 2019 10:34:12 GMT
    Server: lighttpd/1.4.35

> I've seen no evidence Chrome is doing keep-alive on Linux or Mac OS

That sounds like a bug with your server implementation. The web would melt down if Chrome's keep-alive support didn't work.

Possibly, and that's what I assumed at first, but I don't really see how: after sending the response the socket waits for more data in the form of another request from the user agent (I've tried with and without a poll / socket timeout).

Firefox does this perfectly. Chrome sends a FIN ACK in response to the response from the server, and it closes the connection on its side, triggering the next recv() within the server to return 0 (or poll() to fail, depending on how I do it). But then it opens another connection after that with another request which could have been sent on the previous connection.

I also can't see Chrome doing it on non-HTTPS websites in general when monitoring through Wireshark, which is why I don't believe it's an issue with my server's implementation: I haven't actually observed Chrome utilise it at all on several computers on both Linux and Mac OS.

I don't know why you're seeing what you're seeing but your testing/monitoring setup sounds like it's at a bit too low level for the thing you're trying to figure out. A few basic things you might want (or should, really, if you're implementing HTTP from scratch) to try:

- Serve real content, say, a basic HTML page with a couple of external resources.

- Serve same and monitor (turn on all the logs) with apache.

- Use the Chrome dev tools. In the network tab, right click on the column headers in the request list and enable the 'Connection ID' column. Keep in mind any modern browser will open concurrent connections in the base HTTP 1.1 case.

Thanks, but I've done this (although used Nginx instead).

The web server I've written is now serving several websites I'm hosting, some of which have thousands of images, so there's plenty of scope for connection re-use.

Each connection ID is different within dev tools in Chrome - although errors have 0 - (often sequential, sometimes there are huge gaps). And this is the same with HTTP websites on the net (just tried www.briscoes.co.nz and am still seeing the same thing in Chrome 72).

Using Wireshark I can see that Firefox is doing the correct thing, so I'm not sure why you think what I'm looking at is too low-level: I think it's pretty clearly something the client's doing at the TCP level to close the connection on its end after it's received the content length.

It may well be that I'm not sending a header Chrome's looking for, but I'm not sure what it is, as in its request it's sending "Connection: keep-alive", and the response has the same, and Firefox works fine.

It's almost like it's a client configuration or something, but I don't know what that would be, as I've tried on several machines.

'Too low level' is Wireshark before Chrome dev tools. I didn't know if you'd tried the dev tools since you didn't mention it. As to the other stuff, I'm not sure what portable Chrome-breaking field you emit - www.briscoes.co.nz instantly shows connection reuse on my end.


People seem to always get confused by this - http/1.1 is persistent by default and the vast majority of servers/clients use it.

The "Keep-Alive" header was something tacked onto http/1.0 and doesn't really mean anything these days.
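The default persistence rule, sketched as the check a hand-written server (like the one described further up) might apply to each request. It follows the HTTP/1.x rules for the Connection header; the function name and version strings are illustrative.

```javascript
// HTTP/1.1: persistent unless either side sends "Connection: close".
// HTTP/1.0: closed after the response unless "Connection: keep-alive" is sent.
// The Connection header is a comma-separated list of case-insensitive tokens.
// Only meaningful for HTTP/1.x; HTTP/2 has no per-request persistence decision.
function shouldKeepAlive(httpVersion, connectionHeader) {
  const tokens = (connectionHeader || "")
    .split(",")
    .map((t) => t.trim().toLowerCase());
  if (tokens.includes("close")) return false;
  if (httpVersion === "1.1") return true;
  return httpVersion === "1.0" && tokens.includes("keep-alive");
}

console.log(shouldKeepAlive("1.1", undefined));    // true
console.log(shouldKeepAlive("1.1", "close"));      // false
console.log(shouldKeepAlive("1.0", undefined));    // false
console.log(shouldKeepAlive("1.0", "Keep-Alive")); // true
```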

Unfortunately, depends on the server. Node says this:

"Sending a 'Connection: keep-alive' will notify Node.js that the connection to the server should be persisted until the next request."

The article seems to confirm this behavior.

So clients have to account for non RFC compliant servers.

Are you sure about that? That seems to violate the http/1.1 RFC. I think the node.js docs are talking about http/1.0 there. http/1.1 uses the "Connection" header to indicate whether to persist the connection.

Sure, if Apple have fixed their issues with Keep-Alive for Safari:


According to https://tools.ietf.org/html/rfc2068#section- HTTP/1.1 defines the "Keep-Alive" header for use with keep-alive parameters but does not actually define any keep-alive parameters. I don't see a definition of this header in any of the RFCs that obsolete this one (just some mentions of issues with persistent connections and HTTP/1.0 servers, and the fact that Keep-Alive is a hop-by-hop header).

MDN's documentation on this header references https://tools.ietf.org/id/draft-thomson-hybi-http-timeout-01... for the parameters, but this is an experimental draft that expired in 2012.

Which is to say, I can't really fault Safari for not respecting keep-alive parameters that never made it out of the experimental draft phase.

You and eridius are talking about the Keep-Alive: header. The article at hand appears to be talking about the Connection: header.

A common mistake I see when people use libcurl is to create a new handle for each request, which pretty much guarantees no connection reuse. Reuse your handles for profit.

One more reason to do so: if you use client side TLS auth with a cert (or moreover a slow smartcard,) reauthenticating every connection will grind your performance to pieces.

What's the advice these days for when things are behind a reverse proxy or a load balancer, or both? I ran into issues where things end up balancing unevenly when multiple things are mixed.

Be aware of the maximum connection bottlenecks in your stack, especially if you have an interactive app.

I've had apache refuse new requests because old connections were holding slots.

I switched servers to Keep-Alive, but then I found out that it would introduce race conditions, at least with the Apache HTTP Client.

The client would be in the middle of sending a new request, but the server would have already decided to close the connection, and the request would fail.

I believe this is a common problem, and yet the spec has nothing to address this obvious race condition.


Assuming the client and server HTTP implementations don't have any bugs, and there aren't any network devices (proxies, etc.) in between, sure. Are modern clients able to start at HTTP/2, and gracefully degrade through HTTP/1.1 with keep-alive down to HTTP/1.0 if necessary? That would be really cool if so.

My understanding was that with http/1.1 all connections are kept alive until a close is emitted. How is this different?

Based on https://en.wikipedia.org/wiki/HTTP_persistent_connection#HTT... it sounds like your statement is correct.

But the fact that the underlying HTTP connection is kept alive by default doesn't necessarily mean that the client is going to actually reuse that connection for multiple HTTP requests. And, in fact, in Node.js the connection is not reused by default.

Although the keep alive periods may vary:


Or better yet, use gzip and inline all images as base64 encoded. The file size is very similar to raw data, and the number of requests with associated http headers is reduced.

That has some major downsides. No caching, so any shared images need to be re-downloaded on every page. And base64 is about 40% larger in my experience.
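To put a number on the base64 overhead before any gzip: every 3 input bytes become 4 output characters, plus the `data:` prefix. The all-zero buffer below is only a stand-in for real image bytes; it doesn't change the base64 length, though it would compress unrealistically well.

```javascript
// Size of an image inlined as a base64 data URI compared with the raw bytes.
function dataUriOverhead(bytes, mimeType) {
  const dataUri = `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
  return {
    rawBytes: bytes.length,
    dataUriBytes: dataUri.length,
    ratio: dataUri.length / bytes.length,
  };
}

const fakeImage = new Uint8Array(3000); // stand-in for real image bytes
console.log(dataUriOverhead(fakeImage, "image/png")); // dataUriBytes: 4022 for 3000 raw bytes
```

That is the unavoidable 4/3 expansion (about 33%) plus a small prefix; how much of it gzip wins back depends on the image data, which is where the two comments above differ.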

Request count is really not a big deal with HTTP/2 multiplexing.

Don't do this. Browsers are very optimized for subrequests and especially parsing image data.

By forcing base64, you're eliminating all the caching and using much more CPU power to parse that back into a binary image. You're also making the page load slower as the initial payload is bigger and image data has to be handled in line rather than asynchronously.

Might be a good idea for statically generated pages, or the statically generated parts of pages, but I’d be wary of adding any more load to dynamic pages.

Do any static site generators rewrite image links as data URIs?

> Do any static site generators rewrite image links as data URIs?

I'm using Hugo and this is at least not the default behaviour; not sure though if there is a switch for that.

You don't need that npm library. Just `new Agent({ keepAlive: true })` works.
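A sketch to check this locally with Node's built-in `http` module: count how many TCP connections a loopback server sees for five sequential requests through an agent with and without `keepAlive`. The loopback setup and request count are assumptions; also note that newer Node.js releases enable keep-alive on the default global agent, so results with the default agent depend on the version.

```javascript
const http = require("http");

// Count the TCP connections a local server sees when a client makes
// requestCount sequential requests through the given http.Agent.
function countConnections(agent, requestCount) {
  return new Promise((resolve, reject) => {
    let connections = 0;
    const server = http.createServer((req, res) => res.end("ok"));
    server.on("connection", () => { connections += 1; });
    server.listen(0, "127.0.0.1", async () => {
      const { port } = server.address();
      try {
        for (let i = 0; i < requestCount; i++) {
          await new Promise((done, fail) => {
            http.get({ host: "127.0.0.1", port, path: "/", agent }, (res) => {
              res.resume(); // drain the body so the socket can be released
              res.on("end", done);
            }).on("error", fail);
          });
          // Give the agent a turn of the event loop to return the socket to its free pool.
          await new Promise((done) => setImmediate(done));
        }
        resolve(connections);
      } catch (err) {
        reject(err);
      } finally {
        agent.destroy(); // close any idle keep-alive sockets
        server.close();
      }
    });
  });
}

(async () => {
  console.log("keepAlive: true ", await countConnections(new http.Agent({ keepAlive: true }), 5));
  console.log("keepAlive: false", await countConnections(new http.Agent({ keepAlive: false }), 5));
})();
```

With keep-alive the five requests should share one connection; without it, each request opens its own. The same idea applies to HTTPS agents, where every avoided connection also skips a TLS handshake.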

I believed keep alive was enabled by default on browsers with HTTP1.1

The benchmarks say "to make 1000 HTTP requests", but I wanna know for sure if it was `https` (SSL) or not.

The SSL connection init costs are real, although SSL session re-use can help there even without keep-alive.

These metrics were collected for HTTPS specifically. I've omitted the URL I used in the benchmark scripts, but am using an HTTPS agent. Scripts live here: https://github.com/mgartner/node-keep-alive-benchmark

Unfortunately most people's primary computing devices are smart phones and smart phone radios cannot keep a TCP connection open, or won't, because of power usage.

While this might be true for users connecting to services, there are also a lot of intra-service connections that can keep TCP connections open and greatly benefit from doing so. For example, a microservice architecture where you have APIs communicating with each other over HTTP.

This article is talking about server-side services using keep-alive. Which are pretty unlikely to be running on a smart phone ;)

Keeping connection open does not require any action. Also radios have nothing to do with connections. Connections are abstractions from a different layer. Radios are shut down much more frequently than you'd think, to save power.

I thought the primary use case for this is you could load all of your websites resources on a single connection so you can make 100 file requests at the start and it only makes one connection.

Why does the radio have to be on just because there is an open TCP connection?

Are you asking why the radio has to be on or implying there are other reason that the radio could be on for? If you have a TCP connection open but the radio is off how are you going to receive incoming packets?

If it's a keep alive connection, you can just get the packets when you next turn on the radio. It's not a big deal to get the close right away; not that there's a good way to tell the system that. Anyway, you probably have a tcp connection open for the system push channel; if that requires the radio to stay on, it will be on.


too bad 99% of routers drop keepalives

TCP keepalives != HTTP keepalives.

Also, in my experience at least, it's not necessarily that routers drop TCP keep-alives, but rather that the keep-alive interval for most OSes is way longer than the router's connection timeout for idle entries in the NAT table.

I was burned hard by this in Azure. It seems that the default expiry time is around 4 minutes for the TCP load balancers. You can bump it to 30 min, but if I recall the default interval on Linux is 2 hours. Any long-standing idle TCP connections would get into a state where both sides believed they were connected, but the packets would get dropped to the floor. When the LB timed out, it didn't emit any FIN or RST packets, so neither side knew it had been torn down.

Fun debugging on that one. During the day there was enough activity to keep the connections alive, but at night they'd break. The overall behaviour was that the service worked great all day, but the first few actions out-of-business-hours would fail due to application-layer timeouts, and then everything would work great again until it had sat idle for a while.

You send heartbeats! There might be a max-connection-time but I haven't run into it, my connections being dropped through amazon infrastructure was solved by sending a few bytes (': <3' or '<!-- <3 -->') every 5 seconds or so.
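A sketch of that heartbeat for a long-lived Node.js response such as Server-Sent Events, where a line starting with `:` is a comment that EventSource clients ignore. The 5-second interval mirrors the comment above and is an assumption; `sseHandler` is a hypothetical handler, and the optional `setKeepAlive` call only sets the TCP keepalive initial delay, leaving the probe interval and count to OS settings.

```javascript
// Write a tiny SSE comment (": <3") every intervalMs so that load balancers and
// NAT devices along the path never see the connection as idle.
function startHeartbeat(stream, intervalMs, payload = ": <3\n\n") {
  const timer = setInterval(() => stream.write(payload), intervalMs);
  stream.on("close", () => clearInterval(timer)); // stop once the client goes away
  return () => clearInterval(timer);              // or let the caller stop it
}

// Hypothetical handler, e.g. http.createServer(sseHandler).listen(8080).
function sseHandler(req, res) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });
  // Optional TCP-level alternative: start keepalive probes after 60 s of idleness.
  // Only the initial delay is set here; probe interval and count stay OS defaults.
  req.socket.setKeepAlive(true, 60_000);
  startHeartbeat(res, 5_000);
}
```

The interval only needs to stay well below the shortest idle timeout in the path (around 4 minutes by default in the Azure case above), so 5 seconds is generous.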

TCP keepalive should solve the problem too. Rather than HTTP keepalive.

(i.e. To handle the case of "HTTP-Request", "huge delay", "final response". Rather than a streaming/chunking reply that is very long/slow.)

See my sibling post. TCP keep-alive can work, but you probably need to fiddle with OS-default settings for modern network equipment. I personally find the behaviour abhorrent, but my beard has more grey in it every day and I've accepted that "this is how it is now".
