Let me clarify a couple things I've seen in a lot of the comments.
First of all, we do load balance by request, not connections and we do not use sticky sessions.
Secondly, we are aware that our application has underlying problems with large numbers of concurrent requests, which were exposed by using HTTP/2, and we are working on fixing them. The point of the post is not "HTTP/2 sucks because it broke our app". It is "moving to HTTP/2 isn't necessarily as easy as just flipping a switch".
If you read the article, it's clearly discussing the problems that they experienced. I don't see why they need to qualify everything they said with IMO, IMX, for us, in our case, etc. At no point in the article itself does the author assert that their experience will match yours. Quite the opposite; the author goes out of their way to describe their environment and why they were impacted in particular. At some point, the reader needs to assume that the author is only talking about what they're talking about -- their specific situation -- and not asserting some omnipotent will to obliterate your opinion or disagreement because they didn't add "(For Us)" to the title.
In other words, complaining that the title isn't explicitly subjective is probably the weakest criticism you could make because, at best, it's a criticism of rhetorical style rather than of the substance of the piece. The reader is more than capable of coming to that conclusion, and can be expected to if they actually read it. Meanwhile, you're not actually disagreeing with anything the article says, you're not presenting any alternative opinion, and you're certainly not providing any contrary evidence. You're disagreeing, but there's no substance to your disagreement. "I had to read the article to understand what it was actually about" is not really a criticism!
HN is aimed at an engineering audience, not a political one. I, and others like me, come here (or at least used to) for enlightenment, not rhetoric.
Moreover, the article itself is not argumentative. It didn’t need to be spiced up with a provocative title to be valuable.
It doesn't matter whether you're writing an opinion piece, a political piece, or a technical piece: if you want to structure your writing in a way that can be easily read, understood, and followed, you want to obey the principles of rhetoric.
i.e. strict pedants with no comprehension of/tolerance for ambiguity or emotion?
Yes it did. HN title-fu is the same appeal to emotion as any political blog and is just as effective at surfacing stories, thanks to people just like you.
You came to this thread and made your noise for pedantic reasons, and helped surface it higher so the correct audience could actually see it.
Besides, I think it's quite reasonable to argue that it's a bad idea in the majority of cases, even if it's not a universally bad idea. I think many people are probably in the same boat.
A better title would be "why turning on http/2 was a mistake (for us)." The current one implies failure due to HTTP/2 itself, rather than architectural decisions that made the upgrade to HTTP/2 harder than imagined.
“Turning on HTTP/2 increased request burstiness, breaking our application”
If one enables HTTP/2 and production goes down, someone could quite rightly point out that "you performed action A, causing impact B, which broke the app". Determining in root cause analysis that impact B stemmed from underprovisioned peak demand compute resources in no way contradicts the usage of "broke".
I'm having some trouble picturing this. Can you add some numbers? Like, how many nodes is the load balancer spreading the load over, and how many simultaneous requests were you seeing from a browser?
Designing infrastructure for concurrent requests is definitely not. I've worked on shared hosting systems with high concurrency requirements and it definitely was more complicated than just installing an Apache MPM—we had to think about balancing load across multiple servers, whether virtualizing bare-metal machines into multiple VMs was worthwhile (in our case it was for a very site-specific reason), how many workers to run on each VM, how much memory we should expect to use for OS caching vs. application memory, how to trade off concurrent access to the same page vs. concurrent access to different pages vs. cold start of entirely new pages, whether dependencies like auth backends or SQL databases could handle concurrency and how much we needed to cache those, etc. At the end of the day you have a finite number of CPUs, a finite network pipe, and a finite amount of RAM. You can throw more money at many of these problems (although often not a SQL database) but you generally have a finite amount of money too.
I would be surprised if most people had the infrastructure to handle significantly increased concurrency, even at the same throughput, as their current load. It's not a sensible thing to invest infrastructure budget into, most of the time.
(You can, of course, solve this by developing software to actively limit concurrency. That's not a given for exactly the reasons that developing for concurrency is a given, and it sounds like Lucidchart didn't have that software and determined that switching back to HTTP/1.1 was as good as writing that software.)
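For what it's worth, here is roughly what that limiting software can look like; a minimal sketch in Go (my illustration, with made-up numbers, not Lucidchart's code): a buffered channel acts as a semaphore capping in-flight requests, and anything that can't get a slot within a short wait is shed with a 503 instead of being processed after the client has given up.

```go
// Cap in-flight requests with a semaphore; shed excess load with a 503.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func limitConcurrency(max int, next http.Handler) http.Handler {
	sem := make(chan struct{}, max) // at most `max` requests in flight
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case sem <- struct{}{}: // acquired a slot
			defer func() { <-sem }()
			next.ServeHTTP(w, r)
		case <-time.After(2 * time.Second): // waited too long: shed load
			http.Error(w, "server busy", http.StatusServiceUnavailable)
		}
	})
}

func main() {
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(500 * time.Millisecond) // stand-in for real work
		fmt.Fprintln(w, "ok")
	})
	http.ListenAndServe(":8080", limitConcurrency(32, slow))
}
```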
Yes, most such cases should be rearchitected to not go through a single choke point. But my claim is that this isn't automatic merely by developing for the web, and going through a CP database system is a pretty standard choice for good reason.
In seriousness, though, I'm both curious and a little bit skeptical of what user experience benefit that architecture would give over a server-side request queue and a single worker against the queue. That would allow you to pay the cost of networking for the next request while the mainframe is working. You could even separate the submission of jobs from collecting the result so that a disconnected client could resume waiting for a response. Anyway, I'm not saying you needed all that to have a well-functioning system, I'm just not convinced that a single threaded architecture is ever actually good for the user unless it gives a marked reduction in overhead.
If you do expect the time to process requests to be multiple minutes in some cases, then you absolutely need a queue and some API for polling a request object to see if it's done yet. If you think that a request time over 30 seconds (including waiting for previous requests in flight) is a sign that something is broken, IMO user experience is improved if you spend engineering effort on making those things more resilient than building more distributed components that could themselves fail or at least make it harder to figure out where things broke.
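A rough sketch of that submit-then-poll shape in Go, with hypothetical endpoint names and a single worker (none of this is from the article): submit returns a job ID immediately, and a disconnected client can resume waiting by polling.

```go
// POST /submit enqueues a job and returns its ID; GET /poll?id=N returns
// 202 until the single worker has finished it.
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

type job struct {
	done   bool
	result string
}

var (
	mu    sync.Mutex
	jobs  = map[int]*job{}
	queue = make(chan int, 1024) // bounded, so overload is visible early
	next  int
)

func worker() {
	for id := range queue { // one worker: requests are fully serialized
		time.Sleep(2 * time.Second) // stand-in for the slow backend call
		mu.Lock()
		jobs[id].done, jobs[id].result = true, "answer"
		mu.Unlock()
	}
}

func main() {
	go worker()
	http.HandleFunc("/submit", func(w http.ResponseWriter, r *http.Request) {
		mu.Lock()
		next++
		id := next
		jobs[id] = &job{}
		mu.Unlock()
		select {
		case queue <- id:
			fmt.Fprintf(w, "%d\n", id) // client polls /poll?id=N
		default: // queue full: shed load instead of melting down
			mu.Lock()
			delete(jobs, id)
			mu.Unlock()
			http.Error(w, "queue full", http.StatusServiceUnavailable)
		}
	})
	http.HandleFunc("/poll", func(w http.ResponseWriter, r *http.Request) {
		var id int
		fmt.Sscan(r.URL.Query().Get("id"), &id)
		mu.Lock()
		j, ok := jobs[id]
		var done bool
		var res string
		if ok {
			done, res = j.done, j.result
		}
		mu.Unlock()
		switch {
		case !ok:
			http.Error(w, "no such job", http.StatusNotFound)
		case !done:
			http.Error(w, "not ready", http.StatusAccepted)
		default:
			fmt.Fprintln(w, res)
		}
	})
	http.ListenAndServe(":8080", nil)
}
```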
Migrating to HTTP/2 delivers business value.
Responsive servers are more satisfying to customers than slow servers. That has business value.
Improving tangible metrics is not done to satisfy your emotions, it's done because of engineering rigor.
But to suddenly hit 50 for 1 second, then nothing for 9 seconds, well, that’s a tough spot to be in.
There must be some hard-to-find sequencing happening there that they were not really exposed to before.
This is an extremely common issue with Apache configurations, which often default to accepting hundreds of simultaneous requests (MaxRequestWorkers, formerly MaxClients) without regard for memory capacity. If peak demand causes enough workers to spin up that Apache starts swapping, the entire server effectively goes offline. To pick illustrative numbers: 256 prefork workers at ~50 MB each is ~12.8 GB of memory demand, far more than many boxes have.
Depending on the specific characteristics of the application, this could occur when load increases from 50 concurrent requests to 51, or from 200 to 201, or from any count A to A+1 where A was fine but A+1 causes the server to become unresponsive.
Saying that their A is 1 seems unnecessarily dour, given how common this problem has been over the past couple decades due to Apache's defaults alone.
"API server not fast enough, how do we fix it?"
"More threads! More connections!"
The problem, of course, is that HTTP/2 behaves like having infinite connections, so "more threads" on the server is almost always detrimental to performance.
"Less is more" is the mantra I have unsuccessfully tried to drill in. If your API (assuming a basic REST-like service) is running at 100% CPU utilization, you've likely over-provisioned its concurrency, not starved it of hardware.
For example, 200 threads with 200 connections to a single service is insane and likely causing you to be slow already. Increasing that will negatively impact performance.
Going from 16 -> 32? That's more reasonable.
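In Go, for example, that kind of cap can be enforced right in the client; a generic sketch with made-up numbers and a hypothetical URL, not anyone's production config:

```go
// Cap concurrent connections to a single backend at the client side.
// With MaxConnsPerHost set, extra requests wait for a free connection
// rather than piling more load onto an already-saturated service.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			MaxConnsPerHost:     32, // hard cap, in the spirit of 16 -> 32
			MaxIdleConnsPerHost: 32, // keep them warm instead of re-dialing
		},
	}
	resp, err := client.Get("https://backend.internal/api/health") // hypothetical
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```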
The actual post seems perfectly reasonable though (essentially “you might think you can just turn on HTTP/2 as a drop-in on your load balancer, but if your server code hasn't been written to rapidly handle the quick bursts of requests that let HTTP/2 provide faster overall loads to the client, then this can cause issues; you should test first and make sure your server systems are able to handle HTTP/2 request patterns.”)
I appreciate when people share war stories; I like to think that wisdom is knowledge survived.
Presumably that's not the case for you?
A typical non-tuned Rails deployment, for instance, is gonna have queueing built in, with really not as much concurrency as one would want (not even enough to actually fully utilize the host; the opposite problem). So I'm guessing you aren't on Rails. :)
Curious what you are on, if you're willing to share, and how it affects concurrency defaults, options, and affordances.
(I know full well that properly configuring/tuning this kind of concurrency is not trivial or one-size-fits-all. And I am not at all surprised that http/2 changed the parameters disastrously, and appreciate your warning to pay attention to it. I think those who are thinking "it shouldn't matter" are under-experienced or misinformed.)
Sure. We use the Scala Play framework (https://www.playframework.com/). And it does have some queuing built in, but we have tweaked it to meet certain application needs.
Even then you would be handling more requests in parallel than the number of cores you have, but your concurrency would be limited by the cost of context switching and your memory capacity (having to allocate a sizable stack for each thread in most threading implementations).
Queueing is usually required for a stable multi-threaded server, but if you were doing async I/O you wouldn't need it. The extra memory overhead for each extra concurrent request (by means of lightweight coroutine stacks, callbacks or state machines) is not much different from the size it would take on the queue, and there is no preemptive context switching.
In most cases, you'll get the same behavior as having a queue here. Cooperative task-switching happens only on async I/O boundaries, so if you're processing a request that requires some CPU-heavy work, your application would just hog a core until it completes the request and then move to the next one.
It is not so easy. The article said requests timed out (on the client).
When the queue is on the client, the client starts its timeout timer only when the request is actually sent, so time spent queued doesn't count; move the queue to the server and the timer runs while the request waits.
The speed observed by clients has to come from somewhere. In their case, they did not have a large reserve of performance to tap.
The problem is not making the pizzas in time but trying to get all the pizzas started at once when there is not enough table space to even roll out that much dough, and then trying to squeeze all the pizzas into the oven at once, whereby several of them got messed up.
The logic here is much the same: if the backend has no ability to queue and prioritise requests, then the same function needs to be performed elsewhere to safeguard quality of service.
This is only true when you look at a single client. If you look at a larger number of clients accessing the service at the same time, you would expect similar numbers of concurrent requests on HTTP/2 as on HTTP/1.1. Clients send larger numbers of requests at the same time, but they are done sending requests earlier so there are requests from fewer clients being processed concurrently. It should average out.
If you have, say, a 1000 clients accessing your service in one minute, I doubt the number of requests/second would be very different between both protocol versions. It would only be an issue if the service was built with a small number of concurrent users in mind.
Under HTTP/1.1, requests may have been hitting the LB and then being scattered across a dozen machines. Each of those machines was in a position to respond on its own time scale. Some requests would come back quickly, others slowly, but all were actively being handled.
Under HTTP/2 with multiplexing, if the LB isn't set up to handle it (and they often aren't), requests can hit the LB and _all_ end up on a single machine, which then tries to process them even though some may require significant processor resources, dragging down the response rate for all of them simultaneously.
According to the author “we do load balance by request, not connections and we do not use sticky sessions.” (Source: https://news.ycombinator.com/item?id=19722637)
But it didn't, unless you're saying that Lucidchart made an incorrect analysis. Is that your argument?
>Clients send larger numbers of requests at the same time, but they are done sending requests earlier so there are requests from fewer clients being processed concurrently. It should average out.
Again, it didn't average out. And you assume it 'will average out' at your peril. Maybe it will, maybe it won't. Lucidchart engineers thought that too and it turns out that was wrong in a way that wasn't foreseen.
>It would only be an issue if the service was built with a small number of concurrent users in mind.
I doubt Lucidchart 'was built with a small number of concurrent users in mind'.
This comment suggests otherwise: https://news.ycombinator.com/item?id=19722637
> all existing applications can be delivered without modification....
> The only observable differences will be improved performance and availability of new capabilities...
Lucidchart may have an inadequate backend, but it wasn't a problem until they moved to HTTP/2, so those statements weren't true for them. For anyone else rolling out HTTP/2, that is worth bearing in mind.
The change in traffic patterns http/2 imposes was.
Hence the blog post.
Most would not think about the fact that your spikiness could increase 20x.
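A toy simulation of that spikiness, with entirely made-up numbers (mine, not the article's): every client issues the same 30 requests, either staggered through a 6-connection limit or all at once. The average load is identical; only the peak moves.

```go
// Compare peak per-second request arrivals for staggered (HTTP/1.1-style,
// 6 connections, so 30 requests land in 5 one-second waves) versus
// all-at-once (HTTP/2-style) clients. Total request volume is identical.
package main

import (
	"fmt"
	"math/rand"
)

const (
	clients  = 1000 // clients arriving over a 60-second window
	reqsEach = 30   // requests per page load
	maxConns = 6    // browser's per-host connection limit on HTTP/1.1
	window   = 60   // seconds
)

func peak(load []int) int {
	m := 0
	for _, v := range load {
		if v > m {
			m = v
		}
	}
	return m
}

func main() {
	h1 := make([]int, window+reqsEach)
	h2 := make([]int, window+reqsEach)
	for c := 0; c < clients; c++ {
		t := rand.Intn(window) // when this client's page load starts
		for i := 0; i < reqsEach; i++ {
			h1[t+i/maxConns]++ // staggered into waves of 6
			h2[t]++            // all 30 in the same second
		}
	}
	fmt.Println("peak requests/sec, HTTP/1.1-style:", peak(h1))
	fmt.Println("peak requests/sec, HTTP/2-style:  ", peak(h2))
}
```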
> And secondly, because with HTTP/2, the requests were all sent together—instead of staggered like they were with HTTP/1.1—so their start times were closer together, which meant they were all likely to time out.
No, browsers can pipeline requests (send the requests back-to-back, without first waiting for a response) in HTTP/1.1. The server has to send the responses in order, but it doesn't have to process them in that order if it is willing to buffer the later responses in the case of head-of-line blocking.
Honestly, over the long run, this is a feature, not a bug. The server and client can make better use of resources by not having a trivial CSS or JS request waiting on a request that's blocked on a slow DB call. Yes, you shouldn't overload your own server, but that's a matter of not trying to process a large flood all simultaneously. (Or, IDK, maybe do, and just let the OS scheduler deal with it.)
Also, if you don't want a ton of requests… don't have a gajillion CSS/JS/webfont for privacy devouring ad networks? It takes 99 requests and 3.1 MB (before decompression) to load lucidchart.com.
> If you do queue requests, you should be careful not to process requests after the client has timed out waiting for a response
Browsers can pipeline requests on HTTP/1.1, but I don't think any of them actually do today; at least that's what MDN says. And from my recollection, very few browsers did pipelining prior to HTTP/2 either -- the chances of running into something broken were much too high.
1) The browser was sending a "streaming data follows" header flag followed by a 0-byte DATA packet in the HTTP/2 stream to work around an ancient SPDY/3 bug.
2) The load balancer was responding to the HTTP/2 "streaming data follows" header packet by activating pipelining to the HTTP/1.1 backend.
3) The backend was terminating the HTTP/1.1 connection from the load balancer with a pipelining-unsupported error.
The browser removed the workaround, the load balancer vendor removed the HTTP/2 frontend's ability to activate HTTP/1.1 pipelining, and after a few months we were able to proceed.
Diagnosing this took weeks of Wireshark, source-code browsing, and experimental testing. We were lucky that it broke so blatantly that the connection to enabling HTTP/2 was obvious.
On the other hand, a quick search found evidence of some very special HTTP servers doing bizarre things with HTTP: https://github.com/elastic/elasticsearch/issues/2665
Go added cancellation support to the standard library in 1.7. I don't like its coupling with contexts, but the implementation is solid and supported throughout most blocking operations, so this statement is patently untrue for Go.
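To illustrate (my own minimal example, not from the Go docs): the net/http server cancels a request's context when the client goes away, so a handler can abandon work instead of finishing it for nobody.

```go
// The handler races its (stand-in) work against ctx.Done(); if the client
// disconnects or times out first, the work is abandoned.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func handler(w http.ResponseWriter, r *http.Request) {
	ctx := r.Context() // canceled when the client disconnects
	select {
	case <-time.After(5 * time.Second): // stand-in for slow backend work
		fmt.Fprintln(w, "done")
	case <-ctx.Done():
		fmt.Println("client gave up:", ctx.Err()) // stop burning resources
	}
}

func main() {
	http.HandleFunc("/slow", handler)
	http.ListenAndServe(":8080", nil)
}
```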
It's also curious to me that the load balancer doesn't smooth this out. If you have ten application servers and a client makes ten requests over a single HTTP/2 connection, I'd expect each server to respond to one request each. The details are a little fuzzy, but it sounds like the load balancer is only distributing connections, not requests. That seems wrong.
High CPU load should be fine, really, if your application servers are processing requests. If the load is unbalanced, then by definition you need a load balancer to balance the load. If you have one and the load is unbalanced, something is misconfigured.
> How many requests does your page make on initial load (that can't be handled by a CDN)? If you're making more than six XHRs to your application servers concurrently, this sounds like a problem that would have existed anyway had it not been for the browser's (rather arbitrary) connection limit.
I don't know the exact number, but definitely higher than six. And I certainly agree that is a failing in our application. The point is that the browsers _did_ arbitrarily limit connections, and our application (unknowingly) depended on that.
As with anything involving concurrency and hard-to-predict-exactly usage patterns, it can easily get complicated.
Who else remembers the [Heroku routing debacle of 2013](https://blog.heroku.com/routing_performance_update)?
This stuff ain't easy. Anyone who thinks this would only happen to an unusually "wrong" app, I think, hasn't looked into it seriously. This post was good information, I think it's unfortunate that so much of the discussion seems to be trying to shame the author (making it less likely people in the future will generously share such post-incident information!).
It can also be affected a lot by what platform you are on, the language/framework and server architecture(s). They each have different request-handling concurrency patterns, built-in features, and affordances and standard practices. Node is gonna be really different from Rails. I am curious what platforms were involved here.
If the OP had been framed as a cautionary tale about how the devs did not realize their traffic patterns were throttling their requests, the reactions probably would have been more positive.
This is a useful notice and post-mortem, because I agree some discussion around HTTP/2 seems to carry the assumption that it will be basically a transparent freebie.
Some people just like to feel superior. shrug. I was hoping for more interesting discussion about HTTP request concurrency and queueing from those who had been in the trenches, which is what you get from HN technical posts at their best. Instead of a reddit-style battle over who was wrong and who is too smart to make that mistake, which is what you get from HN technical posts at their worst. :)
I'm not exactly blaming HTTP/2, just saying the claim that switching to HTTP/2 is easy, safe, and only brings benefits is false.
Eh. Technically anything could cause problems. I don't think you'll find much in the way of claims that swapping out subsystems could only bring benefits.
Ah, the good old "unrealised infrastructure dependency" - nice to see you my old friend. People that have never been bitten by one of these never built anything worth talking about :)
It's worth observing that Gatling (load testing tool) supports HTTP/2.
Once you've got the hang of it, you can fairly easily build load profiles to simulate situations like these. Probably wouldn't have helped you prevent the situation - unknown-unknowns being what they are; but you might find it helpful during remediation.
They may be using sticky sessions or affinity in some regard, having the load balancer intentionally hold each client connection to a single server. It's not necessarily wrong; it entirely depends on what you need to accomplish.
This might not be such a problem with one client artificially limited to a single application server. But in practice, it means that individual servers will be overloaded when they are chosen to handle multiple clients concurrently (while other servers are idle).
A lot of people set up their load balancers with session pinning (i.e. always choose the same backend based on the session id). This can improve things like cache performance.
Not sure if this is the case here, but it sounds like it.
Did the ALB open a new TCP connection for each request, or does it use a pool of connections?
```
$ nghttp -v https://www.google.com | grep -C5 SETTINGS_MAX_CONCURRENT_STREAMS
[ 0.076] send SETTINGS frame <length=12, flags=0x00, stream_id=0>
[ 0.091] recv SETTINGS frame <length=18, flags=0x00, stream_id=0>
```
If so, I don't quite see why queueing is discussed as an option at all. Queueing means extra latency and worse user experience (not to mention DoS potential).
What you should be discussing instead is how to (auto-)scale your app and infrastructure to handle your users' requests.
That said, if this is a traditional web app, this smells to me like a poorly designed application. It sounds like they're doing on-the-fly compilation of static assets or something crazy like that, and in any event they need to reduce the total number of requests per page or resource and look for opportunities to make things static or cached?
> Sometimes "thundering herd" is a feature, not a bug.
To expand: HTTP/1.1 naturally caused the latency of the Internet to pace requests. A sort of implicit intrinsic rate limiting. HTTP/2 intentionally avoids that "problem" by batching/pipelining requests.
Thanks for this, because honestly I hadn't thought about the implications myself and it'd be good not to accidentally walk into this problem.
People might recognise the birthday problem here; for d=365, k=2 (days a year, single share) the well known answer is 23.
Wikipedia gives a rough approximation of n for p=0.5 and k < 20.
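Reconstructing that approximation from memory (so treat this as a hedged sketch, not a citation): the expected number of colliding k-subsets among n samples over d values is roughly n^k / (k! d^{k-1}), and setting the collision probability to 1/2 gives

```latex
n(k, d) \approx \left( k!\, d^{\,k-1} \ln 2 \right)^{1/k},
\qquad
n(2, 365) \approx \sqrt{2 \cdot 365 \cdot \ln 2} \approx 22.49 \;\Rightarrow\; 23 .
```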
Our application is Lucidchart (www.lucidchart.com). It is a very sophisticated web app with significant amounts of dynamic data, running on hundreds of servers. I would imagine applications with less dynamic data and requests that require a substantial amount of compute wouldn't run into this problem.
On a decent connection, kinda. Anything mobile, or worse, mobile and moving, will suffer terribly.
Seconded. Turning it on when it is not yet mature is.
I suspect you're just saying things to cover for the fact that you don't have anything meaningful to say.