
Why Turning on HTTP/2 Was a Mistake - sciurus
https://www.lucidchart.com/techblog/2019/04/10/why-turning-on-http2-was-a-mistake/
======
thayne
Author here.

Let me clarify a couple things I've seen in a lot of the comments.

First of all, we do load balance by request, not connections and we do not use
sticky sessions.

Secondly, we are aware that our application has underlying problems with large
numbers of concurrent requests, which was exposed by using HTTP/2 and we are
working on fixing them. The point of the post is not "HTTP/2 sucks because it
broke our app". It is "moving to HTTP/2 isn't necessarily as easy as just
flipping a switch".

~~~
otterley
Perhaps the title, then, should have been "Why Turning on HTTP/2 Was a Mistake
(For Us)" so as not to imply that doing so is a universally bad idea.

~~~
geofft
I think it was clear enough - it didn't say "Why Turning On HTTP/2 Is A
Mistake" or "Why HTTP/2 Was A Mistake," either which would have implied
universal badness. The phrasing had me expecting a specific war story, which
is what it ended up being.

Besides, I think it's quite reasonable to argue that it's a bad idea for the
_majority_, even if it's not universally bad. I think many people are
probably in the same boat.

~~~
otterley
Without data, one cannot say either way. What we have before us is a single
anecdotal tale. And even if you believe that tale rises to the level of
"data," it doesn’t provide clear guidance to others, because the details of
their stack and capacity are unspecified.

~~~
ricardobeat
Your argument makes sense as a generic statement, but we don’t need a
statistically significant sample here. The behaviour described is a logical
consequence of how the network and application layers interact and will be
seen by anyone with a similar setup, not a natural event that demands more
data.

~~~
otterley
I’m not denying the probability that others similarly situated could
experience similar problems. But without the data, there’s no way to know
whether the author’s situation would be experienced by a majority of website
operators, or even a significant minority.

~~~
geofft
Sure there is—it is my experience as a website operator and as someone
familiar with the field of website operation that architectures like theirs
are more common than architectures unlike theirs. Professional experience is a
(highly compensated) form of gathering data.

~~~
otterley
And I’ve got 20 years of such experience that concludes their architecture
is less important than the fact that they simply ran out of peak capacity;
and that we do not know how many sites operate that near capacity, so we
cannot conclusively determine whether a majority of them are at risk.

Who’s right?

------
dstaley
There are a lot of reasons I expected to see as their justification, but "our
application can't handle concurrent requests" wasn't exactly one of them.

~~~
thayne
Author here. Our application can handle concurrent requests just fine. The
problem was partly that our application was trying to handle too many
requests in parallel instead of queueing them, and partly that later
requests were timing out because our load balancers were configured to
expect clients to make a request, wait for the response, then send the next
request -- not send all the requests up front and then wait for all the
responses (which means the last response takes longer to complete, measured
from when the request was first sent).
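
To illustrate the queueing half, here's a minimal sketch in Go -- not our
actual code, and the limit of 64 is a made-up number -- of capping in-flight
requests with a semaphore, and dropping queued work whose client has already
gone away:

    // Hypothetical middleware: at most `limit` requests run at once; the
    // rest queue, and queued requests whose clients gave up are dropped.
    package main

    import "net/http"

    func limitConcurrency(next http.Handler, limit int) http.Handler {
        sem := make(chan struct{}, limit) // buffered channel as a semaphore
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            select {
            case sem <- struct{}{}: // got a slot; handle the request
                defer func() { <-sem }()
            case <-r.Context().Done(): // client timed out while we queued
                return
            }
            next.ServeHTTP(w, r)
        })
    }

    func main() {
        mux := http.NewServeMux()
        mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        })
        http.ListenAndServe(":8080", limitConcurrency(mux, 64))
    }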

~~~
mholt
So, your application can handle concurrent requests just fine, as long as
clients make one request at a time?

~~~
floatingatoll
This misconstrues the comment it replies to. The post author's comment says
that their application was tuned for a certain level of concurrency, and that
when the level of concurrency to the load balancers increased due to the
HTTP/2 change, their load balancers increased the level of concurrency to the
backend, causing issues.

This is an extremely common issue with Apache configurations, which often
default to accepting hundreds of simultaneous requests without regard for
memory capacity. If peak demand causes enough workers to spin up that Apache
starts swapping, the entire server effectively goes offline.

Depending on the specific characteristics of the application, this could occur
when load increases from 50 concurrent requests to 51 concurrent requests, or
from 200 to 201, or from any integer A to B where A was fine but B causes the
server to become unresponsive.

Saying that their A is 1 seems unnecessarily dour, given how common this
problem has been over the past couple decades due to Apache's defaults alone.
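
As a sketch of the general guard (in Go rather than Apache config; the cap
of 100 is an arbitrary example), you can simply refuse to accept more
concurrent connections than your memory can absorb. For plain HTTP/1.1,
where each connection carries one in-flight request, capping connections
caps request concurrency too:

    // Assumed example: excess connections wait in the kernel's accept
    // backlog instead of spinning up workers until the server swaps.
    package main

    import (
        "net"
        "net/http"

        "golang.org/x/net/netutil"
    )

    func main() {
        ln, err := net.Listen("tcp", ":8080")
        if err != nil {
            panic(err)
        }
        ln = netutil.LimitListener(ln, 100) // never more than 100 at once

        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        })
        http.Serve(ln, nil)
    }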

~~~
cogman10
I've seen this mistake first hand.

"API server not fast enough, how do we fix it?"

"More threads! More connections!"

The problem, of course, is that HTTP/2 behaves like having infinite
connections, so "more threads" on the server is almost always detrimental to
performance.

"Less is more" is the mantra I have unsuccessfully tried to drill in. If
your API (assuming a basic REST-like service) is running at 100% CPU
utilization, you've likely over-provisioned it.

~~~
ec109685
If the API server is blocked on IO, more threads may well be a fine solution.

~~~
cogman10
Possibly, just depends on what you are doing and where you are going from/to.

For example, 200 threads with 200 connections to a single service is insane
and likely causing you to be slow already. Increasing that will negatively
impact performance.

Going from 16 -> 32? That's more reasonable.
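
For a rough starting point, the back-of-the-envelope pool-sizing rule
popularized by _Java Concurrency in Practice_ is:

    threads ≈ cores × (1 + wait_time / compute_time)

So 8 cores with handlers that spend three times as long waiting on IO as
computing suggests about 8 × (1 + 3) = 32 threads; 200 threads only makes
sense when wait time utterly dwarfs compute time.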

------
deathanatos
Really, this is an issue in the library/server: the library/server needs to
expose HTTP/2's controls on maximum permitted streams.
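
Go's golang.org/x/net/http2 is one library that does expose this knob;
here's a minimal sketch (the limit of 32 is an arbitrary number, and the
cert/key paths are placeholders):

    // MaxConcurrentStreams is advertised to the client in the server's
    // SETTINGS frame and caps the streams per HTTP/2 connection.
    package main

    import (
        "log"
        "net/http"

        "golang.org/x/net/http2"
    )

    func main() {
        srv := &http.Server{Addr: ":8443"}
        if err := http2.ConfigureServer(srv, &http2.Server{
            MaxConcurrentStreams: 32,
        }); err != nil {
            log.Fatal(err)
        }
        log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
    }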

> _And secondly, because with HTTP/2, the requests were all sent
> together—instead of staggered like they were with HTTP/1.1—so their start
> times were closer together, which meant they were all likely to time out._

No, browsers can pipeline requests (send the requests back-to-back, without
first waiting for a response) in HTTP/1.1. The server has to send the
_responses_ in order, but it doesn't have to process them in that order if it
is willing to buffer the later responses in the case of head-of-line blocking.

Honestly, over the long run, this is a feature, not a bug. The server and
client can make better use of resources by not having a trivial CSS or JS
request waiting on a request that's blocked on a slow DB call. Yes, you
shouldn't overload your own server, but that's a matter of not trying to
process a large flood all simultaneously. (Or, IDK, maybe do, and just let the
OS scheduler deal with it.)

Also, if you don't want a ton of requests… don't have a gajillion
CSS/JS/webfont requests for privacy-devouring ad networks? It takes 99
requests and 3.1 MB (before decompression) to load lucidchart.com.

> _If you do queue requests, you should be careful not to process requests
> after the client has timed out waiting for a response_

This is a real problem, but I've suffered through that plenty with synchronous
HTTP/1.1 servers; a thread blocks, but it's still got other requests buffered,
sometimes from that connection, sometimes from others. Good async frameworks
can handle these better, but they typically require some form of cancellation,
and my understanding is that that's notably absent from JavaScript & Go's
async primitives.

~~~
toast0
> No, browsers can pipeline requests (send the requests back-to-back, without
> first waiting for a response) in HTTP/1.1. The server has to send the
> responses in order, but it doesn't have to process them in that order if it
> is willing to buffer the later responses in the case of head-of-line
> blocking.

Browsers _can_ pipeline requests on http/1.1, but I don't think any of them
actually do in today's world, at least that's what MDN says. [1] And from my
recollection, very few browsers did pipelining prior to http/2 either -- the
chances of running into something broken were much too high.

[1] [https://developer.mozilla.org/en-US/docs/Web/HTTP/Connection...](https://developer.mozilla.org/en-US/docs/Web/HTTP/Connection_management_in_HTTP_1.x#HTTP_pipelining)

~~~
floatingatoll
When we first tried to enable HTTP/2 on our load balancers a few years ago, we
ended up breaking several applications built on (iirc) gunicorn. We eventually
determined the root cause to be:

1) The browser was sending a "streaming data follows" header flag followed by
a 0-byte DATA packet in the HTTP/2 stream to work around an ancient SPDY/3
bug.

2) The load balancer was responding to the HTTP/2 "streaming data follows"
header packet by activating pipelining to the HTTP/1.1 backend.

3) The backend was terminating the HTTP/1.1 connection from the load balancer
with a pipelining-unsupported error.

The browser removed the workaround, the load balancer vendor removed the
HTTP/2 frontend's ability to activate HTTP/1.1 pipelining, and after a few
months we were able to proceed.

Diagnosing this took weeks of Wireshark, source code browsing, and
experimental testing. We were lucky that it broke visibly enough that the
connection to enabling HTTP/2 was obvious.

~~~
toast0
If you can recollect more details, I would love to know what happened, but
I'm not sure about 3): I'm not aware of a pipelining-unsupported error in
HTTP (it is a thing in SMTP). It would take a very special HTTP server to
look for another request in the socket buffer after the current one and
respond with failure.

On the other hand, a quick search found evidence of some very special HTTP
servers doing bizarre things with HTTP:
[https://github.com/elastic/elasticsearch/issues/2665](https://github.com/elastic/elasticsearch/issues/2665)

~~~
floatingatoll
I looked it up and I remembered incorrectly: the bug was caused by the load
balancer activating chunked transfer encoding to the backend nodes after
receiving the described HTTP/2 request. It did not involve pipelining.

~~~
toast0
Thank you, that makes more sense. Chunked transfer encoding is also a hidden
danger!

------
bastawhiz
How many requests does your page make on initial load (that can't be handled
by a CDN)? If you're making more than six XHRs to your application servers
concurrently, this sounds like a problem that would have existed anyway had it
not been for the browser's (rather arbitrary) connection limit.

It's also curious to me that the load balancer doesn't smooth this out. If you
have ten application servers and a client makes ten requests over a single
HTTP/2 connection, I'd expect each server to respond to one request each. The
details are a little fuzzy, but it sounds like the load balancer is only
distributing connections, not requests. That seems wrong.

High CPU load should be fine, really, if your application servers are
processing requests. If the load is unbalanced, then by definition you need a
load balancer to balance the load. If you have one and the load is unbalanced,
something is misconfigured.

~~~
thayne
Author here.

> How many requests does your page make on initial load (that can't be handled
> by a CDN)? If you're making more than six XHRs to your application servers
> concurrently, this sounds like a problem that would have existed anyway had
> it not been for the browser's (rather arbitrary) connection limit.

I don't know the exact number, but definitely higher than six. And I certainly
agree that is a failing in our application. The point is that the browser
_did_ arbitrarily limit connections, and our application (unknowingly)
depended on that.

~~~
twblalock
I've seen the same kind of thing happen when people switched from one load
balancer to another: people were unaware they were dependent on a particular
kind of queueing or rate limiting to protect their backend services, and they
got hit hard when their new load balancer did not protect them in the same
way.

~~~
jrochkind1
Concurrency/rate limiting/queueing for HTTP apps is, I agree, certainly not
a trivial thing. You want to be maximizing utilization of your available
host resources, while minimizing latency even under unexpected loads (for
both median and upper percentiles). Mostly you're dealing with CPU resource
limits, but other issues can be IO contention or contention for limited
shared resources like an RDBMS, all while not maxing out your RAM.

As with anything involving concurrency and hard-to-predict-exactly usage
patterns, it can easily get complicated.

Who else remembers the [Heroku routing debacle of
2013](https://blog.heroku.com/routing_performance_update)?

This stuff ain't easy. Anyone who thinks this would only happen to an
unusually "wrong" app, I think, hasn't looked into it seriously. This post was
good information, I think it's unfortunate that so much of the discussion
seems to be trying to shame the author (making it less likely people in the
future will generously share such post-incident information!).

It can also be affected a lot by what platform you are on, the
language/framework and server architecture(s). They each have different
request-handling concurrency patterns, built-in features, and affordances and
standard practices. Node is gonna be really different from Rails. I am curious
what platforms were involved here.

~~~
twblalock
I think the reactions are somewhat justified because HTTP/2 was presented as
the problem -- or at least it was easy to read the OP that way.

If the OP had been framed as a cautionary tale about how the devs did not
realize their traffic patterns were throttling their requests, the reactions
probably would have been more positive.

~~~
jrochkind1
Turning on HTTP/2 led to a problem for them. That's what they said, that's
true, and it's a good warning for others; I don't think it will be at all
rare for others to have similar experiences if they have high volume. You
can't necessarily just turn on HTTP/2 without paying attention to how it
will affect your performance characteristics (which you _may_ never have
paid much attention to before). The nature of the potential problems that
can arise with concurrency/routing/queueing can make them not that obvious
to diagnose/debug.
Your stack may have been tuned (by you, or by the open source
authors/community that established the defaults and best practices for
whatever you are using) for pre-HTTP/2 usage patterns.

This is a useful notice and post-mortem. I agree some discussion around
HTTP/2 seems to carry the assumption that it will be basically a transparent
freebie.

Some people just like to feel superior. shrug. I was hoping for more
interesting discussion about HTTP request concurrency and queueing from those
who had been in the trenches, which is what you get from HN technical posts at
their best. Instead of a reddit-style battle over who was wrong and who is too
smart to make that mistake, which is what you get from HN technical posts at
their worst. :)

------
iforgotpassword
This is a neat little writeup. Although the issues were not fundamental and
relatively easy to spot and fix, it's valuable input especially since http/2
advocates seem to insist that you just need to put your webapp behind an
HTTP/2-capable proxy and you won't even notice a difference. We haven't
enabled it on our servers yet, and now there's definitely something to test
before we roll it out.

------
jrockway
This is why Envoy exists. It will take HTTP/2 requests from the user and shard
the actual requests out for backends to handle. It appears that what happened
to the author is that their web server only balanced TCP connections, which
indeed no longer works.

~~~
thayne
We were using AWS ALBs, which load balance requests, not connections
(although it should be noted they do not handle HTTP/2 prioritization).

~~~
jrockway
I see. So it sounds like the issue was one of timing, where a bunch of
converted HTTP/1.1 requests all arrived at your application at the exact same
instant.

Did the ALB open a new TCP connection for each request, or does it use a pool
of connections?

~~~
thayne
I think it opens a new connection for each request.

------
tantalic
The underlying lesson that I have learned (the hard way) repeatedly:
anything that may change the traffic pattern can result in
difficult-to-predict infrastructure issues. This can be something as
directly related as a protocol change (as shown here) or something seemingly
unrelated, like a UI change.

------
stuff4ben
We experienced issues when we enabled H2 on our HAProxy 1.8 reverse proxies
into our K8s cluster. Didn't anticipate the increased memory consumption and
we ran into a few memory-related defects with older versions of HAProxy that
were fixed in more recent versions. We'll re-enable it at some point; we've
already upgraded our reverse proxies in anticipation.

------
gregoriol
What I understand is that their setup/app was broken, but that was masked by
HTTP/1, which is not that efficient?

------
thecompilr
There is a setting in HTTP/2 called SETTINGS_MAX_CONCURRENT_STREAMS; if set
to 1, it works like HTTP/1.1, with no multiplexing. Setting it to 4~8 would
make it behave similarly to how a browser actually uses HTTP/1 (creating
multiple connections in parallel).

~~~
j16sdiz
This is a client-side setting and defaults to 1000 in Chrome. The HTTP/1.1
equivalent was 8 or something like that.

~~~
floatingatoll
While it is correct to say that it is a client setting, it is _also_ a server
setting. The HTTP/2 specification uses the word "peer" since
SETTINGS_MAX_CONCURRENT_STREAMS (0x3) is negotiated in both directions as part
of the client/server handshakes.

    
    
      $ nghttp -v https://www.google.com | grep -C5 SETTINGS_MAX_CONCURRENT_STREAMS
      [  0.076] send SETTINGS frame <length=12, flags=0x00, stream_id=0>
                [SETTINGS_MAX_CONCURRENT_STREAMS(0x03):100]
      [  0.091] recv SETTINGS frame <length=18, flags=0x00, stream_id=0>
                [SETTINGS_MAX_CONCURRENT_STREAMS(0x03):100]

------
tie_
Surely your application/serving process is able to handle the request burst
coming from any single user? (If not, you have a bigger problem to solve
first.)

If so, I don't quite see why queueing is discussed as an option at all.
Queueing means extra latency and worse user experience (not to mention DoS
potential).

What you should be discussing instead is how to (auto-)scale your app and
infrastructure to handle your users' requests.

------
iamleppert
It’s hard to tell because there isn’t anything concrete provided.

That said, if this is a traditional web app, this smells to me of a poorly
designed application. It sounds like they're doing on-the-fly compilation of
static assets or something crazy like that; in any event, they need to
reduce the total number of requests per page or resource and look for
opportunities to make things static or cached.

------
schmichael
I think a glib attempt at a tl;dr would be:

> Sometimes "thundering herd" is a feature, not a bug.

To expand: HTTP/1.1 naturally caused the latency of the Internet to pace
requests. A sort of implicit intrinsic rate limiting. HTTP/2 intentionally
avoids that "problem" by batching/pipelining requests.

~~~
j16sdiz
There is a client-imposed limit. In Chrome, it is 8 for HTTP/1.1 and 1000
for HTTP/2.

------
StopHammoTime
Great article on approaching massive technical change. Honestly, I think a
lot of people generally think that most things are just a "switch flip".
Even something like implementing SSL on internal apps can cause a big
change, let alone changing the underlying protocol for managing your
requests.

Thanks for this, because honestly I hadn't thought about the implications
myself and it'd be good not to accidentally walk into this problem.

------
emmelaich
It's actually somewhat difficult to predict spikes. Say you have n clients
and d timeslices. When does the probability exceed 0.5 that k or more
requests land in the same timeslice? Unfortunately the solution is
exponential in k.

People might recognise the birthday problem here; for d=365, k=2 (days in a
year, a single shared birthday) the well-known answer is n=23.

Wikipedia gives a formula for a rough approximation for n for p=0.5 and k <
20.
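
For the k=2 case, the standard approximation is:

    P(some two of n requests share a timeslice) ≈ 1 - exp(-n(n-1) / (2d))

    n(p=0.5) ≈ sqrt(2d · ln 2) ≈ 1.18 · sqrt(d)    (≈ 23 for d = 365)

One consequence: doubling the number of timeslices d only buys you about a
factor of sqrt(2) more clients before a collision is even odds.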

------
StreamBright
Isn’t this happening because the load balancer does not distribute the HTTP
requests evenly? We used to use advanced load balancers that took the actual
HTTP requests from a client, kept a fixed number of TCP connections to the
backend, and distributed the HTTP requests across those. Maybe HTTP/2 does
not allow this style of load balancing?

~~~
jrockway
HTTP/2 does allow this style of load balancing. Whether or not your load
balancer does it, however, is another thing completely.

------
zxcvbn4038
The article is short on detail but it sounds to me like they are balancing
their traffic by connection instead of by request. Either nginx or haproxy
should be able to spread those multiplexed requests across a number of servers
and give more of the desired backend behavior.

~~~
lxe
Unless the application/request/session state is pinned to a host.

------
nhumrich
Do you terminate HTTP/2 at the load balancer and convert it to HTTP/1.1? Or
do you support HTTP/2 all the way to the end service? I would imagine the
former would solve these issues.

~~~
unscaled
If your service can't handle a bunch of requests coming in at once, it
doesn't matter whether it gets them as individual HTTP/1.1 requests or as
multiplexed HTTP/2 requests arriving all together. It only makes a
difference if the bottleneck is the HTTP/1.1 parsing logic.

------
xmichael999
Not sure what the author's application is, but we run a dozen servers behind
http/2 load balanced and dozens of sites and haven't seen anything similar to
what he is describing.

~~~
thayne
Author here.

Our application is Lucidchart (www.lucidchart.com). It is a very
sophisticated web app with significant amounts of dynamic data, running on
hundreds of servers. I would imagine applications with less dynamic data and
fewer requests that require a substantial amount of compute wouldn't run
into this problem.

------
KaiserPro
> decreases latency by multiplexing requests on the same TCP connection

On a decent connection, kinda. Anything mobile -- or worse, mobile and
moving -- will suffer terribly.

------
manigandham
Unless they're making hundreds of requests per visitor, I fail to see how
the load is shifted so drastically as to make such an impact.

~~~
layoutIfNeeded
Spoiler: they are making hundreds of requests per visitor. Welcome to “modern”
webdev!

~~~
macspoofing
What's wrong with that? Lucidchart is a fully-featured application that runs
in a web browser with a cloud-backend. I don't understand what you're trying
to argue. That you have to optimize for request count? Why?

~~~
DCoder
There's an argument to be made that this [0] (generated via [1]) is _not_
something to be celebrated. But that's just me, not necessarily what the
parent poster had in mind.

[0]: [https://i.imgur.com/LhEahvi.png](https://i.imgur.com/LhEahvi.png)

[1]:
[https://www.evidon.com/solutions/trackermap/](https://www.evidon.com/solutions/trackermap/)

~~~
macspoofing
This is what's known as 'moving the goalpost'. OP never mentioned that their
issues with lucidchart were due to them using ad trackers and analytics
libraries. They talked about how terrible it was that an application made a
ton of requests. If they have an issue with the former, I take their point.

------
Izmaki
I was hoping to experience this as a lucidchart visualisation of "sweaty
spikes" and "work spreaders" because of too many "Internet tubes". Oh well.
Another time :D

------
stevefan1999
First of all, turning on HTTP/2 was not a mistake.

Second, turning it on when it is not yet mature is.

~~~
floatingatoll
The maturity of HTTP/2 is not a causative factor here. They unknowingly
removed a limit on the number of concurrent backend requests, which
overflowed their backends.
simply removing that limit _without_ enabling HTTP/2, and then hitting a peak
demand period that was sufficient to cause the outage. Yes, HTTP/2 changes
traffic patterns, but the issue could easily have occurred with HTTP/1 as
well.

------
EugeneOZ
Pathetic attempt to get some users from HN.

------
melan13
This is where micro-services shine.

~~~
macspoofing
Why? This has nothing to do with microservices.

~~~
melan13
It does, think twice about the CPU flow.

~~~
macspoofing
Uh huh. What is this, an exercise for the reader?

I suspect you're just saying things to cover for the fact that you don't have
anything meaningful to say.

