
HPACK: the silent killer (feature) of HTTP/2 - jgrahamc
http://blog.cloudflare.com/hpack-the-silent-killer-feature-of-http-2/
======
byuu
This really seems like overkill to me. We've made HTTP/2 that much more
complex to implement (a custom compression scheme and dictionary not used
anywhere else) just to compress headers. And complexity always leads to fun
new vulnerabilities. I guarantee at some point we'll see at least a DoS come
out of this.

I would add that it also consumes more resources to compress these headers (I
know Huffman coding is fast, but it's not as fast as plaintext), though you can
probably cache a good portion of the encoding and then just encode the
per-connection values at the end. Although I'd be surprised if Cloudflare has
nginx doing that.

Right now, my domain's HTTP response headers total 192 bytes:

    
    
        Content-Type: text/html; charset=utf-8
        Content-Length: 3884
        Connection: keep-alive
        Content-Security-Policy: upgrade-insecure-requests
        X-Content-Type-Options: nosniff
        X-Frame-Options: deny
    

And more than half of it is just optional security enhancements.

So now I'm going to compress that header so that it's ... what, 100 bytes?
Even on my site, which is incredibly lean and lacks images, there's around
20-30 KiB of content per page. Modern web pages are now, on average, larger
than the original DOOM video game (2,301 KiB).

So we're doing all of this to chase roughly a 0.05% reduction in bandwidth per
page? We're worrying about the spot on the carpet while ignoring the elephant
in the room.

~~~
JoshTriplett
Many of your responses should be "304 Not Modified", in which case you don't
send anything back but the headers. Or, for API calls, you may have a tiny
response, or no response other than headers. In all of those cases, going from
192 bytes to (checking) 148 bytes the first time, or 6 bytes for a subsequent
request, might make the difference between fitting entirely in the initial
response packet and requiring an additional packet.

In addition, HPACK allows the client to compress its request headers, which
decreases your incoming traffic. And in that case, the majority of requests
will have very little data in them; ideally, only the path will change between
requests.
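
To make the dynamic-table effect concrete, here's a minimal sketch using Go's
golang.org/x/net/http2/hpack package (the header values are invented, so the
exact byte counts will differ from the 148/6 figures above):

    package main

    import (
        "bytes"
        "fmt"

        "golang.org/x/net/http2/hpack"
    )

    // encode writes one request's headers and returns the encoded size.
    func encode(enc *hpack.Encoder, buf *bytes.Buffer, path string) int {
        buf.Reset()
        for _, f := range []hpack.HeaderField{
            {Name: ":method", Value: "GET"},
            {Name: ":scheme", Value: "https"},
            {Name: ":authority", Value: "example.com"},
            {Name: ":path", Value: path},
            {Name: "user-agent", Value: "Mozilla/5.0 (X11; Linux x86_64) Firefox/50.0"},
            {Name: "accept-encoding", Value: "gzip, deflate, br"},
        } {
            enc.WriteField(f) // indexable fields also enter the dynamic table
        }
        return buf.Len()
    }

    func main() {
        var buf bytes.Buffer
        enc := hpack.NewEncoder(&buf) // one encoder per connection

        // First request: unseen fields go out as Huffman-coded literals.
        fmt.Println("request 1:", encode(enc, &buf, "/index.html"), "bytes")
        // Repeat request: every field except :path is now a 1-byte table index.
        fmt.Println("request 2:", encode(enc, &buf, "/style.css"), "bytes")
    }

Note that HPACK matches whole name/value fields against its static and dynamic
tables rather than diffing a compressed byte stream, so the position of the
path within the header list doesn't matter.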

~~~
byuu
> Many of your responses should be "304 Not Modified", in which case you don't
> send anything back but the headers.

A 304 should be able to fit into a single frame, unless you're doing things
_very_ wrong. You don't even need headers in that case.

> In addition, hpack also allows the client to compress their request headers,
> which decreases your incoming traffic.

Yeah, that was the DoS concern I had. Lots of web server software checks that
incoming header sizes don't exceed a certain length. With Huffman expansion
you can't quite make an INFLATE bomb, but if the implementation doesn't bail
out during decompression, you could consume a huge amount of memory by abusing
this.
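
For what it's worth, Go's hpack decoder exposes knobs for exactly this kind of
bail-out. A hedged sketch of bounding what a header block may expand to (the
budget numbers and the helper name are my own):

    package main

    import (
        "errors"
        "fmt"

        "golang.org/x/net/http2/hpack"
    )

    var errHeadersTooLarge = errors.New("decoded header list exceeds budget")

    // decodeWithBudget rejects header blocks whose *decoded* size exceeds
    // budget, no matter how small the Huffman-coded input was on the wire.
    func decodeWithBudget(block []byte, budget uint32) ([]hpack.HeaderField, error) {
        var (
            fields []hpack.HeaderField
            used   uint32
        )
        dec := hpack.NewDecoder(4096, nil) // 4096 = RFC 7541 default table size
        dec.SetMaxStringLength(8 << 10)    // cap any single name/value string
        dec.SetEmitFunc(func(f hpack.HeaderField) {
            used += f.Size() // RFC 7541 size: len(name) + len(value) + 32
            if used > budget {
                dec.SetEmitEnabled(false) // stop buffering decoded fields
                return
            }
            fields = append(fields, f)
        })
        if _, err := dec.Write(block); err != nil {
            return nil, err
        }
        if err := dec.Close(); err != nil {
            return nil, err
        }
        if !dec.EmitEnabled() {
            return nil, errHeadersTooLarge
        }
        return fields, nil
    }

    func main() {
        // 0x82 and 0x87 are static-table indexes (":method: GET", ":scheme: https").
        fields, err := decodeWithBudget([]byte{0x82, 0x87}, 16<<10)
        fmt.Println(fields, err)
    }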

> ideally, only the path will change between requests.

Can you put the path at the bottom of an HTTP/2 request? Otherwise it would be
tricky with lots of bit-shifting to reuse existing compressed headers but
insert new paths onto the top of each one.

------
j_s
Does anyone have any practical experience with client certificates and HTTP/2?

[https://news.ycombinator.com/item?id=13022596](https://news.ycombinator.com/item?id=13022596)

 _I had to abandon [client certificates] because of HTTP/2. If one site on the
web server uses TLS client auth, and then you go to another site on the same
server, you receive HTTP 421 Misdirected Request because of connection reuse.
And almost no browser can deal with them correctly (or could not a few months
ago) - I'm looking at you, Chrome, mobile Opera, etc..._ -samsk
2016-Nov-28

~~~
jlgaddis
Chrome doesn't deal with (client) certificates well regardless of the HTTP
version in use, to be honest.

------
matt4077

        Response #1:
        date:Wed, 07 Sep 2016 21:41:23 GMT 
        expires:Wed, 07 Sep 2016 21:41:53 GMT
    
        Response #2:
        date:Wed, 07 Sep 2016 21:41:23 GMT 
        expires:Thu, 07 Sep 2017 21:41:23 GMT
    

> Although the two expires headers are almost identical, they can only be
> Huffman compressed, because they can't be matched in full.

It seems like it'd be an easy win to round expiry times to the nearest
minute/hour/day, at least for responses such as these where that value is one
year in the future.
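
A tiny sketch of that idea, assuming a Go origin server (the function is
hypothetical): truncate far-future expiry times so the whole header field
repeats byte-for-byte and can be matched in the HPACK dynamic table.

    package main

    import (
        "fmt"
        "net/http"
        "time"
    )

    // roundedExpires formats an Expires value truncated to the hour, so every
    // response within the same hour carries an identical header field.
    func roundedExpires(ttl time.Duration) string {
        t := time.Now().Add(ttl).UTC().Truncate(time.Hour)
        return t.Format(http.TimeFormat)
    }

    func main() {
        fmt.Println("Expires:", roundedExpires(365*24*time.Hour))
    }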

~~~
niftich
This. For HTTP 1.1 clients, Cache-Control already overrides anything set in
Expires (a header dating to HTTP 1.0), so losing some precision on an
approximate guess far into the future that's bound to be ignored by most
clients should be a sensible trade-off.

------
bsder
No offense, but if you had a magic way to reduce HTTP headers to _zero_, it
would have no impact on me as a user.

The fact that a web page loads 40+ domains? Yeah, that's a problem.

The fact that a web page prioritizes serving me an ad so everything blocks
until it gets that ad? Yeah, that's a problem.

The fact that a web page downloads 25 Megabytes of data? Yeah, that's a
problem.

HTTP headers getting compressed? Not even on my radar.

~~~
schmichael
Your complaints about the web are valid but totally out of scope for the HTTP
spec.

~~~
bsder
Asking why everybody is wasting time on something _THAT WILL HAVE NO IMPACT ON
END USERS_ is quite in scope, thanks.

This benefits a very small number of companies who manage to serve very small,
highly optimized payloads and _nobody else_.

And _everybody else_ pays the cost in terms of complexity, security holes, and
implementation bugs.

~~~
byuu
> This benefits a very small number of companies ... And everybody else pays
> the cost in terms of complexity, security holes, and implementation bugs.

Yes, _exactly_!! SPDY, QUIC, et al. are designed by Google for the Googles and
Facebooks of the world. They do not prioritize what most sites need most.
Hell, most sites don't need this, period. Optimizing away even the most
generous 1.4% that Cloudflare claims isn't going to mean anything to sites
ranked below 100,000 in Alexa (e.g. 99% of them). They're not going to run out
of bandwidth, and their traffic means they're only using ~3% of a $5/mo VPS.

You want software and website monocultures? This is how you get it.
Unbelievable complexity creep. All the fun we've had with OpenSSL
vulnerabilities? Get ready for them in the new QUIC-HTTP/2 servers.

TLS alone already basically makes it impossible to write your own web server
anymore, but at least that's a necessary evil due to ISP and state actor
spying abuses. If you're okay with a world where everyone has the choice
between nginx or Apache, I guess that's fine. I'm not, though.

------
merb
The worst thing about the spec is that it says things like:

    
    
        An endpoint might choose to close a connection without sending a GOAWAY for misbehaving peers.
    

But it does not define what a misbehaving peer looks like. Or:

    
    
        WINDOW_UPDATE or RST_STREAM frames can be received in this state for a short period after a DATA or HEADERS frame containing an END_STREAM flag is sent.
    

It's really hard to get such a time window right, since it depends heavily on
the client. The HTTP/2 state machine is also extremely complex compared to
that of an HTTP/1 client (no state machine at all). And the spec gives a lot
of leeway to the server/client, which can treat some errors either as a
stream-only error or as a full connection error (i.e. GOAWAY,
reconnect/disconnect). Not sure if that is good at all.

Push promises are probably not even useful to most servers/users, and they
also have a flaky specification. It's okay to send them between DATA frames,
but it makes no sense to do so.

WebSockets don't work over HTTP/2, and even with an extension they would
disconnect every 2^31/2 streams (or fewer), since stream IDs are 31 bits, each
side only gets half of them, and IDs are never reused within a connection.

I think that HTTP/2 is a good thing, but at the moment I would rather count on
an HTTP/2.1 that gets everything right and makes things significantly easier.

P.S.: HPACK is probably the easiest part to implement/get right.

~~~
Matthias247
I don't think performing a clean shutdown for misbehaving clients is that
important, as they are misbehaving anyway. The only place where it helps is in
debugging the misbehaving application: the developer can see the error message
and see that something went wrong instead of only seeing a closed connection.
When in doubt, closing the connection is probably always the simplest way to
handle invalid situations, as libraries will always have to deal with that
case anyway; handling close/shutdown/GOAWAY messages is much harder. I also
don't consider frames received for already-reset streams a big issue. Just
ignore them if they are for an unknown stream ID that is lower than the
highest established one.
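
The bookkeeping that implies is small. A sketch in Go, with illustrative names
(this is not any real library's API):

    package main

    import "fmt"

    type conn struct {
        maxSeenStreamID uint32              // highest stream ID we've accepted
        streams         map[uint32]struct{} // currently-open streams
    }

    // shouldIgnore reports whether a WINDOW_UPDATE/RST_STREAM for streamID can
    // be silently dropped: the stream is unknown but was plausibly open once.
    func (c *conn) shouldIgnore(streamID uint32) bool {
        if _, open := c.streams[streamID]; open {
            return false // live stream: process the frame normally
        }
        // Unknown but <= the watermark: a frame that raced our reset; drop it.
        // Unknown and above the watermark would be a protocol error instead.
        return streamID <= c.maxSeenStreamID
    }

    func main() {
        c := &conn{maxSeenStreamID: 7, streams: map[uint32]struct{}{7: {}}}
        fmt.Println(c.shouldIgnore(5)) // true: stream 5 was already reset
        fmt.Println(c.shouldIgnore(9)) // false: never established
    }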

However, I agree there are a few rough edges in the HTTP/2 spec. The worst
ones for me are the race conditions around SETTINGS negotiation. If one
implementation wants to use a low window size or HPACK buffer size, it must
still cope with the fact that the remote might send more data before it knows
about the new settings - which it can only answer with a stream reset. This in
turn might cause many clients to assume that the server is not working or
shutting down, and to create a new connection and try the same thing again.
The only sensible approach to SETTINGS, in my opinion, is to use the default
settings or higher ones - which are unfortunately so high that it makes HTTP/2
a poor fit for embedded or IoT applications, where it could otherwise shine
thanks to the less demanding binary protocol.
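
A sketch of the buffering consequence (illustrative names, not a real
library's API): until our SETTINGS frame is ACKed we must still provision for
the RFC 7540 defaults, which is exactly what sinks the low-footprint option.

    package main

    import "fmt"

    const defaultInitialWindowSize = 65535 // RFC 7540 default

    type localSettings struct {
        initialWindowSize uint32
        acked             bool // set once the peer ACKs our SETTINGS frame
    }

    // effectiveWindow is what we actually have to be prepared to buffer
    // per stream.
    func (s *localSettings) effectiveWindow() uint32 {
        if !s.acked && s.initialWindowSize < defaultInitialWindowSize {
            // The peer may still legitimately send under the default window.
            return defaultInitialWindowSize
        }
        return s.initialWindowSize
    }

    func main() {
        s := &localSettings{initialWindowSize: 8 << 10} // small-footprint choice
        fmt.Println(s.effectiveWindow())                // 65535: defaults still rule
        s.acked = true
        fmt.Println(s.effectiveWindow())                // 8192: peer now knows
    }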

The other thing I don't like too much is that headers travel in special header
frames without flow control. Getting overall flow control correct (and
preventing attackers from exploiting it) is hard. From a layering perspective
it would have been much clearer to have a lower layer that creates
flow-controlled multiplexed byte streams and a higher layer that defines how
to transport headers and body data inside those streams. Now we have one big
HTTP/2 layer that covers everything. But I guess that was done to get HPACK
working. HPACK itself is IMHO OK.

------
rmdoss
I recommend watching this video from Fastly with some real metrics comparing
HTTP/2 and HTTP/1:

[https://www.youtube.com/watch?v=0yzJAKknE_k](https://www.youtube.com/watch?v=0yzJAKknE_k)

Overall, the performance gain is not as big as the hype we hear would suggest
(if there's any gain at all).

~~~
youngtaff
If it's the video I'm thinking of, they ran a huge number of tests, introduced
packet loss, and saw some issues.

There are a few challenges here…

WebPageTest uses DummyNet to simulate packet loss, and in an HTTP/1.x scenario
not all TCP connections in the test will experience packet loss, whereas all
the HTTP/2 ones will. Now, in the real world, if there's packet loss en route,
how likely is it that every TCP connection experiences it?

Our H2 implementations are still young; this Cloudflare work is one example.
Until they made this patch, nginx didn't do full HPACK compression, and there
are other issues in other browsers and servers too.

As the implementations mature and we understand how to make the most of H2,
performance gains will appear - there's already evidence from people like the
FT that there are gains to be had now.

~~~
jgrahamc
Quite.

Real world experience beats synthetic tests. Shipping counts for a lot.

In the time that some competitors have been talking about HTTP/2, we've had it
out for a year and have been experimenting and improving (server push, HPACK,
...). Clearly, HTTP/2 is not the one-size-fits-all answer to everything, but
it amuses me to see competitors not shipping yet talking a good game. We see
this sort of thing with HTTP/2, IPv6, ...

------
ape4
I wonder if HPACK will discourage the creation of new headers... since they
won't be in the static dictionary. Instead, existing headers will be
overloaded (not so pretty).

~~~
kevincox
I doubt that, because the information added would be similar in size. Contrast
`Foo: old-value unrelated-option=bar` with `Unrelated-Option: bar`. There is
almost no difference (only the overhead of another header in the protocol,
actually). They would have to get way more creative in the representation to
get any real gains.

Also the HTTP authors haven't been too worried about header size in the past
so I doubt this will change just because we have a new protocol.

------
ape4
What's with those header fields that start with a colon:

    
    
      :authority:blog.cloudflare.com
      :method:GET
      :path:/assets/images/cloudflare-sprite-small.png
      :scheme:https
    

Isn't that going to confuse every header parser?

~~~
pilif
That's just a way to spell out the HTTP/2 request line (its HTTP/1 equivalent
would be the very first line of the request, like `GET / HTTP/1.0`) in
human-readable form.

No parser is ever going to parse these strings.

Libraries supporting HTTP/2 see the well-known binary values, and libraries
without HTTP/2 support will fall back to HTTP/1 anyway, where the
human-readable form of the request line is, well, just the request line itself.
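
To make that concrete, here's a small sketch (using Go's
golang.org/x/net/http2/hpack package) decoding a block in which the
pseudo-headers are single static-table index bytes; there is no request-line
text to parse at all:

    package main

    import (
        "fmt"

        "golang.org/x/net/http2/hpack"
    )

    func main() {
        dec := hpack.NewDecoder(4096, func(f hpack.HeaderField) {
            fmt.Printf("%q = %q\n", f.Name, f.Value)
        })
        // Two single bytes: static table entry 2 (":method: GET") and
        // entry 7 (":scheme: https").
        if _, err := dec.Write([]byte{0x82, 0x87}); err != nil {
            fmt.Println("decode error:", err)
        }
        dec.Close()
    }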

~~~
ape4
Thanks, now I know.

------
heisenbit
Significantly smaller headers mean significantly smaller requests. That allows
the client to tell the server more quickly and more comprehensively which set
of files it needs. Especially in the mobile use case, where uplink bandwidth
is limited and the channel is not always stable, this may really improve
user-perceived performance.

I don't worry about compression effort too much - deflate has been used for a
long time.

------
grey-area
I see a lot of patches for nginx coming from Cloudflare. How much of their
infrastructure is based on nginx? Does anyone know any good articles on their
internal setup?

~~~
jgrahamc
I don't think we've ever done a blog post on the overall architecture of a
machine. But as of today it's a mixture of nginx (our own fork, since the
nginx folks won't accept all our patches), rrdns (our own DNS server written
in Go), a whole load of Lua (because we use OpenResty), a whole load of other
services written in Go, ...

Depending on the protocol (or version) traffic may or may not be served via
nginx.

~~~
garblegarble
>nginx (our own fork since nginx folks won't accept all our patches)

Is there any interest in publishing this whole fork? I know you already
publish separate modules on GitHub; beyond that, is there crossover with NGINX
Plus functionality that they don't want to accept?

~~~
jgrahamc
The problem with publishing it is that we'd have to maintain it, and sometimes
we put stuff in there that we wouldn't normally want to make public, or that
would be a pain to extract because it's tied to our infrastructure.

Open sourcing stuff has a real cost. We do as much as we can.

------
aorth
Great savings. It would be nice to know which upstream version of nginx these
"full" HPACK patches were merged into. Does anyone know?

~~~
aorth
I had asked the same question on the blog post, and the author responded there
saying the patches had not been integrated into vanilla nginx yet, only
"upstreamed" to the nginx-devel mailing list in December 2015.

[https://blog.cloudflare.com/hpack-the-silent-killer-feature-of-http-2/](https://blog.cloudflare.com/hpack-the-silent-killer-feature-of-http-2/)

