
Real-world HTTP/2: 400GB of images per day - jhealy
https://99designs.com.au/tech-blog/blog/2016/07/14/real-world-http-2-400gb-of-images-per-day/
======
rgbrenner
Founder of NuevoCloud here. If I read this right, you guys used Cloudflare for
http 2. So let me ask you this: when you did your comparison, were all of the
images cached (i.e. x-cache: hit) at the edge?

The reason I ask is because cloudflare, last I checked, still hasn't
implemented http2's client portion. So when a file is not cached, it does
this:

client <--http2--> edge node <--http 1.1--> origin server.

Http2 is only used for the short hop between the client and the edge node; the
edge node then uses http 1.1 for the connection to the origin server, which may
be thousands of miles away.

In other words, depending on the client location and the origin server
location, your test may have used http 1.1 for the majority of the distance.

If you guys want to rerun this test on our network, we use http2 everywhere...
your test would look like this on our network:

client <--http2--> edge node (closest to client) <--http2--> edge node
(closest to server) <--http2--> origin server.

So even if your origin server doesn't support http2, it'll only use http 1.1
over the short hop between your server and the closest edge node.

You're welcome to email me if you want to discuss details you don't want to
post here.

Edit: I should also mention that we use multiple http 2 connections between
our edge nodes and between the edge node and origin server... removing that
bottleneck. So only the client <--> edge node hop is a single http 2 connection.

~~~
xzyfer
To the best of my knowledge you are correct about how CloudFlare works. For
context, this data was collected over a period of about a month on real
production pages with significant traffic.

The edges were well and truly primed.

~~~
rgbrenner
CloudFlare doesn't manage caches on a per-account basis, though. Each PoP has
a single LRU cache that's used for all customers. In other words, even if
you've primed it, your files may have been pushed out of the cache by a larger
customer.

In order to know this hasn't occurred, you really have to check the hit rate
cloudflare is reporting (for static files that rarely change, this should be
near or at 100%)... and when you're doing side-by-side comparisons (like the
speed index), you have to actually check the x-cache headers to verify that a
cache miss hasn't occurred. Otherwise, you can't actually know whether a
significant portion of traffic was sent over http 1.1 because of cache misses.
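A quick spot-check is easy to script. A minimal sketch using Python's requests
library; the URLs are placeholders, and the header name depends on the CDN
(CloudFlare reports cache status in CF-Cache-Status, many others in X-Cache):

    import requests

    # Hypothetical asset URLs to spot-check; replace with real image URLs.
    urls = [
        "https://images.example.com/a.jpg",
        "https://images.example.com/b.jpg",
    ]

    for url in urls:
        resp = requests.get(url)
        # Header name varies by CDN: CF-Cache-Status on CloudFlare, X-Cache elsewhere.
        status = resp.headers.get("CF-Cache-Status") or resp.headers.get("X-Cache", "unknown")
        print(url, status)

Anything other than a HIT means that request went back to the origin over
whatever protocol the CDN uses for that hop.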

~~~
brobinson
> Each PoP has a single LRU cache that's used for all customers.

Is this true for all tiers of paid accounts? Can someone from CF chime in
here?

~~~
mgw
As recently mentioned by a CloudFlare employee in this post
([https://news.ycombinator.com/item?id=11439582](https://news.ycombinator.com/item?id=11439582)):

> We cache as much as possible, for as long as possible. The more requested a
> file, the more likely it is to be in the cache even if you're on the Free
> plan. Lots of logic is applied to this, more than could fit in this reply.
> But importantly; there's no difference in how much you can cache between the
> plans. Wherever it is possible, we make the Free plan have as much
> capability as the other plans.

This does not confirm the exact statement but at least points in this
direction.

------
dpc_pw
I did not do any real tests and I might be completely wrong, etc., but it seems
to me that http2 is going to perform poorly over wireless links like 3g.

With http1 one had N TCP connections. Given the way TCP slowly ramps up the
bandwidth used and rapidly backs off when a packet is lost, even when a packet
was dropped (which happens quite a lot on 3g) the other TCP streams were not
delayed or blocked, and could even utilize the leftover bandwidth yielded by
the stream that lost the packet.

With http2, however, there's one TCP connection, so dropped packets will cause
under-utilization of the bandwidth. On top of that, dropped packets will cause
all frames after them to be delayed in the kernel receive buffer until the
dropped packet is retransmitted, while in the http1 case they would be
available at the app level right away.

HTTP2 being implemented on top of TCP always seemed like a weird choice. It
should have been UDP, IMO. That's why network accelerators like PacketZoom
make so much sense. Note: I work at PacketZoom, I did not do any in-depth
research on HTTP2, and this is my opinion, not necessarily that of the company.

~~~
Lukasa
This is a real worry, but as with all things the actual behaviour of H2 on
lossy networks is more complex than that.

TCP's congestion control algorithms don't work that well when you have many
TCP streams competing for the same bandwidth. This is because while packet
loss is a property of the _link_, not an individual TCP stream, each packet
loss event necessarily only affects one TCP stream. This means the others
don't get the true feedback about the lossiness of the connection. This
behaviour can lead to a situation where all of your TCP streams try to
over-ramp.

A single stream generally behaves better on such a link: it's getting a much
more complete picture of the world.

However, your HOL blocking concern is real. This is why QUIC is being worked
on. In QUIC, each HTTP request/response is streamed independently over UDP,
which gets the behaviour you're talking about here, while also maintaining an
overall view of packet loss for rate limiting purposes.

~~~
bsdetector
> _can lead to a situation where all of your TCP streams try to over-ramp. ...
> A single stream generally behaves better on such a link: it's getting a
> much more complete picture of the world_

Multiple HTTP connections work better for exactly this reason: they 'steal'
bandwidth from streaming video by 'not playing fair'. For example, 6
connections ramp bandwidth back up at 6x the rate of a single connection, or
sometimes only scale back 1/6th of the streams at once.

...which is fine, because multiple parallel HTTP connections usually means a
browser doing short-lived data transfers for active users, and that is not the
bulk of the data on the network.

------
xzyfer
Hey all, I'm the author of that blog post. I'll be floating around for a
while, happy to answer any questions.

~~~
xzyfer
I'm out y'all. Thanks for all the support. Feel free to follow up on Twitter
at @xzyfer

------
wongarsu
I don't think that the server is in charge of priorisation here. The server
can do it, but there is no reason to push this responsibility onto the server
when the browser can do it much better (for example the server can't know
what's in the viewport).

I expect this will be quickly sorted out by more mature HTTP/2 implementations
in browsers. Downloading every image at once is obviously a bad idea, and I
expect such naive behaviour will soon be replaced by decent heuristics (even
just downloading eight resources at once should be better in nearly all cases)

~~~
foota
I think the real solution here is for the browser to be able to communicate
some sort of priority to the server, without having to download a limited
number of files at once.

~~~
xzyfer
Browsers do currently do this. H2 has two types of prioritisation: weighted
and dependency-based.

All browsers implement weighted resource prioritisation and weigh resources by
content type. This is a holdover from what they do for HTTP/1 connections.

Firefox has dependency-based resource prioritisation.
[https://bitsup.blogspot.com.au/2015/01/http2-dependency-
prio...](https://bitsup.blogspot.com.au/2015/01/http2-dependency-priorities-
in-firefox.html)

The spec purposely leaves how these heuristics should work to the implementer.
Things will change and implementations will diverge over time.

The server ultimately being in control means we can tell the server what
resources are important for specific pages with absolute knowledge of the
page.
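To give a feel for what these hints look like on the wire, here's a rough
sketch using the Python h2 library (API from memory, so double check its docs
before relying on it); the paths and weights are made up:

    import h2.connection

    conn = h2.connection.H2Connection()
    conn.initiate_connection()

    # Stream 1: the HTML document, high weight.
    conn.send_headers(
        stream_id=1,
        headers=[(":method", "GET"), (":path", "/"), (":scheme", "https"),
                 (":authority", "example.com")],
        end_stream=True,
        priority_weight=255,
    )

    # Stream 3: an image that depends on stream 1 and carries a lower weight,
    # so a server that honours the hints serves the document first.
    conn.send_headers(
        stream_id=3,
        headers=[(":method", "GET"), (":path", "/hero.jpg"), (":scheme", "https"),
                 (":authority", "example.com")],
        end_stream=True,
        priority_weight=32,
        priority_depends_on=1,
    )

    # conn.data_to_send() now holds the serialized frames to write to the socket.

Whether the server does anything with those frames is, as above, entirely up
to the implementation.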

~~~
foota
Oh wow, that's cool. Do you know if servers currently support this? Would this
mostly be useful on a network level or do you think it would also be useful
for like trying to be more intelligent about scheduling?

~~~
xzyfer
It's hard to say for sure. Server implementations can vary wildly, so make sure
to test any implementation closely. I know from talking to CloudFlare that
their implementation respects browser hints. Their implementation is also open
source.

------
joobus
One way to "solve" the time to visual completion would be to make all the
images, but especially the larger images, progressive scan. For very large
images, the difference in visual quality between 50% downloaded and 100%
downloaded on most devices isn't noticeable, so the page would appear complete
in half the time.
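If you're batch-converting, something like this with Pillow does the trick (a
minimal sketch; the paths are placeholders and any image pipeline can do the
same):

    from PIL import Image

    # Re-encode a baseline JPEG as progressive; paths are placeholders.
    img = Image.open("hero.jpg").convert("RGB")
    img.save("hero-progressive.jpg", "JPEG", progressive=True, quality=85)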

~~~
MichaelGG
If there's a way to tell it not to render until x% downloaded, sure. Otherwise
slower connections see the low-q versions for a while and it can
disconcerting. Either to some users or some PMs.

~~~
xzyfer
This is correct. Visual completion will not be achieved until the entirety
of the images within the viewport are downloaded.

However, progressive jpegs could improve initial paint times. These pages are
dynamic, so each one would have its own unique (although related) profile.

------
mkj
This seems to bode badly for CDNs versus own tuned servers, unless there's a
way for origin websites to provide hints on response ordering?

~~~
hendry
damn, CloudFront already doesn't expose many tunables, so yeah, this isn't
going to work.

Starting to wish the Appcache _manifest_ had actually been made to work; it
could be used as a queue of sorts to prioritise important assets on a webpage.

~~~
taf2
ServiceWorkers are the app cache done right.
[https://github.com/w3c-webmob/ServiceWorkersDemos](https://github.com/w3c-webmob/ServiceWorkersDemos)

~~~
hendry
My point is that you can't map Service Workers onto a simple manifest, i.e. a
list of resources the httpd needs to push as a priority.

You "kind of" can with an Appcache "manifest". Stretch of the imagination, I
know.

------
cagenut
Did I read this right that http1 was with cdn A (unnamed?) and http2 was with
cdn B (cloudflare)?

If so, you really can't draw any conclusions about the protocol difference
when the pop locations, network designs, hardware and software configurations
could easily have made the kinds of differences you're seeing.

~~~
xzyfer
You read it correctly.

By not moving our render-blocking assets like CSS, JS and fonts over to
http/2 we rule out performance changes due to improvements in head-of-line
blocking.

Our images were always on a separate hostname so the DNS lookup overhead is the
same. We also did some initial benchmarking and found the new CDN to be more
efficient than the old one.

------
thinkMOAR
Comparing two protocols using different providers, isn't that a bit like
comparing apples and oranges? And I have a doubt, which could be a bad
assumption: it isn't on hardware you control or own, and you don't know exactly
what runs on it, or which other parties use it.

------
jjcm
Great writeup, and interesting seeing where http2 performed worse. Definitely
going to refer to this as I update my backend to http2.

~~~
xzyfer
Thanks mate, really appreciate it.

------
runeks
I'm really looking forward to see how much HTTP/2 will increase performance
for my Bitcoin payment channel server:
[https://github.com/runeksvendsen/restful-payment-channel-
ser...](https://github.com/runeksvendsen/restful-payment-channel-server/)

Just now I finished separating the front-end and back-end - by a RESTful
protocol - and this roughly halved performance compared to using a native
library (from ~2000 payments/second on my laptop to ~1000). I expect HTTP/2 to
make a greater percentage-wise difference here, although I admit I really have
no idea how much, say, ZeroMQ would have reduced performance, compared to
cutting it in half using HTTP/1.x.

I expect HTTP/2 to make a much greater difference in high performance
applications, where overhead becomes more important, which static file serving
doesn't really hit. So I think RESTful backend servers will see a much more
noticeable performance increase, especially since, if you use Chrome at least,
as an end-user you already get many of the HTTP/2 latency benefits through
SPDY.

~~~
arca_vorago
Aren't there more performant options than restful these days? Any particular
reason why you didn't choose those?

Would that be a good case for websockets?

~~~
runeks
There are definitely more performant options. I considered ZeroMQ for a while,
and almost decided on it, but went for HTTP/REST because of the built-in error
handling, and request-response style (as far as I can see, I would have to
implement all of this if I chose to use ZeroMQ).

I also chose HTTP because, at the end of the day, I still get almost ~1000
payments per second after the change (on a laptop). VISA handles 200k
payments/second at peak, so that's peak VISA levels on 200 MacBook Pros.

I see I might have misspoken when I said "front-end". It's really the front-end
(logic part) of the backend server, which now comprises two parts: a stateless
logic "front-end" and a stateful database backend. So I haven't considered
Websockets.

------
manigandham
Some solutions:

- Serve less data. The best speedup is when there's no more data to download;
if the throughput for clients is maxed out, then decreasing page weight helps.

- Use async bootstrap JS code to load in other scripts once images are done
loading or other page load events have fired.

- Load fewer images in parallel; use JS to load one row of images at a time.

- Use HTTP/2 push (which CloudFlare offers) to push some of the images/assets
with any other response. Push images with the original HTML and you'll start
getting the images to the browser before it even parses the HTML and starts any
(prioritized) requests (see the sketch after this list).
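CloudFlare triggers push off Link preload response headers, so from the origin
it can be as simple as this hypothetical Flask handler (a sketch; the route and
asset paths are made up):

    from flask import Flask, make_response

    app = Flask(__name__)

    @app.route("/")
    def index():
        resp = make_response("<html>...gallery markup...</html>")
        # A push-capable edge reads these preload hints and pushes the assets
        # alongside the HTML response.
        resp.headers.add("Link", "</static/css/main.css>; rel=preload; as=style")
        resp.headers.add("Link", "</static/img/hero.jpg>; rel=preload; as=image")
        return resp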

~~~
dalore
Wouldn't the standard solution of lazy loading images (and prioritizing
critical css) help? Since they are now trying to load everything on a big
page, they should only be trying to load everything above the fold.

~~~
manigandham
Yes, the basic approach is the same. There's limited bandwidth and they have
too many image assets going through the pipe at the same time. They can easily
control this by just loading a few at a time, whether that's the first page or
row or whatever (probably based on testing to see what "feels" the fastest).

It's a tried and tested approach and much better than just sending everything
in the HTML in a single blast. There are hundreds of image-based sites out
there, and they all do this as an optimization.

------
pedalpete
We've recently moved to Google Cloud Storage from AWS because of http/2. We
had a bottleneck of the browser waiting when serving multiple large files (8+
files at 10MB+ each).

I'm wondering if 99designs looked at any sort of domain sharding to get around
the timing issues. If I understand correctly, wouldn't this get around the
priority queue issue? Your js, fonts, etc. coming from a different address than
your larger images would create completely separate connections.

I'm not completely sure this would get around the issues mentioned, but I'm
curious if it was looked at as a solution.

~~~
xzyfer
The priority queue isn't the issue. In fact the priority queue is what kept
our first paint times from tanking, because browsers prioritised render-blocking
resources over images.

The issue was due to the variance in image size. An image that is
significantly larger than the page average will load more slowly, since all
images get an equal share of bandwidth (priority). Adding sharding wouldn't
help, since the client only has a fixed amount of bandwidth to share and all
images would still get the same share of it. Sharding could help if the
bandwidth bottleneck were at the CDN, but that's rarely going to be the case.
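To make that concrete, here's a toy model (made-up sizes and bandwidth, plain
Python) of what equal sharing does to one oversized image:

    # Toy model: all in-flight images get an equal share of the client's
    # bandwidth, and freed-up bandwidth is redistributed as images finish.
    def completion_times(sizes_kb, bandwidth_kb_per_s):
        remaining = dict(enumerate(sizes_kb))
        done, t = {}, 0.0
        while remaining:
            share = bandwidth_kb_per_s / len(remaining)  # equal split across streams
            step = min(remaining.values()) / share       # time until the next image finishes
            t += step
            for i in list(remaining):
                remaining[i] -= share * step
                if remaining[i] <= 1e-9:
                    done[i] = round(t, 2)
                    del remaining[i]
        return done

    # Four average images and one outlier, sharing 1000 KB/s:
    print(completion_times([100, 100, 100, 100, 900], 1000))
    # {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5, 4: 1.3}

The exact numbers don't matter; the point is that the largest image always
finishes last, and it's the one visual completion is waiting for.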

------
diegorbaquero
Excellent and in-depth article. Thank you for sharing!

Hopefully we'll see a follow up with future changes and tweaks both from
webservers and browsers.

~~~
xzyfer
Thanks mate, glad you enjoyed it.

------
muteor
I thought that HTTP/2 didn't fix head-of-line blocking and this was why QUIC
([https://www.chromium.org/quic](https://www.chromium.org/quic)) existed.

From the project page:

Key features of QUIC over existing TCP+TLS+HTTP2 include

* Dramatically reduced connection establishment time

* Improved congestion control

* Multiplexing without head of line blocking

* Forward error correction

* Connection migration

~~~
Matthias247
IMHO HTTP/2 partly solves HOL blocking. It will allow other streams to proceed
if one stream is blocked due to flow control (the receiver doesn't read from
the stream). E.g. if you have multiple parallel downloads over a single HTTP/2
connection, one blocked/paused stream won't block the others.

However, it has no way to let individual streams proceed when packets are lost,
even if the lost packets only hold information for a single stream.

------
noahcollins
Thanks for posting your findings - very useful data. It would be interesting
to see the Webpagetest waterfalls in greater detail if you're able to share
that.

Are you planning to use your resource hints to enable server push at the CDN
edge?

~~~
xzyfer
Server push at the edge is a problem atm. Current push semantics require that
the HTML document say which resources to push. That's an issue if you're
serving assets off a CDN domain.

Asset domains make less sense with h2 from a performance perspective, but there
are still security concerns that need to be addressed.

~~~
noahcollins
Good point if you're using push for page content that varies, like images in
the 99designs portfolio and gallery. That gets into dynamic caching
territory.

As a first step, I'm focused on using push to cut latency between TTFB and
processing of render-blocking static assets. Serving those from the same domain
as the base page, it should be easy for the origin to supply the edge with the
list of resources to push, either in the HTML or with the `link` response
header. It also means my critical assets are no longer behind a separate DNS
lookup.

In the design gallery, this type of push approach could help you regain
control of loading priority and get your fonts loading before that wall of
images.

~~~
xzyfer
The priority queue isn't the issue. In fact the priority queue is what kept
our first paint times from tanking, because browsers prioritised render-blocking
resources over images.

The issue was due to the variance in image size. An image that is
significantly larger than the page average will load more slowly, since all
images get an equal share of bandwidth (priority).

We could further improve first paint times by pushing render-blocking
resources, but we'd need to be serving those resources off the 99designs domain
(with current push implementations). This opens us up to a class of security
issues we avoid by having an asset domain, e.g. types of reflected XSS and
serving cookies on assets.

For now we'll wait for the webperf working group to address the limitations
with server push semantics.

~~~
noahcollins
Interesting note on the impact of image size variation on queue, thanks for
elaborating.

Serving those resources from the 99designs domain is worth a look. I
considered the cookies and security trade-offs as well. I found H2 compressed
cookies enough to perform better than a separate cookieless domain for static
assets, due to the DNS savings. DNS times can be bad at high percentiles.
Reflected XSS is addressed with a Content Security Policy. But I'm fortunate to
have a user base that supports CSP well.

------
esher
we also did a far less sophisticated HTTP/2 reality check:
[https://blog.fortrabbit.com/http2-reality-
check](https://blog.fortrabbit.com/http2-reality-check)

about the same result: real world performance boost was not soooo big.

------
schallertd
And still no OpenSSL 1.0.2 nor ALPN on most distros such as Debian Jessie...
kinda sucks

~~~
secure
It’s available in jessie-backports since 2016-07-02, see
[https://packages.debian.org/jessie-backports/libssl-
dev](https://packages.debian.org/jessie-backports/libssl-dev)

Given that jessie is stable, OpenSSL will not be updated to a newer version,
only security updates will be made available.

What I’m trying to say: you’re never going to get it on jessie, unless you
enable backports, at which point you’ll have it readily available.

Hope that helps

~~~
schallertd
oh nice, thanks for the hint. didn't check the jessie-backports within the
last two weeks :-)

------
mmel
This is petty and inconsequential, but I really wish they used a section
header called "Conclusion" instead of "Take Aways".

~~~
benschwarz
You're right.

