
Performance Testing HTTP/1.1 vs. HTTP/2 vs. HTTP/2 and Server Push for REST APIs - treve
https://evertpot.com/h2-parallelism/
======
takeda
I really miss old Opera which correctly implemented HTTP pipelining[1] :/

[1]
[https://en.wikipedia.org/wiki/HTTP_pipelining](https://en.wikipedia.org/wiki/HTTP_pipelining)

Edit: the article also doesn't mention keep-alive; in fact it explicitly
states that HTTP/1.1 opens one connection per request, which is not true.
Makes me think the HTTP/1.1 demo disables KA to make the effect even more
dramatic.

Edit2: demos don't make actual network requests.

~~~
3xblah
It appears the author did not test HTTP/1.1 pipelining

Something like

    
    
        # pipeline seven GET requests over one TLS connection; keep-alive on
        # every request except the last, which closes the connection
        n=1;while :;do
        test $n -le 7||break;
        printf 'GET /articles/%s HTTP/1.1\r\nHost: api.example.org\r\nConnection: ' "$n";
        if test $n = 7;then printf 'close\r\n\r\n';else printf 'keep-alive\r\n\r\n';fi;
        n=$((n+1));
        done|openssl s_client -connect api.example.org:443 -ign_eof

~~~
cryptonector
Pipelining has been removed from the browsers. Sadness.

~~~
3xblah
I have never seen it removed from a _server_. It is an option that some
website operators may disable. However, most sites enable it, or the httpd
they use has it enabled by default.

~~~
cryptonector
Ok, sure, but if the clients don't implement it... does it matter that some
servers do?

EDIT: Well, maybe it does matter. E.g., if it creates request smuggling
vulnerabilities, say, in the presence of reverse proxies that don't interpret
the spec the same as the servers.

~~~
3xblah
I like pipelining as a 1.1 feature. I have successfully used TCP/TLS clients
like netcat and openssl to retrieve text/html in bulk for many years. The
servers always worked great for me.

However, I never liked chunked encoding as a feature. In theory it sounds
reasonable, but in practice it is a hassle. As TE chunked became more
widespread, I eventually had to write a filter to process it. It is not
perfect but it seems to work.

Not surprised chunked encoding can cause problems on the server side. NetBSD
has an httpd that tries to be standards-compliant but still has not
implemented chunked requests.
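
The core of such a filter is small; a minimal sketch in JavaScript (for
illustration only; it ignores chunk extensions and trailers):

        // De-chunk the raw body of a Transfer-Encoding: chunked response.
        function dechunk(raw) {
          let out = '', pos = 0;
          for (;;) {
            const lineEnd = raw.indexOf('\r\n', pos);
            const size = parseInt(raw.slice(pos, lineEnd), 16); // chunk size, in hex
            if (size === 0) break;                              // zero-size chunk ends the body
            out += raw.slice(lineEnd + 2, lineEnd + 2 + size);  // copy the chunk data
            pos = lineEnd + 2 + size + 2;                       // skip data plus trailing CRLF
          }
          return out;
        }

        dechunk('4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n') // => 'Wikipedia'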

~~~
cryptonector
HTTP/2 only has chunked encoding (it's not called that, but that's what it
is). Chunked is much, much nicer than the alternative, because sometimes you
don't know the length a priori.
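
For reference, this is what that framing looks like on the wire in HTTP/1.1
(each chunk is prefixed with its size as a hex byte count; a zero-size chunk
terminates the body):

        HTTP/1.1 200 OK
        Transfer-Encoding: chunked

        4
        Wiki
        5
        pedia
        0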

~~~
3xblah
Keyword: "sometimes"

Chunked is also unnecessary sometimes. For me, that happens to be most of the
time. Sometimes I can avoid it by specifying HTTP/1.0 but not all servers
respect that.

------
littlecranky67
I call BS on the benchmarks AND the theoretical analysis. Every time I read
these HTTP/X benchmarks, people don't mention TCP's congestion control and
seem to just ignore it in their analysis. Well, you can't, at least not if
you claim "realistic conditions". Congestion control will introduce
additional round trips, depending on your link parameters (bandwidth,
latency, OS configuration like initcwnd, etc.), and will limit your bandwidth
at the transport layer. And depending on your link parameters, 6 parallel TCP
connections _might_ achieve a higher bandwidth on a cold start, because
window scaling during TCP slow start across six connections beats that of the
single TCP connection used by HTTP/2.

Additionally, the most common mistake people make while benchmarking (and I
assume the author did too) is ignoring the congestion control's caching of
the cwnd in your OS's TCP stack. That is, once the cwnd is raised from the
usual 10 MSS (~14.6 kB for most OSes and setups), your OS will cache the
value and reuse the larger cwnd as the initcwnd when you open a new TCP
socket to the same host. So if you do 100 benchmark runs, you will have one
"real" run, and 99 will reuse the cwnd cache and produce unrealistic results.
Given that the author didn't mention TCP congestion control at all, and
didn't mention any tweaks to the TCP metrics cache (you can disable or reset
it on Linux via /proc/sys/net/ipv4/tcp_no_metrics_save), I assume all the
measured numbers are BS.
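
For example, on Linux (a sketch; the sysctl is the one named above, and `ip
tcp_metrics` comes with iproute2):

        # stop the kernel from saving per-destination TCP metrics (cwnd, RTT, ...)
        sysctl -w net.ipv4.tcp_no_metrics_save=1

        # or flush the already-cached entries between benchmark runs
        ip tcp_metrics flush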

~~~
jpambrun
I have used HTTP/2, and HTTP/1.1 with domain sharding, extensively, and the
advantage of HTTP/2 multiplexing cannot be denied. TCP connection overhead is
negligible (other than the TLS negotiation, maybe) and is not even really a
factor.

~~~
littlecranky67
TCP congestion window scaling is not negligible, especially not in high-
latency environments. The initcwnd is usually ~14.6 kB (10x an MSS of 1460
bytes in common DSL setups), meaning a server can only send those 14.6 kB
before the connection stalls until an ACK is received (= 1 round trip).
Google realized this and tried to get a kernel patch into Linux where the
browser (or any userspace program) could manipulate the initcwnd. The patch
got rejected, and that's basically why they came up with QUIC: to be able to
build on UDP and implement congestion control and the other stuff TCP usually
does (ordering, checksumming, etc.) in userspace.
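
A back-of-the-envelope sketch of what slow start costs (idealized: the cwnd
doubles every round trip, no losses):

        // Round trips needed to deliver `bytes` under idealized TCP slow start,
        // starting from an initcwnd of 10 x 1460 bytes and doubling each RTT.
        function slowStartRtts(bytes, cwnd = 10 * 1460) {
          let rtts = 0;
          while (bytes > 0) {
            bytes -= cwnd; // one window's worth of data per round trip
            cwnd *= 2;     // slow start doubles the window each RTT
            rtts++;
          }
          return rtts;
        }

        slowStartRtts(100e3); // => 3 round trips for a 100 kB response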

------
ricardobeat
Out of curiosity, I implemented the same using websockets. You'll obviously
need a real implementation of a request-response interface, but this will do
for the test:

    
    
        // server
        ws.on('message', msg => {
          switch (msg) {
            case 'collection':
              return ws.send(/* ...generate 500 links... */)
            case 'item':
              return ws.send(/* ...generate item... */)
          }
        })

        // client
        ws.send('collection');
        ws.onmessage = e => {
          const msg = JSON.parse(e.data)
          switch (msg.type) {
            case 'collection':
              msg.collection.forEach(() => ws.send('item'))
              break // without this we'd fall through into the 'item' case
            case 'item':
              items.push(msg.item)
          }
        }
    

It finishes in 140-150ms, around the same time as the best HTTP version.
That's without any caching, and with the client requesting each of the 500
items, when in fact the server could just push them. I'm surprised WS isn't
used more often for data exchange.

EDIT: actually ~50ms after removing the console.log() I added for each
message...

~~~
hliyan
Do load balancers and reverse proxies play well with websockets?

~~~
londons_explore
Load balancing websockets can be tricky because an individual socket can
last many hours. That means that when you do a software update, you need to
leave the old version running for many hours.

The same applies to HTTP, but an individual request tends not to take many
hours.

------
yoz
Fascinating work, thank you!

Possible issue: have I misread it, or is the explanation of HTTP/3's
advantages entirely missing from this section[1]? It's in the section header,
but HTTP/3 isn't mentioned again until "The perfect world"[2] and that section
doesn't clarify how HTTP/3 matters to the discussion.

[1] [https://evertpot.com/h2-parallelism/#http2-and-http3](https://evertpot.com/h2-parallelism/#http2-and-http3)

[2] [https://evertpot.com/h2-parallelism/#the-perfect-world](https://evertpot.com/h2-parallelism/#the-perfect-world)

~~~
treve
Yea, you are right. HTTP/3 is not really ready yet, at least for Node.js.
Maybe worth revisiting this in a year or so. I appreciate the criticism
though.

------
mcguire
" _Compound requests are by far the fastest. This indicates that my original
guess was wrong. Even when caching comes into play, it still can’t beat just
re-sending the entire collection again in a single compounded response._ "

TCP is heavily optimized for shoving a single stream of bits across the wire.
The other options have more overhead from multiple requests.

------
time0ut
Many of my clients are behind MITM proxies that force a downgrade to 1.1, so
I end up having to optimize for 1.1 and minimize total requests anyway. We've
started down the path of GraphQL to aggregate requests, but it feels so
counterproductive (with respect to performance/effort).

------
userbinator
I think the most interesting thing about this test is that it shows HTTP/2
isn't always faster than 1.1 --- in the first test, h1 compound is just as
fast as h2, and in the second, even slightly faster. Considering all the
additional complexity that HTTP/2 introduces, that doesn't seem like any
improvement.

Also, "relative speeds" should really be "relative times" because otherwise
the percentages imply the exact opposite. I was a little confused for a bit by
that.

~~~
rumanator
> in the first test, h1 compound is just as fast as h2

That's because HTTP/2 multiplexes multiple requests over the same
connection, so there is no surprise there.

The main advantage of HTTP/2 is that it ensures that we can get the same
benefit of compound data transfers in a completely transparent manner, without
bothering to rearchitect how clients and servers operate.

> Considering all the additional complexity that HTTP/2 introduces, that
> doesn't seem like any improvement.

This assertion completely misses the whole point of HTTP/2, and why it's a
huge improvement over HTTP/1.1.

~~~
treve
> The main advantage of HTTP/2 is that it ensures that we can get the same
> benefit of compound data transfers in a completely transparent manner,
> without bothering to rearchitect how clients and servers operate.

Yea, this assertion is spot-on, but sadly HTTP/2 still has enough overhead
that if speed really is the most important requirement, compounding data is
still better. I was hoping the difference would at least be smaller.

------
snek
One of my favourite demos is
[https://http2.akamai.com/demo](https://http2.akamai.com/demo)

~~~
mathisonturing
HTTP/1.1 is consistently ahead when I run it on mobile connected to my home
WiFi.

Edit: /2 is faster on first load. When I refresh, /1.1 is almost always
faster.

~~~
EugeneOZ
iOS, Safari, WiFi (59 Mbps on fast.com):

HTTP/2 is 10 times faster on first load, 2-3 times faster on subsequent
loads.

~~~
codefined
Windows, Firefox, Ethernet (220 Mbps on fast.com):

HTTP/2 is ~0.1s faster on first load (1.02s /2 vs 1.11s /1.1) and ~0.2s
faster on subsequent loads. I'm tempted to say there are diminishing returns
as your internet gets faster (latency is ~11ms atm).

------
alexghr
Very interesting article! I wonder how compound requests scale as body size
increases. Would we see individual HTTP/2 requests become faster once items
get "large enough"?

------
candu
The thing that jumps out at me about HTTP/2: even if your browser can send N
parallel requests for different entities, you're still making N database
queries and incurring N request overheads.

This means that every layer of your stack has to be reorganized and optimized
around HTTP/2 traffic patterns...or you can just batch requests as before, and
save that overhead. It makes me think that the N entities problem isn't
actually the most promising use case for HTTP/2...

~~~
littlecranky67
I don't think most resources on today's web are database-dependent. It is
mostly static content (images, videos, scripts, stylesheets, etc.) that is
large in file size. A single resource, however, might trigger a multitude of
DB queries (think calling a REST service endpoint).

Anyway, HTTP/2 allows multiplexing, so waiting on a single resource won't
stall any other requests, which can be sent whenever they are ready.

------
3xblah
Is there such a thing as an "HTTP connection"? Maybe the author means "TCP
connection".

Maybe "Quick UDP Internet Connections" (QUIC) in the case of HTTP/2?

QUIC appeared after the publication of another UDP-based, congestion-
controlled protocol, CurveCP, and uses ECC designed by the author of CurveCP.

QUIC seems tied to HTTP, whereas CurveCP, initially used for DNS, is
application-agnostic.

What are some other applications, besides web browsers and web servers, that
are using QUIC?

~~~
tialaramex
Yes, there is such a thing as an HTTP connection. So now you've learned
something valuable. Each connection carries one or more HTTP requests. In
HTTP/1.1, in practice, you must complete an entire request and response
before beginning another.

(Edited to add, since you did.) There are two things you might mean by QUIC.
Google's test protocol QUIC (these days often called 'gQUIC') was developed
and deployed in-house and has all sorts of weird Google-isms; it's from the
same period as their SPDY, which inspired HTTP/2. gQUIC is no longer under
further development. They handed the specification over to the IETF years
ago, and the IETF's QUIC Working Group is working on an IETF standard QUIC
which will replace TCP for applications that want a robust, high-performance
encrypted protocol.

HTTP over QUIC will be named HTTP/3 and will offer most of the same benefits
as HTTP/2 (which is HTTP over TLS over TCP/IP), but improve privacy and
optimise some performance scenarios where head-of-line blocking in TCP was
previously a problem - probably some time in 2020 or 2021. The HTTPbis
working group (bis is a French word that serves the same purpose in addresses
as the suffix letter 'a' does in English; instead of 45a you might live at
45bis) is simultaneously fixing things in HTTP/2 and preparing for HTTP/3.

~~~
3xblah
"... where head-of-line blocking in TCP was previously a problem..."

Has anyone ever shared a demo where we can see this occuring with pipelined
HTTP/1.1 requests

I have been using HTTP/1.1 pipelining -- using TCP/TLS clients like netcat
or openssl -- for many years, and I have always been very satisfied with the
results.

Similar to the demo in the blog post, I am requesting a series of pages of
text (/articles/1, /articles/2, etc.)

I just want the text of the article to read, no images or ads. Before I send
the requests I put the URLs in the order in which I want to read them. With
pipelining, upon receipt I get automatic concatenation of the pages into one
document. Rarely, I might want to split this back into separate documents
using csplit, as sketched below.
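
(A sketch, assuming GNU csplit and that the concatenated responses were
saved to articles.txt:)

        # split the concatenated responses at each status line;
        # pieces land in xx00, xx01, ... (-z drops an empty leading piece)
        csplit -z articles.txt '/^HTTP\//' '{*}'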

HTTP/1.1 pipelining gives me the pages in the order I request them and opens
only one TCP connection. It's simple and reliable.

If I requested the pages in separate connections, in parallel, then I would
have to sort them out as/after I receive them. One connection might succeed,
another might fail. It just becomes more work.

~~~
zlynx
TCP head-of-line blocking is seen the most on mobile data connections. The
LTE link could be delivering whole megabytes of correct data, but if that TCP
connection is missing just one packet, it must buffer all the rest until it
can get that SACK/ACK back to the server and receive the missing packet.

It happens on WiFi too, but not as badly.

~~~
3xblah
How common would it be on a wired connection?

~~~
tialaramex
If there's congestion, there is a random chance of any packet being dropped,
because that's how you signal congestion reliably. If there's neither
congestion nor a wireless link on the route between you and the server, then
neither this nor most other performance considerations matter to you; that's
nice, you're already getting the best possible outcome.

------
ggm
Story told well, comprehensible. Thanks!

------
mgraczyk
Possibly so obvious it's not worth pointing out, but client-side caching is
also important for server-side scaling. If your clients cache X% of
responses, you can generally get away with ~X% fewer API servers.

So even if client performance doesn't improve much, you should probably still
cache.
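
(For the REST API in the article, that mostly comes down to response headers
along these lines; the values here are made up. The 304 revalidation at the
end is what lets the server skip re-sending the body:)

        HTTP/1.1 200 OK
        Cache-Control: max-age=300
        ETag: "v42"

        GET /articles/1 HTTP/1.1
        If-None-Match: "v42"

        HTTP/1.1 304 Not Modified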

------
mehrdada
If you care about performance, gRPC/Protobufs would be something to look at
and compare against.

------
beders
Maybe I missed it in the article, but is gzip-compression enabled?

If so, the total amount of data transmitted will be quite different when a
compounded response is compared to individual ones.
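
(For a Node demo, gzip is often switched on via middleware; a minimal sketch
assuming Express and the `compression` package, not necessarily what the
author used:)

        const express = require('express');
        const compression = require('compression'); // gzip/deflate middleware

        const app = express();
        app.use(compression()); // each response body is compressed separately

        // Note: 500 small JSON bodies compress worse than one compound body,
        // since every response gets its own deflate stream.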

------
throwawaysea
Great article; the demos were very intuitive.

What about alternatives to HTTP entirely? Why not rely on lower level
protocols/abstractions if performance is such a concern now?

~~~
hatch_q
They would be more intuitive if they were actual demos, not animations that
don't make any real requests.

------
The_rationalist
Has the performance gain that HTTP/3 brings on mainstream workloads been
quantified/benchmarked?

------
ncf_
neat work! didn't notice it mentioned in the article but, would there be a
processor load difference between the two?

