
Why We Love QUIC and HTTP/3 - kickdaddy
https://www.fastly.com/blog/why-fastly-loves-quic-http3
======
drewg123
QUIC costs something like 2x to 4x as much CPU time to serve large files or
streams per byte as compared to TCP. This is because the anti-middlebox
protections _also_ mean that modern network hardware and software offloads
that greatly reduce CPU time cannot work with QUIC. When combined with the
fact that QUIC is userspace, that's just deadly for performance. I'm talking
about TSO, LRO (aka GRO), kTLS, and kTLS + hw encryption.

Let's compare a 100MB file served via TCP to the same file served via QUIC.

    
    
      TCP:
      - web server sends 2MB at a time, 50 times, via async sendfile (50 syscalls & kqueue notifications)
      - kernel reads data from disk, and encrypts.  The data is read once and written once by KTLS in the kernel.
      - TCP sends data to the NIC in large-ish chunks, 1.5k to 64k at a time, let's say an average of 16k.  So the network stack runs 6,250 times to transmit.
      - The client acks every other frame, so that's 33,333 acks.  Let's say they are collapsed 2:1 by LRO, so the TCP stack runs 16,666 times to process acks.
    
      QUIC:
      - web server mmaps or read()'s the file, encrypts it in userspace, and sends it 1500b at a time (1 extra memory copy & 66,666 system calls)
      - UDP stack runs 66,666 times to send data
      - UDP stack runs 33,333 times to receive QUIC acks (no idea what the aggregation is, let's say 2:1)
      - kernel wakes up web server to process QUIC acks 33,333 times.
    

So for QUIC we have:

    
    
      - 4x as many network stack traversals due to the lack of TSO/LRO.
      - 1000x as many system calls, due to doing all the packet handling in userspace
      - at least one more data copy (kernel -> user) due to data handling in userspace.
    

Some of these can be solved, by either moving QUIC into the kernel, or by
using a DPDK-like userspace networking solution. However, the lack of TSO/LRO
even by itself is a killer for performance.
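For scale, the counts above can be reproduced with a quick back-of-the-envelope script. These are the same rough estimates used in the comparison, not measurements:

```python
# Rough counts for serving a 100 MB file, mirroring the comparison above.

FILE_BYTES = 100 * 1000 * 1000

# TCP path: 2 MB async sendfile calls, ~16 KB average chunks to the NIC,
# client acks every other 1500-byte frame, acks collapsed 2:1 by LRO.
tcp_syscalls = FILE_BYTES // (2 * 1000 * 1000)   # 50
tcp_tx_stack_runs = FILE_BYTES // (16 * 1000)    # 6,250
tcp_acks = FILE_BYTES // 1500 // 2               # ~33,333
tcp_rx_stack_runs = tcp_acks // 2                # ~16,666

# QUIC path: one send syscall per 1500-byte packet, one ack per two
# packets, and every ack handled by a userspace wakeup.
quic_tx_syscalls = FILE_BYTES // 1500            # ~66,666
quic_acks = quic_tx_syscalls // 2                # ~33,333

print("TCP:  syscalls", tcp_syscalls,
      "stack runs", tcp_tx_stack_runs + tcp_rx_stack_runs)
print("QUIC: syscalls", quic_tx_syscalls + quic_acks,
      "stack runs", quic_tx_syscalls + quic_acks)
# QUIC ends up roughly three orders of magnitude ahead of TCP's 50 syscalls,
# and ~4x the network stack traversals.
```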

Disclaimer: I work on CDN performance. We've served 90Gb/s with a 12-core
Xeon-D. To serve the same amount of traffic with QUIC, you'd probably need
multiple Xeon Gold CPUs. I guess that Google can afford this.

~~~
toast0
In addition to those downsides, the QUIC spec points out that middleboxes tend
to time out UDP streams pretty aggressively, so it recommends a ping timer of
10 seconds.

Additionally, since QUIC streams allow for client IP mobility, that creates an
additional challenge for IP level load balancing as well as handling at the
host level. In a well configured host, TCP packets for a given stream will
always arrive at the same NIC queue, on the same CPU, allowing the TCP data
structure to be local to that CPU and avoid cross-CPU locks. In QUIC, the next
packet can come from a new IP, which could be ECMP routed to a different host,
or arrive on a different NIC queue and a different CPU. Perhaps, your ECMP
router and NIC can be taught to look for the QUIC connection IDs, but that
doesn't seem at all certain.
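A toy sketch of the affinity problem (SHA-256 standing in for the Toeplitz hash real NICs and routers use; `rss_queue` and `cid_queue` are hypothetical names): hashing the UDP 4-tuple sends a migrated flow to a different queue, while hashing the connection ID would keep it stable.

```python
import hashlib

def rss_queue(src_ip, src_port, dst_ip, dst_port, num_queues=12):
    # Stand-in for a NIC RSS / router ECMP hash over the UDP 4-tuple.
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % num_queues

def cid_queue(connection_id: bytes, num_queues=12):
    # Hypothetical connection-ID-aware steering: stable across IP moves.
    return int.from_bytes(hashlib.sha256(connection_id).digest()[:4], "big") % num_queues

cid = b"\x8a\x11example-cid"
before = rss_queue("198.51.100.7", 40000, "203.0.113.1", 443)
after = rss_queue("192.0.2.99", 51000, "203.0.113.1", 443)  # client moved networks
# The 4-tuple hash may now pick a different queue/CPU (or ECMP next hop);
# a hash over the connection ID never changes:
assert cid_queue(cid) == cid_queue(cid)
print("4-tuple queue before/after move:", before, after)
```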

~~~
Misdicorl
That's not really a fair comparison. In the case that the IP changes for quic,
tcp would have to completely re-establish the connection. A cross core memory
access is tiny in comparison.

------
move-on-by
As far as I’ve been able to determine, QUIC suffers from the same SNI data
leak that existing TLS versions over TCP have. I understand that ESNI is being
(or is already?) included in the TLS 1.3 spec, but it’s obviously optional at
this point.

Anyways, since QUIC is being touted everywhere as being very secure:

> [QUIC] protects both the data and the transport protocol itself

It seems like missing ESNI as a required feature is a bit of a glaring
omission. Does anyone have a better understanding? To me, it seems like a
great opportunity to make ESNI required for HTTP/3, much like how browsers
made TLS required for HTTP/2. I would love further insights if anyone has
any.

~~~
toast0
The ESNI spec I've seen has clients request DNS TXT record to get the public
key for the encryption. My pessimistic assumption is that a majority of
clients are configured to use recursive DNS servers that will be unable to
serve TXT results because of network issues.

In any case, it's hard for a client to determine whether a TXT record is
missing because the authoritative server has no such record, or because
something in the middle has blocked it (due to network incompetence or active
malice), so if you want that to work, you're going to need to specify DNS over
HTTPS to a trusted third party, and de-decentralize DNS.

That said, from a brief look at the spec, the ESNI extension includes a digest
of the key record, so while an observer can't directly read the SNI, given a
sufficient effort to find the keys, they could correlate the digests with
matching hostnames.

Reaching key agreement to exchange identity information without disclosing the
identities in the clear is somewhere between really hard and impossible.

~~~
tialaramex
> That said, from a brief look at the spec, the ESNI extension includes a
> digest of the key record, so while an observer can't directly read the SNI,
> given a sufficient effort to find the keys, they could correlate the digests
> with matching hostnames.

I think you've probably misunderstood what's going on here.

ESNIKeys, the data structure you're talking about, isn't a key for a specific
name, it's the key for the frontend server that's agreed to do ESNI and can be
used for ALL names offered on that frontend server.

Whatever name you're asking about, you get the same ESNIKeys values. Whether
you wanted cat-photos.example.org, nazi-death-squad.example.net, or
boring-corporate.example.com: if they are all hosted on 10.20.30.40 (or that's
a TLS load balancer for perhaps different backends that don't face the outside
world), they all have the same ESNIKeys.

The fact they're shared is why there's a length field. We can only safely
protect names by padding them. Otherwise it doesn't take a genius to spot that
this-very-long-name.subdomains-matter-too.example.com encrypts to a far larger
structure than short.example. The length field says I promise all the names
I'm protecting with ESNI will fit in a name structure this long, just pad the
shorter ones.
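The padding idea can be sketched like this (illustrative only; the real ESNI drafts define a specific padded structure, and `PADDED_LENGTH` here is an arbitrary choice):

```python
# Every protected name is padded to the same advertised length before
# encryption, so the ciphertext size leaks nothing about the name.

PADDED_LENGTH = 64  # the "all my names fit in a structure this long" promise

def pad_name(hostname: str) -> bytes:
    raw = hostname.encode("ascii")
    if len(raw) > PADDED_LENGTH:
        raise ValueError("name longer than the advertised padded length")
    return raw + b"\x00" * (PADDED_LENGTH - len(raw))

a = pad_name("short.example")
b = pad_name("this-very-long-name.subdomains-matter-too.example.com")
assert len(a) == len(b) == PADDED_LENGTH  # observer sees identical sizes
```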

The digest is in the ESNI setup because this way a server can go "Oh, you've
got last week's keys somehow. No, those won't work" or equally "Those are my
Cloudflare keys! We only use Cloudflare in North America, this is a European
server, we do AWS here, why have you got those?". Without a digest you have no
clue why this idiot client is sending you gibberish and you can't do
diagnostics.

------
felixhandte
This is a timely post, since IETF 104 is happening this week in Prague[1]. The
QUIC working group will be meeting on Tuesday and Wednesday to make progress
on standardization[2].

[1]
[https://datatracker.ietf.org/meeting/104/agenda.html](https://datatracker.ietf.org/meeting/104/agenda.html)

[2] [https://datatracker.ietf.org/doc/draft-ietf-quic-
transport/](https://datatracker.ietf.org/doc/draft-ietf-quic-transport/)

~~~
jabl
What about QUIC and L4S / TCP Prague? Are people working on something
equivalent for QUIC as well, or are they reimplementing TCP Reno in QUIC?

~~~
patrickmcmanus
tl;dr: congestion control is basically pluggable.

Much like in TCP, congestion control really isn't something required for
interoperation between peers. Given the userspace nature of QUIC I would
expect to see a lot of iteration on this front - for good and bad. (but
hopefully the bad iterates quickly).

The current drafts describe NewReno in detail, but also explicitly call out
the ability to run other things. I've seen Reno, CUBIC, and BBR all run with
QUIC and anticipate others will happen as well. That's one of the exciting
things here.
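A toy sketch of what "pluggable" means in practice. The interface and the `NewRenoLike` class are illustrative, not taken from any real QUIC stack, and cwnd is tracked in packets rather than bytes for brevity:

```python
# The transport only needs a small interface; NewReno, CUBIC, and BBR are
# interchangeable implementations behind it.

class CongestionController:
    def on_ack(self, acked: int): ...
    def on_loss(self): ...

class NewRenoLike(CongestionController):
    def __init__(self, cwnd=10):
        self.cwnd = cwnd
        self.ssthresh = float("inf")

    def on_ack(self, acked=1):
        if self.cwnd < self.ssthresh:
            self.cwnd += acked              # slow start: exponential growth
        else:
            self.cwnd += acked / self.cwnd  # congestion avoidance: ~+1/RTT

    def on_loss(self):
        self.ssthresh = self.cwnd / 2       # multiplicative decrease
        self.cwnd = self.ssthresh

cc = NewRenoLike()
for _ in range(10):
    cc.on_ack()
print(cc.cwnd)   # → 20 (grew in slow start)
cc.on_loss()
print(cc.cwnd)   # → 10.0 (halved)
```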

------
scurvy
QUIC will also usher in a new era of volumetric DDoS attacks. No longer can
content providers use upstream ACLs to block udp garbage and fragments. The
only option will be to use Fastly, AWS, or Cloudflare to ride out attacks.

QUIC is the tool to bring about the next phase of Internet centralization by
the mega players.

~~~
lclarkmichalek
QUIC actually requires that request packets must be larger than the responses,
until a handshake has been performed, in order to prevent reflection attacks.
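In the published drafts the concrete mechanism is an amplification limit plus a minimum Initial packet size; a rough sketch of the server-side accounting (class and method names are hypothetical):

```python
# Before the client's address is validated, a QUIC server may send at most
# ~3x the bytes it has received on that path, and the client's first
# (Initial) packet must be padded to at least 1200 bytes.

AMPLIFICATION_LIMIT = 3
MIN_INITIAL_SIZE = 1200

class UnvalidatedPath:
    def __init__(self):
        self.bytes_received = 0
        self.bytes_sent = 0
        self.validated = False

    def on_datagram_received(self, size: int):
        if self.bytes_received == 0 and size < MIN_INITIAL_SIZE:
            raise ValueError("client Initial must be padded to 1200 bytes")
        self.bytes_received += size

    def can_send(self, size: int) -> bool:
        if self.validated:
            return True
        return self.bytes_sent + size <= AMPLIFICATION_LIMIT * self.bytes_received

path = UnvalidatedPath()
path.on_datagram_received(1200)
assert path.can_send(3600)       # at most 3x what the client sent
assert not path.can_send(3601)   # so a reflector gains almost no amplification
```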

~~~
skybrian
Do you mean larger or smaller? I thought the problem was amplification.

~~~
lclarkmichalek
Woops, fixed, thanks :)

------
collinf
> These interposing network elements, called middleboxes, often unwittingly
> disallow changes to TCP headers and behavior, even if the server and the
> client are both willing.

There is nothing worse than finding out that someone who isn't even at the
company anymore decided years ago to deploy some crap like this. It drives me
absolutely crazy when silos in companies mean that transitioning away from it
involves on the order of 4-5 different "components" needing to change.

~~~
marcosdumay
Oh, modern networks are basically just a single huge middlebox with servers on
one side and intra|internet on the other side.

There isn't much opportunity for people to plug random stuff between your
server and the middlebox (the main middlebox would disallow it, like anything
else), but there is still plenty of crappy rules everywhere and nobody knows
why they exist or what they are. And you can't even call your ex-coworker and
ask for help, because it's an ex-employee of the middlebox company, not yours.

------
mcguire
" _TCP Fast Open is a stellar example of one such modification to TCP: eight
years after it was first proposed, it is still not widely deployed, largely
due to middleboxes._ "

Anyone remember TTCP?

~~~
londons_explore
Fast Open is a bad idea for a bunch of other reasons, mainly the client
spoofing their address yet still being able to use a lot of resources on the
server.

~~~
tialaramex
Where would the client get a valid cookie from if they are "spoofing their
address" ?

If they don't have a valid cookie Fast Open costs the same as regular TCP in
the face of adversaries trying to DOS you. You examine the packet, it doesn't
have a valid cookie, you discard it. No further work, just like ordinary TCP.
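A sketch of the cookie check (illustrative; Linux actually derives TFO cookies with AES over the client IP rather than this hypothetical HMAC scheme):

```python
import hashlib
import hmac
import os

# The cookie is a server-keyed MAC over the client's IP, so only a client
# that can actually receive packets at that IP ever learns a valid cookie.

SERVER_KEY = os.urandom(16)  # rotated periodically on a real server

def make_cookie(client_ip: str) -> bytes:
    return hmac.new(SERVER_KEY, client_ip.encode(), hashlib.sha256).digest()[:8]

def accept_fast_open(client_ip: str, cookie: bytes) -> bool:
    # Invalid cookie: discard the data, fall back to a normal 3-way handshake.
    return hmac.compare_digest(cookie, make_cookie(client_ip))

real = make_cookie("192.0.2.10")
assert accept_fast_open("192.0.2.10", real)
assert not accept_fast_open("198.51.100.9", real)  # spoofer can't reuse it
```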

------
ioquatix
aka let's just put everything in the application layer because solving it at
the protocol layer is too difficult.

~~~
cdmckay
The article gives an example of why it’s difficult to improve TCP
further.

TCP Fast Open was standardized 8 years ago and is barely used. This is because
updating TCP requires kernel updates, which just isn’t going to happen on most
mobile devices.

Thus, moving the protocol to userspace makes a lot of sense.

~~~
zamadatix
> Thus, moving the protocol to userspace makes a lot of sense.

Raw IP sockets are accessible from the same userspace-facing APIs as e.g. UDP
sockets and don't require climbing up the stack. Unfortunately operating
systems started to consider custom protocol implementations security risks but
rather than reverse that thinking we've just continued to abstract up past it.

In reality I think "where it is implemented in code" was a small portion of
QUIC's design choices compared to "IPv4 NAT & external firewalling has
ossified protocols", which is a similar story of "just abstract up to avoid
the issues". Unfortunately in that case I don't think abstracting up is as
permanent a solution as it was on the OS side.

~~~
bdonlan
Raw sockets don't really allow for multiple applications to use the same
custom protocol. If, for example, chrome and firefox were both running, which
gets packets destined for the QUIC transport protocol? The kernel wouldn't
know; without the UDP header it can't distinguish flows.

Likewise NAT devices typically support UDP flows today due to their prevalence
in games, but if you introduce a new transport protocol at the IP layer, they
wouldn't be able to identify which flow (and therefore which NATed endpoint)
the packet is destined for.

~~~
vbezhenar
Chrome and Firefox could develop a standardized system service which would
deliver packets to the proper application. NAT is not needed in a bright IPv6
world of the future.

Though I don't know what's wrong with UDP. 8 bytes of overhead for 1450 bytes
IP payload is 0.5% bandwidth. Checksum overhead should be negligible.
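The overhead arithmetic, for the record:

```python
# 8 bytes of UDP header per ~1450-byte payload is about half a percent.
UDP_HEADER = 8
PAYLOAD = 1450
overhead = UDP_HEADER / (UDP_HEADER + PAYLOAD)
print(f"{overhead:.2%}")  # → 0.55%
```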

~~~
brozaman
> system service which would deliver packets to the proper application

That's expensive. Even if you avoid copying the packets by sharing memory
between processes, there are still a lot of context switches...

------
tyingq
Hopefully vendors like Forcepoint are trying to keep up. The first rollout of
QUIC worked terribly in a lot of corporate environments because these MITM
content filtering solutions didn't pay attention.

~~~
windexh8er
It will be interesting because it's a problem for all middleboxes that do any
sort of deep packet inspection. Most of the devices that fall into this
category today leverage many performance gains made by the assumption that 1)
the majority of network layer traffic is TCP and 2) they have access to
certain levels of metadata for free.

Things are changing and getting a lot more difficult with HTTP/3 (IETF QUIC)
and TLS1.3. Many vendors are claiming TLS1.3 support today, but the
interesting thing is nobody is talking about the dismal performance
implications it has on packet processing. With TLS1.3, even without HTTP/3,
all sessions must use PFS for transform selection. And on top of that, with
0RTT, if a client gets to the server before the middlebox does, then, I
believe, it becomes a failure scenario for the end-user experience. Security vendors like
Fortinet, Forcepoint, Palo Alto Networks and Cisco are all up against the wall
long term. Consider they sell these devices for millions of dollars per device
in larger variations. Now we're taking a device that claims tens to hundreds
of gigabits of deep packet processing down to, what, a fraction of that? They
won't share the performance impact with customers - because that will impact
financials, which will flow down to stock price, etc, etc. I feel as though
companies that bank on the middlebox (ie NGFW) know of the impending
apocalypse but are choosing, collectively, to stay quiet. Cisco did have an
article that indirectly admitted this but only in context of TLS1.3 and not
HTTP/3 [0].

What is the general consensus of others as we see HTTP/3 gain popularity? None
of the aforementioned vendors do MitM decrypt with Google properties riding
Google QUIC today, as ultimately they can't. The "security" coverage then
moves to software / endpoints to pick up the pieces (where plaintext traffic
is still available). But in the meantime I feel like the consumer of these
products is being told nothing for the sole upside of financials. I used to be
a huge proponent of NGFW and the visibility they brought. However I feel as
though those devices now give a very high false sense of security as they are
only able to catch very low-hanging fruit and are simple to bypass [1]. I'm
curious what the collective here thinks about the future of hardware network
security, and with that even SaaS-based offerings (eg ZScaler).

TL;DR: If you're a CISO/CSO, is it now a fool's errand to continue investing
money in middleboxes, with stronger crypto enforcement on the
horizon?

[0] [https://blogs.cisco.com/security/tls-1-3-and-forward-
secrecy...](https://blogs.cisco.com/security/tls-1-3-and-forward-secrecy-
count-us-in-and-heres-why) [1] [https://http-evader.semantic-
gap.de/](https://http-evader.semantic-gap.de/)

~~~
tialaramex
In the cases I've noticed, the middlebox vendor claiming TLS 1.3 only means
that their product is no longer critically insecure in the face of TLS 1.3. It
can't actually speak TLS 1.3; it just knows to say "Sorry, TLS 1.2 only"
without breaking everything.

In my country we had many televisions labelled HD Ready when HD television
first became available. Were these actually ready to play HD television? Er,
no. They could however tolerate existing in a world with HD while not being HD
themselves and this was what they marketed as "HD Ready".

Do you have examples where they actually do TLS 1.3?

~~~
tyingq
Good article on that topic, _"TLS 1.3 and Proxies"_:
[https://www.imperialviolet.org/2018/03/10/tls13.html](https://www.imperialviolet.org/2018/03/10/tls13.html)

HN discussion here:
[https://news.ycombinator.com/item?id=16564935](https://news.ycombinator.com/item?id=16564935)

~~~
tialaramex
Months after that post, at least two famous brand middleboxes were found to be
incompatible with the finished TLS 1.3 because somebody cut corners as
follows:

The specification says: YOU must choose RANDOM numbers otherwise bad things
could happen.

[ TLS 1.3 final hides a downgrade signal in those random numbers if you appear
to only speak TLS 1.2. The TLS 1.2 specification says nothing about a
downgrade signal, so if you recognise the signal that means you wanted TLS 1.3
but the server has been told you wanted TLS 1.2, a downgrade attack is being
attempted. Abort! ]

These famous brand middleboxes were too lazy to make random numbers, they'd
just take the exact numbers the real server picked and use those. Those are
random right? What could go wrong?

The result was that the TLS 1.2 Downgrade signal would get copied into
supposedly "fresh" TLS 1.2 connections and trip the abort mechanism.

Just an incompatibility right? Nope. For the years that this idiocy was in
those products they weren't actually delivering security, the requirement that
you pick RANDOM numbers is there for a good reason - if sophisticated bad guys
knew this "bug" was present in the famous brand middleboxes they could
definitely have exploited this to snoop connections.
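The downgrade signal described here is concrete in the final spec (RFC 8446): a server negotiating TLS 1.2 from a 1.3-capable stack embeds a fixed marker in the last 8 bytes of its ServerHello random. A sketch of how a lazy copy trips it (`client_checks` is a hypothetical name):

```python
import os

# RFC 8446 sentinel for "I downgraded you to TLS 1.2": the ASCII bytes
# "DOWNGRD" followed by 0x01, placed in the last 8 bytes of the random.
DOWNGRADE_TLS12 = bytes.fromhex("444f574e47524401")

def client_checks(server_random: bytes, negotiated: str, client_max: str):
    # A client that offered 1.3 but was given 1.2 looks for the sentinel.
    if client_max == "1.3" and negotiated == "1.2" \
            and server_random[-8:] == DOWNGRADE_TLS12:
        raise ConnectionError("downgrade attack suspected: abort")

# The origin server honestly spoke TLS 1.2 to the middlebox, so its random
# carries the sentinel. The lazy middlebox then reuses those exact bytes
# in its own "fresh" TLS 1.2 session with the real client:
origin_random = os.urandom(24) + DOWNGRADE_TLS12
try:
    client_checks(origin_random, negotiated="1.2", client_max="1.3")
except ConnectionError as e:
    print(e)  # the copied sentinel trips the abort
```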

------
Improvotter
I'm currently working with HTTP/2 (more specifically HAS with HTTP/2 Server
Push) and it's just a huge pain to find a high-level library that can help
with this. I fear that it'll take even longer for HTTP/3 to be adopted, or
that HTTP/2 might just be skipped altogether. Why are there so many
server-side implementations across a variety of languages, while many still
lack some features or a client-side implementation altogether?

~~~
vbezhenar
You can implement HTTP with a few hundred LoC. It's an extremely simple
protocol. TLS is not simple, but it's independent of HTTP, so you can use a
separate implementation. HTTP/2 seems much harder.

------
ignoramous
/offtopic

Good folks at fastly, I hope you're reading... I've been waiting very
patiently for part 3 of this series for a good part of 2yrs now:
[https://www.fastly.com/blog/building-and-scaling-fastly-
netw...](https://www.fastly.com/blog/building-and-scaling-fastly-network-
part-2-balancing-requests)

------
ex3ndr
Are there ready-to-use mobile libraries for QUIC?

~~~
charleslmunger
[https://developer.android.com/guide/topics/connectivity/cron...](https://developer.android.com/guide/topics/connectivity/cronet)

------
cagenut
so the Aristas are gonna be able to ECMP on it?

~~~
wmf
It still has UDP headers with port numbers that can be used for ECMP.
[https://tools.ietf.org/id/draft-ietf-quic-
manageability-00.h...](https://tools.ietf.org/id/draft-ietf-quic-
manageability-00.html)

------
http333
It seems that QUIC is a new transport protocol created to replace TCP. QUIC
uses UDP and TLS 1.3, and solves the head-of-line blocking problem present in
HTTP/2. Furthermore, sending the data encrypted allows QUIC to begin
transferring data earlier. An experiment by Google shows that in connections
with high latency or loss, QUIC gives a 15% reduction in the highest latencies.

