Curl HTTP/3 Performance

hlandau · 2024-01-28T10:01:45 1706436105

Author of the OpenSSL QUIC stack here. Great writeup.

TBQH, I'm actually really pleased with these performance figures - we haven't had time yet to do this kind of profiling or make any optimisations. So what we're seeing here is the performance prior to any kind of concerted measurement or optimisation effort on our part. In that context I'm actually very pleasantly surprised at how close things are to existing, more mature implementations in some of these benchmarks. Of course there's now plenty of tuning and optimisation work to be done to close this gap.

apitman · 2024-01-28T16:15:05 1706458505

I'm curious if you've architected it in such a way that it lends itself to optimization in the future? I'd love to hear more about how these sorts of things are planned, especially in large C projects.

hlandau · 2024-01-28T16:42:34 1706460154

As much as possible, yes.

With something like QUIC "optimisation" breaks down into two areas: performance tuning in terms of algorithms, and tuning for throughput or latency in terms of how the protocol is used.

The first part is actually not the major issue, at least in our design everything is pretty efficient and designed to avoid unnecessary copying. Most of the optimisation I'm talking about above is not about things like CPU usage but things like tuning loss detection, congestion control and how to schedule different types of data into different packets. In other words, a question of tuning to make more optimal decisions in terms of how to use the network, as opposed to reducing the execution time of some algorithm. These aren't QUIC specific issues but largely intrinsic to the process of developing a transport protocol implementation.

It is true that QUIC is intrinsically less efficient than say, TCP+TLS in terms of CPU load. There are various reasons for this, but one is that QUIC performs encryption per packet, whereas TLS performs encryption per TLS record, where one record can be larger than one packet (which is limited by the MTU). I believe there's some discussion ongoing on possible ways to improve on this.

There are also features which can be added to enhance performance, like UDP GSO, or extensions like the currently in development ACK frequency proposal.

Matthias247 · 2024-01-28T20:14:40 1706472880

Actually the benchmarks just measure the first part (cpu efficiency) since it’s a localhost benchmark. The gap will be most likely due to missing GSO if it’s not implemented. Its such a huge difference, and pretty much the only thing which can prevent QUIC from being totally inefficient.

apitman · 2024-01-30T14:51:36 1706626296

Thank you for the details!

benreesman · 2024-01-28T16:45:39 1706460339

Thank you kindly for your work. These protocols are critically important and more the more high-quality and open implementations exist the more likely they are to be free and inclusive.

Also, hat tip for such competitive performance on an untuned implementation.

spullara · 2024-01-28T18:56:21 1706468181

Are there good reasons to use HTTP3/QUIC that aren't based on performance?

zamadatix · 2024-01-28T19:53:43 1706471623

I suppose that depends on your definitions of "good" and what counts as being "based on performance". For instance QUIC and HTTP/3 support better reliability via things like FEC and connection migration. You can resume a session on a different network (think going from Wi-Fi to cellular or similar) instead of recreating the session and FEC can make the delivery of messages more reliable. At the same time you could argue both of these ultimately just impact performance depending on how you choose to measure them.

Something more agreeably not performance based is the security is better. E.g. more of the conversation is enclosed in encryption at the protocol layer. Whether that's a good reason depends on who you ask though.

Matthias247 · 2024-01-28T19:52:32 1706471552

We need to distinguish between performance (throughput over a congested/lossy connection) and efficiency (cpu and memory usage). Quic can achieve higher performance, but will always be less efficient. The linked benchmark actually just measures efficiency since it’s about sending data over loopback on the same host

jzwinck · 2024-01-28T20:10:41 1706472641

What makes QUIC less efficient in CPU and memory usage?

drewg123 · 2024-01-28T20:32:13 1706473933

Quic throws away roughly 40 years of performance optimizations that operating systems and network card vendors have done for TCP. For example (based on the server side)

- sendfile() cannot be done with QUIC, since the QUIC stack runs in userspace. That means that data must be read into kernel memory, copied to the webserver's memory, then copied back into the kernel, then sent down to the NIC. Worse, if crypto is not offloaded, userspace also needs to encrypt the data.

- LSO/LRO are (mostly) not implemented in hardware for QUIC, meaning that the NIC is sent 1500b packets, rather than being sent a 64K packet that it segments down to 1500b.

- The crypto is designed to prevent MiTM attacks, which also makes doing NIC crypto offload a lot harder. I'm not currently aware of any mainstream (eg, not an FPGA by a startup) that can do inline TLS offload for QUIC.

There is work ongoing by a lot of folks to make this better. But at least for now, on the server side, Quic is roughly an order of magnitude less efficient than TCP.

I did some experiments last year for a talk I gave which approximated loosing the optimizations above. https://people.freebsd.org/~gallatin/talks/euro2022.pdf For a video CDN type workload with static content, we'd go from being about to serve ~400Gb/s per single-core AMD "rome" based EPYC (with plenty of CPU idle) to less than 100Gb/s per server with the CPU maxed out.

Workloads where the content is not static and has to be touched already in userspace, things won't be so comparatively bad.

tialaramex · 2024-01-28T21:22:49 1706476969

- The crypto is designed to prevent MiTM attacks, which also makes doing NIC crypto offload a lot harder.

Huh? Surely what you're doing in the accelerated path is just AES encryption/ decryption with a parameterised key which can't be much different from TLS?

Matthias247 · 2024-01-28T20:24:53 1706473493

Among others: having to transmit 1200-1500byte packets individually to the kernel, which it will all route, filter (iptables, nftables, ebpf) individually instead of just acting on much bigger data chunks for TCP. With GSO it gets a bit better, but it’s still far off from what can be done for TCP.

Then there’s the userspace work for assembling and encrypting all these tiny packets individually, and looking up the right datastructures (connections, streams).

And there’s challenges load balancing the load of multiple Quic connections or streams across CPU cores. If only one core dequeues UDP datagrams for all connections on an endpoint then those will be bottlenecked by that core - whereas for TCP the kernel and drivers can already do more work with multiple receive queues and threads. And while one can run multiple sockets and threads with port reuse, it poses other challenges if a packet for a certain connection gets routed to the wrong thread due to connection migration. Theres also solutions for that - eg in the form of sophisticated eBPF programs. But they require a lot of work and are hard to apply for regular users that just want to use QUIC as a library.

o11c · 2024-01-28T21:24:01 1706477041

TCP has at least one unfixable security exploit: literally anybody on the network can reset your connection. Availability is 1/3 of security, remember.

foofie · 2024-01-28T13:03:21 1706447001

How awesome is that? Thank you for all your harf work. It's thanks to people such as yourself that the whole world keeps on working.

Obligatory:

https://xkcd.com/2347/

vitus · 2024-01-28T14:15:23 1706451323

It is promising to see that openssl-quic serial throughput is within 10-20% of more mature implementations such as quiche. (Which quiche, though? Is this Google's quiche, written in C++, or Cloudflare's quiche, written in Rust? It turns out that's approximately the only word that starts with "quic" that isn't a derivative of "quick".)

One of QUIC's weaknesses is that it's known to be much less CPU efficient, largely due to the lack of things like HW offload for TLS.

> Also, the HTTP/1.1 numbers are a bit unfair since they do run 50 TCP connections against the server.

To emphasize this point: no modern browser will open 50 concurrent connections to the same server for 50 GET requests. You'll see connection pooling of, uh, 6 (at least for Chrome and Firefox), so the problems of head-of-line blocking that HTTP/2 and HTTP/3 attempt to solve would have manifested in more realistic benchmarks.

Some questions I have:

- What kind of CPU is in use? How much actual hw parallelism do you have in practice?

- Are these requests actually going over the network (even a LAN)? What's the MTU?

- How many trials went into each of these graphs? What are the error bars on these?

jsty · 2024-01-28T15:52:19 1706457139

Looks like Cloudflare quiche:

https://github.com/curl/curl/blob/0f4c19b66ad5c646ebc3c4268a...

pclmulqdq · 2024-01-28T17:34:41 1706463281

Hardware offload should be protocol-independent, but I suppose most network cards assume some stuff about TLS and aren't set up for QUIC?

Matthias247 · 2024-01-28T19:47:10 1706471230

NICs assume stuff for TCP (segmentation offload) that they can’t do for UDP, or can only do in a very limited fashion (GSO).

TLS offloads are very niche. There’s barely anyone using them in production, and the benchmarks are very likely without

ndriscoll · 2024-01-28T16:37:05 1706459825

> To emphasize this point: no modern browser will open 50 concurrent connections to the same server for 50 GET requests.

They will. You just need to go bump that number in the settings. :-)

secondcoming · 2024-01-28T20:26:09 1706473569

Browsers aren't the only things that connect to servers that speak HTTP.

BitPirate · 2024-01-28T10:49:17 1706438957

The performance difference between H1/H2 and H3 in this test doesn't really surprise me. The obvious part is the highly optimised TCP stack. But I fear that the benchmark setup itself might be a bit flawed.

The biggest factor is the caddy version used for the benchmark. The quic-go library in caddy v2.6.2 lacks GSO support, which is crucial to avoid high syscall overhead.

The quic-go version in caddy v2.6.2 also doesn't adjust UDP buffer sizes.

The other thing that's not clear from the blog post is the network path used. Running the benchmark over loopback only would give TCP-based protocols an advantage if the QUIC library doesn't support MTU discovery.

Etheryte · 2024-01-28T12:59:10 1706446750

I don't think taking shots at the Caddy version being not the latest is a fair criticism to be honest. Version 2.6.2 was released roughly three months ago, so it's not like we're talking about anything severely outdated, most servers you run into in the wild will be running something older than that.

zamadatix · 2024-01-28T15:16:26 1706454986

I think you mixed up what year we're now :). Caddy 2.6.2 October 13, 2022 so it's been not 3 but 15 months since release.

Even more relevantly, HTTP/3 was first supported out of the box in 2.6.0 - released Sep 20, 2022. Even if 2.6.2 had been just 3 months old that it's from the first 22 days of having HTTP/3 support out of the box instead of the versions from the following 3 months would definitely be relevant criticism to note.

https://github.com/caddyserver/caddy/releases?page=2

francislavoie · 2024-01-28T15:54:10 1706457250

This is why I'm not a fan of debian. (I assume OP got that version from debian because I can't think of any other reason they wouldn't have used latest.) They packaged Caddy, but they never update at the rate we would reasonably expect. So users who don't pay attention to the version number have a significantly worse product than is currently available.

We have our own apt repo which always has the latest version: https://caddyserver.com/docs/install#debian-ubuntu-raspbian

diggan · 2024-01-28T16:13:11 1706458391

Stable/tested but not latest version. Or unstable/untested but latest version. Chose one.

The distribution you chose, also makes you make that choice. If you're using Debian Stable, it's because your prefer stable in favor of latest. If you use Debian Testing/Unstable, you favor latest versions before stable ones.

Can't really blame Debian as they even have two different versions, for the folks who want to make the explicit decision.

francislavoie · 2024-01-28T16:18:37 1706458717

I don't call an old version with known bugs to be "stable/tested". No actual fixes from upstream are being applied to the debian version. There are known CVEs that are unpatched in that version, and it performs worse. There's really no reason at all to use such an old version. The only patches debian applied are the removal of features they decided they don't like and don't want packaged. That's it.

diggan · 2024-01-28T16:22:10 1706458930

By that definition, almost no software in Debian could be called "stable", as most software has at least one known bug.

When people talk about "stableness" and distributions, we're usually referring to the stableness of interfaces offered by the distribution together with the package.

> There's really no reason at all to use such an old version

Sometimes though, there is. And for those people, they can use the distribution they wish. If you want to get the latest versions via a package repository, use a package repository that fits with what you want.

But you cannot dictate what others need or want. That's why there is a choice in the first place.

zamadatix · 2024-01-28T17:33:38 1706463218

Stableness of interfaces is supposed to imply the software version is still maintained though. E.g. how stable kernel versions get backports of fixes from newer versions without introducing major changes from the newer versions. It's not meant to mean you e.g. get an old version of the kernel which accumulates known bugs and security issues. If you want the latter you can get that on any distro, just disable updates.

But you're right people are free to choose. Every version is still available on the Caddy GitHub releases page for example. What's being talked about here is the default behavior not aligning with the promise of being a maintained release, instead being full of security holes and known major bugs. It's unrelated to whether Debian is a stable or rolling distro rather about the lack of patches they carry for their version.

diggan · 2024-01-28T21:11:30 1706476290

> Stableness of interfaces is supposed to imply the software version is still maintained though. E.g. how stable kernel versions get backports of fixes from newer versions without introducing major changes from the newer versions. It's not meant to mean you e.g. get an old version of the kernel which accumulates known bugs and security issues. If you want the latter you can get that on any distro, just disable updates.

I'm sure the volunteers working on this stuff is doing the best they can, but stuff like this isn't usually "sexy" enough to attract a ton of attention and care, compared to other "fun" FOSS work.

csande17 · 2024-01-28T23:59:43 1706486383

Do you have an example of a CVE affecting Caddy that's not patched in Debian? In my experience they've been pretty responsive to security reports, including in the "long tail" of obscure / buggy packages.

For example, in December they noticed this CVE and determined it didn't apply to them because it was in one of the features they removed from Caddy: https://security-tracker.debian.org/tracker/CVE-2023-50463

Etheryte · 2024-01-28T15:59:54 1706457594

Oh, right you are, somehow I completely mixed that up. Thanks for clarifying.

nezirus · 2024-01-28T11:52:07 1706442727

Maybe shout out to HAProxy people, like many they've observed performance problems with OpenSSL 3.x series. But having good old OpenSSL with QUIC would be so convenient for distro packages etc

https://github.com/haproxy/wiki/wiki/SSL-Libraries-Support-S...

samueloph · 2024-01-28T13:15:11 1706447711

Nice write-up.

I'm one of the Debian maintainers of curl and we are close to enabling http3 on the gnutls libcurl we ship.

We have also started discussing the plan for enabling http3 on the curl CLI in time for the next stable release.

Right now the only option is to switch the CLI to use the gnutls libcurl, but looks like it might be possible to stay with openssl, depending on when non-experimental support lands and how good openssl's implementation is.

pabs3 · 2024-01-29T04:29:35 1706502575

Any chance of WebSocket being enabled too?

samueloph · 2024-01-29T09:46:38 1706521598

That's still an experimental feature on curl's side so I'm not sure. https://everything.curl.dev/helpers/ws/support

mistrial9 · 2024-01-28T14:54:49 1706453689

maybe the right time to clean up the unexpected and awkward set of libs that are currently installed, too ?

londons_explore · 2024-01-28T14:23:59 1706451839

Anyone else disappointed that the figures for localhost are in MB/s not GB/s?

The whole lot just seems an order of magnitude slower than I was hoping to see.

zamadatix · 2024-01-28T20:09:59 1706472599

A core of the 4770 (curl is single threaded) can't even manage a full order of magnitude more plain AES encryption throughput - ignoring it also has to be done into small packets and decrypted on the same machine.

mgaunard · 2024-01-28T10:07:40 1706436460

HTTP/1 remains the one with the highest bandwidth.

No surprise here.

1vuio0pswjnm7 · 2024-01-28T10:43:21 1706438601

It comes from a time before websites sucked because they are overloaded with ads ads and tracking.

For non-interactively retrieving a single page of HTML, or some other resource, such as a video, or retrieving 100 pages, or 100 videos, in a single TCP connection, without any ads or tracking, HTTP/3 is overkill. It's more complex and it's slower than HTTP/1.0 and HTTP/1.1.

sylware · 2024-01-28T11:56:09 1706442969

I have a domestic web server, I did implement its code, and the most important was HTTP1.[01] to be very simple to implement, that to lower the cost of implementing my real-life HTTP1.[01] alternative (we all know Big Tech does not like that...).

The best would be to have something like SMTP: the core is extremely simple and yet real-life-works everywhere, and via announced options/extensions it can _optionnaly_ grow in complexity.

BitPirate · 2024-01-28T10:17:36 1706437056

It's a bit like drag racing. If all you care about is the performance of a single transfer that doesn't have to deal with packet loss etc, HTTP/1 will win.

vbezhenar · 2024-01-28T13:58:55 1706450335

Yesterday I was trying to track weird bug. I moved a website to Kubernetes and its performance was absolutely terrible. It was loading for 3 seconds on old infra and now it spends consistently 12 seconds loading.

Google Chrome shows that 6 requests require 2-3 seconds to complete simultaneously. 3 of those requests are tiny static files served by nginx, 3 of those requests are very simple DB queries. Each request completes in few milliseconds using curl, but few seconds in Google Chrome.

Long story short: I wasn't able to track down true source of this obviously wrong behaviour. But I switched ingress-nginx to disable HTTP 2 and with HTTP 1.1 it worked as expected, instantly serving all requests.

I don't know if it's Google Chrome bug or if it's nginx bug. But I learned my lesson: HTTP 1.1 is good enough and higher versions are not stable yet. HTTP 3 is not even supported in ingress-nginx.

dilyevsky · 2024-01-28T19:14:08 1706469248

> I moved a website to Kubernetes and its performance was absolutely terrible. It was loading for 3 seconds on old infra and now it spends consistently 12 seconds loading.

My guess is it has more to do with resources you probably allocated to your app (especially cpu limit) than any networking overhead which should be negligible in such a trivial setup if done correctly

vbezhenar · 2024-01-29T05:34:25 1706506465

No, I don't use cpu limits and memory limits are plentiful. Besides, the only difference between working and non-working setups were literally single switch `use-http2` in the ingress controller and it was repeatable experience.

dilyevsky · 2024-01-31T01:27:46 1706664466

That is indeed interesting. I’ve used nginx with grpc (which uses http2) years ago and definitely did not see any such latency issues even at 50k qps

mgaunard · 2024-01-28T18:50:01 1706467801

Kubernetes makes everything slow and complicated.

Why do you even need to have proxies or load balancers in between? Another invention of the web cloud people.

vbezhenar · 2024-01-29T05:39:53 1706506793

That's not my experience. My experience is that kubernetes does not introduce any noticeable performance degradation. My message was about nginx, not kubernetes.

Front TCP load balancer allows to use a single IP address for multiple backend servers. It allows for server maintenance without service interruptions. Or even server crash.

Ingress controller is nginx which does reverse proxying. It handles HTTPS termination and it handles routing to the application which is supposed to serve the request. It's either another nginx instance configured to serve static resources or some kind of HTTP server which implements an actual server logic.

I agree that having two nginxes for static resources seems weird, but that's not a big deal, because nginx does not consume much resources.

mgaunard · 2024-01-29T11:18:26 1706527106

In real life you cannot achieve that level of failure transparency without specifically building your application for it.

If request N goes to server A, then request N+1 from the same session goes to server B, most likely that request will fail because the session does not exist within that server's context.

xyzzy_plugh · 2024-01-28T14:06:53 1706450813

nginx is notoriously bad at not-HTTP 1.1. I wouldn't even bother trying.

Envoy is significantly better in this department.

doublepg23 · 2024-01-28T14:51:57 1706453517

Huh TIL. I had always considered nginx the “fast” http server.

dilyevsky · 2024-01-28T19:15:09 1706469309

Nginx is fast enough for most applications. I wouldn’t bother switching if you dont need the power of “software defined” proxy

xyzzy_plugh · 2024-01-29T02:08:21 1706494101

Performance aside, if you need fully functional HTTP/2 and maybe HTTP/3 support then you're better off not using nginx. It's a pain.

apitman · 2024-01-28T16:21:04 1706458864

What were your reasons for moving the site to kubernetes?

vbezhenar · 2024-01-29T05:29:30 1706506170

Because kubernetes is much easier to operate than dozens of scripts and docker composes thrown around servers. We have like 5 sites, each site operated by like 5-10 services thrown around 4 servers without any redundancy, autoscaling and even consistency. It's truly is a mess.

varjag · 2024-01-28T12:56:28 1706446588

It runs over TCP, you don't need to deal with packet loss.

vlovich123 · 2024-01-28T13:41:32 1706449292

What they’re suggesting is that under packet loss conditions QUIC will outperform TCP due to head of line blocking (particularly when there are multiple assets to fetch). Both TCP and QUIC abstract away packet loss but they have different performance characteristics under those conditions.

mgaunard · 2024-01-28T18:28:08 1706466488

HTTP/1 doesn't have head of line blocking, only HTTP/2 does.

vlovich123 · 2024-01-28T21:18:39 1706476719

HTTP/1 has parallelism limitations due to number of simultaneous connections (both in terms of browser and server). HTTP/2 lets you retrieve multiple resources over the same connection improving parallelism but has head of line problems when it does so. HTTP/3 based on Quic solves parallelism and head of line blocking.

acdha · 2024-01-28T19:26:33 1706469993

The only way the first part of your sentence is correct means that the second part is wrong. HTTP 1 pipelining suffers from head of line blocking just as badly (arguably worse) and the only workaround was opening multiple connections which HTTP/2 also supports.

dilyevsky · 2024-01-28T19:21:55 1706469715

Most browsers only support very limited number of connections so it kinda does

mgaunard · 2024-01-28T19:58:55 1706471935

Limitations of certain implementations are irrelevant.

The protocol does not have any such limitation.

dilyevsky · 2024-01-28T20:13:07 1706472787

Totally, after all it’s not like we live in a real world where these things matter

mgaunard · 2024-01-29T11:19:52 1706527192

Given the alternative, switching to a new protocol, requires much more code changes than fixing bad implements of the old protocol, your argument falls quite flat.

vlovich123 · 2024-01-29T16:08:55 1706544535

There are very real reasons why increasing the number of connections isn’t a thing and that has to do with the expense of creating and managing a connection server side. In fact switching to HTTP/2 was the largest speed up because you could fetch multiple resources at once. The problem of course is that multiplexing resources over a single TCP connection has head of line problems.

Could this have been solved a different way? Perhaps but QUIC is tackling a number of TCP limitations at once (eg protocol ossification due to middleware).

mgaunard · 2024-01-29T20:08:51 1706558931

Server-side there is no difference between a different context per TCP connection for a HTTP1 server and a different context per HTTP2 stream within a HTTP2 server.

If anything the HTTP2 requires more state due to the nesting nature of the contexts.

vlovich123 · 2024-01-30T06:24:50 1706595890

There 100% is. You’re thinking about user space context. But establishing a new socket in the kernel requires a fair amount of under the covers book keeping and the total number of file descriptors keeping those resources alive is a real problem for very popular web services that have to handle lots of concurrent simultaneous connections. At the application layer it’s less of an issue.

HTTP/2 is indeed more complex to maintain in the application layer but that’s less to do with the memory implications of multiplexing requests /responses over a single TCP connection.

mgaunard · 2024-01-30T09:02:23 1706605343

There is no requirement for TCP to be implemented in the kernel; if anything all of the highest-performing implementations are in userland.

frankjr · 2024-01-28T12:04:29 1706443469

It's a mystery, it's almost as if people have spent decades optimizing it.

mgaunard · 2024-01-28T12:09:01 1706443741

Or rather, it was simply designed correctly.

k8svet · 2024-01-28T12:31:10 1706445070

I know you think you're coming off smarter than everyone else, but it's not how it's landing. Turns out things are not that overly reductive to that extent at all in the real world

mgaunard · 2024-01-28T12:53:35 1706446415

Not everyone else, just Google and other HTTP/2 apologists.

nolok · 2024-01-28T13:32:12 1706448732

I'm not sure what you mean by "apologist" or what you're trying to say, I'm not the person you're answering to, and I have no dog in this fight.

But you're talking in a very affirmative manner and apparently trying to say that one is "correct" and the other is not, while absolutely ignoring context and usage.

I recommend you either run yourself, or find on the web, a benchmark about not a single HTTP request, but an actual web page, requesting the html, the css, the js, and the images. Don't even need to go modern web, even any old pre 2010 design with no font or anything else fancy will do.

You will see that HTTP 1 and 1.1 are way, way worse at it that HTTP 2. Which is why HTTP 2 was created, and why it should be used. Also the sad joke that was domain rolling to trick simultaneous request per host configurations.

Overall, your point of view doesn't make sense because this is not a winner takes all game. Plenty of server should and do also run HTTP 1 for their usage, notably file servers and the likes. The question to ask is "how many request in parallel do the user need, and how important that they all finish as close to each other as possible instead of one after the other".

Similarily, HTTP3 is mostly about latency.

otterley · 2024-01-28T15:57:52 1706457472

You can transfer as many requests in parallel with HTTP/1.1 as you like by simply establishing more TCP connections to the server. The problem is that browsers traditionally limited the number of concurrent connections per server to 3. There’s also a speed penalty incurred with new connections to a host since initial TCP window sizes start out small, but it’s unclear whether that initial speed penalty significantly degrades the user experience.

The fact that anyone running wrk or hey can coerce a web server to produce hundreds of thousands of RPS and saturate 100Gb links with plain old HTTP/1.1 with connection reuse and parallel threads (assuming of course that your load tester, server, and network are powerful enough) ought to be enough to convince anyone that the protocol is more than capable.

But whether it’s the best one for the real world of thousands of different consumer device agents, flaky networks with huge throughput and latency and error/drop rates, etc. is a different question indeed, and these newer protocols may in fact provide better overall user experiences. Protocols that work well under perfect conditions may not be the right ones for imperfect conditions.

kiitos · 2024-01-28T22:51:13 1706482273

TCP connections are bottlenecked not just by the browser/client, but also at the load-balancer/server. Modulo SO_REUSEPORT, a server can maintain at most 64k active connections, which is far below any reasonable expectation for capacity of concurrent requests. You have to decouple application-level requests from physical-level connections to get any kind of reasonable performance out of a protocol. This has been pretty well understood for decades.

otterley · 2024-01-29T01:11:21 1706490681

That limitation was overcome over 20 years ago with the invention of Direct Server Return (DSR) technology, since the remote IP becomes that of the actual client. (This also helped relieve pressure on load balancers since they don't need to process return traffic.) Another way to overcome this would be to use netblocks instead of IP addresses on both the load balancers (upstream side) and the server side (so the LB has multiple IPs to connect to and the server to respond from).

The benefit of DSR became mitigated a bit after CGNAT (in the IPv4 space anyway) began to be rolled out, since it can masquerade a large group of clients behind a single IP address. (CGNAT poses other, different problems related to fraud and abuse mitigation.)

kiitos · 2024-01-29T04:20:28 1706502028

It's not a question of IP addresses, it's about connections.

otterley · 2024-01-29T08:03:25 1706515405

Which limit, exactly, are you referring to? Both load balancers and backend servers can juggle millions of concurrent connections nowadays. You mentioned a 64k connection limit but that’s not a hard file descriptor limit, nor does the 65536 port limit apply if the source and/or destination IPs differ.

kiitos · 2024-01-31T05:20:01 1706678401

> Both load balancers and backend servers can juggle millions of concurrent connections nowadays.

Maybe with SO_REUSEPORT, but not in general.

A TCP connection is identified by a 5-tuple that requires a unique port for both the client and server. TCP represents ports as uint16s, which means the max number of possible ports per address is 65536.

tl;dr: 1 server IP address = no more than 64k incoming connections

otterley · 2024-01-31T18:36:56 1706726216

Yes, I'm aware that 4-tuples must be unique. And so, by having a LB and/or server bind to more than one IP address, you can easily overcome that limit.

kiitos · 2024-01-31T23:26:59 1706743619

If you have multiple client-accessible IP addresses available to a server, I guess? But that's in no way common?

otterley · 2024-02-01T18:41:00 1706812860

It's quite common in Kubernetes deployments, where each server in a separate Pod binds to a separate IP address.

And, as I said before, with DSR, there's a broad diversity of client IPs, so a single server address doesn't typically cause concerns with 4-tuple exhaustion.

kiitos · 2024-02-02T00:14:34 1706832874

I think we're speaking from wildly different, and incompatible, contexts.

DSR is niche at best, not something that can be assumed.

throwaway892238 · 2024-01-28T16:31:45 1706459505

That's a lot of mays. One might imagine that before this stuff becomes the latest version of an internet standard, these theoretical qualifications might be proven out, to estimate its impact on the world at large. But it was useful to one massive corporation, so I guess that makes it good enough to supplant what came before for the whole web.

mgaunard · 2024-01-28T18:30:01 1706466601

HTTP/2 or /3 were never about optimizing bandwidth, but latency.

otterley · 2024-01-28T21:01:36 1706475696

Google did a great deal of research on the question using real-world telemetry before trying it in Chrome and proposing it as a standard to the IETF’s working group. And others including Microsoft and Facebook gave feedback; it wasn’t iterated on in a vacuum. The history is open and well documented and there are metrics that support it. See e.g. https://www.chromium.org/spdy/spdy-whitepaper/

foofie · 2024-01-28T13:09:21 1706447361

> HTTP/1 remains the one with the highest bandwidth.

To be fair, HTTP/2 and HTTP/3 weren't exactly aimed at maximizing bandwidth. They were focused on mitigating the performance constraints of having to spawn dozens of connections to perform the dozens of requests required to open a single webpage.

Beldin · 2024-01-28T16:17:30 1706458650

Too bad that the alternative option - not requiring dozens of requests just for initial rendering of a single page - didn't catch on.

GuestHNUser · 2024-01-28T18:02:54 1706464974

Couldn't agree more. So many performance problems could be mitigated if people wrote their client/server code to make as few requests as possible.

Consider the case of requesting a webpage with hundreds of small images, one should embed all of the images into the single webpage! Requiring each image to be fetched in a different http request is ridiculous. It pains me to look at the network tab of modern websites.

foofie · 2024-01-28T16:42:51 1706460171

I don't think it's realistic to expect a page load to not make a bunch of requests, considering that you will always have to support use cases involving downloading many small images. Either you handle that by expecting your servers to open a dedicated connection for each download request, or you take steps for that not to be an issue. Even if you presume horizontal scaling could mitigate that problem from the server side, you cannot sidestep the fact that you could simply reuse a single connection to get all your resources, or not require a connection at all.

eptcyka · 2024-01-28T13:12:12 1706447532

HTTP3 also wants to minimize latency in bad network environments, not just mitigating the issue of too many requests.

kiitos · 2024-01-28T22:56:05 1706482565

Assuming SSL, HTTP/1 does not deliver better throughput than HTTP/2 in general.

I'm not sure why you believe otherwise. Got any references?

drowsspa · 2024-01-28T11:52:15 1706442735

Honestly, one would think that the switch to a binary protocol and then to a different transport layer protocol would be justified by massive gains in performance...

kiitos · 2024-01-31T05:34:43 1706679283

It's definitely not a given that a binary protocol + etc. will yield a "massive gain in performance" versus naive e.g. gzipped JSON over HTTP.

vlovich123 · 2024-01-28T13:43:03 1706449383

The website being tested probably isn’t complicated enough to demonstrate that difference.

drowsspa · 2024-01-28T14:05:40 1706450740

Even then, I remember the sales pitches all mentioning performance improvements in the order of about 10-20%

vlovich123 · 2024-01-28T14:14:31 1706451271

In real world conditions, not loop back synthetic benchmarks.

lttlrck · 2024-01-28T15:25:23 1706455523

I believe his point is 10-20% gain is not massive.

FWIW I don't know if that is what was claimed.

vlovich123 · 2024-01-28T17:43:56 1706463836

20% is what real world data from Google suggested: https://www.zdnet.com/article/google-speeds-up-the-web-with-...

I interpreted his comment as saying “where’s the 20% speed up” which seems like a more reasonable interpretation in context. A 20% speed up is actually quite substantial because that’s aggregate - it must mean there’s situations where it’s more as well as situations where it’s unchanged or worse (although unlikely to be worse under internet conditions).

apitman · 2024-01-28T16:30:22 1706459422

Very nice. I would love to see some numbers including simulated packet loss. That's theoretically an area h3 would have an advantage.

1vuio0pswjnm7 · 2024-01-29T00:01:57 1706486517

Would it be worthwhile to test QUIC using some other TLS library besides OpenSSL, e.g., wolfSSL. I think I read that the the cURL author is working with them, or for them. Apologies if this is incorrect.

jupp0r · 2024-01-28T20:45:13 1706474713

Great writeup, but the diagrams are downright awful. I'd separate the different facets visually to make it easier to see the difference vs those different colors.

superkuh · 2024-01-28T15:16:09 1706454969

Can cURL's HTTP/3 implementation work with self signed certs? Pretty much every other HTTP/3 lib used by major browsers do not. And since HTTP/3 does not allow for null cypher or TLS-less connections this means in order to establish an HTTP/3 connection a third party CA must be involved.

As is right now it is impossible to host a HTTP/3 server visitable by a random person you've never met without a corporate CA continually re-approving your ability to. HTTP/3 is great for corporate needs but it'll be the death of the human web.

adobrawy · 2024-01-28T16:35:10 1706459710

Given that browsers discourage HTTP traffic (warning that the connection is insecure), given how easily free SSL certificates are available, and given that HTTPS is already the standard on small hobbyist sites, I don't expect The requirement for an SSL certificate has been a blocker in HTTP/3 adoption.

ndriscoll · 2024-01-28T17:35:14 1706463314

Do browsers warn for http (beyond the address bar icon)? I don't think they ever have for my personal site. I also don't think you can really say there's a "standard" for how hobbyists do things. I'm definitely in the bucket of people who use http because browsers throw up scary warnings if you use a self-signed cert, and scary warnings aren't grandma friendly when I want to send photos of the kids. The benefit of TLS isn't worth setting up publicly signed certs to me, and I don't want to invite the extra traffic by appearing on a CT log.

Like the other poster said, it all makes sense for the corporate web. Not so much for the human web. For humans, self-signed certs with automatic TOFU makes sense, but browsers are controlled by and made for the corporate web.

jrpelkonen · 2024-01-28T16:00:05 1706457605

I really don’t want to criticize anyone or their hard work, and appreciate both curl and OpenSSL as a long time user. That said, I personally find it disappointing that in 2024 major new modules are being written in C. Especially so given that a) existing Quic modules written in Rust exist, and b) there’s a precedent for including Rust code in Curl.

Of course there are legacy reasons for maintaining existing codebases, but what is it going to take to shift away from using C for greenfield projects?

apitman · 2024-01-28T16:12:48 1706458368

Not saying you're wrong, but it's worth noting that switching to Rust is not free. Binary sizes, language complexity, and compile times are all significantly larger.

zinekeller · 2024-01-28T16:08:44 1706458124

For something like curl (which is also used in embedded systems: a legally-verified (compliant with ISO and other standards, for better or worse) Rust compiler that targets common microarchitectures is a definite first step. Fortunately, the first half of it exists (Ferrocene, https://ferrous-systems.com/ferrocene/). The second one is harder: there are architectures even GCC does not target (these architectures rely on other compilers like the Small Device C Compiler (or a verified variant) or even a proprietary compiler), and LLVM only compiles to a subset of GCC. Even if there's a GCC Rust (currently being developed fortunately), you are still leaving a lot of architectures.

jrpelkonen · 2024-01-28T16:39:57 1706459997

This is a good point: there are many niche architectures where Rust is not a viable option. But in this specific case, I don’t see these system benefiting from h3/Quic. HOL blocking etc. will rarely, if ever, be a limiting factor for the use cases involved.

secondcoming · 2024-01-28T20:30:15 1706473815

I'm personally disappointed you're aware of this issue and have done nothing about it.

teunispeters · 2024-01-28T18:36:44 1706467004

If rust could support all of C's processors and platforms and produce equivalent sized binaries - especially for embedded ... then it'd be interesting to switch to. (as a start, it also needs a stable and secure ecosystem of tools and libraries)

Right now, it's mostly a special purpose language for a narrow range of platforms.

throwaway892238 · 2024-01-28T16:26:44 1706459204

Lol, wait, HTTP2 and HTTP1.1 both trounce HTTP3? Talk about burying the lede. Wasn't performance the whole point behind HTTP3?

This chart shows that HTTP2 is more than half as slow as HTTP1.1, and HTTP3 is half as slow as HTTP2. Jesus christ. If these get adopted across the whole web, the whole web's performance could get up to 75% slower . That's insane. There should be giant red flags on these protocols that say "warning: slows down the internet"

zamadatix · 2024-01-28T20:37:49 1706474269

If the last decade of web protocol development seems backwards to you after reading one benchmark then why immediately assume it's insane and deserves a warning label instead of asking why your understanding doesn't match your expectations?

The benchmark meant to compare how resource efficient the new backend for curl is by using localhost connectivity. By using localhost connectivity any real world network considerations (such as throughput discovery, loss, latency, jitter, or buffering) are sidestepped to allow a direct measurement of how fast the backend alone is. You can't then assume those numbers have a meaningful direct extrapolation to the actual performance of the web because you don't know how the additional things the newer protocols do impact performance once you add a real network. Ingoring that, you still have to consider the notes like "Also, the HTTP/1.1 numbers are a bit unfair since they do run 50 TCP connections against the server." before making claims about HTTP2 being more than half as slow as HTTP1.1.

CharlesW · 2024-01-28T17:15:36 1706462136

> Wasn't performance the whole point behind HTTP3?

Faster, more secure, and more reliable, yes. The numbers in this article looks terrible, but real-world testing¹ shows that real-world HTTP/3 performance is quite good, even though implementations are relatively young.

"…we saw substantially higher throughput on HTTP/3 compared to HTTP/2. For example, we saw about 69% of HTTP/3 connections reach a throughput of 5 Mbps or more […] compared to only 56% of HTTP/2 connections. In practice, this means that the video streams will be of a higher visual quality, and/or have fewer stalls over HTTP/3."

¹https://pulse.internetsociety.org/blog/measuring-http-3-real...

jgalt212 · 2024-01-28T13:47:53 1706449673

Does Curl performance really matter? i.e. if it's too performant, doesn't that increase the odds your spider is blocked? Of course, if you're sharding horizontally across targets, then any performance increase is appreciated.

j16sdiz · 2024-01-28T14:02:32 1706450552

libcurl is the backend for many (RESTful) API library.

Improving upload throughput to S3 bucket would be great, right?

zamadatix · 2024-01-28T20:22:08 1706473328

What if you're not using curl as a spider? Even if you are I'd recommend some other spider design which doesn't rely on the performance of curl to set the crawling rate.