Marten is in the process of making a bunch of optimizations [0], as are we within Caddy, and 2.6 includes some of those. It will only get better with time.
Most users do not operate at nearly the volume required to feel the impact, including enterprises.
I'd be interested in repeating your experiments, learning about your real-world use case, and seeing how similar they are.
Yeah, I agree this is an edge case; 99.99% of Caddy users don't push that much data ;)
But Caddy can't be used as a large-file download server, for instance, unless you stick with HTTP/1.1 (that, and sendfile + kTLS weren't supported last time I checked?) (at least with a single HTTP instance).
Another issue for HTTP/3 is https://github.com/lucas-clemente/quic-go/wiki/UDP-Receive-B...
Current Linux under WSL, for instance, has this issue.
This will probably resolve over time as more Linux distros take HTTP/3 requirements into account.
Caddy 2.6 has some optimizations for serving large files, including sendfile support, among others. No kTLS yet, but maybe someday. It's just... kinda gnarly.
By BPS, do you mean throughput of a single request? On an unconstrained connection (like loopback)? In that case you are mostly measuring CPU efficiency, since in a real-world deployment the network connection would likely be the bottleneck. In that real-world setup the loss of CPU efficiency becomes a scaling challenge, which actually shows up as lower RPS.
Btw: A good recommendation for anyone doing benchmarking is to measure both throughput for a single connection and throughput for the whole server under lots of connections (e.g. 100 connections per core, each connection doing multiple concurrent requests). The latter will surface further scaling issues.
In terms of full-system efficiency, HTTP/2 vs HTTP/1 (with TLS) is actually not that bad for good implementations - maybe 10-50% more cost, but in the end the performance-critical parts (syscalls for TCP sockets) are the same for both. QUIC can be much worse. Well-optimized setups (which e.g. use optimized UDP APIs with GSO, or even XDP) are somewhere between 50 and 100% more expensive than TCP+TLS setups. And simpler implementations can be 5x as expensive, and might not even scale beyond a single CPU core.
The Go implementation of HTTP/2 already took a 5x hit relative to HTTP/1.1 (Go's HTTP/2 implementation is 5x slower than its HTTP/1.1 one).
With HTTP/3, our early benchmarks indicate a further 2-3x slowdown relative to HTTP/2 (so roughly 10x relative to HTTP/1.1).