Unfortunately it's TCP-only, so its usefulness is somewhat limited outside the datacentre.
For us devs, it's a shame how little software uses the LEDBAT protocol for bulk downloads. Everyone should be aware of its advantages and use it wherever possible: it allows bulk data transfer without slowing down higher-priority data streams, and it works without any OS or router support.
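To make "yields to priority traffic" concrete, here is a toy sketch of LEDBAT's window update from RFC 6817. The constants and function names are illustrative, not a real implementation:

```python
# Toy sketch of LEDBAT's congestion window update (RFC 6817).
# LEDBAT grows the window while measured queuing delay stays below a
# target, and shrinks it above the target, so it backs off before
# loss-based flows even notice congestion.
TARGET = 0.100  # target queuing delay in seconds (RFC 6817's upper bound)
GAIN = 1.0      # window gain
MSS = 1448      # assumed max segment size in bytes

def ledbat_cwnd_update(cwnd, queuing_delay, bytes_acked):
    """Grow cwnd below-target, shrink it above-target."""
    off_target = (TARGET - queuing_delay) / TARGET
    cwnd += GAIN * off_target * bytes_acked * MSS / cwnd
    return max(cwnd, 2 * MSS)  # never shrink below two segments

# Queuing delay below target: the window grows (bulk throughput)...
grown = ledbat_cwnd_update(20 * MSS, 0.020, MSS)
# ...queuing delay above target: the window shrinks, yielding the link.
shrunk = ledbat_cwnd_update(20 * MSS, 0.180, MSS)
```

That `off_target` term is the whole trick: the backoff signal is delay, not packet loss, which is why it needs no OS or router cooperation.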
For those with slow home connections, OpenWrt with SQM CAKE enabled is incredible. If you turn on the per-IP flow options, it almost perfectly eliminates bufferbloat. I run the x86-64 build of OpenWrt on one of my old desktops. We have abysmally slow DSL both at home and at the office, and it's a night-and-day difference: multiple people torrenting, uploading, and watching video, and you can still browse the web.
I assume it would be equivalent because of LEDBAT's eager backoff, but it'd still be a cool thing to read.
The big innovation in these new schedulers is a focus on fairness instead of maximum throughput. For the longest time, TCP congestion control was judged only by how close it got to the theoretical maximum throughput.
So we ended up with extremely aggressive schedulers. The challenge now is to build a scheduler that limits bufferbloat while not surrendering all its throughput to old-school bully schedulers. This is where BBR really innovates.
Also works nicely for p2p protocols that have to connect to peers on other continents.
More (but not much) information is available on Google's BBR repository: https://github.com/google/bbr
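For anyone who wants to experiment: on Linux the congestion controller can be chosen per socket via the `TCP_CONGESTION` socket option. A small sketch (assumes a Linux kernel; switching to "bbr" only works if the tcp_bbr module is available, hence the try/except):

```python
import socket

# Python exposes TCP_CONGESTION on Linux; fall back to the kernel's
# numeric constant (13) elsewhere, purely for illustration.
TCP_CONGESTION = getattr(socket, "TCP_CONGESTION", 13)

def current_cc(sock):
    """Return the congestion-control algorithm the kernel is using."""
    raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, 16)
    return raw.split(b"\0", 1)[0].decode()

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(current_cc(sock))  # typically "cubic" unless reconfigured
try:
    # Requires the tcp_bbr module to be loaded on this kernel.
    sock.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, b"bbr")
except OSError:
    pass  # bbr not available here
sock.close()
```

Handy for A/B-testing BBR against CUBIC on a single service without changing the system-wide default.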
The tricky part of HTTP/2 is that you now have two flow-control mechanisms running on top of each other, and they are rather good at harming each other instead of cooperating. E.g. a too-small HTTP/2 window can lead to poor performance, as you observed, while a too-big window can lead to excessive buffering in the application for HTTP/2 streams.
It's a rather hard challenge to get to a good compromise solution here (and not a lot of people have arrived there yet).
When in doubt, the rule of thumb is to use HTTP/2 for small requests, where the benefit of multiplexing is biggest, and to stick to HTTP/1.1 (a non-multiplexed TCP connection) for high throughput.
Alas, if you're uploading from a browser you don't get to choose the version, so you would have to set up another host or port.
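A back-of-the-envelope calculation shows why a too-small window hurts: a stream can never move faster than window / RTT, regardless of link capacity. (The numbers below are illustrative assumptions, apart from HTTP/2's default initial window.)

```python
# An HTTP/2 stream's throughput is capped at flow-control window / RTT,
# no matter how fat the pipe is.
def max_throughput_bytes_per_sec(window_bytes, rtt_seconds):
    return window_bytes / rtt_seconds

default_window = 65_535   # HTTP/2's default initial window (RFC 7540)
rtt = 0.100               # assume a 100 ms cross-continent RTT

# ~640 KB/s ceiling even on a gigabit link:
ceiling = max_throughput_bytes_per_sec(default_window, rtt)

# Window needed to fill a 100 Mbit/s link at that RTT, i.e. the
# bandwidth-delay product:
bdp = int(100e6 / 8 * rtt)  # 1,250,000 bytes
```

So unless the receiver enlarges the window well past the default, long-RTT bulk transfers crawl; and a window far beyond the BDP just converts into application-level buffering.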
Second: HTTP/3 defines an idle state for connections, with timeouts of 30 seconds or even more. So you can download something and then go quiet. With HTTP/0.9, /1.0, /1.1, and /2, the connection gets closed; with HTTP/3 it just goes idle instead. Say that after 15 seconds you need a new resource: on HTTP/3, the connection continues where it stopped, while on previous versions you would have to do a new TLS handshake and then make the request. That means a performance penalty, because congestion control starts over from values close to zero.
You can reuse connections with every HTTP version from HTTP/1.1 upward. The main difference between HTTP/1.1 and HTTP/2 and /3 is that you can't make concurrent requests on a single connection with HTTP/1.1, but you can with /2 and /3.
How long a connection stays open after a request is determined by keepalive settings on the client and server side. Every HTTP version from 1.1 upward lets you leave a connection unused for 30 s and then make another request on it.
At a higher level you have the QUIC streams, which carry user data. These are subject to per-stream flow-control windows, and there is also a shared per-connection window.
Depending on the implementation the layers might not be very much aware of each other, which can also lead to imperfect utilization.
That said, the situation is obviously still different than with TCP, since the congestion controller does not limit the amount of user data but packets in general (which includes retransmits, etc.). So the outcomes will certainly differ from what we observe with HTTP/2.
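The two-level windowing can be sketched in a few lines. This is a toy model with made-up names, not any real QUIC stack's API: a sender may only send the minimum of what the stream window and the shared connection window allow.

```python
# Toy model of QUIC's two-level flow control: each stream has its own
# window, and all streams share one per-connection window.
class FlowControl:
    def __init__(self, conn_window, stream_window):
        self.conn_left = conn_window
        self.stream_left = {}       # stream id -> remaining stream credit
        self.default = stream_window

    def sendable(self, stream_id):
        """Bytes this stream may send right now."""
        stream_left = self.stream_left.get(stream_id, self.default)
        return min(stream_left, self.conn_left)

    def on_send(self, stream_id, n):
        assert n <= self.sendable(stream_id)
        self.stream_left[stream_id] = (
            self.stream_left.get(stream_id, self.default) - n)
        self.conn_left -= n

fc = FlowControl(conn_window=10_000, stream_window=8_000)
fc.on_send(0, 8_000)        # stream 0 exhausts its own window
# Stream 4 still has 8,000 bytes of stream credit, but only 2,000 of
# connection credit remain, so it is connection-blocked at 2,000:
remaining = fc.sendable(4)
```

The interesting failure mode is exactly the one described above: one greedy stream can eat the shared connection window and starve its siblings unless the receiver hands out credit carefully.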
For example, in one place I'm still running a Linksys WRT54GL router on a 2.4 kernel that is almost 15 years old, because the device still works and the network requirements there are low.
Damn, I was not aware of this, but sure enough there it is 
I suppose that's just one more thing to reaffirm my belief that SPDY is a horribly over-engineered protocol that tried to shoehorn fixes for TCP's shortcomings onto the wrong abstraction (which, ironically, is TCP itself).
Had HTTP/2 only introduced the binary frame encoding, instead of bolting on this horrible new transport along with it, we wouldn't be so deep in this mess. QUIC can't come soon enough.
Getting this all right is also far from an easy task - and resource wise it might never catch up with HTTP/1.1 + TLS due to missing hardware support.
However, I feel like in at least some areas the QUIC specification became less complex. E.g. the use of absolute flow-control offsets instead of relative increments makes things less error-prone. And it's nice that the defaults for the dynamic table are now "off", which means one can skip implementing that part without the risk of not being interoperable.
> A lot of the challenges that are part of implementing HTTP/2 correctly are also present in QUIC
I'd dispute this because it doesn't have nested flight windows, and doesn't have to multiplex and interleave streams over the same tunnel (it does that over the same socket/port, but it's way easier because UDP is a stateless packet machine-gun, so you don't need to keep both TCP and SPDY's concept of a stream at the same time).
It has about the same challenges as implementing TCP correctly (which is what you'd expect, really), because it needs to implement congestion control and so on in pretty much all the same places. The only "extra" it shares with SPDY is stream multiplexing (which is still much simpler because it doesn't need to worry about TCP's semantics on top of it).
> resource wise, it might never catch up with HTTP/1.1 + TLS (I think you meant TCP here) due to missing hardware support
I have two comments about this:
1 - This is actually more of a software problem than a hardware problem. Hardware acceleration for QUIC depends mostly on evolving the tooling we have for "talking" to our network cards across different platforms, more than on anything else.
For a while now, most NICs (especially those you'd see in data-center gear) use a Motorola 68k based chip, or something else equally programmable, and offloading arbitrary programs to those is already something that happens quite often. For example, Linux XDP programs are automatically offloaded to the NIC if it supports that.
This is also not a problem for middle-boxes, since they just treat UDP blindly as UDP and will just do AQM on top of it like they always did, no need to implement flow control.
This might be a problem for legacy consumer hardware, but then again that hardware is already inefficient in all kinds of ways, and for these cases being able to run the entire networking stack from userland is actually a benefit: being able to fall back on full-software implementations means we could make Windows XP support QUIC if we wanted to. Can we even measure how much shit we'd break if we tried to "upgrade" TCP?
2 - This might be underestimating the amortized gains from switching to a more efficient protocol: having to retransmit less data, 0-rtt handshakes, and such things add up.
Specifically for HTTP/1.1, being a text-based protocol doesn't help either: how many CPU cycles do we waste parsing CRLFs, lowercasing headers, and crap like that? At least HTTP/2 doesn't suffer from these problems, but it still screwed up the transport big time.
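To illustrate the kind of per-request string work meant here, a minimal (and deliberately naive, non-robust) HTTP/1.1 head parser: split on CRLF, pick apart a free-form request line, case-normalise header names. HTTP/2's binary framing and HPACK sidestep most of this.

```python
# Naive sketch of the text processing an HTTP/1.1 parser must do for
# every single request. Not production code: no validation, no
# folding, no chunked bodies.
def parse_http11(raw: bytes):
    head, _, body = raw.partition(b"\r\n\r\n")   # find end of headers
    lines = head.split(b"\r\n")                  # CRLF splitting
    method, path, version = lines[0].split(b" ") # free-form request line
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(b":")
        # Header names are case-insensitive, so normalise them:
        headers[name.strip().lower()] = value.strip()
    return method, path, headers, body

method, path, headers, body = parse_http11(
    b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n"
    b"Content-Length: 0\r\n\r\n")
```

Every one of those splits, strips, and lowercases is work a binary length-prefixed framing simply doesn't need.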
> I feel like at least in some areas the QUIC specification became less complex
This I agree with, wholeheartedly.
So far, the only thing I've found absolutely detestable about it is mandated encryption, as that prohibits me from outsourcing some L7 intelligence to a proxy/sidecar/service-mesh/whatever, and having to shoehorn some bullshit certificate between my application and the proxy so that it can have some visibility into traffic will be error prone and annoying.
Maybe we'll make a variant of QUIC with TLS disabled for gRPC and other API shenanigans? We'll see when the time comes...
- Is BBRv2 still modeled close to the PID equation, in a way that expanding the equation as shown by BBRv1x is still possible, so that we can tune for different target parameters?
- How tunable is BBRv2? For example, say I'm doing some voice chat app, and in my use case an increased sensitivity to ECN is presumably better (I read somewhere that Facetime does this, but regardless of that, let's just assume it's true for a moment). Can I tune that? Or disable ECN-awareness entirely? Is there any good reference material on the parameters BBRv2 exposes?
- I never saw any version of BBR compared to TIMELY, which is also from Google. Even though I assume TIMELY would be worse, given that it only takes RTT into consideration, it'd still be cool to see.
- BBR is great for clients, but ultimately dependent on trusting other clients in the network to be well behaved, else the game-theoretical fairness doesn't work out. What is the SOTA algorithm for AQM on middle-boxes nowadays, and preventing bad clients from hogging link capacity?
Questions aside though, it's great to see more algorithms leveraging ECN as a signal. Good stuff, kudos to the researchers!
fq_pie for constrained systems. fq_codel or cake if you have enough CPU cycles. They can be operated in ECN-aware mode.