Evaluating TCP BBRv2 on the Dropbox edge network (arxiv.org)
73 points by fanf2 40 days ago | 24 comments

BBR is great. It should be on by default in the Linux kernel.
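For anyone who wants to try it, switching the congestion control is a couple of sysctls. This is a sketch: it needs root, the tcp_bbr module (kernel 4.9+), and persistence via /etc/sysctl.d is left out. Pairing BBR with the fq qdisc for pacing is the commonly recommended setup:

```shell
# List the congestion control algorithms this kernel offers
sysctl net.ipv4.tcp_available_congestion_control

# Switch new TCP sockets to BBR
sysctl -w net.ipv4.tcp_congestion_control=bbr

# BBR depends on pacing, commonly provided by the fq qdisc
sysctl -w net.core.default_qdisc=fq
```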

Unfortunately it's only for TCP, so its usefulness is somewhat limited outside the datacentre.

For us devs, it's sad how little software uses the LEDBAT protocol for bulk downloads. All of you should be aware of its advantages and use it wherever possible. It allows bulk data transfer without slowing down higher-priority data streams. And it works without any OS or router support.
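To give a feel for how LEDBAT yields to other traffic, here is a toy version of the RFC 6817 window update. The TARGET and GAIN constants are the RFC's defaults; the shell wrapper, the MSS-sized step (the real rule also scales by bytes acked per window), and the input numbers are made up for illustration:

```shell
# Toy LEDBAT-style cwnd update: grow while measured queuing delay is
# below TARGET, shrink once it rises above it.
ledbat_cwnd() {
  awk -v cwnd="$1" -v delay="$2" 'BEGIN {
    TARGET = 100                      # target queuing delay, ms (RFC 6817)
    GAIN = 1; MSS = 1500
    off = (TARGET - delay) / TARGET   # >0 below target, <0 above: back off
    cwnd += GAIN * off * MSS          # simplified: one MSS-scaled step
    if (cwnd < MSS) cwnd = MSS        # never below one segment
    printf "%d\n", cwnd
  }'
}

ledbat_cwnd 30000 20    # 20 ms queuing delay, below target: grows to 31200
ledbat_cwnd 30000 180   # 180 ms, above target: shrinks to 28800
```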

For those with slow home connections, OpenWrt with SQM CAKE enabled is incredible. If you turn on the options for per-IP flows, it almost perfectly eliminates bufferbloat. I run the OpenWrt x64 edition on one of my old desktops. We have abysmally slow DSL at home and in the office, and it's a night-and-day difference. Multiple people torrenting, uploading, watching video, and you can still browse the web.

SQM with Cake is a life saver in any kind of network with bufferbloat. The per-host/IP fairness is the cherry on top.
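On OpenWrt the SQM package drives all this through its UI, but under the hood it's roughly a tc one-liner per direction. The interface name and rate below are examples only; shape a bit below your real sync rate:

```shell
# Egress: shape below the DSL upload rate so the queue forms here,
# not in the modem; dual-srchost gives the per-host fairness
tc qdisc replace dev pppoe-wan root cake bandwidth 900kbit dual-srchost

# Ingress is typically redirected through an ifb device and shaped the
# same way with dual-dsthost; OpenWrt's SQM scripts handle that wiring
```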

Is there any study comparing the impact of BBR vs CUBIC on a LEDBAT stream?

I assume they're equivalent because of LEDBAT's eager backoff, but it'd still be a cool thing to read.

I haven't seen anything. They're both based primarily on RTT (round-trip time) instead of packet loss, so they should share roughly equally.

The big innovation in these new schedulers is focusing on fairness instead of maximum throughput. For the longest time, TCP flow control was judged only by how close it got to the theoretical maximum throughput.

So we ended up with extremely aggressive schedulers. The challenge now is to make a scheduler that limits bufferbloat while not surrendering all its throughput to old-school bully schedulers. This is where BBR really innovates.

> Unfortunately it's only for TCP, so its usefulness is somewhat limited outside the datacentre.

Also works nicely for p2p protocols that have to connect to peers on other continents.

For anyone else who didn't know, BBR stands for "Bottleneck Bandwidth and RTT", and is a congestion control algorithm.

More (but not much) information is available on Google's BBR repository: https://github.com/google/bbr

The improved ss output mentioned in the paper has helped me immensely in tracking down what was limiting the upload speed from a corporate net to AWS. It turned out no middleboxes were to blame, just a misconfiguration in nginx (the receiving side) limiting the HTTP/2 send window (on the sender side). Yes, HTTP/2 has a separate send window on top of TCP's. Yet another source of errors.
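In case it helps anyone doing similar debugging: `ss -ti` prints per-socket sender state (cwnd, pacing and delivery rates, plus a bbr: section when BBR is active). Here's a quick awk over a captured sample line, just to show which fields to look at; the sample values are invented:

```shell
# One line of `ss -ti` output captured earlier (values are made up)
sample='cubic wscale:7,7 rto:204 rtt:1.5/0.7 cwnd:42 pacing_rate 1.2Gbps delivery_rate 980Mbps'

# Pull out the congestion window and the pacing rate
echo "$sample" | awk '{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /^cwnd:/) { split($i, a, ":"); print "cwnd=" a[2] }
    if ($i == "pacing_rate") print "pacing=" $(i+1)
  }
}'
# prints:
# cwnd=42
# pacing=1.2Gbps
```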

Yes, it has. And it's one of the prime reasons HTTP/2 is slower for lots of applications than pure HTTP/1.1.

The tricky part of HTTP/2 is that you now have two flow control algorithms running on top of each other, and they are rather good at harming each other instead of cooperating. E.g. a too-small HTTP/2 window can lead to poor performance, as you observed, while a too-big window can lead to excessive buffering in the application for HTTP/2 streams.
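For the upload case specifically, one knob I believe is relevant on the nginx side is `http2_body_preread_size`, which bounds how much request body nginx accepts per stream before the handler starts reading, effectively the initial stream window for uploads. A sketch, not a tuned recommendation:

```shell
# Drop-in nginx config raising the per-stream request-body preread buffer
# (the nginx default is 64k; 1m here is an arbitrary example value)
cat > /etc/nginx/conf.d/h2-upload.conf <<'EOF'
http2_body_preread_size 1m;
EOF
```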

It's a rather hard challenge to get to a good compromise solution here (and not a lot of people have arrived there yet).

When in doubt, the rule of thumb is to use HTTP/2 for small requests, where the benefit of multiplexing is biggest, and to stick with HTTP/1.1 (a non-multiplexed TCP connection) for high throughput.

> When in doubt, the rule of thumb is to use HTTP/2 for small requests, where the benefit of multiplexing is biggest, and to stick with HTTP/1.1 (a non-multiplexed TCP connection) for high throughput.

Alas, if you want to upload, the browser doesn't let you choose, so you would have to set up another host or port.

This is fixed in HTTP/3.

How do you define "fixed"? Quic still defines per-stream plus per-connection flow control windows on top of the lower-level congestion control algorithm.

First, HTTP/3 is built on TLS 1.3 and has a faster handshake.

Second, HTTP/3 defines an "idle" state for connections, of 30 seconds or even more. So you can download something and then do nothing. Here HTTP 0.9/1.0/1.1 and /2 close the connection; in HTTP/3 it is set to idle, not closed. Say after 15 seconds you need a new resource: in HTTP/3 the connection continues where it stopped, while in previous versions you have to do the TLS handshake again before making the new request. That means a performance penalty, because congestion control starts again from values close to zero.

This is pretty much unrelated to the discussion earlier on (which was about the interaction of different flow control mechanisms). But nevertheless, one answer that tries to clear up some misunderstandings:

You can reuse connections with every HTTP version up from HTTP/1.1. The main difference between HTTP/1.1 and HTTP/2 and /3 is that you can't make concurrent requests on a single connection with HTTP/1.1, but you can with /2 and /3.

How long a connection stays open after a request is determined by keepalive settings on the client and server side. All HTTP versions from 1.1 up allow you to leave a connection unused for 30s and then make another request.
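This is easy to see with curl, by the way: give it two URLs in one invocation and it reuses the connection whenever the server's keepalive allows it. The host is a placeholder, this needs network access, and the exact log wording varies by curl version:

```shell
# The second transfer should log a "Re-using existing connection" line
curl -sv https://example.com/ https://example.com/ 2>&1 | grep -i 're-using'
```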

Does it not use UDP?

Right - it does. My point was that Quic still has 2 layers which determine the amount of data that is sent to peers. On the lower level, there is a congestion controller - which can be like Cubic, New Reno, BBR, etc. This part does not know anything about Quic streams yet.

On a higher level you have the Quic streams, which carry user data. These are subject to flow control windows, and there also exists a shared per-connection window.

Depending on the implementation the layers might not be very much aware of each other, which can also lead to imperfect utilization. That said the situation is obviously still different than with TCP, since the congestion controller does not limit the amount of user data but packets in general (which includes retransmits, etc). So the outcomes will certainly be different than what we observe with HTTP/2.

It uses UDP because you simply can't change all the TCP infrastructure worldwide.

Example: in one place I'm still using a Linksys WRT54GL router with kernel 2.4 that is almost 15 years old, because the device still works and the network requirements there are low.

Sure, I totally get that. But my point is UDP doesn't have any underlying congestion control you have to fight.

> Yes, http2 has a separate send window on top of tcp. Yet another source of errors

Damn, I was not aware of this, but sure enough there it is [0]

I suppose that's just one more thing to reaffirm my belief that SPDY is a horribly over-engineered protocol, that tried to shoehorn fixes for TCP's shortcomings on top of the wrong abstraction (which, ironically, is TCP itself).

Had HTTP2 only introduced the binary framing, instead of bolting this horrible new transport along with it, we wouldn't be so deep in this mess. QUIC can't come soon enough.

[0] https://www.chromium.org/spdy/spdy-protocol/spdy-protocol-dr...

Maybe the bad news for you: Quic is different, but it's not necessarily a lot easier. A lot of the challenges that are part of implementing HTTP/2 correctly are also present in Quic. You will still have to implement a flow control mechanism for individual Quic streams, apart from being fair across streams, congestion control and pacing for Quic connections, and being fair across connections on an endpoint.

Getting this all right is also far from an easy task - and resource-wise it might never catch up with HTTP/1.1 + TLS due to missing hardware support.

However I feel like at least in some areas the Quic specification became less complex. E.g. the use of absolute flow control offsets instead of relative increments makes things less error prone. And it's nice that now the defaults for the dynamic table are "off", which means one can skip implementing that part without the risk of not being interoperable.

I'm well aware of these things actually, lol. It's pretty obvious that if the protocol is built on top of UDP, it'd need to implement some FSM to handle retransmission and flow control.

> A lot of the challenges that are part of implementing HTTP/2 correctly are also present in QUIC

I'd dispute this because it doesn't have nested flight windows, and doesn't have to multiplex and interleave streams over the same tunnel (it does that over the same socket/port, but it's way easier because UDP is a stateless packet machine-gun, so you don't need to keep both TCP and SPDY's concept of a stream at the same time).

It has about the same challenges as implementing TCP correctly (which is what you'd expect, really), because it needs to implement congestion control etc. in pretty much all the same places. The only "extra" it shares with SPDY is stream multiplexing (which is still much simpler, because it doesn't need to worry about TCP's semantics on top of it).

> resource wise, it might never catch up with HTTP/1.1 + TLS (I think you meant TCP here) due to missing hardware support

I have two comments about this:

1 - This is actually more of a software problem than a hardware problem. Hardware acceleration for QUIC depends mostly on evolving the tooling we have for "talking" to our network cards across different platforms, more than anything else really.

For a while now, most NICs (especially those you'd see in data-center gear) use a Motorola 68k based chip, or something else equally programmable, and offloading arbitrary programs to those is already something that happens quite often. For example, Linux XDP programs can be offloaded to the NIC if it supports that.
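For the curious, attaching an XDP program with iproute2 looks like this; the `xdpoffload` mode asks the driver to push the program onto the NIC itself rather than run it in the kernel. The interface and object file names are hypothetical, and only a few NICs support full offload:

```shell
# Run the compiled BPF object on the NIC hardware instead of the host CPU
ip link set dev eth0 xdpoffload obj drop_udp.o sec xdp
```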

This is also not a problem for middle-boxes, since they just treat UDP blindly as UDP and will just do AQM on top of it like they always did, no need to implement flow control.

This might be a problem for legacy consumer hardware, but then again those are already inefficient in all kinds of ways, and for these cases being able to run the entire networking stack from userland is actually a benefit: being able to fall back on full-software implementations means we could make Windows XP support QUIC if we wanted to. Can we even measure how much shit we'd break if we tried to "upgrade" TCP?

2 - This might be underestimating the amortized gains from switching to a more efficient protocol: having to retransmit less data, 0-rtt handshakes, and such things add up.

Specifically for HTTP/1.1, being a text-based protocol doesn't help either: how many CPU cycles do we waste parsing CRLFs, lowercasing headers, and crap like that? At least HTTP/2 doesn't suffer from these problems, but it still screwed up the transport big time.

> I feel like at least in some areas the QUIC specification became less complex

This I agree with, wholeheartedly.

So far, the only thing I've found absolutely detestable about it is mandated encryption, as that prohibits me from outsourcing some L7 intelligence to a proxy/sidecar/service-mesh/whatever, and having to shoehorn some bullshit certificate between my application and the proxy so that it can have some visibility into traffic will be error prone and annoying.

Maybe we'll make a variant of QUIC with TLS disabled for gRPC and other API shenanigans? We'll see when the time comes...

ISTR that SPDY et al. was originally built on UDP, not TCP, so there was a need for a send window. But yes, it's now fighting with the TCP window. Ideally, the HTTP/2 send window should be derived from the TCP window.

Here's a few questions I have for any lurking network engineers "in the know", since there doesn't seem to be an official paper for BBRv2 yet (or at least I couldn't find it on https://research.google/pubs/):

- Is BBRv2 still modeled close to the PID equation, in a way that expanding the equation as shown by BBRv1x [0] is still possible, and we can tune for different target parameters?

- How tunable is BBRv2? For example, say I'm doing some voice chat app, and in my use case an increased sensitivity to ECN is presumably better (I read somewhere that Facetime does this, but regardless of that, let's just assume it's true for a moment). Can I tune that? Or disable ECN-awareness entirely? Is there any good reference material on the parameters BBRv2 exposes?

- I never saw any version of BBR compared to TIMELY [1], which is also from Google. Even though I assume TIMELY would be worse, given that it only takes RTT into consideration, it'd still be cool to see.

- BBR is great for clients, but ultimately dependent on trusting other clients in the network to be well behaved, else the game-theoretical fairness doesn't work out. What is the SOTA algorithm for AQM on middle-boxes nowadays, and preventing bad clients from hogging link capacity?

Questions aside though, it's great to see more algorithms leveraging ECN as a signal. Good stuff, kudos to the researchers!

[0] https://www.youtube.com/watch?v=PeYPqnLhUuc

[1] https://research.google/pubs/pub43840/

> What is the SOTA algorithm for AQM on middle-boxes nowadays, and preventing bad clients from hogging link capacity?

fq_pie for constrained systems. fq_codel or cake if you have enough CPU cycles. They can be operated in ECN-aware mode.
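For reference, outside of OpenWrt's SQM packaging these are one-liners per interface. The interface name and the rate are examples, not recommendations:

```shell
# fq_codel with ECN marking enabled at the bottleneck
tc qdisc replace dev eth0 root fq_codel ecn

# cake shaping to the uplink rate; it marks ECN-capable flows by default
tc qdisc replace dev eth0 root cake bandwidth 20mbit
```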
