TCP Fast Open? Not so fast (2021) (apnic.net)

I sponsored implementing the client side for FreeBSD when I worked at a large service provider. The use case is cleaner when you control both ends, such as a cache/proxy to origin or a peer cache, where you aren't subject to packet manglers and can configure a shared cookie out of band.

The typical microservices misarchitecture would really benefit from this kind of thing since the setup time of TCP is substantial.

We did have some ambitions to expose TFO to customers in some way, but unfortunately I left before it ever went into testing to see how it could be commercialized.


One curiosity: TCP Fast Open requires working path MTU discovery, because MSS clamping no longer works as a hack to advertise a lower MTU: the Fast Open cookie adds an arbitrary amount of data which counts towards the packet length, but not towards the segment length.

As described in the RFC, TCP Fast Open is supposed to be used from a client IP that connected to the server IP recently. The client could be expected to have discovered the effective MTU during that previous connection and to limit its SYN data to that size.

The server gets the (presumably clamped) MSS option with the SYN+data and could use that, or possibly include the previous MSS in its SYN cookie and use the lesser of the two. Or, more conservatively, the size of the client's SYN packet, or 576/1280.

MSS/MTU doesn't have to be the same in both directions, but it's usually safe to assume it is. I've seen much better results on broken networks when the SYN+ACK sends back min(server MSS, client MSS) rather than always sending back the client MSS, and the impact on working networks is very small. Sending back min(server MSS, client MSS - X), where X is 8 (assume PPPoE) or 20 (assume an IPIP tunnel), works even better at establishing working connections, although with some additional overhead in the cases where the client actually knows its MSS. Some devices clamp MSS on the outbound SYN but not on the inbound SYN+ACK. Sigh.
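A minimal sketch of that SYN+ACK MSS heuristic, with hypothetical names and a hard-coded tunnel allowance (not taken from any real stack):

    #include <stdint.h>

    #define TUNNEL_ALLOWANCE 8   /* assume PPPoE; use 20 to assume an IPIP tunnel */

    /* client_mss: MSS option seen in the client's SYN (possibly clamped
     * by a middlebox); server_mss: MSS derived from the server's own MTU. */
    static uint16_t synack_mss(uint16_t server_mss, uint16_t client_mss)
    {
        uint16_t mss = client_mss > TUNNEL_ALLOWANCE
                     ? (uint16_t)(client_mss - TUNNEL_ALLOWANCE)
                     : client_mss;
        return server_mss < mss ? server_mss : mss;
    }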

Personally, if I were to deploy something like TCP Fast Open, I'd dispense with the cookie: allow clients to speculatively include SYN data if they want. Cap packets at 576/1280 bytes to be reasonable. Servers should consider recent Fast Open results and local capacity to decide whether to accept it or not. The server response should be limited to the same size the client sent, to avoid amplification. Publish some common heuristics: if clients sending Fast Open data get to connected for 90% of SYN+ACKs sent, go ahead and process the Fast Open data, but when success falls below that, don't. Have some limit on how many Fast Opens you want to leave open. Server side done.
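Sketched out, that server-side policy might look something like this (names, thresholds and the capacity cap are all made up for illustration, not an existing implementation):

    #include <stdbool.h>

    struct tfo_stats {
        unsigned attempts;      /* SYNs with data that we answered with a SYN+ACK */
        unsigned completions;   /* of those, how many reached ESTABLISHED */
        unsigned outstanding;   /* Fast Open connections currently half-open */
    };

    #define TFO_MIN_SUCCESS_PCT 90
    #define TFO_MAX_OUTSTANDING 256    /* pick this from local capacity */

    /* Decide whether to process data arriving in a SYN right now. */
    static bool tfo_accept(const struct tfo_stats *s)
    {
        if (s->outstanding >= TFO_MAX_OUTSTANDING)
            return false;
        if (s->attempts < 32)   /* not enough signal yet; allow it */
            return true;
        return s->completions * 100 >= s->attempts * TFO_MIN_SUCCESS_PCT;
    }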

Client side: try it; if it doesn't work, retry without. Every time the retry without data works, double a skip counter; every time a try with it works, decrement the skip counter. Max out at trying once every 1024 connections. You can even do Happy Eyeballs-style stuff: send out a plain SYN after a short time and use whichever comes back first, but if the Fast Open reply does come back a bit after the plain SYN, you know that Fast Open is viable on this network / to that destination.
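The client-side backoff could be as simple as this sketch (hypothetical names, just making the skip-counter scheme concrete):

    struct tfo_backoff {
        unsigned skip;       /* how many connections to skip after a failure */
        unsigned remaining;  /* countdown until the next Fast Open attempt */
    };

    #define TFO_MAX_SKIP 1024

    /* Should this connection attempt Fast Open? */
    static int tfo_should_try(struct tfo_backoff *b)
    {
        if (b->remaining == 0)
            return 1;
        b->remaining--;
        return 0;
    }

    /* Fast Open failed but the plain retry worked: back off exponentially. */
    static void tfo_failed(struct tfo_backoff *b)
    {
        b->skip = b->skip ? b->skip * 2 : 1;
        if (b->skip > TFO_MAX_SKIP)
            b->skip = TFO_MAX_SKIP;
        b->remaining = b->skip;
    }

    /* Fast Open worked: decay the skip counter. */
    static void tfo_succeeded(struct tfo_backoff *b)
    {
        if (b->skip)
            b->skip--;
    }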


It's one of many network protocol improvements that could never be used effectively due to middleboxes. QUIC is specifically designed to prevent ossification like this.

https://en.wikipedia.org/wiki/Protocol_ossification#Examples


This kind of ossification can be reduced by pressure campaigns.

Apple is very good at these. Mobile carriers were forced into configuring IPv6 and allowing MPTCP because Apple made that part of the certification to sell iPhones. Convince Apple that TCP Fast Open is important, and they'll make it work on mobile carriers through their considerable pressure. Home networks, not so much, so you've always got to have heuristics and detection, which again Apple is very good at: they've had effective and rapid fallback for bad path MTU on iPhones for a lot longer than Android, even though the Android kernel has had options for it since the beginning; they were only enabled recently. I'm very much not an Apple fan beyond the Apple II era, but they do client-side networking very well.

Google probably can't exert this kind of pressure directly; they don't have the carrier sales volume, IMHO. Maybe Samsung could. Nokia could have, before the fall. Google could put it into a PageSpeed-type tool, though; they've got influence through that kind of tooling. And they control both ends of lots of traffic, so they could test through changes in ChromeOS and their servers.


> Google probably can't exert this kind of pressure directly; they don't have the carrier sales volume, IMHO.

Google can just push the responsibility to manufacturers with Play Store certification. That's huuuuuge leverage they have.

Sadly they don't use it for much besides pushing anti-root crap.


It's a good article but it seemed to leave some questions unanswered for me. (1) Why isn't TFO enabled by default? I guess the answer involves bad middleboxes, but then how is the client/server meant to know any better, and isn't the very conservative fallback supposed to mitigate that? (2) What should the queue size be set to?
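For context on (2): on Linux the queue is a per-listener setting, the maximum number of pending data-in-SYN connections, passed via setsockopt(TCP_FASTOPEN) before listen(); there's no universal right value. A minimal sketch (the 128 and the port are just illustrative, and error handling is omitted):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <string.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET6, SOCK_STREAM, 0);

        /* Max pending Fast Open (data-in-SYN) connections for this listener.
         * The system-wide switch is the net.ipv4.tcp_fastopen sysctl
         * (0x1 = client support, 0x2 = server support). */
        int qlen = 128;
        setsockopt(fd, IPPROTO_TCP, TCP_FASTOPEN, &qlen, sizeof(qlen));

        struct sockaddr_in6 addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin6_family = AF_INET6;
        addr.sin6_port = htons(8080);
        bind(fd, (struct sockaddr *)&addr, sizeof(addr));
        listen(fd, SOMAXCONN);
        return 0;
    }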

Discussed at the time:

TCP Fast Open? Not so fast - https://news.ycombinator.com/item?id=27745422 - July 2021 (20 comments)



