Lying to TCP makes it a best-effort streaming protocol (eighty-twenty.org)
213 points by tonyg on Feb 2, 2018 | 91 comments

It’s been shown that leaving holes in the sequence space isn’t good for middleboxes [0]. Wanting TCP to behave more like UDP is a reasonable goal (e.g., for reachability), but you throw the benefits away if 20% of connections are disrupted by middleboxes. Other approaches, like Minion [1] or TCP Hollywood [2], see deployability because they adhere to the assumptions made about TCP by middleboxes.

[0]: “Is it Still Possible to Extend TCP?” - http://www0.cs.ucl.ac.uk/staff/ucacmha/papers/extend-tcp.pdf

[1]: Minion - https://tools.ietf.org/html/draft-iyengar-minion-protocol-01

[2]: TCP Hollywood - http://ieeexplore.ieee.org/document/7497221/

Yes, we were surprised by just how prevalent middleboxes were that didn't like gaps in sequence spaces. This significantly influenced the design options for MPTCP.

Another problem is that TCP is a bytestream protocol. Apps that stream over TCP don't usually add packet-orientated framing and resync points, so if you lose a packet, the receiver will often need to discard quite a bit of data after the missing packet before it can start decoding again. This effectively multiplies the loss rate. In the extreme, there's the potential for congestion collapse, where lots of packets are being delivered but none of them are useful, so they're all discarded at the receiver.

Edit: I should add - middleboxes often resegment the datastream, merging multiple packets, or splitting large ones. So even if the sender added a header in each segment sent, those headers may not be at the beginning of the segment when it arrives. After a loss, you may not be able to reliably find the next header again.

By the way, that web server at UCL may well be the oldest on the Internet. It's probably the only server left proudly running CERN/3.0 on Sparc hardware since 1994.

Your point about resegmentation (in combination with loss or out-of-order delivery) is why a framing layer, like the one in Minion, is necessary. The Minion paper [3] does a good job of illustrating the problem.

[3]: https://arxiv.org/abs/1103.0463

Yes indeed. If I could crash middle-boxes by sending them malformed TCP streams, I would do it all day long.

It would be very good for the net if something came up that penalised middleboxes though. Maybe this could be it.

QUIC demonstrates a better approach: encrypt everything, limiting the ability for middleboxes to make assumptions about behaviour.

Even QUIC found that the couple of bytes of "flag" fields in the header were enough for middleboxes to mess with it. It seems incredibly hard to devise a protocol which is extensible without having middleboxes mess it up.

Indeed. Though that issue sort of validated the approach: middlebox vendors will latch on to any unencrypted bits they can. The only way to stop that is to encrypt _everything_.

> It would be very good for the net if something came up that penalised middleboxes though.

What you can do is to have a good protocol that requires no interference from middleboxes but detects it if it happens, and then a less efficient legacy fallback protocol that basically looks as much as possible like HTTPS.

Then if you detect interference from a middlebox, show the user a message that says, "WARNING: MAN IN THE MIDDLE ATTACK DETECTED. Something is modifying connections on this network. This may compromise security and performance."

Then hopefully having multiple different apps show a message like that to every user on the network will get enough users complaining to fix the middlebox so that it stops breaking new things.

What are middleboxes? This is something that I've never heard of before.

Anything on the path between the endpoints that looks at layers above IP. Obvious examples are NATs, firewalls, transparent proxies, traffic normalizers, so-called protocol accelerators. Some boxes I've no idea what they do. Take a look at our paper in reference [0] above, which probes what they actually do to traffic.

Middleboxes are machines somewhere along the path (when you're sending to a particular destination) that drop, filter, bandwidth shape, or otherwise edit your traffic. Basically anything besides packet forwarding.

Middlebox is a technically imprecise term that is not in common usage among technical people anywhere. Why people in this thread are trotting it out is anyone's guess.

Middlebox has quite a precise technical meaning: anything that works at a higher layer than the one at which it is visible to the neighboring network.

Usually this refers to various "security" "solutions" which attempt to do deep packet inspection and generally break various things, but it can also mean NAT or L2 bridges that do filtering on L3/L4 headers (for example, DOCSIS CMs and CMTSes are such middleboxes).

That's just plain wrong, in the context of protocol development it is a commonly used term, and not exactly uncommon in wider networking space as well. See MPTCP mentions in this thread, presentations and mailing lists around development of TLS, HTTP2, QUIC, ...

What do they use?

So this is an interesting idea, however, I suspect it may be quite problematic on the receive side if implemented.

Once upon a time, I used to write network protocol sniffers. I basically simulated a partial TCP stack, in order to allow higher-layer protocols to decode traffic that was out of order. If a packet was lost in the capture, it's basically the same thing as the "lying ack" in the article, where the real endpoint would ack a frame the capture engine never saw.

This becomes really difficult to process, because TCP simulates a stream, and if you lose a packet, and skip some bytes in the stream, where you pick up and continue may not be the beginning of a new message. So you also need to create heuristics for how to discard received data until you get to a point where you can understand the stream again and continue processing.

As others have commented, UDP is likely a better choice here, as ordering semantics can be added, but on packet loss, the next received packet can always be parsed. Or look at SCTP, which allows turning in-order delivery on/off, and I believe through an extension allows turning reliability on/off (but I'm not sure if this ever got implemented).

> This becomes really difficult to process, because TCP simulates a stream ...

The irony of TCP is that in almost every use case you end up creating "frames" within the stream. Turtles all the way up I guess.

The frames inside your TCP stream are sized to your data, not to your physical layer, this makes a huge difference in usability. The framing can also be as simple or complex as you like.

Exposing TCP framing to the application at the receiver side does not make much sense. On the other hand I'm somewhat puzzled by the fact that TLS specifications explicitly discourage exposing TLS record boundaries to applications which is exactly the case of turtles framed in turtles all the way down.

Edit: we can all collectively make fun of telco protocols that are HDLC-in-HDLC all the way down, but such things usually do framing only on the outermost layer.

Yes, the receive side would need to have a plan to interpret the data.

One way to do this is to use a format where you can resync. For example, have special values that only occur in certain positions. You look for one of those values and can then enter a known state.
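A minimal sketch of the resync idea above, assuming HDLC-style byte stuffing: a flag byte (0x7E) only ever appears as a frame terminator, and an escape byte (0x7D) hides any flag or escape bytes in the payload. The function names are hypothetical and this is an illustration, not anything the article specifies. Because the scan is per byte, it does not care how middleboxes resegment the stream.

```python
FLAG = 0x7E  # only ever appears as a frame terminator
ESC = 0x7D   # marks an escaped byte inside a payload


def encode_frame(payload: bytes) -> bytes:
    """Byte-stuff the payload and terminate it with FLAG."""
    out = bytearray()
    for b in payload:
        if b in (FLAG, ESC):
            out += bytes([ESC, b ^ 0x20])  # hide reserved bytes
        else:
            out.append(b)
    out.append(FLAG)
    return bytes(out)


def unstuff(chunk: bytes) -> bytes:
    """Undo the byte stuffing of one frame body (without its FLAG)."""
    out = bytearray()
    it = iter(chunk)
    for b in it:
        if b == ESC:
            nxt = next(it, None)
            if nxt is None:
                raise ValueError("truncated escape sequence")
            out.append(nxt ^ 0x20)
        else:
            out.append(b)
    return bytes(out)


def frames_after_gap(stream: bytes) -> list[bytes]:
    """Decode complete frames from a stream that may start mid-frame.

    Everything up to and including the first FLAG is discarded, since it
    may be the tail of a frame whose beginning was skipped.
    """
    first = stream.find(bytes([FLAG]))
    if first < 0:
        return []
    pieces = stream[first + 1:].split(bytes([FLAG]))
    # The last piece has no terminating FLAG yet, so it is incomplete.
    return [unstuff(p) for p in pieces[:-1]]
```

The cost is that the first frame after a gap is always thrown away, even if the gap happened to end exactly on a frame boundary.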

Another possibility is to be selective about what you will allow to be discarded. Suppose you have a framing structure so that you send a byte count followed by a blob of data (like video or audio data). Then you can use these anticipatory acks to express that you can live without that blob. For example, if your protocol says "and now here's 123456 bytes of video data", you can ack with a sequence number that reflects having received those 123456 bytes but no more. Obviously, this limits your ability to skip ahead and may not be as useful.
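As a rough illustration of the arithmetic in the comment above: the ack that skips a blob is the sequence number of the first byte after it, computed modulo 2^32 because TCP sequence numbers wrap. The function and parameter names here are hypothetical, and the 4-byte header length in the usage note is an assumption, not something stated in the thread.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def ack_skipping_blob(header_seq: int, header_len: int, blob_len: int) -> int:
    """Return the ack number that covers a length header plus its blob.

    header_seq is the sequence number of the first byte of the length
    header; the result is the sequence number of the first byte after
    the blob, which is what a cumulative ack carries.
    """
    return (header_seq + header_len + blob_len) % SEQ_SPACE
```

For example, with a 4-byte length header starting at sequence number 1000 and a 123456-byte blob, the receiver would ack 124460 and nothing beyond it.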

Sorry, I didn't mean to imply that this couldn't be done. I was trying to indicate that it is hard to do and adds a decent amount of complexity as someone who has written software that does this.

> So you also need to create heuristics, on how do you discard received data, until you get to a point where you can understand the stream again, and continue processing.

This can be trivially easy for some data formats, where you know the absolute or relative offset of everything. E.g. here is a chunk of data I don't care about: 32 bit length followed by that many bytes. Read the length, then hit the llseek system call (that would be supported on sockets just for this feature) to skip the stream position that many bytes ahead. Done.
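Seeking is not something sockets support today, so the closest portable equivalent is to read the length and then read and discard that many bytes. A minimal sketch, assuming the 32-bit length is in network byte order (the comment does not say) and using hypothetical helper names:

```python
import socket
import struct


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the peer closes first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-message")
        buf += chunk
    return bytes(buf)


def skip_chunk(sock: socket.socket) -> int:
    """Read a 32-bit length, then discard that many bytes.

    Returns the number of payload bytes skipped. Byte order is an
    assumption: "!I" means big-endian unsigned 32-bit.
    """
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    remaining = length
    while remaining:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("peer closed mid-chunk")
        remaining -= len(chunk)
    return length
```

Unlike the proposed seek, this still waits for every skipped byte to arrive, so it saves parsing work but not retransmission latency.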

The problem here is that it fails to take into account why any packets might be missing.

If it's due to congestion, you've just subverted the mechanism TCP uses to relieve it - and the more people that are using the deviant implementation, the bigger the problem becomes, which impacts everyone.

I'm not convinced it matters in this case, lying only prevents the other end from being extra-smart.

We'll stick with the VoIP example. Packets are going to be dropped independently of any information in the TCP headers. If there's too much load, some of them will disappear. If your client says "yeah I got that data" when it didn't actually get the data, it doesn't increase load any more -- if you're sending a real-time 256kbps audio stream, then you need to be able to send 256,000 bits every second regardless of whether or not the network is capable of that. By not retransmitting packets you don't care about, you're decreasing load.

Of course, if you push the information all the way up the stack to the application, you can do interesting things. You can notice that packets are being dropped and switch to a codec that sounds worse but sends fewer bits per second, and maybe you'll get a better quality call going on with fewer dropouts.

Now that I think about it, I'm surprised this isn't how we deal with degrading things over slow connections in general. I would much prefer to only get the "mobile friendly" version of a page if I'm actually on a mobile connection. Right now, the heuristic seems to be "if screen < desktop; send mobile page". That of course is silly because my home WiFi can happily pipe huge images into my phone faster than the web server can send them to me, while my 8 core laptop with a 4k screen tethered to my phone can't magically make 4G faster. Interesting interesting.

It matters.

Suppose there is congestion, and that if the sender(s) don't slow down, it will just get worse. Well, they don't find out about the congestion if the receivers lie (and congestion is asymmetric, so that the ACKs get through just fine).

That's not good.

Now, one way to deal with this is to lie but only for a bit. After a while of not receiving anything, the receiver should stop sending ACKs and the sender should notice the congestion. That might help. But the underlying problem is real: lying ACKs -> failure to detect congestion -> worse congestion.

Maybe the client should lie probabilistically, so that there is still a relationship between what the sender sees and the actual congestion.
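One way to read the probabilistic suggestion, as a sketch only: when there is a gap, ack past it with some probability and otherwise ack honestly, so the sender still sees a fraction of the real losses. The function name, parameters, and the choice of a fixed probability are all assumptions, and 32-bit wraparound is ignored for clarity.

```python
import random


def choose_ack(contiguous_end: int, highest_end: int,
               lie_probability: float, rng: random.Random) -> int:
    """Pick the cumulative ack number to send.

    contiguous_end: next byte expected with no gaps (the honest ack).
    highest_end: byte after the highest data actually received.
    With probability lie_probability a gap is acked over; otherwise
    the honest ack is sent and the sender observes the loss.
    """
    has_gap = highest_end > contiguous_end
    if has_gap and rng.random() < lie_probability:
        return highest_end
    return contiguous_end
```

With `lie_probability=0.0` this is ordinary TCP acking, and with `1.0` it is the always-lying receiver from the article. Whether any value in between keeps congestion control healthy is an open question in the thread.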

If you really want this just use SACK and let the sender never retransmit, no?

It's how people deal with video (adaptive bitrate streaming). It also used to be how people dealt with images in the dark days of content rewriting middleboxes. So it's not that nobody's thought about the idea: it's that it's way harder to do once it's a question of really changing behavior rather than just being more aggressive with a quality/size trade-off knob.

Also, mobile friendly versions of sites are at least as much about the user interface and rendering speed as about reducing bandwidth usage.

I don't think it is obvious that this would aggravate congestion. It is true that TCP's speed control mechanism automatically slows the transmission rate when it detects lost packets and the article's proposal would mask lost packets. But, as long as the receiver keeps telling the sender, and nodes along the way, to stop sending some packets, bandwidth would scale down as it should. TCP's default policy of increasing latency in order to match available bandwidth is not appropriate for video streams. What this proposal does instead is tell the sender not to slow transmission in terms of frames processed, but instead not to send all the frames, which is just what you want for video. But yeah, just use UDP.

I would like to see a proof of concept before I fully buy in to the author's claims. It's difficult to tell what this might do to proxies or how all the various router firmwares on the Net might handle it. I could see a hop along the way having trouble with the receiver claiming it received packets it could not have. For that matter, it's possible for the receiver to claim to have received a packet that the sender had not yet generated. The sending TCP stack may very well consider this an error.

> For that matter, it's possible for the receiver to claim to have received a packet that the sender had not yet generated. The sending TCP stack may very well consider this an error.

I happen to have some experience with this case (receiving an ack of an unsent packet).

Linux since 2009 will silently drop acks of unsent data [1]. FreeBSD follows the RFC and will send ack with current sequence and ack numbers to try to 'resync'. As long as this modified stack doesn't respond to that ack with another ack, it would probably be ok. There's a reviewed and accepted patch for FreeBSD to rate limit the acks it sends in this case, but it doesn't seem to have been committed [2]

[1] https://github.com/torvalds/linux/commit/96e0bf4b5193d0d97d1... (although the comment says this is consistent with the RFC, it actually isn't)

[2] https://reviews.freebsd.org/D11929

> For that matter, it's possible for the receiver to claim to have received a packet that the sender had not yet generated. The sending TCP stack may very well consider this an error.

The article deals with this head on: that's the essence of the ambiguity of "I have received up to X" versus "I am not interested in bytes up to X". The second intent is consistent with not having received bytes up to X, which is consistent with them not having been sent yet at all.

The anti-congestion-control situation is when the receiver is in fact interested in getting all the bytes, and so "I am not interested in bytes up to X" is of course a lie. But so is "I have received up to X".

> The article deals with this head on:

I think you missed the point. The article makes no mention of how existing implementations handle this case. It seems the author had only theorized based on his knowledge of the TCP protocol.

I think that if an implementation treats it in either of the following ways, then it's good: 1) treat an ahead-of-sequence ack as "everything acknowledged so far" or, 2) drop the ahead-of-sequence ack as a bad frame.
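The two acceptable behaviours listed above can be sketched as a sender-side function. `snd_una` (oldest unacknowledged byte) and `snd_nxt` (next byte to send) follow the usual TCP naming; the function itself and the policy names "clamp" and "drop" are hypothetical, and sequence-number wraparound is ignored to keep it short.

```python
def process_ack(snd_una: int, snd_nxt: int, ack: int, policy: str) -> int:
    """Return the new snd_una after receiving a cumulative ack.

    policy "clamp": treat an ahead-of-sequence ack as acknowledging
                    everything sent so far.
    policy "drop":  ignore an ahead-of-sequence ack as a bad segment.
    """
    if ack <= snd_una:
        return snd_una   # old or duplicate ack, nothing new
    if ack <= snd_nxt:
        return ack       # normal ack of data that was actually sent
    if policy == "clamp":
        return snd_nxt   # ack is ahead: accept up to what was sent
    if policy == "drop":
        return snd_una   # ack is ahead: discard it entirely
    raise ValueError(f"unknown policy: {policy}")
```

Under the "drop" policy the sender makes no progress from the ahead-of-sequence ack alone, which is why honest acks still need to get through.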

Other behaviors, like crashing or messing up the stream, of course, spoil things.

There is a problem if the receiver sends only an ahead-of-sequence ack, without acknowledging frames before, and the sender drops that ack. The receiver must acknowledge everything actually received, and respond properly to window probes, to ensure forward progress.

Not everyone is looking at packet loss to deal with flow control. Wouldn't TCP BBR be an example of the contrary, and at that, a good example, considering it appears to do better than loss-based algorithms? Loss-based algorithms seem very prone to buffer bloat.

In theory, you don't have a choice about this; everyone needs TCP-friendly flow control, whether they want it or not, or the network experiences either congestion collapse or persistent unfairness.

You mean to say everyone has to use the same congestion algorithm in order for TCP flow control to work and not collapse the Internet? Apparently not Google, since they deployed TCP BBR and use it even for connections from the Internet, including on YouTube, and things seem to be beneficial rather than detrimental. It's not the only TCP implementation that is using stats other than packet drops to drive flow control. I'm guessing it works because both sides of the TCP connection can employ their own flow control for outgoing packets and the asymmetry doesn't do anything since they only care about controlling their own side.

BBR isn't always fair to other non-BBR flows.

> If it's due to congestion, you've just subverted the mechanism TCP uses to relieve it

Bingo; that's precisely the context in which I first saw the technique of 'fake ACKs' described: as a congestion-control-defeating mechanism which provokes senders into sending faster.

I suppose that's the answer to Random Early Drop.

I used to argue that you want to use fair queuing and drop the newest packet in a stream, so all the old packets get delivered. But Random Early Drop, which is trivial to implement, caught on in routers. That means you lose random packets from the stream, and they have to be retransmitted.

There's nothing awful about acknowledging a TCP sequence number for a valid packet received even if some previous packets are missing. You know the missing packets were sent, or you wouldn't have the later packet. If you don't need them, why ask for retransmission? The receiving TCP doing this should notice that packets are being lost and cut down the congestion window a bit, so the sender will slow down.

This is purely a receive-side thing. It shouldn't bother upstream middleboxes. But, as someone pointed out, about 25% of middleboxes don't like it.

Could be quite useful for TCP/TCP VPN type applications. It is widely accepted that TCP/TCP degrades performance, hence TCP/UDP is preferred. But UDP isn't always available.

I have read about this long ago. This was in the context not of ignoring don't-care data, but using premature ACKs as a way of garnering more bandwidth.

The idea is to send ACKs ahead of time to provoke the sender into transmitting faster, defeating some of the congestion control mechanisms in TCP.

The network has to be reliable to make it work (the pipe has to buffer all the excess spew), and the sending stack has to ignore situations when it gets a premature ACK, ahead of the sequence number it has actually sent.

So basically redoing a less useful version of SCTP with the same issues in regards to middle boxes. I wish Microsoft would finally get their thumbs out and put SCTP into Windows.

This, like many other current network flow innovations (see: QUIC), is greedy-algorithm short-sightedness that only works as a result of the improvements to regional/inter-datacenter networking over the last decade. They fall down miserably in the face of actual unreliability/variable latency/packet duplication, which still exist on many less developed networks worldwide.

From that point of view, UDP streaming is also short-sighted. Practically, this is nothing more than a type of "QoS" flag, where of course things would also break down if everything started giving themselves priority 1.

So it depends more upon whether the use case is appropriate than it does about technical implementation.

In TCP if you ACK something that wasn't sent the ACK is dropped.

That's not what's being talked about here. The idea is that you ACK packets that you believe have been sent, but have been dropped or delayed, because you've received a later packet.

Good point. Though that puts this somewhere in a middle ground between a best effort stream and a reliable stream. The ability to throw away data you don't care about is not absolute but is instead conditional upon successfully receiving more data.

In a true best effort streaming system, the sender wouldn't retransmit. With this scheme, the sender might retransmit, sometimes, depending on whether certain packets get through and allow certain acks to get sent.

That seems easy enough to work around. Send two acks, one for what you really received and one for what you are pretending you've received.

If the data has been sent but lost, the sender will accept the aggressive ack. If it hasn't been sent yet, it will accept the less aggressive ack.

For that matter, if you have some way of knowing the expected transmission rate (even approximately as long as you can resync to reality), you can blindly send a stream of periodic acks with steadily increasing sequence numbers.

So, UDP.

I don't think that is quite true. I believe with UDP there is no promise of packets being received in order. I think this article is saying that you still get the benefits of processing the packets in the order received, but you don't have to worry about the latency of waiting for the re-transmission of any packets.

It's so trivial to add ordering to UDP, it's really the right protocol to use here.

Subverting TCP leads to all sorts of problems around congestion (which you can no longer filter on, because which TCP streams are being non-compliant?), so it just should not be done.

The TCP layer also gives you pacing, congestion control and flow control. With UDP, you'd have to do those yourself.

And it's not totally trivial to reorder UDP. Say you receive packets 1,2,3,5. How long do you wait for '4'? Maybe it was dropped; maybe it's coming.

Then you get 6,7. Is 4 still out there? You've got 3 packets in your pocket, waiting for 4. That adds up too.

TCP gives you some idea of what packets you SHOULD have received, so you can respond better. UDP doesn't have any windowing etc so you have no idea.
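The "how long do you wait for 4?" question usually turns into a reorder buffer with a deadline. A minimal sketch, with a hypothetical class name and an assumed fixed wait; it ignores sequence wraparound and only re-checks the deadline when a packet arrives, where a real receiver would also run a timer.

```python
class ReorderBuffer:
    """Deliver datagrams in sequence order, skipping gaps after max_wait."""

    def __init__(self, max_wait: float, first_seq: int = 1):
        self.max_wait = max_wait
        self.next_seq = first_seq
        self.pending = {}  # seq -> (payload, arrival_time)

    def push(self, seq: int, payload: bytes, now: float) -> list:
        """Accept one datagram and return payloads ready for delivery."""
        # Late packets (already skipped) and duplicates are ignored.
        if seq >= self.next_seq and seq not in self.pending:
            self.pending[seq] = (payload, now)
        return self._drain(now)

    def _drain(self, now: float) -> list:
        out = []
        while self.pending:
            if self.next_seq in self.pending:
                out.append(self.pending.pop(self.next_seq)[0])
                self.next_seq += 1
                continue
            # There is a gap. Give up on it only once the lowest
            # buffered packet has waited at least max_wait.
            lowest = min(self.pending)
            if now - self.pending[lowest][1] < self.max_wait:
                break
            self.next_seq = lowest
        return out
```

With `max_wait=0.05`, packets 5 and 6 are held while 4 is missing, and are released together once 5 has been buffered for 50 ms; a copy of 4 arriving after that is dropped. The buffered packets are the "3 packets in your pocket" cost mentioned above.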

I guess I don't really see the big issue with that. It's not like windowing and congestion control is some kind of black magic. It's spelled out pretty cleanly in the TCP RFC and pretty straight-forward to reimplement.

Generally if you're hitting cases where TCP is causing you grief and you need to reach for UDP you've already got enough context to understand your congestion problems/etc.

We've been doing this in game-dev for decades, ditto the voip space so it's not like you don't have a wealth of knowledge to draw from if you're really stumped.

If you just use TCP again you haven't done anything. The whole point is to avoid latency.

Most folks use some UDP-based protocol package instead of reinventing the wheel. It's not rocket science, but it isn't trivial. Defining your own packets to do all the flow stuff is just work, like any other programming task.

I don't think I was suggesting using TCP, I was suggesting implementing the features you like from TCP into your stack if you really need them. You can do congestion control without retry, etc.

I've built variations of UDP based protocols 4 or 5 different times over my career. I'm literally in the middle of this right now with the radio framing protocol I've been developing. I really think you're making it out to be much harder than it is.

I was delighted when DCCP appeared: https://en.wikipedia.org/wiki/Datagram_Congestion_Control_Pr...

It focuses narrowly on a congestion control protocol, and is intended to be combined with whichever datagram-based protocol you have lying around that might be suffering from congestion issues.

Isn't the pacing and congestion control based on ACK's though? And this is suggesting to ACK everything. I feel like I'm missing something.

> It's so trivial to add ordering to UDP

Especially when compared with creating and using a user space IP protocol like was done here, or adding a new one into the kernel.

I'm not sure I understand the distinction. How could TCP guarantee that packets will be received in order without re-transmissions? Re-transmission is a mechanism for making this a guarantee. If the receiver just ACKs everything it gets, then isn't that effectively making no guarantees about the order?

TCP layer of your IP stack does re-ordering and presents them to the client in order. UDP layer doesn't. So by acking every packet, TCP layer will still present what it DOES receive in order.

Right, but UDP also presents what it receives in order - so what's the advantage of forcing TCP to behave this way? I struggle to think of a practical use case where either a) UDP or another protocol wouldn't be selected [e.g. in a VoIP system] in the design phase b) using TCP in this non-standards compliant way would be nothing more than a short-term bandaid because of other constraints (e.g. can't change Layer 4 to something non-TCP).

Absolutely not. UDP delivers packets as received.

Don't underestimate how often packets are received out of order. There's even a consumer DSL modem that swaps every odd UDP packet with every even one - I had to compensate for this in a VoIP product. Using TCP in this bastardized way would cure that. That said, I tend to agree it's a poor idea to use TCP in this way. The famous book on IP used to list 8 protocol possibilities (only 2 commonly survive today, UDP and TCP) of which streaming and packet reordering was a valid combination (without acking/retransmitting). Don't know what it was called, but that's what's being attempted here.

I think we’re operating on different definitions of in-order and as received. TCP delivers packets in order, but perhaps not as received, if it had to request retransmission of a dropped packet. UDP delivers packets in the order that they were received. Doing what the article suggests would make TCP also deliver packets in the order that they were received. No?

TCP won't nack a packet when one is perceived as 'missed' (but really out-of-order); that's clear from the premise (ack everything).

But reordering also happens simply by examining received packets in a burst, and putting them right.

So it's six of one and a half-dozen of the other I guess. Sorry for coming off so abrupt.

No problem, thanks for clarifying! So basically the benefit to the TCP NACK approach is that the TCP layer will also do a sort on the packets received?

Yeah, which is imperfect but there you are. It means sometimes out-of-order packets will be reordered, and sometimes dropped (since TCP acked it, the existing TCP code will discard the out-of-order packet as 'duplicate'). That turns out to work pretty well in practice, since out-of-order packets almost always arrive in a burst (no inter-packet delay).

No. UDP is connectionless, which makes it harder to use with NAT.

To clarify, what I meant is that with TCP, I can set up a two-way communication channel, even if I'm behind a NAT/firewall I don't control. As far as I understand, with UDP this is harder (i.e., does not work with all NAT types), because UDP does not establish a connection, and it does not provide a two-way communication channel. However, I am not up to date on NAT traversal techniques, so I might be wrong.

Which was maybe a problem 20 years ago.

But probably not.

Not sure why you say that, UDP conntracking has been around for a long time.

I suspect that what andreasvc meant is that the default "NAT" configuration of most consumer-grade gear is such that it will block UDP (unless some other mechanism such as UPnP is used)...

That's true for TCP as well. Almost all consumer-grade routers block all incoming connection attempts, not just UDP.

From a stateful firewall's point of view both UDP and TCP have state.

Right. I was assuming that the context here was connections being initiated by the client (as most are).

... in which case blocking is not an issue. Consumer-grade NAT hardware will no more block client-initiated UDP than it'll block client-initiated TCP, at least not without extra configuration.

But it doesn't. Consumer gear NATs UDP pretty much as nice as it does TCP.

Hmm, perhaps you are right... I switched from consumer gear to more "enterprise-y" gear a few years back, so my data points may be a bit outdated.

I've been using normal consumer routers for NATed home connects since 2004 or so and I've never had an issue with outgoing UDP. It's required for basically every video game after all.

If NAT blocked UDP, the DNS client on home computers would never work.

I think most routers set themselves as the DNS server, so NAT is not in effect (the computer only sends the request to a local address) unless you define a custom DNS server, which isn't common for home users.

That said, I've never seen a router that didn't allow UDP packets to flow back to the origin client.

> I think most routers set themselves as the DNS server

DNS forwarders like dnsmasq are a relatively recent inclusion in home routers. Sure, they've been there for 10 years or so, but they weren't there for the 5+ years before that. Before Linux took over the embedded OS on home routers, the DHCP servers just passed the DNS configuration that the WAN port got from the ISP, and you can still do that now if you want. That's why nslookup.exe and dig still work on your workstation when you specify an external DNS server instead of the one your DHCP server on your home router gives you.

> That said, I've never seen a router that didn't allow UDP packets to flow back to the origin client.

Which is the point I was making.

Most consumer-grade gear can easily handle two-sided NAT traversal with merely a STUN server (no UPnP or relaying required).

Then don’t use NAT, it’s a hack anyway. IPv6 is not exactly a new idea.

Why does it matter that IPv6 is old, if your users are only using IPv4?

not exactly:

> If it’s used that way, it makes TCP a reliable, in-order protocol.

the key here is "in-order".

It's absolutely trivial to put a sequence number in UDP packets.
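A sketch of what "put a sequence number in UDP packets" can look like: a 4-byte big-endian counter prepended to each datagram, plus a wraparound-safe comparison. The header layout and function names are assumptions for illustration, not a standard format.

```python
import struct

HEADER = struct.Struct("!I")  # assumed layout: 32-bit big-endian sequence


def wrap(seq: int, payload: bytes) -> bytes:
    """Prepend the sequence number to a datagram payload."""
    return HEADER.pack(seq & 0xFFFFFFFF) + payload


def unwrap(datagram: bytes) -> tuple:
    """Split a datagram into (sequence number, payload)."""
    (seq,) = HEADER.unpack_from(datagram)
    return seq, datagram[HEADER.size:]


def is_newer(a: int, b: int) -> bool:
    """True if sequence a comes after b, allowing for 32-bit wraparound."""
    return a != b and ((a - b) & 0xFFFFFFFF) < 0x80000000
```

A receiver that only wants the freshest data can drop any datagram for which `is_newer(seq, last_seq)` is false. Buffering and waiting for stragglers is the part that takes more thought.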

The question is whether you want the other mechanisms of TCP: handshaking, (limited) retransmission, flow control...

Never mind that most sensible streaming formats will include data in packets that allows the receiver to rebuild missing packets (up to a threshold).

A typical RTP stream will have SMPTE FEC on top, allowing burst loss of say 20 packets, or random loss theoretically up to 5%.

In the last 10 years of streaming RTP over the Internet, the vast majority of failures have been bursty. Reordering is rarer than you'd expect.

Looking at one sample from India to Europe over a period of 6 weeks, my 30 Mbit RTP stream was fine 99.998% of the time with FEC. The rest of the time is why I dual stream: either timeshift (send the same packet again 100ms later - as I said, most outages are time based), or dual path (although if the packets traverse the same router en route there are issues), or even both.
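To show the basic idea behind packet-level FEC, here is a deliberately simplified sketch: one XOR parity packet per group of equal-length packets, which can rebuild exactly one missing packet in that group. This is not the SMPTE scheme itself, and the function names are hypothetical. Coping with long bursts generally requires the lost packets to fall into different parity groups, which this sketch does not attempt.

```python
def xor_parity(packets: list) -> bytes:
    """Build one parity packet over a group of equal-length packets."""
    parity = bytearray(len(packets[0]))
    for p in packets:
        if len(p) != len(parity):
            raise ValueError("packets in a group must be the same length")
        for i, b in enumerate(p):
            parity[i] ^= b
    return bytes(parity)


def recover(received: list, parity: bytes) -> tuple:
    """Rebuild the single missing packet (marked None) in a group.

    Returns (index of the missing packet, recovered payload).
    """
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) != 1:
        raise ValueError("XOR parity can repair exactly one loss per group")
    rebuilt = bytearray(parity)
    for p in received:
        if p is not None:
            for i, b in enumerate(p):
                rebuilt[i] ^= b
    return missing[0], bytes(rebuilt)
```

For a group `[b"aaaa", b"bbbb", b"cccc"]`, losing the middle packet and calling `recover([b"aaaa", None, b"cccc"], parity)` yields `(1, b"bbbb")`.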

Proposed socket-level API: allow llseek on the descriptor, in the forward direction only.

Why would anyone implement VoIP over TCP to then lie to TCP in order to make it practically UDP?
