Why do UDP packets get dropped? (jvns.ca)
94 points by sebg on Aug 31, 2016 | 74 comments

I was excited when I saw the title -- UDP is the workhorse for data transfers on many projects I work on. The info, though, was very basic.

TL;DR version: packets get dropped when some buffer, on your local computer or on a router between here and there, gets full.

I do not want to sound too critical -- the info is good for someone who never heard of UDP.

But I was hoping for more information: more substantiation of why full buffers are the main source of UDP drops (e.g., can smart throttling take some or most of the blame? Given the need to drop a packet, dropping a UDP packet is usually less painful than dropping a TCP one), and any measurements from a sample network or hardware.

Dropping TCP is normally preferable, I'd have thought, as it'll cause the TCP socket to back off. Dropping UDP is less likely to lead to such behaviour.

TCP is rather sensitive to dropping. Even a very small fraction of dropped packets cripples a TCP connection's throughput. It is a well-known research topic in TCP, and probably the holy grail of the field.

It's more because UDP is designed to be an unreliable protocol: packet losses are expected to happen and applications will have to deal with them. Meanwhile, dropping a TCP packet may cause the congestion control algorithm to back off, but you're guaranteed to waste more traffic while those TCP sessions figure out WTF happened and retransmit.

UDP isn't designed to be unreliable. It inherits the reliability of what it's built and run on, and doesn't compensate for it.

It's designed to be unreliable in the same way that a car with crumple zones is designed to be driven into a wall.

Of course dropping traffic isn't the intended purpose of the protocol, but it's designed to allow unreliable communication when you're stuck with that situation.

UDP should be considered in the context of IP networks and TCP. In this canonical and typical context, it is designed to be the unreliable alternative to TCP, as a transport-layer protocol on top of IP.

I recall reading that UDP was left as such a thin wrapper over packets so that when another protocol like TCP was found to be the wrong solution for a problem, you could easily build your own reliability and congestion protocols over the top of UDP.

Of course, I think that reference was in a book, so I can't find it now.

Semantics. There was a conscious decision involved.

"Smart throttling"? The point of UDP is that it doesn't impose a flow control discipline.

If a link in the path cannot handle all currently requested packets (in whatever protocol) then one of the endpoints needs to decide what to send and what to delay/drop. It can do this randomly, sequentially (youngest/oldest) or try to be smart about what causes least impact.

A few systems I saw would, in this case, heavily favor dropping UDP on the assumption that the applications using it can handle dropped packets in stride; plus, dropping TCP packets only causes more traffic in the short term due to retransmissions.

In this case the drops could have nothing to do with buffer sizes -- a UDP packet can still be dropped even if it is the only UDP packet in a large buffer.

All of these posts are fantastic. I can only presume at some point they're going to be collected and turned into an extremely good book on Unix performance and diagnostics investigation; it'd be a worthy sibling to the older great Unix books like Panic!.

The infectious tone is part of it, but I think the bigger win is just how little Evans cares about what the reader/writer is "supposed" to know. Each post goes from a standing start with almost nothing presupposed all the way into the weedy details.

Regarding this post: it's getting close to a pretty big idea. Once you grok why packets get lost† --- congestion --- you're pretty close to understanding the big idea behind TCP, and how the Internet works and figures out how fast to send things, even though we're on crappy wi-fi connections connected to even crappier DSL lines connected to OC192 backbones.

† Fun additional reason we found when trying to build a next-gen "routed" IRC in the 90s: when routes change!

This is not an original opinion, but I love her style of writing. I can feel the pure unadulterated joy at learning these things. Sometimes I learn along with her, sometimes I am seeing an old subject through new eyes. Always, it's worth the read.

Her category 'lost in transit' is perhaps the biggest cause of UDP drops. No matter how big your buffers on the send/receive side, if an intermediary carrier decides UDP is not important or a D/DoS threat to their network, bye bye packet... or in some cases of rate-limiting, see you in a while perhaps.

A few rabbit-holes to dive down:





The app can be at fault too. Some traffic is bursty (video frames, large files, images) and some client stacks have tragically small IP buffers (as little as 128K in some OSs). Apps must be prepared to read to exhaustion without pausing to process in those situations, then process once the buffer-storm is over.
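For what it's worth, "read to exhaustion, then process" can look roughly like the sketch below. The function name `drain_socket` and the 65535-byte read size are my own choices for illustration, not anything from the article; the idea is just to pull every queued datagram out of the kernel buffer before doing any slow work.

```python
import socket
from typing import List


def drain_socket(sock: socket.socket, max_datagram: int = 65535) -> List[bytes]:
    """Read every datagram currently queued on `sock` without blocking."""
    sock.setblocking(False)
    datagrams = []
    while True:
        try:
            # Each recv() returns exactly one datagram (or raises if none is queued).
            datagrams.append(sock.recv(max_datagram))
        except BlockingIOError:
            # Kernel buffer is empty: stop reading, the caller can now process.
            break
    return datagrams
```

The caller would typically wait for readability (select/poll/epoll), call `drain_socket`, and only then start the expensive per-datagram processing.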

Yep, I once saw a case with syslog over UDP with thousands of servers reporting to one central logging point. All the servers ran a command at the same time, and logged a message within ms of each other. The flood of UDP messages caused completely predictable input buffer overflow.

Otherwise called the stampeding herd effect. The way to address this is to introduce a random sleep on the remote servers before the command executes to spread the load.
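A minimal sketch of the random-sleep idea, assuming the scheduled job is a Python callable. `run_with_jitter` and the 5-second default window are hypothetical; pick a window that suits how fast the central collector can drain its buffer.

```python
import random
import time


def run_with_jitter(command, max_jitter_seconds=5.0, sleep=time.sleep):
    """Sleep for a random delay in [0, max_jitter_seconds], then run `command`.

    `sleep` is injectable so the behaviour can be checked without waiting.
    """
    delay = random.uniform(0.0, max_jitter_seconds)
    sleep(delay)
    return command()
```

With thousands of senders, the log messages then arrive spread over the jitter window instead of within a few milliseconds of each other.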

Large UDP datagrams are much more likely to be dropped than small ones. A short datagram will fit in a single IP packet. A maximally sized datagram may take about 40. If you have a 1% packet loss rate, then the short datagrams get lost 1% of the time, but the huge ones get lost 33% of the time (1 - 0.99^40). With a 10% packet loss you get almost 99% loss of maximally sized UDP datagrams.

The extra pain is that even though the huge datagram is lost, almost all of its data is transmitted. So you have a congested line and you are hammering it extra hard by resending the same data over and over.

Moral: unless you can guarantee low IP packet loss rates across the entire route, be very careful about large UDP packets.
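The arithmetic behind those numbers, as a quick sketch. It assumes each fragment is lost independently with the same probability, which real networks only approximate; the helper name `datagram_loss` is made up for illustration.

```python
def datagram_loss(packet_loss: float, fragments: int) -> float:
    """Probability that a datagram is lost when it is split into `fragments`
    IP packets and losing any single fragment loses the whole datagram."""
    return 1.0 - (1.0 - packet_loss) ** fragments


for p in (0.01, 0.10):
    small = datagram_loss(p, 1)
    large = datagram_loss(p, 40)
    print(f"packet loss {p:.0%}: 1 fragment -> {small:.1%}, 40 fragments -> {large:.1%}")
```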

Counter moral: I built a very nice VPN over TCP solution for a customer who insisted 10% packet drop on their network was fine. Completely fixed their large UDP packet based legacy system… which I had designed a decade before.

Counter-counter moral: detect Jumbo packets and use them. Except maybe still not over general Internet.

Modern OSes have a thing called black-hole detection; it is a good idea to have something similar in your protocol running on top of UDP.

This applies to _ALL_ IP packets, not just UDP. The lack of buffer credits etc. in IP means that flow control is a function of the higher levels. Your TCP packets are also getting dropped for all the same reasons; it's just that TCP backs off and retransmits, so you don't see it as anything other than a slowdown.

If so many packets get lost in buffer overflows, can you somehow make the system wait for the buffer to empty?

If sending a packet synchronously, the system call would just hang until the buffers have drained enough.

How would you do it while receiving? Does the outside network just bang bits through the cable like a radio station? Then you'd have no choice but to save everything as fast as it comes in, lest you lose packets. Or is there, deep down on the lowest levels of the stack, actually some kind of request/response going on, even for UDP? For example at the level of Ethernet frames, or even individual bits? Like "here are some bytes. got them? - yes. - here are some more. got them? - (waiiiiting....) yes." Then you could just let the next router in the system wait while you drain your input buffer.

Even if there is no request/response going on, you could still view your incoming traffic as the next router's outgoing traffic. Configure the router to wait with sending until your client has drained the router's output buffer enough. (That would require the router to know how much data you can take.)

> can you somehow make the system wait for the buffer to empty? If sending a packet synchronously, the system call would just hang until the buffers have drained enough.

It sounds like you're trying to make UDP lossless. Use TCP instead.

What if I'm running VOIP? Or a real-time video game? I don't want to wait for the dropped datagram to be received before processing the next one. It's too late! It should have been received already. Skip it and use the next one.
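A rough sketch of "skip it and use the next one", assuming the application puts its own increasing sequence number in each datagram (UDP itself has none). `LatestOnlyReceiver` is a hypothetical name, and sequence-number wraparound is deliberately ignored here.

```python
class LatestOnlyReceiver:
    """Hand payloads to the application only if they are newer than anything
    already played; late or duplicate datagrams are simply discarded."""

    def __init__(self):
        self.last_seq = -1

    def offer(self, seq: int, payload: bytes):
        if seq <= self.last_seq:
            # Too late (or a duplicate): the moment for this frame has passed.
            return None
        # Any gap between last_seq and seq is a lost datagram we never wait for.
        self.last_seq = seq
        return payload
```

A real VoIP stack would put a small jitter buffer in front of this, but the principle is the same: never stall playback waiting for a retransmission.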

> Does the outside network just bang bits through the cable like a radio station?

Basically, yes. Ethernet has no flow control. IP has no flow control. UDP has no flow control. TCP does.

> is there, deep down on the lowest levels of the stack, actually some kind of request/response going on, even for UDP?

No. Ethernet, as commonly deployed, has no need for this, but CSMA/CD is the retransmit part of the specification in case you are using coax tap or a hub. Each Ethernet frame also has a Frame Check Sequence, and the receiving node will discard the frame if it doesn't match up. It's not up to Ethernet to retransmit. Use TCP for that instead.

> Ethernet has no flow control.

Except it does: https://en.wikipedia.org/wiki/Ethernet_flow_control

Whether people use it or not is another matter!

> Ethernet has no flow control

it has: https://en.wikipedia.org/wiki/Ethernet_flow_control.

edit: formatting

Here is what we found when working with UDP (multicast) at a previous gig (it's been a while).

The messages can be up to around 64KB. I say "around" because, based on our experiments, different OSes did different things (this caused us much confusion). I think HPUX would drop them silently on send if they got to 62KB; at 64KB it would return an error. Keep them below 60KB to be safe.

Multicast means the router has to be set up correctly. When they muck with the network and add a hop, make sure your TTL (time to live) is set correctly on send. One of our OSs had a strange default to this.
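Setting the multicast TTL explicitly rather than trusting the OS default is a one-liner. A sketch in Python, assuming IPv4; `make_multicast_sender` is a made-up helper, and the group address in the comment is only a placeholder.

```python
import socket
import struct


def make_multicast_sender(ttl: int) -> socket.socket:
    """Create a UDP socket whose outgoing multicast datagrams carry `ttl`."""
    if not 0 <= ttl <= 255:
        raise ValueError("TTL must fit in one byte")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A one-byte value is accepted by both Linux and the BSDs for this option.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, struct.pack("B", ttl))
    return sock


# Usage (placeholder group and port):
#   sender = make_multicast_sender(4)
#   sender.sendto(b"hello", ("239.1.2.3", 5000))
```

The usual default multicast TTL is 1, which keeps datagrams on the local subnet, so every added router hop needs the TTL raised accordingly.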

I liked the all or nothing receive nature of UDP messages. Never a waiting and reading for the rest of the data like with TCP. You have no idea if the message got to where it was sent, sometimes you don't need that. Very low overhead too.

Multicast was the selling point for us. Send and everyone attached to the group gets the message. You can subscribe to the groups and see all the messages which makes debugging easier.

UDP is just a thin wrapper on an IP packet that adds port and checksum. This in turn is just a thin wrapper on whatever underlying network packets you use.

So UDP packets are as reliable as TCP packets in principle. TCP just hides the unreliability with flow control and retransmits.

There is also UDT as a reliable UDP-based alternative to TCP. Some day I'd like to hear from someone who used it whether it's worth it and when it makes sense.

Also, if you use Chrome, you're probably already using QUIC. Go to chrome://net-internals/#quic to see for yourself.

Buffers are the putative reason. We'd be tempted to suggest bigger buffers would help. But that contributes to the famous buffer-bloat problem that tanked the internet some years back.

The practical solution is to meter traffic (don't send packets faster than they can be processed on the receive end, or faster than the tightest congestion bottleneck en route allows). At the same time, process as quickly as possible on the receive side - don't let a single buffer sit in the IP stack longer than absolutely necessary. E.g. receive on one higher priority thread and queue to a processing thread.
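One common way to meter a sender is a token bucket. The sketch below is a generic version, not anything specific to the systems described here; the class name, the rate and the burst size are all assumptions for illustration.

```python
import time


class TokenBucket:
    """Allow sending at most `rate_bytes_per_s` on average, with bursts of up
    to `burst_bytes`. `clock` is injectable so the logic can be tested."""

    def __init__(self, rate_bytes_per_s, burst_bytes, clock=time.monotonic):
        self.rate = float(rate_bytes_per_s)
        self.capacity = float(burst_bytes)
        self.tokens = float(burst_bytes)
        self.clock = clock
        self.last = clock()

    def try_consume(self, nbytes: int) -> bool:
        """Return True if a datagram of `nbytes` may be sent right now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

The sender calls `try_consume(len(datagram))` before each `sendto` and waits (or drops locally) when it returns False, which keeps the drop decision in the application instead of in some buffer downstream.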

> E.g. receive on one higher priority thread and queue to a processing thread.

Doing that you've only created another buffer. But it's worse, because your buffering thread has a higher priority than the processing thread, you've moderated the back pressure, effectively signaling to the sender that you can handle more packets than you actually can.

Yes, the solution to buffer bloat is to only keep buffers at the ends, but whether that's in the kernel or in your application shouldn't matter in most cases, if any at all. Better to just let the kernel handle it and not reinvent the wheel.

Tweak the kernel send/receive buffers if you want. The defaults on Linux are usually too aggressive, but I don't see any need to do much more than that.

UDP has no back pressure, that's the point here.

And as observed elsewhere in this thread, some OSs have tragically small buffers (128K). It's absolutely vital to keep those buffers from filling in ambitious apps.

I wrote audio/video/screenshare communications code for years. In the bursty situation I described, the whole point is to offload the IP stack buffer into the app buffer at high speed.

Ah, right. But with the caveat that there's not any other flow control. Most UDP-based protocols support either some kind of flow control (e.g. RTP/RTCP) or retransmit (e.g. DNS), and that was my frame of mind. To stop buffer bloat it's important for people to stop implementing hacks that make a peer look more responsive than it actually is. It would be a shame if people got the idea that UDP necessarily meant that lessons of buffer bloat don't apply.

I've also written streaming media services for many years. RTP/RTCP, for example, supports adaptive rate limiting, though few implement it. The RTCP sender and receiver reports signal packet loss and jitter so that the sender can, e.g., dynamically decrease the bitrate. If implemented properly, buffering too many RTP packets can hurt the responsiveness of the dynamic adaptation, which can quickly lead to poorer quality. (Modern codecs help to mitigate this issue, but largely because the creators have spent a lot of time putting more adaptation features into the codecs and the low-level bitstream knowing that software higher up the stack is doing it wrong.)
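To make the adaptation loop concrete, here is a deliberately simple sketch of a sender reacting to the loss fraction reported by the receiver. The thresholds, step sizes and bitrate limits are all hypothetical; real implementations also weigh jitter and round-trip time, which this ignores.

```python
def adapt_bitrate(current_bps: int, loss_fraction: float,
                  min_bps: int = 64_000, max_bps: int = 2_000_000) -> int:
    """Pick the next target bitrate from the loss fraction (0.0 to 1.0)
    reported in the latest receiver report. All constants are illustrative."""
    if loss_fraction > 0.10:
        target = current_bps // 2            # heavy loss: back off hard
    elif loss_fraction > 0.02:
        target = current_bps * 85 // 100     # moderate loss: ease off
    else:
        target = current_bps * 105 // 100    # clean path: probe upward slowly
    return max(min_bps, min(max_bps, target))
```

The point made above still applies: if packets sit in a deep buffer before the loss shows up in a report, this loop reacts to stale information.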

For DNS, because Linux has a default 65KB (or greater!) buffer on UDP sockets, it's trivial to get huge packet loss when doing bulk asynchronous DNS queries. The application will quickly fill the deep UDP buffer; with the deep buffer the kernel will keep the ethernet NIC frame buffer packed, with the result that you'll see a ton of collisions on the ethernet segment and dropped UDP packets once the responses start rolling in. That results in a substantial fraction of the DNS queries having to be retransmitted, and because the retransmit intervals are so long, a bulk query operation that could have finished in a few seconds or less could take upwards of a minute as the stragglers slowly finish or time out. Without the deep UDP pipelines, the ethernet segment would be less likely to hit capacity, would see fewer dropped packets, and so the aggregate time for the bulk query operation would be several times less.

Reducing the UDP output buffer is substantially easier than implementing heuristics or gating for ramping up the number of outstanding DNS queries. The latter, if well written, might be more performant, but just doing the former would alleviate most of the problem, allowing you to move on to more important tasks.
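Adjusting the per-socket buffers is a pair of `setsockopt` calls. A sketch, with `set_udp_buffers` as a hypothetical helper; the sizes you pass are requests, and the kernel may clamp them to its configured minimum and maximum (Linux also doubles the requested value for its own bookkeeping), so the function reports what was actually applied.

```python
import socket


def set_udp_buffers(sock: socket.socket, sndbuf=None, rcvbuf=None):
    """Request new send/receive buffer sizes and return the effective
    (send, receive) sizes reported back by the kernel."""
    if sndbuf is not None:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    if rcvbuf is not None:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))
```

For the bulk DNS case, that would mean asking for a small send buffer on the query socket so the application blocks (or gets an error) earlier instead of flooding the NIC.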

So the root cause is always buffers? That's it? Nothing ever gets lost for some other reason, maybe some sort of collision?

They explicitly don't go into what happens on the routers between sender and receiver. Congestion control is a huge part of TCP's design. Don't know how common it is these days, but that is another place where packets will just get dropped. TCP packets get dropped too, but detecting that, resending, and then throttling back is built into the protocol. With UDP you either don't care or you handle that differently higher in the stack.

It could also be data corruption. Some bits flip in the header, and the UDP datagram either fails checksum validation or gets misrouted.
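For the curious, the checksum in question is the 16-bit ones'-complement Internet checksum. The sketch below shows only the core sum over a byte string; the real UDP checksum is additionally computed over a pseudo-header (addresses, protocol, length), which is left out here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum as used by IP, UDP and TCP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

A receiver recomputes the sum over the data plus the transmitted checksum; a result of zero means the check passed, and a single flipped bit makes it fail, so the datagram is dropped.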

There are also software reasons. A network stack is configured for some maximum UDP size and a datagram exceeds that size. On Mac OS X, the default maximum is just a little over 9200 bytes; to raise it you have to run

   sudo sysctl -w net.inet.udp.maxdgram=65535

The buffer space is there; this is an administrative policy reason for packets being tossed.

Datagrams can be dropped due to being unroutable. Some router gets rebooted, and so its peers drop packets. TCP connections keep going after a little pause; UDP sessions lose datagrams.

Lastly, IP datagrams are dropped if their TTL (time to live) decrements to zero as they cross a router, even if they are otherwise routable, not too big for any buffer, and with a correct checksum.

There's all kinds of other reasons:

- Packet gets corrupted. A bit in the right place could cause it to be dropped or delivered to the wrong destination.

- Something goes wrong on the PHY layer, say a dodgy fiber connection between two nodes. It'll then get dropped due to Ethernet CRC mismatches.

- There are network nodes whose primary reason for existing is selectively dropping packets. For example traffic shapers, firewalls.

I think there are other IP/routing reasons why an IP packet could get dropped, for example if it needs fragmenting but has the Don't Fragment (DF) bit set. Also if the TTL runs out from the packet being forwarded too many times.

I imagine all the fancy routing protocols people use could end up dropping a packet. Maybe that can be lumped in with traffic shapers and firewalls?

Fun story about randomly dropping UDP: Comcast did that when they first started doing traffic shaping and it ruined my Team Fortress Classic games. I assume there must be a better way than dropping packets. Maybe buffers :) ?

Queuing tends not to be a great traffic shaping strategy. The problem is that the sender will almost certainly not react to the extra queuing delay in any way (nobody runs RTT-sensitive congestion control in practice). And if the sender doesn't react to the queuing, it'll fill up the buffer and the same amount of packets will be dropped anyway. So you got increased RTTs and no reduction in packet loss.
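That argument can be checked with a toy model: a sender that never slows down, a link that drains more slowly, and a tail-drop queue in between. Everything here (the function name, the tick-based model, the numbers in the usage note) is made up for illustration.

```python
def simulate_queue(arrivals_per_tick: int, departures_per_tick: int,
                   buffer_size: int, ticks: int):
    """Tail-drop queue fed by a sender that ignores delay and loss.

    Returns (packets dropped, queuing delay in ticks for a packet that
    arrives at the end of the run)."""
    queued = 0
    dropped = 0
    for _ in range(ticks):
        queued += arrivals_per_tick
        if queued > buffer_size:
            dropped += queued - buffer_size  # no room: excess packets are dropped
            queued = buffer_size
        queued = max(0, queued - departures_per_tick)
    return dropped, queued / departures_per_tick
```

Running it with 15 packets arriving and 10 leaving per tick, once with a 100-packet buffer and once with a 1000-packet buffer, both runs end up dropping the same 5 packets per tick once the buffer is full; the bigger buffer only postpones that point and leaves every packet waiting roughly ten times longer.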

There is a better way of doing shaping for TCP traffic, by making the sender reduce the rate of transmission by other means than dropping packets. For example via manipulating the receive window. (And other, more sophisticated variations on that). But these methods aren't applicable to UDP in general, at best to some small subset of UDP-based protocols.

Even TCP Reno is sensitive to RTT, and it's ubiquitous. Likewise the common Linux BIC and CUBIC.

Plenty of VoIP and video streaming protocols also handle RTT hikes.

They didn't go into the number of ways packets can be lost in between the sender and receiver. In my experience it is generally some small corruption at some point that causes the IP or ethernet checksum/CRC to fail, and the packet gets dropped. It is extremely uncommon nowadays and generally due to failing hardware; buffers are by far the most common cause.

Except on any kind of wireless network or ADSL (due to unstable high frequency channels over crummy phone cable).

Mostly not. But she wrote that she has no idea about how the internet (or probably communications in general) works: communications channels can be described by a certain channel capacity as well as a signal-to-noise ratio (SNR). The lower the SNR, the higher the chance of losing information (packets or parts of them). Encoding and error correction can help to a certain extent, but will lower the throughput. Whatever you do, there's always a chance > 0% of losing something; on wireless connections it is much higher. If the internet on your mobile doesn't work properly, it's much more likely because of this than because of overloaded buffers. The extreme case would be a cut cable, which leads to an SNR of 0. Depending on the medium and topology, other errors can happen, e.g. routing errors.

I'm surprised that nothing like this is mentioned; isn't that undergraduate university stuff?

"lost in transit

"It's possible that you send a UDP packet in the internet, and it gets lost along the way for some reason. I am not an expert on what happens on the seas of the internet, and I am not going to go into this."

Well. Ok.

Yes, that's the most common. Assuming good hardware and application software, the one I see second most often is when a UDP packet that is more than one IP fragment goes out but an ARP has to happen first: only one of the IP fragments is then sent, so the other end can't reassemble them. This is an old BSD network stack bug that lives on in hardware and software out in the wild to this day.

> So the root cause is always buffers? That's it? Nothing ever gets lost for some other reason, maybe some sort of collision?

Indirectly; lots of collisions put stress on the buffer, so the probability that the buffer will be exhausted will increase.

Of course, if an intermediary router fails after sending the message, a UDP packet will also be silently dropped.

There aren't a lot of true bus topologies out there any more for collisions to occur on. And Wi-Fi retransmits.

Just to be clear, you mean Wi-Fi retransmits in the case of a collision a-la Ethernet, right? Was talking to someone a while ago who was under the impression that a TCP packet that didn't get ack'd within a certain window might be retransmitted by a Wi-Fi router that was closer to the destination. I was pretty certain it was wrong, but since you said, "Wi-Fi retransmits", just checking that's not what you meant.

Yeah he's wrong, routers don't buffer for downstream reasons in any context I've seen.

That being said, Wi-Fi retransmits on more than just collisions. Under the hood, Wi-Fi actually has its own acknowledgement protocol beneath the Ethernet layer. The idea is that with electrical bus protocols like Thicknet, each tap has a pretty good view of the whole bus. For wireless you might see your own frames just fine, but the destination might not. So you need to retransmit not just when you see collisions, but when the destination hasn't explicitly acknowledged your packets.

So does wired ethernet.

Well, wired Ethernet isn't a bus topology anymore, so there aren't collisions.

So, I'm old. Get away from my thicknet.

> So if you have a network card that's too slow or something, it's possible that it will not be able to send the packets as fast as you put them in! So you will drop packets. I have no idea how common this is.

This is extremely uncommon on 1Gbps NICs but is much more common on 10Gbps+ NICs. Also this type of dropping can happen on both RX and TX.


Also the author missed a set of buffers. The RX/TX rings on the NIC! These store packets on the NIC before they are moved to RAM (for RX) or sent on the wire (for TX). You can see them and configure their size using ethtool on Linux.

   $ ethtool -g ens6
   Ring parameters for ens6: 
   Pre-set maximums:
   RX:		4096
   RX Mini:	0
   RX Jumbo:	0
   TX:		4096
   Current hardware settings:
   RX:		512
   RX Mini:	0
   RX Jumbo:	0
   TX:		512

"This is extremely uncommon on 1Gbps NICs but is much more common on 10Gbps+ NICs. Also this type of dropping can happen on both RX and TX."

Is this backwards, or does the faster NIC really drop more packets? That seems like it would be unfortunate.

Not at "low speeds", but at line rate it requires a lot more CPU power to process all of those packets, which often leads to drops. These drops specifically are called RX ring/fifo overflows/overruns and happen when the NIC enqueues more than rx_ring_size packets before the OS initiates a DMA of incoming packets from the NIC to RAM.

I suppose if you point a 10Gbps UDP stream at a 1Gbps NIC there will be drops, but these drops will happen at the switch, not at the interface, which is a different type of dropping.

Gotcha, so it's the NIC dropping the packets, but really it's because the host machine fails to process them in time.
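On Linux, one place to look for interface-level drops is /proc/net/dev, which has per-interface "drop" and "fifo" counters for both directions. A small parsing sketch follows; which counter a given driver bumps on a ring overrun varies, so treat the mapping as an assumption and cross-check against the driver's own statistics. `parse_net_dev` is a made-up helper name.

```python
def parse_net_dev(text: str) -> dict:
    """Extract drop/fifo counters per interface from /proc/net/dev content."""
    stats = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(x) for x in data.split()]
        if len(fields) < 16:
            continue
        # Columns: 8 receive counters followed by 8 transmit counters.
        stats[name.strip()] = {
            "rx_drop": fields[3],
            "rx_fifo": fields[4],
            "tx_drop": fields[11],
            "tx_fifo": fields[12],
        }
    return stats


# Usage on a Linux host:
#   with open("/proc/net/dev") as f:
#       print(parse_net_dev(f.read()))
```

Sampling these counters before and after a burst shows whether packets are being lost at the interface rather than in the socket buffer.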

Somewhat off-topic, but what I've always wondered is why we don't have something like BitTorrent-IP or FlashGet-IP, where you can split a stream into independent [stream-like] segments and wait for them separately, retaining sequential structure. I.e. TCP with separate channels. This would solve many problems with 'site stuck in load-progress' when some item near </head> failed to arrive and now the entire <body> cannot be rendered because we wait for that retransmission.

I know HTTP2(?) and/or SCTP moved in this direction, but seems no luck.

Interesting fact: there's an ARP buffer in the Linux kernel that holds outbound packets waiting for ARP resolution. It holds 3 packets (not configurable last time I checked).

No. Linux uses the socket buffer to buffer packets while ARP is in progress. (The only sane thing to do imo). I believe the "buffers 3 packets" was how it was done in the early 2.0 kernels.

Windows[1], OSX[2], and the BSDs[3] only buffer 1 packet per socket while ARPing

    [1] Empirical testing.
    [2] https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man4/arp.4.html
    [3] http://www.unix.com/man-page/FreeBSD/4/arp/ and similar for other BSDs

"when a few dropped packets here and there don't mean the end of the Universe. Sample applications: tftp (trivial file transfer protocol, a little brother to FTP), dhcpcd (a DHCP client), multiplayer games, streaming audio, video conferencing, etc."


I used to write processing software for stock exchange data feeds such as OPRA (https://www.opradata.com/), they were UDP only at the time.

Of course packets could be lost so there actually was a paper based form you could fill in and re-request the missed data packet by fax (no-one ever did that, though).

I think by now they have implemented a TCP/IP connection for requesting retransmissions.

The guy also misses the point that packets can be dropped because of integrity check failures: if the packet has been altered in some way (cosmic ray, WiFi interfering with microwaves or whatever), then the checksum fails and the packet is dropped.

Actually, the causes of packet dropping are (AFAIK) exactly the same as in TCP. But in TCP, dropped packets are sent again.

One exception though: in UDP, delayed packets won't be waited for long before being dropped.

The ways in which packets are accidentally lost are the same for TCP and UDP, but if you have 150Mbps total coming in from four different ports and you need to route that data over a single 100Mbps port, then the router needs to choose which packets to drop. I bet that TCP packets take priority over UDP in many routers.

> "I bet that TCP packets take priority over UDP in many routers."

Unless you prioritize them differently.

If something interferes with WiFi, I'm pretty sure WiFi will automatically retransmit.

I just checked, and it does indeed retransmit. I stand corrected.

Julia Evans is not a guy.

The article doesn't even skim the surface of the underlying reasons why packets get dropped. First, unless you're using some snazzy QoS, the router doesn't care if the packet is UDP, so why is he even talking about UDP in the first place? Does he not realize that TCP and others are just as likely to get dropped, and that TCP just automatically resends?

(I wish authors would date articles.)

URL has the date of 2016 08 24 ...

That's funny, I'd even viewed the source looking for a date, and totally missed the url!

Not so sure that urls should be any more than a uniqid - should they have semantic meaning? Discuss... No don't.


<time datetime="2016-08-24T18:53:10-04:00" pubdate="" data-updated="true"></time>


Though Windows Vista was once installed on 1Gb laptops and sold that way to unsuspecting customers.

Throw shit on the wall and leave to dry for a few hours first.

You can, but it's not generally advisable.
