
Issue 87 – google-compute-engine – UDP Packet Fragments cannot be reassembled - sunsu
https://code.google.com/p/google-compute-engine/issues/detail?id=87
======
ajross
Broadly speaking, UDP applications which rely on IP packet fragmentation are
broken as designed. If you wanted reliable transport you would have used TCP
or a higher level abstraction. If you wanted simple transport of large data
chunks, you would have chosen likewise. You picked UDP because you have
latency requirements that cannot be met by TCP, and that means you need to
know what your packets are actually doing, and that includes fragmentation.

If you want to play in that world you need to be prepared to handle MTU
discovery on your own, or else design your app around deliberately small
packet sizes.
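
For what it's worth, "deliberately small packet sizes" usually means splitting messages at the application layer. A minimal Python sketch, assuming a hypothetical 8-byte header (message id, chunk index, chunk count) and a conservative 1200-byte payload cap; neither the header layout nor the cap comes from the bug report:

```python
import struct

# Hypothetical header: message id (u32), chunk index (u16), chunk count (u16).
HEADER = struct.Struct("!IHH")
MAX_PAYLOAD = 1200  # assumed conservative cap, not a value from the bug report


def split_message(msg_id, data, max_payload=MAX_PAYLOAD):
    """Split data into datagrams that each carry at most max_payload bytes."""
    # An empty message still produces one (empty) chunk so the receiver sees it.
    chunks = [data[i:i + max_payload] for i in range(0, len(data), max_payload)] or [b""]
    if len(chunks) > 0xFFFF:
        raise ValueError("message too large for a 16-bit chunk count")
    return [HEADER.pack(msg_id, idx, len(chunks)) + chunk
            for idx, chunk in enumerate(chunks)]
```

Each returned element can go out in its own sendto() call. The receiver still has to cope with loss, duplication and reordering of chunks itself.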

That's not to say that this isn't a bug. But let's not start editorializing
our titles: the apps for which GCE is "unusable" are buggy apps to start with.

~~~
kentonv
I'm not sure if I'd go so far as to say that apps which rely on UDP fragment
reassembly are "buggy", but I definitely agree that the article title is
exaggerating by calling UDP on GCE "unusable". Many UDP services (including
one operated by my company, on GCE) will work just fine.

Interestingly Linux will actually, by default, throw EMSGSIZE any time you try
to send() a UDP datagram that is larger than the detected network MTU to the
destination. As I understand it, you have to explicitly turn this behavior off
to get fragmentation.

[http://man7.org/linux/man-pages/man7/udp.7.html](http://man7.org/linux/man-pages/man7/udp.7.html)
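
A rough Linux-only sketch of how an application can ask for this behavior explicitly instead of relying on the default. The numeric fallbacks are the values from `<linux/in.h>`, since Python's socket module may not export these constants; the helper name `send_unfragmented` is made up for illustration:

```python
import errno
import socket

# Linux values from <linux/in.h>; Python's socket module may not export them.
IP_MTU_DISCOVER = getattr(socket, "IP_MTU_DISCOVER", 10)
IP_PMTUDISC_DO = getattr(socket, "IP_PMTUDISC_DO", 2)


def send_unfragmented(sock, payload, addr):
    """Send one datagram with fragmentation forbidden.

    Returns False if the kernel rejects the datagram with EMSGSIZE.
    """
    # IP_PMTUDISC_DO: always set Don't Fragment and never fragment locally.
    sock.setsockopt(socket.IPPROTO_IP, IP_MTU_DISCOVER, IP_PMTUDISC_DO)
    try:
        sock.sendto(payload, addr)
        return True
    except OSError as exc:
        if exc.errno == errno.EMSGSIZE:
            return False
        raise
```

A False return is the application's cue to retry with a smaller payload.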

~~~
ars
> I'm not sure if I'd go so far as to say that apps which rely on UDP fragment
> reassembly are "buggy"

Isn't fragment reassembly essentially implying all the packets (fragments)
will arrive, and will be rearranged in order? i.e. exactly what TCP does?

For example imagine sending a single 1MB packet via UDP, letting it fragment
and be reassembled. What distinguishes that from TCP?

~~~
cbsmith
> Isn't fragment reassembly essentially implying all the packets (fragments)
> will arrive, and will be rearranged in order? i.e. exactly what TCP does?

No. It is implying that it does what UDP does. TCP is different because it
constructs a continuous stream. UDP is inherently packet/message based.

UDP ensures that all of the IP fragments for a given UDP packet get
reassembled and rearranged in order. The UDP packets themselves are not
arranged in order, and it is possible to see duplicates.

> For example imagine sending a single 1MB packet via UDP

Not possible. You can't send more than a 65,507 byte packet via UDP[1].
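
The 65,507 figure follows from the 16-bit IPv4 Total Length field, assuming a 20-byte IPv4 header with no options:

```python
IPV4_MAX_TOTAL_LENGTH = 2**16 - 1  # 16-bit Total Length field: 65,535 bytes
IPV4_MIN_HEADER = 20               # IPv4 header without options (assumed)
UDP_HEADER = 8                     # fixed UDP header size

max_udp_payload_ipv4 = IPV4_MAX_TOTAL_LENGTH - IPV4_MIN_HEADER - UDP_HEADER
print(max_udp_payload_ipv4)  # 65507
```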

Sigh... unfortunately, there are a lot of misconceptions about the basic
plumbing of TCP/IP.

[1] _UPDATE_ : as was pointed out in comments, my statement was only true for
IPv4. Since IPv6 allows jumbograms larger than 65,535 bytes,
IPv6 UDP does support larger packets. Of course, in that case, UDP doesn't
handle the fragmentation/defragmentation protocol itself, and of course GCE
won't handle anything IPv6.

~~~
ars
> Not possible. You can't send more than a 65,507 byte packet via UDP.

It's a thought experiment not a specific number. I'm trying to demarcate some
of the differences between TCP and UDP, and UDP reassembling fragments looks a
whole lot like TCP.

~~~
cbsmith
> UDP reassembling fragments looks a whole lot like TCP

There is only a tiny bit of similarity in that they are both dealing with
reassembling a higher layer protocol payload from fragments split up over IP.

Let's count the ways they are different:

1\. UDP does provide ordering guarantees _within_ a UDP packet. It provides no
ordering guarantees between UDP packets. Consequently, the protocol never has
to worry about ordering more than 2^16 bytes of IP data (that makes the
sorting algo really simple and efficient).

2\. UDP doesn't have to deal with complex stalls or partial delivery of data.
Either it has all the IP packets for a given UDP packet (in which case, it
immediately delivers the UDP packet), or it doesn't (in which case it doesn't
deliver it). If a UDP packet is missing a fragment, but a subsequent
UDP packet is fully assembled, that one gets delivered.

3\. There is no retransmission. UDP doesn't retransmit. It doesn't even
guarantee deduplication. It is entirely possible that fragmented packets going
different routes can end up creating an echo of multiple copies of a UDP
packet being transmitted to the recipient.

The above makes UDP far simpler, and the contract that application layers
operate under is very, very different. If you look at your typical TCP stack's logic for
reassembling fragments, you'll find it far, far more complex than for UDP.

~~~
ars
> UDP doesn't have to deal with complex stalls or partial delivery of data.

Are you sure? If some fragments show up and some not, doesn't it have to wait
and see if the rest show up? Maybe it doesn't have to wait for a long time
(like TCP), but it does have to wait.

Both point 1 and point 2 are a matter of quantity vs TCP, rather than quality.
i.e. UDP+fragments is a sort of TCP-lite.

~~~
cbsmith
> If some fragments show up and some not, doesn't it have to wait and see if
> the rest show up? Maybe it doesn't have to wait for a long time (like TCP),
> but it does have to wait.

Much of the stall logic is tied in to whether to wait longer, whether to
deliver partial data, whether to request a retransmit, whether you've already
transmitted the data (once you reassemble a UDP packet, you can fire it and
then promptly forget you ever saw it.. if defragmentation causes you to
reconstruct and send the packet again, that's 100% okay!) and most
importantly, what to do with an open ended amount of data that may need to be
held up while waiting for that one lost fragment.

The logic is simple:

1) Allocate space for UDP packet.

2) Fill in UDP fragment data as it arrives.

3) If all the data for a given packet is there, transmit it and delete all
memory of the fragment.

4) If you time out, delete all memory of the fragment.
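
A toy Python version of those four steps, just to show how little state is involved. This is a sketch, not kernel code: in a real stack the IP layer does this and keys fragments on source, destination, protocol and identification. The class name, the 30-second default timeout and the injectable clock are all assumptions for illustration:

```python
import time


class FragmentReassembler:
    """Toy reassembly buffer following the four steps above."""

    def __init__(self, timeout=30.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.pending = {}  # packet_id -> {"deadline", "frags", "total"}

    def add(self, packet_id, offset, data, last):
        """Store one fragment; return the full packet once complete, else None."""
        now = self.clock()
        # Step 4: forget any packet whose timer has expired.
        self.pending = {k: v for k, v in self.pending.items() if v["deadline"] > now}
        # Step 1: allocate state the first time this packet id is seen.
        entry = self.pending.setdefault(
            packet_id, {"deadline": now + self.timeout, "frags": {}, "total": None})
        # Step 2: fill in fragment data as it arrives.
        entry["frags"][offset] = data
        if last:
            entry["total"] = offset + len(data)
        # Step 3: if every byte is present, deliver and forget.
        if entry["total"] is not None:
            buf = bytearray()
            for off in sorted(entry["frags"]):
                if off != len(buf):
                    return None  # gap (overlapping fragments are not handled here)
                buf += entry["frags"][off]
            if len(buf) == entry["total"]:
                del self.pending[packet_id]
                return bytes(buf)
        return None
```

There is no retransmit request, no partial delivery and no ordering between packets: a packet is either handed up whole or silently dropped.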

That is so, so, so much easier.

If you looked at the code, you'd not think it much similar.

~~~
cbsmith
To clarify, first paragraph is talking about TCP stall logic. The rest is
talking about UDP logic.

------
dsl
The last update really hits the nail on the head for most Google products:

> Apparently the update from Google is to find another support channel to
> escalate, or only use TCP.

I've never had a Google product issue ever resolved using an official channel
I was directed to. It's only by back channels, friends of friends, posting to
HN, etc.

~~~
bad_user
I've had a couple of issues with Google Apps and their support has been very
helpful, getting a phone call from them and the issue solved in the same day.
For example I got them to revert me from annual to the flexible plan by simply
asking nicely and I even got them to operate settings that aren't normally
available, like changing the primary domain. Nowadays I've been moving off
Google Apps, but let me tell you, it's the support that I'll miss ;-) Also
back in the day when I was working on integrating with AdX, their reply was
very slow, but they did reply and they did help us with our integration.

I do not have experience with GCE, but saying that you don't get support for
any Google product is disingenuous.

------
eloff
Wow, reported May 2014, and still no resolution for such a serious issue. GCE
looks nice in theory, but I've heard no end of problems like this with shitty
communication and support when something doesn't work. I'll stick with AWS for
the time being. I really wish Google would get their act together and provide
serious competition though. That's good for everybody who uses the cloud.

~~~
toomuchtodo
AWS still doesn't support IPv6 except on internet facing ELBs. Pick your
poison.

[https://forums.aws.amazon.com/thread.jspa?messageID=536049](https://forums.aws.amazon.com/thread.jspa?messageID=536049)

[https://www.reddit.com/r/aws/comments/3ccn5o/real_ipv6_suppo...](https://www.reddit.com/r/aws/comments/3ccn5o/real_ipv6_support_in_aws/)

~~~
api
There are loads of good cloud providers with IPv6.

~~~
cbsmith
Yes, just none with Amazon in their name.

------
oofabz
It sounds like UDP packets that fit within the MTU work fine. If you need to
transmit more than fits in one packet (1452 bytes), UDP is a bad choice.

SCTP is ideal for this use case but it is not well supported by OSs or
networking APIs. TCP works but adds overhead. TFTP works, is UDP-only and has
less overhead than TCP, but it does not respond well to packet loss. UDT is
like TFTP done right, and is a good solution if you can accept a dependency on
its large C++ library.

~~~
addingnumbers
> If you need to transmit more than fits in one packet (1452 bytes)

You must never make assumptions about what fits in one packet. The MTU could
be 100, or less, or 8000, or more.

As soon as you start doing math based on MTU values that you don't permanently
have end-to-end control of yourself, you're setting yourself up for trouble.

~~~
oofabz
That's true, it's a bad idea to assume an MTU of 1500. Although IPv4 has no
useful minimum MTU (the spec only guarantees 68 bytes), IPv6 specifies a minimum of 1280 bytes. So if you send
your UDP packets over IPv6, you are guaranteed room for 1232 bytes of payload.
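
The arithmetic, as a small Python sketch. It assumes minimum-size headers (no IPv4 options, no IPv6 extension headers), and the function name is made up for illustration:

```python
IPV4_HEADER = 20  # minimum IPv4 header, no options
IPV6_HEADER = 40  # fixed IPv6 header, no extension headers
UDP_HEADER = 8


def udp_payload_room(mtu, ipv6=True):
    """Bytes of UDP payload that fit in one unfragmented packet at this MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return max(mtu - ip_header - UDP_HEADER, 0)


print(udp_payload_room(1280))              # 1232
print(udp_payload_room(1500))              # 1452
print(udp_payload_room(1500, ipv6=False))  # 1472
```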

------
dang
Please don't editorialize the titles of articles you submit to HN.

The submitted title was "Warning: Google Compute Engine Unusable for
Applications Which Rely on UDP", which several commenters have objected to as
exaggerated.

------
kerr23
AWS de-prioritizes UDP packets making it not a great choice for UDP based
applications as well.

~~~
majke
That doesn't sound right. Consider DNS.

~~~
dfc
Whenever people talk about behavior/treatment of UDP traffic I consider DNS as
a special case. I have no idea how AWS handles UDP but I will never use DNS as
a generalizable example of UDP traffic.

------
halayli
I feel the title should be updated to: "for Applications Which Rely on UDP
packets larger than 1500 bytes".

~~~
fabulist
Any application might very reasonably choose to do this. Maybe version X never
does, and you can satisfy yourself of this by peeking at the code, but there
is no reason to believe that X+1 won't, or that another vendor's product/FOSS
project you need to interoperate with won't, etc.

------
dboreham
This is certainly not the only case of "network subtly broken on cloud VMs".
For example, every provider I have tested (including AWS, Rackspace, Digital
Ocean, Linode) enables TCP segment reassembly offload, and provides no way to
disable it (presumably because it is being done on the host not the VM). This
will typically break TCP tunneling (e.g. using GRE) because PMTUD doesn't work
under these conditions. fwiw a shout out to SoftLayer which is the only VM
hosting provider I'm aware of that does not suffer from this blight (provided
you pay for additional IP addresses routed to your box).

------
kaa2102
I've run into several misfires while using Google Cloud/Compute Engine: MySQL
database access, email and encryption. These features didn't work without
either a Google or third-party service. I set up postfix and use SendGrid for
email and Google's Cloud SQL. You can get tech support at Silver or Gold
level. I think everyone starts at Bronze.

------
api
Wait... you mean there are protocols other than http? Somebody should tell
Google, Amazon, and Microsoft.

