
Dealing with IPv6 fragmentation in the DNS - jsnell
https://blog.apnic.net/2017/08/22/dealing-ipv6-fragmentation-dns/
======
the8472
> IPv6 Extension Headers require any transport protocol-sensitive functions in
> network switches to unravel the packet header’s extension header chain. This
> takes a variable number of cycles for the device and furthermore requires
> that the switch should recognise all the extension headers encountered on
> the header chain. This is anathema to a switch, in so far as it entails a
> variable amount of time to process.

If it is so much anathema to their nature then maybe they shouldn't insist on
violating network layers?

~~~
rocqua
I agree that switches shouldn't care about TCP or UDP ports. That is a job for
a firewall. However, that does not mean it is a good idea to have variable
length headers in IPv6. In fact it seems like a singularly bad idea to me.

Regarding firewalls, I imagine it still sucks that a firewall cannot judge each
IP packet separately but needs to deal with possibly fragmented packets. Then
again, that is the cost of being a firewall. You need to function at the
application level and yet decide on the IP-packet level.

~~~
colmmacc
Switches use L4 ports to implement flow-switching. When multiple paths for a
packet are available, it is convenient (it leads to less re-ordering) if the
related packets from the same flow all choose the same path.

~~~
avasylev
It's not just convenience: for anycast addresses with multiple paths, if you
round-robin packets across different routes, you may end up receiving them in
two different locations (OK for UDP, but TCP breaks completely). Anycast is
widely deployed now and imho ECMP support is now considered "standard". I'm
not sure what alternatives to ECMP exist that don't require looking into TCP
headers.
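
To illustrate the flow-based path choice described above, here is a minimal sketch. It assumes the device hashes the usual 5-tuple; the function name `pick_path` and the use of SHA-256 are stand-ins for illustration only (real hardware typically uses a much cheaper hash).

```python
import hashlib

def pick_path(src_ip, dst_ip, proto, src_port, dst_port, n_paths):
    """Map a flow's 5-tuple to one of n_paths equal-cost next hops.

    Hypothetical helper: SHA-256 stands in for whatever hash the
    switch silicon actually implements.
    """
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths

# Every packet of the same TCP flow hashes to the same path, so an
# anycast destination keeps seeing the whole connection.
flow = ("2001:db8::1", "2001:db8::53", 6, 40000, 443)
choices = {pick_path(*flow, n_paths=4) for _ in range(1000)}
```

The catch raised in the article is that `src_port` and `dst_port` sit behind the extension header chain in IPv6, so the device has to find them first.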

------
tinus_hn
> another painful issue in today’s IPv6 Internet, namely that of network
> filters discarding ICMPv6 Packet Too Big messages.

How about we just let the people who configure these filters solve their own
problems?

~~~
kevin_nisbet
I've actually had this conversation before, and it gets a bit tricky.

I'd say it has a few main facets:

\- ICMP is dangerous, and needs to be blocked. This is partly true, as there
is historically information leakage with ICMP under certain codes. So blocking
all of it is easier than trying to figure out what should be allowed and what
should not.

\- It's not perceived to be a problem. Most services just work, since TCP with
MSS clamping avoids the fragmentation issue for many services. So the services
that have an issue might be something like 2%, and the argument that always
lingers is that if everything else works and this one thing doesn't, it's not
the filter's problem.

\- ICMP doesn't necessarily work with multiple layers of encapsulation anyway.
I encountered this on mobile wireless networks, where doing multiple rounds of
encapsulation caused multiple packet size reductions. Basically, if we
encapsulate in tunnels used by the mobile network, then encapsulate again in,
say, IPsec, the IPsec node will send its ICMP Packet Too Big to the tunnel
endpoint, not the originator of the traffic.

While I suppose it could be argued that the tunnel endpoint should maintain
its own PMTU, and proactively send its own ICMP Packet Too Big based on
this, it can be difficult to guarantee these components work.

I've seen lots of telco equipment struggle with handling this properly, and
it's not just filtering, it's bugs.

So unfortunately, we're in a situation where many networks are likely
misconfigured, but lots of other services for various reasons just work. So
unless you can drive the networks with problems to actually change,
realistically you just lose a portion of your own customer base if you don't
handle the situation while your competitors just work.

~~~
feld
> I'd say it has a few main facets: - ICMP is dangerous, and needs to be
> blocked. This is partly true, as there is historically information leakage
> with ICMP under certain codes. So blocking all of it is easier than trying
> to figure out what should be allowed and what should not.

It's actually quite easy to identify which should be allowed and which should
not.

ICMP: 0,3,8,11 (echo reply, destination unreachable, echo, time exceeded)

ICMP6: 1,2,3,4,128,129,135,136 (unreachable, packet too big, time exceeded,
parameter problem, echo request, echo reply, neighbor solicitation, neighbor
advertisement)

See, that wasn't hard. These are the base requirements for you to have a
reasonably functional IPv4 network, and the absolute minimum requirements for
IPv6 to work properly.
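
As a rough sketch, the two lists above can be written as a lookup table. The type numbers and names are the ones given in the comment; the function name `icmp_allowed` is hypothetical, and a real firewall would express this in its own rule language rather than Python.

```python
# Allowlists taken from the type numbers listed above.
ALLOWED_ICMPV4 = {
    0: "echo reply",
    3: "destination unreachable",
    8: "echo request",
    11: "time exceeded",
}
ALLOWED_ICMPV6 = {
    1: "destination unreachable",
    2: "packet too big",
    3: "time exceeded",
    4: "parameter problem",
    128: "echo request",
    129: "echo reply",
    135: "neighbor solicitation",
    136: "neighbor advertisement",
}

def icmp_allowed(ip_version, icmp_type):
    """Return True if this ICMP type is on the allowlist for its IP version."""
    if ip_version == 4:
        return icmp_type in ALLOWED_ICMPV4
    if ip_version == 6:
        return icmp_type in ALLOWED_ICMPV6
    raise ValueError(f"unknown IP version: {ip_version}")
```

With this table, ICMPv6 type 2 (Packet Too Big) passes, while redirects (type 5 in ICMPv4, type 137 in ICMPv6) are dropped.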

~~~
verri
But why filter ICMP at all? I can understand that ICMP allows for covert
tunnelling, but by that logic any IP protocol number should be blocked.

~~~
mgsouth
I think it's more the incoming ICMP that is troublesome, particularly
redirect, and to a lesser extent destination unreachable (DoS).

------
stephengillie
Having a variable header complicates Cut-Through switching[0], which starts
forwarding the frame as soon as the switch has read the destination field of
the header.

This sounds like it would force a return to Store-And-Forward[1], where the
switch waits for the entire frame to load into memory, before forwarding to
its next hop. Waiting these few milliseconds doesn't sound bad, until you
consider that's added to _each_ packet, reducing (e.g.) application
performance.

[0] [https://en.wikipedia.org/wiki/Cut-through_switching](https://en.wikipedia.org/wiki/Cut-through_switching)

[1]
[https://en.wikipedia.org/wiki/Store_and_forward](https://en.wikipedia.org/wiki/Store_and_forward)
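
For a sense of scale, the per-hop wait in store-and-forward is roughly the time needed to receive the whole frame, i.e. frame size divided by link rate. The sketch below works that out; the frame size (1518 bytes) and link speed (1 Gbit/s) are assumed example values, not figures from the thread.

```python
def serialization_delay_us(n_bytes, link_bits_per_second):
    """Time in microseconds to clock n_bytes onto or off a link."""
    return n_bytes * 8 / link_bits_per_second * 1e6

GIGABIT = 1_000_000_000  # assumed link speed, bits per second

# Store-and-forward: wait for the full frame (assumed 1518-byte maximum).
full_frame_us = serialization_delay_us(1518, GIGABIT)

# Cut-through: wait only for the 14-byte Ethernet header.
header_only_us = serialization_delay_us(14, GIGABIT)

extra_per_hop_us = full_frame_us - header_only_us
```

The difference is paid once per switch hop, so it grows with path length and shrinks as links get faster.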

~~~
squeed
The real problem is that you can no longer look at a fixed offset for your
parameters, which complicates filtering silicon.

In other words, if you want to have a firewall rule of "drop tcp 135", in IPv4
you can just look at bytes 0x24-0x25 (for Ethernet). IPv6's extension-header
mechanism means that the header has a non-fixed length, so you need to do work
for the same effect. Given that firewalls are critical in a world without NAT,
this can be a scary prospect.

Part of the blame falls on vendors who go "well, IPv6 is fringe so we'll just
do all firewalling in software." The protocol is 20 years old, it's time to
design the silicon.

~~~
verri
But IPv4 headers could have a variable length too, it's just that we don't
encounter that much in the wild. And what about the AH header, GRE and IPIP
tunnel headers, 6rd/6in4 tunnel headers, etc. Filtering at fixed offsets
sounds very brittle to me. Isn't the real problem here that network providers
and administrators appropriate the right to filter on OSI layers they
shouldn't be touching? This problem sounds like the exact reason why Google
insisted on having its QUIC headers encrypted: so network equipment can't pull
off this kind of misbehaviour.

~~~
noselasd
At least with IPv4 you can easily compute the start of layer 4, it's just
(ipv4[0] & 0xf) * 4 - trivially implemented with little real-estate in hardware.

IPv6 requires you to loop through all extension headers to reach layer 4, and
you need to know about a handful of those extension headers as not all follow
the same format.
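
The contrast can be sketched in a few lines. This is a minimal illustration, not a complete parser: it assumes well-formed, untruncated packets starting at the IP header, and it only knows the Hop-by-Hop, Routing, Destination Options, Fragment and AH headers. The function names are made up for the example.

```python
EXT_GENERIC = {0, 43, 60}  # hop-by-hop, routing, destination options
EXT_FRAGMENT = 44
EXT_AH = 51

def ipv4_l4_offset(pkt):
    """IPv4: one mask and one multiply on the first byte (the IHL field)."""
    return (pkt[0] & 0x0F) * 4

def ipv6_l4_offset(pkt):
    """IPv6: walk the chain; returns (upper-layer protocol, byte offset)."""
    next_header = pkt[6]  # Next Header field of the fixed header
    offset = 40           # the fixed IPv6 header is always 40 bytes
    while True:
        if next_header in EXT_GENERIC:
            # Length field counts 8-byte units, not including the first 8.
            length = (pkt[offset + 1] + 1) * 8
        elif next_header == EXT_FRAGMENT:
            length = 8    # the fragment header has a fixed size
        elif next_header == EXT_AH:
            # AH counts 4-byte units minus 2: a different format.
            length = (pkt[offset + 1] + 2) * 4
        else:
            # Not an extension header this sketch knows: treat as layer 4.
            return next_header, offset
        next_header = pkt[offset]
        offset += length
```

Note that for a non-initial fragment the returned offset does not point at a TCP or UDP header at all, which is the firewall problem discussed elsewhere in this thread.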

------
tveita
It sounds like IPv6 acknowledges that most protocols try hard to avoid
fragmentation anyway, so fragmented packets are the exception.

And it's not like fragmentation hasn't caused problems with firewalls in IPv4
either.

E.g.
[http://all.net/Analyst/netsec/1995-09.html](http://all.net/Analyst/netsec/1995-09.html)

The fix is for equipment to handle the fragmented packets properly, just like
for IPv4. How is forward fragmentation, which involves creating multiple new
packets + headers with recomputed checksums, easier than simply parsing a
header, which is apparently "anathema to a switch"?

~~~
skywhopper
The problem is that the fragmentation header in IPv6 is not deterministic, so
it can't be parsed in constant time, which is what makes it "anathema to a
switch" which needs to be able to guarantee certain levels of packet
throughput. IPv4-style fragmentation can be done in constant time.

~~~
pas
It is still bounded time, which is still O(1).

Yes, the standard doesn't limit the number of extension headers, but in
practice the worst case is the fixed header plus at most 5-6 extension
headers, which bounds the time required to process a packet. And that'd be
enough for 100% of traffic.

The exact limit is the number of header types known to the processing node.
Because encountering an unknown header should result in an ICMPv6 Parameter
Problem, and only one header can be used multiple times (the Destination
Options), but that should be after the Frag one anyway.

------
anc84
> The initial connection between Cloudflare's network and the origin web
> server timed out. As a result, the web page can not be displayed.

Sad to see even a NIC letting itself be MITMed by Cloudflare :(

------
dom0
_Maybe we should bow to the inevitable and recognise that IPv6 is an unfixable
problem._

~~~
pilif
if it's unfixable, why is 40% of our in-office internet traffic running over
IPv6? Why is 100% of our LAN traffic over IPv6? Why is 30% of the traffic to
our sites running over IPv6?

~~~
majke
The article suggests "Fragmentation in IPv6 is unfixable", not ipv6 itself.
Try running this to see if fragments get delivered to you:

[http://icmpcheckv6.popcount.org](http://icmpcheckv6.popcount.org)

[http://icmpcheck.popcount.org](http://icmpcheck.popcount.org) (IPv4 version)

via: [https://blog.cloudflare.com/ip-fragmentation-is-broken/](https://blog.cloudflare.com/ip-fragmentation-is-broken/)

~~~
joeseeder
On the network I am on atm, IPv6 gives me both green, IPv4 MTU fails ...

It means my IPv6 vpn is better than IPv4 delivered by some wannabe corpo
network masters.

~~~
pilif
the firewall is probably blocking ICMP. For a long time, it was considered
best-practice to blanket-block ICMP and for a long time you could mostly get
away with it.

But over time some really useful stuff has been added to ICMPv4 (early
congestion notification for example) and IPv6 mostly won't work at all without
ICMP.

I would argue that it's time for firewall administrators to reconsider this
decision. The times when ICMP packets could do harm are long past (remember
the ping-of-death on Windows 98?) and I would argue that the times when a
network could work at peak efficiency (or at all) without ICMP are also mostly
past.

~~~
oasisbob
> For a long time, it was considered best-practice to blanket-block ICMP

I agree with your point, but think this needs some clarification about who
considered it a best practice - among enterprise firewall admins and
financial-industry compliance audits, perhaps.

Large scale network operators have seen the problems associated with entirely
blocking ICMP for at least a decade. (v4 has blackhole problems too)

A lot of the hatred for ICMP came from simple things like easy endpoint
discoverability. nmap blew that argument out of the water pretty quick.

