
The EC2 firewall is broken - cperciva
http://www.daemonology.net/blog/2012-11-28-broken-EC2-firewall.html
======
minimax
PMTU discovery on the Internet is generally unreliable. Very few people
understand that it exists, and even fewer understand how it actually works.
Most ADSL (PPPoE) providers rewrite the TCP MSS on TCP SYN packets traveling
over their network to account for the PMTU discovery brokenness. [1] You see
the same thing happen with VPN connections where the PMTU is effectively
reduced by the size of the overhead for the encapsulation protocol.

[1] http://www.cisco.com/en/US/docs/ios/12_2t/12_2t4/feature/guide/ft_admss.html#wp1062184
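For reference, the arithmetic behind MSS clamping is simple. Below is a minimal Python sketch. It assumes a 20-byte IPv4 header and a 20-byte TCP header with no options. The function names are made up for illustration and are not taken from any router's configuration. The example is an Ethernet host behind a PPPoE link, where 8 bytes of PPPoE overhead leave a 1492-byte MTU. It also shows the reverse inference used elsewhere in this thread: an advertised MSS of 1430 implies an assumed PMTU of 1470.

```python
IPV4_HEADER = 20  # assumed: no IP options
TCP_HEADER = 20   # assumed: no TCP options

def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IPv4 packet of `mtu` bytes."""
    return mtu - IPV4_HEADER - TCP_HEADER

def clamp_syn_mss(advertised_mss: int, link_mtu: int) -> int:
    """Rewrite the MSS option of a SYN so the peer's segments fit the link."""
    return min(advertised_mss, mss_for_mtu(link_mtu))

def pmtu_implied_by(mss: int) -> int:
    """Path MTU a host is assuming when it advertises this MSS."""
    return mss + IPV4_HEADER + TCP_HEADER

print(clamp_syn_mss(1460, 1492))  # 1452: Ethernet host behind PPPoE
print(pmtu_implied_by(1430))      # 1470
```

A SYN that already advertises a small enough MSS passes through unchanged; only oversized values are lowered.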

~~~
gonzo
just because people don't understand doesn't give them latitude to break the
RFCs.

~~~
minimax
I agree, and when I worked in that world it was a huge pain in the ass. Users
don't care about RFCs though, they just want the Internet to work, so you end
up doing something that's pragmatic but kludgey. As I pointed out in another
comment, Google is responding with a TCP MSS of 1430 (so assuming a PMTU of
1470). It's just what you do so you can get on with your business.

------
ChuckMcM
From the RFC quoted:

"A packet-filtering router acting as a firewall which permits outgoing IP
packets with the Don't Fragment (DF) bit set MUST NOT block incoming ICMP
Destination Unreachable / Fragmentation Needed errors sent in response to the
outbound packets from reaching hosts inside the firewall, as this would break
the standards-compliant usage of Path MTU discovery by hosts generating
legitimate traffic."

That would be great. Next, tell the folks at SBCGlobal to fix their damn
network as well. I don't know how many folks we've had to 'patch' by manually
walking the MTU down on the local router until packets actually get through.
It really really sucks and leads to sending way more small packets than
needed.
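In filter terms, the quoted requirement means a default-deny inbound policy still has to admit ICMP type 3 (Destination Unreachable) code 4 (Fragmentation Needed and DF set). A minimal Python sketch of such a predicate follows. `InboundPacket` and `allow_inbound` are hypothetical names. A real stateful firewall would also check that the ICMP error quotes a flow that actually left through it.

```python
from dataclasses import dataclass
from typing import Optional, Set

ICMP_DEST_UNREACHABLE = 3  # ICMP type
ICMP_FRAG_NEEDED = 4       # code: "fragmentation needed and DF set"

@dataclass
class InboundPacket:
    protocol: str                    # "tcp" or "icmp" in this sketch
    dst_port: Optional[int] = None   # used for TCP
    icmp_type: Optional[int] = None  # used for ICMP
    icmp_code: Optional[int] = None

def allow_inbound(pkt: InboundPacket, open_tcp_ports: Set[int]) -> bool:
    """Default-deny inbound filter that still admits the ICMP errors PMTUD needs."""
    if pkt.protocol == "icmp":
        return (pkt.icmp_type == ICMP_DEST_UNREACHABLE
                and pkt.icmp_code == ICMP_FRAG_NEEDED)
    if pkt.protocol == "tcp":
        return pkt.dst_port in open_tcp_ports
    return False

print(allow_inbound(InboundPacket("icmp", icmp_type=3, icmp_code=4), {22, 80}))  # True
print(allow_inbound(InboundPacket("icmp", icmp_type=8, icmp_code=0), {22, 80}))  # False
```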

~~~
minimax
I'm not sure I completely understand your situation, but turning down the MTU
on an intermediate router is rarely a good fix for PMTU issues. If you have
web servers, you may consider turning down the MTU on the Internet facing
interfaces. Even if you turned it down to say 1400, you'd only have around 5%
more packets ... probably not enough load increase to melt your routers.
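To put a number on the packet-count increase, here is a rough Python sketch. It assumes a bulk transfer with full-size segments and 40 bytes of IPv4 and TCP headers per packet (no options, ACK traffic not counted). Packet count scales with 1/MSS. Dropping the MTU from 1500 to 1400 shrinks the MSS from 1460 to 1360. For a 1 MB transfer that comes out at about 7.4% more packets, in the same ballpark as the estimate above.

```python
import math

HEADERS = 40  # assumed: 20-byte IPv4 header + 20-byte TCP header, no options

def packets_needed(transfer_bytes: int, mtu: int) -> int:
    """Full-size data segments needed to carry transfer_bytes at this MTU."""
    mss = mtu - HEADERS
    return math.ceil(transfer_bytes / mss)

one_mb = 1_000_000
full = packets_needed(one_mb, 1500)     # 685 segments of up to 1460 bytes
reduced = packets_needed(one_mb, 1400)  # 736 segments of up to 1360 bytes
print(f"{reduced / full - 1:.1%} more packets")  # 7.4% more packets
```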

In fact, I just ran some test TCP connections against Google, and their web
servers respond with a TCP MSS of only 1430, suggesting they are assuming a PMTU of
1470. That sounds pretty pragmatic to me and I wouldn't be surprised if other
large Internet companies did a similar thing.

~~~
ChuckMcM
The way this expresses itself is some web sites work and some just hang.
Looking at the traffic with wireshark you see packets go in but they don't
come back. But other web sites work just fine. So you start ratcheting down
the MTU size on the outbound router until the non-functioning web site starts
returning your calls. Sending pings of various sizes can (if the
server is responding to pings) also identify the longest packet you can send
before someone between you and them decided they want a different fragment
size.
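Ratcheting the size down by hand is a search problem, so a binary search over packet sizes finds the limit in about a dozen probes. A minimal Python sketch follows. `fake_ping` is a hypothetical stand-in for a real probe such as Linux iputils `ping -M do -s SIZE host`, which sets DF and sends an IP packet of SIZE + 28 bytes. The 1492-byte path MTU is invented for the example. The sketch assumes the target answers pings at all and that every size below the limit also gets through.

```python
from typing import Callable

ICMP_AND_IP_HEADERS = 28  # 20-byte IPv4 header + 8-byte ICMP echo header

def largest_passing_packet(probe: Callable[[int], bool],
                           low: int = 68, high: int = 1500) -> int:
    """Binary-search the largest total packet size for which probe() succeeds.

    Assumes probe(low) is True and that success is monotone in size.
    """
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always shrinks
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low

# Hypothetical probe: a path whose narrowest link has an MTU of 1492.
simulated_path_mtu = 1492

def fake_ping(total_size: int) -> bool:
    return total_size <= simulated_path_mtu

mtu = largest_passing_packet(fake_ping)
print(mtu, mtu - ICMP_AND_IP_HEADERS)  # 1492 1464 (use -s 1464 with ping)
```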

------
csense
There should be some sort of open-source testing environment for common
breakages. For example, maybe a bunch of VirtualBox VMs running Linux. Maybe
a unit test for the problem mentioned in the article sets up machines A, B,
FW, and C connected A <-> B <-> FW <-> C, has B fragment packets, and has A
perform path MTU discovery to C. The firewall configuration under test goes on
FW and running the test will catch this problem.

With a good enough framework of this type, all the testing could be done "out
of the box" so all you have to do is set up a disk image or IP address of a
firewall box to test, and the testing is fully automatic. The firewall under
test can run any OS or firewall that will run as a VirtualBox client -- or
even be its own box connected to the testing machine's Ethernet port. Heck, if
you had machines on both sides of some third-party you don't control, like
your ISP, you could even use it to probe their network configuration for
issues without any special cooperation from them.

If the test suite gets good enough, maybe eventually pressure will build on
vendors to make their products pass and we'll see firewall brokenness start to
disappear.

As well as cloud services like AWS, such tests could be used by Linux distros,
operating system vendors, and network equipment manufacturers.

I'd build it myself, but I'm not a networking expert and I'm not particularly
enthusiastic about becoming one.
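Short of a VM harness, the failure such a test would catch can be modeled in a few lines. This is a toy Python model, not a network test. `Hop`, `forward` and `transfer` are hypothetical names. Each hop forwards onto a link with a given MTU, and packets always carry DF. An oversized packet makes the hop send ICMP "fragmentation needed" back toward the sender, and a misconfigured firewall may drop it. With the ICMP blocked, the sender keeps retransmitting the same full-size packet and the connection hangs. Small packets still get through, which matches the "some things work, some hang" symptom described in this thread.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Hop:
    name: str
    next_link_mtu: int               # MTU of the link this hop forwards onto
    drops_frag_needed: bool = False  # True for a firewall filtering ICMP type 3 code 4

def forward(path: List[Hop], size: int) -> Optional[int]:
    """Send one DF packet of `size` bytes from the sender along `path`.

    Returns None if it arrives, the MTU reported in an ICMP "fragmentation
    needed" if that error reaches the sender, or 0 if the error was dropped.
    """
    for i, hop in enumerate(path):
        if size > hop.next_link_mtu:
            # DF is set, so the hop drops the packet and reports its MTU.
            # The ICMP error travels back through every earlier hop.
            if any(h.drops_frag_needed for h in path[:i]):
                return 0
            return hop.next_link_mtu
    return None

def transfer(path: List[Hop], packet_size: int, max_retries: int = 3) -> bool:
    """Try to deliver one packet, shrinking it whenever an ICMP error arrives."""
    pmtu = packet_size
    for _ in range(max_retries + 1):
        result = forward(path, pmtu)
        if result is None:
            return True
        if result:
            pmtu = result  # PMTU discovery: retry at the reported size
        # result == 0: no feedback, so the retransmission is the same size
    return False

# Sender -> FW -> router R -> destination; R's outgoing link has MTU 1400.
broken = [Hop("FW", 1500, drops_frag_needed=True), Hop("R", 1400)]
fixed = [Hop("FW", 1500), Hop("R", 1400)]
print(transfer(broken, 1500))  # False: black hole, the sender never learns
print(transfer(fixed, 1500))   # True: PMTU lowered to 1400, then delivered
print(transfer(broken, 500))   # True: small packets never trigger PMTUD
```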

~~~
fjarlq
<http://www.emulab.net/> is along those lines.

------
mike_heffner
"While at Amazon re:invent I had the opportunity to complain to some
Amazonians..."

So what was their response? Was their response the `ec2-authorize` command to
run?

~~~
cperciva
Their response was generally along the lines of "yes, that's something we
really ought to fix some day...".

One person commented that "public blog posts tend to hurry things along". I
imagine that getting to the top of HN might help too...

~~~
jacques_chester
It got me a refund from a certain blog host.

------
revelation
This sort of shenanigans will be over with IPv6. Blocking ICMP is not an
option there.

~~~
mef
Can you elaborate on why that is?

~~~
revelation
IPv6 no longer supports fragmenting on routers. That means if you don't want
to be stuck with the default minimum MTU of 1280 (which you really don't want
to for low-latency applications) you need to support Path MTU Discovery, which
in turn requires ICMP to go unhindered across a large number of different
networks between you and the receiver.
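The difference comes down to what a router does when a packet is too big for the next link. A minimal Python sketch follows. It assumes a simplified router that looks only at size, IP version and the IPv4 DF bit, and `router_action` is a hypothetical name. An IPv4 router may fragment unless DF is set. An IPv6 router always drops the packet and sends ICMPv6 Packet Too Big (type 2), leaving fragmentation to the source host. Every IPv6 link must be able to carry 1280-byte packets, so the largest MSS guaranteed to fit any IPv6 path is 1280 - 40 - 20 = 1220.

```python
from enum import Enum

IPV6_MIN_MTU = 1280  # every IPv6 link must carry packets of at least this size

class Action(Enum):
    FORWARD = "forward"
    FRAGMENT = "fragment and forward"
    DROP_ICMPV4_FRAG_NEEDED = "drop, send ICMP type 3 code 4"
    DROP_ICMPV6_PACKET_TOO_BIG = "drop, send ICMPv6 type 2"

def router_action(ip_version: int, size: int, next_hop_mtu: int,
                  df_set: bool = False) -> Action:
    """What a router does with a packet of `size` bytes (simplified model)."""
    if size <= next_hop_mtu:
        return Action.FORWARD
    if ip_version == 4:
        # IPv4: routers fragment unless the sender set Don't Fragment.
        return Action.DROP_ICMPV4_FRAG_NEEDED if df_set else Action.FRAGMENT
    # IPv6: routers never fragment; the source must learn the PMTU via ICMPv6.
    return Action.DROP_ICMPV6_PACKET_TOO_BIG

print(router_action(4, 1500, 1400).value)               # fragment and forward
print(router_action(4, 1500, 1400, df_set=True).value)  # drop, send ICMP type 3 code 4
print(router_action(6, 1500, 1400).value)               # drop, send ICMPv6 type 2
print(IPV6_MIN_MTU - 40 - 20)                           # 1220
```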

~~~
aidenn0
IPv6 enforces a minimum MTU of 1280? I've personally run into many VPNs with
much smaller MTUs.

~~~
danudey
And so the networks involved in those VPNs need to support path MTU discovery,
and they won't have any problems.

------
kami8845
OK so I can see how it violates standards. How many of the millions of users
that send traffic through EC2 does this affect however? I can see how they
would be reluctant to mess with firewall rulesets. Even if they only applied
it to new users, that would mean fragmentation... Keep it simple, stupid. Again,
it depends on how many users this affects, and from the sounds of the blog post,
vanishingly few.

------
zurn
In my experience this is the rule rather than the exception, most firewall
configs are broken in some way and there are often several firewalls on the
path. I turn them off where circumstances allow.

~~~
csense
> most firewall configs are broken

If we just accept "networks are unreliable and sometimes broken" as a fact of
life, things will never get better. I applaud the unsung heroes who are
finding and fixing the actual root causes of lower layers of our networks.
Other important networking issues that come to mind are bufferbloat and IPv6
brokenness.

I'm sure it will be fixed if this stays on the front page for a while, so be
sure to upvote the article.

~~~
cube13
>If we just accept "networks are unreliable and sometimes broken" as a fact of
life, things will never get better. I applaud the unsung heroes who are
finding and fixing the actual root causes of lower layers of our networks.

As developers, this should always be a fact of life. True, we should strive
for making the networks perfect, but at the end of the day, these things still
need to be accounted for.

Because you don't have control over a client deciding that 50 cent network
cards are "good enough" for their deployment even though they've demanded five
9's uptime from your software.

Because you can't know when someone's going to spill beer on the switch.

Because someone's just going to pull the wrong cable.

Because the water company accidentally cut the line into the building.

~~~
magila
You're mostly talking about reliability at the physical layer, which as you
say is never going to be perfect. Csense is talking about reliability at the
link layer and above, which is infinitely more attainable.

~~~
cube13
That's kind of my point, though.

If you can't guarantee that every layer below you is absolutely reliable, then
you need to assume that everything below you might be broken, and that you
need to handle it. You can't start with the mindset that everything works
below you, and have everything above you also work fine.

The fact that we have people with the mindset that everything below them is
broken is the entire reason that these kinds of issues get detected and fixed.

~~~
xyzzy123
Yeah if you try and work around broken PMTUD though you're going to mess up
your application layer protocol SO bad...

EDIT: I _suppose_ you could rewrite TCP MSS on your own firewall or drop MTU
on all web servers.... but of course if you're going to reconfigure your
firewall / interfaces, you may as well just fix the problem - which was caused
by poor device configuration in the first place.

------
el_cuadrado
The shit is always broken, and always was. I understand some idealistic
network engineers may disagree, but this is a fact of life. Deal with it.

And this 'news' definitely does not deserve the front page of Y.

~~~
nuje
People often bungle their firewall rules because they don't know any better,
but Amazon continuing to willfully fuck up TCP for all of AWS is a pretty
large issue for the functioning of the net at large.

------
jrockway
What's with the comments on the article:

"johndurbinn • 29 minutes ago I'm bouncing on my toes wah me soopsoak dat hoe"

"Tony Stender • 35 minutes ago Fix this it needs word wrap and zoom
capabilities"

I'd downmod them but I'd have to create an account to do so.

------
kv3
It doesn't stop ssh or my web traffic. Why should I care?

~~~
keithwinstein
It absolutely does stop SSH or Web traffic, if the network path goes through a
link with MTU < 1500 and the connection comes in with MSS > PMTU - 40. But
only, as the post says, once you start sending a lot of data in one TCP
segment.

~~~
sillysaurus
I thought MTU of ~1500 was (realistically) the minimum nowadays?

~~~
gonzo
you thought wrong. RFC 791, p. 24, "Every internet module must be able to
forward a datagram of 68 octets without further fragmentation."

