
How 1500 bytes became the MTU of the internet - petercooper
https://blog.benjojo.co.uk/post/why-is-ethernet-mtu-1500
======
MrLeap
For.. reasons, I found myself having to make a 'driver' for a PoE+ sensing
device this month. The manufacturer had an SDK, but compiling it requires an
old version of Visual Studio and a bouquet of dependencies, and it had no OSX
support. None of the bundled applications would do what I needed (namely, let
me forward the raw sensing data to another application.. _SOMEHOW_ ).

The data isn't encoded in the usual ways, so even 4 hours of begging FFMPEG
were to no avail.

A few glances at wireshark payloads, the roughly translated documentation, and
weighing my options, I embarked on a harrowing journey to synthesize the
correct incantation of bytes to get the device to give me what I needed.

I've never worked with RTP/RTSP prior to this -- and I was disheartened to see
nodejs didn't have any nice libraries for them. Oh well, it's just udp when it
comes down to it, right?

SO MY NAIVETE BEGOT A JOURNEY INTO THE DARKNESS. Being a bit of an unknown-
unknown, this project did _not_ budget time for the effort this relatively
impromptu initiative required. Out of sentimentality for the customer, and
perhaps delusions of grandeur, I convinced myself I could just crunch it out
in a few days.

A blur of coffee and 7 days straight crunch later, I built a daisy chain of
crazy that achieved the goal I set out for. I read rfc3550 so many times I
nearly have it committed to memory. The final task was to figure out how to
forward the stream I had ensorcelled to another application. UDP seemed like
the "right" choice, if I could preserve the heavy lifting I had accomplished
to reassemble the frames of data.. MTU sizes are not big enough to accommodate
this (hence probably why the device uses RTP, LOL.). OSX supports some
hilariously massive MTUs (It's been a few days, but I want to say something
like 13,000 bytes?) Still, I'd have to chunk and reassemble each frame into
quarters. Having to write _additional_ client logic to handle drops and OOO
and relying on OSX's embiggened MTUs when I wanted this to be relatively OS
independent... and the SHIP OR DIE pressure from above made me do bad. At this
point, I was so crunched out that the idea of writing reconnect logic and
doing it with TCP was painful so I'm here to confess... I did bad...

The client application spawns a webserver, and the clients poll via HTTP at
about 30 Hz. Ahhh it's gross...

I'm basically adrift on a misery raft of my own manufacture. Maybe protobufs
would be better? I've slept enough nights to take a melon baller to the bad
parts..

~~~
sneak
What does it sense that changes >=30 times a second?

~~~
jsight
I was curious about that too. Lots of references to video-related standards
imply it's a PoE camera, but then why isn't the data encoded in the usual
ways? What does that mean?

~~~
MrLeap
What codec would you use for a camera that captures not RGB, but poetry of the
soul?

CONTEXTLESS, HEADERLESS, ENDLESS BYTE STREAMS OF COURSE, where the literal,
idealized (remember udp) position of each byte is part of a vector in a non-
Euclidean coordinate system.

~~~
cfallin
> What codec would you use for a camera that captures not RGB, but poetry of
> the soul?

I would love to read a collaborative work between you and James Mickens --
this genre of writing seems sadly under-present in the computing world...

~~~
MrLeap
I appreciate the interest in listening to a simulcast of Harvard-Professor-
Collaborates-With-A-Nobody-Hobo. I'll forward this to my agent.

My agent is a tin can. I think she used to hold beans. Sometimes I put a few
smashed nickels in her and rattle. While I do this, I pretend she's reading me
my messages, and I'm like "oh no, I would never consent to a biopic directed
by THAT charlatan." and then we laugh and laugh.

Oh how we laugh.

------
mhandley
For 802.11, the biggest overhead is not packet headers but the randomized
medium acquisition time, so as to minimize collisions. 1500 bytes is way too
small here with modern 802.11, so if you only send one packet for each medium
acquisition, you end up with something upwards of 90% overhead. The solution
802.11n and later uses here is Aggregate MPDUs (AMPDUs). For each
medium acquisition, the sender can send multiple packets in a contiguous burst,
up to 64 KBytes. This ends up adding a lot of mechanism, including a sliding
window block ack, and it impacts queuing disciplines, rate adaptation and
pretty much everything else. Life would be so much simpler if the MTU had
simply grown over time in proportion to link speeds.
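
To make that overhead concrete, here's a rough editorial sketch with assumed numbers (the ~100 µs per-acquisition cost is a hypothetical ballpark for DIFS plus average backoff plus PHY preamble, and 600 Mbit/s is an assumed 802.11n-era PHY rate, not spec values):

```python
# Rough model: fixed medium-acquisition cost vs. airtime of one 1500-byte
# frame at a fast 802.11n-era PHY rate. Both numbers are assumptions.
acq_us = 100.0                        # assumed DIFS + avg backoff + preamble
rate_bps = 600e6                      # assumed PHY rate
frame_us = 1500 * 8 / rate_bps * 1e6  # ~20 us of actual data airtime
print(f"single frame: {acq_us / (acq_us + frame_us):.0%} overhead")

# A-MPDU aggregation amortizes the same fixed cost over a 64 KB burst:
burst_us = 64 * 1024 * 8 / rate_bps * 1e6
print(f"64 KB A-MPDU: {acq_us / (acq_us + burst_us):.0%} overhead")
```

With those assumptions a single 1500-byte frame is mostly acquisition overhead, while the 64 KB burst drops it by roughly an order of magnitude.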

~~~
wtallis
> Life would be so much simpler if the MTU had simply grown over time in
> proportion to link speeds.

The problem is that the world went wireless, so _maximum_ link speeds grew a
lot but _minimum_ link speeds are still relatively low. A single 64kB packet
tying up a link for multiple milliseconds—unconditionally delaying everything
else in the queue by at least that much—is not what we want.
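
A quick back-of-the-envelope check of that delay (the link rates here are just illustrative choices):

```python
# Airtime of a single 64 KB frame at a few illustrative link rates.
frame_bits = 64 * 1024 * 8
for mbps in (10, 54, 600):
    ms = frame_bits / (mbps * 1e6) * 1e3
    print(f"{mbps:4d} Mbit/s: {ms:6.2f} ms")
```

At slow rates a single 64 KB frame really does hold the medium for tens of milliseconds.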

~~~
inetknght
> _The problem is that the world went wireless, so maximum link speeds grew a
> lot but minimum link speeds are still relatively low._

I would argue: the problem is that the MTU isn't negotiated at all, but
especially not based on link availability.

~~~
snuxoll
IPv6 tries to solve this with path MTU discovery.

~~~
inetknght
Yes, but IPv6 is still at a higher level than Ethernet, WiFi, et al., and is
therefore subject to the limitations of the lower-level framing.

~~~
jandrese
Sure, I mean that's what pMTUd is all about. One big difference with IPv6:
Routers can't fragment packets. They either send or they don't.

~~~
pantalaimon
I thought so too, but apparently there is an IPv6 fragmentation extension and
it's implemented by several operating systems.

~~~
jandrese
Only the endpoints can fragment.

------
mertenVan
Software developer talking confidently about electrical engineering issues he
knows nothing about. How cute. /s

All Ethernet adapters since the first Alto card had self-clocking data
recovery [1].

Clock accuracy was never a problem, as long as it was within the acceptable
range required for PLL lock/track loop.

The reason for 1500 MTU is that for packet-based systems, you don't want
infinitely large packets. You want _small_ packets, but large enough so that
packet overhead is insignificant, which in engineering terms means less than
2%-5% overhead. Thus the 1500-byte max packet size. Everything above that just
makes switching and buffering needlessly expensive; SRAM was hella expensive
back then. Still is today (in terms of silicon area).
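
A quick sanity check on that 2%-5% figure: a classic Ethernet frame spends roughly 38 bytes per frame outside the payload (14-byte header, 4-byte FCS, 8-byte preamble, and a 12-byte minimum interframe gap):

```python
# Per-frame Ethernet overhead as a fraction of bytes on the wire.
HEADER, FCS, PREAMBLE, IFG = 14, 4, 8, 12
overhead = HEADER + FCS + PREAMBLE + IFG          # 38 bytes per frame
for payload in (64, 512, 1500, 9000):
    frac = overhead / (payload + overhead)
    print(f"{payload:5d}-byte payload: {frac:6.2%} overhead")
```

At 1500 bytes the per-frame overhead lands right around 2.5%, inside the stated engineering target.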

Look at all the memory chips on Xerox Alto's Ethernet board (below) - memory
chips were already taking ~50% of the board area!

[1] Schematic of the original Alto Ethernet card clock recovery circuit:
[https://www.righto.com/2017/11/fixing-ethernet-board-from-vi...](https://www.righto.com/2017/11/fixing-ethernet-board-from-vintage.html)

EDIT: Lol! Author has completely replaced erroneous explanation with correct
explanation, including link to seminal paper about packet switching. Good.

~~~
mlyle
> How cute. /s

> EDIT: Lol! Author has completely replaced erroneous explanation with correct
> explanation, including link to seminal paper about packet switching. Good.

Don't be a jerk. Being right doesn't give you the right to make fun of people.

~~~
mertenVan
Noted. I find the over-confidence over a completely imagined issue funny and
interesting. I make that mistake too. It's always interesting to do a post-
mortem: why was I so confident? How did I miss the correct answer? I respect
the author for doing such a fast turnaround :)

~~~
hackmiester
Good to know that your intent wasn't malicious, but fwiw, I also didn't find
the tone particularly appropriate for HN, either.

~~~
contingencies
The hardware world is full of this.

~~~
generatorguy
I think it's because in the hardware world the cost of being wrong is so much
higher, since you can't push out an over-the-air update or update your SaaS or
whatever. So if you don't know, you stay quiet instead of being wrong.

------
gugagore
It would be nice to corroborate this reason with another source, because my
understanding is that clock synchronization was not a factor in determining
the MTU, which really seems more like an OSI layer 2/3 consideration.

I am surprised the PLLs could not maintain the correct clocking signal, since
the signal encodings for early ethernet were "self-clocking" [1,2,3] (so even
if you transmitted all 0s or all 1s, you'd still see plenty of transitions on
the wire).

Note that this is different from, for example, the color burst at the
beginning of each line in color analog TV transmission [4], which is used to
"train" a PLL that demodulates the color signal. After the color burst is
over, the PLL has nothing to synchronize to, but 10BASE2/5/etc. have a carrier
throughout the entire transmission.

[1]
[https://en.wikipedia.org/wiki/Ethernet_physical_layer#Early_...](https://en.wikipedia.org/wiki/Ethernet_physical_layer#Early_implementations)

[2]
[https://en.wikipedia.org/wiki/10BASE2#Signal_encoding](https://en.wikipedia.org/wiki/10BASE2#Signal_encoding)

[3]
[http://www.aholme.co.uk/Ethernet/EthernetRx.htm](http://www.aholme.co.uk/Ethernet/EthernetRx.htm)

[4]
[https://en.wikipedia.org/wiki/Colorburst](https://en.wikipedia.org/wiki/Colorburst)

~~~
stripline
I also don't believe this is the reason. Early Ethernet physical standards
used Manchester encoding to recover the data clock.

~~~
peteri
I would agree. I worked on an Ethernet chipset back in 1988/9, and keeping the
PLL synced was not a problem. I can't remember what the maximum packet size we
supported was (my guess is 2048), but that was more a matter of buffering to
SRAM and needing more space for counters.

The datasheet for the NS8391 has no such requirement for PLL sync.

[https://archive.org/details/bitsavers_nationaldaDataCommunic...](https://archive.org/details/bitsavers_nationaldaDataCommunicationsLANUARTsHandbook_53722793/page/n73/mode/2up)

------
jleahy
As others have said, with Manchester encoding 10BASE2 is self-clocking: you
can use the data to keep your PLL locked, just as you would on modern Ethernet
standards. However, I imagine with these standards you may not even have
needed an expensive/power-hungry PLL; probably you could just multi-sample at
a higher clock rate like a UART does (I don't actually know how this silicon
was designed in practice).

Further, PLLs have not got a lot better, but a lot worse. Maybe back when
10BASE2 was introduced you could train a PLL on 16 transitions and then have
acquired lock, but there's no way you can do that anymore (at modern data
rates). PCI Express takes thousands of transitions to exit L0s->L0, which is
all to allow for PLL lock.

My best guess for the 1500 number is that with a 200 ppm clock difference
between the sender and receiver (the maximum allowed by the spec, which says
your clock must be +-100 ppm), after 1500 bytes you have slipped 0.3 bytes.
You don't want to slip more than half a byte during a packet, as it may result
in a duplicated or skipped byte in your system clock domain. (200 × 1e-6 ×
1500 = 0.3.)
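
That 0.3-byte figure checks out as a one-liner (using the worst-case combined offset described above):

```python
# Worst case: sender at +100 ppm, receiver at -100 ppm => 200 ppm relative.
ppm = 200e-6
frame_bytes = 1500
slip = ppm * frame_bytes  # bytes of drift accumulated over one max frame
print(f"{slip:.2f} bytes of slip per {frame_bytes}-byte frame")
```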

~~~
Unklejoe
I thought most Ethernet PHYs don't actually lock to the clock, but instead use
a FIFO that starts draining once it's halfway full. The size of this FIFO is
such that it doesn't under- or overflow given the largest frame size and
worst-case 200 ppm difference.

I figured this is what the interframe gap is for - to allow the FIFO to
completely drain.

~~~
saber6
The IFG is really more to let the receiver know where one stream of bits stops
and the next stream of bits starts. How they handle the incoming spray of data
is up to them on a queue/implementation level.

------
Animats
The original MTU was 576 bytes, enough for 512 bytes of payload plus 64 bytes
for the IP and TCP header with a few options. 1500 bytes is a Berkeleyism,
because their TCP was originally Ethernet-only.

~~~
wmf
Yeah, didn't T1 and ISDN use 576 to limit serialization delay and jitter? The
backbone probably switched to 1500 when OC-3 was adopted.

~~~
tssva
The default MTU for a T1/E1 was usually 1500. The default for HSSI was 4470
which meant the default for DS3 circuits was 4470. This was also the usual
default MTU for IP over ATM which is what most OC-3 circuits would have been
using when they were initially rolled out for backbone use. This remained the
usual default MTU all the way through OC-192 circuits running packet over
SONET.

I left the ISP backbone and large enterprise WAN field around that time and
can't speak to more recent technologies.

------
willis936
IEEE 802 history is disappearing without a trace? Afaik it’s pretty well
documented, you just need to be a member for some of the stuff.

[http://www.ieee802.org/](http://www.ieee802.org/)

I feel like the last piece we’re missing in this story is the performance
impact of fragmentation. Like why not just set all new hardware to an MTU of
9000 and wait ten years?

~~~
cesarb
> Like why not just set all new hardware to an MTU of 9000 and wait ten years?

The hardware in question is Ethernet NICs. However, for you to set the MTU on
an Ethernet NIC to 9000, _every_ device on the same Ethernet network (at least
the same Ethernet VLAN), including all other NICs and switches, including ones
which aren't connected yet, must also support and be configured for that MTU.
And this also means you cannot use WiFi on that Ethernet network (since, at
least last time I looked, WiFi cannot use a MTU that large).

~~~
willis936
Sending a jumbo frame down a line that has hardware that doesn’t support jumbo
frames somewhere along the way does not mean the packet gets dropped. The NIC
that would send the jumbo frame fragments the packet down to the lower MTU. So
what’s the performance impact of that fragmentation? If it isn’t higher than
the difference in bandwidth overhead from headers of 9000 MTU traffic vs. 1500
MTU traffic then why not transition to 9000 MTU?

~~~
sathackr
But how does the NIC know that, 11 hops away, there is a layer 2 device which
only supports a 1500-byte frame and which cannot communicate with the NIC
(switches do not typically have the ability to communicate directly with the
devices generating the packets)?

Now you need Path MTU discovery, which as the article indicates, has its own
set of issues. (Overhead from trial and error, ICMP being blocked due to
security concerns, etc...)

~~~
wbl
If you block ICMP you deserve what you get. Don't do this. (Edit: don't block
ICMP)

~~~
oarsinsync
So now you're trying to communicate from your home machine to some random host
on the internet (website, VPS, streaming service), and you're configured for
MTU 9000, the remote service is also configured for MTU 9000, but some transit
provider in the middle is not, and they've disabled ICMP for $reasons.

They blocked ICMP, do you deserve what you get?

~~~
wbl
Transit providers should push packets and generally do. With PMTU failures
it's usually clueless network admins on firewalls nearer endpoints. And no,
you don't and I wish the admin responsible could feel your pain.

~~~
oarsinsync
> Transit providers should

Agreed

> and generally do

Agreed.

Now if you can make it 'will always just push packets', we'll be golden.

Unfortunately, there are enough ATM/MPLS/SONET/etc networks being run by
people who no longer understand what they're doing, that we're never going to
get there.

To make matters more entertaining, IPv6 depends on icmp6 even more.

------
smoyer
The article talks about how the 1500 byte MTU came about but doesn't mention
that the problem of clock recovery was solved by using 4b/5b or 8b/10b
encoding when sending Ethernet through twisted-pair wiring. This encoding
technique also provides a neutral voltage bias.

EDIT: As pointed out below, I failed to account for the clock-rate being 25%
faster than the bit-rate in my original assertion that Ethernet over twisted-
pair was only 80% efficient due to the encoding (see below)

~~~
Unklejoe
> Ethernet through twisted-pair wiring only provides 80% of the listed bit-
> rate

Actually, they already accounted for this in the advertised speed.

In other words, a 1 GbE SerDes runs at 1.250 Gbit/s, so you end up with an
actual 1 Gbit/s bandwidth.

The reason you don't actually hit 1 Gbit/s in practice is due to other
overheads such as the interframe gaps, preambles, FCS, etc.

~~~
smoyer
You're absolutely correct ... it's been a long time since I was designing
fiber transceivers but I should have remembered this. Ultimately efficiency is
also affected by other layers of the protocol stack too (UDP versus TCP
headers) which also explains why larger frames can be more efficient. In the
early days of RTP and RTSP, there were many discussions about frame size, how
it affected contention and prioritization and whether it actually helped to
have super-frames if the intermediate networks were splitting and combining
the frames anyway.

------
rcarmo
I used to glue stuff together to FDDI rings and Token Ring networks back in
the day (I used Xylan switches, which had ATM-25 UTP line cards among other
long-forgotten oddities), and MTU sizes always struck me as being particularly
arbitrary.

But I'm not really sure about the clock sync limitations being a factor here.
It was way back in the deepest past.

What I do remember vividly is the mess that physical layer networking evolved
into over the years thanks to dial-up and DSL (ever had to set your MTU to
1492 to accommodate an extra PPP header?).

And something is obviously wrong today, since we're still using the same
baseline value for our gigabit fiber to the home connections, our 3/4/5G
(scratch to taste) mobile phones, etc.

~~~
neurostimulant
> ever had to set your MTU to 1492 to accommodate an extra PPP header?

Ah, I was always wondering why my ISP configured my fiber modem's MTU to 1492.
So it's due to using PPPoE? Is there no way to use a bigger MTU when using
PPPoE?

~~~
toast0
Nowadays, there's PPPoA (over ATM), which wraps at a lower level and allows
1500-byte Ethernet payloads through. But running the Ethernet over ATM at 1508
MTU so that PPPoE would be 1500 was probably out of reach --- when PPPoE was
introduced, the customer endpoint was often the customer PC, and some of those
were using fairly old NICs that might not have supported larger packets.

Sadly, smaller than 1500 byte MTUs still cause issues for some people to this
day. It's all fine if everything is properly configured, or if at least
everything sends and receives ICMP, but if something is silently dropping
packets, you're in for a bad day. These days, I think it's usually problems
with customers sending large packets, as opposed to early days where receiving
large packets would routinely fail, but a lot of that is because large sites
gave up on sending large packets.

~~~
rcarmo
Yes, PPPoA was also a thing I dealt with, and another source of irritating MTU
issues.

------
throw0101a
Unreliable IP fragmentation, and the brokenness of Path MTU Discovery (PMTUD),
is causing the DNS folks to put a clamp on the size of (E)DNS message size:

* [https://dnsflagday.net/2020/](https://dnsflagday.net/2020/)

------
thehappypm
When networks were new, computers connected to each other using a shared trunk
that you _physically_ drilled into. It's a non-trivial problem to send data
over a shared channel; it's very easy for two systems to clobber each other. A
primitive, but somewhat effective mechanism is ALOHA
([https://en.wikipedia.org/wiki/ALOHAnet](https://en.wikipedia.org/wiki/ALOHAnet)),
where multiple senders randomly try to send their message to a single
receiver. The single receiver then repeats back any messages it successfully
receives. In that way the sender is able to confirm its message got through --
an ack. After a certain amount of time with no ack, senders repeat their
messages. As you can imagine, shorter packets are less likely to cause
collisions.

Ethernet uses something similar, but is able to detect if someone else is
using the wire, called carrier sense. A short maximum packet size of 1500
bytes reduced the likelihood of collisions.

~~~
blitmap
Does multiplexing over Ethernet exist?

~~~
5436436347
Not anymore for all practical purposes, but it once did for the very old
10Base-2 standard for Ethernet over coaxial cable. This is practically why the
old MII Ethernet PHY interface protocol had the collision-sense lines to
indicate to the MAC to stop sending data if it detects incoming data, in
attempts to minimize collisions.

[https://en.wikipedia.org/wiki/10BASE2](https://en.wikipedia.org/wiki/10BASE2)

~~~
blitmap
This is very cool history, and something I never would have stumbled upon
myself. Thank you for sharing! :-)

------
anonymousiam
Minor factoid the article does not mention. ATM is an alternative to Ethernet
that's used in many optical fiber environments. The "transfer unit" size of
the ATM "cell" is 53 bytes (5 for the header and 48 for the payload). This is
much smaller than 1500.

Another quirky story from the past: Sometime around 20 years ago I was having
a bizarre networking problem. I could telnet into a host with no trouble, and
the interactive session would be going just fine until I did something that
produced a large volume of output (such as 'cat' on a large file). At that
point the session would freeze and I would eventually get disconnected. After
troubleshooting for a while I identified the problem as one of the Ethernet
NICs on the client host. It was a premium NIC (3Com 3C509). Nonetheless, the
NIC crystal oscillator frequency had drifted sufficiently that it would lose
clock synchronization to the incoming frame if the MTU was larger than about
1000.

~~~
TomVDB
Speaking about ATM: the 48 byte payload was a standardization compromise
between Europe and the US.

US companies had prototypes using 64 bytes, while European companies used 32
bytes. To avoid giving anyone a competitive advantage, they decided on a
middle ground of 48.

There were trade-offs between 32 and 64 bytes as well: a 32 byte payload had a
higher overhead than a 64 byte payload, but it had a shorter transmission time
which made it easier to do voice echo cancellation.

Or so I was told many decades ago when I got introduced to ATM systems...

------
2rsf
I remembered something different, related to the shared medium and CSMA/CD,
where 1500 ensured fairness, and the minimum of 46 related to propagation time
in the longest allowable cable.

More at:

[https://networkengineering.stackexchange.com/questions/2962/...](https://networkengineering.stackexchange.com/questions/2962/why-was-the-mtu-size-for-ethernet-frames-calculated-as-1500-bytes)

------
alexforencich
I think the author may have made a mistake in some of the math. The frame size
distribution plots are likely based on the number of frames, not the amount of
data contained in said frames. The 1500 byte and other large frames should
therefore account for the lion's share of the actual data transferred.
Correcting this error will totally change the final two graphs.

~~~
labawi
Yes. But only the "AMS-IX traffic by packet size range" graph is wildly
inaccurate. Ethernet frame overhead is per-packet and presumably right.

~~~
alexforencich
Ah yeah, that's probably true. According to some back of the envelope math, it
seems like the distribution should be more like 5%, 1%, 1%, 3%, 50%, 39%,
ignoring the first and last size bins.

------
tartoran
I find that technology cements in strata (the archaeology term) just as the
layers that accumulate as the result of natural processes and human activity.
The dynamics are not exactly the same but the tendency is similar. I wonder
whether we'll always be capable of digging down deeper to the beginnings as
things get more and more complicated.

------
IshKebab
I don't think the Ethernet Frame Overhead graph is correct. Surely the
overhead is proportionally higher, per amount of data, for smaller packets.
That graph shows that the overhead is just proportional to the amount of data
sent, irrespective of the packet size, which can't be right.

~~~
alexforencich
The graph above that one is totally wrong. The frame overhead graph may be
correct, though.

------
trixie_
Kind of expected an article titled 'How 1500 bytes became the MTU of the
internet' to tell us how 1500 bytes became the MTU of the internet.

Even I could have told you: 'the engineers at the time picked 1500 bytes'.

------
russfink
IIRC it was called “thinnet” (10B2). I loved the vampire taps on thick net.

------
franga2000
Not my field, so I might be making an obvious error here, but:

If there are efficiency gains to be had from using jumbo frames, wouldn't
setting my MTU to a multiple of 1500 still be of some benefit? If my PC, my
switch and my router all support it, that would still be a tiiiny improvement.
If the server's network does as well, and let's say both of our direct
providers, even if none of the exchanges or backbones in between do, that
would still be an efficiency gain for ~10% of the link, right?

~~~
benjojo12
Locally you can set your MTU to larger than 1500, but if you try to send a
packet larger than 1500 towards the internet, it will (generally) either be
dropped without a trace, or be dropped with an ICMP message generated to tell
your system to lower the MTU. Assuming you have not firewalled off ICMP ;)

As a handy feature, on Linux at least, you can set your MTU to 9000 locally
and then set the default (internet-bound) route to have an MTU of 1500 to
prevent issues:

    ip route add 0.0.0.0/0 via 10.11.11.1 mtu 1500

~~~
luma
Over-sized packets can (and generally will) be fragmented by your router. They
shouldn't be dropped unless you've intentionally set the DF (don't-fragment)
bit.

~~~
benjojo12
Fragments are very hit or miss on the internet,
[https://blog.cloudflare.com/ip-fragmentation-is-
broken/](https://blog.cloudflare.com/ip-fragmentation-is-broken/)

------
afandian
Off-topic but looking at that old network card picture reminded me of a very
vague memory of more than one card with a component that looked like a
capacitor, except it looked cracked.

Is my mind playing tricks? Were they faulty units or was there meant to be a
crack?

This picture could be the same thing:

[https://www.vogonswiki.com/images/3/37/Viglen_Ethergen_PnP_2...](https://www.vogonswiki.com/images/3/37/Viglen_Ethergen_PnP_2000A.jpg)

~~~
gerdesj
Old network card eh? It's a 3Com 3C509:

[https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/3com?h=v5.6-rc2)

I still have a load of them gathering dust somewhere. However a system with
ISA on it is a bit rare now and I'm not sure I can be bothered to compile a
modern kernel small enough to boot on one. Besides, it will probably need
cross compiling on something with some grunt that has heard of the G unit
prefix.

~~~
rasz
For Old network card try 3C500, or something made by Excelan ;-)

[http://www.os2museum.com/wp/emulating-etherlink/](http://www.os2museum.com/wp/emulating-etherlink/)

------
_bxg1
So they just picked an arbitrary number that felt right? I expected the story
to be more interesting than that, given the title. Still, there was some
interesting trivia surrounding the core question.

Reminds me of the IPv6 adoption problem:
[https://news.ycombinator.com/item?id=14986324](https://news.ycombinator.com/item?id=14986324)

------
hinkley
> If we look at data from a major internet traffic exchange point (AMS-IX), we
> see that at least 20% of packets transiting the exchange are the maximum
> size.

He’s so optimistic. My brain heard this as “ _only_ 20% of packets […] are the
maximum size”

What are all of those 64 byte packets? Interactive shells, or some other low
bitrate protocol?

~~~
wmf
Probably mostly ACKs.

~~~
hinkley
Well now I feel dumb.

------
zamadatix
I've always wondered how 9000 became "jumbo". Technically anything over 1500
is considered jumbo and there is no standard. The largest I've seen is 16k. I
think there are some CRC accuracy concerns at larger sizes, but 9k still seems
quite arbitrary for computer land.

~~~
cesarb
The explanation according to
[https://web.archive.org/web/20010221204734/http://sd.wareone...](https://web.archive.org/web/20010221204734/http://sd.wareonearth.com/~phil/jumbo.html)
is: "First because ethernet uses a 32 bit CRC that loses its effectiveness
above about 12000 bytes. And secondly, 9000 was large enough to carry an 8 KB
application datagram (e.g. NFS) plus packet header overhead."

That is, 9000 is the first multiple of 1500 which can carry an 8192-byte NFS
packet (plus headers), while still being small enough that the Ethernet CRC
has a good probability to detect errors.
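
The arithmetic behind that, as a sketch (treating the headers as just IPv4 + UDP for simplicity; real NFS adds RPC framing on top):

```python
# 9000 is the smallest multiple of 1500 that fits 8 KiB plus basic headers.
nfs_payload = 8192
headers = 20 + 8                # IPv4 + UDP, no options (simplification)
needed = nfs_payload + headers  # 8220 bytes
for mtu in (1500, 3000, 4500, 6000, 7500, 9000):
    print(mtu, "fits" if mtu >= needed else "too small")
```

Every multiple of 1500 below 9000 comes up short of the 8220 bytes needed.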

------
gargs
This reminds me of various Windows applications back in the day (Windows 3.1
and 95) that claimed to fine tune your connection and one of the tricks they
used was changing the MTU setting, as far as I can recall. Could anyone share
how that worked?

~~~
ndespres
If your computer sends packets larger than the MTU the next device upstream
can handle, the packets will be fragmented, leading to increased CPU usage,
increased work by the driver, higher I/O on the network interface, higher CPU
load on your router or modem, etc., depending on where the bottleneck is. For
example, if you connect over Ethernet to a DSL modem, or to a router that has
a DSL uplink, all your packets will be fragmented. This is because DSL uses 8
bytes per packet for PPPoE encapsulation. So if you send a 1500-byte packet to
the modem, it will get broken up by the modem into two packets: one filling
the 1492-byte MTU, and a tiny second fragment carrying the leftover bytes,
each with its own 8 bytes of PPPoE overhead.

But your PC is still sending more packets.. the modem is struggling to
fragment them all and send them upstream.. its memory buffer is filling up..
your computer is retrying packets that it never got a response on..

By lowering your computer MTU to 1492 to start with, you avoid the extra work
by the modem, which can have considerable speed increase.
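
A small sketch of what that fragmentation looks like (hypothetical numbers: a 1500-byte IPv4 packet squeezed through a 1492-byte MTU; per RFC 791, fragment offsets are in units of 8 bytes):

```python
# Split a 1500-byte IPv4 packet's payload to fit a 1492-byte MTU.
mtu, ip_hdr = 1492, 20
payload = 1500 - ip_hdr              # 1480 payload bytes to carry
per_frag = (mtu - ip_hdr) // 8 * 8   # 1472 usable bytes per fragment
frags = [min(per_frag, payload - off) for off in range(0, payload, per_frag)]
print(frags)  # one nearly full fragment plus an 8-byte runt
```

So the device ships one nearly full-size fragment and one tiny runt (8 payload bytes plus a fresh 20-byte IP header), and every fragment then has its own encapsulation overhead.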

------
fireattack
Probably a dumb question: why is the maximum size (and the one with the most
packets) in the AMS-IX graph 1514 bytes instead of the 1500 bytes discussed
in the article?

~~~
ra1n85
1500 bytes is the MTU of IP, in most cases. It often excludes the Ethernet
header, which is 14 bytes, excluding the FCS, preamble, IFG, and any VLANs.

If we have a 1500-byte MTU for IP, then we need at least a 1514-byte MTU for
IP + Ethernet. We often call the >= 1514B MTU the "interface MTU". It's
unnecessarily confusing.
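
In other words, nothing fancy, just the header math:

```python
# "Interface MTU" = IP MTU plus the 14-byte Ethernet header it rides in.
ip_mtu = 1500
eth_header = 6 + 6 + 2      # dst MAC + src MAC + EtherType
print(ip_mtu + eth_header)  # 1514, the peak seen in the AMS-IX graph
```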

------
bjornsing
Actually, MTUs below 1500 bytes are pretty common, e.g. with PPP over Ethernet
or other forms of encapsulation/tunneling.

------
CGamesPlay
I think you’re saying that the smallest bucket of packets are all packets that
would have been combined with a larger packet if that had been an option...
but that doesn’t make sense. That class of packets includes TCP SYN, ACK, RST,
and 128 bytes could fit an entire instant message on many protocols.

------
leroman
Looks like a ripe low-hanging fruit for SpaceX Starlink to pick..

~~~
leroman
Why the downvote? Possibly facilitating the end-to-end transport will allow
them to offer jumbo packets.

~~~
ekimekim
This would only be possible if you were talking from a jumbo-configured client
(let's say you've set up your laptop correctly), across a jumbo-configured
network (Starlink, in your scenario), to a jumbo-configured server (here's the
problem).

The problem is that Starlink only controls the steps from your router to "the
internet". If you're trying to talk to spacex.com it'd be possible, but if
you're trying to talk to google.com then now you need Starlink to be peering
with ISPs that have jumbo frames, and they need to peer with ISPs with jumbo
frames, etc etc and then also google's servers need to support jumbo frames.

Basically, the problem is that Starlink is not actually end to end, if you're
trying to reach arbitrary servers on the internet. It just connects you to the
rest of the internet, and you're back to where you started.

This is also true for any other ISP, Starlink is not special in this regard.

~~~
Avamander
True, you'd expect endpoints to support Jumbo Frames as well, but why not at
least start making it possible? It's a chicken-and-egg loop otherwise. IPv6
was the same at the start.

------
tambourine_man
Nah, it’s 1492 forever!

~~~
dredmorbius
Found the ADSL user.

