
How Do Routers Work, Really? - turingbook
https://kamila.is//teaching/how-routers-work/
======
geerlingguy
I learned how routers _really_ work from Ericsson's seminal video on the
matter, The Good Warriors of the Net:
[https://www.youtube.com/watch?v=x9XWxD6cJuY](https://www.youtube.com/watch?v=x9XWxD6cJuY)

Though I always thought the "router switch" was much more fun.

~~~
sgillen
Haha thanks for sharing. Interesting how much emphasis there is on "the ping
of death" compared to literally any other exploit. Does anyone know if this
was really such a big problem when this video came out?

~~~
schoen
What I remember is that the ping of death was extremely surprising in terms of
the number of OSes affected, the ease of exploiting it, and the super-
noticeable consequence of instantly crashing the target machine. And it came
out at a time when there wasn't as much vulnerability research and very few
extensively cross-platform vulnerabilities.

Also, with the ping of death, the only way to use it was to very noticeably
crash systems -- not to secretly build a botnet or something, as might have
been done with RCE vulnerabilities.

------
Cyph0n
> If that is the case, my condolences.

As a software engineer working on IOS-XR, that gave me a chuckle :p

In the case of enterprise- and SP-grade routers, the data plane (i.e., where
the actual forwarding and lookups take place) runs entirely on a dedicated
network processor (NP), mainly for performance reasons. Information on the NP
is populated by the router's operating system in response to user
configuration, network topology changes, or protocol state updates. On the
other hand, the control plane runs mainly on the CPU(s). This is required so
that the protocols running on the router OS (e.g., BGP) can receive and send
out updates based on their state machines.
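A toy Python sketch of that split, for readers new to the terms: the hypothetical `ControlPlane` class stands in for protocol code on the CPU (say, something handling a BGP update) and only installs or withdraws routes, while the hypothetical `DataPlane` class only does lookups. A real NP uses dedicated hardware tables rather than a dict scanned linearly, so treat this purely as an illustration of who writes the table and who reads it.

```python
import ipaddress
from typing import Optional


class DataPlane:
    """Only does lookups; stands in for the network processor (NP)."""

    def __init__(self) -> None:
        # prefix -> next-hop IP; written only by the control plane
        self.fib: dict[ipaddress.IPv4Network, ipaddress.IPv4Address] = {}

    def lookup(self, dst: str) -> Optional[ipaddress.IPv4Address]:
        addr = ipaddress.IPv4Address(dst)
        # Longest-prefix match over all installed prefixes
        matches = [net for net in self.fib if addr in net]
        if not matches:
            return None  # no route: the packet would be dropped
        best = max(matches, key=lambda net: net.prefixlen)
        return self.fib[best]


class ControlPlane:
    """Runs protocol logic on the CPU and programs the data plane."""

    def __init__(self, data_plane: DataPlane) -> None:
        self.data_plane = data_plane

    def on_route_update(self, prefix: str, next_hop: str) -> None:
        # e.g. a (hypothetical) routing-protocol update announcing `prefix`
        self.data_plane.fib[ipaddress.IPv4Network(prefix)] = ipaddress.IPv4Address(next_hop)

    def on_route_withdraw(self, prefix: str) -> None:
        self.data_plane.fib.pop(ipaddress.IPv4Network(prefix), None)


dp = DataPlane()
cp = ControlPlane(dp)
cp.on_route_update("0.0.0.0/0", "192.0.2.1")        # default route
cp.on_route_update("198.51.100.0/24", "192.0.2.2")  # more specific route
print(dp.lookup("198.51.100.7"))  # 192.0.2.2 (longest prefix wins)
```

Withdrawing the /24 makes that traffic fall back to the default route without any change to the lookup code, which is the point of keeping the two planes separate.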

~~~
anotherkamila_
> As a software engineer working on IOS-XR, that gave me a chuckle :p

Good good :D

Thanks for the clear data plane / control plane explanation, that's a good way
to summarise the distinction. May I link to it from the article?

~~~
Cyph0n
Thanks! Sure, go ahead!

------
xg15
> _Note that the next hop’s IP address is in the router’s memory only: it does
> not appear in the packet at any time._

This clears up some points that always puzzled me:

If the gateway is identified by an IP address, but the destination host is
also identified by an IP address, which address exactly is put into the
packet? And how can a packet be routed if the gateway's IP is itself part of
the subnet that's supposed to be routed to it (e.g. 192.168.0.0/24 with
default gateway 192.168.0.1)?

So the answer is, if I send the packet to host 1.1.1.1 but the routing table
has 2.2.2.2 as the next hop, the packet will have 1.1.1.1 as the destination
in the IP part but the _MAC of 2.2.2.2_ as destination of the Ethernet part
(or equivalent). It doesn't matter which subnet the next hop's IP is in, as
the routing table isn't consulted for it anyway; it's only used in ARP.
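A minimal Python sketch of the behaviour described above, assuming a host on the 192.168.0.0/24 example network with a prefilled ARP cache (all MAC addresses and the helper names are made up): the route lookup yields a next-hop IP, that IP is only used as a key into the ARP cache, and the IP destination written into the packet never changes.

```python
import ipaddress
from typing import Optional

# Hypothetical host config mirroring the example above: 192.168.0.0/24 is
# directly connected (no gateway) and the default route points at 192.168.0.1.
ROUTES: list[tuple[ipaddress.IPv4Network, Optional[ipaddress.IPv4Address]]] = [
    (ipaddress.IPv4Network("192.168.0.0/24"), None),
    (ipaddress.IPv4Network("0.0.0.0/0"), ipaddress.IPv4Address("192.168.0.1")),
]

# Hypothetical ARP cache (IP -> MAC), as if filled in by earlier ARP replies.
ARP_CACHE = {
    ipaddress.IPv4Address("192.168.0.1"): "02:00:00:00:00:01",
    ipaddress.IPv4Address("192.168.0.20"): "02:00:00:00:00:20",
}


def next_hop(dst: ipaddress.IPv4Address) -> ipaddress.IPv4Address:
    """Longest-prefix match; a connected route means 'send straight to dst'."""
    _, gateway = max(
        ((net, gw) for net, gw in ROUTES if dst in net),
        key=lambda route: route[0].prefixlen,
    )
    return dst if gateway is None else gateway


def build_frame(dst_ip: str) -> dict[str, str]:
    dst = ipaddress.IPv4Address(dst_ip)
    hop = next_hop(dst)
    # The next hop's IP is only a key for ARP; the routing table is not
    # consulted again for it, and it never appears in the packet itself.
    mac = ARP_CACHE[hop]  # a real stack would send an ARP request on a miss
    return {"eth_dst": mac, "ip_dst": str(dst)}


print(build_frame("1.1.1.1"))       # gateway's MAC, IP destination unchanged
print(build_frame("192.168.0.20"))  # on-link host: that host's own MAC
```

Note that looking up 192.168.0.1 itself hits the connected /24, so the gateway resolves on-link and there is no circularity.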

This leaves the question: why the indirection, and why all the mucking around
with ARP and IPs that are never used as the destination of anything?

Couldn't you simply put the next hop's MAC address (instead of IP address)
into the routing table and be able to route packets just as well, with a lot
less complexity?

~~~
jcrawfordor
To give a simplified but largely accurate summation: IP and Ethernet were each
designed in different time periods and largely without knowledge of the other.
Ethernet was historically used in such a fashion that multiple hosts (more
than 2) occupied the same collision domain, that is, they were physically
connected to the same cable, or through hubs that repeated frames to all
interfaces without routing. This means that Ethernet required an addressing
scheme so that hosts on the same media knew which frames were for them
(higher-level protocols at the time did not necessarily handle this).

Ethernet's addressing scheme was not designed to accommodate large
hierarchical networks and so is unsuitable for the IP use case, but more
importantly, IP was designed completely separately from Ethernet, and was not
used primarily with Ethernet until later, so IP could not "assume" that the
layer below it handled addressing (typically there was either no layer below
[point-to-point] or only a very simple one).

The result is that Ethernet and IP duplicate functionality to some extent. It
is theoretically possible, although not common, to build a network which uses
only layer 3 routing without any reliance on Ethernet addressing. A
significant reason this is rare, arguably _the_ most significant reason, is
that IP is now carried over Ethernet a significant majority of the time and L2
Ethernet devices (like switches) require the use of Ethernet addressing for
the network to function. You usually see "pure IP" in virtual networking
environments where the IP is encapsulated in, well, more IP, but even then
Ethernet frames are sometimes used because, well, just like network hardware,
operating system network stacks generally expect them (examine, e.g., the
Linux bridge implementation). It is completely possible to build network
stacks and network appliances which do not require the use of Ethernet but it
is expensive and there's not much of a motivation to do so, and you'd run into
issues with any kind of equipment not so designed.

Addressing is not the only duplicate functionality between Ethernet and IP,
and it's one of the less significant ones since Ethernet addressing does
provide utility even if not strictly required. Ethernet frames are
checksummed, and IP headers are also checksummed, even though the Ethernet
checksum is already over them. The IP header checksum exists because IP was
historically carried over lower layers that did not provide integrity
checking. This is basically pure wasted space in typical networks, so IPv6
drops the header checksum to remove the overhead.

In general, though, network protocols tend to make more sense when you have
some awareness of the history of their development. If you try to view the
modern internet as an elegant, monolithic design, as some authors attempt, a
lot of things won't make sense, because they simply are that way for historic
reasons. Ethernet and IP were each designed in the '70s, but separately, and
their use has accumulated significant cruft since then, including some radical
changes in the ways that they were used. For example, Ethernet moved from
shared media to point-to-point links; this occurred de facto earlier but became
largely formalized with the introduction of GbE, which prohibits more than two
hosts in a collision domain. And, ironically, wireless protocols reintroduced
multiple hosts in a collision domain as an even larger issue, which requires
additional handling below, or actually in lieu of, the Ethernet layer: 802.11
is a replacement for Ethernet that happens to behave similarly in many ways
for compatibility.

Finally, the OSI model is something that tends to add complexity and confusion
to these discussions, which is why I doggedly discourage its use in teaching.
The OSI Model describes the OSI protocols, which were contemporaries and
competitors of the TCP/IP protocols. Arguably, one of the reasons that the OSI
protocols fell out of use (in favor of IP) is exactly because they assumed
seven layers, and each was fairly complex. Some OSI protocols are still in
use, for example IS-IS (which runs directly over layer 2 rather than over IP)
in the telecom industry and some backbone IP transit, but only in niches, and
they are generally being replaced with IP. IP is intentionally simpler and can
be fully described using four layers, in what's usually referred to as the
TCP/IP model.

The OSI layers do not map 1:1 to the TCP/IP layers, even if you simply ignore
the ones that map more poorly as instructors often do. Even worse, many
instructors and textbook authors feel such a strong compulsion to map modern
networks to the obsolete OSI model that they cram application-layer protocols
into OSI layers 5 and 6 in order to have examples of them. I have seen cases
as extreme as an instructor claiming that HTTP cookies represent the session
layer. This kind of thing is nonsense and hinders understanding rather than
contributing to it. If the OSI model is taught (not a bad idea at all as
students should realize that TCP/IP is merely the popular way, and certainly
not the only way), it should be taught specifically by contrasting it to the
different TCP/IP model. Unfortunately few instructors and website authors
today seem to even be aware that the OSI protocol stack existed separately
from IP.

And, if you are wondering, yes, Ethernet can be used in a switched network
completely independently from IP (although not really in a routed network
unless you are generous about how you define routing). This was more common
decades ago; the only equipment I have ever personally encountered that used
bare Ethernet was a very outdated CNC setup.
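To make the "Ethernet without IP" point concrete, here is a minimal Python sketch of a learning switch (transparent bridge). It decides everything from MAC addresses and ingress ports and never parses an IP header. The class and method names are hypothetical, and real switches also age out table entries, run spanning tree, handle VLANs and so on, all omitted here.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"


class LearningSwitch:
    """Toy L2 switch: forwards on destination MAC only, never looks at IP."""

    def __init__(self, ports: list[int]) -> None:
        self.ports = ports
        self.mac_table: dict[str, int] = {}  # MAC address -> port last seen on

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list[int]:
        """Return the ports the frame is sent out of."""
        # Learn: the sender is reachable via the port the frame arrived on
        self.mac_table[src_mac] = in_port
        out = self.mac_table.get(dst_mac)
        if dst_mac == BROADCAST or out is None:
            # Flood broadcasts and unknown destinations to every other port
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            return []  # destination is on the same segment: filter the frame
        return [out]


sw = LearningSwitch([1, 2, 3])
print(sw.receive(1, "aa:00:00:00:00:01", "aa:00:00:00:00:02"))  # [2, 3], flooded
print(sw.receive(2, "aa:00:00:00:00:02", "aa:00:00:00:00:01"))  # [1], learned
```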

~~~
jwatzman
Along with the above fantastic comment, I found
[https://apenwarr.ca/log/20170810](https://apenwarr.ca/log/20170810) an
interesting (if inflammatory/divisive) essay on the subject and its history.

~~~
jcrawfordor
Yes, that essay is outstanding! I largely left out mention of IPv6 because
it's a whole different can of worms, but as that article presents, it aims to
make the situation radically simpler but in practice, well, doesn't. Cue the
XKCD about making a new standard.

A bit ago I touched on various competitors to IP on my blog-thing
([https://computer.rip/](https://computer.rip/)) but I need to find time to
give the topic a more thorough treatment. As with a lot of fields, you can
probably learn more about what really matters in networking by studying the
protocols that didn't make it than by studying the ones that did. It's hard
for most people that entered the computing field in the last couple of decades
to imagine IP and TCP/UDP not being the clearly correct design, but in the
'80s to early '90s the expansion of microcomputers was accompanied by a
flourishing of network protocols for use with them. There are multiple reasons
that TCP/IP over Ethernet eventually became dominant, but in the end it's
mostly happenstance; it's pretty easy to imagine XNS becoming the norm if
ARPANET had gone a little differently. Imagine the problems we'd be talking
about today in that parallel universe: XNSv6 adoption is such a mess.

I'm honestly a bit sad to see the "all-IP" trend working its way through the
telecom industry. It's reducing use of protocols like MPLS that I think are
very cool. But now software-defined networking brings a whole new world of
strange network technologies that we'll find ill-advised in fifty years.

------
anotherkamila_
Hi, I'm the author. Uh hi w00t how why what's it doing here?! :D

I promise to make it better and actually finish it now! Check back in a day or
two I guess? Also I should post the code I promised. Hello from the ADHD
squirrel!

~~~
anotherkamila_
Also thanks a ton for your suggestions, I really appreciate them!

------
pfarrell
I would suggest expanding your terminology section. I know almost nothing
about routers and I'm lost in the first sentence of the High Level Overview
section.

    "A switch (or an L2 switch :-) ) is an L2-only thing."

I don't know what L2 means. I suspect a definition of the various levels would
expand the audience for this post.

~~~
AlphaSite
I think you need to know your audience and cater to them; trying to explain
everything just ends up as a book. L2 is especially googleable.

~~~
hinkley
To be fair, L2 could be Layer 2 or Level 2 (cache) and it might be a crapshoot
what you get. You might get confused trying to answer your own questions.

Discoverability lives in the space between overexplaining and underexplaining.

~~~
dreamcompiler
In a networking discussion, L2 always means Layer 2. If the subject of caching
came up the author would say "I'm talking about L2 cache here."

It's like TTL. It means one thing in a networking context but something
totally different in a digital logic context.

But granted, somebody with no networking background wouldn't necessarily know
that.

------
icedchai
Maybe a mention of other, non-Ethernet links: serial PPP? Frame Relay? I
realize these are mostly historical curiosities these days, but it might help
to reinforce the differences between L2 and L3.

When I first started working with routers, over 25 years ago, it was all
Ethernet LAN to serial WAN, usually point-to-point T1 or frame relay. One site
had a _dual_ T1, load balanced on both ports of a Cisco 2501. Fun times.

------
rabuse
I learned a lot about networking when setting up servers in racks. Had to deal
with issues arising from terrible UIs on a lot of the routers out there, so I
just kept digging deeper and deeper into how it all works. Also, for anyone
else looking into how packets are actually routed, look into BGP and how CDNs
work. Great stuff.

~~~
walshemj
I would start with how internal routing works before starting on WAN routing.

I'd look at the Cisco Press and CCNA training materials.

------
boryas
I believe this piece does a good job with forwarding, but would be improved by
a discussion of termination.

Routing is only triggered when the packet is L2 terminated: the destination
MAC of the packet is one of the router's own MACs.

If the packet's destination MAC does not belong to the router, it doesn't
matter what is in its IP header, it will be switched in the LAN it came in on.

This design also generalizes nicely to the case when the destination IP of a
routed packet is one of the router's IPs.
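A minimal Python sketch of that decision order, assuming a device that can both bridge and route (as the comment implies) and considering only unicast traffic; broadcast/multicast handling, ARP, and the FIB lookup itself are left out. The MACs, IPs and the `classify` name are hypothetical.

```python
import ipaddress

# Hypothetical addresses owned by the router's interfaces
ROUTER_MACS = {"02:00:00:00:00:01", "02:00:00:00:00:02"}
ROUTER_IPS = {ipaddress.IPv4Address("192.0.2.1"), ipaddress.IPv4Address("198.51.100.1")}


def classify(dst_mac: str, dst_ip: str) -> str:
    """Decide what the device does with an incoming unicast frame."""
    if dst_mac not in ROUTER_MACS:
        # Not L2-terminated: switch it within the incoming LAN; IP header is irrelevant
        return "switch"
    if ipaddress.IPv4Address(dst_ip) in ROUTER_IPS:
        # Terminated and addressed to the router itself (e.g. a BGP session)
        return "deliver locally"
    # Terminated but not for us: consult the forwarding table and route it
    return "route"


print(classify("02:00:00:00:00:99", "203.0.113.5"))   # switch
print(classify("02:00:00:00:00:01", "203.0.113.5"))   # route
print(classify("02:00:00:00:00:01", "198.51.100.1"))  # deliver locally
```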

~~~
anotherkamila_
Good point. Incorporating that would require more brain than I have right now
(bad timezone :D), but you're right, I completely left that out. May I update
the article with a link to this comment?

~~~
boryas
sure!

------
teleforce
I teach a computer networking class with labs using Linux Switch Appliance
(LISA) and the Quagga router (based on Zebra) on embedded computers with x86
CPUs and multi-port Ethernet. The embedded router needs to be dual-booted for
each function because LISA is based on a custom Linux kernel, while Quagga
just uses a normal/vanilla kernel.

I am looking for a "layer 3 switch" that has switching and routing
functionality without rebooting. If anyone knows of a software-based open
source solution for this, it would be very helpful. A Cisco IOS-like command
line interface would be nice, but it is optional.

The article explains router internals in terms of P4. Perhaps I should try P4
for the requirements mentioned above?

~~~
wmf
VyOS supports bridging and routing although the config is more like a Linux
host and unlike a real Cisco/Arista switch.

~~~
snuxoll
The Vyatta/VyOS/EdgeOS CLI took heavy inspiration from Juniper’s JunOS, so
saying the config is unlike a “real” switch is factually incorrect.

It’s still a little odd, but as somebody quite comfortable with JunOS (I run
Juniper switches in my homelab) it’s pretty easy to pick up any of the Vyatta
forks and hit the ground running.

------
bogomipz
>"It needs to be routed: the router, based on L3 information, decides where it
needs to go, in L3 speak – it will decide which host to send it to, but not
how. This corresponds to the routing table (or FIB)."

This is not correct. The FIB (forwarding information base) is concerned with
layer 2. The RIB (routing information base) determines the next hop. The RIB is
what is used to populate entries in the FIB with the correct outgoing
interface. These two terms are basic router terms. It was kind of surprising
to see this statement in a post titled "How Do Routers Work, Really?"
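A toy Python sketch of the RIB-to-FIB step described above: the RIB may hold several candidate routes per prefix from different protocols, one is selected per prefix and installed into the FIB together with its resolved outgoing interface. The distance values are Cisco's defaults, used here as an assumption; `Route`, `build_fib`, and resolving next hops only via connected routes are hypothetical simplifications, not how any particular router OS does it.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network
from typing import Optional

# Cisco's default administrative distances (lower is preferred), used here as
# an assumption; other vendors use different numbers.
DISTANCE = {"connected": 0, "static": 1, "ebgp": 20, "ospf": 110}

FibEntry = tuple[Optional[IPv4Address], Optional[str]]  # (next hop, interface)


@dataclass(frozen=True)
class Route:
    prefix: IPv4Network
    source: str                      # protocol that learned the route
    next_hop: Optional[IPv4Address]  # None for directly connected subnets
    interface: Optional[str]         # known up front only for connected routes


def build_fib(rib: list[Route]) -> dict[IPv4Network, FibEntry]:
    """Select one route per prefix from the RIB and resolve its outgoing interface."""
    best: dict[IPv4Network, Route] = {}
    for route in rib:
        current = best.get(route.prefix)
        if current is None or DISTANCE[route.source] < DISTANCE[current.source]:
            best[route.prefix] = route

    connected = [r for r in best.values() if r.source == "connected"]
    fib: dict[IPv4Network, FibEntry] = {}
    for prefix, route in best.items():
        if route.source == "connected":
            fib[prefix] = (None, route.interface)
            continue
        # Simplified recursive lookup: which connected subnet holds the next hop?
        via = [c for c in connected if route.next_hop in c.prefix]
        if via:  # routes whose next hop can't be resolved stay out of the FIB
            out = max(via, key=lambda c: c.prefix.prefixlen)
            fib[prefix] = (route.next_hop, out.interface)
    return fib


rib = [
    Route(IPv4Network("192.0.2.0/24"), "connected", None, "eth0"),
    Route(IPv4Network("10.0.0.0/8"), "ospf", IPv4Address("192.0.2.254"), None),
    Route(IPv4Network("10.0.0.0/8"), "static", IPv4Address("192.0.2.1"), None),
    Route(IPv4Network("203.0.113.0/24"), "ebgp", IPv4Address("198.51.100.9"), None),
]
fib = build_fib(rib)  # static beats OSPF for 10/8; the eBGP route is unresolvable
```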

~~~
anotherkamila_
You're right, I noticed it about an hour ago -- no idea what was going on in
my head then :-/ Fixed already. Thank you!

------
wbsun
Click is a very good software router to read and learn:
[https://github.com/kohler/click](https://github.com/kohler/click)

It can be more than a router though.

------
dnautics
This is great, if for no other reason than that section 1 explains the
difference between a switch and a router (which took me a decade? to really
understand). I really wish someone could have laid it out clearly for me.

------
mrburton
I just have to say this: "Magnets, how do they work?" ;) Anyone get the
reference?

