Solving a bad ARP behavior on a Linux router (dataswamp.org)
45 points by zdw on Aug 5, 2022 | 17 comments

The more obvious (to me at least) way to solve this is to not mix broadcast domains/subnets on a switch without keeping them in separate VLANs. I would assume the router and the NAS box would support tagging/untagging, and then you could keep the ISP network on the default untagged vlan.

That's exactly the right solution to the issue. As it stands, any client can talk to the modem and mess up connectivity. And as it turns out, simple switches with VLAN capability aren't expensive either.

The magic of VLANs then allows putting everything on one physical interface for the router. This essentially turns the switch into a port multiplexer for the firewall (10GbE comes in handy there). Also useful in case the router is a VM, since only a single NIC needs to be passed to it.
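
A minimal sketch of that single-trunk setup with iproute2, for illustration only. The interface name `eth0`, the VLAN IDs 10 and 20, and both addresses are assumptions, not values from the article. The helper `vlan_cmds` is hypothetical and only prints the commands, so nothing changes until the output is reviewed and run as root.

```shell
#!/bin/sh
# Print (do not run) the iproute2 commands that create one tagged 802.1Q
# sub-interface on a trunk NIC. The parent NIC itself must also be up.
vlan_cmds() {
    parent=$1   # physical NIC facing the switch (assumed: eth0)
    vid=$2      # VLAN ID configured on the switch (assumed)
    addr=$3     # router address/prefix on that VLAN (assumed)
    echo "ip link add link $parent name $parent.$vid type vlan id $vid"
    echo "ip addr add $addr dev $parent.$vid"
    echo "ip link set dev $parent.$vid up"
}

# Example: LAN on VLAN 10, modem-facing network on VLAN 20.
vlan_cmds eth0 10 10.42.42.1/24
vlan_cmds eth0 20 192.168.1.2/24
```

The switch port facing the router would need to carry both VLANs tagged for this to work.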

It's not specified what subnet the switch on eth3 is on, but assuming they're on the 10.42.42 network, then I wonder why not just plug the router into the modem, then add a sufficiently large switch behind the router to hold everything on the 10.42.42 block? Unless I'm missing something.

Having one subnet per broadcast domain is nice, but not always achievable with the equipment at hand. I wonder if it would work better if you used a single interface on the Linux box for both subnets? That's how I've always run multiple subnets and it seems to work ok? Having two interfaces on the same broadcast domain that aren't combined into an aggregate interface seems to be asking for trouble anyway?

Although, if it's just the Linux box and the NAS on 10/8 (or whatever), you could connect linux:eth2 to nas, perhaps? There's a lot about this particular network that hasn't been described.

So, roughly speaking, many of these problems arise because Linux is a weak end system in an RFC1122 sense (https://www.rfc-editor.org/rfc/rfc1122.html). This is by design, not a bug, etc -- the "meaning" is roughly that to Linux, IP addresses belong to the kernel, not the interfaces themselves. This is the right way to think about the problem.

There are a number of features available to Make It Work The Way You Want but they're often nonobvious. As a sibling comment notes, you can set the arp_announce and arp_ignore (and arp_filter) sysctls to instruct the kernel to answer or not answer for arp requests based on interface specifics (the other arp_ sysctls are related to gratuitous arp sending and receipt).

As in this post, you can use the rp_filter sysctl to change the reverse path filtering (as in RFC3704).
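
For reference, rp_filter takes 0 (off), 1 (strict) or 2 (loose), and as far as I know the kernel applies the larger of the `all` value and the per-interface value. A small sketch that reports the effective mode; the helper name `effective_rp_filter` is made up for this example, and the loop only runs where `/proc/sys` is available.

```shell
#!/bin/sh
# Map the pair (conf.all value, per-interface value) to the mode the
# kernel would apply, assuming the documented "max of the two" rule.
effective_rp_filter() {
    a=$1   # value of net.ipv4.conf.all.rp_filter
    b=$2   # value of net.ipv4.conf.<interface>.rp_filter
    if [ "$a" -gt "$b" ]; then m=$a; else m=$b; fi
    case $m in
        0) echo "off" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}

# On a Linux host, list the effective mode for every interface.
if [ -r /proc/sys/net/ipv4/conf/all/rp_filter ]; then
    all_val=$(cat /proc/sys/net/ipv4/conf/all/rp_filter)
    for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
        ifname=${f%/rp_filter}
        ifname=${ifname##*/}
        echo "$ifname: $(effective_rp_filter "$all_val" "$(cat "$f")")"
    done
fi
```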

Additionally, you can use iproute2 rules and route tables to enforce outbound paths if you want a node to have multiple interfaces on the same subnet without arp confusion.
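
A rough sketch of the source-based routing approach, assuming two interfaces on the same 10.42.42.0/24 subnet. The interface names, addresses, gateway and table numbers are all placeholders, and `source_route_cmds` is a hypothetical helper that prints the commands rather than running them.

```shell
#!/bin/sh
# Print the iproute2 commands that give one interface its own routing
# table and a rule selecting that table by source address, so replies
# leave through the interface that owns the address.
source_route_cmds() {
    dev=$1      # interface name (assumed)
    addr=$2     # address configured on that interface (assumed)
    subnet=$3   # on-link subnet in CIDR form (assumed)
    gw=$4       # default gateway on that subnet (assumed)
    table=$5    # numeric routing table ID, arbitrary but unique
    echo "ip route add $subnet dev $dev src $addr table $table"
    echo "ip route add default via $gw dev $dev src $addr table $table"
    echo "ip rule add from $addr/32 lookup $table"
}

source_route_cmds eth0 10.42.42.10 10.42.42.0/24 10.42.42.1 100
source_route_cmds eth1 10.42.42.11 10.42.42.0/24 10.42.42.1 101
```

This is usually combined with the arp_ sysctls mentioned above, since routing rules alone do not change which interface answers an ARP request.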

I don't know if this is a typo in the article, but I didn't know that RP filtering would influence how ARP works. I think it would be better to solve the problem at the root by asking Linux to behave more like other OSes with:

    sysctl -qw net.ipv4.conf.all.arp_announce=2
    sysctl -qw net.ipv4.conf.all.arp_ignore=1

There's also net.ipv4.conf.interface.arp_filter. I found it quite confusing to identify exactly which of all of these should be tweaked. I run arp_ignore=1 and arp_filter=1 on my router to avoid the behavior and it seems to work.
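
To make the naming concrete, here is a sketch that generates per-interface keys in sysctl.d syntax, where `interface` in the key is replaced by a real device name. The helper `arp_sysctl_lines` is hypothetical, the interface names are only examples, and the values are the ones suggested in the parent comment. My understanding is that the kernel uses the larger of the `all` and per-interface values for arp_ignore and arp_announce, so check `all` as well.

```shell
#!/bin/sh
# Emit sysctl.d-style lines for each interface named on the command line.
# Nothing is applied; the output is meant to be reviewed first.
arp_sysctl_lines() {
    for ifname in "$@"; do
        # Reply only if the target IP is configured on the receiving interface.
        echo "net.ipv4.conf.$ifname.arp_ignore = 1"
        # Prefer a source address that belongs to the outgoing interface.
        echo "net.ipv4.conf.$ifname.arp_announce = 2"
    done
}

arp_sysctl_lines eth0 eth3
```

The output could be saved under `/etc/sysctl.d/` (the file name is up to you) and loaded with `sysctl --system`.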

"How I learned to hack around my misuse of networking on my linux box"

A couple of weeks ago I had a customer router which wouldn't update its ARP table unless a system performed exactly the IPv4 Duplicate Address Detection "Probe" and "Announcement" steps defined in RFC-5227:


Even for failover of a cluster VIP, which is needlessly strict. That's the strangest ARP behaviour I've seen in a while.

Also, if you're on Azure, set up two VMs in the same subnet and watch ARP between them. They actually aren't in the same subnet and there's something doing Proxy ARP between them with a MAC like 01:23:45:67:89:AB or something like that. Cursed.

Azure network is junk. Try running traceroute from a VM running there - it will come back empty. They just silently drop ICMP "TTL expired" packets, for "security" reasons, no doubt.

Also, you can't spoof source IP there, even when talking between your VMs on a private subnet. So forget about running a router inside a VM. On AWS you can. [0]

[0] https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Ins...

> Azure network is junk.

We haven't even got to the joys of the load balancer yet!

Just log onto the VM console to troubleshoot. Oh, they don't have a console by design, okay then.

> So forget about running a router inside a VM

How does VyOS do it?


It seems to me that a more logical configuration is:

    modem <-> router/NAT <-> switch <-> (wifi and other stuff)

Though I understand that OP wanted to save money/cables/wall warts by using the modem's Wi-Fi AP and Ethernet switch.

I tend to agree that if you are trying to slice an Ethernet switch (or network) for two networks that you don't want to be bridged together, you should probably use VLANs.

OP's network design is silly and probably unjustified. No wonder he's having problems.

Setting rp_filter=1 is very standard for Linux routers, and almost all distros, including Debian and Red Hat, set it by default. I am not sure if OP accidentally disabled this or if he is using some silly distro.

I once had rp_filter bite me in the ass by filtering out multicast packets from a different subnet.

I've run into this before, and it's great to know the solution.
