Solving a bad ARP behavior on a Linux router

Comment on Mastodon

Introduction

So, I recently switched my home router to Linux but had a network issues for devices that would get/renew their IP with DHCP. They were obtaining an IP, but they couldn't reach the router before a while (between 5 seconds to a few minutes), which was very annoying and unreliable.

After spending some time with tcpdump on multiple devices, I found the issue, it was related to ARP (the protocol to discover MAC addresses associate them with IPs).

Wikipedia page about the ARP protocol

The arp flux problem explained

My setup

I have an unusual network setup at home as I use my ISP router for Wi-Fi, switch and as a modem, the issue here is that there are two subnets on its switch.


      +------------------+                                +-----------------+
      | ISP MODEM        | ethernet #1         ethernet #1|                 |
      |                  |<------------------------------>|                 |
      |                  | 192.168.1.254     192.168.1.111|                 |
      |                  |                                |  linux router   |
      |                  |                                |                 |
      |                  | ethernet #2         ethernet #2|                 |
      |                  |<------------------------------>|                 |
      |                  |                    10.42.42.42 |                 |
      |                  |                                |                 |
      |                  |                                |                 |
      +------------------+                                +-----------------+
       ^ethernet #4     ^ ethernet #3
       |                |
       |                |
       |                +----> some switch with many devices
       |
       v 10.42.42.150
       NAS

Because the modem is reachable over 192.168.1.0/24 and is used by the router on that switch, but that the LAN network uses the same switch with 10.42.42.0/24, ARP packets arrives on two network interfaces of the router, for addresses that are non routables (ARP packets for 10.42.42.0 would arrive at the interface 192.168.1.0 or the opposite).

Solution

There is simple solution, but it was very complicated to find as it's not obvious. We can configure the Linux kernel to discard ARP packets that are related to non routable addresses, so the interface with a 192.168.1.0/24 address will discard packets for the 10.42.42.0/24 network and vice-versa.

You need to define the sysctl net.ipv4.conf.all.arp_filter to 1.

sysctl net.ipv4.conf.all.arp_filter=1

This can be set per interface if you have specific need.

Documentation of the sysctl available on Linux

Conclusion

This was a very annoying issue, incredibly hard to troubleshoot. I suppose OpenBSD has this strict behavior by default because I didn't have this problem when the router was running OpenBSD.