nanog mailing list archives

Re: BFD for routes learned through Route-servers in IXPs


From: Baldur Norddahl <baldur.norddahl () gmail com>
Date: Sun, 20 Sep 2020 22:32:42 +0200

Hello

ARP timeout should be lower than MAC timeout, but the defaults are usually
the other way around, which is extremely stupid. For those who do not know
why, let me give a simple example:

Router R1 is connected to switch SW1 with a connection to server SRV:
R1 <-> SW1 <-> SRV
Router R2 is connected to switch SW2 with a connection to server SRV:
R2 <-> SW2 <-> SRV

The server is using R1 as default gateway. Traffic is arriving from the
internet through R2 towards the server. The server will however send
replies back through the default gateway at R1. This is a common case with
redundant routers: only one is used as the default gateway, but traffic
may arrive through both.

Initially all will be good. But SW2 is only seeing unidirectional traffic
from R2. No traffic goes from SRV to R2 and thus, after some time, SW2 will
expire the learned MAC entry for SRV. This has the unfortunate result that
SW2 will start flooding traffic destined for SRV out of all ports.

Then after more time has passed, R2 will renew the ARP binding by sending
out an ARP query to SRV. The server will send back an ARP reply to R2. This
packet from SRV to R2 will pass SW2 and thus have the effect of renewing
the MAC binding at SW2 too. The flooding stops and all is well again. Until
the MAC binding expires and the story repeats.
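
To make the cycle concrete, here is a rough Python sketch of the timeline
at SW2. It is a simplified model, not vendor behaviour: it assumes constant
traffic from R2 to SRV, that the only frames SW2 ever sees from SRV are the
ARP replies, and that R2 re-ARPs exactly when its ARP timeout expires. The
function name flooding_intervals is made up for illustration.

```python
def flooding_intervals(mac_timeout_s, arp_timeout_s, duration_s):
    """Return (start, end) times in seconds during which SW2 floods.

    Simplified model (assumptions, see text): SW2 learns SRV's MAC at t=0
    and afterwards only relearns it when SRV answers R2's ARP query, which
    happens once every arp_timeout_s seconds.
    """
    if mac_timeout_s <= 0 or arp_timeout_s <= 0:
        raise ValueError("timeouts must be positive")

    intervals = []
    last_seen = 0  # last time SW2 saw a frame sourced from SRV
    while last_seen < duration_s:
        next_arp = last_seen + arp_timeout_s      # next ARP reply from SRV
        flood_start = last_seen + mac_timeout_s   # MAC entry ages out here
        if flood_start < next_arp and flood_start < duration_s:
            intervals.append((flood_start, min(next_arp, duration_s)))
        last_seen = next_arp
    return intervals


# One hour with a 5 minute MAC timeout and a 20 minute ARP timeout.
for start, end in flooding_intervals(300, 1200, 3600):
    print(f"flooding from {start}s to {end}s")
```

With the ARP timeout at or below the MAC timeout, the same function returns
an empty list, because the ARP reply always refreshes the MAC entry before
it ages out.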

If the MAC timeout is 5 minutes and the ARP timeout is 20 minutes, which is
very common, you will have flooding for 15 minutes out of every 20-minute
interval! Stupid!
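
The arithmetic generalises. As a back-of-the-envelope sketch in Python,
under the same simplifying assumptions as the example (constant one-way
traffic, and the MAC entry refreshed only by the periodic ARP reply), the
flooded share of time is (ARP timeout - MAC timeout) / ARP timeout, or zero
when the ARP timeout is the shorter one. The function name
flooding_fraction is made up for illustration; the ARP defaults are the
ones listed in the quoted thread, and 300 seconds is the MAC timeout
mentioned there.

```python
def flooding_fraction(mac_timeout_s, arp_timeout_s):
    """Share of time SW2 floods, in the simplified model described above."""
    if mac_timeout_s <= 0 or arp_timeout_s <= 0:
        raise ValueError("timeouts must be positive")
    flooded_s = max(0, arp_timeout_s - mac_timeout_s)
    return flooded_s / arp_timeout_s


MAC_TIMEOUT_S = 300  # the common default mentioned in the thread

# ARP timeout defaults in minutes, as listed in the quoted thread.
arp_defaults_min = {
    "Linux": 1,
    "Brocade": 10,
    "Cumulus": 18,
    "BSD distros": 20,
    "Extreme": 20,
    "Juniper": 20,
    "HP": 25,
    "IOS": 240,
}

for name, minutes in arp_defaults_min.items():
    share = flooding_fraction(MAC_TIMEOUT_S, minutes * 60)
    print(f"{name}: flooding {share:.0%} of the time")
```

Real networks will usually do better than this worst case, since any frame
from SRV that happens to pass SW2 also refreshes the MAC entry.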

Why have vendors not fixed their defaults for this case?

Regards,

Baldur



On Thu, Sep 17, 2020 at 7:51 AM Saku Ytti <saku () ytti fi> wrote:

On Wed, 16 Sep 2020 at 23:15, Chriztoffer Hansen
<chriztoffer.hansen () de-cix net> wrote:
On 16/09/2020 04:01, Ryan Hamel wrote:

CoPP is always important, and it's not just Mikrotik's with default low
ARP timeouts.

Linux - 1 minute
Brocade - 10 minutes
Cumulus  - 18 minutes
BSD distros - 20 minutes
Extreme - 20 minutes
Juniper - 20 minutes
HP - 25 minutes
IOS - 4 hours

Why are these considered (by Ryan) low values? Does low have a
negative connotation here?

ARP timeout should be lower than MAC timeout, and MAC timeout usually
is 300 seconds. Anything above 300 seconds is probably poor BCP for
default value, as defaults should interoperate in a somewhat sane
manner.
Of course operators are free to configure very high ARP timeout, as
long as they also remember to equally configure higher MAC timeout.

--
  ++ytti

