nanog mailing list archives

Re: Multi-homed clients and BGP timers


From: Iljitsch van Beijnum <iljitsch () muada com>
Date: Sat, 23 May 2009 14:54:49 +0200

On 23 May 2009, at 0:58, Zaid Ali wrote:

From experience, I have found that you need to keep all the timers in sync with all your peers. Use something like this for every peer in your BGP config:

neighbor xxx.xx.xx.x timers 30 60


30 60 isn't a good choice: the first keepalive comes in after 30.1 seconds, and then the session expires at 60.0 seconds, while the second keepalive would not arrive until 60.1 seconds.

The other side will typically use the hold time divided by 3 as its keepalive interval. If you set the hold time to something not divisible by 3, then all 3 of those keepalives arrive within the hold time.
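
To make the arithmetic above concrete, here is a minimal Python sketch. It is a simplified model, not how any router implements its timers: it assumes every keepalive arrives a fixed 0.1 seconds late, it ignores the fact that the hold timer restarts on each keepalive received, and it assumes the remote side rounds hold time / 3 down to a whole second. The function names are made up for illustration.

```python
def derived_keepalive(hold: int) -> int:
    """Keepalive interval a peer would pick as hold time / 3 (assumed rounded down)."""
    return hold // 3


def keepalives_within_hold(keepalive: int, hold: int, delay: float = 0.1) -> int:
    """Count keepalives that arrive strictly before one hold window runs out.

    Keepalive number n is assumed to arrive at n * keepalive + delay seconds.
    """
    if keepalive <= 0:
        raise ValueError("keepalive interval must be positive")
    count = 0
    n = 1
    while n * keepalive + delay < hold:
        count += 1
        n += 1
    return count


if __name__ == "__main__":
    # (keepalive, hold) pairs: the 30 60 example, a 60 second hold time
    # with the derived keepalive of 20, and the 5 16 setting.
    for keepalive, hold in [(30, 60), (derived_keepalive(60), 60), (5, 16)]:
        print(keepalive, hold, keepalives_within_hold(keepalive, hold))
```

Under these assumptions, 30 60 leaves one keepalive inside the hold window, a 60 second hold time with a 20 second keepalive leaves two, and 5 16 leaves three.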

I often recommended 5 16 in the past, but that's a bit on the short side: some less robust BGP implementations are single-threaded and may not be able to send keepalives every 5 seconds when they're very busy.

The minimum possible nonzero hold time is 3 seconds.

If you only change the setting at your end, you can change it to something higher when bad stuff happens. If the other end also sets it, then you'll have to change it at both ends, as the hold time is negotiated and the lowest value is used.
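
The negotiation can be sketched in a few lines of Python. The lowest-wins rule and the restriction to 0 or at least 3 seconds follow the description above and RFC 4271. The keepalive rule (the configured value capped at a third of the negotiated hold time) is an assumption modelled on common implementations, and the function names are hypothetical.

```python
def negotiate_hold_time(local_hold: int, remote_hold: int) -> int:
    """Return the hold time both ends will use: the lower of the two proposals."""
    for hold in (local_hold, remote_hold):
        # Hold times of 1 or 2 seconds are not allowed; 0 means no keepalives.
        if hold != 0 and hold < 3:
            raise ValueError(f"hold time must be 0 or at least 3, got {hold}")
    return min(local_hold, remote_hold)


def effective_keepalive(configured_keepalive: int, negotiated_hold: int) -> int:
    """Assumed rule: never send keepalives less often than negotiated hold / 3."""
    if negotiated_hold == 0:
        return 0  # keepalives disabled
    return min(configured_keepalive, negotiated_hold // 3)


if __name__ == "__main__":
    # Our side is configured with "timers 5 16"; the peer proposes 180.
    hold = negotiate_hold_time(16, 180)
    print(hold, effective_keepalive(5, hold))
```

Because the lower value always wins, lowering the hold time at one end is enough to shorten it for the session, and raising it again only requires a change at whichever ends had lowered it.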

If you really want fast failover, terminate the fiber directly on the BGP router and make sure "bgp fast-external-fallover" is on (I think it's the default).

For manual failover, simply shut down the BGP sessions on the router that you don't want to handle traffic at that time. If you have peer groups, you can do "neighbor peergroup shutdown" for the fastest results. Shutting down interfaces is not such a good idea, because then the routing protocols have to time out.




Make sure that this is communicated to your peers as well, so that their timer settings match yours.

Zaid
----- Original Message -----
From: "Steve Bertrand" <steve () ibctech ca>
To: "nanog list" <nanog () nanog org>
Sent: Friday, May 22, 2009 3:45:20 PM GMT -08:00 US/Canada Pacific
Subject: Multi-homed clients and BGP timers

Hi all,

I've got numerous single-site 100Mb fibre clients who have backup SDSL
links to my PoP. The two services terminate on separate
distribution/access routers.

The CPE that peers to my fibre router sets a community, and my end sets
the pref to 150 based on it. The CPE also sets a higher pref for
prefixes from the fibre router. The SDSL router to CPE leaves the
default preference in place. Both of my PE routers send default-originate
to the CPE. There is (generally) no traffic that should ever be on the
SDSL link while the fibre is up.

Both of the PE routers then advertise the learnt client route up into
the core:

*>i208.70.107.128/28
                   172.16.104.22             0    150      0 64762 i
* i                 172.16.104.23             0    100      0 64762 i

My problem is the noticeable delay for switchover when the fibre happens
to go down (God forbid).

I would like to know if BGP timer adjustment is the way to adjust this,
or if there is a better/different way. It's fair to say that the fibre
doesn't 'flap'. Based on operational experience, if there is a problem
with the fibre network, it's down for the count.

While I'm at it, I've got another couple of questions:

- whatever technique you might recommend to reduce convergence time
throughout the network, can the same principles be applied to iBGP as well?

- if I need to down core2, what is the quickest and easiest way to
ensure that all gear connected to the cores will *quickly* switch to
preferring core1?

Steve



