nanog mailing list archives

Re: BGP keepalive/holdtime at GigE exchange


From: Clayton Fiske <clay () bloomcounty org>
Date: Fri, 12 Jan 2001 13:04:45 -0800


On Fri, Jan 12, 2001 at 03:23:51PM -0500, Deepak Jain wrote:

> I think the argument is one of stability. BGP is supposed to be stable for
> days/weeks on end normally. Making your internal network too sensitive to
> external changes destabilizes your network and those who connect to you.
>
> If a BGP session with one peer resets once every three days, and you peer
> with them at a few places, at most you are talking about a service
> degradation for about 5-10 minutes as, say, 1/3 of your packets are resent
> or dropped (assuming you peer in three places, etc.). 180 seconds is
> nothing for a router with many peering sessions and a reasonable traffic
> load.
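
To put rough numbers on the scenario above, here is a back-of-the-envelope
sketch. The inputs are only the illustrative figures from that paragraph
(a reset every three days, up to 10 minutes of recovery, three equally
loaded peering points); the function names are made up for this example
and nothing here is measured data.

```python
# Rough estimate of the impact of an occasional BGP session reset.
# All figures are the illustrative ones from the discussion, not measurements.

def degraded_fraction(outage_minutes: float, days_between_resets: float) -> float:
    """Fraction of wall-clock time spent recovering from a reset."""
    minutes_between_resets = days_between_resets * 24 * 60
    return outage_minutes / minutes_between_resets


def affected_traffic_share(peering_points: int) -> float:
    """Share of traffic touched when one of N equally loaded sessions resets."""
    return 1 / peering_points


worst_case = degraded_fraction(outage_minutes=10, days_between_resets=3)
share = affected_traffic_share(peering_points=3)
print(f"time degraded: {worst_case:.4%}, traffic affected meanwhile: {share:.0%}")
```

The equal-load assumption is the weak point: if one of the three sessions
carries most of the traffic, the affected share during a reset is
correspondingly larger.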

With regard to your earlier comments about busy routers "pausing"
BGP, perhaps this is something that can be investigated at a vendor
software level. I would think keepalives (of any variety) should rank
fairly high on the food chain in terms of CPU precedence. If this isn't
the case already, why not? I don't know how true it is anymore, but I
recall a few years back having to deal with some routers which got
bogged down with OSPF updates to the point that they kept resetting
perfectly stable links (or the other end did) due to keepalives not
being processed in a timely manner. In the interest of stability, I
would certainly want keepalives to be processed ahead of routing
updates. After all, it's not as though they even represent a significant
percentage of the total workload on the CPU, even when you reach a
reasonably high number of links. And if your links keep resetting due
to route churn, you've got a self-perpetuating problem.
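
As a minimal sketch of what "keepalives ahead of routing updates" could
look like, here is a two-class priority queue. This is not how any
particular vendor schedules its control plane; the class name, the two
priority constants, and the message strings are all hypothetical.

```python
import heapq
import itertools

# Hypothetical priority classes: a lower number is served first.
KEEPALIVE, UPDATE = 0, 1


class ControlPlaneQueue:
    """Toy work queue that always hands out keepalives before updates."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps FIFO order within a priority class.
        self._seq = itertools.count()

    def push(self, kind, payload):
        heapq.heappush(self._heap, (kind, next(self._seq), payload))

    def pop(self):
        kind, _, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


q = ControlPlaneQueue()
q.push(UPDATE, "route update 1")
q.push(UPDATE, "route update 2")
q.push(KEEPALIVE, "keepalive from peer A")  # arrives last, served first

order = [q.pop()[1] for _ in range(len(q))]
print(order)
```

The point is only that the keepalive jumps the backlog of updates, so a
burst of route churn cannot starve the messages that keep the session up.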

> The bigger concern is IF a peer is dropping a session that often, *what*
> is wrong with their router? I am very afraid of routers that *randomly*
> time out and re-peer with no good reason.

In this case, I would expect a NOC with proper monitoring of peering
sessions to take notice and initiate an investigation into the problem.
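
A sketch of the kind of check such monitoring might run: count session
resets per peer over a sliding window and flag any peer that crosses a
threshold. The class name, the threshold of three resets, and the one-week
window are assumptions for illustration, and timestamps are plain seconds
supplied by whatever feeds the monitor (syslog, SNMP traps, and so on).

```python
from collections import defaultdict, deque


class FlapMonitor:
    """Flags a peer whose session resets too often within a time window."""

    def __init__(self, max_resets=3, window_seconds=7 * 24 * 3600):
        self.max_resets = max_resets
        self.window_seconds = window_seconds
        self._resets = defaultdict(deque)  # peer -> timestamps of recent resets

    def record_reset(self, peer, timestamp):
        """Record one reset; return True if the peer should be investigated."""
        events = self._resets[peer]
        events.append(timestamp)
        # Discard resets that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.max_resets


monitor = FlapMonitor(max_resets=3, window_seconds=100)
alerts = [monitor.record_reset("peer-a", t) for t in (0, 50, 90, 500)]
print(alerts)
```

In the usage above, the third reset inside the window raises the flag,
while the much later fourth one does not, because the earlier resets have
aged out.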

-c


