nanog mailing list archives

Re: BGP convergence problem


From: Matthew Petach <mpetach () netflight com>
Date: Tue, 8 Jun 2010 09:26:47 -0700

On Tue, Jun 8, 2010 at 7:27 AM, Andy B. <globichen () gmail com> wrote:
> I finally decided to shut down all peerings and brought them back one by one.
>
> Everything is stable again, but I don't like the way I had to deal
> with it since it will most likely happen again when DECIX or another
> IX we're at is having issues.
>
> I've seen a few BGP convergence discussions on NANOG, but none about
> deadlock situations and what could be done to avoid them. Setting
> higher MTU or bigger hold queues did not help.
>
> - Andy

Some people have found that upgrading to an alternate router vendor
helps.  ^_^;

Fundamentally, the CPU on your router is underpowered for the amount
of state information that needs to be updated before the hold timers
expire.  If you can't move to a faster/more efficient platform, then
you may need to negotiate raising the keepalive interval and corresponding
hold timers with your neighbors, to give your router time to finish processing
updates.
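
To illustrate why the timers have to be raised in agreement with each
neighbor rather than on one side only, here is a minimal Python sketch of
the timer selection described in RFC 4271: each session uses the smaller of
the two advertised hold times, and keepalives are commonly sent at one third
of that value. The function name and the example values are assumptions made
for illustration; real routers may apply their own minimums and defaults.

```python
from typing import Tuple


def negotiate_timers(local_hold: int, peer_hold: int) -> Tuple[int, int]:
    """Return (hold_time, keepalive_interval) in seconds for one BGP session.

    Illustrative helper, not any vendor's implementation.
    """
    for value in (local_hold, peer_hold):
        # RFC 4271: a hold time is either 0 (timers disabled) or >= 3 seconds.
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")

    # The session runs with the smaller of the two advertised hold times.
    hold = min(local_hold, peer_hold)

    # A keepalive interval of one third of the hold time is the usual choice;
    # a hold time of 0 means no keepalives are sent at all.
    keepalive = hold // 3
    return hold, keepalive


# Raising only the local value changes nothing if the neighbor stays at 90.
print(negotiate_timers(180, 90))
# Both sides agreeing on 300 gives the router more time per session.
print(negotiate_timers(300, 300))
```

With one side at 180 and the other at 90, the session still runs with a
90-second hold time, which is why the change has to be coordinated with
each neighbor.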

Alternatively, if you can't upgrade platforms yet but have spare routers
around, connecting a second router to the exchange and splitting your
neighbors across the two links into the exchange would reduce the load
on each router during reconvergence, and buy you time until you can
move to a more capable platform.
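
One way to decide which neighbors go on which router is sketched below in
Python. It assumes that the number of prefixes received from a neighbor is a
reasonable proxy for the work that neighbor causes during reconvergence,
which is a simplification. The peer names and prefix counts are made up for
illustration, and the greedy heuristic (largest neighbor first, onto the
least-loaded router) is just one simple option.

```python
from typing import Dict, List, Tuple


def split_peers(prefix_counts: Dict[str, int],
                routers: int = 2) -> Tuple[List[List[str]], List[int]]:
    """Greedily spread peers over routers, balancing total prefix count.

    Returns (assignment, loads): assignment[i] lists the peers placed on
    router i, and loads[i] is the sum of their prefix counts.
    """
    assignment: List[List[str]] = [[] for _ in range(routers)]
    loads = [0] * routers

    # Place the heaviest peers first so the small ones can even things out.
    ordered = sorted(prefix_counts.items(), key=lambda kv: kv[1], reverse=True)
    for peer, count in ordered:
        target = loads.index(min(loads))  # currently least-loaded router
        assignment[target].append(peer)
        loads[target] += count

    return assignment, loads


# Hypothetical neighbors and prefix counts, for illustration only.
peers = {"peer-a": 120000, "peer-b": 80000, "peer-c": 60000, "peer-d": 40000}
assignment, loads = split_peers(peers)
print(assignment)  # [['peer-a', 'peer-d'], ['peer-b', 'peer-c']]
print(loads)       # [160000, 140000]
```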

Matt

