nanog mailing list archives

Re: seeing the trees in the forest of confusion


From: Doug Junkins <junkins () nwnet net>
Date: Sat, 26 Apr 1997 14:02:49 -0700 (PDT)

On Sat, 26 Apr 1997, John Hawkinson wrote:

These cases seem to point to a problem with BGP route withdrawals that will
continue to increase the time it takes to recover from network problems.
Perhaps the router vendors would like to comment.

This seems inappropriate to me.

You have just said: "I sat and watched a provider keep routes around
long past their being withdrawn, and they didn't know what to do so
suggested two kludges: 1) advertising more-specifics and 2) rebooting
routers. Could some vendor comment on this problem?".


Perhaps I should have been clearer about what the provider did during
the 5 hours that the routing loop continued in their backbone.  It didn't
take 5 hours for the provider to identify that there was a problem with
the routes in their tables (i.e. a few of their routers in their IBGP mesh
had more specifics from Provider Y while most did not).  Instead, it took
the provider 5 hours to troubleshoot the problem with the router vendor
before both agreed that it was a software bug and identified the need to
reload some of the routers.  The hack of advertising more specifics was
used to buy time before reloading the routers to minimize the impact.
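
The inconsistency described above (a few routers in the mesh holding
more specifics that the rest lack) is the kind of thing a short script
can surface from routing-table dumps. A minimal sketch in Python, assuming
each router's table has already been exported as a list of prefix strings;
the router names, the prefixes, and the function inconsistent_prefixes are
hypothetical and not taken from the incident:

```python
from ipaddress import ip_network


def inconsistent_prefixes(tables):
    """Return prefixes carried by some, but not all, routers.

    `tables` maps a router name to an iterable of prefix strings.
    The result maps each inconsistent prefix to the sorted list of
    routers that still hold it.
    """
    # Normalize every table to a set of network objects.
    per_router = {
        name: {ip_network(p) for p in prefixes}
        for name, prefixes in tables.items()
    }
    all_prefixes = set().union(*per_router.values())

    report = {}
    for prefix in all_prefixes:
        holders = sorted(n for n, t in per_router.items() if prefix in t)
        if len(holders) < len(per_router):
            report[prefix] = holders
    return report


# Hypothetical sample data: r3 still holds a more specific /25
# that the other routers no longer carry.
tables = {
    "r1": ["198.51.100.0/24"],
    "r2": ["198.51.100.0/24"],
    "r3": ["198.51.100.0/24", "198.51.100.0/25"],
}

for prefix, holders in inconsistent_prefixes(tables).items():
    print(prefix, "only on", ", ".join(holders))
```

With the sample data, only the /25 is reported, held by r3 alone. A check
like this only shows that the tables disagree; it says nothing about
whether the cause is a software bug or a configuration choice.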


This is every vendor's worst nightmare.

Every vendor necessarily (and rightly so!) provides all users enough
rope to hang themselves with. It seems inappropriate for someone who
doesn't know what the full story is to call vendors to account.

If the provider in question adjusted some knobs and settings so as to
cause such a problem, what is the vendor to do?

How could the vendor even come close to trying to explain the problem
without detailed information about the problems and configurations?


Pessimistically speaking, it seems that there are two ways that this
thread could come to a close:

      1)      People will keep badgering the vendor and the vendor
              will come out looking ugly if they cannot account for
              the problem based on insufficient data.

      2)      People will all be quiet and stop complaining until
              the operator(s) in question and vendor(s) have information
              and communicate it.

2) seems obviously preferable, but I suspect that the people on this
list will go for 1) since it will allow everyone to flame and chatter
incessantly, increasing NANOG mail volume and everyone's productivity.


If I'm the only person that's seen this type of problem, I'll shut up
about it.  But if this type of problem has impacted more providers, I
think it's appropriate in this forum to ask the router vendors to comment
on any known problems with BGP route withdrawals.  If they don't have
enough information to account for the problem, then they should tell us
that so we can get the data to them the next time something like this
happens. 

- Doug 

If anyone who has seen this problem first hand has detailed technical
information to provide, that is of course useful and welcome in this
forum. But complaining without having any of the data? What's the point?

--jhawk

