nanog mailing list archives

Re: Global BGP - 2001-06-23


From: lucifer () lightbearer com
Date: Sun, 24 Jun 2001 15:15:39 -0700 (PDT)


Brett Frankenberger wrote:

Out of curiosity - did anyone see a period of significant instability
in the global routing tables on Saturday afternoon? Without violating NDA,
all I can say is that it resembled a historic event involving a bad route,
Ciscos, and Bay routers (only this time, it was a bad route, Ciscos, and
<X> vendor whom I cannot name but is being soundly beaten with wet noodles
to resolve the issue). The bad route, and instability, were seen across
all of our transit vendors (all "household" names of transit service).

Hmm ... why is <X> being beaten?  Was the problem reversed this time?

The only historic event I can recall involving a bad route, Cisco, and
Bay (actually, events would be better, since it happened at least
twice) was a case of (a) someone injecting a bad route, (b) the cisco
at the other end accepting it in violation of the RFC, (c) ciscos
passing that bad route all around the internet, all in violation of the
RFC, (d) that route eventually hitting a cisco<->bay peering
connection, and (e) the Bay (although the problem wasn't limited to
Bay, as gated, and possibly other implementations as well, behaved the
same way) properly sending a NOTIFICATION and taking down the BGP session, as
required by the RFC.
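
For readers who want the strict behaviour in step (e) spelled out, here is a
minimal sketch in Python. It is not any vendor's code and it does not
reproduce the actual bad announcement, which has not been posted. It assumes
2-octet AS numbers and the AS_PATH segment layout from the BGP-4 spec (type
octet, count octet, then the AS numbers). The names parse_as_path,
handle_update, and MalformedASPath are made up for illustration. The
NOTIFICATION values (error code 3, UPDATE Message Error; subcode 11, Malformed
AS_PATH) are the ones the spec defines.

```python
import struct

# NOTIFICATION codes defined by the BGP-4 spec.
UPDATE_MESSAGE_ERROR = 3
MALFORMED_AS_PATH = 11

# AS_PATH segment types defined by the BGP-4 spec.
AS_SET, AS_SEQUENCE = 1, 2


class MalformedASPath(Exception):
    """Hypothetical exception: the AS_PATH value could not be parsed."""


def parse_as_path(value: bytes) -> list:
    """Parse an AS_PATH attribute value (2-octet AS numbers assumed).

    Returns a list of (segment_type, [asn, ...]) tuples.
    """
    segments = []
    offset = 0
    while offset < len(value):
        if len(value) - offset < 2:
            raise MalformedASPath("truncated segment header")
        seg_type, count = value[offset], value[offset + 1]
        offset += 2
        if seg_type not in (AS_SET, AS_SEQUENCE):
            raise MalformedASPath(f"unknown segment type {seg_type}")
        end = offset + 2 * count
        if end > len(value):
            raise MalformedASPath("segment runs past end of attribute")
        asns = list(struct.unpack(f"!{count}H", value[offset:end]))
        segments.append((seg_type, asns))
        offset = end
    return segments


def handle_update(as_path_value: bytes) -> tuple:
    """Return ("accept", segments) or ("notify", code, subcode).

    A strict speaker does not install or re-advertise a route whose
    AS_PATH fails to parse. It sends a NOTIFICATION and closes the session.
    """
    try:
        return ("accept", parse_as_path(as_path_value))
    except MalformedASPath:
        return ("notify", UPDATE_MESSAGE_ERROR, MALFORMED_AS_PATH)


if __name__ == "__main__":
    good = bytes([AS_SEQUENCE, 2, 0x02, 0xBD, 0x1B, 0x1B])  # two ASNs
    bad = bytes([AS_SEQUENCE, 3, 0x02, 0xBD])  # claims 3 ASNs, carries 1
    print(handle_update(good))
    print(handle_update(bad))
```

A lenient speaker that skips this check and re-advertises the attribute as
received is what lets a bad path travel several AS hops before a strict
speaker resets its session on it.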

A) Ciscos flap sessions, according to the only reports I've heard.
B) <X> routers were crashing, either due to the bug, or the session resets.
   Thus, <X> is being flogged. I have reports of at least one <Y> having
   problems, as well.
C) I would post the BugID, but the only source I have is under NDA. However,
   having now heard this much in a public forum (i.e., not covered), I can say
   "Invalid AS path data bug".

It only took two major outages before Cisco fixed the problem.  (The
BGP advertisement was posted to NANOG both times, as was the BugID the
second time.)  

I have the guilty announcement, but again, it's under NDA. However, I can
say that we are now seeing this announcement from all of our upstreams,
non-blocked, so it appears that they fixed the originating point.

So if this is the same issue, Cisco would be the vendor to flog,
although assuming they didn't re-introduce it, the flogging might more
correctly be directed at providers still running code old enough to
have this particular problem.

I would flog Cisco as well, but A) they have a bug on it already, and B)
we're not using Ciscos for our core (note: this is my personal email, and
I am not speaking for my employer; however, this is publicly documented
on my employer's website, so it's not NDAed).

Both my transits (Bay on my end, Cisco on the other end) made it
through just fine, though.  (This time.  The last two times it
happened, the Ciscos on the other end happily passed the invalid route
to me and the Bay on my end happily dropped the BGP session, and this
was repeated ad infinitum until the bogus route was removed from the
other end.)

I have no data on Bay; my apologies if this wasn't clear. Bay was *only*
being referenced as a historical point of note. No attempt at FUD, and my
apologies if anyone read it that way.
-- 
***************************************************************************
Joel Baker                           System Administrator - lightbearer.com
lucifer () lightbearer com              http://www.lightbearer.com/~lucifer

