nanog mailing list archives

Re: problem at mae-west tonight?


From: Rob Liebschutz <rob () rjl com>
Date: Sun, 14 Jul 96 17:01:57 PDT

On a regular basis, MAE West has failures that cause only some of the
other routers to become unreachable.  I believe all of the instances
that I have heard of are related to problems with the Netedges which
show up mostly under high load.  I've heard various people call it
the "Sleeping Interface" problem.  It usually goes away when you
reset the netedges, but under heavy load it can come back quickly.

We saw it extensively when our 10Mb connection became heavily
loaded.  Just yesterday we upgraded to a DS3, so we're hoping
not to see it for a while.  I know AGIS has also had this problem
and I believe Best has seen it as well.  Maybe you haven't seen
it because you have a colocated router at MAE-WEST.

Rob


We experienced the same thing with Netcom.  Currently we are peered with over
40 networks through the RS, but I have only had this problem with Netcom.

Is it really a next-hop problem or a Netcom internal problem?  Last time
this happened, about 2 weeks ago, they cleared their RA session and did
some other things and everything came up fine.  I did not get details from
the routing folks over there.

I don't quite see how and where the layer 2 topology comes into play here.
Netcom should simply be seeing routes (through the RS) that list your MW
IP address as the next hop for the prefixes you advertise.  Is there some
reason that your MW IP would be unreachable by Netcom?  I am confused as to
why this would ever happen in the MW scenario.  The PB-NAP, which is not
fully meshed, is a different story.

Please explain what you mean Matt.

Rob
Exodus Communications Inc.

The problem I have with the route server this evening is that I announce
my routes to the route server, and my policy configuration in the route server
reflects that I peer with Netcom, and so the route server tells Netcom how
to reach me. Unfortunately, packets leaving Netcom headed to me at layer 2
are going into a black hole. To fix this, I've had to dump my peering with
the route server entirely, so that Netcom is only seeing my routes from AGIS
(our transit provider) and not from the route server. Ugh. My fears about
the route server not knowing the status of the layer 2 topology have come true,
and there's no way to fix this that doesn't involve manual intervention.

-matthew kaufman
 matthew () scruz net
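The failure Matthew describes can be made concrete with a toy model. This is an illustrative Python sketch, not the Routing Arbiter's software: `RouteServer`, `Route`, `deliver` and the `l2_ok` callback are hypothetical names, and the addresses are taken from documentation ranges rather than real MAE-West assignments. It assumes, as the thread describes, that the route server passes the announcing peer's exchange address through as the next hop and that packets then travel directly between peers across the shared layer 2 fabric, so a broken layer 2 path never shows up in the route server's view.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable

@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str  # announcing peer's exchange address, passed through unchanged

@dataclass
class RouteServer:
    # Hypothetical policy table: peer -> peers it has agreed to exchange routes with.
    policy: dict[str, set[str]]
    announcements: dict[str, list[str]] = field(default_factory=dict)

    def announce(self, peer: str, prefixes: list[str]) -> None:
        self.announcements[peer] = list(prefixes)

    def routes_for(self, peer: str) -> list[Route]:
        """Routes handed to `peer`: third-party next hop, no data-plane check."""
        out = []
        for origin, prefixes in self.announcements.items():
            if origin != peer and origin in self.policy.get(peer, set()):
                out.extend(Route(p, origin) for p in prefixes)
        return out

def deliver(src: str, route: Route, l2_ok: Callable[[str, str], bool]) -> bool:
    # Packets go straight from src to route.next_hop over the shared medium;
    # the route server is not in the forwarding path and never sees this result.
    return l2_ok(src, route.next_hop)

if __name__ == "__main__":
    rs = RouteServer(policy={"192.0.2.1": {"192.0.2.2"}, "192.0.2.2": {"192.0.2.1"}})
    rs.announce("192.0.2.1", ["198.51.100.0/24"])   # "me" announces to the RS
    broken = {("192.0.2.2", "192.0.2.1")}           # one-way layer 2 failure
    for r in rs.routes_for("192.0.2.2"):            # what "Netcom" receives
        ok = deliver("192.0.2.2", r, lambda s, d: (s, d) not in broken)
        print(r.prefix, "via", r.next_hop, "delivered:", ok)
    # prints: 198.51.100.0/24 via 192.0.2.1 delivered: False
```

In this model the route server's output is correct by its own lights; the black hole comes from the layer 2 check it never performs, which is why any fix has to come from the receiving router or from some form of next-hop probing.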



Well, I run gated on a BSDI box for the Hooked MAE West router.  I'm
thinking about implementing a "pingnouse INTERVAL" option on the
peer/group commands in gated, so it will periodically ping next hops
received from the route servers and set the nouse bit if the next hop
is unreachable.  Any better ideas?

It would be nice to come up with a good mechanism for doing 3rd party
keepalives that cisco and other router vendors would be willing to
implement.

Rob
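The proposed "pingnouse INTERVAL" check amounts to a third-party keepalive per next hop. Below is a minimal sketch of that logic in Python rather than gated's C; `NextHopMonitor`, `probe` and `down_after` are hypothetical names, not gated configuration keywords. The probe is passed in as a function because sending real ICMP echoes needs raw sockets and privileges; here it simply returns whether the next hop answered. Requiring several consecutive misses before setting the "nouse" flag is an assumed design choice, not part of the proposal.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable

@dataclass
class NextHopState:
    misses: int = 0      # consecutive unanswered probes
    usable: bool = True  # False corresponds to the proposed "nouse" bit

class NextHopMonitor:
    """Hypothetical 'pingnouse' logic: probe every tracked next hop once per
    interval and mark it unusable after `down_after` consecutive misses."""

    def __init__(self, probe: Callable[[str], bool], down_after: int = 3) -> None:
        self.probe = probe            # True if the next hop answered (e.g. an ICMP echo)
        self.down_after = down_after
        self.state: dict[str, NextHopState] = {}

    def track(self, next_hop: str) -> None:
        self.state.setdefault(next_hop, NextHopState())

    def poll(self) -> None:
        """One probe round; a real daemon would call this from a timer every INTERVAL."""
        for hop, st in self.state.items():
            if self.probe(hop):
                st.misses, st.usable = 0, True
            else:
                st.misses += 1
                if st.misses >= self.down_after:
                    st.usable = False

    def usable_routes(self, routes: list[tuple[str, str]]) -> list[tuple[str, str]]:
        """Keep (prefix, next_hop) pairs whose next hop is not marked nouse;
        untracked next hops are assumed usable."""
        return [(p, nh) for p, nh in routes
                if self.state.get(nh, NextHopState()).usable]

if __name__ == "__main__":
    alive = {"192.0.2.1": True, "192.0.2.2": False}  # 192.0.2.2 is the "sleeping" peer
    mon = NextHopMonitor(lambda hop: alive[hop], down_after=3)
    for hop in alive:
        mon.track(hop)
    routes = [("198.51.100.0/24", "192.0.2.1"), ("203.0.113.0/24", "192.0.2.2")]
    for _ in range(3):
        mon.poll()
    print(mon.usable_routes(routes))  # [('198.51.100.0/24', '192.0.2.1')]
```

The miss threshold keeps a single ping dropped on a loaded exchange from flapping routes, at the cost of detection taking up to `down_after` times INTERVAL. Note also that a ping from your own router tests only the path from you to that next hop, which is the direction that matters for your outbound traffic but says nothing about the reverse path.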



