nanog mailing list archives

Re: Links on the blink - what will/should mci & sprint do?


From: Curtis Villamizar <curtis () ans net>
Date: Mon, 20 Nov 1995 14:57:47 -0500


In message <Pine.LNX.3.91.951119010238.14818B-100000 () okjunc junction net>, Michael Dillon writes:
On Sat, 18 Nov 1995, Sean Doran wrote:


| Sounds like there is a need for a good ip switch.  Something simple, 
| very fast, and low cost that you can download "static" routes to.  

It's called an SSP.

And the problem on the net isn't with SSPs. The problem is that the 
routing tables are NOT static. Switching is working fine, but the size of 
the routing tables (CIDRize or die!) and the constant change in the 
routing tables are the problem. Note that CIDRizing also reduces the 
amount of change in the routing tables by replacing a set of potentially 
varying routes with an unvarying aggregate.
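To make the aggregation point concrete, here is a minimal sketch, assuming four hypothetical contiguous customer /24s (10.1.0.0/24 through 10.1.3.0/24) and a deliberately simplified policy in which the covering aggregate stays announced while any of its specifics is up. It counts how many prefix announcements or withdrawals peers would see while one /24 flaps. Real aggregation policies are more involved; this only illustrates why an aggregate hides churn.

```python
import ipaddress

# Hypothetical customer prefixes: four contiguous /24s that fit inside one /22.
specifics = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(4)]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
aggregate = list(ipaddress.collapse_addresses(specifics))

def announced(routes_up, aggregate_mode):
    """Prefixes a provider would announce to peers (simplified assumption).

    Without aggregation, each reachable specific is announced on its own.
    With aggregation, only the covering prefix is announced, and it stays up
    as long as at least one specific is reachable.
    """
    if aggregate_mode:
        return set(aggregate) if routes_up else set()
    return set(routes_up)

def visible_changes(history, aggregate_mode):
    """Count announcements plus withdrawals peers see across a state history."""
    changes = 0
    prev = announced(history[0], aggregate_mode)
    for state in history[1:]:
        cur = announced(state, aggregate_mode)
        changes += len(prev ^ cur)  # prefixes withdrawn or newly announced
        prev = cur
    return changes

# One customer /24 flaps down and back up twice.
up_all = set(specifics)
flapped = up_all - {specifics[3]}
history = [up_all, flapped, up_all, flapped, up_all]

print(aggregate)                                       # [IPv4Network('10.1.0.0/22')]
print(visible_changes(history, aggregate_mode=False))  # 4
print(visible_changes(history, aggregate_mode=True))   # 0
```

Peers see four updates for the unaggregated routes and none for the aggregate, which is the "unvarying aggregate" effect described above.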

Even building a mondo box to handle huge routing tables and lots of 
changes is not enough to solve the problem because there is also 
the protocol problem whereby routers communicate these route changes to 
one another. This limits the number of BGP peering sessions that are 
practical.

Of course, most people here already know this but for those who are 
trying to understand what is going on, I hope my brief explanation helps.

Michael Dillon                                    Voice: +1-604-546-8022
Memra Software Inc.                                 Fax: +1-604-542-4130
http://www.memra.com                             E-mail: michael () memra com


Actually, you don't have the problem quite right.

The problem is not the sheer size of the routing table.  The 64 MB RP
has fixed that for quite a while.  It is not the processing load
associated with the route change.  An RS6K can keep up easily if it
doesn't have to page (enough RAM in the box), and so can a 68020 if it
was allowed enough CPU time to do something.

The problem is that when a large set of routes change, a large set of
routes in the SSP are invalidated.  This results in a large amount of
traffic forwarded to the RP.  The SSP is bludgeoning the RP in order
to tell it that it needs some cache entries updated.  The RP then
can't keep adjacencies up and more route change results, which can
kill other routers.  If it gets far enough out of hand, the
instability can turn into a stable oscillation and you have a melted
backbone.  This is a consequence of the architecture and the cache
design.  I've been pointing this out for years.  Now it blew up.
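A toy model of the failure mode guessed at above may help readers following along. This is a sketch under stated assumptions, not a description of Cisco's actual SSP/RP code: the hypothetical `RouteCacheSwitch` class stands in for a demand-filled forwarding cache, every cache miss is "punted" to the RP, and a route change invalidates every cached destination the changed prefix covers. The prefixes and next-hop names ("peer-A", "peer-B") are made up for illustration.

```python
import ipaddress

class RouteCacheSwitch:
    """Hypothetical demand-filled forwarding cache in front of a route processor.

    The switching path (standing in for the SSP) only knows destinations it
    has cached. Any miss is punted to the RP, which does a full lookup and
    fills the cache entry.
    """

    def __init__(self, rib):
        self.rib = dict(rib)  # prefix -> next hop, the RP's full table
        self.cache = {}       # destination address -> next hop
        self.punts = 0        # packets the switching path sent to the RP

    def rp_lookup(self, dst):
        # Longest-prefix match over the RP's full table (linear scan for clarity).
        matches = [p for p in self.rib if dst in p]
        best = max(matches, key=lambda p: p.prefixlen)
        return self.rib[best]

    def forward(self, dst):
        if dst not in self.cache:
            self.punts += 1  # cache miss: the RP has to do the work
            self.cache[dst] = self.rp_lookup(dst)
        return self.cache[dst]

    def route_change(self, prefix, next_hop):
        self.rib[prefix] = next_hop
        # Invalidate every cached destination covered by the changed prefix.
        for dst in [d for d in self.cache if d in prefix]:
            del self.cache[dst]

rib = {ipaddress.ip_network(f"10.0.{i}.0/24"): "peer-A" for i in range(256)}
switch = RouteCacheSwitch(rib)
dests = [ipaddress.ip_address(f"10.0.{i}.1") for i in range(256)]

for d in dests:           # first packets fill the cache
    switch.forward(d)
switch.punts = 0
for d in dests:           # steady state: every packet hits the cache
    switch.forward(d)
print("steady-state punts:", switch.punts)  # 0

for p in list(rib):       # a large routing change: every prefix moves
    switch.route_change(p, "peer-B")
for d in dests:
    switch.forward(d)
print("punts after change:", switch.punts)  # 256
```

In steady state the RP sees nothing; after one large change, every active destination lands on the RP at once, which is the load spike that can cost it its adjacencies.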

This is very fixable and Cisco could even fix it without requiring
everyone to throw out their Cisco 7000s.  Just get rid of the cache
completely and push full routing from the RP to the SSP!
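For contrast, here is a sketch of the proposed alternative, again under hypothetical assumptions: a `FullFibSwitch` class in which the RP installs every prefix into the switching engine's table, so there is no miss path and a route change is just a table update. A real forwarding engine would use a trie or similar structure rather than the linear scan used here, and the price of this design is holding the full table in the switching engine's memory.

```python
import ipaddress

class FullFibSwitch:
    """Hypothetical switching engine holding a complete copy of the RP's table."""

    def __init__(self):
        self.fib = {}   # prefix -> next hop, pushed down by the RP
        self.punts = 0  # stays 0: this model has no cache-miss path to the RP

    def install(self, prefix, next_hop):
        # Called by the RP for every route add or change; nothing to invalidate.
        self.fib[prefix] = next_hop

    def forward(self, dst):
        # Longest-prefix match done locally (linear scan for clarity).
        best = max((p for p in self.fib if dst in p), key=lambda p: p.prefixlen)
        return self.fib[best]

fib = FullFibSwitch()
fib.install(ipaddress.ip_network("0.0.0.0/0"), "peer-A")
for i in range(256):
    fib.install(ipaddress.ip_network(f"10.0.{i}.0/24"), "peer-A")
for i in range(256):  # a large routing change: every /24 moves to peer-B
    fib.install(ipaddress.ip_network(f"10.0.{i}.0/24"), "peer-B")

print(fib.forward(ipaddress.ip_address("10.0.7.1")))   # peer-B
print(fib.forward(ipaddress.ip_address("192.0.2.1")))  # peer-A (default route)
print("punts:", fib.punts)                              # 0
```

The large change costs 256 table writes pushed from the RP, but no data packets are diverted to the RP while it happens.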

Curtis

ps - This is my guess.  Neither Cisco nor Sprint has yet confirmed or
denied this.  Perhaps Sean or Tony would care to comment.  ;-)

