Educause Security Discussion mailing list archives

Route table growth and router hardware limits


From: Joe St Sauver <joe () OREGON UOREGON EDU>
Date: Thu, 20 Sep 2007 09:43:08 -0700

Hi,

There was an interesting and (IMHO) very important thread recently on NANOG
that I suspect some people may have overlooked because they may not be on
that list, or because they missed it in the volume of traffic on that list,
or because of thread subject labeling drift, or because of unlucky timing
(lots of people out on vacation in August, etc.), or whatever. Hopefully
folks at most schools will already be on top of this issue, but on the off
chance not, I wanted to make sure I mentioned it here so no one gets an
unpleasant surprise in a few months.

Let's start with the nutshell version of this issue (because even the
nutshell version isn't particularly brief).

-- The global BGP routing table now has 235,174 routes, as measured
   via Routeviews data (see http://bgp.potaroo.net/index-bgp.html )

   The number seen by others may be higher or lower (for example, APNIC's
   number is somewhat higher).

-- Some routing engines from a major routing vendor have a hardware limit
   to the maximum number of routes which that equipment can process in
   hardware (244,000 is the magic number you'll see mentioned).

-- Growth in the global routing table has been ongoing (see the graph at
   http://bgp.potaroo.net/bgprpts/bgp-active.png ), and I doubt that
   growth will suddenly slow enough to bend that curve downward.

-- When the maximum hardware capacity of that popular routing engine
   is exceeded, some routes may be ignored and/or traffic may be processed
   in software rather than in hardware, which may be unacceptably slow in
   some circumstances.

-- You can't fix this by anything as simple as adding more memory to the
   cards; fixing this may require either purchasing new and more capable
   cards for the existing router chassis, or doing a forklift replacement
   of the entire router.

-- Some configuration-related mitigation strategies exist, but none of
   them are ideal (IMHO), although they may become highly interesting
   given the timing I'll mention next...

-- Route table growth may be on the order of 3.5K new routes per month;
   assuming that estimate is correct, note the following calculation:

   (244,000-235,174)/3,500=~2.5 months till folks reach the magic number

-- Universities typically do not like to have to do material hardware
   upgrades in the middle of the term.

-- Even if your gear isn't affected, the gear at the sites you connect to
   may be, so beware of discarding this as an irrelevant issue :-)
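
The back-of-the-envelope arithmetic above can be redone as new numbers come
in. Here is a minimal Python sketch, assuming simple linear growth (which is
only an approximation); the function name months_until_limit is my own, and
the alternate growth rates in the loop are purely illustrative, not measured
values.

```python
def months_until_limit(current_routes, hardware_limit, growth_per_month):
    """Months of headroom left, assuming linear growth in the route table."""
    if growth_per_month <= 0:
        raise ValueError("growth_per_month must be positive")
    remaining = hardware_limit - current_routes
    # A table already at or past the limit has no headroom left.
    return max(remaining, 0) / growth_per_month


# Figures from the summary above: 235,174 routes now, a 244,000 route
# hardware limit, and roughly 3,500 new routes per month.
months = months_until_limit(235_174, 244_000, 3_500)
print(f"{months:.1f} months of headroom")  # prints "2.5 months of headroom"

# Illustrative (assumed) slower and faster growth rates, to show how
# sensitive the estimate is to the monthly growth figure.
for rate in (2_500, 3_500, 4_500):
    print(rate, round(months_until_limit(235_174, 244_000, rate), 1))
```

Plug in the route count you actually see from your own vantage point, since
(as noted above) it may be higher or lower than the Routeviews number.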

With that for context, let's tease the details out of the hairball that is
the NANOG archives (don't trust my summary; I'd encourage you to read the
thread for yourself). A reasonable starting point is:

1) http://www.merit.edu/mail.archives/nanog/msg02665.html , where a poster
   writes (the following excerpt is quoted from that posting):

   > The MSFC2 therefore can serve 244,000 routes without uRPF turned on.

   I'm hit square on with this because I use Sup2's with the msfc2/pfc2
   for the link to both of my transit providers. I took this up with the
   Cisco TAC overnight to find out where I stand. Here's what I found:

   1. The msfc2/pfc2 does in fact have a limit that starts at 244,000 routes.

   2. Once the limit is reached, excess routes will fail over to software
   switching. TAC did not specify how routes are designated as excess.

   3. The Sup 720 (except for the 3bxl) has a similar limit, however the
   "mls cef maximum-routes" command can be used to make upwards of
   260,000 TCAM entries available to IPv4 unicast routing. The Sup 2 does
   not support this command.

   4. The suggested upgrade path is the Supervisor 720-3BXL whose TCAM
   can support up to 1M IPv4 FIB entries or 500k IPv6 FIB entries. With a
   7600 (instead of a 6500) the RSP 720-3CXL can do the same and also has
   a faster processor, more memory, etc.

   [continues]

2) There was an excellent followup post from Lincoln Dale of Cisco (see
   http://www.merit.edu/mail.archives/nanog/msg02670.html ):

   > [...]
   > > 2. Once the limit is reached, excess routes will fail over to software
   > > switching. TAC did not specify how routes are designated as excess.

   most-specific-prefixes first.  it has to be this way due to the way a TCAM
   search works.

   > I'm not sure if the Sup2's handle this case differently from the
   > Sup720s we were using, but, in our case, when we reached the ceiling
   > the routes appeared in both the routing and CEF tables but were not
   > populated into the FIB.
   >
   > Translation: the route was ignored....

   how old is the software you were running on your cat6k?
   reason i ask is that since circa. 12.2(18)SXF9 (i.e. back in 2005), there
   has been a graceful degradation back to software forwarding for those
   entries that don't fit into the FIB TCAM:

    - when the h/w FIB is full (FIB exception) it goes into exception state
      where it will maintain the longest-prefix-matches by removing shortest-
      prefix-matches from the FIB TCAM first
    - it will also insert a default entry to punt lookup exceptions to
      software
    - software typically CAN maintain a full FIB, so entries which don't fit
      into hardware can be software forwarded in the CEF software switching
      path
    - when the h/w FIB is full, the following syslog message is generated:
         MLSCEF-SP-7-FIB_EXCEPTION: FIB TCAM exception, Some entries will
         be software switched

   of course, software forwarding is potentially orders-of-magnitude slower
   than h/w forwarding, so how much extra headroom this gives you once you
   exceed the capabilities is dependent on the amount of traffic to those
   prefixes that don't fit into the h/w tables.

   agree that this isn't "ideal", however Cisco has always been very specific
   about the h/w FIB & adjacency table sizes on the hardware in question.
   i know that vendor bashing is a sport in this list, but....

   relevant bug-ids if you wanted to look up the details:
        CSCse90572 syslog message when FIB TCAM exceeds 95% utilization
        CSCsb18172 wrong packet forwarding at FIB exception

   if you need further clarification, feel free to contact me off-list, my
   work email address is ltd () cisco com.

   [snip]

3) For an estimate of the time until the magic number makes itself known,
   see http://www.merit.edu/mail.archives/nanog/msg02639.html -- the
   relevant bit there is probably:

   >>Any reasonably valid way of predicting when we'll hit 244,000 routes
   >>in the default-free zone?
   >
   >     Real Soon Now?

   According to Geoff, the BGP table is growing at around 3500 routes per
   month, so we're looking at blowing out MSFC2s in about 3 months if
   nothing changes.

4) Bill Manning contributed the next interesting bit (see
   http://www.merit.edu/mail.archives/nanog/msg02680.html ):

   > >For a few more months.  What are upgrade cycles like again?  How common
   > >are the MSFC2s?
   >
   > I think we'll find out in a few months, when the "internet breaks" in a
   > whole bunch of places where the admins aren't aware of this issue or
   > operations have been downsized to the point that things are mostly on
   > auto-pilot.  I'm guessing there are a good number of Sup2's in use, and
   > that a good % of them think they're fine...as they have 512MB RAM and on
   > the software based routers, that's plenty for current full BGP routes.

        private replies suggest (w/ lots of handwaving) that perhaps 20-35%
        of the forwarding engines in use might fit this category.

   > Anyone want to bet there will be people posting to nanog and cisco-nsp in
   > a few months asking why either the CPU load on their Sup2's has suddenly
   > shot up or why they keep noticing parts of the internet have gone
   > unreachable?...oblivious to this thread.

        that would be a sucker bet

5) This month, Jon Lewis reported some interesting results on prefix
   filtering tests he'd done, and that work resulted in many interesting
   followup comments and insights; that thread begins at:

   http://www.merit.edu/mail.archives/nanog/msg02822.html

   The easiest way to see the followups to that posting is probably via

   http://www.merit.edu/mail.archives/nanog/threads.html

   (see "Route table growth and hardware limits" beginning 09/07/07 )
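
If you collect router syslog centrally, one low-effort check is to look for
the FIB exception message quoted in item 2 above. The following is only a
rough Python sketch: the sample log lines, hostname, and timestamp layout
are made up for illustration, and only the message mnemonic itself comes
from the quoted posting. Note also that the "7" in the mnemonic is the
debugging severity level, so it is worth confirming that your logging
configuration actually captures messages at that level.

```python
import re

# Match only the mnemonic quoted in the NANOG posting; timestamp, hostname,
# and leading "%" formatting vary, so they are deliberately not matched.
FIB_EXCEPTION = re.compile(r"MLSCEF-SP-7-FIB_EXCEPTION")


def find_fib_exceptions(lines):
    """Return the log lines that report a FIB TCAM exception."""
    return [line.rstrip("\n") for line in lines if FIB_EXCEPTION.search(line)]


# Hypothetical sample data; real syslog lines will look different.
sample = [
    "Sep 20 09:43:08 rtr1 %SYS-5-CONFIG_I: Configured from console",
    "Sep 20 09:44:10 rtr1 %MLSCEF-SP-7-FIB_EXCEPTION: FIB TCAM exception, "
    "Some entries will be software switched",
]

for hit in find_fib_exceptions(sample):
    print(hit)
```

The same function can be fed an open log file instead of the sample list,
since it only needs an iterable of lines.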

Anyhow, given the "availability" component of the traditional security
confidentiality/integrity/availability triad, I just wanted to make sure
that this was an issue that higher ed security folks had on their radars.

Your network engineers may have (and hopefully will have) already addressed
this issue, but if not, it is something that folks should be looking at
as the total number of routes continues to grow.

Regards,

Joe St Sauver (joe () oregon uoregon edu or joe () internet2 edu)
Internet2 Security Programs Manager
http://www.uoregon.edu/~joe/
