nanog mailing list archives

Re: Scalability issues in the Internet routing system


From: "Rubens Kuhl Jr." <rubensk () gmail com>
Date: Wed, 26 Oct 2005 02:21:43 -0200


Assume you have determined that a percentage (20%, 80%, whatever) of
the routing table is really used for a fixed time period. If you
design a forwarding system that can do some packets per second for
those most used routes, all you need to DDoS it is a zombie network
that would send packets to all other destinations... rate-limiting and
dampening would probably come into play, and a new arms race would
start, killing operators' ability to quickly renumber sites or entire
networks and creating new troubleshooting issues for network operators.
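
A rough way to see the effect described above is to simulate a route cache under scan traffic. The sketch below is only an illustration: the LRU policy, the table size, the cache size, the number of "hot" prefixes and the attack share are all assumed values, not measurements from any router.

```python
import random
from collections import OrderedDict


class RouteCache:
    """LRU cache of destination prefixes; a miss stands for a slow-path lookup."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, prefix_id):
        if prefix_id in self.entries:
            self.entries.move_to_end(prefix_id)   # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.entries[prefix_id] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # evict least recently used
        return False


def hit_rate(total_prefixes, cache_size, hot_prefixes, attack_share, packets, seed=1):
    """Fraction of packets served from the cache.

    Legitimate traffic goes only to the first `hot_prefixes` prefixes;
    attack traffic is spread uniformly over all the others.
    """
    rng = random.Random(seed)
    cache = RouteCache(cache_size)
    for _ in range(packets):
        if rng.random() < attack_share:
            dst = rng.randrange(hot_prefixes, total_prefixes)
        else:
            dst = rng.randrange(hot_prefixes)
        cache.lookup(dst)
    return cache.hits / packets


# All figures below are hypothetical.
normal = hit_rate(170_000, 34_000, 20_000, attack_share=0.0, packets=200_000)
attacked = hit_rate(170_000, 34_000, 20_000, attack_share=0.5, packets=200_000)
print(f"hit rate without attack: {normal:.2f}")
print(f"hit rate with 50% scan traffic: {attacked:.2f}")
```

In this toy model the attack packets nearly always miss, and they also evict the entries that legitimate traffic relies on, so the slow path takes a large share of the load.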

Isn't it just simpler to forward at line rate? IP lookups are fast
nowadays, due to algorithmic and architectural improvements... even
packet classification (which is the n-tuple version of the IP lookup
problem) is not that hard anymore. Algorithms can be updated on
software-based routers, and performance gains far exceed Moore's Law
and projected prefix growth rates... and routers that cannot cope with
that can always be changed to carry IGP-only routes plus a default
route to a router that can keep up with full routing.
(actually, hardware-based routers based on limited size CAMs are more
vulnerable to obsolescence by routing table growth than software ones)
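
For readers who have not seen the IP lookup problem written out, here is a minimal sketch of longest-prefix match using a plain binary trie. It is the textbook baseline, not one of the faster algorithms alluded to above; the class name, the dictionary-based nodes and the next-hop labels are assumptions made for the example.

```python
import ipaddress


class PrefixTrie:
    """Binary trie keyed on the bits of an IPv4 prefix."""

    def __init__(self):
        # Each node is a dict with optional keys "0", "1" (children) and "hop".
        self.root = {}

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, address):
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node = self.root
        best = node.get("hop")              # default route, if one was inserted
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            if "hop" in node:
                best = node["hop"]          # remember the longest match so far
        return best


table = PrefixTrie()
table.insert("0.0.0.0/0", "upstream")
table.insert("10.0.0.0/8", "core")
table.insert("10.1.0.0/16", "edge-a")
print(table.lookup("10.1.2.3"))    # edge-a
print(table.lookup("10.2.0.1"))    # core
print(table.lookup("192.0.2.1"))   # upstream
```

A lookup here walks at most 32 nodes regardless of how many prefixes are stored, which is why table growth costs memory rather than lookup steps in this kind of structure.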

Let's celebrate the death of "ip route-cache", not hellraise this fragility.


Rubens





On 10/24/05, Alexei Roudnev <alex () relcom net> wrote:

One question - what percentage of the routing table of any particular router is
REALLY used, say, during 1 week?

I have a strong impression that the answer will be no more than 20% even in
the biggest backbones, and
will (more likely) be below 1% in the rest of the world. That leaves huge
room for optimization.


----- Original Message -----
From: "Daniel Senie" <dts () senie com>
To: <nanog () nanog org>
Sent: Tuesday, October 18, 2005 9:50 AM
Subject: Re: Scalability issues in the Internet routing system



At 11:30 AM 10/18/2005, Andre Oppermann wrote:

I guess it's time to have a look at the actual scalability issues we
face in the Internet routing system.  Maybe the area of action becomes
a bit more clear with such an assessment.

In the current Internet routing system we face two distinct
scalability issues:

1. The number of prefixes*paths in the routing table and interdomain
   routing system (BGP)

This problem scales with the number of prefixes and available paths
to a particular router/network, in addition to constant churn in the
reachability state.  The required capacity for a router's control
plane is:

 capacity = prefix * path * churnfactor / second

I think it is safe, even with projected AS and IP uptake, to assume
Moore's law can cope with this.

Moore will keep up reasonably with both the CPU needed to keep BGP
perking, and with memory requirements for the RIB, as well as other
non-data-path functions of routers.



2. The number of longest match prefixes in the forwarding table

This problem scales with the number of prefixes and the number of
packets per second the router has to process under full or expected
load.  The required capacity for a router's forwarding plane is:

 capacity = prefixes * packets / second

This one is much harder to cope with as the number of prefixes and
the link speeds are rising.  Thus the problem is multiplicative to
quadratic.
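
The packets-per-second factor in the formula above can be made concrete with a little arithmetic. The sketch below assumes Ethernet framing with minimum-size 64-byte frames plus 20 bytes of preamble and inter-frame gap, and one lookup per packet; the function names are made up for the example, and other link types would give different figures.

```python
def packets_per_second(link_bps, frame_bytes=64, overhead_bytes=20):
    """Worst-case packet rate: minimum frames plus preamble and inter-frame gap."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


def lookup_budget_ns(link_bps):
    """Time available per packet if every packet needs one lookup."""
    return 1e9 / packets_per_second(link_bps)


for name, bps in [("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    print(f"{name}: {packets_per_second(bps) / 1e6:.2f} Mpps, "
          f"{lookup_budget_ns(bps):.1f} ns per packet")
```

Under these assumptions a 10 Gbit/s port leaves about 67 ns per packet. The prefix count does not change that budget; it changes how much work has to fit inside it.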

Here I think Moore's law can't keep up with the projected growth in
longest-match prefixes and link speed.  Doing longest
prefix matches in hardware is relatively complex.  Even more so for
the additional bits in IPv6.  Doing perfect matches in hardware is
much easier though...
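
One known way to trade longest match for exact match is prefix expansion: every route is expanded to a fixed key length, so a lookup becomes a single table probe. The sketch below is an illustration under loud assumptions: IPv4 only, a 16-bit key length chosen to keep the example small, and no handling of prefixes longer than the key. The function names are hypothetical.

```python
import ipaddress

STRIDE = 16   # assumed fixed key length; longer prefixes are not handled here


def expand(routes, stride=STRIDE):
    """Turn (prefix, next_hop) routes into an exact-match table on the top `stride` bits."""
    table = {}
    # Shorter prefixes first, so longer and more specific ones overwrite them.
    by_length = sorted(routes, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)
    for prefix, hop in by_length:
        net = ipaddress.ip_network(prefix)
        if net.prefixlen > stride:
            raise ValueError(f"{prefix} is longer than /{stride}")
        first = int(net.network_address) >> (32 - stride)
        for key in range(first, first + (1 << (stride - net.prefixlen))):
            table[key] = hop
    return table


def lookup(table, address, stride=STRIDE):
    """One exact-match probe; no walk over prefix lengths."""
    return table.get(int(ipaddress.ip_address(address)) >> (32 - stride))


routes = [("10.0.0.0/8", "core"), ("10.1.0.0/16", "edge-a"), ("172.16.0.0/12", "dc")]
table = expand(routes)
print(len(table), lookup(table, "10.1.2.3"), lookup(table, "10.200.0.1"))
```

The price is memory: three routes become 272 table entries here, and the blow-up grows with the distance between a prefix's length and the key length.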

Several items regarding FIB lookup:

1) The design of the FIB need not be the same as the RIB. There is
plenty of room for creativity in router design in this space.
Specifically, the FIB could be dramatically reduced in size via
aggregation. The number of egress points (real or virtual) and/or
policies within a router is likely FAR smaller than the total number
of routes. It's unclear if any significant effort has been put into this.

2) Nothing says the design of the FIB lookup hardware has to be
longest match. Other designs are quite possible. Again, some
creativity in design could go a long way. The end result must match
that which would be provided by longest-match lookup, but that
doesn't mean the ASIC/FPGA or general purpose CPUs on the line card
actually have to implement the mechanism in that fashion.

3) Don't discount novel uses of commodity components. There are fast
CPU chips available today that may be appropriate to embed on line
cards with a bit of firmware, and may be a lot more cost effective
and sufficiently fast compared to custom ASICs of a few years ago.
The definition of what's hardware and what's software on line cards
need not be entirely defined by whether the design is executed
entirely by a hardware engineer or a software engineer.
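
Item 1 above, shrinking the FIB by aggregation, can be sketched as follows. This is a simple illustration, not an optimal algorithm and not a description of any shipping router: it drops a route when the nearest less-specific route already has the same next hop, and it merges two sibling prefixes that share a next hop into their parent. The function names and next-hop labels are assumptions, and next hops are assumed to be non-None values.

```python
import ipaddress


def covering_hop(fib, net):
    """Next hop of the nearest less-specific prefix in fib, or None."""
    while net.prefixlen > 0:
        net = net.supernet()
        if net in fib:
            return fib[net]
    return None


def aggregate(routes):
    """Shrink a {prefix: next_hop} table without changing any forwarding decision."""
    fib = {ipaddress.ip_network(p): hop for p, hop in routes.items()}
    changed = True
    while changed:
        changed = False
        for net in sorted(fib, key=lambda n: n.prefixlen, reverse=True):
            if net not in fib:
                continue                      # already merged away in this pass
            hop = fib[net]
            if covering_hop(fib, net) == hop:
                del fib[net]                  # a shorter prefix already gives this hop
                changed = True
                continue
            if net.prefixlen == 0:
                continue
            parent = net.supernet()
            left, right = parent.subnets()
            sibling = right if net == left else left
            if fib.get(sibling) == hop:
                # Both halves share a hop, so the parent alone gives the same answer.
                del fib[net], fib[sibling]
                fib[parent] = hop
                changed = True
    return {str(n): h for n, h in fib.items()}


rib = {
    "10.0.0.0/24": "A", "10.0.1.0/24": "A", "10.0.2.0/24": "A", "10.0.3.0/24": "A",
    "10.0.4.0/24": "B", "192.0.2.0/24": "B", "192.0.2.128/25": "B",
}
print(aggregate(rib))
```

In this example seven routes collapse to three (10.0.0.0/22 via A, 10.0.4.0/24 via B, 192.0.2.0/24 via B), and address space that had no route before still has none afterwards.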

Finally, don't discount the value and performance of software-based
routers. MPLS was first "sold" as a way to deal with core routers not
handling Gigabit links. The idea was to get the edge routers to take
over. Present CPU technology, especially with good embedded systems
software design, is quite capable of performing the functions needed
for edge routers in many circumstances. It may well make sense to
consider a mix of router types based on port count and speed at edges
and/or chassis routers with line cards that are using general purpose
CPUs for forwarding engines instead of ASICs for lower-volume sites.
If we actually wind up with the core of most backbones running MPLS
after all, well, we've got the technology, so use it. Inter-AS routers
for backbones will likely need to continue to be large, power-hungry
boxes so that policy can be separately applied on the borders.

I should point out that none of this really is about scalability of
the routing system of the Internet, it's all about hardware and
software design to allow the present system to scale. Looking at
completely different and more scalable routing would require finding
a better way to do things than the present BGP approach.





