nanog mailing list archives

Re: any dangers of filtering every /24 on full internet table to preserve FIB space ?


From: Ryan Rawdon <ryan () u13 net>
Date: Mon, 10 Oct 2022 18:54:47 -0400


On Oct 10, 2022, at 6:37 PM, Matthew Petach <mpetach () netflight com> wrote:



On Mon, Oct 10, 2022 at 8:44 AM Mark Tinka <mark@tinka.africa> wrote:
On 10/10/22 16:58, Edvinas Kairys wrote:

Hello,

We're considering buying some Cisco boxes - NCS-55A1-24H. That box has 
24x100G, but only 2.2 million route (FIB) memory entries. In the near 
future that will not be enough, so we're thinking of denying all /24s to 
save memory. What do you think about that approach? I know it could 
cause some misbehavior. But theoretically every filtered /24 could 
be routed via a shorter covering prefix (/23, /22, /21, etc.). Of course, 
there could be situations where a denied /24 is not covered by any 
shorter prefix.

I wouldn't bank on that.

I am confident I have seen /24s with no covering route, more so for PI 
space from RIRs that may only be able to allocate a /24 and nothing 
shorter.

It would be one heck of an experiment, though :-).

Mark.
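
One way to test the "every /24 has a covering route" assumption before
filtering is to check a dump of the table offline. The following is a
minimal sketch, not anyone's production tooling: it assumes the table is
available as a plain list of prefix strings, and the function name
uncovered_24s is hypothetical. The default route is deliberately not
counted as a covering route.

```python
import ipaddress


def uncovered_24s(prefixes):
    """Return the IPv4 /24s in `prefixes` that have no less-specific
    covering route (/1 through /23) in the same list.

    Assumption: `prefixes` is an iterable of prefix strings taken from a
    table dump. A default route (0.0.0.0/0) is not treated as covering.
    """
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # Every IPv4 route shorter than /24 is a candidate covering route.
    shorter = {n for n in nets if n.version == 4 and 0 < n.prefixlen < 24}

    result = []
    for n in nets:
        if n.version != 4 or n.prefixlen != 24:
            continue
        # Walk up the possible supernets, /23 down to /1, and look each
        # one up in the set of less-specific routes.
        covered = any(
            n.supernet(new_prefix=length) in shorter
            for length in range(23, 0, -1)
        )
        if not covered:
            result.append(str(n))
    return result


if __name__ == "__main__":
    sample = [
        "192.0.2.0/24",      # no covering route in this sample
        "198.51.100.0/24",   # covered by 198.51.0.0/16
        "198.51.0.0/16",
        "203.0.113.0/24",    # covered by 203.0.112.0/23
        "203.0.112.0/23",
    ]
    print(uncovered_24s(sample))
```

Any /24 this reports would become reachable only via a default route, if
one exists, once the /24 filter is applied.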


I may or may not have done something like this at $PREVIOUS_DAY_JOB.

We (might have) discovered some interesting brokenness on the Internet in doing so; 
in one case, a peer was sending a /20 across exchange peering sessions with us, 
along with some more specific /24s.  After filtering out the /24s, traffic rightly flowed 
to the covering /20.  Peer reached out in an outraged huff; the /24s were being 
advertised from non-backbone-connected remote sites in their network, that suddenly 
couldn't fetch content from us anymore.  Traceroutes from our side followed the /20 
back to their "core", and then died.  They explained the /24s were being advertised 
from remote sites without backbone connections to the site advertising the /20, and 
we needed to stop sending traffic to the /20, and send it directly to the /24 instead.
We demurred, and let them know we were correctly following the information in the 
routing table. 

We encountered similar behavior, but not from a network deaggregating their own address space like this.  Rather, it 
was a network (actually a network services vendor) that had a PA /24 from a colo provider they were no longer 
interconnected with.  We had to filter /24s on transit (our network does not resell transit) due to issues with 
spanslogic inefficiency on Nexus 7k.  

When trying to turn up a demo with this vendor, connections were not establishing.  We found that they had an older PA 
/24 in the FIB but we were following a /20 or some such route to their old upstream/colo.  We ended up doing a bunch of 
work to find other such “possibly disconnected /24s” based mainly on origin ASN, and put in exceptions to our filtering 
until we could complete some hardware upgrades.
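
The search for "possibly disconnected /24s" could be approximated along
these lines. This is a rough sketch under stated assumptions, not the
procedure actually used: it assumes a table dump reduced to a mapping of
prefix string to origin ASN, and the function name suspect_24s is
hypothetical. It flags each /24 whose most-specific covering route is
originated by a different ASN, which is only a hint that the /24 may not
be reachable through the covering route.

```python
import ipaddress


def suspect_24s(routes):
    """Flag /24s whose most-specific covering route has a different origin.

    Assumption: `routes` maps prefix strings to origin ASNs (ints).
    Returns a list of (slash24, covering_prefix, asn_of_24, asn_of_cover).
    """
    table = {ipaddress.ip_network(p): asn for p, asn in routes.items()}
    flagged = []
    for net, asn in table.items():
        if net.version != 4 or net.prefixlen != 24:
            continue
        # Find the longest covering route shorter than /24, if any.
        for length in range(23, 0, -1):
            cover = net.supernet(new_prefix=length)
            if cover in table:
                if table[cover] != asn:
                    flagged.append((str(net), str(cover), asn, table[cover]))
                break  # only the most-specific covering route matters
    return flagged


if __name__ == "__main__":
    # Documentation prefixes and private-use ASNs, for illustration only.
    sample = {
        "198.51.100.0/24": 64500,  # PA /24 originated by the customer
        "198.51.96.0/20": 64501,   # covering /20 from the old provider
        "203.0.113.0/24": 64502,   # more-specific from the same origin
        "203.0.112.0/22": 64502,
    }
    for row in suspect_24s(sample):
        print(row)
```

A /24 flagged this way is a candidate for an exception to the filter; a
matching origin ASN does not prove the covering route can deliver the
traffic, as the /20 example in this thread shows.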

In situations like this, we of course did have functioning default routes from our upstream, but that doesn’t help 
since the /20 from a peer was attracting and blackholing the traffic.  As IPv4 continues to deaggregate and get resold 
and otherwise optimized, I imagine this will become more common.  Not a problem for a multi-homed stub network with 
multiple default routes coming from upstream, unless they have peering and don’t micromanage it with this in mind.

Ryan

They became even more huffy, insisting that we were breaking the internet by not 
following the correct routing for the more-specific /24s which were no longer present 
in our tables.  No amount of trying to explain to them that they should not advertise 
an aggregate route if no connectivity to the more specific constituents existed seemed 
to get the point across.  In their eyes, advertising the /24s meant that everyone should 
follow the more specific route to the final destination directly.

So, even seeing a 'covering route' in the table is no guarantee that you won't create 
subtle and not-so-subtle breakage when filtering out more specifics to save table space.   ^_^;

+1 


Having (possibly) done this once in the past, I'd strongly recommend looking for a 
different solution--or at least be willing to arm your front-end response team with 
suitable "No, *you* broke the Internet" asbestos suits before running a git commit 
to push your changes out to all the affected devices in your network.   ;)

Matt

 

