nanog mailing list archives

Re: Destination Preference Attribute for BGP


From: Jon Lewis via NANOG <nanog () nanog org>
Date: Sat, 19 Aug 2023 17:53:53 -0400 (EDT)

On Fri, 18 Aug 2023, Matthew Petach wrote:

Hi Robert,

Without naming any names, I will note that at some point in the not-too-distant past, I was part of a new-years-eve-holiday-escalation to $BACKBONE_ROUTER_PROVIDER when the global network I was involved with started seeing excessive convergence times (greater than one hour from BGP update message received to FIB being updated).

After tracking down a development engineer from $RTR_PROVIDER on the New Year's Eve holiday, it was determined that the problem lay in assumptions made about how communities were stored in memory. Think hashed buckets, with linked lists within each bucket. If the communities all happened to hash to the same bucket, the linked list in that bucket became extremely long; and if every prefix coming in, say from multiple sessions with a major transit provider, happened to be adding one more community to the very long linked list in that one hash bucket, well, it ended up slowing down the processing to the point where updates to the FIB were still trickling in an hour after the BGP neighbor had finished sending updates across.

A new hash function was developed on New Year's day, and a new version of code was built for us to deploy under relatively painful circumstances.
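
As a rough illustration of the failure mode described above (not the vendor's actual code, which is not shown anywhere in this thread), here is a toy chained hash table in Python. The table size, both hash functions, and the 65000:N community values are all assumptions made up for the sketch. It counts how many chain entries get examined on insert when every community lands in one bucket versus when the hash mixes all 32 bits.

```python
class ChainedTable:
    """Toy hash table: a fixed set of buckets, each holding a chain of keys."""

    def __init__(self, num_buckets, hash_fn):
        self.buckets = [[] for _ in range(num_buckets)]
        self.hash_fn = hash_fn
        self.comparisons = 0  # chain entries examined across all inserts

    def insert(self, key):
        chain = self.buckets[self.hash_fn(key) % len(self.buckets)]
        for existing in chain:  # walk the chain looking for a duplicate
            self.comparisons += 1
            if existing == key:
                return
        chain.append(key)

    def longest_chain(self):
        return max(len(chain) for chain in self.buckets)


def weak_hash(community):
    # Hypothetical weak hash: looks only at the high 16 bits (the ASN half).
    return community >> 16


def mixed_hash(community):
    # Hypothetical alternative: multiplicative mixing over all 32 bits.
    return ((community * 2654435761) & 0xFFFFFFFF) >> 16


# 2000 made-up communities of the form 65000:N, all sharing one ASN half.
communities = [(65000 << 16) | n for n in range(2000)]

weak = ChainedTable(256, weak_hash)
mixed = ChainedTable(256, mixed_hash)
for c in communities:
    weak.insert(c)
    mixed.insert(c)

print("weak hash:  longest chain", weak.longest_chain(),
      "comparisons", weak.comparisons)
print("mixed hash: longest chain", mixed.longest_chain(),
      "comparisons", mixed.comparisons)
```

With the weak hash every key falls into a single bucket, so each insert walks the whole chain and the total work grows quadratically with the number of communities. That is the general shape of the slowdown, whatever the real data structure looked like.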

This reminds me of two things.

First, some code I wrote more than 20 years ago to track and bill for overlapping dial-up sessions (i.e. dial-up account sharing). Processing the RADIUS accounting data, I built a binary tree of users with each node having a linked list of session data. While testing it, I found that as the amount of data fed in grew, the program got slower. I solved it by converting the session data linked lists to doubly linked lists, allowing me to add session data to the lists by jumping directly to the end, seeing if that's where the current session belonged, and walking back the list if necessary. That was rarely necessary, since the input data was generally in chronological order. That made it super fast again.
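
The tail-first insertion can be sketched as follows. This is a minimal Python illustration, not the original program: the `Node` and `SessionList` names, the use of a bare start time as the session record, and the `steps` counter are all assumptions for the example.

```python
class Node:
    def __init__(self, start):
        self.start = start  # session start time (hypothetical record)
        self.prev = None
        self.next = None


class SessionList:
    """Doubly linked list kept sorted by start time, inserting from the tail."""

    def __init__(self):
        self.head = None
        self.tail = None
        self.steps = 0  # backward steps taken across all inserts

    def insert(self, start):
        node = Node(start)
        cur = self.tail
        # Walk back from the tail only while existing sessions start later.
        while cur is not None and cur.start > start:
            cur = cur.prev
            self.steps += 1
        if cur is None:
            # New node belongs at the front (or the list is empty).
            node.next = self.head
            if self.head is not None:
                self.head.prev = node
            else:
                self.tail = node
            self.head = node
        else:
            # Splice the new node in right after cur.
            node.prev = cur
            node.next = cur.next
            if cur.next is not None:
                cur.next.prev = node
            else:
                self.tail = node
            cur.next = node

    def to_list(self):
        out = []
        cur = self.head
        while cur is not None:
            out.append(cur.start)
            cur = cur.next
        return out


sessions = SessionList()
for start in [10, 20, 30, 25, 40]:  # mostly chronological input
    sessions.insert(start)
print(sessions.to_list(), "backward steps:", sessions.steps)
```

With mostly chronological input, nearly every insert is a constant-time append at the tail. Only the occasional out-of-order record costs a short backward walk, instead of a full traversal from the head on every insert.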

Second, we ran into an issue with Arista some time ago involving a peer on AMS-IX that set a ridiculous number of communities on their routes. Arista uses (used?) a fixed-length buffer for communities in route-map processing. When doing "match community" in a route-map, if the set of communities on the route is longer than that buffer and the communities you're trying to match fall off the end, your route-map match statement will fail to match, even though a show ip bgp... will show you that the communities you're trying to match are there.
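
To make the symptom concrete, here is a small Python sketch of matching against a truncated copy of the community list. It is only a model of the behavior described above, not Arista's implementation. The buffer size of 4, the function names, and the community values are all invented for the example.

```python
MAX_COMMUNITIES = 4  # hypothetical buffer capacity, chosen for illustration


def match_truncated(route_communities, wanted, limit=MAX_COMMUNITIES):
    """Match against only the first `limit` communities, as a fixed buffer would."""
    buffer = route_communities[:limit]  # anything past the limit is dropped
    return wanted in buffer


def match_full(route_communities, wanted):
    """Match against the complete community list, as the show output reflects."""
    return wanted in route_communities


# A route carrying more communities than the hypothetical buffer can hold.
route = ["65000:1", "65000:2", "65000:3", "65000:4", "65000:5", "65000:6"]

print(match_truncated(route, "65000:2"))  # community fits inside the buffer
print(match_truncated(route, "65000:6"))  # community falls off the end
print(match_full(route, "65000:6"))       # present on the route all the same
```

The mismatch between the last two results is the confusing part: the community is on the route, yet the match step never sees it.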

----------------------------------------------------------------------
 Jon Lewis, MCP :)           |  I route
 StackPath, Sr. Neteng       |  therefore you are
_________ http://www.lewis.org/~jlewis/pgp for PGP public key_________

