nanog mailing list archives

Re: Destination Preference Attribute for BGP


From: Tom Beecher <beecher () beecher cc>
Date: Mon, 21 Aug 2023 11:44:18 -0400


So, while this all sounds good, without any specifics on vendor, box,
code, code revision number, fix, year it happened, current status, etc.,
I can't offer any meaningful engagement.


If you clicked Matt's link to the Google search, you could tell from the
results what vendor, model, and year it was pretty quickly.


It's just not moving the needle on this thread, though.


Assertion made: "Networks can scrub communities for memory or convergence
reasons."
Others: "That doesn't seem like a concern."
Matt: "Here was a real situation that happened where it was a concern, and
the specifics on the reason why."

How is that not 'moving the needle'? Because you didn't get full transcripts
of his conversation with the vendor? I'm sure a lot of people didn't
know that hashing / memory hotspotting was even a thing. Now they do.








On Sat, Aug 19, 2023 at 1:17 AM Mark Tinka <mark@tinka.africa> wrote:



On 8/19/23 00:22, Matthew Petach wrote:

Hi Mark,

I know it's annoying that I won't mention specifics.
Unfortunately, the last time I mentioned $vendor-specific information on
NANOG, it was picked up by the press, and turned into a multimillion dollar
kerfuffle with me at the center of the cross-hairs:

https://www.google.com/search?q=petach+kablooie&sca_esv=558180114&nirf=petah+kablooie&filter=0&biw=1580&bih=1008&dpr=2

After that, I've learned it's best to not name specific very-big-name
vendors on NANOG posts.

What I *can* say is that this was one of the primary vendors in the
Internet backbone space, running mainstream code.
The only reason it didn't affect more networks was a function of the
particular cluster of signalling communities being applied to all inbound
prefixes, and how they interacted with the vendor's hash algorithm.
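For readers who have not run into hash hotspotting, here is a toy sketch of
the general effect. It is not the vendor's algorithm, which is not disclosed
in this thread. The weak hash, the bucket count, and the community values
below are all hypothetical, chosen only to show how a regular pattern in
community values can pile every attribute set into a single bucket.

```python
import hashlib
import struct
from collections import Counter

BUCKETS = 256  # hypothetical hash table size


def community(asn, value):
    """Pack an ASN:value community into its 32-bit form."""
    return (asn << 16) | value


def weak_hash(communities, buckets):
    # Hypothetical weak hash: sum the 32-bit values, then take the modulus.
    return sum(communities) % buckets


def strong_hash(communities, buckets):
    # Comparison hash: SHA-256 over the packed values, then take the modulus.
    data = b"".join(struct.pack(">I", c) for c in communities)
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % buckets


def bucket_load(attr_sets, buckets, hash_fn):
    """Count how many attribute sets land in each bucket."""
    return Counter(hash_fn(s, buckets) for s in attr_sets)


# Hypothetical signalling communities whose low 16 bits are multiples of 256.
attr_sets = [
    (community(65000, 256 * i), community(65000, 256 * j))
    for i in range(1, 33)
    for j in range(1, 33)
]

weak = bucket_load(attr_sets, BUCKETS, weak_hash)
strong = bucket_load(attr_sets, BUCKETS, strong_hash)

print("weak hash:   buckets used =", len(weak), " worst bucket =", max(weak.values()))
print("strong hash: buckets used =", len(strong), " worst bucket =", max(strong.values()))
```

With the weak hash, every lookup walks one long chain, which is the kind of
slowdown that shows up as convergence time rather than as a crash.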

Corner cases, while valid, do not speak to the majority. If this was a
major issue, there would have been more noise about it by now.


I prefer to look at it the other way; the reason you didn't hear more
noise about it, is that we stubbed our toes on it early, and had relatively
fast, direct access to the development engineers to get it fixed within two
days.  It's precisely *because* people trip over corner cases and get them
fixed that they don't end up causing more widespread pain across the rest
of the Internet.


There has been quite some noise about lengthy AS_PATH updates that bring
some routers down, which has usually been fixed with improved BGP code. But
even those are not too common, if one considers a 365-day period.


Oh, absolutely.  Bugs in implementations that either crash the router or
reset the BGP session are much more immediately visible than "that's odd,
it's taking my routers longer to converge than it should".

How many networks actually track their convergence time in a time series
database, and look at unusual trends, and then diagnose why the convergence
time is increasing, versus how many networks just note an increasing number
of "hey, your network seems to be slowing down" and throw more hardware at
the problem, while grumbling about why their big expensive routers seem to
be less powerful than a *nix box running gated?

I suspect there are more of these types of "corner cases" out there than you
recognize.
It's just that most networks don't dig into routing performance issues
unless it actually breaks the router, or kills BGP adjacencies.

If you *are* one of the few networks that tracks your routers' convergence
time over time, and identifies and resolves unexpected increases in
convergence time, then yes, you absolutely have standing to tell me to pipe
down and go back into my corner again.  ;D
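As a rough illustration of what tracking convergence time as a trend could
look like, here is a minimal sketch. It assumes you already export one
convergence-time sample (in seconds) per measurement interval from your
time series database. The function name, window sizes, and threshold are
hypothetical, not taken from any particular tool.

```python
from statistics import median


def convergence_regressed(samples, baseline_n=30, recent_n=7, ratio=1.5):
    """Return True if recent convergence times have drifted above baseline.

    samples: convergence times in seconds, oldest first.
    The median of the last `recent_n` samples is compared against the
    median of the `baseline_n` samples immediately before them.
    """
    if len(samples) < baseline_n + recent_n:
        return False  # not enough history to judge
    baseline = median(samples[-(baseline_n + recent_n):-recent_n])
    recent = median(samples[-recent_n:])
    return baseline > 0 and recent > ratio * baseline


# Hypothetical data: 30 intervals at 40 s, then a week at 70 s.
steady = [40.0] * 37
drifting = [40.0] * 30 + [70.0] * 7

print(convergence_regressed(steady))
print(convergence_regressed(drifting))
```

Medians are used instead of means so that a single slow convergence event
after a maintenance window does not trigger the check on its own.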


So, while this all sounds good, without any specifics on vendor, box,
code, code revision number, fix, year it happened, current status, etc.,
I can't offer any meaningful engagement.

We all run into odd stuff as we operate this Internet, but the point of a
list like this is to share those details so we can learn, fix and move
forward.

Your ambiguity does not lend itself to a helpful discussion,
notwithstanding my understanding of your caution.

I am less concerned about keeping smiles on vendors' faces. I tell them in
public and private if they are great or not. But since you've been burned,
I get it. It's just not moving the needle on this thread, though.

Mark.

