nanog mailing list archives

Re: Networks ignoring prepends?


From: James Jun <james.jun () towardex com>
Date: Wed, 24 Jan 2024 04:59:29 -0500

On Tue, Jan 23, 2024 at 10:12:33PM -0800, William Herrin wrote:
> Respectfully Chris, you are mistaken.
>
> https://datatracker.ietf.org/doc/html/rfc4271#section-9.1.2.2
>
> "a) Remove from consideration all routes that are not tied for having
> the smallest number of AS numbers present in their AS_PATH
> attributes."
>
> So literally, the first thing BGP does when picking the best next hop
> is to discard all but the routes with the shortest AS path.

Not true.  Read the whole RFC--you've omitted Sections 9.1 and 9.1.1, which are critical.

Discarding all but the routes with shortest AS path is _not_ literally the first thing BGP does as you stated above.

The first thing BGP does, whenever it receives a new route, a replacement route, or a withdrawn route, is to calculate the
degree of preference (see Section 9.1.1).  Determining the degree of preference is a local matter for each Autonomous
System exercising route policy; it is typically expressed using LOCAL_PREF, which applies the configured administrative
policy to classify incoming routes.

Only after 9.1.1 completes do Section 9.1.2 and Section 9.1.2.2, which you cited, begin (Phase 2: Route Selection).  As
Section 9.1 clearly describes, route selection under 9.1.2 is invoked only after the degree of preference has been
determined (the 'Phase 1' decision).
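To make the two phases concrete, here is a minimal Python sketch under stated assumptions: local policy is expressed
purely as LOCAL_PREF, AS_PATH is a flat list of ASNs, and only step a) of the Phase 2 tie-breaker is implemented.  The
`Route` type, `degree_of_preference`, `select_best`, and the documentation-range prefixes, addresses, and ASNs are
hypothetical illustrations, not any vendor's implementation.

```python
from dataclasses import dataclass


@dataclass
class Route:
    prefix: str
    next_hop: str
    as_path: list[int]  # flattened AS_SEQUENCE only, for simplicity
    local_pref: int     # assigned by this AS's import policy (assumption)


def degree_of_preference(route: Route) -> int:
    # Phase 1 (RFC 4271, Section 9.1.1) is a local policy matter.
    # Assumption: this AS expresses its policy purely through LOCAL_PREF.
    return route.local_pref


def select_best(routes: list[Route]) -> Route:
    # Phase 1: compute a degree of preference for every candidate route.
    prefs = [degree_of_preference(r) for r in routes]
    top = max(prefs)
    candidates = [r for r, p in zip(routes, prefs) if p == top]

    # Phase 2 (Section 9.1.2.2) only breaks ties among equally preferred routes.
    # Step a): keep the routes tied for the shortest AS_PATH.
    shortest = min(len(r.as_path) for r in candidates)
    candidates = [r for r in candidates if len(r.as_path) == shortest]

    # Steps b) through g) (ORIGIN, MED, eBGP over iBGP, IGP cost to NEXT_HOP,
    # BGP Identifier, peer address) are omitted; this sketch returns the first survivor.
    return candidates[0]


via_a = Route("192.0.2.0/24", "198.51.100.1", [64500, 64501, 64502], local_pref=200)
via_b = Route("192.0.2.0/24", "203.0.113.1", [64496, 64502], local_pref=100)

print(select_best([via_a, via_b]).next_hop)  # 198.51.100.1
```

The longer path via 198.51.100.1 wins because it never has to compete on AS_PATH: Phase 1 already removed the route
with the lower LOCAL_PREF.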

In fact, even in 9.1.2.2 that you cited above, it clearly states:

   In its Adj-RIBs-In, a BGP speaker may have several routes to the same
   destination that have the same degree of preference. 

   [ snip ]

   The following tie-breaking procedure assumes that, for each candidate
   route, all the BGP speakers within an autonomous system can ascertain
   the cost of a path (interior distance) to the address depicted by the
   NEXT_HOP attribute of the route, and follow the same route selection
   algorithm.

   The tie-breaking algorithm begins by considering all equally
   preferable routes to the same destination, and then selects routes to
   be removed from consideration.  The algorithm terminates as soon as
   only one route remains in consideration.  The criteria MUST be
   applied in the order specified.

   [ snip ]

      a) Remove from consideration all routes that are not tied for
         having the smallest number of AS numbers present in their
         AS_PATH attributes.  Note that when counting this number, an
         AS_SET counts as 1, no matter how many ASes are in the set.
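Step a), including the AS_SET rule, can be sketched in Python as follows.  The `(segment_type, [asn, ...])`
representation and the `as_path_length` name are hypothetical; segment types other than AS_SEQUENCE and AS_SET are
rejected rather than guessed at.

```python
# Each AS_PATH segment is modeled as (segment_type, [asn, ...]) (hypothetical representation).
Segment = tuple[str, list[int]]


def as_path_length(segments: list[Segment]) -> int:
    """Count AS_PATH length as in RFC 4271, Section 9.1.2.2 a)."""
    length = 0
    for seg_type, asns in segments:
        if seg_type == "AS_SEQUENCE":
            length += len(asns)  # every ASN in a sequence counts, including prepends
        elif seg_type == "AS_SET":
            length += 1          # an AS_SET counts as 1, however many ASes it holds
        else:
            raise ValueError(f"unsupported segment type: {seg_type}")
    return length


aggregated = [("AS_SEQUENCE", [64500, 64501]), ("AS_SET", [64505, 64506, 64507])]
prepended = [("AS_SEQUENCE", [64500, 64501, 64501, 64501])]

print(as_path_length(aggregated))  # 3
print(as_path_length(prepended))   # 4
```

The prepended case shows why prepending lengthens a path at all: each repeated ASN in the sequence counts toward
step a), but only once the route reaches step a).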



So you see, the comparison of AS_PATH, and therefore the route selection process, can only begin after routes are first
ranked by their degree of preference, typically expressed through LOCAL_PREF across the AS (or through a similar import
knob, such as Cisco's "weight" parameter, which is evaluated before LOCAL_PREF and is significant only to the router on
which it's configured).  The route selection process, including the elimination of routes with inferior AS paths, is a
tie-breaking algorithm that runs after the degree of preference is calculated, which is what we've been trying to tell
you.  So no, AS_PATH comparison is not literally the first thing BGP does.

You're ignoring Section 9.1.1 in its entirety, which chronologically comes before Section 9.1.2.2 (the section you
cited).  And 9.1.2.2 itself clearly specifies that the route selection process it describes (including AS_PATH
comparison) is a tie-breaking procedure.



> It also says that BGP implementations are -allowed- to use other
> selection criteria.


It is immediately followed by this clause:
  "BGP implementations MAY use any algorithm that produces the __same results__ as those described here."

And restricted by the following clause in the preceding paragraph:
  "The criteria MUST be applied in the order specified."

And clarified by Section 9.1:
  "as long as the implementations support the described functionality and they exhibit the same externally visible 
behavior."


> And there are many situations where doing so is
> well advised and improves the result. But AS path length is
> unambiguously the default, off which a user has to move it.


So, when a BGP implementation is written into router software, how does the manufacturer know whether your network is
going to need to apply many degrees of preference, or none?  The vendors have no idea, and the RFC also clarifies that
degree of preference is a local policy matter.  Therefore, the default behavior is to assume a uniform LOCAL_PREF until
a policy is configured, and that default has typically been '100' across many vendor implementations.  In this case,
since all routes have the same degree of preference of 100, Section 9.1.2.2, which you cited, then tie-breaks the routes
of equal preference, starting with the AS_PATH comparison.  But that is by no means the first thing BGP does.  The first
thing BGP does, as clearly specified in the RFC, is to determine the degree of preference to meet local routing policy.
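The effect of that default can be sketched in Python.  Assumptions: routes are `(next_hop, as_path_length,
configured_local_pref_or_None)` tuples, `DEFAULT_LOCAL_PREF` and `best_next_hop` are hypothetical names, and 100 is
the common vendor default mentioned above rather than a value mandated by RFC 4271.

```python
from typing import Optional

# Assumption: 100 is the common vendor default; RFC 4271 leaves the value to local policy.
DEFAULT_LOCAL_PREF = 100


def effective_local_pref(configured: Optional[int]) -> int:
    # With no import policy configured, every route gets the same default.
    return DEFAULT_LOCAL_PREF if configured is None else configured


def best_next_hop(routes: list[tuple[str, int, Optional[int]]]) -> str:
    # Each route: (next_hop, as_path_length, configured LOCAL_PREF or None).
    top = max(effective_local_pref(lp) for _, _, lp in routes)
    tied = [r for r in routes if effective_local_pref(r[2]) == top]
    # AS_PATH length only acts here, as a tie-breaker among equal preferences;
    # min() keeps the first of any routes that are still tied.
    return min(tied, key=lambda r: r[1])[0]


no_policy = [("198.51.100.1", 4, None), ("203.0.113.1", 2, None)]
with_policy = [("198.51.100.1", 4, 200), ("203.0.113.1", 2, None)]

print(best_next_hop(no_policy))    # 203.0.113.1
print(best_next_hop(with_policy))  # 198.51.100.1
```

With no policy, Phase 1 ties at 100 and AS_PATH length looks like the first criterion; as soon as a policy sets a
different LOCAL_PREF, the longer path wins and the AS_PATH difference never comes into play.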


The need to set degrees of preference differs greatly depending on what type of network you run.  If you're an edge
consumer ASN (such as a multi-homed stub enterprise running BGP) that provides no downstream IP transit to other BGP
customers and doesn't peer with other networks (at an IX or otherwise), then your network probably has little need to
apply administrative policy to determine a degree of preference, and you can be happy fiddling with just AS_PATH.

But if you're running a network that provides transit to other ASNs and peers with other networks, then suddenly,
applying administrative policy is not only desirable, but operationally required.  This isn't solely a revenue/greed
problem, as some have cynically stated; it's also a critical service availability and reliability issue.  Without
degrees of preference set according to an established routing policy, an IP network loses the ability to engineer
traffic predictably and to meet capacity-planning objectives for its network interconnections.
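As an illustration of such a policy, here is a minimal Python sketch of a widely used convention among transit
networks, assumed here rather than prescribed by any RFC: prefer routes learned from customers over routes learned from
peers, and peer routes over transit routes.  The `Relationship` enum, the lookup table, and the LOCAL_PREF values are
hypothetical; real networks express this in their routers' import policy configuration.

```python
from enum import Enum


class Relationship(Enum):
    CUSTOMER = "customer"
    PEER = "peer"
    TRANSIT = "transit"


# Hypothetical values: only the ordering (customer > peer > transit) matters here,
# and both the ordering and the numbers are local policy choices.
LOCAL_PREF_BY_RELATIONSHIP = {
    Relationship.CUSTOMER: 200,
    Relationship.PEER: 150,
    Relationship.TRANSIT: 100,
}


def import_local_pref(learned_from: Relationship) -> int:
    # Applied on import, before any AS_PATH comparison takes place.
    return LOCAL_PREF_BY_RELATIONSHIP[learned_from]


# The same prefix learned three ways: (relationship, AS_PATH length).
candidates = [
    (Relationship.TRANSIT, 2),
    (Relationship.PEER, 3),
    (Relationship.CUSTOMER, 5),  # e.g. a customer that prepends heavily
]

# Highest LOCAL_PREF first; a shorter AS_PATH only breaks ties.
best = max(candidates, key=lambda c: (import_local_pref(c[0]), -c[1]))
print(best[0].name)  # CUSTOMER
```

Under a policy like this, a prepend on the customer route does not move traffic onto the peer or transit path, which
is the behavior that can look like a network "ignoring prepends" from the outside.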

Are there exceptions and pitfalls to this, where poorly designed or poorly thought-out networks suffer in certain
routing situations?  Absolutely.  But that's the Internet: it's not perfect, but it works very well most of the time,
in most situations.

Your desired 'policy-free, AS_PATH-only' world may solve your particular complaint at hand, but it would absolutely
break the rest of the Internet, leaving no effective way to implement routing policy for the large-scale network
interconnections that make the Internet tick.  BGP exists to provide anchors for applying routing policy to the path
selection process at scale.  It is wrong to conclude, by parsing the RFC incorrectly and out of context to fit your
desired narrative, that AS_PATH is the first and only thing that matters in BGP.

In operational reality, backed by history and by the RFCs themselves, the single most important and influential knob
in BGP is arguably LOCAL_PREF, more so than AS_PATH.  Sadly, most people won't experience this until they've run or
dealt with the operational realities of managing a large IP network.  The problem you're complaining about is an
exception, primarily caused by your poor choice of IP transit provider at the data center where you're running
AS11875, and you're demanding that everyone else take responsibility for the purchasing decision you've made.
There are some good proposals, such as commonly accepted wide communities for commonly encountered traffic-engineering
scenarios, that would help improve on this and make BGP a better experience for the end user in situations like the one
you're having, but we're not quite there today, and understandably it's not going to be a quick process.

In the short term, I'm glad to hear that your route pollution announcement solved the issue for you.  In the medium
term, you should get a new transit provider for AS11875 with better connectivity into 3356.  In the long term, perhaps
commonly accepted wide communities could some day become a standard that improves the available knobs in situations
like this.


James

