nanog mailing list archives

Re: bfd-like mechanism for LANPHY connections between providers


From: Sudeep Khuraijam <skhuraijam () liveops com>
Date: Wed, 16 Mar 2011 22:33:39 -0700


On Mar 16, 2011, at 6:05 PM, Jeff Wheeler wrote:

>> There is a difference of several orders of magnitude between BFD keepalive intervals (in milliseconds) and BGP
>> (in seconds), with generally configurable multipliers vs. the BGP hold timer.
>> With real-time media and ever-faster last miles, the BGP hold timer may find itself inadequate, if not
>> inappropriate, in some cases.
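As a rough illustration of the gap described above: in BFD asynchronous mode (RFC 5880), a session is declared down when no control packet arrives within the detection time, which is the remote system's Detect Mult multiplied by the agreed interval (the greater of the local Required Min RX Interval and the remote Desired Min TX Interval). A BGP peer without BFD is declared down only when its hold timer expires. The sketch below compares the two worst cases; the 300 ms x 3 BFD settings and the 90-second hold time are assumed example values, not figures from this thread.

```python
"""Compare worst-case failure detection: BFD (async mode) vs. BGP hold timer.

The example values in __main__ are assumptions for illustration only.
"""


def bfd_detection_time_ms(remote_detect_mult: int,
                          local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int) -> int:
    # RFC 5880 sec. 6.8.4 (asynchronous mode): remote Detect Mult times the
    # agreed interval, i.e. the greater of the local Required Min RX Interval
    # and the remote Desired Min TX Interval.
    agreed_interval_ms = max(local_required_min_rx_ms, remote_desired_min_tx_ms)
    return remote_detect_mult * agreed_interval_ms


def bgp_detection_time_ms(hold_time_s: int) -> int:
    # Without BFD or link-state signalling, a silently failed peer is only
    # declared down when the negotiated hold timer expires.
    return hold_time_s * 1000


if __name__ == "__main__":
    bfd_ms = bfd_detection_time_ms(3, 300, 300)  # assumed: 300 ms x 3
    bgp_ms = bgp_detection_time_ms(90)           # assumed: 90 s hold time
    print(f"BFD detection: {bfd_ms} ms")
    print(f"BGP detection: {bgp_ms} ms")
    print(f"ratio: {bgp_ms / bfd_ms:.0f}x")
```

With shorter intervals (for example 50 ms x 3, i.e. 150 ms) the ratio grows further; the point is only that the two mechanisms detect failure on very different time scales.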

> For eBGP peerings, your router must re-converge to a good state in < 9
> seconds to see an order of magnitude improvement in time-to-repair.
> This is typically not the case for transit/customer sessions.
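The 9-second figure can be reproduced with simple arithmetic under assumptions the thread does not state: a 90-second BGP hold time, roughly 0.9 s of BFD detection time, and time-to-repair modelled as detection time plus re-convergence time. Solving (H + R) / (D + R) >= 10 for the re-convergence time R gives R <= (H - 10D) / 9, which is 9 seconds for those values. A minimal Python sketch of that model:

```python
"""Time-to-repair improvement from BFD, under assumed values.

Model (an assumption): time_to_repair = detection_time + reconvergence_time.
"""


def repair_time_improvement(hold_s: float, bfd_detect_s: float,
                            reconverge_s: float) -> float:
    # Ratio of time-to-repair with hold-timer detection vs. BFD detection.
    return (hold_s + reconverge_s) / (bfd_detect_s + reconverge_s)


def max_reconverge_for_factor(hold_s: float, bfd_detect_s: float,
                              factor: float) -> float:
    # Solve (H + R) / (D + R) >= factor for R:
    #   H + R >= factor*D + factor*R  =>  R <= (H - factor*D) / (factor - 1)
    return (hold_s - factor * bfd_detect_s) / (factor - 1)


if __name__ == "__main__":
    HOLD_S, BFD_S = 90.0, 0.9  # assumed: 90 s hold time, 300 ms x 3 BFD
    print(f"max re-convergence for 10x: "
          f"{max_reconverge_for_factor(HOLD_S, BFD_S, 10):.1f} s")
    for r in (1.0, 9.0, 30.0):
        print(f"re-converge {r:4.1f} s -> "
              f"{repair_time_improvement(HOLD_S, BFD_S, r):.1f}x improvement")
```

With an assumed 30 seconds of re-convergence, the same BFD session shortens time-to-repair by less than 4x (120 / 30.9), which is the effect being pointed at here.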



Not so, if your goal is peer deactivation and failover. Also, you miss the point. Once the event is detected, the
rest of the process starts; I am talking about event detection. One may want a hold timer longer than 30 seconds
but the peer state deactivated instantly on link failure. If that's the design goal AND link state is not passed
through, then BFD-triggered BGP deactivation is a good choice.

> To make a risk/reward choice that is actually based in reality, you
> need to understand your total time to re-converge to a good state, and
> how much of that is BGP hold-time.  You should then consider whether
> changing BGP timers (with its own set of disadvantages) is more or
> less practical than using BFD.



Yes, I see that, and I said "in some cases", not all or most cases.


> Let's put it another way: if CPU/FIB convergence time were not a
> significant issue, do you think vendors would be working to optimize

This is orthogonal to my point. The table-size taxes, best-path algorithms and the speed with
which you can re-FIB and rewrite the ASICs are constant in both cases. But that's post-event.

> this process, that we would have concepts like MPLS FRR and PIC, and

Those are out of scope in the context of this thread and have completely different roles.

> that each new router product line upgrade comes with a yet-faster CPU?


For things they can sell more licenses for, such as 3DES, keying algorithms, virtual instances and other things on
BGP: features that allow service providers to charge a lot more money while running on common infrastructure, such
as MPLS & FRR, and a zillion other things like stateful redundancy, higher housekeeping needs, in-service upgrades
and anything else with a list price. And the new CPU is cheaper than the old one.

> Of course not.  Vendors would just have said, "hey, let's get
> together on a lower hold time for BGP."


Because it would be horrible code design. Link detection is a common service. Besides, BGP process threads can run
longer than the minimum intervals for link detection, so vendors would have to write checkpoints within the BGP code
to come up and service the link state machine. And wait, it's a user-configurable checkpoint!! So came BFD: write a
simple state machine and make it available to all protocols.
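The design described above, one shared liveness state machine that any protocol can subscribe to, might be sketched as follows. This is a heavily simplified, hypothetical model loosely following the RFC 5880 session states (AdminDown, Down, Init, Up) and reception rules; it omits timers, authentication, demand mode and poll sequences, and the class and method names (`BfdSession`, `BgpPeer`, `subscribe`, `on_bfd_change`) are invented for illustration, not any vendor's API. The hypothetical BGP client drops its session as soon as BFD reports Down, independently of its own, possibly long, hold timer.

```python
from enum import Enum
from typing import Callable, List


class State(Enum):
    ADMIN_DOWN = 0
    DOWN = 1
    INIT = 2
    UP = 3


class BfdSession:
    """Hypothetical, simplified BFD session (RFC 5880 states only)."""

    def __init__(self) -> None:
        self.state = State.DOWN
        self._subscribers: List[Callable[[State, State], None]] = []

    def subscribe(self, callback: Callable[[State, State], None]) -> None:
        # Any client protocol (BGP, OSPF, static routes...) registers here.
        self._subscribers.append(callback)

    def _set_state(self, new: State) -> None:
        old, self.state = self.state, new
        if old is not new:
            for callback in self._subscribers:
                callback(old, new)

    def receive(self, remote_state: State) -> None:
        # Simplified packet-reception rules modelled on RFC 5880 sec. 6.8.6.
        if self.state is State.ADMIN_DOWN:
            return
        if remote_state is State.ADMIN_DOWN:
            if self.state is not State.DOWN:
                self._set_state(State.DOWN)
        elif self.state is State.DOWN:
            if remote_state is State.DOWN:
                self._set_state(State.INIT)
            elif remote_state is State.INIT:
                self._set_state(State.UP)
        elif self.state is State.INIT:
            if remote_state in (State.INIT, State.UP):
                self._set_state(State.UP)
        elif remote_state is State.DOWN:  # local state is UP
            self._set_state(State.DOWN)

    def detection_timer_expired(self) -> None:
        # No control packet arrived within the detection time.
        if self.state in (State.INIT, State.UP):
            self._set_state(State.DOWN)


class BgpPeer:
    """Hypothetical BGP client that deactivates the peer on BFD Down."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.established = True

    def on_bfd_change(self, old: State, new: State) -> None:
        # Simplification: a real client would not treat a remote AdminDown
        # as a failure (see RFC 5882); this sketch does not track the cause.
        if old is State.UP and new is State.DOWN:
            self.established = False  # no need to wait for the hold timer


if __name__ == "__main__":
    session = BfdSession()
    peer = BgpPeer("192.0.2.1")
    session.subscribe(peer.on_bfd_change)
    session.receive(State.DOWN)          # Down -> Init
    session.receive(State.UP)            # Init -> Up
    session.detection_timer_expired()    # Up -> Down; BGP is notified
    print(session.state, peer.established)
```

Keeping the state machine outside the BGP process is what lets its timers run at millisecond granularity without BGP itself having to check link liveness.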


> As I stated, I'll change my opinion of BFD when implementations
> improve.  I understand the risk/reward situation.  You don't seem to
> get this, and as a result, your overly-simplistic view is that "BGP
> takes seconds" and "BFD takes milliseconds."

I have no doubt that you understand your risk/reward, but you don't for every other environment.

For event detection leading to a state change leading to peer deactivation, "my overly-simplistic view" is the fact
(not as you put it, but as it was written, unedited). How you want to act in response depends on design.

> is that "BGP
> takes seconds" and "BFD takes milliseconds."

That's what you read, not what I wrote. I was comparing the speed of event detection.

Now, as I said, for speed of deactivation "the BGP hold timer may find itself inadequate, if not inappropriate, in
some cases" in this same context. But as I mentioned, we don't know the pain we are trying to solve for, given the
requirements that drove this thread in the first place. So I simply put forward the facts and a business driver.


BFD is no different from deactivating a peer based on link failure. Your view is that there is no case for it. My
point is: it arrived yesterday; it's just a damn hard thing to monetize upstream in transit.


>> For a provider to require a vendor instead of RFC compliance is sinful.
>
> Many sins are more practical than the alternatives.

Few, maybe.


--
Jeff S Wheeler <jsw () inconcepts biz>
Sr Network Operator  /  Innovative Network Concepts