nanog mailing list archives
Re: Juniper BGP Convergence Time
From: Adam Kajtar <akajtar () wadsworthcity org>
Date: Wed, 30 May 2018 15:49:53 -0400
“I'm running two Juniper MX104s. Each MX has 1 ISP connected running BGP (full routes). iBGP is running between the routers via a two-port 20G LAG. When one of the ISPs fails, it can take upwards of 2 minutes for traffic to start flowing correctly. The router has the correct route in the routing table, but it doesn't install it in the forwarding table for the full two mins.”

I finished my testing and concluded that I would continue running full routes without any fanciness. Below I detail the tests and their outcomes, and explain why I decided to keep running full routes.

*Receiving Full Routes*
Convergence time was 180 seconds. The routing table updated and showed the correct path in under a minute, but the forwarding table took 180 seconds for most of the routes to update.

*BGP Multipath*
There was no effect on convergence speed. I think paths learned from eBGP neighbors are preferred over iBGP paths, so no routes are ever equal in this case.

*BFD*
The slower-to-converge ISP refused my request to set up BFD between our routers, so this option is out of the question.

*BGP Timers*
I adjusted the BGP hold timer to 30 seconds and the stale route timer to 5 seconds. This change appeared to have no effect on convergence speed.

*Receiving Full Routes with a Default*
I suspected receiving a default route would fix the issue, because it would be the only route that needed to be updated in the forwarding table for traffic to flow. I assumed the router would process the lowest binary route first (0.0.0.0/0). Once the full table was updated, traffic would take the optimal path (this would avoid customer complaints due to latency with VPN and voice traffic). I also suspected that exporting the BGP default route into OSPF would speed up OSPF convergence by avoiding a generated default route based on neighbor state. Unfortunately, it appears that the forwarding table of the MX104 converges abruptly, all at once, rather than gradually as the router processes the routes.
Also, traffic would fail as the ISP connection came back up, due to BGP exporting the route into OSPF.

*Receiving Full Routes with forwarding engine commands*
After I completed the above tests, I concluded the forwarding engine would need to speed up, and some sort of hack was in order. I tested the following commands:
https://www.juniper.net/documentation/en_US/junos/topics/concept/use-case-for-bgp-pic-for-inet-inet6-lu.html
https://www.juniper.net/documentation/en_US/junos/topics/topic-map/forwarding-indirect-next-hop.html
With these commands enabled, equal-cost routes were installed into the forwarding table. Failover was 40 to 50 seconds on equal-cost routes and 180 seconds on non-equal-cost routes. This was unacceptable because most of the routes are preferred out one ISP over the other. I disabled ECMP and the router began installing all routes into the forwarding table, including the secondary route. The router would then dump sections of the forwarding table and act very flaky.

*Receiving Default Only*
I tested filtering out all routes besides the default route. Convergence took 30 to 45 seconds, depending on which upstream ISP connection I disconnected. This solution was unacceptable because traffic did not take the optimal outbound path.

I concluded that 180 seconds was an acceptable failover time given that I had exhausted all other options. I would prefer to have a more reliable failover mechanism than a faster one. Also, in my use case, everyday speed and usability are more important than failover speed (failover rarely happens, and almost never during peak hours).

Thank you to everyone who gave me suggestions on this issue. It helped me understand and accept the outcome.

On Sat, May 26, 2018 at 12:15 PM Baldur Norddahl <baldur.norddahl () gmail com> wrote:
Add a static default route on both routers. This will be invalidated as soon as the interface goes down. Should be faster than relying on the BGP process withdrawing the route. Also does not require any config changes at your upstreams.

Regards
Baldur

On Wed, May 16, 2018 at 18:52, Adam Kajtar <akajtar () wadsworthcity org> wrote:

Erich,

Good idea. I can't believe I didn't think of that earlier. Simple and effective. I will go ahead and request the defaults from my ISP and update the thread with the findings.

Thanks!

On Wed, May 16, 2018 at 10:03 AM Kaiser, Erich <erich () gotfusion net> wrote:

A last resort route (default route) could still be good to take from your ISP(s) even if you still do full routes. As the propagation is happening on the internet side, you should at least have a path inbound through the other provider. The default route at least would send the traffic out if it does not see the route locally.

Just an idea.

On Wed, May 16, 2018 at 8:22 AM, Adam Kajtar <akajtar () wadsworthcity org> wrote:

I could use static routes, but I noticed since I moved to full routes I have had a lot fewer customer complaints about latency (especially when it comes to Voice and VPN traffic).

I wasn't using per-packet load balancing. I believe the Juniper default is per IP.

My timers are as follows:
Active Holdtime: 90
Keepalive Interval: 30

Would I be correct in thinking I need to contact my ISP to lower these values?

An interesting note is that when I had both ISPs connected into a single MX104, the failover was just a few seconds.

Thanks again.

On Tue, May 15, 2018 at 8:42 PM Ben Cannon <ben () 6by7 net> wrote:

Have you checked your timeouts?

-Ben

On May 15, 2018, at 4:09 PM, Kaiser, Erich <erich () gotfusion net> wrote:

Do you need full routes? What about just a default route from BGP?

Erich Kaiser
The Fusion Network
erich () gotfusion net
Office: 815-570-3101

On Tue, May 15, 2018 at 5:38 PM, Aaron Gould <aaron1 () gvtc com> wrote:

You sure it doesn't have something to do with 60 seconds * 3 = 180 secs of BGP neighbor time out before it believes the neighbor is dead and removes routes to that neighbor?

Aaron

On May 15, 2018, at 9:10 AM, Adam Kajtar <akajtar () wadsworthcity org> wrote:

Hello:

I'm running two Juniper MX104s. Each MX has 1 ISP connected running BGP (full routes). iBGP is running between the routers via a two-port 20G LAG. When one of the ISPs fails, it can take upwards of 2 minutes for traffic to start flowing correctly. The router has the correct route in the routing table, but it doesn't install it in the forwarding table for the full two mins.

I have a few questions if anyone could answer them.

- What would a usual convergence time be for this setup?
- Is there anything I could do to speed this process up? (I tried Multipath)
- Any tips and tricks would be much appreciated

Thanks in Advance

--
Adam Kajtar
Systems Administrator
City of Wadsworth
akajtar () wadsworthcity org
-----------------------------------------------------
http://www.wadsworthcity.com
Facebook <http://www.facebook.com/cityofwadsworth> | Twitter <https://twitter.com/CityOfWadsworth> | Instagram <https://www.instagram.com/cityofwadsworth/> | YouTube <https://www.youtube.com/channel/UCymlH-AZgvxTaHtgp3-AmDQ>
--
Adam Kajtar
Systems Administrator, Safety Services
City of Wadsworth
Office 330.335.2865
Cell 330.485.6510
akajtar () wadsworthcity org
-----------------------------------------------------
http://www.wadsworthcity.com
Facebook <http://www.facebook.com/cityofwadsworth> | Twitter <https://twitter.com/CityOfWadsworth> | Instagram <https://www.instagram.com/cityofwadsworth/> | YouTube <https://www.youtube.com/channel/UCymlH-AZgvxTaHtgp3-AmDQ>
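
On the timer question raised earlier in the thread ("Active Holdtime: 90, Keepalive Interval: 30", and whether the ISP must be asked to lower them): under RFC 4271 each peer proposes a hold time in its OPEN message and the session uses the smaller of the two, with keepalives suggested at one third of the hold time. The following is a minimal sketch of that arithmetic only. The function names are hypothetical, the peer's proposal of 90 seconds is an assumption, and the sketch does not model Junos internals, any provider-enforced minimum hold time, or the time the forwarding table takes to update after the session drops.

```python
from typing import Optional


def negotiated_hold_time(local: int, peer: int) -> int:
    """Return the hold time (seconds) a session would use per RFC 4271.

    Each side proposes a value in its OPEN message and the smaller one is
    used. A value of 0 disables the hold timer; 1 and 2 are not allowed.
    """
    for value in (local, peer):
        if value < 0 or value in (1, 2):
            raise ValueError(f"invalid hold time: {value}")
    return min(local, peer)


def keepalive_interval(hold_time: int) -> int:
    """One third of the hold time, as RFC 4271 suggests (0 means none)."""
    return hold_time // 3


def worst_case_detection(hold_time: int) -> Optional[int]:
    """Longest wait (seconds) before a silent peer is declared dead.

    Returns None when the hold timer is disabled. A link-down event on a
    directly connected interface is usually noticed much sooner than this.
    """
    return hold_time if hold_time > 0 else None


if __name__ == "__main__":
    # 90 s is the "Active Holdtime" quoted in the thread; 30 s is the value
    # tried in the later tests. The peer value of 90 s is an assumption.
    for local in (90, 30):
        hold = negotiated_hold_time(local, peer=90)
        print(local, hold, keepalive_interval(hold), worst_case_detection(hold))
```

Under these assumptions, lowering the hold time on one side is enough to lower the negotiated value. That only bounds how quickly a silent peer is detected; as reported in the tests above, a 30-second hold timer did not change the 180-second forwarding-table convergence.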