nanog mailing list archives

Re: Juniper BGP Convergence Time


From: Adam Kajtar <akajtar () wadsworthcity org>
Date: Wed, 30 May 2018 15:49:53 -0400

“I'm running two Juniper MX104s. Each MX has 1 ISP connected running
BGP(full routes). iBGP is running between the routers via a two port 20G
lag. When one of the ISPs fails, it can take upwards of 2 minutes for
traffic to start flowing correctly. The router has the correct route in the
routing table, but it doesn't install it in the forwarding table for the
full two mins.”



I finished my testing and concluded that I would continue running full
routes without any fanciness. I will detail some tests and what the
outcomes were as well as explain why I decided to keep running full routes.



*Receiving Full Routes*

Convergence time was 180 seconds. The routing table updated and showed the
correct path in under a minute, but the forwarding table took 180 seconds
for most of the routes to update.



*BGP Multipath*

There was no effect on convergence speed. I think paths between eBGP
neighbors are preferred over iBGP. Therefore, no routes are ever equal in
this case.
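
The reasoning above can be sketched in a few lines of Python. This is only
a simplified model of part of the BGP decision process (local preference,
AS path length, then eBGP over iBGP). The Path class and the
preference_key and multipath_candidates functions are hypothetical names,
and this is not how Junos implements multipath.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    next_hop: str
    peer_type: str          # "ebgp" or "ibgp"
    local_pref: int = 100
    as_path_len: int = 1


def preference_key(path):
    """Smaller key = more preferred (simplified subset of the tie-breakers)."""
    ebgp_rank = 0 if path.peer_type == "ebgp" else 1
    return (-path.local_pref, path.as_path_len, ebgp_rank)


def multipath_candidates(paths):
    """Return the paths that tie with the best path on every modelled step."""
    best = min(preference_key(p) for p in paths)
    return [p for p in paths if preference_key(p) == best]


# Hypothetical view from one MX: its own ISP (eBGP) and the other MX (iBGP).
paths = [
    Path("isp-a", "ebgp", as_path_len=3),
    Path("router-b", "ibgp", as_path_len=3),
]
print([p.next_hop for p in multipath_candidates(paths)])
```

In this model the eBGP path always breaks the tie, so only one path is
left as a multipath candidate, which would match multipath having no
visible effect.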



*BFD*

The slower-to-converge ISP refused my request to set up BFD between our
routers, so this option is out of the question.



*BGP Timers*

I adjusted the BGP hold timer to 30 seconds and the stale route timer to 5
seconds. This change appeared to have no effect on convergence speed.
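
As a rough check on what the hold timer can and cannot explain, the
arithmetic can be written out. This sketch assumes the common convention
that the keepalive interval is one third of the hold time, and that the
hold time is the upper bound on detecting a silent peer. The function
names are hypothetical, and the 180-second figure is the forwarding-table
convergence measured above.

```python
# Forwarding-table convergence measured in the tests above, in seconds.
OBSERVED_CONVERGENCE_S = 180


def keepalive_interval_s(hold_time_s):
    """Keepalive interval under the usual hold-time / 3 convention (an assumption)."""
    return hold_time_s // 3


def delay_not_explained_by_hold_timer_s(hold_time_s, observed_s=OBSERVED_CONVERGENCE_S):
    """Seconds of the observed outage left over even if detecting the dead
    peer takes the full hold time."""
    return max(0, observed_s - hold_time_s)


for hold in (90, 30):
    print(hold, keepalive_interval_s(hold), delay_not_explained_by_hold_timer_s(hold))
```

Under this model, even with a 30-second hold timer at least 150 of the 180
seconds would have to come from something other than peer detection, which
is consistent with the timer change making no visible difference.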



*Receiving Full Routes with a Default*

I suspected receiving a default route would fix the issue because it would
be the only route that needed to be updated in the forwarding table for
traffic to flow. I assumed that the router would process the route that is
lowest in binary order (0.0.0.0/0) first. Once the full table was updated,
traffic would take the optimal path (this would avoid customer complaints
due to latency with VPN and voice traffic). I also suspected that exporting
the BGP default route into OSPF would speed up OSPF convergence and avoid
relying on a generated default route based on neighbor state.



Unfortunately, it appears that the forwarding table of the MX104 converges
all at once rather than gradually as the router processes the routes. Also,
traffic would fail as the ISP connection came back up, due to BGP exporting
the route into OSPF.
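
One way to picture why a default route alone may not restore traffic is
longest-prefix matching: while a more specific route is still in the
forwarding table, it wins over 0.0.0.0/0 even if its next hop is dead. The
sketch below is a toy model using Python's standard ipaddress module. The
prefixes, next-hop labels, and lookup function are made up for
illustration, and it does not model the MX104's forwarding engine.

```python
import ipaddress


def lookup(fib, destination):
    """Return the next hop of the longest matching prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    return fib[max(matches, key=lambda net: net.prefixlen)]


# Hypothetical forwarding table just after ISP A fails: the default already
# points at the other router, but a stale specific route still points at ISP A.
fib = {
    ipaddress.ip_network("0.0.0.0/0"): "router-b",
    ipaddress.ip_network("198.51.100.0/24"): "isp-a (down)",
}

print(lookup(fib, "198.51.100.7"))  # the stale /24 wins over the default
print(lookup(fib, "203.0.113.9"))   # no specific route, so the default is used

# Once the stale specific route is removed, the default takes over.
del fib[ipaddress.ip_network("198.51.100.0/24")]
print(lookup(fib, "198.51.100.7"))
```

In this picture the default only helps for destinations whose specific
routes have already been removed or updated, so it does not shorten the
outage if the specifics all change at the same moment.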



*Receiving Full Routes with forwarding engine commands*

After I completed the above tests, I concluded the forwarding engine would
need to speed up, and some sort of hack was in order. I tested the
following commands.



https://www.juniper.net/documentation/en_US/junos/topics/concept/use-case-for-bgp-pic-for-inet-inet6-lu.html



https://www.juniper.net/documentation/en_US/junos/topics/topic-map/forwarding-indirect-next-hop.html



With these commands enabled, equal-cost routes were installed into the
forwarding table. Failover was 40 to 50 seconds on equal-cost routes and
180 seconds on non-equal-cost routes. This was unacceptable because most of
the routes are preferred out one ISP over the other.



I disabled ECMP, and the router began installing all routes into the
forwarding table, including the secondary route. The router would dump
sections of the forwarding table and behave very erratically.





*Receiving Default Only*

I tested filtering out all routes besides the default route. The speed of
convergence was 30 - 45 seconds depending on which upstream ISP connection
I disconnected. This solution was unacceptable due to the traffic not
taking the optimal path outbound.



I concluded that 180 seconds was an acceptable failover time given that I
had exhausted all other options. I would prefer to have a more reliable
failover mechanism than a faster one. Also, everyday speed and usability
are more important than failover speed (failover rarely happens, and almost
never during peak hours) in my use case.



Thank you to anyone who gave me suggestions on this issue. It helped me
understand and accept the outcome.












On Sat, May 26, 2018 at 12:15 PM Baldur Norddahl <baldur.norddahl () gmail com>
wrote:

Add a static default route on both routers. This will be invalidated as
soon as the interface goes down. It should be faster than relying on the BGP
process to withdraw the route. It also does not require any config changes
at your upstreams.

Regards
Baldur


On Wed, May 16, 2018 at 18:52, Adam Kajtar <akajtar () wadsworthcity org> wrote:

Erich,

Good idea. I can't believe I didn't think of that earlier. Simple and
effective. I will go ahead and request the defaults from my ISP and update
the thread with the findings.

Thanks!

On Wed, May 16, 2018 at 10:03 AM Kaiser, Erich <erich () gotfusion net>
wrote:

A last resort route (default route) could still be good to take from your
ISP(s) even if you still do full routes. As the propagation is happening on
the internet side, you should at least have a path inbound through the
other provider. The default route at least would send the traffic out if
it does not see the route locally. Just an idea.



On Wed, May 16, 2018 at 8:22 AM, Adam Kajtar <akajtar () wadsworthcity org>
wrote:

I could use static routes, but I noticed that since I moved to full routes I
have had a lot fewer customer complaints about latency (especially when it
comes to Voice and VPN traffic).

I wasn't using per-packet load balancing. I believe the Juniper default is
per IP.

My timers are as follows
 Active Holdtime: 90
 Keepalive Interval: 30

Would I be correct in thinking I need to contact my ISP to lower these
values?

An interesting note is that when I had both ISPs connected to a single MX104,
the failover was just a few seconds.

Thanks again.



On Tue, May 15, 2018 at 8:42 PM Ben Cannon <ben () 6by7 net> wrote:

Have you checked your timeouts ?

-Ben

On May 15, 2018, at 4:09 PM, Kaiser, Erich <erich () gotfusion net> wrote:

Do you need full routes?  What about just a default route from BGP?

Erich Kaiser
The Fusion Network
erich () gotfusion net
Office: 815-570-3101




On Tue, May 15, 2018 at 5:38 PM, Aaron Gould <aaron1 () gvtc com> wrote:

Are you sure it doesn't have something to do with 60 seconds * 3 = 180
seconds of BGP neighbor timeout before it believes the neighbor is dead and
removes the routes to that neighbor?

Aaron

On May 15, 2018, at 9:10 AM, Adam Kajtar <akajtar () wadsworthcity org>
wrote:

Hello:

I'm running two Juniper MX104s. Each MX has 1 ISP connected running
BGP(full routes). iBGP is running between the routers via a two port 20G
lag. When one of the ISPs fails, it can take upwards of 2 minutes for
traffic to start flowing correctly. The router has the correct route in the
routing table, but it doesn't install it in the forwarding table for the
full two mins.

I have a few questions if anyone could answer them.

 - What would a usual convergence time be for this setup?
 - Is there anything I could do to speed this process up? (I tried Multipath)
 - Any tips and tricks would be much appreciated

Thanks in Advance
--
Adam Kajtar
Systems Administrator
City of Wadsworth
akajtar () wadsworthcity org
-----------------------------------------------------
http://www.wadsworthcity.com

Facebook <http://www.facebook.com/cityofwadsworth>* |* Twitter
<https://twitter.com/CityOfWadsworth> *|* Instagram
<https://www.instagram.com/cityofwadsworth/> *|* YouTube
<https://www.youtube.com/channel/UCymlH-AZgvxTaHtgp3-AmDQ>





--
Adam Kajtar
Systems Administrator, Safety Services
City of Wadsworth
Office 330.335.2865
Cell 330.485.6510
akajtar () wadsworthcity org
-----------------------------------------------------
http://www.wadsworthcity.com

Facebook <http://www.facebook.com/cityofwadsworth>* |* Twitter
<https://twitter.com/CityOfWadsworth> *|* Instagram
<https://www.instagram.com/cityofwadsworth/> *|* YouTube
<https://www.youtube.com/channel/UCymlH-AZgvxTaHtgp3-AmDQ>



