NANOG mailing list archives

Re: Peering/Transit eBGP sessions -pet or cattle?


From: Lukas Tribus <lists () ltri eu>
Date: Tue, 11 Feb 2020 00:33:37 +0100

Hello Baldur,


On Mon, 10 Feb 2020 at 19:57, Baldur Norddahl <baldur.norddahl () gmail com> wrote:
> Many dual-homed companies may start out with two routers and two
> transits but without dual links to each transit, as you describe
> above. That will cause significant disruption if one link goes
> down. It is not just about convergence between T1 and T2 but also
> about convergence for a major part of the internet. Been there,
> done that: yes, you can be down for up to several minutes before
> everything is back to normal. Assume both transits are Tier 1
> networks and that the link to T1 is lost. T1 will have a peering
> session with T2 somewhere, but T1 will not allow peer-to-peer
> traffic to go via that link. All of T1's other peers will then need
> to find a different way to reach you, one that does not transit T1
> (unless they have a contract with T1).
>
> Therefore, if being down for several minutes is not OK, you
> should invest in dual links to each transit provider, connected
> to two different routers, ideally with a guarantee that the
> transit provider also uses two routers at its end and that
> diverse fiber paths are used.
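To make the export rule in the quoted scenario concrete, here is a minimal Python sketch of the commonly assumed customer/peer/provider export policy (often called Gao-Rexford or valley-free routing). It is a simplification: real networks add exceptions such as paid peering or partial transit (the "contract with T1" case), and the names `Rel`, `should_export` and `t1_exports` are hypothetical, not any router's API. The scenario assumes a customer dual-homed to T1 and T2 whose link to T1 fails, so T1 learns the prefix only from its peer T2.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of a BGP neighbor, from our point of view."""
    CUSTOMER = "customer"  # neighbor pays us for transit
    PEER = "peer"          # settlement-free peer
    PROVIDER = "provider"  # we pay the neighbor for transit


def should_export(learned_from: Rel, send_to: Rel) -> bool:
    """Simplified valley-free export rule (an assumption, not a spec):
    routes learned from customers go to everyone; routes learned from
    peers or providers go only to customers."""
    if learned_from is Rel.CUSTOMER:
        return True
    return send_to is Rel.CUSTOMER


def t1_exports(customer_link_up: bool) -> dict[str, bool]:
    """Hypothetical scenario: where T1 learns the dual-homed customer's
    prefix from, and whether it re-announces it to each neighbor class."""
    # With the direct link up, T1 hears the prefix from its customer;
    # with it down, T1 only hears it from its peer T2.
    learned = Rel.CUSTOMER if customer_link_up else Rel.PEER
    return {
        "T1 customers": should_export(learned, Rel.CUSTOMER),
        "T1 peers": should_export(learned, Rel.PEER),
    }


print(t1_exports(True))   # {'T1 customers': True, 'T1 peers': True}
print(t1_exports(False))  # {'T1 customers': True, 'T1 peers': False}
```

With the link up, T1 announces the prefix to both its customers and its peers; with it down, T1's customers can still reach the prefix via the T1-T2 peering, but T1's other peers stop hearing it from T1 and must reach it through T2 directly or another path. The sketch models only which announcements are made, not how long that reconvergence takes, which is the point Lukas disputes below.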

That is not my experience *at all*. I have always seen my prefixes
converge upstream within a couple of seconds (with two different
Tier 1s). That is with a double-digit number of announcements. Maybe
if you announce tens of thousands of prefixes as a large Tier 2,
things are more problematic; that I can't tell. Or maybe you hit some
old-school route flap dampening somewhere down the path. Maybe there
is another reason for this. But even if three AS hops are involved, I
don't really understand how they would take *minutes* to converge
after receiving your BGP withdraw message.

When I have seen *minutes* of connectivity brownouts, it was always
because of ingress prefix convergence (or the lack thereof: slow FIB
programming, temporary internal routing loops, nasty things like
that), and never because of external convergence.

I agree there are a number of reasons (including best convergence) to
have fully diverse connections to a single transit AS. Another reason
is that when you manually reroute traffic for a certain AS path (say
transit 2 has an always-congested PNI towards a third-party ASN), you
may have no alternative to the congested path when your other transit
provider goes away. But I have never seen minutes of brownout caused
by upstream -> downstream -> downstream convergence (or whatever the
scenario looks like).


lukas

