nanog mailing list archives

Re: inter-domain link recovery


From: Joel Jaeggli <joelja () bogus com>
Date: Wed, 15 Aug 2007 01:32:21 -0700


Chengchen Hu wrote:
Thank you for your detailed explanation.

Supposing there are no business factors (such as multiple ASes belonging to the same ISP), is it always possible for BGP to 
automatically find an alternative path when a failure occurs, if one exists? If not, what may be the causes? 

If you have multiple paths to a given prefix in your RIB, you're going
to use the shortest one. If it's withdrawn you'll use the next shortest
one. If you have no paths remaining to that prefix, you can't forward
the packet anymore.
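
As a rough illustration of the fallback behavior described above, here is a
minimal sketch and not a BGP implementation. It assumes best-path selection is
reduced to "shortest AS path wins"; a real router compares local preference and
other attributes before path length. The prefix, the AS numbers (taken from the
documentation ranges), and the function names are all hypothetical.

```python
# Toy model of a RIB: prefix -> list of candidate AS paths.
# Assumption: best path = shortest AS path only; real BGP applies
# local preference and other tie-breakers before path length.

def best_path(rib, prefix):
    """Return the shortest AS path for prefix, or None if no path remains."""
    paths = rib.get(prefix, [])
    return min(paths, key=len) if paths else None

def withdraw(rib, prefix, path):
    """Remove one path for prefix, as if a neighbor had withdrawn it."""
    rib[prefix] = [p for p in rib.get(prefix, []) if p != path]

# Hypothetical prefix and AS numbers (documentation ranges).
rib = {
    "192.0.2.0/24": [
        (64500, 64510),
        (64501, 64502, 64510),
    ]
}

first = best_path(rib, "192.0.2.0/24")    # shortest path is used
withdraw(rib, "192.0.2.0/24", first)
second = best_path(rib, "192.0.2.0/24")   # next shortest takes over
withdraw(rib, "192.0.2.0/24", second)
third = best_path(rib, "192.0.2.0/24")    # None: no path left to forward on
```

Once the last path is withdrawn the lookup returns None, which corresponds to
having no route for the prefix.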

Looking back at your original question, I think you're asking a specific
question about the Dec '06 earthquake outage. The best people to ask
why it took so long to restore are the operators who were most
dramatically affected.

The fact of the matter is that most ISPs are not in the business of buying
more diversity than they think they need in order to ensure business
continuity, support SLAs and stay in business. The earthquake and
undersea landslide affected a number of fiber paths over a short period
of time.

I think it's fair to assume that a number of operators have updated
their risk models to account for that sort of threat in the future. It's
hard to totally anticipate the threat of losing ~80% of your fiber capacity
in a rather dense and well-connected corridor.

There were two talks on the subject of that particular event at the
first NANOG meeting of '07; you can peruse them here:

http://www.nanog.org/mtg-0702/topics.html

In particular the second talk discusses the signature of that outage in
the routing table in some detail.

C. Hu 





-------------------------------------------------------------
From: Roland Dobbins
Date: 2007-08-15 13:21:33
To: nanog
CC: 
Subject: Re: inter-domain link recovery



On Aug 14, 2007, at 9:06 PM, Chengchen Hu wrote:

1. Why do BGP-like protocols sometimes fail to recover the path? Is  
it mainly because of the policies set by ISPs and network operators?

There are an infinitude of possible answers to these questions which  
have nothing to do with BGP, per se; those answers are very  
subjective in nature.  Can you provide some specific examples  
(citing, say, publicly-available historical BGP tables available from  
route-views, RIPE, et al.) of an instance in which you believe that  
the BGP protocol itself is the culprit, along with the supporting  
data which indicate that the prefixes in question should've remained  
globally (for some value of 'globally') reachable?

Or are these questions more to do with the general provisioning of  
interconnection relationships, and not specific to the routing  
protocol(s) in question?

Physical connectivity to a specific point in a geographical region  
does not equate to logical connectivity to all the various networks  
in that larger region; SP networks (and customer networks, for that  
matter) are interconnected and exchange routing information (and, by  
implication, traffic) based upon various economic/contractual,  
technical/operational, and policy considerations which vary greatly  
from one instance to the next.  So, the assertion that there were  
multiple unaffected physical data links to/from Taiwan in the cited  
instance - leaving aside for the moment whether this was actually the  
case, or whether sufficient capacity existed in those links to  
service traffic to/from the prefixes in question - in and of itself  
has no bearing on whether or not the appropriate physical and logical  
connectivity was in place in the form of peering or transit  
relationships to allow continued global reachability of the prefixes  
in question.
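
To make the distinction between physical and logical connectivity concrete,
here is a small sketch under stated assumptions. The four networks A to D, the
link set, and the session set are hypothetical, and routing policy is ignored
entirely: a BGP session is simply treated as "routes are exchanged over this
link".

```python
from collections import deque

def reachable(edges, src, dst):
    """Breadth-first search over an undirected set of (a, b) edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical topology: fiber exists on both the B side and the C side,
# but routes are only exchanged along A-B-D (nothing arranged via C).
physical_links = {("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")}
bgp_sessions = {("A", "B"), ("B", "D")}

# Simulate a cable cut on the B-D link.
cut = ("B", "D")
physical_after = physical_links - {cut}
sessions_after = bgp_sessions - {cut}

physically_connected = reachable(physical_after, "A", "D")  # path via C exists
logically_connected = reachable(sessions_after, "A", "D")   # no session via C
```

In this toy case the fiber through C survives the cut, yet A cannot reach D
until a peering or transit relationship over that path is actually put in
place.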

2. What actions will a network operator take when such  
failures occur? Is it something like: 1) find alternative  
path(s); 2) negotiate with other ISPs if needed; 3) modify  
the policy and reroute the traffic? Which actions may be time  
consuming?

All of the above, and all of the above.  Again, it's very  
situationally dependent.

3. There may be more than one alternative path; what criteria  
does the network operator use to finally select one or some of  
them?

Proximate physical connectivity; capacity; economic/contractual,  
technical/operational, and policy considerations.

4. What information is required for a network operator to find the  
new route?

By 'find the new route', do you mean a new physical and logical  
interconnection to another SP?

The following references should help shed some light on the general  
principles involved:

<http://en.wikipedia.org/wiki/Peering>

<http://www.nanog.org/subjects.html#peering>

<http://www.aw-bc.com/catalog/academic/product/0,1144,0321127005,00.html>

-----------------------------------------------------------------------
Roland Dobbins <rdobbins () cisco com> // 408.527.6376 voice

      Culture eats strategy for breakfast.

            -- Ford Motor Company




