nanog mailing list archives

Re: Simulated disaster exercise? Re: PAIX


From: "Stephen J. Wilcox" <steve () telecomplete co uk>
Date: Sun, 17 Nov 2002 01:57:52 +0000 (GMT)



On Sat, 16 Nov 2002, Sean Donelan wrote:


In the 1990s the MAEs and Gigaswitches would give us an unscheduled
failure of a major exchange point on a regular basis, which let us
demonstrate our disaster recovery capabilities.  With the improved
reliability, i.e. the PAIXes haven't had a catastrophic failure, we
haven't had as many opportunities to demonstrate how well we can handle
a disaster at those locations.

Without creating an actual disaster, what if all the providers turned off
their BGP sessions with other providers at a PAIX (or Equinix or LINX or
wherever), both through the shared switch and private point-to-point
links, for an hour.  More than likely no one would notice, but then
we would have some hard data.  Individually providers have tested parts of
their own network, but I haven't heard of any coordinated efforts to test
recovery across all the service providers in a particular location.


The main problem will be coordination: you need to get all providers to do this
in a tight slot of only one hour. And to make this a good test you need to
ensure that all the major players take part, more so than the smaller ISPs. From
what I've seen it's difficult enough to get ISPs to make config changes within a
window of a couple of weeks, so you're gonna have a problem pulling this
together!

Also, from what I've seen, I think you'll find things have changed: reduced
budgets have forced compromises on redundancy, and shutting down an exchange will
have a noticeable impact on users in the region... you could argue this is all
the more reason to conduct these exercises!

Steve


