nanog mailing list archives

Re: [outages] News item: Blackberry services down worldwide, Egypt affected (not N.A.)


From: Tayeb Meftah <tayeb.meftah () gmail com>
Date: Wed, 12 Oct 2011 17:56:40 +0200

Idiotberry


Sent from my iPhone

On 12 Oct 2011, at 17:55, Charles Mills <w3yni1 () gmail com> wrote:

+1
On Oct 12, 2011 11:51 AM, <Valdis.Kletnieks () vt edu> wrote:

On Wed, 12 Oct 2011 09:52:02 CDT, -Hammer- said:
What kills me is what they have told the public. They lost a "core
switch". I don't know if they actually mean a network switch or not, but
I'm pretty sure any of us who work in an enterprise environment know
how to factor in N+1 for just these types of days. And then the backup
solution failed? I'm not buying it either.

Yeah, and that extra comma in the one config file that didn't make a
difference when you tested the failover in the lab *never* makes a
difference when it hits in the production network, right?  Or they
changed the config of the primary and it didn't get propagated just
right to the backup, or they had mismatched firmware levels on the
blades in the primary and backup switches, so traffic that didn't
tickle a bug on the primary blades caused the blade to crash on the
backup, or...

Anybody on this list who's been around long enough probably has enough "We
should have had N+2 because the N+1'th device failed too" stories to drain
*several* pitchers of beer at a good pub... I've even had one case where my
butt got *saved* from an ohnosecond-class whoops because the N+1'th device
*was* crashed (I stomped a config file and it replicated, but I was able to
salvage a copy from a device that didn't replicate because it was down at
the time).



