nanog mailing list archives

Re: San Francisco Power Outage


From: Stephen Wilcox <steve.wilcox () packetrade com>
Date: Wed, 25 Jul 2007 12:04:17 +0100


On Tue, Jul 24, 2007 at 11:57:37PM +0000, Paul Vixie wrote:

> sethm () rollernet us (Seth Mattinen) writes:
>
>> I have a question: does anyone seriously accept "oh, power trouble" as a 
>> reason your servers went offline? Where's the generators? UPS? Testing 
>> said combination of UPS and generators? What if it was important? I 
>> honestly find it hard to believe anyone runs a facility like that and 
>> people actually *pay* for it.
>>
>> If you do accept this is a good reason for failure, why?
>
> sometimes the problem is in the redundancy gear itself.  PAIX lost power
> twice during its first five years of operation, and both times it was due
> to faulty GFI in the UPS+redundancy gear.  which had passed testing during
> construction and subsequently, but eventually some component just wore out.

I had an issue with exactly that 7 or 8 years ago at Via Networks. The switchover gear shorted and died horrifically, 
leading to an outage that lasted well through the night (something like 16 hours in total). As it was a Friday evening, 
it was difficult to get people on site promptly.

The lesson learned was 'the big switch': a huge thing that took the weight of two adults to move, but it did mean 
that, should something similar occur, we could manually transfer the whole building's power directly to the generator.

I doubt such a beast would scale to the power loads of a large datacentre, though; then again, those are generally 
not on a single grid/UPS feed.

Steve

