nanog mailing list archives

Re: Tornados in Ashburn (Equinix affected)


From: "Robert E. Seastrom" <rs () seastrom com>
Date: Sun, 19 Sep 2004 11:27:32 -0400



Sean Donelan <sean () donelan com> writes:

1) Good that they [seemed] to have maintained partial power.

It would be interesting to find out what happened to the two UPSes that
apparently failed.  Was it something that exceeded the design, i.e. a
lightning strike greater than X joules?  Or something else?  Equinix
tests the heck out of their systems, but there is always the potential
for a problem.

Where did you hear this?  If it was posted to NANOG, I missed it.

2) Good that they restored cooling [power to the blowers?] relatively
quickly. By the graph someone posted and their message, it looks like
their chillers were on an unaffected system, but their blowers weren't
[as in, were affected].

The initial spike looks normal, although a bit bigger than is comfortable.
Chiller plants and compressors take several minutes to reset and restart
when the backup generators come online.  The storm may have had some
impact on the recovery because the temperature appears to take a long time
to stabilize.

If this is to be expected and normal, then a statement to that effect
("Some customers may note a transient temperature spike of as much as
10 degrees C on their equipment due to designed-in characteristics of
an unplanned transfer of the chiller plant to backup power") in the
customer announcement would have gone a long way towards allaying
fears and creating positive spin.  A statement that the "chillers are
OK", when your inlet temperature has just spiked 9 degrees and is
currently sitting 6 degrees high, is simply disingenuous.

Anyway, based on my information (including a couple of phone calls at
the time), suggesting that everything was nominal would be an overly
charitable assessment of the situation.

3) Good that they seemed to be able to bring together enough
knowledgeable folks quickly to resolve the problems that did occur
relatively quickly.

Yep, whatever the problem, restoration that quickly tends to indicate
their team was on the ball.  Stuff will always fail.  The real test is
how quickly it is fixed.

Absolutely.  In case it was not clear in my original message, let me
state for the record:

1) I don't have a problem with facilities being screwed up due to Acts
of God that are outside of the design parameters of the facility.  If
an Airbus on short final to Runway 19R at Dulles magically fell out of
the sky on top of Equinix, that would just be spectacularly bad luck,
not Equinix's fault.

1a) In the words of a friend of mine who grew up in Texas, regarding
tornadoes: "The odds of being in the path are actually quite low; the
consequences of being in the path are extremely high".  An F2 tornado,
while perhaps not impressive to our friends from the Great Plains,
is capable of causing substantial damage.

1b) No substitute for site diversity if your project is important
enough to justify the cost.

2) Under the circumstances, I think the Equinix staff did an excellent
job of bringing things under control quickly.  I'm sure glad this
happened during the day and not at night or on a weekend when, due to
cost-cutting measures, they have maybe one tech, two max, on duty.

3) I believe that the statements made by Equinix to its customers so
far are outside the acceptable and expectable envelope of positive
spin to which Sean alluded in a previous message.  We're paying
customers, and when things go south we deserve frankness and full
disclosure, not a pep talk.

                                        ---Rob

