nanog mailing list archives

Re: What to expect after a cooling failure


From: Jay Ashworth <jra () baylink com>
Date: Wed, 10 Jul 2013 00:04:23 -0400 (EDT)

----- Original Message -----
From: "Erik Levinson" <erik.levinson () uberflip com>


For those who have gone through such events in the past, what can one
expect in terms of long-term impact...should we expect some premature
component failures? Does anyone have any stats to share?

If the HDDs were spinning while above rated maximum ambient intake temp,
*especially* if they're not *right out front in the intake path* (is
anything not built that way anymore?  Yeah; the back side of 45-drive
Supermicro racks, among other things), you should probably plan on doing
a preemptive replacement cycle, or at the very least, pay *very* close
attention to smartctld, and have a good stock of pre-trayed replacements.

Remember that you may fall in the RAID Hole if you wait for failures,
and hence lose data which isn't backed up anyway -- if more drives in a 
raid group fail *during rebuilds*, you're essentially screwed.

If your raid groups were properly dispersed across drive build dates, then
this will probably be *slightly* less dangerous, but still.

Also watch bearing-type fans.

Cheers,
-- jra
-- 
Jay R. Ashworth                  Baylink                       jra () baylink com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com         2000 Land Rover DII
St Petersburg FL USA               #natog                      +1 727 647 1274


Current thread: