nanog mailing list archives

Re: What to expect after a cooling failure


From: "Tri Tran" <tritran () cox net>
Date: Wed, 10 Jul 2013 05:56:07 +0000

I have seen DDR2 RAM give random errors from inadequate cooling. The cabinets were stacked to the max with servers but 
the doors were not meshed. DDR2 runs fairly hot, especially when all the banks are filled.
Tri Tran

-----Original Message-----
From: Jay Ashworth <jra () baylink com>
Date: Wed, 10 Jul 2013 00:04:23 
To: NANOG<nanog () nanog org>
Subject: Re: What to expect after a cooling failure

----- Original Message -----
From: "Erik Levinson" <erik.levinson () uberflip com>


For those who have gone through such events in the past, what can one
expect in terms of long-term impact...should we expect some premature
component failures? Does anyone have any stats to share?

If the HDDs were spinning while above rated maximum ambient intake temp,
*especially* if they're not *right out front in the intake path* (is
anything not built that way anymore?  Yeah; the back side of 45-drive
Supermicro racks, among other things), you should probably plan on doing
a preemptive replacement cycle, or at the very least, pay *very* close
attention to smartd (from smartmontools), and have a good stock of pre-trayed replacements.
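
As a rough illustration of "paying close attention" to SMART data, here is a
minimal Python sketch that parses the ATA attribute table printed by
`smartctl -A` and flags the counters that tend to move before a drive dies.
The function names, the watched-attribute list, and the 45 C temperature
threshold are assumptions made for this example, not anything from the
original post; attribute names also vary by vendor, so adjust to what your
drives actually report.

```python
import subprocess

# Raw counters that should normally stay at zero on a healthy ATA drive.
# This list is an assumption for the sketch; vendors name attributes differently.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` ATA table text."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the 10th column is the leading token of RAW_VALUE.
        if len(parts) >= 10 and parts[0].isdigit() and parts[9].isdigit():
            attrs[parts[1]] = int(parts[9])
    return attrs


def warnings_for(attrs, max_temp_c=45):
    """List human-readable warnings; max_temp_c is a hypothetical threshold."""
    found = [f"{name}={attrs[name]}" for name in WATCHED if attrs.get(name, 0) > 0]
    temp = attrs.get("Temperature_Celsius")
    if temp is not None and temp > max_temp_c:
        found.append(f"Temperature_Celsius={temp} exceeds {max_temp_c}")
    return found


def read_drive(device):
    """Run `smartctl -A` on a device (usually needs root) and return its output."""
    # smartctl's exit status is a bitmask, so a non-zero code is not treated as fatal.
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return result.stdout
```

Something like `warnings_for(parse_smart_attributes(read_drive("/dev/sda")))`
run from cron across every drive would give a crude trend to watch; smartd
itself can do the same job with mail alerts once configured.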

Remember that you may fall into the RAID Hole if you wait for failures,
and hence lose data which isn't backed up anyway -- if more drives in a 
RAID group fail *during rebuilds*, you're essentially screwed.

If your RAID groups were properly dispersed across drive build dates, then
this will probably be *slightly* less dangerous, but still.
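
To put a rough number on the rebuild risk, here is a back-of-the-envelope
Python sketch. It assumes drives fail independently at a constant rate derived
from an annualized failure rate (AFR); the function name and every input value
are hypothetical. Heat-stressed drives from the same build batch are unlikely
to fail independently, so treat the result as an optimistic floor, not a
prediction.

```python
HOURS_PER_YEAR = 8760


def p_additional_failure(afr, surviving_drives, rebuild_hours):
    """Probability that at least one surviving drive fails during a rebuild.

    Assumes independent failures at a constant rate, so each drive survives
    t hours with probability (1 - afr) ** (t / 8760).
    """
    if not 0.0 <= afr < 1.0:
        raise ValueError("afr must be in [0, 1)")
    exposure_years = surviving_drives * rebuild_hours / HOURS_PER_YEAR
    return 1.0 - (1.0 - afr) ** exposure_years


if __name__ == "__main__":
    # Hypothetical comparison: a nominal AFR versus a much higher post-event AFR.
    for afr in (0.05, 0.30):
        p = p_additional_failure(afr, surviving_drives=7, rebuild_hours=24)
        print(f"AFR {afr:.0%}: P(another failure during rebuild) = {p:.4%}")
```

The useful part is the comparison rather than the absolute figures: plug in
whatever elevated AFR you are willing to believe for cooked drives and see how
quickly a long rebuild on a wide group stops looking safe.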

Also watch bearing-type fans.

Cheers,
-- jra
-- 
Jay R. Ashworth                  Baylink                       jra () baylink com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com         2000 Land Rover DII
St Petersburg FL USA               #natog                      +1 727 647 1274

