nanog mailing list archives

Re: Power cut if temps are too high


From: Haudy Kazemi <kaze0010 () umn edu>
Date: Mon, 27 May 2019 19:41:38 -0500

Where granular temperature readings are available to control scripts, it
would also be possible to implement something like the tiers described
below. Adjust thresholds as deemed appropriate for the facility and
equipment, and also for the expected rates of temperature rise. System
performance throttling and/or quiescing may also be ways to reduce load (and
thus cooling requirements and heat buildup rates) during periods of
reduced or completely lost cooling capacity.

1.) Elevated temperature watch at 77 F / 25 C. Send alerts to on-call staff
but take no other action.

2.) Elevated temperature warning at 81.5 F / 27.5 C. Begin performance
throttling and engage other measures to reduce heat buildup to compensate
for insufficient cooling capacity.

3.) Elevated temperature severe warning at 86 F / 30 C. Begin automated
clean system shutdowns.

4.) Critical temperature limit exceeded at 95 F / 35 C. Trigger EPO to
protect hardware.
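
As a rough sketch of how a control script might map a reading onto the four
tiers above: the names Tier, THRESHOLDS_C and classify are made up for
illustration, and treating each threshold as "at or above" is an assumption.
The Celsius values are the ones listed above.

```python
from enum import IntEnum


class Tier(IntEnum):
    NORMAL = 0    # below the watch threshold, nothing to do
    WATCH = 1     # alert on-call staff, take no other action
    WARNING = 2   # begin throttling / load reduction
    SEVERE = 3    # begin automated clean shutdowns
    CRITICAL = 4  # trigger EPO


# Thresholds in degrees Celsius, highest first, from the tiers above.
# Adjust for the facility, equipment and expected rate of rise.
THRESHOLDS_C = [
    (35.0, Tier.CRITICAL),
    (30.0, Tier.SEVERE),
    (27.5, Tier.WARNING),
    (25.0, Tier.WATCH),
]


def classify(temp_c: float) -> Tier:
    """Return the highest tier whose threshold the reading has reached."""
    for limit_c, tier in THRESHOLDS_C:
        if temp_c >= limit_c:  # assumption: "at or above" triggers the tier
            return tier
    return Tier.NORMAL
```

A caller would then dispatch on the returned tier (alert, throttle, shut
down, EPO); those actions are site-specific and are left out here.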

On sensor redundancy: 3x or higher redundancy allows for voting methods to
be used to rule out potential false readings.
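
One possible way to do the voting, as a minimal sketch: take the median of
three or more readings so a single wild sensor cannot move the result, or
count how many sensors agree that a threshold has been reached. The function
names are hypothetical and the readings are assumed to be in the same unit as
the threshold.

```python
from statistics import median


def voted_temperature(readings):
    """Median of 3+ readings; one faulty sensor cannot skew the result."""
    if len(readings) < 3:
        raise ValueError("voting needs at least three sensor readings")
    return median(readings)


def majority_exceeds(readings, threshold):
    """True only if more than half of the sensors are at or above threshold."""
    over = sum(1 for r in readings if r >= threshold)
    return over > len(readings) / 2
```

With readings of 24.0, 24.5 and 80.0 the median is 24.5, so the one sensor
reporting 80.0 is effectively outvoted.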

On series vs parallel wiring: either can be used...what makes most sense
depends on the design of the system being integrated with (basically NC vs
NO).
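
To illustrate the series vs parallel point in code, here is a small model of
circuit continuity, assuming each contact is represented as True when closed
and False when open. The function names are invented for this sketch; which
arrangement is appropriate still depends on whether the EPO input expects a
circuit to close or to open.

```python
def series_conducts(contacts_closed):
    """A series string conducts only if every contact is closed."""
    return all(contacts_closed)


def parallel_conducts(contacts_closed):
    """A parallel group conducts if at least one contact is closed."""
    return any(contacts_closed)
```

So with contacts in series, one relay closing on its own does not complete
the circuit, while in parallel a single closed relay is enough.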



On Mon, May 27, 2019, 13:18 Mel Beckman <mel () beckman org> wrote:

We use Intermapper, an SNMP network monitoring system, which supports UNIX
scripting. Intermapper probes two Weathergoose temperature sensors, and
calls a script with the values it retrieves. When both sensors exceed a
certain threshold, the script sends an SNMP relay trip signal to the
Weathergoosen, which close a pair of dry contacts wired in series to the
emergency power off contacts for the whole-room UPS.
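
The decision described above could be sketched roughly like this. It is not
the actual Intermapper script: check_and_trip is a hypothetical name, and
send_trip stands in for whatever the site uses to send the SNMP relay trip,
passed in as a callable so no particular SNMP library is assumed.

```python
from typing import Callable


def check_and_trip(sensor_a: float, sensor_b: float, trip_threshold: float,
                   send_trip: Callable[[], None]) -> bool:
    """Call send_trip() only when BOTH readings exceed the threshold.

    Requiring agreement from both sensors guards against a single
    faulty probe causing a false shutdown.
    """
    if sensor_a > trip_threshold and sensor_b > trip_threshold:
        send_trip()  # placeholder for the real relay trip command
        return True
    return False
```

The readings and threshold only need to share a unit; nothing here assumes
Fahrenheit or Celsius.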

We chose to use two sensors and two dry contact relays to protect against
false trips, and thus false shut downs. Before the trigger temperature is
reached, the NMS would have sent various escalating alarms to on-call
staffers, who hopefully would intervene before this point. This protection
is for the worst case scenario where nobody responds and the equipment is
at risk of damage.

We could have commanded an orderly shut down to all servers, but decided
that it would be better to kill the power in the event of a runaway heat
event than to try to make it through all the disk activity necessary for a
clean shut down.

This system has triggered one time, successfully shutting down the data
center on a holiday weekend when people missed their notifications, and
undoubtedly saved a lot of hard drives. When we got to the room the
temperature was over 115°, but the power was cut at 95°.

 -mel

On May 27, 2019, at 11:01 AM, Dovid Bender <dovid () telecurve com> wrote:

Hi,

Is anyone aware of a device that will cut the power if the room goes above
X degrees? I am looking for something as a just in case.

Regards,

Dovid


