nanog mailing list archives

Re: Operate until failure


From: "Eric A. Hall" <ehall () ehsco com>
Date: Mon, 08 Jan 2001 15:49:15 -0800



One issue with highly redundant data centers is that the failure modes
are "interesting."  You don't want to shut down due to a single UPS
failure, so you don't use something simple like PowerChute Plus. You
most likely don't want to shut down based on any automatic signal.
However, you do want a way for an operator to gracefully shut down a
lot of equipment quickly when the decision is made.

The old Deltec stuff was good about this. A server daemon would notify
different groups of machines at different stages of the outage:

        Power lost   -> notify group A (Printers, PCs)
        Low battery  -> notify group B (Secondary servers)
        Dead battery -> notify group C (Primary servers, comms)

They also had different outlets on different "groups", so if a device
wasn't able to understand the network alert (the routers and firewalls
don't have agents), it could be powered off as part of its group.
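
To make the staging concrete, here is a rough sketch in Python of how
such a daemon might decide what to do at each stage. This is not
Deltec's code or API. The names (Stage, GROUPS, AGENTLESS, actions_for)
are made up for illustration, and putting the routers and firewalls on
group C's outlets is my assumption, not something the product dictated.

```python
from enum import IntEnum


class Stage(IntEnum):
    """UPS stages, in the order they occur during an outage."""
    POWER_LOST = 1
    LOW_BATTERY = 2
    DEAD_BATTERY = 3


# Hypothetical table: which group gets the network alert at each stage.
GROUPS = {
    Stage.POWER_LOST: ("A", ["printers", "PCs"]),
    Stage.LOW_BATTERY: ("B", ["secondary servers"]),
    Stage.DEAD_BATTERY: ("C", ["primary servers", "comms"]),
}

# Devices with no agent cannot act on an alert, so they are tied to an
# outlet group instead. The group assignment here is an assumption.
AGENTLESS = {"router": "C", "firewall": "C"}


def actions_for(stage):
    """Return the actions to take when the UPS reaches the given stage."""
    group, members = GROUPS[stage]
    # Devices with agents get a network notification.
    actions = [("notify", group, member) for member in members]
    # Agentless devices on the same group simply lose their outlets.
    actions += [
        ("cut_outlet", group, device)
        for device, outlet_group in sorted(AGENTLESS.items())
        if outlet_group == group
    ]
    return actions


if __name__ == "__main__":
    for stage in Stage:
        print(stage.name, actions_for(stage))
```

The point of keeping the table separate from the logic is that an
operator can move a device from one group to another without touching
the daemon itself.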

Deltec got bought by somebody and I'm sure a lot of this stuff has changed
since I last looked at it, but it was a good design.

-- 
Eric A. Hall                                        http://www.ehsco.com/
Internet Core Protocols          http://www.oreilly.com/catalog/coreprot/

