nanog mailing list archives
Re: Operate until failure
From: "Eric A. Hall" <ehall () ehsco com>
Date: Mon, 08 Jan 2001 15:49:15 -0800
One issue with highly redundant data centers is that the failure modes are "interesting." You don't want to shut down due to a single UPS failure, so you don't use something simple like PowerChute Plus. You most likely don't want to shut down based on any automatic signal. However, you do want a way for an operator to gracefully shut down a lot of equipment quickly when the decision is made.
The old Deltec stuff was good about this. They had it so that a server daemon would notify different groups at different stages:

  Power lost   -> notify group A (printers, PCs)
  Low battery  -> notify group B (secondary servers)
  Dead battery -> notify group C (primary servers, comms)

They also had different outlets on different "groups", so if a device wasn't able to understand the network alert (the routers and firewalls don't have agents), it could be terminated as part of a group.

Deltec got bought by somebody and I'm sure a lot of this stuff has changed since I last looked at it, but it was a good design.

-- 
Eric A. Hall                                      http://www.ehsco.com/
Internet Core Protocols        http://www.oreilly.com/catalog/coreprot/
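For anyone wanting to picture the staged scheme described above, here is a rough Python sketch. It is not Deltec's software or any real UPS API: the names `PowerEvent`, `STAGES`, `AGENTLESS`, and `plan_actions` are made up for illustration, and treating "comms" as the agentless gear that gets its outlet group switched off is an assumption based on the remark about routers and firewalls.

```python
from enum import Enum


class PowerEvent(Enum):
    """The three UPS stages named in the post."""
    POWER_LOST = "power lost"
    LOW_BATTERY = "low battery"
    DEAD_BATTERY = "dead battery"


# Stage -> (group label, device classes), as listed in the post.
STAGES = {
    PowerEvent.POWER_LOST: ("A", ["printers", "PCs"]),
    PowerEvent.LOW_BATTERY: ("B", ["secondary servers"]),
    PowerEvent.DEAD_BATTERY: ("C", ["primary servers", "comms"]),
}

# Assumption: device classes that cannot run a shutdown agent
# (routers, firewalls) and so must be cut at the outlet group.
AGENTLESS = {"comms"}


def plan_actions(event):
    """Return the actions a daemon might take for one power event."""
    group, devices = STAGES[event]
    actions = []
    for device in devices:
        if device in AGENTLESS:
            # No agent to notify, so drop power to the device's outlet group.
            actions.append(f"switch off outlet group {group} ({device})")
        else:
            actions.append(f"notify group {group} ({device}) to shut down")
    return actions


if __name__ == "__main__":
    for event in PowerEvent:
        print(event.value, "->", plan_actions(event))
```

Keeping the stage table as plain data means the same mapping could be driven by an operator's decision rather than by an automatic UPS signal, which fits the concern raised in the quoted text.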