nanog mailing list archives
RE: Followup British Telecom outage reason
From: Sean Donelan <sean () donelan com>
Date: Mon, 26 Nov 2001 06:28:22 -0500 (EST)
On Mon, 26 Nov 2001, Christian Kuhtz wrote:
Now, if lack of infrastructure reliability can harm human life you may feel differently, but that isn't the case for most of us at the present time.
I've designed software and networks used for public safety and emergencies. And yes, people have died on my watch. It is a somewhat different mindset, but not that different. A lot of "good engineering practice" applies to any engineering activity, including software engineering. It's not even a matter of cost. A typical hospital spends less on its emergency power system than an Internet/telco hotel does. The major difference is that the hospital staff knows (more or less) what to do when the generators don't work. The big secret is that most "life safety" systems fail regularly. Most of the time it doesn't matter because the "big one" doesn't coincide with the failure.
Faults will happen. And nothing matters as much as how you prepare for when they do.
Mean Time To Repair is a bigger contributor to Availability calculations than the Mean Time To Failure. It would be great if things never failed. But some people are making their systems so complicated chasing the Holy Grail of 100% uptime that they can't figure out what happened when those systems do fail. Murphy's revenge: The more reliable you make a system, the longer it will take you to figure out what's wrong when it breaks.
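For anyone who wants to play with the numbers, here is a rough sketch using the usual steady-state formula, Availability = MTTF / (MTTF + MTTR). The figures (1000 hours between failures, 4 hours to repair) and the function names are made up for illustration; they are not taken from any real system or from the BT outage.

```python
HOURS_PER_YEAR = 8760.0


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


def downtime_hours_per_year(mttf_hours: float, mttr_hours: float) -> float:
    """Expected hours of downtime per year for the given MTTF and MTTR."""
    return (1.0 - availability(mttf_hours, mttr_hours)) * HOURS_PER_YEAR


# Hypothetical figures, chosen only to illustrate the arithmetic.
baseline = downtime_hours_per_year(1000.0, 4.0)
double_mttf = downtime_hours_per_year(2000.0, 4.0)  # fails half as often
half_mttr = downtime_hours_per_year(1000.0, 2.0)    # repaired twice as fast

print(f"baseline:     {baseline:.2f} h/year down")
print(f"MTTF doubled: {double_mttf:.2f} h/year down")
print(f"MTTR halved:  {half_mttr:.2f} h/year down")
```

Under this formula, halving the repair time buys the same reduction in downtime as doubling the time between failures. Which of the two is cheaper to achieve in practice is a judgment call the formula doesn't make for you.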