nanog mailing list archives

Re: Network Reliability Engineering


From: Ralph Doncaster <ralph () istop com>
Date: Sat, 18 May 2002 19:23:14 -0400 (EDT)


Good luck.  For a proper scientific analysis you'd need MTBF info on every
point of failure - e.g. the physical link, CSU/DSU, power supply, ...
As a rather non-scientific observation, a couple of outages per year of 1-4
hours each seems to be quite common for a single-homed T1 or faster
connection, be it from WorldCom, AT&T, Sprint...
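
To put rough numbers on that, here is a minimal Python sketch.  It assumes
an 8760-hour year, the usual steady-state formula availability =
MTBF / (MTBF + MTTR), and that a single-homed path is a series chain in
which every component must be up.  The function names and the MTBF/MTTR
figures in the last example are made up for illustration, not measured data.

```python
HOURS_PER_YEAR = 8760  # 365 days, ignoring leap years


def availability_from_outages(outages_per_year, hours_per_outage):
    """Fraction of the year a link is up, given outage count and duration."""
    downtime_hours = outages_per_year * hours_per_outage
    return 1 - downtime_hours / HOURS_PER_YEAR


def component_availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series_availability(availabilities):
    """Availability of a chain in which every component must be working."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


# The observation above: two outages per year, each lasting 1 to 4 hours.
best = availability_from_outages(2, 1)
worst = availability_from_outages(2, 4)
print(f"single-homed availability: {worst:.5f} to {best:.5f}")

# Hypothetical MTBF/MTTR pairs (hours) for link, CSU/DSU and power supply.
# These values are placeholders; substitute your own measurements.
chain = [
    component_availability(4000, 4),
    component_availability(50000, 8),
    component_availability(100000, 2),
]
print(f"series chain availability: {series_availability(chain):.5f}")
```

Two to eight hours of downtime per year works out to roughly 99.91% to
99.98% availability for the single-homed case.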

I think the arguments in favor of dual-homing are pretty cut and
dried.  The added benefit of tri-homing over dual-homing would be much
tougher to quantify.
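
A rough way to see why, sketched in Python.  It assumes each upstream link
is down about 8 hours per year (the high end of the outages mentioned
above) and, importantly, that links fail independently.  Independence is an
optimistic assumption: shared conduits, shared power or a common router
would make the real numbers worse.  The function names are illustrative.

```python
HOURS_PER_YEAR = 8760  # 365 days, ignoring leap years


def parallel_availability(link_availability, links):
    """Availability of N redundant links, assuming independent failures.

    Service is lost only when every link is down at the same time.
    """
    all_down = (1 - link_availability) ** links
    return 1 - all_down


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per year for a given availability."""
    return (1 - availability) * HOURS_PER_YEAR * 60


# Assumed per-link figure: 8 hours of outage per year.
single = 1 - 8 / HOURS_PER_YEAR

for links in (1, 2, 3):
    a = parallel_availability(single, links)
    print(f"{links} link(s): {downtime_minutes_per_year(a):.4f} minutes/year")
```

Under these assumptions the second link takes expected downtime from about
8 hours per year to under a minute, while the third link can only remove
what is left of that minute.  Correlated failures and human error quickly
dominate at that point, which is what makes the tri-homing case hard to
argue with numbers.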

Ralph Doncaster
principal, IStop.com
div. of Doncaster Consulting Inc.

On Sat, 18 May 2002, Pete Kruckenberg wrote:


I'm looking for some good reference materials to do some
"reliability engineering" calculations and projections.

This is to justify increased redundancy, and I want to
include quantifiable numbers based on MTBF data and other
reliability factors, kind of a scientific justification
instead of just the typical emotional appeal using
analyst/vendor FUD.

I'd appreciate references on how to do this in a network
environment (what data to collect, how to collect it, how to
analyze, etc). Also any data (or rules of thumb) on typical
MTBFs for network events that I won't find on vendor product
slicks (like what's the MTBF on IOS, or human-caused service
outages of various types, etc).

If someone has put together something remotely like this
that they'd care to share, that'd be incredibly helpful.

Thanks.
Pete.




