nanog mailing list archives

Re: Followup British Telecom outage reason


From: S <sdsolomo () iupui edu>
Date: Sat, 24 Nov 2001 22:20:12 -0500 (EST)



You are right.  But in an era of focusing on the box, vendors are
forgetting that solid software and knowledgeable support are just as
important.

Possibly slow down a bit on rolling all those new features and widgets
into the software.  Make the software do what it should, reliably, then
put the new stuff in there.

i.e., bug scrub a train, per chassis.  Make it solid, then put the toyz
in.

These days you don't see boxes hitting the 1 year mark that often.  Uptime is
usually interrupted somewhere in the 20 week range by something beautiful
like:

SBN uptime is 2 weeks, 4 days, 6 hours, 12 minutes
System returned to ROM by processor memory parity error at PC 0x607356A0,
address 0x0 at 21:02:01 UTC Tue Nov 6 2001

or

BMG uptime is 34 weeks, 3 hours, 44 minutes
System returned to ROM by error - a Software forced crash, PC 0x6047F3E8
at 18:28:17 est Sat Mar 31 2001

or

LVX uptime is 24 weeks, 1 day, 20 hours, 21 minutes
System returned to ROM by abort at PC 0x60527DD4 at 00:38:36 EST Fri Jun 8
2001

At least it's not 0xDEADBEEF... yet.
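
For anyone who wants to check the "20 week range" against their own boxes,
here is a rough sketch in Python that converts uptime lines like the three
above into weeks.  The function name uptime_weeks and the regex are my own
illustration, not anything from a vendor tool, and it assumes the line has
the "<host> uptime is N weeks, N days, N hours, N minutes" shape shown above
(no years field).

```python
import re

# Seconds per unit; only the units that appear in the uptime lines above.
UNIT_SECONDS = {"week": 604800, "day": 86400, "hour": 3600, "minute": 60}


def uptime_weeks(line):
    """Return the uptime in weeks from a '<host> uptime is ...' line.

    Hypothetical helper: assumes the format of the samples quoted above.
    """
    _, _, tail = line.partition(" uptime is ")
    parts = re.findall(r"(\d+) (week|day|hour|minute)s?", tail)
    seconds = sum(int(count) * UNIT_SECONDS[unit] for count, unit in parts)
    return seconds / UNIT_SECONDS["week"]


samples = [
    "SBN uptime is 2 weeks, 4 days, 6 hours, 12 minutes",
    "BMG uptime is 34 weeks, 3 hours, 44 minutes",
    "LVX uptime is 24 weeks, 1 day, 20 hours, 21 minutes",
]

weeks = [uptime_weeks(s) for s in samples]
mean_weeks = sum(weeks) / len(weeks)
print([round(w, 1) for w in weeks], round(mean_weeks, 1))
```

For the three samples above this comes out to roughly 2.6, 34.0 and 24.3
weeks, a mean of about 20.3 weeks.  Three boxes is not a statistic, of
course.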


On Sat, Nov 24, 2001 at 02:16:38PM -0500, Sean Donelan wrote:
On Sat, 24 Nov 2001, Neil J. McRae wrote:

<snip>

      No vendor claims to have perfect software.  Nor will you find
anyone but the irresponsible vendor to suggest that any specific
image is "perfect".

<snip> 
      I'm sure that BT and Cisco have had some conversations about
what can be done to improve the testing that Cisco does, so that it better
simulates their network, in the wake of such a public outage.

-- 
Jared Mauch  | pgp key available via finger from jared () puck nether net
clue++;      | http://puck.nether.net/~jared/  My statements are only mine.




