nanog mailing list archives
RE: Amazon diagnosis
From: Robert Bonomi <bonomi () mail r-bonomi com>
Date: Sun, 1 May 2011 16:35:29 -0500 (CDT)
Subject: RE: Amazon diagnosis
Date: Sun, 1 May 2011 12:50:37 -0700
From: George Bonser <gbonser () seven com>

They apparently had a redundant primary network and, on top of that, a secondary network. The secondary network, however, did not have the capacity of the primary network. Rather than failing over from the active portion of the primary network to the standby portion of the primary network, they inadvertently failed the entire primary network to the secondary. This resulted in the secondary network reaching saturation and becoming unusable.

There isn't anything that can be done to mitigate against human error. You can TRY, but as history shows us, it all boils down to the human that implements the procedure. All the redundancy in the world will not do you an iota of good if someone explicitly does the wrong thing.

...

This looks like it was a procedural error and not an architectural problem.
A sage sayeth sooth: "For any 'fool-proof' system, there exists a *sufficiently determined* fool capable of breaking it." It would seem that the validity of that has just been re-confirmed. <wry grin>

It is worthy of note that it is considerably harder to protect against accidental stupidity than it is to protect against intentional malice. ('malice' is _much_ more predictable, in general. <wry grin>)