nanog mailing list archives
RE: Revisiting the Aviation Safety vs. Networking discussion
From: "George Bonser" <gbonser () seven com>
Date: Thu, 24 Dec 2009 18:27:37 -0800
-----Original Message-----
From: Dobbins, Roland
On Dec 25, 2009, at 7:01 AM, Michael Dillon wrote:
It would be interesting to see what others have to say about this answer.
I think it's a pretty accurate summation of how these things work in a lot of big organizations, all over the world.
I think that one must keep in mind that there are two kinds of checklists. There is a takeoff list, where you can always choose to go back to the ramp and fly another day if something doesn't check out, but the priority is different when someone is already in the air and something goes wrong. You can't decide to land on a different day. In that case you must rely on experience and knowledge to handle the situation as it presents itself. Sure, you can have some basic checks for things even in an emergency, but you can't know ahead of time how the problem is going to present itself.
In cases like that you have a set of general parameters, but the person "at the controls" needs the leeway both to clearly identify the nature of the problem and to mitigate it if possible. That might include calling in some extra eyes to identify things that might be going on with applications or other devices that aren't specifically network gear. So you can put a lot of process around changes in advance, but there isn't quite as much process available to manage incidents that strike out of the clear blue. Too much process at that point could impede progress in clearing the issue.
Capt. Sullenberger did not need to fill out an incident report, bring up a conference bridge, give a detailed description of what was happening with his plane, the status of all subsystems, and his proposed plan of action (subject to consensus of those on the conference bridge), and get approval for deviation from his initial flight plan before he took the required actions to land the plane as best he could under the circumstances. And while that is a bit extreme compared with most networks, in that lives are not often at stake, some concepts are the same (and there might be networks supporting various occupations on this planet where lives actually are at stake if the network fails during some sort of activity).
One of the most efficient shops I worked in was one where the production internet operation was owned by the engineering department. Corporate operations owned the internal corporate IT, but engineering owned the internet production data centers and network operations. If engineering released a code revision that blew up the network, the VP of Engineering was responsible for the entire picture, not just the software piece. The same was true where a networking change blew up the application. Having responsibility for the entire "system" (software, hardware platforms, and networking) under the same organization resulted in a much smoother operation, without backbiting, and with greater access to and sharing of resources among the application engineers, the systems administrators, and the network engineers.