nanog mailing list archives

Re: Out-of-band paging


From: Steve Gibbard <scg () gibbard org>
Date: Wed, 28 Jul 2010 09:54:29 -0700 (PDT)

On Wed, 28 Jul 2010, Joel M Snyder wrote:

But... you can take this sort of 'single point of failure' argument almost as far as you want. In the security business (where I spend most of my time), I see people do this a lot--they get deep into the ultra-ultra-ultra marginal risk, which then takes an enormous amount of money to mitigate. It's an easy rat hole to explore, and often fun.

I think people are getting lost in the weeds here, and confusing technologies with paths.

My current employer has been upgrading its transit circuits, and spent time in the last few months worrying about diversity of the transit paths. But we didn't insist that one provider come in via metro ethernet, one via SONET, and one via a GRE tunnel. What we did was have them bring in network maps, and make them sell us circuits that weren't running down the same streets as our other providers.

The same goes for your paging network. If it's running over IP, that's not a huge problem. If anything, if you're an IP engineer, it probably makes it easier for you to audit the setup. Where you do have a problem is if it's running over YOUR IP network, but that's just a more acute version of the problem you'd have if your paging company were using fiber along the same path as somebody you were buying fiber from.

So, for paging, or out of band management, or redundant capacity, the rules seem pretty simple. Buy from somebody who's not your customer. Audit whatever information you can get about their network paths to verify that they're not sharing segments with you. And, for good measure, have some backup plans in case the notifications don't work.
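To make the audit step concrete, here is a rough sketch of the kind of check I mean, assuming you've already reduced each provider's network map to a list of named segments (conduits, streets, buildings). The provider names, segment names, and the shared_segments helper are all made up for illustration; real maps will be messier and will need some manual normalization before a comparison like this means anything.

```python
from itertools import combinations


def shared_segments(paths):
    """Return {(provider_a, provider_b): common segments} for each overlapping pair.

    `paths` maps a provider name to the list of physical segments its
    circuit traverses, as read off that provider's network map.
    """
    overlaps = {}
    # Compare every pair of providers once, in a stable (sorted) order.
    for (a, segs_a), (b, segs_b) in combinations(sorted(paths.items()), 2):
        common = set(segs_a) & set(segs_b)
        if common:
            overlaps[(a, b)] = common
    return overlaps


# Hypothetical data: none of these names come from a real network.
paths = {
    "our_transit": ["Main St conduit", "5th Ave conduit", "Carrier hotel A"],
    "paging_vendor": ["Oak St conduit", "Carrier hotel A"],
    "oob_vendor": ["River Rd conduit", "Carrier hotel B"],
}

for (a, b), common in shared_segments(paths).items():
    print(f"{a} and {b} share: {', '.join(sorted(common))}")
```

With this made-up data, the only overlap reported is between our_transit and paging_vendor, which both pass through the same building. That is the sort of shared segment worth asking the vendor about, whatever technology runs over it.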

You probably are better off if you have humans in a NOC, rather than a purely automated alerting system. Those people can notice if you're not responding, and be creative. Maybe they can figure out how to fix problems themselves. If all else fails, they may be able to dispatch somebody to your house. Remember, organizations have been tracking down critical personnel for far longer than there have been telephones.

Or are people here worried about a scenario in which the entire world is run off of one big interconnected IP network, and that when it fails it's not only not possible to make a phone call, but also not possible to get across town to alert the people who could fix it? It seems to me that if things really got that bad, it might be pretty hard for even the most oblivious on-call person to miss.

-Steve

