nanog mailing list archives

Re: Mitigating human error in the SP


From: Larry Sheldon <LarrySheldon () cox net>
Date: Tue, 02 Feb 2010 09:44:08 -0600

On 2/2/2010 6:26 AM, gb10hkzo-nanog () yahoo co uk wrote:



Otherwise, as Suresh notes, the only way to eliminate human error
completely is to eliminate the presence of humans in the
activity.
and, hence, by reference.....
Automated config deployment / provisioning.

That's the funniest thing I've read all day... ;-)

A little pessimistic rant.... ;-)

Who writes the scripts that you use? Who writes the software that you
use? There will always be at least one human somewhere, and where
there's a human writing software tools, there's scope for bugs and
unexpected issues. Whether inadvertent or not, they will always be
there.

If the excrement is going to hit the proverbial fan, try as you might
to stop it, it will happen. Nothing in the IT / ISP / Telco world is
ever going to be perfect; it's far too complex, with too many dependencies.
Yes, you might play in your perfect little labs until the cows come
home ..... but there always has been and always will be an element of
risk when you start making changes in production.

Face it, unless you follow the rigorous change control and
development practices that they use for avionics or other high-risk
environments, you are always going to be left with some element of
risk.

How much risk your company is prepared to take is something for the
men in black (suits) to decide because it correlates directly with
how much $$$ they are prepared to throw your way to help you mitigate
the risk .....;-)

That's my 2<insert_currency>  over ...... thanks for listening (or
not !).... ;-)

Add to that the stuff that always sounds like a cop-out, even to the victims: the "human error" made by people not on your payroll, the vendors that are responsible for the misleading (or absent) documentation, for the CLI stuff that doesn't work just the way a reasonable person would expect it to, for the hardware that fails dirty, and on and on (a very long list). All of it is exacerbated by management that cheaps out on equipment, software, documentation, training, and staff.

Even with a lab built on a rich fabric of equipment, most of those other things will still be there to contend with.

A reasonable and competent management will not only provide what is needed for a reasonable error rate (which indeed can approach one over 5 nines) but will also provide the means of recovery when the inevitable happens. That might involve "needless" expense like additional staff, redundant equipment, alternate paths, ...

But it won't involve whippings until the morale improves or reductions in staff and funding until the errors go away.


--
"Government big enough to supply everything you need is big enough to
take everything you have."

Remember:  The Ark was built by amateurs, the Titanic by professionals.

Requiescas in pace o email
Ex turpi causa non oritur actio
Eppure si rinfresca

ICBM Targeting Information:  http://tinyurl.com/4sqczs
http://tinyurl.com/7tp8ml
        

