nanog mailing list archives

Re: Mitigating human error in the SP


From: David Hiers <hiersd () gmail com>
Date: Tue, 2 Feb 2010 18:38:40 -0800

If your manager pretends that they can manage humans without a few
well-worn human factors books on their shelf, quit.

David

On Tue, Feb 2, 2010 at 5:36 PM, Michael Dillon
<wavetossed () googlemail com> wrote:
The actual error happened when someone was troubleshooting a turn-up
for a customer who had, in the past, had their ethertype set
wrong.  It wasn't a provisioning problem as much as someone
troubleshooting why it didn't come up with the customer.  Ironically,
the NOC was on the phone when it happened, and the switch was rebooted
almost immediately and the outage lasted 5 minutes.

This is why large operators have a "ready for service" protocol. The customer
is never billed until it is officially RFS, and to make it RFS requires more
than an operational network, it also requires the customer to agree in writing
that they have a fully functional connection.

This is another way of hiding human error, because now the up-down-up is
just part of the provisioning process. There is a record of the RFS date-time
so if the customer complains about an outage BEFORE that point, they can
be politely reminded when RFS happened and that charging does not
start until AFTER that point.

--Michael Dillon
