nanog mailing list archives

Re: DOs and DONTs for small ISP


From: William Herrin <bill () herrin us>
Date: Wed, 5 Jun 2019 11:45:03 -0700

On Wed, Jun 5, 2019 at 5:44 AM William Waites <ww () styx org> wrote:
It's not enough to have monitoring and a ticket system. You need to pay
attention to them, care for them and feed them. I can't count the number
of ticket systems full of ancient and irrelevant things or monitoring
systems that people have forgotten about or don't know how to add new
stuff to. Even the cycle of,

Some points to consider when monitoring your network:

1. Beware early automation. If you write a generator to go and monitor all
your stuff without addressing how operators will make one-off changes
(which is hard to design well), the other operators will find the monitoring
system unusable. That means they won't update it when stuff is added and
changed, which quickly makes it useless.

2. Be careful aggregating alarms. That big green or red light is useless. The
operator has to be able to start with the alarm and immediately trace back
to exactly which tests and results bubbled up into the aggregate, and from
there to the malfunctioning component. If you lose this information during
the aggregation process, you're just producing noise.

3. Every alarm must be actionable. When the light goes red, what -exactly-
do you want the operator to do as a result? Don't create an alarm until you
can offer a detailed and specific answer, and link that answer to the alarm
so the operator doesn't have to hunt for it.
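
As a rough illustration of the points about aggregation and actionable
alarms, here is a minimal Python sketch. It is not taken from any particular
monitoring system: the CheckResult and Alarm classes, their fields, and the
example.net runbook URL are all hypothetical names made up for this example.
The idea is only that an aggregate keeps every underlying test result, and
that an alarm cannot be created without an attached operator action.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CheckResult:
    """Outcome of one concrete test against one component (hypothetical model)."""
    component: str
    test: str
    ok: bool
    detail: str = ""


@dataclass
class Alarm:
    """An aggregate alarm that keeps its evidence and its operator instructions."""
    name: str
    runbook: str  # what exactly the operator should do; required
    results: List[CheckResult] = field(default_factory=list)

    def __post_init__(self):
        # Refuse to create an alarm that has no specific action attached.
        if not self.runbook.strip():
            raise ValueError(f"alarm {self.name!r} has no runbook; not actionable")

    @property
    def red(self) -> bool:
        # The aggregate is red if any contributing test failed.
        return any(not r.ok for r in self.results)

    def failing(self) -> List[CheckResult]:
        # The individual results are never discarded, so the operator can
        # trace the aggregate back to the exact test and component.
        return [r for r in self.results if not r.ok]

    def report(self) -> str:
        if not self.red:
            return f"{self.name}: GREEN"
        lines = [f"{self.name}: RED", f"  action: {self.runbook}"]
        for r in self.failing():
            lines.append(f"  {r.component} / {r.test}: {r.detail}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    alarm = Alarm(
        name="pop-east uplinks",
        runbook="https://wiki.example.net/runbooks/uplink-down",
        results=[
            CheckResult("edge1", "bgp-session", ok=True),
            CheckResult("edge2", "bgp-session", ok=False, detail="peer idle"),
        ],
    )
    print(alarm.report())
```

In this sketch the red report names the failing component and test and puts
the action right next to them, so the operator doesn't have to hunt for
either one.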

Regards,
Bill Herrin

-- 
William Herrin
bill () herrin us
https://bill.herrin.us/
