nanog mailing list archives

Re: Operations task management software?


From: Lee <ler762 () gmail com>
Date: Wed, 27 Jul 2016 20:20:29 -0400

On 7/27/16, David Hubbard <dhubbard () dino hostasaurus com> wrote:
Full automation is planned but does not eliminate the need for the software.
Zero human auditing of fully automated processes and data collection is
not acceptable to various certifying entities, the relevant auditors, or the
inevitably involved lawyers, and automation alone won’t pick up on bad data,
like a bad thermometer or SNMP counter that says a CRAC is 65 degrees when
it’s really 90. So I’m still going to need a management solution to the
issue, whether it’s to tell someone to do the work or to tell someone to
check the automated work.

You have a ticketing system - right?  Create a cron job that creates a
ticket to check whatever.
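A minimal sketch of that cron job, assuming a ticketing system that opens one ticket per email received at a queue address (many ticketing systems can be set up this way). The script name open_check_ticket.sh, the function open_check_ticket, and the address netops-audit@example.com are hypothetical placeholders; the script only builds a message and hands it to sendmail -t.

```shell
#!/bin/sh
# Hypothetical helper: open a ticket for a routine check by mailing the
# ticketing system. Assumes the ticket system opens one ticket per
# message received at TICKET_ADDR (a placeholder address).
TICKET_ADDR=${TICKET_ADDR:-netops-audit@example.com}

# open_check_ticket SUBJECT DETAILS: print the ticket as an email message
# (headers, a blank line, then the body).
open_check_ticket() {
    printf 'To: %s\n' "$TICKET_ADDR"
    printf 'Subject: [routine check] %s (%s)\n' "$1" "$(date +%Y-%m-%d)"
    printf '\n%s\n' "$2"
}

# Example crontab entry (08:00 on the 1st and 15th of each month):
#   0 8 1,15 * * /usr/local/bin/open_check_ticket.sh "Verify rancid backups" "Confirm every router config was collected."

# Only send mail when invoked with both arguments (e.g. from cron).
if [ $# -ge 2 ]; then
    open_check_ticket "$1" "$2" | /usr/sbin/sendmail -t
fi
```

Running on the 1st and 15th only approximates "bi-weekly"; plain cron has no "every other week" field, so a stricter schedule would need a week-number test inside the script.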

Regards,
Lee



David

On 7/27/16, 7:19 PM, "Lee" <ler762 () gmail com> wrote:

    On 7/27/16, David Hubbard <dhubbard () dino hostasaurus com> wrote:
    > Hi all, curious if anyone has recommendations on software that helps
    > manage routine duties assigned to operations staff?

    Have computers do the routine scut work - not people.

    > For example, let’s say we have a P&P that says someone from the netops
    > group must check that Rancid is successfully backing up all router configs
    > bi-weekly.

    You've got the source code for rancid, so change rancid-run to do
    something like
      LOGFILE=$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S`; export LOGFILE
    change the
      ) >$LOGDIR/$GROUP.`date +%Y%m%d.%H%M%S` 2>&1
    to
      ) >$LOGFILE 2>&1

    and then in control_rancid do something like
      grep "clogin error:" "$LOGFILE" | sort | uniq -c >"$TMP.fail"
      if [ -s "$TMP.fail" ]; then
         # got some output, mail the report
         ...
      fi
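    The two edits work together because export LOGFILE puts the log's path
    into the environment of control_rancid, assuming (as in the stock
    rancid-run) that control_rancid is started inside the redirected
    subshell. Below is a sketch of the check as a stand-alone function. The
    name check_clogin_errors and the ADMIN recipient are hypothetical; it
    keeps the report in a variable instead of the $TMP.fail file, and it
    treats an unreadable log as a failure rather than silently reporting
    nothing.

```shell
#!/bin/sh
# check_clogin_errors LOG: hypothetical helper, not part of rancid.
# Prints each distinct "clogin error:" line in LOG with a count.
# Returns 0 if the log is clean, 1 if errors were found, and 2 if the
# log cannot be read (a missing log should never look like "all clear").
check_clogin_errors() {
    log=$1
    if [ ! -r "$log" ]; then
        printf 'cannot read rancid log %s\n' "$log"
        return 2
    fi
    # sort brings duplicate lines together so uniq -c can count them.
    failures=$(grep "clogin error:" "$log" | sort | uniq -c)
    if [ -n "$failures" ]; then
        printf '%s\n' "$failures"
        return 1
    fi
    return 0
}

# Example use near the end of control_rancid, where LOGFILE was exported
# by rancid-run and ADMIN is a hypothetical recipient address:
#   if ! report=$(check_clogin_errors "$LOGFILE"); then
#       printf '%s\n' "$report" | mail -s "rancid $GROUP: clogin errors" "$ADMIN"
#   fi
```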

    Do the same type of thing for checking on
    > backup failures, backup internet circuit status, out of band
    > interfaces, etc.

    Automate the checks, put the scripts in crontab & mail out an
    "OhNoes!" or "all clear" msg at the end.   At which point you're left
    with the problem of making sure the managers are looking at the emails
    & making sure whatever problems are found actually get fixed :)
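    A sketch of that pattern, under stated assumptions: each check is a
    small executable script in one directory (for example the rancid log
    check, backup job status, backup circuit and out of band interface
    tests) that prints details and exits non-zero when something is wrong.
    The script name netops-checks.sh, the directory
    /usr/local/etc/netops-checks.d, the --cron flag, and the MAILTO default
    are hypothetical. It mails on every run, including "all clear", so a
    day with no mail is itself a sign the checker broke, and a run that
    finds no checks at all is reported as a failure.

```shell
#!/bin/sh
# netops-checks.sh: hypothetical wrapper that runs every check script in
# a directory and mails one summary. Each check prints details and exits
# non-zero when it finds a problem.

# run_checks DIR: print a report; return 0 only if at least one check
# ran and none failed.
run_checks() {
    dir=$1
    ran=0
    failed=0
    for check in "$dir"/*; do
        # Skip the unexpanded glob (empty dir) and non-executables.
        [ -f "$check" ] && [ -x "$check" ] || continue
        ran=$((ran + 1))
        if output=$("$check" 2>&1); then
            printf 'ok      %s\n' "${check##*/}"
        else
            failed=$((failed + 1))
            printf 'FAILED  %s\n' "${check##*/}"
            if [ -n "$output" ]; then
                # Indent the check's own output under its name.
                printf '%s\n' "$output" | sed 's/^/        /'
            fi
        fi
    done
    if [ "$ran" -eq 0 ]; then
        # No checks at all is a problem, not an "all clear".
        printf 'FAILED  no checks found in %s\n' "$dir"
        return 1
    fi
    printf '%d of %d checks failed\n' "$failed" "$ran"
    [ "$failed" -eq 0 ]
}

# main: pick the subject from run_checks' status and mail the report.
# CHECK_DIR and MAILTO are hypothetical settings with placeholder defaults.
main() {
    if report=$(run_checks "${CHECK_DIR:-/usr/local/etc/netops-checks.d}"); then
        subject="all clear"
    else
        subject="OhNoes!"
    fi
    printf '%s\n' "$report" |
        mail -s "netops checks: $subject" "${MAILTO:-netops@example.com}"
}

# Example crontab entry (every day at 07:30):
#   30 7 * * * /usr/local/bin/netops-checks.sh --cron
if [ "${1:-}" = "--cron" ]; then
    main
fi
```

    Keeping each check as its own script means a new item in the P&P is a
    new file in the directory rather than an edit to the wrapper.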

    Regards,
    Lee




