Educause Security Discussion mailing list archives

Re: Alerting tool


From: Mike Lococo <mike.lococo () NYU EDU>
Date: Tue, 27 Oct 2009 16:47:35 -0400

Peter Charbonneau wrote:
> We have a few tools that page us on "down" events.  All of these down
> events are hardware or service related (ping of the device, loss of
> HTTPD service).

We recently went down this road and landed on Zabbix, which we've been
relatively happy with.  My particular unit is quite small, and we're
only monitoring 15 or 20 systems (mostly hosts, not network gear... but
we do some SNMP devices like PDUs).  We wanted something with low
management overhead that would provide reasonable monitoring,
alerting, and reporting capabilities.  Things we liked about Zabbix:

* Zabbix has a robust collection framework (SNMP, or agent-based for
hosts if you'd like the extra capabilities it provides, plus trivial
checks like pings or HTTP service checks from the server).

* Flexible trigger and alert/action criteria.  The case you mentioned of
ensuring a minimum value change per time period would be easy to implement.

* Almost entirely managed via the web interface.  There's still a
learning curve, but I felt like Zabbix had a fairly low barrier to
productivity compared to the other monitoring systems I researched.
Between the manual and a couple of web tutorials I was on my feet in an
afternoon or two and performing useful tests.

* Adequate trending and reporting capabilities.  It provides a fault
dashboard and generates on-the-fly graphs for each monitored host.  Plus
you can construct "screens" to combine multiple graphs together.  All
this is configured via the web GUI, so it's easy to do but also somewhat
limited.  If you really need rich reporting, Zabbix is not for you.  If
you need basic trending and visual diagnostics, it works quite well.

* Open source, packaged in the EPEL repository for RHEL5 and in
Ubuntu... so installation and updating are fairly straightforward (some
manual database config on initial install, but after that normal updates
should be seamless).
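
To make the "minimum value change per time period" case a bit more
concrete, here is a rough Python sketch of the logic such a trigger
would express.  This is NOT Zabbix trigger syntax (in Zabbix you'd set
this up in the web interface); the function names, the sample format,
and the thresholds are all made up for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical sample format: (unix_timestamp_seconds, value)
Sample = Tuple[float, float]


def changed_enough(samples: Sequence[Sample], window_s: float,
                   min_delta: float, now: float) -> bool:
    """True if the value moved by at least min_delta in the last window_s seconds."""
    recent = [value for ts, value in samples if now - window_s <= ts <= now]
    if len(recent) < 2:
        # Fewer than two samples in the window: no change can be shown.
        return False
    return max(recent) - min(recent) >= min_delta


def should_alert(samples: Sequence[Sample], window_s: float,
                 min_delta: float, now: float) -> bool:
    """Alert when the monitored value has NOT changed enough in the window."""
    return not changed_enough(samples, window_s, min_delta, now)


if __name__ == "__main__":
    # A counter stuck at 100 for two minutes, checked over a 5-minute window.
    stuck = [(0.0, 100.0), (60.0, 100.0), (120.0, 100.0)]
    print(should_alert(stuck, window_s=300.0, min_delta=1.0, now=120.0))
```

The sketch treats "not enough data in the window" as a fault, which is a
judgment call; you might prefer to handle missing data as its own alert.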

Nagios is pretty much the standard, but we felt like Zabbix had a lower
barrier to entry and does as much or more of what we were looking for.
For example, Nagios requires add-ons to provide the kind of reporting
capabilities Zabbix has.  Other packages we considered were OpenNMS and
Zenoss.

The one thing I haven't done with Zabbix is work with really big SNMP
trees.  It might be a giant pain to map the OIDs for each device type,
or there might be some automatic/template method.  We're pulling only a
few data types via SNMP and did the mappings manually.  I know OpenNMS
is supposed to be very strong on the SNMP side, but I don't have much
personal experience with it.
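
For a sense of what "mapping OIDs per device type" amounts to, here is a
small Python sketch of a hand-maintained mapping.  It is only an
illustration of the idea, not how Zabbix stores items.  The three
system-group OIDs are the standard MIB-2 ones; the device types, the
"outletLoad" name, and its enterprise OID are placeholders I invented.

```python
from typing import Dict

# Standard MIB-2 system-group OIDs, available on most SNMP devices.
COMMON_OIDS: Dict[str, str] = {
    "sysDescr": "1.3.6.1.2.1.1.1.0",
    "sysUpTime": "1.3.6.1.2.1.1.3.0",
    "sysName": "1.3.6.1.2.1.1.5.0",
}

# Hypothetical per-device-type additions.  The PDU entry uses a made-up
# enterprise OID as a placeholder; a real one comes from the vendor MIB.
DEVICE_TEMPLATES: Dict[str, Dict[str, str]] = {
    "generic_host": {},
    "pdu": {"outletLoad": "1.3.6.1.4.1.99999.1.1.0"},
}


def oids_for(device_type: str) -> Dict[str, str]:
    """Return the common OIDs plus any extras defined for this device type."""
    merged = dict(COMMON_OIDS)
    merged.update(DEVICE_TEMPLATES.get(device_type, {}))
    return merged


if __name__ == "__main__":
    for name, oid in sorted(oids_for("pdu").items()):
        print(name, oid)
```

With only a few data types this stays manageable by hand; it's the
large trees with many device types where a template mechanism would
really matter.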

Thanks,
Mike Lococo
