nanog mailing list archives
Re: Monitoring system recommendation
From: Mikael Falkvidd <mikael.falkvidd () op5 com>
Date: Tue, 7 Jun 2016 09:42:12 +0200
On Monday, June 6, 2016, Manuel Marín <mmg () transtelco net> wrote:

> Dear Nanog community
>
> We are currently planning to upgrade our monitoring system (Opsview) due to
> scalability issues and I was wondering what do you recommend for monitoring
> 5000 hosts and 35000 services. We would like to use a monitoring system that
> is compatible with the nagios plugin format, however we are not sure if
> systems like Icinga/Shinken/Op5 are the way to go.
>
> Is someone using systems like Op5 or Icinga2 for monitoring > 5000 hosts?
> Would you recommend commercial systems like Sevone, Zabbix, etc instead of
> open source ones?
We (op5) have customers running > 50,000 hosts and > 300,000 services. So 5,000 hosts is generally not a problem.

As mentioned by Jeff, the forking model *can* become a problem. Small binaries that don't load a lot of libraries fork pretty fast. A test we made some time ago showed a 15 minute load peak at 3.89 (on 24 cores/hyperthreads) when checking 100,000 services every 5 minutes. Check latencies were 0.8 seconds max and 0.002 seconds avg. Average CPU load was 15%.

Specs for the machine used:
Dell PowerEdge R620
2x Intel Xeon E5-2620
24 GB RAM
Dell PERC H710 hardware RAID card
RAID10 on 4x300GB 15kRPM SAS drives

So a single (now almost vintage) server can handle 300 plugin executions per second without breaking a sweat. Scaling up is definitely a possibility, but scaling out (using mod gearman, mk or merlin, all open source) is available as well.

Complex plugins, for example check_vmware_api which loads the large VMware perl SDK can get you in trouble though. I suggest you run a test with the plugin mix you are planning to use.

If scaling out is not an option, and you want to stay in the nagios/naemon world, a custom worker can be developed to get rid of the loading overhead. Documentation is available at http://www.naemon.org/documentation/developer/workers.html

Full disclosure: I work as development team lead at op5

best regards
Mikael Falkvidd
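The throughput figures quoted in the reply can be checked with some quick arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the 5 minute check interval applied to the 35,000 services from the original question is an assumption, since the question does not state an interval.

```python
def checks_per_second(services: int, interval_seconds: float) -> float:
    """Average plugin executions per second when every service
    is checked once per interval."""
    return services / interval_seconds


def load_per_core(load_average: float, cores: int) -> float:
    """Load average divided by the number of cores/hyperthreads."""
    return load_average / cores


if __name__ == "__main__":
    interval = 5 * 60  # 5 minutes, as in the test described in the reply

    # Figures taken from the reply: 100,000 services every 5 minutes.
    tested_rate = checks_per_second(100_000, interval)

    # Figures from the original question: 35,000 services.
    # ASSUMPTION: the same 5 minute interval is used for all of them.
    planned_rate = checks_per_second(35_000, interval)

    # Peak 15 minute load of 3.89 on 24 cores/hyperthreads.
    peak_load = load_per_core(3.89, 24)

    print(f"tested rate:   {tested_rate:.1f} checks/s")
    print(f"planned rate:  {planned_rate:.1f} checks/s")
    print(f"headroom:      {tested_rate / planned_rate:.1f}x")
    print(f"load per core: {peak_load:.2f}")
```

Under these assumptions the tested rate works out to roughly 333 checks per second, in line with the "300 plugin executions per second" mentioned in the reply, and the planned setup would need roughly a third of that. This says nothing about heavier plugins, which is why a test with the real plugin mix is still needed.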