nanog mailing list archives
NOC Automation / Best Practices
From: Charles N Wyble <charles () knownelement com>
Date: Wed, 08 Sep 2010 08:54:20 -0700
NOGGERS,
The recent thread on ISP port-blocking practices mentioned a way to identify infected machines in a highly automated manner. This got me thinking about other ways to automate aspects of network/system operations when it comes to tier-1 end-user support (is it plugged in/is your wireless working, etc.) and tier-2/3 NOC support (abuse desk/incident response/routing issues, etc.).
I'm putting a very high degree of monitoring/self-healing in place to reduce the number of end-user support calls that come in, and to only bother a human when there's a real issue.
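One way to structure the "only bother a human when it's a real issue" rule is to try an automated fix first and page someone only after a check has failed several cycles in a row. A minimal sketch in Python, assuming hypothetical probe and heal callables supplied by whatever monitoring system is in use; the `Check` class, `run_cycle`, and the threshold of 3 are illustrative, not any particular monitoring product's API:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Check:
    """One monitored item (hypothetical): a health probe plus an optional self-heal action."""
    name: str
    probe: Callable[[], bool]                  # returns True when healthy
    heal: Optional[Callable[[], None]] = None  # e.g. bounce a port, reboot a radio
    failures: int = 0                          # consecutive unhealed failures

def run_cycle(checks: List[Check], escalate_after: int = 3) -> List[str]:
    """Probe every check once and return the names that should page a human."""
    pages = []
    for c in checks:
        if c.probe():
            c.failures = 0
            continue
        c.failures += 1
        if c.heal is not None:
            c.heal()
            if c.probe():  # self-healing worked; nobody gets paged
                c.failures = 0
                continue
        if c.failures >= escalate_after:
            pages.append(c.name)
    return pages

if __name__ == "__main__":
    state = {"ap_up": False}

    def reboot_ap() -> None:
        state["ap_up"] = True

    checks = [Check("ap-17", lambda: state["ap_up"], reboot_ap),
              Check("uplink", lambda: False)]  # uplink stays down, no heal action
    for _ in range(3):
        pages = run_cycle(checks)
    print(pages)  # ['uplink']
```

In the demo, the access point is fixed by its reboot hook and never pages anyone; the uplink pages only on its third consecutive failure, which keeps one-off blips away from the NOC techs.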
I'm in the process of launching a small regional wireless ISP / ad delivery network in Los Angeles, CA. I have a small staff: I'm the only full-time engineer, and I have a couple of NOC techs and one help desk tech who will provide escalation for any serious issues.
My initial thoughts/questions on the matter:
1) Are people integrating their PBX with their OSS/CRM systems, so that when a call comes in the tech has all the relevant information (perhaps even things like traceroute/port scan/AV/security health status, looked up by phone number or customer number)? This way, if I take a user offline because they are spewing spam/virii, the tech can refer them to our IT support partner organization to clean up their PC. :)
2) What sort of automated alerting/reporting/circuit turn-down/RADIUS lockout are people doing, either to alert customers or to take them offline when they have a security issue?
3) What are folks doing in terms of front-line offloading? Do you have your PBX set to play a different recording when you have an outage, so the NOC techs' phones don't go crazy and they are left free to deal with the issue?
4) Your comments here. :)
The way I see it, an ounce of prevention is worth a pound of cure. Along those lines, the mitigation techniques I'm putting in place are as follows (hopefully they will reduce the number of incidents and therefore calls to the abuse desk). I would appreciate any feedback folks can give me.
A) Force all outbound mail through my SMTP server with AV/spam filtering.
B) Force HTTP traffic through a SQUID proxy with SNORT/ClamAV running. Several other WISPs are doing this with fairly substantial bandwidth savings. However, I realize that many sites aren't cache-friendly. Does anyone know of a good way to check for that? Look at the HTTP headers? Do the bandwidth savings and security checking outweigh the increased support calls due to "broken" web sites?
C) Force DNS to go through my server. I hope to reduce DNS hijacking attacks this way.
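On the cache-friendliness question in B) above: the standard signals are indeed the HTTP caching headers (Cache-Control, Expires, Pragma, Vary, plus validators such as ETag and Last-Modified). A minimal heuristic sketch in Python, assuming you already have a response's headers as name/value pairs; the function name and the reason strings are illustrative, and an empty result means "no obvious red flags", not a guarantee that Squid will cache the object:

```python
from typing import Dict, List, Mapping

def cacheability_issues(headers: Mapping[str, str]) -> List[str]:
    """Heuristically list reasons a response may be unfriendly to a shared proxy cache."""
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive

    # Parse Cache-Control into {directive: value}, e.g. "max-age=0" -> {"max-age": "0"}.
    cc: Dict[str, str] = {}
    for part in h.get("cache-control", "").lower().split(","):
        name, _, value = part.strip().partition("=")
        if name:
            cc[name.strip()] = value.strip().strip('"')

    reasons = []
    for directive in ("no-store", "private", "no-cache"):
        if directive in cc:
            reasons.append(f"Cache-Control: {directive}")
    for directive in ("max-age", "s-maxage"):
        if cc.get(directive) == "0":
            reasons.append(f"Cache-Control: {directive}=0")
    if "no-cache" in h.get("pragma", "").lower():  # HTTP/1.0-style opt-out
        reasons.append("Pragma: no-cache")
    if "set-cookie" in h:
        reasons.append("Set-Cookie present (often per-user content)")
    if h.get("vary", "").strip() == "*":  # Vary: * never matches a stored response
        reasons.append("Vary: *")
    has_freshness = "expires" in h or "max-age" in cc or "s-maxage" in cc
    has_validator = "last-modified" in h or "etag" in h
    if not (has_freshness or has_validator):
        reasons.append("no freshness lifetime or validator")
    return reasons
```

The headers could come from, for example, the `.headers` attribute of a response returned by `urllib.request.urlopen(url)`, which supports `.items()`. Once traffic is flowing, the result codes in Squid's own access.log (TCP_HIT, TCP_MISS, and so on) give the empirical answer for which sites actually cache.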
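For question 1 above, the core of a PBX-to-CRM "screen pop" is a lookup keyed on caller ID. A minimal sketch in Python, assuming a hypothetical in-memory CRM dictionary keyed by 10-digit phone number and a hypothetical `Customer` record; a real deployment would query the OSS/CRM database from whatever hook the PBX offers, which is not shown here:

```python
import re
from dataclasses import dataclass
from typing import Dict

@dataclass
class Customer:
    """Hypothetical CRM record for one subscriber."""
    account: str
    name: str
    ip: str
    quarantined: bool = False   # e.g. taken offline for spewing spam/malware
    last_scan: str = "unknown"  # summary from a hypothetical security scan

def normalize_number(raw: str) -> str:
    """Reduce caller ID to its last 10 digits (assumes North American numbering)."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:]

def screen_pop(caller_id: str, crm: Dict[str, Customer]) -> str:
    """Build the text a tech sees when the PBX hands over a call."""
    cust = crm.get(normalize_number(caller_id))
    if cust is None:
        return "Unknown caller: ask for account number"
    lines = [f"{cust.name} (acct {cust.account}, IP {cust.ip})",
             f"Security scan: {cust.last_scan}"]
    if cust.quarantined:
        lines.append("QUARANTINED: refer to IT support partner for cleanup")
    return "\n".join(lines)
```

Normalizing the number before the lookup matters because caller ID may arrive as "+1 (213) 555-0142" or "2135550142" depending on the carrier and PBX.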
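For question 2, one common shape for automated abuse handling is a graduated response: notify on the first report, move the subscriber into a walled garden after repeated reports, and only then cut them off. A minimal sketch in Python, assuming a hypothetical per-subscriber sliding window and illustrative thresholds; the actual enforcement (for example a RADIUS Disconnect-Request or CoA-Request as defined in RFC 5176, or an ACL change) is left to whatever tooling is in use:

```python
import time
from collections import deque
from enum import Enum
from typing import Deque, Dict, Optional

class Action(Enum):
    NONE = "none"
    NOTIFY = "notify customer by email/SMS"
    WALLED_GARDEN = "redirect web traffic to abuse notice page"
    DISCONNECT = "reject at RADIUS / send Disconnect-Request"

class AbuseTracker:
    """Sliding-window count of abuse events per subscriber (thresholds are hypothetical)."""

    def __init__(self, window_s: float = 3600, notify_at: int = 1,
                 garden_at: int = 3, disconnect_at: int = 10) -> None:
        self.window_s = window_s
        # Checked from most to least severe; the first threshold reached wins.
        self.thresholds = [(disconnect_at, Action.DISCONNECT),
                           (garden_at, Action.WALLED_GARDEN),
                           (notify_at, Action.NOTIFY)]
        self.events: Dict[str, Deque[float]] = {}

    def record(self, subscriber: str, now: Optional[float] = None) -> Action:
        """Record one abuse event and return the action it should trigger."""
        now = time.time() if now is None else now
        q = self.events.setdefault(subscriber, deque())
        q.append(now)
        while q and q[0] <= now - self.window_s:  # drop events outside the window
            q.popleft()
        for limit, action in self.thresholds:
            if len(q) >= limit:
                return action
        return Action.NONE
```

Because old events age out of the window, a customer who cleans up their PC drops back to the mildest response instead of staying flagged forever.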
Thanks!