nanog mailing list archives

NOC Automation / Best Practices


From: Charles N Wyble <charles () knownelement com>
Date: Wed, 08 Sep 2010 08:54:20 -0700

 NOGGERS,

The recent thread on ISP port blocking practice mentioned a way to identify infected machines in a highly automated manner. This got me thinking about other ways to automate aspects of network/system operations when it comes to tier-1 end user support (is it plugged in/is your wireless working etc.) and tier-2/3 NOC support (abuse desk/incident response/routing issues etc.).

I'm putting a very high degree of monitoring/healing in place to reduce the number of end user support calls that come in, and to only bother a human when it's a real issue.

I'm in the process of launching a small regional wireless ISP / ad delivery network in Los Angeles, CA. I have a small staff (I'm the only full-time engineer; I have a couple of NOC techs and one help desk tech who will provide escalation for any serious issues).

My initial thoughts/questions on the matter:
1) Are people integrating their PBX with their OSS/CRM systems, so that when a call comes in the tech has all the relevant information (perhaps even things like traceroute/port scan/AV/security health status, looked up by phone number or customer number)? This way, if I take a user offline because they are spewing spam/viruses, the tech can refer them to our IT support partner organization to clean up their PC. :)

2) What sort of automated alerting/reporting/circuit turn down/RADIUS lock out are people doing to alert customers, or even take them offline, when they have a security issue?

3) What are folks doing in terms of frontline offloading? Do you have your PBX set to play a different recording when you have an outage, so the NOC techs' phones don't go crazy and the techs are left free to deal with the issue?

4) Your comments here. :)

The way I see it, an ounce of prevention is worth a pound of cure. Along those lines, I'm putting the following mitigation techniques in place (hopefully this will reduce the number of incidents and therefore calls to the abuse desk). I would appreciate any feedback folks can give me.

A) Force any outbound mail through my SMTP server with AV/spam filtering.
B) Force HTTP traffic through a Squid proxy with Snort/ClamAV running (several other WISPs are doing this with fairly substantial bandwidth savings. However, I realize that many sites aren't cache friendly. Anyone know of a good way to check for that? Look at HTTP headers?). Do the bandwidth savings/security checking outweigh the increased support calls due to "broken" web sites?
C) Force DNS to go through my server. I hope to reduce DNS hijacking attacks this way.
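
On the "look at HTTP headers?" idea in B), here is a rough Python sketch of what such a check might look like. It is only a heuristic under my own assumptions: the function name cache_friendliness and the three labels ("cacheable", "revalidate", "uncacheable") are made up for illustration, and the rules are a simplified reading of the standard Cache-Control, Pragma, Expires, ETag and Last-Modified response headers, not a description of how Squid itself decides.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def cache_friendliness(headers, now=None):
    """Roughly classify a response for a shared cache (hypothetical helper).

    Returns "cacheable", "revalidate", or "uncacheable".
    """
    now = now or datetime.now(timezone.utc)
    h = {k.lower(): v for k, v in headers.items()}

    # Parse Cache-Control into {directive: value}.
    directives = {}
    for part in h.get("cache-control", "").split(","):
        name, _, value = part.strip().lower().partition("=")
        if name:
            directives[name.strip()] = value.strip().strip('"')

    # A shared proxy must not store these responses.
    if "no-store" in directives or "private" in directives:
        return "uncacheable"
    # Storable, but every hit has to be checked with the origin.
    if "no-cache" in directives or h.get("pragma", "").strip().lower() == "no-cache":
        return "revalidate"

    # Explicit freshness lifetime; s-maxage takes priority for shared caches.
    for key in ("s-maxage", "max-age"):
        if key in directives:
            try:
                return "cacheable" if int(directives[key]) > 0 else "revalidate"
            except ValueError:
                return "revalidate"  # malformed value: assume already stale

    if "expires" in h:
        try:
            expires = parsedate_to_datetime(h["expires"])
            if expires.tzinfo is None:
                expires = expires.replace(tzinfo=timezone.utc)
        except (TypeError, ValueError):
            return "revalidate"  # e.g. "Expires: 0" means already expired
        return "cacheable" if expires > now else "revalidate"

    # No freshness info: a validator at least allows conditional requests.
    if "etag" in h or "last-modified" in h:
        return "revalidate"
    # Nothing to go on: conservatively treat as not cache friendly.
    return "uncacheable"
```

One way to use it would be to run it over the response headers for your customers' most requested URLs (for example via urllib.request.urlopen(url).headers.items()) and see what fraction comes back "cacheable" before committing to the proxy.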

Thanks!


