nanog mailing list archives

NOC Automation / Best Practices

From: Charles N Wyble <charles () knownelement com>
Date: Wed, 08 Sep 2010 08:54:20 -0700

 NOGGERS,

The recent thread on ISP port blocking practice mentioned a way to identify infected machines in a highly automated manner. This got me thinking about other ways to automate aspects of network/system operations when it comes to tier-1 end user support (is it plugged in/is your wireless working etc) and tier-2/3 NOC support (abuse desk/incident response/routing issues etc).

I'm putting a very high degree of monitoring/healing in place to reduce the number of end user support calls that come in, and only bother a human when it's a real issue.

I'm in the process of launching a small regional wireless ISP / ad delivery network in Los Angeles CA. I have a small staff (I'm the only full time engineer, I have a couple NOC techs and 1 help desk tech who will provide escalation for any serious issues).


My initial thoughts/questions on the matter:

1) Are people integrating their PBX with their OSS/CRM systems? So when a call comes in the tech has all the relevant information? (perhaps even things like traceroute/port scan/AV/security health status based on their phone number or customer number?). This way if I take a user offline because they are spewing spam/viruses the tech can refer them to our IT support partner organization to clean up their PC. :)

2) What sort of automated alerting/reporting/circuit turn down/RADIUS lock out is done in regards to alerting customers or even taking them offline when they have a security issue?

3) What are folks doing in terms of frontline offloading? Do you have your PBX set to play a different recording when you have an outage so the NOC techs' phones don't go crazy and leave them free to deal with the issue?


4) Your comments here. :)
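
To make question 1 concrete, here is a minimal sketch of the kind of caller lookup meant there. Everything in it is hypothetical: the CUSTOMERS table, the field names, and the screen_pop function stand in for whatever a real OSS/CRM and PBX would expose, and no particular product's API is implied.

```python
import re
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    name: str
    quarantined: bool       # True if taken offline for spam/virus activity
    last_health_note: str   # free-text summary from monitoring (assumed field)


# Hypothetical in-memory stand-in for a CRM/OSS query, keyed by phone number.
CUSTOMERS = {
    "3105550100": Customer("C1001", "Example Customer A", True, "outbound spam detected"),
    "3105550101": Customer("C1002", "Example Customer B", False, "no issues on record"),
}


def normalize_number(caller_id: str) -> str:
    """Strip punctuation and a leading US country code from a caller ID."""
    digits = re.sub(r"\D", "", caller_id)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def screen_pop(caller_id: str) -> str:
    """Build the text a tech would see when the call arrives."""
    customer = CUSTOMERS.get(normalize_number(caller_id))
    if customer is None:
        return "Unknown caller: ask for customer number."
    lines = [
        f"{customer.name} ({customer.customer_id})",
        f"Health: {customer.last_health_note}",
    ]
    if customer.quarantined:
        lines.append("Account is offline for a security issue: refer to IT support partner.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(screen_pop("+1 (310) 555-0100"))
```

In a real deployment the dictionary lookup would be replaced by a query against the customer database, and the PBX would pass the caller ID in when the call is routed to a tech.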

The way I see it, an ounce of prevention is worth a pound of cure. Along those lines, I'm putting in some mitigation techniques, as follows (hopefully this will reduce the number of incidents and therefore calls to the abuse desk). I would appreciate any feedback folks can give me.


A) Force any outbound mail through my SMTP server with AV/spam filtering.

B) Force HTTP traffic through a SQUID proxy with SNORT/ClamAV running (several other WISPs are doing this with fairly substantial bandwidth savings. However I realize that many sites aren't cache friendly. Anyone know of a good way to check for that? Look at HTTP headers?). Do the bandwidth savings/security checking outweigh the increased support calls due to "broken" web sites?

C) Force DNS to go through my server. I hope to reduce DNS hijacking attacks this way.
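
On the "look at HTTP headers?" question in B, a rough sketch of what such a check might look like is below. It is a simplification, not a full implementation of the HTTP caching rules: the category names and the cache_friendliness function are made up for illustration, and the Expires header is taken at face value without parsing the date.

```python
from typing import Dict, Mapping


def parse_cache_control(value: str) -> Dict[str, str]:
    """Split a Cache-Control header into {directive: argument} pairs."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"')
    return directives


def cache_friendliness(headers: Mapping[str, str]) -> str:
    """Roughly classify a response from a shared proxy's point of view.

    Returns one of: 'uncacheable', 'revalidate', 'cacheable', 'unknown'.
    """
    h = {k.lower(): v for k, v in headers.items()}
    cc = parse_cache_control(h.get("cache-control", ""))

    # A shared cache must not store these responses at all.
    if "no-store" in cc or "private" in cc:
        return "uncacheable"
    # Storable, but every hit has to be checked with the origin first.
    if "no-cache" in cc or "no-cache" in h.get("pragma", "").lower():
        return "revalidate"
    # Explicit freshness lifetime; s-maxage takes priority for shared caches.
    for name in ("s-maxage", "max-age"):
        if name in cc:
            fresh = cc[name].isdigit() and int(cc[name]) > 0
            return "cacheable" if fresh else "revalidate"
    # Simplification: a past Expires date really means "already stale".
    if "expires" in h:
        return "cacheable"
    # Only validators present: the proxy can at least revalidate cheaply.
    if "etag" in h or "last-modified" in h:
        return "revalidate"
    return "unknown"


if __name__ == "__main__":
    print(cache_friendliness({"Cache-Control": "public, max-age=3600"}))
```

Run over the headers of the sites customers hit most often (for example, the headers object returned by urllib.request.urlopen), this would give a first estimate of how much of the traffic a proxy could actually serve from cache.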


Thanks!
