Educause Security Discussion mailing list archives
Re: Spider scripts?
From: Curt Wilson <curtw () SIU EDU>
Date: Wed, 29 Apr 2009 16:29:04 -0500
Some comments in-line:
Mike Lococo wrote:
We are currently working with Cornell's Spider to develop a process and tool for our IT techs and others to scan computers for confidential data (SSNs and CC#s). Has anyone refined the scripts that Spider uses to help lower the incidence of false positives? If so, would you be willing to share them with us? You can reply to me offline if you like. Thanks for your help in this.

I haven't heard too many complaints about FP's recently, but the standard tuning list I've heard mentioned in the past is...

0) Utilize the "Mark as false positive" feature so bad hits don't show up in future scans.

1) Try Spider 4 (2008), it moves from a filetype blacklist to a filetype whitelist. By scanning only likely document types (word, excel, pdf, email, etc), FP's are cut way down.
Useful in a crunch, but we've seen sensitive data show up in unusual places, so you are going to be reducing visibility with this setting.
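For anyone rolling their own scanner, the whitelist idea in item 1 might look roughly like the sketch below. This is not Spider's code; the extension list and the function names are made up for illustration, and the thread does not say which file types Spider 4 actually whitelists.

```python
import os

# Hypothetical whitelist of "likely document" extensions (an assumption,
# not Spider's actual list).
DOCUMENT_EXTENSIONS = {
    ".doc", ".docx", ".xls", ".xlsx", ".pdf", ".txt", ".csv", ".eml", ".pst",
}


def is_whitelisted(path, extensions=DOCUMENT_EXTENSIONS):
    """Return True if the file's extension is on the whitelist (case-insensitive)."""
    return os.path.splitext(path)[1].lower() in extensions


def iter_candidate_files(root, extensions=DOCUMENT_EXTENSIONS):
    """Yield every file under root whose extension is whitelisted."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_whitelisted(name, extensions):
                yield os.path.join(dirpath, name)
```

Anything with an extension outside the set is never opened, which is exactly the loss of visibility mentioned above: sensitive data sitting in an odd file type will not be seen.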
2) Scan the user's profile directory instead of the whole drive. You'll miss stuff stored in temp directories, but will cut down on FP's significantly. (Spider 2008 does this by default)
On Windows boxen, we recommend a run of something like CCleaner first to dump the temp files before running our DataFind tool (a modified version of the useful Find_SSNs tool). The only downside here is that if some process the user normally performs places data in the temp files, that data will be missed. But FP's are reduced.
3) If you have a common area prefix for your organization, require it. You might miss emails with 1-2 SSN's in them (if those 1-2 aren't from the right prefixes) but you'll find the big spreadsheets with 2000 SSN's in them (because at least some of those will be from the right prefixes).

4) Use a custom regex that requires delimiters. False positives will go WAY down, at the cost of missing undelimited strings, which several folks have found to be very common. Older spiders had several regex options, some of which required delimiters and some of which didn't. The latest version seems to have standardized on matching with or without delimiters and offers no option for alternate behavior.

Thanks,
Mike Lococo
--
Curt Wilson
SIUC IT Security Officer & Security Engineer
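To make the trade-offs in items 3 and 4 concrete, here is a minimal sketch of a delimiter-requiring SSN pattern next to a looser one, with an optional area-prefix filter. These are not Spider's or Find_SSNs' actual regexes; the patterns, the `find_ssns` name, and its parameters are assumptions for illustration only, and any prefix values you pass in would be your own.

```python
import re

# Strict pattern: 3-2-4 digits with a dash or space between groups.
# The backreference \2 forces the same delimiter in both positions.
DELIMITED_SSN = re.compile(r"(?<!\d)(\d{3})([- ])(\d{2})\2(\d{4})(?!\d)")

# Loose pattern: delimiters are optional, so bare 9-digit runs also match.
LOOSE_SSN = re.compile(r"(?<!\d)(\d{3})[- ]?(\d{2})[- ]?(\d{4})(?!\d)")


def find_ssns(text, require_delimiters=True, area_prefixes=None):
    """Return candidate SSN strings found in text.

    require_delimiters: use the strict pattern (fewer false positives,
        but undelimited numbers are missed).
    area_prefixes: optional set of 3-digit strings; when given, only hits
        whose first three digits are in the set are kept.
    """
    pattern = DELIMITED_SSN if require_delimiters else LOOSE_SSN
    hits = []
    for match in pattern.finditer(text):
        area = match.group(1)  # first three digits in both patterns
        if area_prefixes is not None and area not in area_prefixes:
            continue
        hits.append(match.group(0))
    return hits
```

On a string such as "ID 123-45-6789, part 987654321", the strict pattern reports only the dashed number, while `require_delimiters=False` also reports the nine-digit part number. Passing `area_prefixes={"123"}` keeps only hits that start with 123, so a lone SSN with a different prefix is dropped, but a large spreadsheet is still likely to produce at least some hits.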