Educause Security Discussion mailing list archives

Re: Spider scripts?


From: Curt Wilson <curtw () SIU EDU>
Date: Wed, 29 Apr 2009 16:29:04 -0500

Some comments in-line:

Mike Lococo wrote:
We are currently working with Cornell's Spider to develop a process and
tool for our IT techs and others to scan computers for confidential data
(SSNs and CC#s).  Has anyone refined the scripts that Spider uses to help
lower the incidence of false positives?  If so, would you be willing to
share them with us?  You can reply to me offline if you like.  Thanks for
your help in this.

I haven't heard too many complaints about FP's recently, but the
standard tuning list I've heard mentioned in the past is...

0) Utilize the "Mark as false positive" feature so bad hits don't show
up in future scans.

1) Try Spider 4 (2008); it moves from a filetype blacklist to a filetype
whitelist.  By scanning only likely document types (Word, Excel, PDF,
email, etc.), FP's are cut way down.

Useful in a crunch, but we've seen sensitive data show up in unusual
places, so you are going to be reducing visibility with this setting.


2) Scan the user's profile directory instead of the whole drive.  You'll
miss stuff stored in temp directories, but will cut down on FP's
significantly.  (Spider 2008 does this by default)

On Windows boxen, we recommend a run of something like CCleaner first to
dump the temp files before running our DataFind tool (a modified version
of the useful Find_SSNs tool). The only downside here is that if some
process the user normally runs places data in the temp files, that data
will be missed. But FP's are reduced.


3) If you have a common area prefix for your organization, require it.
You might miss emails with 1-2 SSN's in them (if those 1-2 aren't from
the right prefixes) but you'll find the big spreadsheets with 2000 SSN's
in them (because at least some of those will be from the right prefixes).

4) Use a custom regex that requires delimiters.  False positives will go
WAY down, at the cost of missing undelimited strings, which several folks
have found to be very common.  Older spiders had several regex options,
some of which required delimiters and some of which didn't.  The latest
version seems to have standardized on matching with or without
delimiters and offers no option for alternate behavior.
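
To make the trade-offs in items 3 and 4 concrete, here is a rough Python
sketch. It is not Spider's actual regex or configuration; the pattern
names, the find_ssn_candidates helper, and the sample area prefixes are
all hypothetical and would need adjusting for a real scan.

```python
import re

# Strict form: 3-2-4 digits, both separators required and identical
# (hyphen or space). The lookarounds reject longer digit runs.
DELIMITED_SSN = re.compile(r"(?<!\d)(\d{3})([- ])(\d{2})\2(\d{4})(?!\d)")

# Loose form: separators optional. Catches undelimited strings but also
# matches any stray 9-digit number, so expect more false positives.
LOOSE_SSN = re.compile(r"(?<!\d)(\d{3})[- ]?(\d{2})[- ]?(\d{4})(?!\d)")


def find_ssn_candidates(text, require_delimiters=True, area_prefixes=None):
    """Return SSN-like strings found in text.

    area_prefixes is an optional set of 3-digit area numbers (as strings);
    when given, hits whose first three digits are not in the set are dropped.
    """
    pattern = DELIMITED_SSN if require_delimiters else LOOSE_SSN
    hits = []
    for match in pattern.finditer(text):
        area = match.group(1)  # first three digits in both patterns
        if area_prefixes is not None and area not in area_prefixes:
            continue
        hits.append(match.group(0))
    return hits


# Placeholder values for illustration only, not real data.
sample = "id 123-45-6789, phone 5551234567, raw 123456789, other 987-65-4321"
strict_hits = find_ssn_candidates(sample)
loose_hits = find_ssn_candidates(sample, require_delimiters=False)
prefix_hits = find_ssn_candidates(sample, area_prefixes={"123"})
```

On the sample string, the strict pattern skips the undelimited
"123456789" that the loose pattern picks up, and the prefix filter drops
the "987" hit. That is the same visibility-for-noise trade described
above.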

Thanks,
Mike Lococo



--
Curt Wilson
SIUC IT Security Officer & Security Engineer
