Educause Security Discussion mailing list archives
Re: SSN file scanner (C source available)
From: Wyman Miles <wm63 () CORNELL EDU>
Date: Fri, 12 May 2006 10:17:41 -0400
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- --On Friday, May 12, 2006 8:51 AM -0500 Graham Toal <gtoal () UTPA EDU> wrote:
OK, you guys are just too great! Three more tools posted since yesterday and I like them all. I'd like to add some comments:

A tool that you can use to examine file contents is available by default on both Mac OS X and Unix systems. This tool is grep. There are versions of grep available for the PC as well.

hooks? Maybe clamav, since it is already open source?
Possibly. Something we thought of, but never pursued.
Grep can use regular expressions to look for data within a file. The following strings, when used in grep, will find Social Security and credit card numbers.

SSNs (123-45-6789 or 123 45 6789):
[0-9][0-9][0-9]\-[0-9][0-9]\-[0-9][0-9][0-9][0-9]|[0-9][0-9][0-9]\ [0-9][0-9]\ [0-9][0-9][0-9][0-9]
...

Please examine the contents of any files carefully. I know on my system, I found a file containing flow data that matched the social security number format. Just because you get a particular hit does not automatically mean the data is of concern.
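For readers who would rather script this than run grep by hand, the SSN pattern quoted above might be expressed roughly as follows. This is only a sketch: the names SSN_PATTERN and find_ssn_candidates are made up for illustration, and the credit card strings are left out because they are elided in the quoted text.

```python
import re

# Same two alternatives as the quoted grep pattern:
# three digits, two digits, four digits, separated by dashes or by spaces.
SSN_PATTERN = re.compile(
    r"[0-9]{3}-[0-9]{2}-[0-9]{4}|[0-9]{3} [0-9]{2} [0-9]{4}"
)


def find_ssn_candidates(text):
    """Return every substring with the SSN shape.

    These are candidates only; as the quoted advice says, a hit does not
    automatically mean the data is of concern.
    """
    return SSN_PATTERN.findall(text)


if __name__ == "__main__":
    sample = "id 123-45-6789, alt 123 45 6789, raw 123456789"
    for hit in find_ssn_candidates(sample):
        print(hit)
```

Like the grep version, this does not match nine consecutive digits, and it does nothing to rule out look-alike data such as flow records.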
Spider's approach to this problem (at least in the Linux variant) is to log roughly 1K of text on either side of the match. You can then visually inspect the log for false positives. Right now we do a home-grown encrypted syslog. We're headed toward HTTPS and XML. Both the win32 and linux flavors also exclude images, executables, encrypted files, and other things where the chance of a valid match is nearly zero and/or the chance of a false positive is high. [ Eudora's license key is a perfectly valid credit card number, damn them. Japanese phone numbers match \d{3}-\d{2}-\d{4}, etc. ]
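To make the context-logging idea concrete, here is a minimal sketch of capturing text on either side of a match and skipping file types that are unlikely to hold real data. It is not Spider's code: the function names, the 1024-character window, and the suffix list are all assumptions chosen for illustration.

```python
import re

SSN_SHAPE = re.compile(r"\d{3}-\d{2}-\d{4}")

# "Roughly 1K" on either side of the match; the exact figure is an assumption.
CONTEXT_CHARS = 1024

# Illustrative skip list only; the real exclusion rules are not given in the message.
SKIPPED_SUFFIXES = (".jpg", ".png", ".gif", ".exe", ".dll", ".gpg")


def should_scan(filename):
    """Return False for file types where a valid match is very unlikely."""
    return not filename.lower().endswith(SKIPPED_SUFFIXES)


def matches_with_context(text, context=CONTEXT_CHARS):
    """Yield (match, surrounding_text) pairs for later visual inspection."""
    for m in SSN_SHAPE.finditer(text):
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        yield m.group(0), text[start:end]


if __name__ == "__main__":
    for hit, ctx in matches_with_context("aaaa 123-45-6789 bbbb", context=2):
        print(repr(hit), repr(ctx))
```

A reviewer can then read the surrounding text to decide whether a hit is a real SSN or, say, a phone number.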
That was why I wrote my hack. Searching by regular expression is a useful tool (it's what all three solutions posted do) but if you're just using a generic regexp without any special knowledge of the domain (e.g. doing a check-digit calculation on a credit card number, or a validation of an apparent SSN) the noise from these tools is going to flood you with data and make it hard to see the signal. (You avoided most of the noise by not allowing 9 consecutive digits as a pattern...)
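The check-digit calculation mentioned above is commonly done with the Luhn algorithm, which card numbers are designed to satisfy. The sketch below is a generic version, not code from any of the posted tools; the function name luhn_valid and the 13-digit minimum length are assumptions.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn check.

    Spaces and dashes are ignored. The minimum length of 13 digits is an
    assumption used to discard short digit runs early.
    """
    digits = [int(c) for c in number if c in "0123456789"]
    if len(digits) < 13:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))  # a widely used test number
    print(luhn_valid("4111 1111 1111 1112"))
```

Passing the check does not prove a string is a card number (a license key can pass too), but failing it is a cheap way to discard a lot of noise.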
We're using the regex from BleedingSnort, which takes into account that SSNs with a first-three above 772 haven't been assigned. This cuts things down some, at the expense of the valuable bycatch that comes from us assigning 999- to international students. For credit card numbers, there are regexes that accurately identify Visa/MC/Disc (4xxx/5xxx/6011, etc.).
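As a rough illustration of this kind of domain filtering (not the BleedingSnort regex itself, which is not reproduced here), one could post-filter candidates as below. The names plausible_ssn, card_brand, and the allow_999 switch are hypothetical; the 772 ceiling is the figure quoted in this message, rejecting 000 is an added assumption, and the card prefixes are the simplified 4xxx/5xxx/6011 rule of thumb.

```python
import re

SSN_SHAPE = re.compile(r"(\d{3})-(\d{2})-(\d{4})")

# Highest assigned first-three, per the message above (2006 figure).
MAX_ASSIGNED_AREA = 772


def plausible_ssn(candidate, allow_999=False):
    """Return True if the first three digits fall in the assigned range.

    allow_999 is a hypothetical switch for sites that hand out 999-
    placeholders and want to keep that bycatch.
    """
    m = SSN_SHAPE.fullmatch(candidate)
    if m is None:
        return False
    area = int(m.group(1))
    if allow_999 and area == 999:
        return True
    # Excluding 000 is an assumption added here, not something the message states.
    return 1 <= area <= MAX_ASSIGNED_AREA


def card_brand(number):
    """Guess the brand from the leading digits (simplified prefixes only)."""
    digits = "".join(c for c in number if c in "0123456789")
    if digits.startswith("6011"):
        return "Discover"
    if digits.startswith("4"):
        return "Visa"
    if digits.startswith("5"):
        return "MasterCard"
    return None


if __name__ == "__main__":
    print(plausible_ssn("123-45-6789"), plausible_ssn("800-12-3456"))
    print(card_brand("6011 0000 0000 0000"))
```

Turning allow_999 on trades a bit more noise for the placeholder numbers a campus may care about.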
One other observation: searching for a fixed pattern string can be done *much* faster than searching for an arbitrary regexp of indeterminate length. Even searching for multiple fixed pattern strings at once can be done pretty efficiently.
Precompiling the patterns, as perl will do, or our libPCRE addition to dd/dcfldd does, speeds things up considerably. Win32 spider, using the .NET regex matching, is the slowest of them all. Hunting through 90K files on my pokey laptop for the 4 I've baited only takes about 30 minutes, though.
It doesn't have to be a fixed string (like an A/V signature), just a fixed *pattern* (with wild-cards for individual characters)
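A fixed pattern with per-character wild-cards, as described above, can be matched without a general regex engine because every candidate has the same length. The sketch below only illustrates the idea; the '#' wildcard convention and both function names are invented here, and pure Python will not show the speed benefit a C implementation would.

```python
DIGITS = frozenset("0123456789")


def compile_fixed_pattern(pattern):
    """Turn a pattern such as '###-##-####' into one allowed-character set per position.

    '#' stands for any digit (a convention made up for this sketch);
    every other character must match literally.
    """
    return [DIGITS if ch == "#" else frozenset(ch) for ch in pattern]


def find_fixed(text, compiled):
    """Return the start offsets where the fixed-length pattern matches."""
    n = len(compiled)
    hits = []
    for start in range(len(text) - n + 1):
        # Each position is a single set-membership test; no backtracking is needed.
        if all(text[start + i] in allowed for i, allowed in enumerate(compiled)):
            hits.append(start)
    return hits


if __name__ == "__main__":
    ssn = compile_fixed_pattern("###-##-####")
    print(find_fixed("x 123-45-6789 y", ssn))
```

Several such patterns could be checked in the same pass over the data, which is the multi-pattern case mentioned earlier.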
Wyman Miles
Senior Security Engineer
Cornell University, Ithaca, NY
(607) 255-8421

-----BEGIN PGP SIGNATURE-----
Version: Mulberry PGP Plugin v3.0
Comment: processed by Mulberry PGP Plugin

iQA/AwUBRGSZBcRE6QfTb3V0EQKztQCfR6uvOB+MysNSrIU1AgiXBvgAubwAmwbs
4FDT0ZL7wOPrN/GxueKnl887
=0pms
-----END PGP SIGNATURE-----