Educause Security Discussion mailing list archives
Re: SSN file scanner (C source available)
From: Wyman Miles <wm63 () CORNELL EDU>
Date: Fri, 12 May 2006 12:19:33 -0400
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

We toyed with various validation strategies but were left with the problem of parsing out the necessary strings. Ultimately, final analysis is left to the Mk I Eyeball, as it's never wrong.

When spider started out, its false positive rate was about half. Some simple pre-screening of files dropped that to less than 10%. Beyond that, it just isn't worth the programmatic effort. In most departments, a simple "grep -i .xls spider.log" is a silver bullet. Sadly, in some departments, simply clobbering all the Excel spreadsheets without bothering to spider is enough.

Anyone have numbers on their efforts? War stories? My boss is all about the charts, graphs, and pretty pictures:

- Cornell Dept A volunteered 20 random Windows desktops with the assurance *none* had confidential data. We found SSNs and credit card numbers on 19 systems.
- Any given faculty desktop gives us a 50/50 chance of having hits.
- The above includes the probability of finding the SSN of the system's primary user.
- We've found SSNs and CC#s in just about every imaginable location. Excel, Word, various DBs, e-mail (*sigh*), browser caches (*bigger sigh*), Windows update uninstall directories (!!!), you name it. Nothing is immune.
- We had one student who, when asked by his department (over e-mail, of course) for his SSN, replied, "I'm not comfortable sending that in e-mail. I'll come over." Spider found that in a Eudora mailbox alongside 100 of his peers who weren't so bright.
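The "parsing out the necessary strings" step might look roughly like the sketch below. It is not spider's code: the name parse_ssn and the accepted layouts (nine bare digits, or AAA-GG-SSSS with dashes or spaces as separators) are assumptions made for illustration. It splits a candidate into area, group, and serial so that a check such as validgroup() has integers to work with, and it rejects the all-zero fields that are never issued.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper (not from spider): split a candidate such as
   "123-45-6789" or "123456789" into its three fields.
   Returns 1 on success, 0 if the text is not shaped like an SSN. */
int parse_ssn(const char *s, int *area, int *group, int *serial)
{
    char d[9];
    int n = 0;
    size_t i, len = strlen(s);

    if (len != 9 && len != 11) return 0;
    for (i = 0; i < len; i++) {
        if (len == 11 && (i == 3 || i == 6)) {
            /* separator positions: accept a dash or a space */
            if (s[i] != '-' && s[i] != ' ') return 0;
            continue;
        }
        if (!isdigit((unsigned char)s[i])) return 0;
        d[n++] = s[i];
    }
    /* exactly nine digits have been collected at this point */
    *area   = (d[0]-'0')*100 + (d[1]-'0')*10 + (d[2]-'0');
    *group  = (d[3]-'0')*10 + (d[4]-'0');
    *serial = (d[5]-'0')*1000 + (d[6]-'0')*100 + (d[7]-'0')*10 + (d[8]-'0');

    /* area 000, group 00, and serial 0000 are never issued */
    if (*area == 0 || *group == 0 || *serial == 0) return 0;
    return 1;
}
```

A scanner would presumably call this on each nine-digit candidate it pulls out of a file and only then consult the area/group tables. Pulling candidates out of arbitrary file formats is the hard part and is not attempted here.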
int validgroup(int area, int group)
{
  int cur, even, under10;

  if (maxgroup[area] < 0) return FALSE;
  cur = maxgroup[area];
  even = ((cur&1) == 0);
  under10 = (cur < 10);
  if (debug) fprintf(stderr, "Our SSN's area is %d and group is %d. "
                             "max group for %d is %d\n", area, group, area, cur);

  if (!even && under10) {
    if (debug) fprintf(stderr, "group is odd and < 10\n");
    // our group must therefore also be odd and < 10
    if (group > cur) return FALSE; // range check
    return ((group&1) != 0) && (group < 10);
  }

  if (even && !under10) {
    if (debug) fprintf(stderr, "group is even and >= 10, "
                               "which also allows odd and < 10\n");
    // our group may be odd and < 10, or even and >= 10
    // first range check:
    if (group > cur) return FALSE; // range check
    return (((group&1) != 0) && (group < 10))
        || (((group&1) == 0) && (group >= 10));
  }

  if (even && under10) {
    if (debug) fprintf(stderr, "group is even and < 10, "
                               "which also allows even and >= 10, "
                               "plus odd and < 10\n");
    // even groups < 10 are issued in order, so they can be range checked
    if (((group&1) == 0) && (group < 10) && (group > cur)) return FALSE;
    // otherwise only illegal group would be if odd and >= 10 (note reversed logic)
    return (!(((group&1) != 0) && (group >= 10)));
  }

  // group must be odd and >= 10.
  // All groups now allowed, modulo range check if odd && >= 10.
  if (debug) fprintf(stderr, "group is odd and >= 10, which means "
                             "anything goes (but can be range checked "
                             "if our group is also odd)\n");
  if (((group&1) != 0) && (group >= 10) && (group > cur)) return FALSE;
  return TRUE;
}

I know we don't necessarily need to catch EVERY number for the exercise to be useful, but as long as people are working on custom tools, it might pay to be as accurate as possible. To be honest, our first pass will probably use simpler pattern matching to just get the thing done in a timely fashion, but I'd be interested in working out a complete set of expressions (incorporated with a Luhn check) to really get the best coverage. Hey, I'm about to start a CS PhD.. sounds like a project ;0

Sounds like we found our volunteer to construct a 'best of breed' tool :-) Mind you, I'm not sure if it would be enough to justify a PhD, unless standards have gone downhill a lot in recent years ;-)

G
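For the credit card side, the Luhn check mentioned above is short enough to sketch. This is a generic illustration, not code from any of the tools in this thread: the name luhn_ok, the decision to skip spaces and dashes, and the 13 to 19 digit length window are all assumptions. Walking from the rightmost digit, every second digit is doubled (subtracting 9 when the result exceeds 9), and the total must be divisible by 10.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the digits in s pass the Luhn
   checksum, 0 otherwise. Spaces and dashes are skipped; any other
   non-digit character fails the check. */
int luhn_ok(const char *s)
{
    int sum = 0, ndigits = 0, dbl = 0;
    size_t i = strlen(s);

    while (i > 0) {
        char c = s[--i];
        int d;

        if (c == ' ' || c == '-') continue;
        if (!isdigit((unsigned char)c)) return 0;
        d = c - '0';
        if (dbl) {            /* every second digit, counting from the right */
            d *= 2;
            if (d > 9) d -= 9;
        }
        sum += d;
        dbl = !dbl;
        ndigits++;
    }
    /* assumed length window for card numbers: 13 to 19 digits */
    return ndigits >= 13 && ndigits <= 19 && (sum % 10) == 0;
}
```

The checksum only catches digit strings that cannot be card numbers. Roughly one random 16-digit string in ten will still pass, so hits would still need the same manual review as SSN hits.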
Wyman Miles
Senior Security Engineer
Cornell University, Ithaca, NY
(607) 255-8421

-----BEGIN PGP SIGNATURE-----
Version: Mulberry PGP Plugin v3.0
Comment: processed by Mulberry PGP Plugin

iQA/AwUBRGS1lcRE6QfTb3V0EQJ/TgCglT/cg4R0OPQ1oKsGQdmsAKebj9sAn3Zi
qQYbf01JaOgxqFvE/uJboKYu
=v7L6
-----END PGP SIGNATURE-----