Educause Security Discussion mailing list archives

Re: SSN file scanner (C source available)


From: Wyman Miles <wm63 () CORNELL EDU>
Date: Fri, 12 May 2006 12:19:33 -0400

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

We toyed with various validation strategies but were left with the problem
of parsing out the necessary strings.  Ultimately, final analysis is left
to the Mk I Eyeball, as it's never wrong.

When spider started out, its false positive rate was about half.  Some
simple pre-screening of files dropped that to less than 10%.  Beyond that,
it just isn't worth the programmatic effort.

In most departments, a simple "grep -i .xls spider.log" is a silver bullet.
Sadly, in some departments, simply clobbering all the Excel spreadsheets
without bothering to spider is enough.

Anyone have numbers on their efforts?  War stories?  My boss is all about
the charts, graphs, and pretty pictures:

- - Cornell Dept A volunteered 20 random Windows desktops with the assurance
that *none* had confidential data.  We found SSNs and credit card numbers on
19 systems.

- - Any given faculty desktop gives us a 50/50 chance of having hits.

- - The above includes the probability of finding the SSN of the system's
primary user.

- - we've found SSNs and CC#s in just about every imaginable location.
Excel, Word, various DBs, e-mail (*sigh*), browser caches (*bigger sigh*),
Windows update uninstall directories (!!!), you name it.  Nothing is immune.

- - We had one student who, when asked by his department (over e-mail, of
course) for his SSN, replied, "I'm not comfortable sending that in e-mail.
I'll come over."  Spider found that in a Eudora mailbox alongside 100 of his
peers who weren't so bright.



int validgroup(int area, int group)
{
  int cur, even, under10;
  if (maxgroup[area] < 0) return FALSE;

  cur = maxgroup[area];
  even = ((cur&1) == 0);
  under10 = (cur < 10);

  if (debug) fprintf(stderr, "Our SSN's area is %d and group is %d. "
                             " max group for %d is %d\n",
                             area, group, area, cur);

  if (!even && under10) {
    if (debug) fprintf(stderr, "group is odd and < 10\n");
    // our group must therefore also be odd and < 10
    if (group > cur) return FALSE; // range check
    return ((group&1) != 0) && (group < 10);
  }

  if (even && !under10) {
    if (debug) fprintf(stderr, "group is even and >= 10, "
                               "which also allows odd and < 10\n");
    // our group may be odd and < 10, or even and >= 10
    // first range check:
    if (group > cur) return FALSE; // range check
    return (((group&1) != 0) && (group < 10))
        || (((group&1) == 0) && (group >= 10));
  }

  if (even && under10) {
    if (debug) fprintf(stderr, "group is even and < 10, "
                               "which also allows even and >= 10, "
                               "plus odd and < 10\n");
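    // only illegal groups are odd and >= 10 (note reversed logic),
    // plus even groups < 10 that lie beyond the current maximum
    if (((group&1) == 0) && (group < 10) && (group > cur)) return FALSE;
    return (!(((group&1) != 0) && (group >= 10)));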
  }

  // group must be odd and >= 10.
  // All groups now allowed, modulo range check if odd && >= 10.
  if (debug) fprintf(stderr, "group is odd and >= 10, which means "
                             "anything goes (but can be range checked "
                             "if our group is also odd)\n");
  if (((group&1) != 0) && (group >= 10) && (group > cur)) return FALSE;
  return TRUE;
}
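
The four cases above follow the order in which group numbers are handed
out within an area: odd 01-09, then even 10-98, then even 02-08, then odd
11-99.  A rough alternative sketch, assuming that ordering, is to map each
group to its position in the sequence and compare positions.  The names
group_rank and group_issued are hypothetical and not part of the posted
source; the caller is assumed to supply the highest issued group for the
area (the role maxgroup[area] plays above).

```c
/* Position of a group number in issuance order, 1..99; 0 if out of range.
 * Assumed order: odd 01-09, even 10-98, even 02-08, odd 11-99. */
static int group_rank(int group)
{
    int odd;

    if (group < 1 || group > 99) return 0;
    odd = group & 1;

    if (odd && group < 10)   return (group + 1) / 2;           /* 1,3,..,9  -> 1..5   */
    if (!odd && group >= 10) return 5 + (group - 10) / 2 + 1;  /* 10,..,98  -> 6..50  */
    if (!odd)                return 50 + group / 2;            /* 2,4,6,8   -> 51..54 */
    return 54 + (group - 11) / 2 + 1;                          /* 11,..,99  -> 55..99 */
}

/* Nonzero if `group` comes at or before `max_group` in issuance order.
 * `max_group` is the highest group issued so far for the area. */
static int group_issued(int group, int max_group)
{
    int r = group_rank(group);
    int m = group_rank(max_group);

    return r != 0 && m != 0 && r <= m;
}
```

With a highest issued group of 04, for example, group_issued(9, 4) is
nonzero (odd and < 10 comes earlier in the sequence) while
group_issued(8, 4) and group_issued(11, 4) are zero.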


I know we don't necessarily need to catch EVERY number for
the exercise to be useful, but as long as people are working
on custom tools, it might pay to be as accurate as possible.
To be honest, our first pass will probably use simpler
pattern matching to just get the thing done in a timely
fashion, but I'd be interested in working out a complete set
of expressions (incorporated with a Luhn check) to really get
the best coverage. Hey, I'm about to start a CS PhD.. sounds
like a project ;0
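
For reference, the Luhn check mentioned above can be sketched in a few
lines of C.  This is only an illustration, not code from any of the tools
discussed in this thread: the function name luhn_ok is hypothetical, and
the 13-19 digit length window and the skipping of spaces and hyphens are
assumptions about what a scanner would want to accept.

```c
#include <ctype.h>
#include <string.h>

/* Returns 1 if the digits in s pass the Luhn checksum, 0 otherwise.
 * Spaces and hyphens are skipped; any other non-digit rejects the string.
 * Assumption: only 13 to 19 digits are treated as a plausible card number. */
static int luhn_ok(const char *s)
{
    int sum = 0, ndigits = 0, dbl = 0;
    size_t i;

    /* Walk right to left, doubling every second digit. */
    for (i = strlen(s); i-- > 0; ) {
        unsigned char c = (unsigned char)s[i];
        int d;

        if (c == ' ' || c == '-') continue;
        if (!isdigit(c)) return 0;

        d = c - '0';
        if (dbl) {
            d *= 2;
            if (d > 9) d -= 9;   /* same as adding the two digits of d */
        }
        sum += d;
        dbl = !dbl;
        ndigits++;
    }
    return ndigits >= 13 && ndigits <= 19 && sum % 10 == 0;
}
```

A candidate string pulled out by pattern matching would be passed to
luhn_ok before being logged; the well-known test number
4111 1111 1111 1111 passes, and changing its last digit to 2 fails.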

Sounds like we found our volunteer to construct a 'best of breed'
tool :-)  Mind you I'm not sure if it would be enough to justify
a PhD, unless standards have gone downhill a lot in recent years ;-)


G



Wyman Miles
Senior Security Engineer
Cornell University, Ithaca, NY
(607) 255-8421
-----BEGIN PGP SIGNATURE-----
Version: Mulberry PGP Plugin v3.0
Comment: processed by Mulberry PGP Plugin

iQA/AwUBRGS1lcRE6QfTb3V0EQJ/TgCglT/cg4R0OPQ1oKsGQdmsAKebj9sAn3Zi
qQYbf01JaOgxqFvE/uJboKYu
=v7L6
-----END PGP SIGNATURE-----
