Educause Security Discussion mailing list archives
Re: SSN file scanner (C source available)
From: Steve Lovaas <steven.lovaas () COLOSTATE EDU>
Date: Fri, 12 May 2006 08:32:08 -0600
What a great series of posts! We're working on exactly the same thing at Colorado State. Guess most state legislatures are too :)

As you think about reducing false positives and becoming more confident of your hits, remember the following details of numbering:

1) Valid SSNs never start with '8' (and those beginning with '9' are "Individual Taxpayer Identification Numbers" issued to foreign nationals and their dependents), so a regex ought to start with [0-79] at the very least.

2) Valid credit card numbers are not always 16 digits long. Diners Club are 14 digits, AMEX are 15 digits, some older Visa numbers are 13 digits, and most of the rest are 16 digits, although the numbering scheme allows for numbers as long as 19 digits.

More detailed info: Several people have coded versions of the "Luhn algorithm", which checks that the number is a potentially valid credit card number. It's the algorithm used to determine the last digit of your credit card number, which is a checksum. One of the most interesting sources talking about this is at http://www.merriampark.com/anatomycc.htm, and there's also an article at http://javascript.about.com/library/blccard.htm - both include Javascript for automating the check. Basically (from the merriampark article):

"For a card with an even number of digits, double every odd numbered digit and subtract 9 if the product is greater than 9. Add up all the even digits as well as the doubled-odd digits, and the result must be a multiple of 10 or it's not a valid card. If the card has an odd number of digits, perform the same addition doubling the even numbered digits instead."
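The Luhn check quoted above might be sketched roughly as follows in Python. This is only an illustration, not code from either linked article: the function name `luhn_valid` is made up for this example, and the 13 to 19 digit length window is an assumption taken from the lengths mentioned in this post. Counting from the rightmost digit and doubling every second one is equivalent to the even/odd-length rule in the quote.

```python
def luhn_valid(number: str) -> bool:
    """Return True if `number` looks like a Luhn-valid card number.

    Hypothetical helper for illustration; spaces and hyphens are ignored.
    """
    cleaned = number.replace(" ", "").replace("-", "")
    # Assumption: only accept plain ASCII digits, 13 to 19 of them.
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False
    if not 13 <= len(cleaned) <= 19:
        return False

    total = 0
    # Walk from the rightmost digit (the check digit) leftwards and
    # double every second digit, subtracting 9 when the product exceeds 9.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

A scanner would presumably run something like this only on strings that already matched a digit pattern, so that the checksum acts as a second filter against false positives.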
Also, as long as I'm being pedantic, there are invalid SSN patterns that a truly comprehensive search would exclude (this wording from Wikipedia, http://en.wikipedia.org/wiki/Social_security_number, though the concept can be found lots of places):

"Currently, a valid SSN cannot have the first three digits (the area number) above 772, the highest area number which the Social Security Administration has allocated. There are also special numbers which will never be allocated:

* Numbers with all zeros in a digit group (000-xx-xxxx, xxx-00-xxxx, xxx-xx-0000).
* Numbers of the form 666-xx-xxxx, probably due to the potential controversy (see Number of the Beast). Though the omission of this area number is not acknowledged by the SSA, it remains unassigned.
* Numbers from 987-65-4320 to 987-65-4329 are reserved for advertising use."

I know we don't necessarily need to catch EVERY number for the exercise to be useful, but as long as people are working on custom tools, it might pay to be as accurate as possible. To be honest, our first pass will probably use simpler pattern matching to just get the thing done in a timely fashion, but I'd be interested in working out a complete set of expressions (incorporated with a Luhn check) to really get the best coverage. Hey, I'm about to start a CS PhD... sounds like a project ;0

Thanks,
Steve Lovaas

Wyman Miles wrote:
At their heart, all of these tools are one flavor or another of pcregrep. A somewhat organized "find it and nuke it" movement has started at Cornell, where the departments are conducting periodic, organized searches for confidential data and either encrypting, moving, or removing it. What we're striving to build here are LAN-capable tools with centralized logging and unattended operation to support that effort.

--On Friday, May 12, 2006 8:17 AM -0500 Roger Safian <r-safian () NORTHWESTERN EDU> wrote:

If it's of any use, here's a post I made a while back to our local user group about looking for SSNs and credit card numbers using grep.
--
==============================================================
Steven Lovaas, MSIA, CISSP
Network & Security Resource Manager
Academic Computing & Network Services
Colorado State University
970-297-3707
Steven.Lovaas () ColoState EDU
==============================================================
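The SSN exclusion rules quoted from Wikipedia in the message above could be combined with a simple pattern match along these lines. This is a minimal sketch, not anyone's actual scanner from this thread: the names `SSN_RE`, `HIGHEST_AREA`, `plausible_ssn` and `find_ssns` are invented for the example, only the hyphenated 3-2-4 form is matched, and the 772 cap is taken from the quote as it stood at the time of posting, so it is kept as a constant that can be changed.

```python
import re

# Assumption: only hyphenated candidates (ddd-dd-dddd) that are not
# embedded in a longer run of digits.
SSN_RE = re.compile(r"(?<!\d)(\d{3})-(\d{2})-(\d{4})(?!\d)")

# Highest allocated area number according to the quoted text (assumption:
# treated as configurable because the figure is a snapshot).
HIGHEST_AREA = 772


def plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Apply the exclusion rules from the quote to one candidate."""
    if area == "000" or group == "00" or serial == "0000":
        return False  # all zeros in any digit group
    if area == "666":
        return False  # unassigned area number
    if int(area) > HIGHEST_AREA:
        return False  # above the highest allocated area
    # Reserved advertising range; already excluded by the cap above,
    # but kept explicit in case HIGHEST_AREA is raised.
    if area == "987" and group == "65" and 4320 <= int(serial) <= 4329:
        return False
    return True


def find_ssns(text: str) -> list[str]:
    """Return the candidates in `text` that survive the exclusion rules."""
    return [
        match.group(0)
        for match in SSN_RE.finditer(text)
        if plausible_ssn(*match.groups())
    ]
```

Note that the cap also drops numbers beginning with '9', so a site that wants to flag ITINs as well would need a separate rule for them.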