Educause Security Discussion mailing list archives

Re: GWU and content monitoring


From: Roger Safian <r-safian () NORTHWESTERN EDU>
Date: Wed, 19 Jul 2006 09:19:30 -0500

I'll add my "me too" on wanting to hear how the appliances 
actually perform.  I have the same concerns about SSH/SSL.

I sent out an earlier review of some tools to help with 
this task...this set of reviews includes the latest version
of Spider.  I'd love feedback and suggestions for other 
tools to test.

--
 
Recently there has been renewed interest in searching the contents of 
files on machines for sensitive information, such as Social Security 
Numbers.  I have tested a few different tools, and wanted to share 
those results with you.
 
For this series of tests I used two different machines.  The first was a 
basic Windows machine that contained nothing except the files I needed 
to conduct this test and a fresh installation of Windows XP.  The second 
also ran Windows XP, but had a loaded disk with almost 50 GB of data.  
This included several hundred thousand files of various types as well as a 
Eudora installation that had more than 500,000 messages.  On each machine 
I created test files that contained the following data.
 
Name                 Value                 Value Type

Joe User            123-56-6789            SSN
Jane Doe            123 45 6789            SSN
Maury Eal           1234-123456-12345      AmEx
Jock Strap          0987 098765 09876      AmEx
Amanda Huginkiss    1234-5678-1234-5678    Visa/MC
Ben Dover           0987 6543 0987 6543    Visa/MC
 
I saved this in the following formats: Word, Excel, Adobe PDF and text.  
I then zipped all of these into a single zip archive.
 
Note that running these tools can be very CPU intensive.  Often they make 
the machine virtually unusable.  These tools can take many hours to run on a 
machine with a full load of data.  It was not unusual for a test to take more 
than 12 hours to complete on the loaded machine.  Scans on the basic machine 
typically went much faster, often finishing in less than an hour.
 
These tests were conducted on a Windows system, and the tools tested were the 
Windows versions of the tools.  Many of these same tools are also available in 
a Unix version.  Using that version should allow these tools to work on many 
other machines including various versions of Unix and the Macintosh.
 
This test assumes that there is not always going to be a known text string to 
search for, and that you will be looking for random strings of numbers that 
could be SSNs.  If you do have a known string to search for, you can simply
use the built-in search feature in Windows to find the text.
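
As a rough illustration of what searching for "random strings of numbers" 
can mean in practice, here is a minimal Python sketch.  The patterns are 
assumptions modelled only on the test values listed above (digit groups 
separated by a hyphen, a space, or nothing); they are not taken from any of 
the tools reviewed here, and find_candidates is a hypothetical helper name.

```python
import re

# Assumed patterns based on the test values above: digit groups separated
# by a hyphen or a space, or written with no separator at all.
# The backreference \1 requires the same separator throughout a match.
PATTERNS = {
    "SSN": re.compile(r"(?<!\d)\d{3}([- ]?)\d{2}\1\d{4}(?!\d)"),
    "AmEx": re.compile(r"(?<!\d)\d{4}([- ]?)\d{6}\1\d{5}(?!\d)"),
    "Visa/MC": re.compile(r"(?<!\d)\d{4}([- ]?)\d{4}\1\d{4}\1\d{4}(?!\d)"),
}


def find_candidates(text):
    """Return (value_type, matched_text) pairs for every candidate in text."""
    hits = []
    for value_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((value_type, match.group(0)))
    return hits
```

Patterns this loose will also match unrelated numbers, which is why the 
false positive counts reported below matter.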
 
Summary - Make sure that the tool you are using checks all the possible files 
that may contain sensitive data.  PDF and ZIP files can cause problems for 
these tools, yet they are also likely places for the data you want to discover.  I would recommend 
that you use a combination of tools, perhaps Spider to find the files, and 
PowerGREP to examine them.  Expect that it will take at least a good day to 
collect and examine the data on a loaded machine.
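
To make the ZIP concern concrete, the sketch below uses Python's standard 
zipfile module to look inside an archive rather than at the compressed 
bytes.  The SSN-like pattern and the scan_zip name are assumptions for 
illustration.  It only finds strings stored as plain text in each member, so 
PDF, Word and Excel members would still need their own text extraction.

```python
import re
import zipfile

# Assumed SSN-style pattern, for illustration only.
SSN_LIKE = re.compile(rb"(?<!\d)\d{3}[- ]\d{2}[- ]\d{4}(?!\d)")


def scan_zip(zip_source):
    """Yield (member_name, matched_bytes) for SSN-like strings in a ZIP.

    zip_source may be a path or a file-like object.  Each member is read
    fully into memory as raw bytes, so very large members may need a
    chunked approach instead.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories have no content to scan
            data = archive.read(info)
            for match in SSN_LIKE.finditer(data):
                yield info.filename, match.group(0)
```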
 
Cornell Spider  - 2.1.9a
<http://www.cit.cornell.edu/computer/security/tools/>
 
This tool has been updated since the last test, and seems to work much better.  
The tool is easy to use, and comes preconfigured to look for SSN and credit 
card numbers.  Cornell claims to have put some intelligence into the tool to 
reduce the number of false positives.  The results are put in a log file that 
contains the full file name of the file with the suspected sensitive data.  
While it still had a number of false positives, it does provide a good way to 
quickly determine where on the system you need to devote your time.  This tool 
did not find the data contained in the Adobe PDF file.  Even on the loaded 
machine the number of false positives was quite reasonable.  The downside to 
using this tool is that while it does provide you a list of files to look at, 
it does not tell you what data it found in that file.  This makes it a little 
more difficult to find the potential sensitive data when the file is very 
large, such as a 250MB Eudora email file.
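
I do not know what checks Spider actually applies to reduce false positives.  
One commonly used heuristic for credit card numbers is the Luhn checksum, 
sketched below in Python purely as an example of the kind of filtering a 
tool might do.  The luhn_valid name is hypothetical.

```python
def luhn_valid(number_text):
    """Return True if the digits in number_text pass the Luhn checksum."""
    # Keep only ASCII digits; separators such as spaces or hyphens are ignored.
    digits = [int(ch) for ch in number_text if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

A made-up value such as 1234-5678-1234-5678 does not pass this check, so a 
filter like this trades fewer false positives for the risk of skipping 
invented or mistyped numbers.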
 
DTSearch Desktop - 7.25
<http://www.dtsearch.com/>
 
This is a commercial tool, but you can download an evaluation version that 
will work for 30 days.  The nice thing about this tool is that it produces 
an indexed list of the data on your machine so that searches are much faster.  
The tool shows you both the name of each file and the text that was 
located in the file.  If you click on a line in the results it displays 
several additional lines from the file as well.  This takes a lot of the work 
out of determining if the report is valid.  The number of false positives was 
not unreasonable on the basic machine.  The tool simply would not work on the 
loaded machine, which is a serious deficiency.  I did a second test on the 
loaded machine only indexing the directory with my test data and that worked.  
If you intend to use this tool on a loaded machine you will likely need to 
develop your own methodology for systematically searching the entire machine 
for data.  While this tool did find the data in the Adobe PDF file, it was 
not very clear that it had.  For some reason it did not list the name or 
extension of the PDF files, it simply called them NAME.  I suspect it chose 
this because that was the first field in the file.
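
The value of seeing the matched text plus a few surrounding lines can be 
shown with a short Python sketch.  This is not how DTSearch works 
internally; the pattern and the matches_with_context name are assumptions 
for illustration.

```python
import re

# Assumed SSN-style pattern, for illustration only.
SSN_LIKE = re.compile(r"(?<!\d)\d{3}[- ]\d{2}[- ]\d{4}(?!\d)")


def matches_with_context(lines, context=1):
    """Return a list of (line_number, matched_text, surrounding_lines)."""
    lines = [line.rstrip("\n") for line in lines]
    results = []
    for index, line in enumerate(lines):
        for match in SSN_LIKE.finditer(line):
            # Clamp the context window to the start and end of the input.
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            results.append((index + 1, match.group(0), lines[start:end]))
    return results
```

This version holds every line in memory, which is fine for small files; a 
very large mailbox file would call for a sliding window instead.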
 
File Hunter 3.5.6.0
<http://www.filehunter.com/download.htm>
 
This is a shareware program that sounds promising but really isn’t.  The 
program does not appear to have been updated recently and the old graphics are 
very difficult to read.  It also does not appear to search any files other than 
.TXT files.  All that being said, it did find the test files very quickly and 
when you click on the results you do get a very easy to read display showing 
you the strings you were searching for.  This was the only tool I used that had 
different color codes for different matching strings.  In the end, there are 
better choices so I would not recommend this.

Google Desktop Search
<http://desktop.google.com/?promo=mp-gds-v1-1>
 
This is really no more effective than the built-in Windows search feature, since 
it has no effective way to search for strings of digits.  The fact that it is 
indexed does make the searches fast, but that does not seem to warrant installing 
this tool.  My opinion is that unless this tool is already installed and you are 
looking for a specific string, you will get better results by using one or more of 
the other tools in this document.
 
PowerGREP - 3.2.2
<http://www.powergrep.com/download.html>
 
This is another commercial tool.  An evaluation download is available that will work 
for 15 days.  This tool worked well on both the basic machine and the loaded 
machine.  It has a very nice display that shows all the files containing the matching 
data and the matches are highlighted to make them easy to spot.  The number of false 
positives was not unreasonable on the basic machine.  On the loaded machine it did 
generate a number of false positives, but thanks to the way the data is displayed it 
was relatively easy to spot them.  One note is that by default the tool does not 
search hidden files.  In my case, the application data directory was marked as hidden, 
so PowerGREP did not search that directory.  There’s an option under preferences to 
change the default behavior so it will search hidden directories, and I would recommend 
you use that option.  I also had problems on the loaded machine until I changed the 
display option to “Do not show files or matches”.  You can change this later and view 
the results, so it’s not as bad as it sounds.
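
If you script your own scan, it is worth checking that hidden directories 
are covered too.  The Python sketch below is an assumption-laden example, 
not anything PowerGREP does: os.walk descends into hidden directories by 
default, and the hypothetical is_hidden helper just flags them so they show 
up in a report.

```python
import os
import stat


def is_hidden(path):
    """Best-effort check: Windows hidden attribute or a leading dot."""
    name = os.path.basename(path)
    if name.startswith(".") and name not in (".", ".."):
        return True
    # st_file_attributes only exists on Windows; treat it as 0 elsewhere.
    attributes = getattr(os.stat(path), "st_file_attributes", 0)
    return bool(attributes & stat.FILE_ATTRIBUTE_HIDDEN)


def walk_all_files(root):
    """Yield (path, in_hidden_dir) for every file under root.

    in_hidden_dir is True when the directory directly containing the
    file is itself hidden.
    """
    for directory, _subdirs, filenames in os.walk(root):
        hidden_dir = is_hidden(directory)
        for filename in filenames:
            yield os.path.join(directory, filename), hidden_dir
```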
 
Windows Grep - 2.3.0.2269
<http://www.wingrep.com/download.htm>
 
This is a shareware program.  It works pretty well.  The results are displayed in a window 
and the matches are color coded so they are very easy to spot.  You can also click on the 
match and you will automatically open the file.  This program worked on both the basic and 
the loaded machine.  The downside is that it does not find strings contained in either PDF 
or .ZIP files.  While I like this tool for its simple, easy-to-use features, it does not 
appear to be under active development and for me that is an area of concern.  If cost is a 
concern, this is likely to be your best choice.

-- 
Roger A. Safian 
r-safian () northwestern edu (email) public key available on many key servers.
(847) 491-4058   (voice)
(847) 467-6500   (Fax) "You're never too old to have a great childhood!"
