Educause Security Discussion mailing list archives
Re: GWU and content monitoring
From: Roger Safian <r-safian () NORTHWESTERN EDU>
Date: Wed, 19 Jul 2006 09:19:30 -0500
I'll add my "me too": I also want to hear how the appliances actually perform. I have the same concerns about SSH/SSL. I sent out an earlier review of some tools to help with this task; this set of reviews includes the latest version of Spider. I'd love feedback and suggestions for other tools to test.

--

Recently there has been renewed interest in searching the contents of files on machines for sensitive information, such as Social Security Numbers. I have tested a few different tools, and wanted to share those results with you.

For this series of tests I used two different machines. The first was a basic Windows machine that contained nothing except a fresh installation of Windows XP and the files I needed to conduct this test. The second was also Windows XP, but had a loaded disk with almost 50 GB of data, including several hundred thousand files of various types and a Eudora installation with more than 500,000 messages.

On each machine I created test files that contained the following data.

Name               Value                  Value Type
Joe User           123-56-6789            SSN
Jane Doe           123 45 6789            SSN
Maury Eal          1234-123456-12345      AmEx
Jock Strap         0987 098765 09876      AmEx
Amanda Huginkiss   1234-5678-1234-5678    Visa/MC
Ben Dover          0987 6543 0987 6543    Visa/MC

I saved this in the following formats: Word, Excel, Adobe PDF and text. I then zipped all of these into a single zip archive.

Note that running these tools can be very CPU intensive. Often they make the machine virtually unusable. These tools can take many hours to run on a machine with a full load of data. It was not unusual for a test to take more than 12 hours to complete on the loaded machine. Scans went much faster on the basic machine, often finishing in less than an hour.

These tests were conducted on a Windows system, and the tools tested were the Windows versions of the tools. Many of these same tools are also available in a Unix version.
Using that version should allow these tools to work on many other machines, including various versions of Unix and the Macintosh.

This test assumes that there is not always going to be a known text string to search for, and that you will be looking for random strings of numbers that could be SSNs. If you do have a known string to search for, you can simply use the built-in search feature in Windows to find the text.

Summary

Make sure that the tool you are using checks all the file types that may contain sensitive data. PDF and ZIP files can cause problems for these tools, yet they are also likely places to find the data you want to discover. I would recommend that you use a combination of tools, perhaps Spider to find the files and PowerGREP to examine them. Expect that it will take at least a full day to collect and examine the data on a loaded machine.

Cornell Spider - 2.1.9a <http://www.cit.cornell.edu/computer/security/tools/>

This tool has been updated since the last test, and seems to work much better. The tool is easy to use, and comes preconfigured to look for SSNs and credit card numbers. Cornell claims to have put some intelligence into the tool to reduce the number of false positives. The results are put in a log file that contains the full name of each file with suspected sensitive data. While it still had a number of false positives, it does provide a good way to quickly determine where on the system you need to devote your time. Even on the loaded machine the number of false positives was quite reasonable. This tool did not find the data contained in the Adobe PDF file.

The downside to using this tool is that while it does provide you a list of files to look at, it does not tell you what data it found in each file. This makes it a little more difficult to find the potential sensitive data when the file is very large, such as a 250MB Eudora email file.

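As an illustration only, the kind of search described above (looking for digit strings shaped like SSNs or card numbers rather than for a known string) can be sketched in Python. This is not how Spider or any other tool reviewed here is implemented. The patterns and the names PATTERNS, scan_text and scan_file are assumptions made for this sketch, and the patterns are written only to match the six test-data formats listed earlier. The sketch reads plain text and the members of a ZIP archive; it does not decode Word, Excel or PDF content.

```python
import re
import zipfile
from pathlib import Path

# Hypothetical patterns: digit groups separated by a hyphen or a space,
# not embedded in a longer run of digits. They match on shape alone, so
# any other number with the same layout is reported as a false positive.
PATTERNS = {
    "SSN": re.compile(r"(?<!\d)\d{3}[- ]\d{2}[- ]\d{4}(?!\d)"),
    "AmEx": re.compile(r"(?<!\d)\d{4}[- ]\d{6}[- ]\d{5}(?!\d)"),
    "Visa/MC": re.compile(r"(?<!\d)\d{4}[- ]\d{4}[- ]\d{4}[- ]\d{4}(?!\d)"),
}


def scan_text(text):
    """Return (kind, matched_string) pairs, grouped by pattern in PATTERNS order."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((kind, match.group()))
    return hits


def scan_file(path):
    """Scan one file; if it is a ZIP archive, scan each member instead.

    Returns (location, kind, matched_string) triples. Bytes are decoded as
    latin-1 so that arbitrary content never raises a decoding error.
    """
    path = Path(path)
    hits = []
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            for name in archive.namelist():
                text = archive.read(name).decode("latin-1")
                for kind, found in scan_text(text):
                    hits.append((f"{path}!{name}", kind, found))
    else:
        text = path.read_bytes().decode("latin-1")
        for kind, found in scan_text(text):
            hits.append((str(path), kind, found))
    return hits
```

Unlike a log that lists file names only, this sketch returns the matched string next to each location, which makes it easier to judge whether a hit is real. It reads each file fully into memory, so a very large mailbox file would need to be processed in chunks instead.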
DTSearch Desktop 7.25 <http://www.dtsearch.com/>

This is a commercial tool, but you can download an evaluation version that will work for 30 days. The nice thing about this tool is that it builds an index of the data on your machine, so searches are much faster. The tool shows you both the name of each file and the text that was located in it. If you click on a line in the results it displays several additional lines from the file as well. This takes a lot of the work out of determining if a reported match is valid. The number of false positives was not unreasonable on the basic machine.

The tool simply would not work on the loaded machine, which is a serious deficiency. I did a second test on the loaded machine, indexing only the directory with my test data, and that worked. If you intend to use this tool on a loaded machine you will likely need to develop your own methodology for systematically searching the entire machine for data.

While this tool did find the data in the Adobe PDF file, it was not very clear that it had. For some reason it did not list the name or extension of the PDF files; it simply called them NAME. I suspect it chose this because that was the first field in the file.

File Hunter 3.5.6.0 <http://www.filehunter.com/download.htm>

This is a shareware program that sounds promising but really isn't. The program does not appear to have been updated recently, and the old graphics are very difficult to read. It also does not appear to search any files other than .TXT files. All that being said, it did find the test files very quickly, and when you click on the results you get a very easy to read display showing the strings you were searching for. This was the only tool I used that had different color codes for different matching strings. In the end, there are better choices, so I would not recommend this.

Google Desktop Search <http://desktop.google.com/?promo=mp-gds-v1-1>

This is really no more effective than the built-in Windows search feature, since it has no effective way to search for strings of digits. The fact that it is indexed does make the searches fast, but that does not seem to warrant installing this tool. My opinion is that unless this tool is already installed and you are looking for a specific string, you will get better results by using one or more of the other tools in this document.

PowerGREP 3.2.2 <http://www.powergrep.com/download.html>

This is another commercial tool. An evaluation download is available that will work for 15 days. This tool worked well on both the basic machine and the loaded machine. It has a very nice display that shows all the files containing matching data, and the matches are highlighted to make them easy to spot. The number of false positives was not unreasonable on the basic machine. On the loaded machine it did generate a number of false positives, but thanks to the way the data is displayed it was relatively easy to spot them.

One note is that by default the tool does not search hidden files. In my case, the application data directory was marked as hidden, so PowerGREP did not search that directory. There's an option under preferences to change the default behavior so it will search hidden directories, and I would recommend you use that option. I also had problems on the loaded machine until I changed the display option to "Do not show files or matches". You can change this later and view the results, so it's not as bad as it sounds.

Windows Grep 2.3.0.2269 <http://www.wingrep.com/download.htm>

This is a shareware program. It works pretty well. The results are displayed in a window and the matches are color coded so they are very easy to spot. You can also click on a match to open the file automatically. This program worked on both the basic and the loaded machine.
The downside is that it does not find strings contained in either PDF or ZIP files. While I like this tool for its simple, easy to use features, it does not appear to be under active development, and for me that is an area of concern. If cost is a concern, this is likely to be your best choice.

--
Roger A. Safian
r-safian () northwestern edu (email)
public key available on many key servers.
(847) 491-4058 (voice)
(847) 467-6500 (Fax)
"You're never too old to have a great childhood!"