Security Basics mailing list archives

SQL Fuzzy Search for Illegal Content


From: eng.cortes () gmail com
Date: 26 Mar 2009 14:36:06 -0000

I have already posted this a few months ago, but this time I have gathered more information and I think this would be a 
bit less confusing than the
previous one:

I audited employees machines for 2 months looking for extensions like *.jpg, *.mp3, *.avi that could probably lead to 
illegal content. 

This work generated a huge database as an output (RESULT.mdb), with tons of user's filenames like:

ID | UserName | FileName
1 | smith | C:\Windows\sex.jpg
2 | smith | C:\Windows\pic1.jpg
3 | smith | C:\Windows\pic1.jpg
4 | john | C:\Windows\pic2.jpg
5 | hitler | C:\Windows\hate.avi
6 | hitler | C:\Windows\pic2.jpg

I wanna have a cool SQL query that when crossed against my DB would return me:

1 | smith | C:\Windows\sex.jpg
5 | hitler | C:\Windows\hate.avi

Furthermore, I'd love to have it getting similar results, like:

5 | hitler | C:\Windows\h4te.avi

For this, googling around I found out that there are fuzzy SQL queries which can match words with 85% similarity. How 
can this be done?

I also have a phrase list (blacklist.txt) with words related to "intolerance/pornography/intolerance/violence/illegal 
drugs", so that might be used as a "db column" pattern for the search.

So far, I did it using Access, with the "Like" statement:

SELECT RESULT.ID, RESULT.UserName, RESULT.FileName
FROM RESULT
WHERE ((

(RESULT.FileName) Like "*amateur*"
OR (RESULT.FileName) Like "*amatuer*"
:
:
tons of "Likes" with which of the words in my blacklist.txt file...
:
:
))
ORDER BY RESULT.UserName;

The output is too big, with lots of false-positives and that's why I'm writing you! :)

I'm quite a newbie concerning to database management, so any help on this matter would be most welcome!

Thanks
RC

------------------------------------------------------------------------
This list is sponsored by: InfoSec Institute

Learn all of the latest penetration testing techniques in InfoSec Institute's Ethical Hacking class. 
Totally hands-on course with evening Capture The Flag (CTF) exercises, Certified Ethical Hacker and Certified 
Penetration Tester exams, taught by an expert with years of real pen testing experience.

http://www.infosecinstitute.com/courses/ethical_hacking_training.html
------------------------------------------------------------------------


Current thread: