Security Basics mailing list archives
SQL Fuzzy Search for Illegal Content
From: eng.cortes () gmail com
Date: 26 Mar 2009 14:36:06 -0000
I have already posted this a few months ago, but this time I have gathered more information and I think this would be a bit less confusing than the previous one: I audited employees machines for 2 months looking for extensions like *.jpg, *.mp3, *.avi that could probably lead to illegal content. This work generated a huge database as an output (RESULT.mdb), with tons of user's filenames like: ID | UserName | FileName 1 | smith | C:\Windows\sex.jpg 2 | smith | C:\Windows\pic1.jpg 3 | smith | C:\Windows\pic1.jpg 4 | john | C:\Windows\pic2.jpg 5 | hitler | C:\Windows\hate.avi 6 | hitler | C:\Windows\pic2.jpg I wanna have a cool SQL query that when crossed against my DB would return me: 1 | smith | C:\Windows\sex.jpg 5 | hitler | C:\Windows\hate.avi Furthermore, I'd love to have it getting similar results, like: 5 | hitler | C:\Windows\h4te.avi For this, googling around I found out that there are fuzzy SQL queries which can match words with 85% similarity. How can this be done? I also have a phrase list (blacklist.txt) with words related to "intolerance/pornography/intolerance/violence/illegal drugs", so that might be used as a "db column" pattern for the search. So far, I did it using Access, with the "Like" statement: SELECT RESULT.ID, RESULT.UserName, RESULT.FileName FROM RESULT WHERE (( (RESULT.FileName) Like "*amateur*" OR (RESULT.FileName) Like "*amatuer*" : : tons of "Likes" with which of the words in my blacklist.txt file... : : )) ORDER BY RESULT.UserName; The output is too big, with lots of false-positives and that's why I'm writing you! :) I'm quite a newbie concerning to database management, so any help on this matter would be most welcome! Thanks RC ------------------------------------------------------------------------ This list is sponsored by: InfoSec Institute Learn all of the latest penetration testing techniques in InfoSec Institute's Ethical Hacking class. Totally hands-on course with evening Capture The Flag (CTF) exercises, Certified Ethical Hacker and Certified Penetration Tester exams, taught by an expert with years of real pen testing experience. http://www.infosecinstitute.com/courses/ethical_hacking_training.html ------------------------------------------------------------------------
Current thread:
- SQL Fuzzy Search for Illegal Content eng . cortes (Mar 26)