Full Disclosure mailing list archives
Re: Squashing supposed hacker profiling
From: "Steven Adair" <steven () securityzone org>
Date: Tue, 19 Jun 2007 10:30:30 -0400 (EDT)
Amazing, you were able to find multiple instances where a script-based gender guesser was wrong? This is more profound than the initial research itself. I suppose I could post a series of 10 writings where it was correct, but what would that prove? Did you try reading this from the same page:

-----
A few quick notes:

* The system generates a simple estimate (profiling). While Gender Guesser may be 60% - 70% accurate, it is not 100% accurate. This is better than random guessing (50%), but should not be interpreted as "fact". In particular, men should not be offended if it says you write like a girl.

* People write differently in different forums. For example, a single writing sample may appear MALE for informal writing but test as FEMALE for formal writing. Be sure to interpret the results based on the appropriate writing style. (These notes, for example, are more informal/blog than formal/non-fiction.)

* Many factors can impact the interpretation from any single person's writing. The content, knowledge of the material, age of the author, nationality, experience, occupation, and education level can all impact writing styles. For example, a woman who has spent 20 years working in a male-dominated field may write like her co-workers. Similarly, professional female writers (and experienced hobbyists) frequently use male writing styles. Gender Guesser does not take any of these factors into account.

* Email can blur the lines between formal and informal writing styles. An informal email from a manager may have traces of formality, and a formal email from a 12-year-old is likely to be informal compared to a letter from a 40-year-old. Do not be surprised if email messages sent to public forums test incorrectly -- when writing for an audience, people commonly use informal words, phrases, and slang within a formal writing style.

* Quotations, block quotes, and included text usually carry the gender from the initial author. Be sure to remove quoted text from any pasted content. Also, significant changes from a copy-editor can result in a different gender analysis. (A male editor may make a female author's news article appear MALE or as a Weak MALE.)

* Lyrics, lists, poems, and prose are special writing styles. This tool is unlikely to classify these texts correctly.

* The system needs a paragraph or two of text in order to observe word repetition. A good sample should have 300 words or more. Fewer words can lead to more variation in accuracy, and a single sentence is unlikely to generate an accurate result. Pasting the same text multiple times will not change the results!

* People tend to write with consistent styles. If the system misclassifies a particular author, then other writings by the same author will likely be misclassified the same way.

* And most importantly: This is an ESTIMATE. Please do not email me about instances where it made the wrong determination. (I've seen it generate incorrect results lots of times already.)
-----

I can't tell if you're trolling or you have actually taken the bait. You do realize the person that you were responding to in earlier posts is not actually Neal Krawetz, right?
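The quoted notes ask for two preparation steps before a sample is pasted into Gender Guesser: remove quoted text, and supply roughly 300 words or more. A minimal Python sketch of that pre-check might look like the following. It assumes quoted text is marked with a leading ">" as in conventional plain-text email; the function name and return shape are hypothetical and are not part of the Gender Guesser tool.

```python
MIN_WORDS = 300  # sample size suggested in the quoted notes


def prepare_sample(message, min_words=MIN_WORDS):
    """Drop quoted lines and report whether enough text remains.

    Assumption: quoted lines start with '>' (plain-text email convention).
    Returns a tuple (cleaned_text, word_count, long_enough).
    """
    kept = [
        line for line in message.splitlines()
        if not line.lstrip().startswith(">")
    ]
    cleaned = "\n".join(kept).strip()
    word_count = len(cleaned.split())
    return cleaned, word_count, word_count >= min_words
```

A message that falls below the threshold after its quoted lines are stripped is the kind of sample the notes describe as unlikely to produce an accurate result.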
All female authors...

Your so-called gender guessing mechanism is flawed either way you want to cut it. You could try fuzzy math based on theories to profile anyone on this list, but unless you have feasible and PROVEN without reasonable doubt, it's all a guessing game bottom line. Anyhow back to security, sociolinguistics is not meant for this list.

According to Dr. Krawetz's Gender Guesser... (http://www.hackerfactor.com/GenderGuesser.html#Analyze)

http://girlygeekdom.blogspot.com/
Genre: Informal
Female = 104
Male = 602
Difference = 498; 85.26%
Verdict: MALE
Genre: Formal
Female = 116
Male = 239
Difference = 123; 67.32%
Verdict: MALE
REALITY: WRONG

http://www.darkreading.com/blog.asp?blog_sectionid=342&WT.svl=blogger1_5
Genre: Informal
Female = 442
Male = 555
Difference = 113; 55.66%
Verdict: Weak MALE
Genre: Formal
Female = 364
Male = 570
Difference = 206; 61.02%
Verdict: MALE
REALITY: WRONG

http://invisiblethings.org/papers/joanna-talk_description-CCC04.txt
Genre: Informal
Female = 218
Male = 1186
Difference = 968; 84.47%
Verdict: MALE
Genre: Formal
Female = 414
Male = 576
Difference = 162; 58.18%
Verdict: Weak MALE
REALITY: WRONG

http://www.techsploitation.com/2007/05/31/what-the-hell-was-i-thinking-about-green-libertarians/ (text by Sue Lange)
Genre: Informal
Female = 210
Male = 481
Difference = 271; 69.6%
Verdict: MALE
Genre: Formal
Female = 260
Male = 408
Difference = 148; 61.07%
Verdict: MALE
REALITY: WRONG

http://thelizardqueen.wordpress.com/2005/06/08/a-thoroughly-and-utterly-girly-blog-post-sorry-4/
Genre: Informal
Female = 415
Male = 559
Difference = 144; 57.39%
Verdict: Weak MALE
Genre: Formal
Female = 180
Male = 312
Difference = 132; 63.41%
Verdict: MALE
REALITY: WRONG

To be fair I had to go to the most feminine place I could think of, even then it was iffy.

http://groups.ivillage.com/motherdaughter/
Genre: Informal
Female = 226
Male = 337
Difference = 111; 59.85%
Verdict: Weak MALE
Genre: Formal
Female = 326
Male = 314
Difference = -12; 49.06%
Verdict: Weak FEMALE
REALITY: MAYBE THE AUTHOR HERE WAS FLAMINGLY GAY

--
====================================================
J. Oquendo
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x1383A743
echo infiltrated.net|sed 's/^/sil@/g'

"Wise men talk because they have something to say; fools, because they have to say something." -- Plato

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/
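The figures posted above are consistent with a simple calculation: the difference is the male score minus the female score, and the percentage is the male score divided by the sum of both scores, truncated to two decimals. The Python sketch below reproduces that arithmetic from the posted scores. The 60% cut-off for a "Weak" verdict is an assumption inferred from the twelve posted results, not something the tool's author states, and the function name is hypothetical.

```python
def summarize(female, male, weak_cutoff=60.0):
    """Recompute difference, male share (%), and verdict from two scores.

    Assumptions (inferred from the posted results, not documented):
    - share = male / (female + male), truncated to two decimals
    - a share strictly between (100 - weak_cutoff) and weak_cutoff
      is labelled "Weak"
    """
    difference = male - female
    # Integer floor division truncates to hundredths of a percent.
    share = (10000 * male) // (female + male) / 100
    label = "MALE" if share >= 50 else "FEMALE"
    if 100 - weak_cutoff < share < weak_cutoff:
        label = "Weak " + label
    return difference, share, label


if __name__ == "__main__":
    # (female, male) pairs copied from three of the posted results.
    for female, male in [(104, 602), (442, 555), (326, 314)]:
        print(female, male, summarize(female, male))
```

Under these assumptions the three pairs give 85.26% MALE, 55.66% Weak MALE, and 49.06% Weak FEMALE, matching the posted verdicts. The percentage expresses how lopsided the two scores are for one sample; it is not the tool's stated 60% - 70% overall accuracy.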