Interesting People mailing list archives
IP: SHRINK AND IDENTIFY Edupage, February 1, 2002
From: David Farber <dave () farber net>
Date: Sat, 02 Feb 2002 11:04:42 -0500
Italian researchers at La Sapienza University say they have found a novel use for the Gzip file compression program: matching pieces of text with their true authors. Emanuele Caglioti, an associate professor of mathematics at La Sapienza, said that Gzip, which strips huge text files down to their most basic components, can use the recurring patterns it finds to link pieces of text from various sources. Gzip and other compression programs boil text files down to key elements and write instructions on how to reconstruct the complete document. The size of those instructions gives an estimate of the file's entropy, and the La Sapienza researchers found that the instructions are nearly the same for texts by the same author. They compared 90 pieces from 11 authors using Gzip data and found that the technique correctly identified the author 93 percent of the time. Caglioti said the method has applications for Web searching and has already been applied to DNA sequencing. Gzip co-author Mark Adler, however, is circumspect about Caglioti's claims and said that further work should be done to test the theory using a much broader pool of works and authors. (ABC News, 30 January 2002)
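The idea summarized above can be illustrated with a short sketch. The summary does not give the researchers' exact procedure, so the following is only an assumption about how such a comparison might work: append the unattributed sample to a reference text from each candidate author, compress, and see which reference makes the sample cheapest to encode. It uses Python's zlib module, which implements the same DEFLATE algorithm as Gzip; the function names (compressed_size, extra_bytes, guess_author) are hypothetical.

```python
import zlib
from typing import Mapping


def compressed_size(text: str) -> int:
    """Return the number of bytes zlib needs to encode the text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def extra_bytes(reference: str, sample: str) -> int:
    """Estimate how many extra compressed bytes the sample costs
    when it is appended to a reference text.

    A small value means the compressor could reuse patterns it had
    already seen in the reference. Note that DEFLATE only looks back
    32 KiB, so only the tail of a long reference influences the result.
    """
    return compressed_size(reference + sample) - compressed_size(reference)


def guess_author(references: Mapping[str, str], sample: str) -> str:
    """Pick the author whose reference text makes the sample cheapest
    to encode. This is an illustrative assumption, not the published method.
    """
    return min(references, key=lambda name: extra_bytes(references[name], sample))
```

To try it, pass guess_author a dictionary that maps each candidate author's name to a known text by that author, together with the unattributed sample. Results on short or stylistically similar texts are likely to be unreliable, which is in line with the caution expressed above about testing on a broader pool of works.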
For archives see: http://www.interesting-people.org/archives/interesting-people/