Interesting People mailing list archives

IP: SHRINK AND IDENTIFY Edupage, February 1, 2002


From: David Farber <dave () farber net>
Date: Sat, 02 Feb 2002 11:04:42 -0500



Italian researchers at La Sapienza University say they have found
a novel use for the Gzip file compression program: matching
pieces of text to their true authors. Emanuele Caglioti, an
associate professor of mathematics at La Sapienza, said Gzip,
which shrinks large text files by encoding their recurring
patterns compactly, can use those shared patterns to link pieces
of text from various sources. Gzip and other compression programs
reduce a text file to its key elements plus instructions for
reconstructing the complete document. Those instructions, which
reflect the file's entropy, are nearly the same for texts by the
same author, the La Sapienza researchers found. They compared 90
pieces from 11 authors using Gzip and found that the technique
identified the correct author 93 percent of the time. Caglioti
said the method has applications for Web searching and has
already been applied to DNA sequencing. Gzip co-author Mark
Adler, however, is cautious about Caglioti's claims and said that
further work should be done to test the theory on a much broader
pool of works and authors.
(ABC News, 30 January 2002)
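
The idea summarized above can be sketched in a few lines of Python.
This is only a minimal illustration, not the researchers' actual
procedure, which the summary does not spell out. It assumes the
comparison works by appending an unknown fragment to a known sample
from each candidate author and checking which sample makes the
fragment cheapest to compress. The function names (compressed_size,
extra_bytes, guess_author) are hypothetical and chosen for this
example only.

```python
import gzip
from typing import Mapping


def compressed_size(text: str) -> int:
    """Return the number of bytes gzip needs to store the text."""
    return len(gzip.compress(text.encode("utf-8")))


def extra_bytes(reference: str, fragment: str) -> int:
    """Estimate the cost of encoding `fragment` once `reference` is known.

    Assumption: the growth in compressed size when the fragment is
    appended to the reference approximates how "surprising" the
    fragment is relative to that reference.
    """
    return compressed_size(reference + fragment) - compressed_size(reference)


def guess_author(references: Mapping[str, str], fragment: str) -> str:
    """Pick the author whose reference text makes the fragment cheapest."""
    return min(references, key=lambda name: extra_bytes(references[name], fragment))
```

To try it, build a dictionary that maps each author's name to a long
sample of that author's writing and pass it to guess_author together
with the unattributed fragment. Gzip only looks back about 32 KB when
searching for repeated patterns, so in this sketch only the tail of a
long reference sample influences the result.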

For archives see:
http://www.interesting-people.org/archives/interesting-people/

