Vulnerability Development mailing list archives

Re: Audio fingerprinting (was Re: hacksdmi?)


From: Geoff Schmidt <geoff () TUNEPRINT COM>
Date: Mon, 16 Oct 2000 05:11:02 -0400

Lincoln Yeoh wrote:
> If you only need to write 32 bits into a 120 second long song it
> doesn't look too bad. But once you start trying to shove PKI type
> keys into it, woohoo. A bit a second at 90db S/N is probably
> noticeable to the "Golden Ears" folks out there, so good luck
> getting 10 bits a sec....

Another aspect to keep in mind is that the watermarking camp generally
tries to design their algorithms to permit the recovery of the
watermark data from *any* n-second segment of the track, where n is
typically less than, oh, say, five.
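
To put rough numbers on that requirement, here is a minimal sketch. It
assumes the 32-bit payload and 120-second track from the quoted message
and a 5-second recovery window; the function name is made up for
illustration, and sync and error-correction overhead (which would push
the rate higher) are ignored.

```python
def required_bit_rate(payload_bits: int, window_seconds: float) -> float:
    """Bits per second needed so that a window of the given length spans
    one full cycle of a repeating payload (no sync or ECC overhead)."""
    return payload_bits / window_seconds


# Payload spread once over the whole track (assumed: 32 bits, 120 s).
whole_track = required_bit_rate(32, 120.0)

# Payload recoverable from any 5-second segment (assumed window length).
any_segment = required_bit_rate(32, 5.0)

print(f"whole track: {whole_track:.3f} bit/s")
print(f"any 5 s segment: {any_segment:.3f} bit/s")
print(f"ratio: {any_segment / whole_track:.0f}x")
```

Under these assumptions the per-segment requirement is 24 times the
whole-track rate, which is why the segment constraint dominates.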

> I think Shannon's famous formula will help tell you how much you
> have to play with given an acceptable signal level for music. It
> doesn't tell you how to do it, but at least it's easy to prove at
> which point it becomes very difficult.

You actually have a lot more leeway than Shannon would lead you to
believe, because of psychoacoustic considerations: because of how the
'spectrum analyzer' in the ear works, at any given point in time there
are chunks of spectrum where human beings are very insensitive to
noise. (The location of these areas is a complicated and not fully
understood function of the sound.) And yes, the people who do this
kind of research use data from 'golden ears' :)
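
For reference, the plain Shannon bound being discussed is
C = B * log2(1 + S/N). A minimal sketch follows, under loud
assumptions: the watermark is treated as the "signal" and the music as
white Gaussian "noise", the bandwidth is 22050 Hz (the Nyquist limit of
44.1 kHz CD audio), and the watermark-to-music ratios are illustrative
values, not measurements. It deliberately ignores the psychoacoustic
masking described above, so it shows only the naive bound.

```python
import math


def shannon_capacity(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon-Hartley capacity in bits per second for a given bandwidth
    and signal-to-noise ratio expressed in decibels."""
    snr_linear = 10 ** (snr_db / 10)
    # log1p keeps precision when the ratio is far below 1.
    return bandwidth_hz * math.log1p(snr_linear) / math.log(2)


# Assumed bandwidth: Nyquist limit of 44.1 kHz audio.
BANDWIDTH_HZ = 22050.0

# Illustrative watermark-to-music ratios in dB (assumptions).
for ratio_db in (-30.0, -60.0, -90.0):
    capacity = shannon_capacity(BANDWIDTH_HZ, ratio_db)
    print(f"{ratio_db:6.0f} dB -> {capacity:.6g} bit/s")
```

The bound falls off roughly tenfold for every 10 dB the watermark is
pushed below the music, which is the sense in which the formula shows
where things become very difficult.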

On the other hand, it struck me as I was writing this that in an
important way watermarking is harder than mp3 compression: gains in
mp3 compression (i.e., file size reduction) come from two sources:
encoding multiple psychoacoustically identical (or similar) waveforms
as the same thing, and storing the signal in a more compact way that
uses fewer bytes. These map broadly to the lossy and non-lossy stages
of the algorithm, in that order, and the sum of these two effects is
the redundancy that an mp3 coder identifies and extracts from the
waveform.

You can think of watermarking from an information theoretic
perspective: if you're inserting an n-bit watermark into a signal per
unit time, you either have to find n bits of redundancy in the signal
per unit time, or you have to lose n bits of information from the
signal per unit time (or a combination of the two). Otherwise you'd be
getting something for nothing. But here's the kicker, and the point of
this little digression: a watermark algorithm can only use the _first_
of the two sources of redundancy, because it has to survive file
format conversion.

At this point it's reasonable to wonder how much of an mp3 coder's
gains are due to the first source and how much are due to the
second. The party line is that it varies with time: for spectrally
simple blocks, the psychoacoustics don't help much but you get great
compression; for spectrally complex blocks, the psychoacoustics are a
big win but you don't get much compression. (This balance is usually
recognized as a design win.) The upshot is that watermarks don't get
the balance: if you're trying to cram a constant number of watermark
bits per unit time, you have to rely on just the psychoacoustics,
which means there are some points in a track (and some entire tracks!)
where you lose.
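
The constant-rate problem can be sketched as a toy budget check. This
is only an illustration: the per-block "slack" figures (bits that could
be hidden inaudibly in each block) are invented numbers, and the
function name is hypothetical. Real slack would come from a
psychoacoustic model.

```python
def starved_blocks(slack_bits, watermark_bits_per_block):
    """Return indices of blocks whose psychoacoustic slack is too small
    to hide a constant per-block watermark payload."""
    return [i for i, slack in enumerate(slack_bits)
            if slack < watermark_bits_per_block]


# Invented slack per block, in bits: spectrally complex blocks have a
# lot of room, spectrally simple ones have little or none.
slack_bits = [40, 3, 25, 0, 12]

# A constant payload of 8 bits per block (assumed figure).
print(starved_blocks(slack_bits, 8))

# The largest constant rate every block can carry is the minimum slack.
print(min(slack_bits))
```

With these made-up figures, two of the five blocks cannot carry the
payload, and the only rate that works everywhere is set by the worst
block.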

People should step in and correct me if I'm wrong on that last
rumination.

Geoff

