Dailydave mailing list archives

Re: Vista speech recognition


From: Michal Zalewski <lcamtuf () dione ids pl>
Date: Thu, 1 Feb 2007 19:43:04 +0100 (CET)

On Thu, 1 Feb 2007, Juha-Matti Laurio wrote:

http://blogs.technet.com/msrc/archive/2007/01/31/issue-regarding-windows-vista-speech-recognition.aspx

I find this kind of bogus. Voice recognition systems don't compare raw
waveforms. Most of the information is discarded: they usually isolate a
fraction of the signal, normalize it, chop it into discrete bits that best
reflect changes in voice modulation or whatnot, then feed it to an HMM
analyzer or some ANN. This is heavily optimized based on various
assumptions about how human speech sounds, and what ambient noises might
look like.
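
To make that pipeline concrete, here is a minimal sketch of such a front
end. It is not what Vista does: the frame length, the sample rate, and the
two features (log energy and zero-crossing rate) are assumptions picked for
brevity, the band-limiting step is skipped, and real recognizers use richer
spectral features. The point is only how much of the waveform is thrown
away before anything resembling an HMM sees it.

```python
import math

def front_end(samples, frame_len=160):
    """Toy recognizer front end: normalize, frame, keep 2 numbers per frame."""
    # Normalize: scale so the loudest sample has magnitude 1.0.
    peak = max(abs(s) for s in samples) or 1.0
    norm = [s / peak for s in samples]

    features = []
    # Chop into non-overlapping frames; a trailing partial frame is dropped.
    for start in range(0, len(norm) - frame_len + 1, frame_len):
        frame = norm[start:start + frame_len]
        # Log energy: roughly how loud the frame is.
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)
        # Zero-crossing rate: a crude proxy for the dominant frequency.
        crossings = sum(1 for a, b in zip(frame, frame[1:])
                        if (a < 0) != (b < 0))
        features.append((log_energy, crossings / (frame_len - 1)))
    return features

# One second of a 440 Hz tone at an assumed 8 kHz sample rate.
RATE = 8000
tone = [0.3 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]
feats = front_end(tone)
print(len(tone), "samples ->", len(feats), "frames x 2 features")
```

Here 8000 samples collapse into 50 frames of two numbers each, and the
classifier only ever sees those.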

What this means is that it is in all likelihood possible to produce a
waveform that will be impossible for a human to interpret (either because
it is masked by a superimposed signal, or because it does not resemble
speech in the first place), but will be "heard" as meaningful words by
Vista.
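
A toy illustration of why: once the front end reduces each frame to a few
numbers, many different waveforms map to the same features. The sketch
below assumes a made-up feature set (per-frame energy and zero-crossing
count on integer PCM) and a hypothetical 160-sample frame. It plays every
frame backwards, which changes the raw samples but leaves those features
untouched. This is not an attack on any real recognizer, just the
many-to-one mapping in miniature; against a real system, finding an
unintelligible input would be a search problem over much richer features.

```python
import math

FRAME = 160  # assumed frame length in samples (20 ms at 8 kHz)

def features(pcm):
    """Toy per-frame features on integer PCM: (energy, zero-crossing count)."""
    out = []
    for i in range(0, len(pcm) - FRAME + 1, FRAME):
        f = pcm[i:i + FRAME]
        energy = sum(x * x for x in f)  # integer arithmetic, so exact
        zc = sum(1 for a, b in zip(f, f[1:]) if (a < 0) != (b < 0))
        out.append((energy, zc))
    return out

def reverse_each_frame(pcm):
    """Play every full frame backwards; the raw sample sequence changes."""
    out = []
    for i in range(0, len(pcm) - FRAME + 1, FRAME):
        out.extend(reversed(pcm[i:i + FRAME]))
    return out

# Stand-in "speech": one second of a rising chirp as 16-bit-style integers.
RATE = 8000
original = [int(12000 * math.sin(2 * math.pi * (200 + 0.05 * n) * n / RATE))
            for n in range(RATE)]
mangled = reverse_each_frame(original)

same_raw = original == mangled
same_features = features(original) == features(mangled)
print("raw waveforms equal:", same_raw)
print("feature vectors equal:", same_features)
```

The two sample sequences differ, yet a classifier fed only these features
could not tell them apart.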

So, you get eerie industrial background music and noises on a website,
instead of a dude reading out loud "my documents, delete, yes".

Heck, this happens spontaneously: speech recognition systems sometimes
pick up random burps and crashes from the environment and map them to
dictionary words. And wasn't there an early demo of Vista speech
recognition where it hadn't been trained for that particular salesdude,
and kept hearing "dear aunt double the killer" instead of what he was
saying? Oh yeah:

http://video.google.com/videoplay?docid=-1123221217782777472

Now, I bet the MSRC dudes are well aware of this possibility, but chose
not to mention it. Eh.

/mz

_______________________________________________
Dailydave mailing list
Dailydave () lists immunitysec com
http://lists.immunitysec.com/mailman/listinfo/dailydave