Bugtraq mailing list archives

MD5 To Be Considered Harmful Someday

From: Dan Kaminsky <dan () doxpara com>
Date: Mon, 06 Dec 2004 15:29:34 -0800

I've been doing some analysis on the MD5 collision announced by Wang et al. Short version: Yes, Virginia, there is no such thing as a safe hash collision -- at least in a function that's specified to be cryptographically secure. The full details may be acquired at the following link:


http://www.doxpara.com/md5_someday.pdf

A tool, Stripwire, has been assembled to demonstrate some of the attacks described in the paper. It may be acquired at the following address:

http://www.doxpara.com/stripwire-1.1.tar.gz

Incidentally, the expectations management is by no means accidental -- the paper's titled "MD5 To Be Considered Harmful Someday" for a reason. Some people have said there are no applied implications to Joux and Wang's research. They're wrong; arbitrary payloads can be successfully integrated into a hash collision. But the attacks are not wildly practical, and in most cases exposure remains thankfully limited, for now. But the risks are real enough that responsible engineers should take note: this is not merely an academic threat. Systems designed with MD5 now need to take far more care than they would if they were employing an unbroken hashing algorithm, and the problems are only going to get worse.


Some highlights from the paper:

* The attack itself is pretty limited -- essentially, we can create "doppelganger" blocks (my term) anywhere inside a file that may be swapped out, one for another, without altering the final MD5 hash. This lets us create any number of binary-inequal files with the same md5sum.

* MD5 uses an appendable cascade construction -- in other words, if you happen to find yourself with two files that MD5 to the same hash, an arbitrary payload can be appended to both files and they'll still have the same hash. This leads to...

* Attacks are possible using only the proof of concept test vectors released by Wang -- the actual attack is not necessary.

* Stripwire emits two binary packages. They both contain an arbitrary payload, but the payload is encrypted with AES. Only one of the packages ("Fire") is decryptable and thus dangerous; the other ("Ice") shields its data behind AES. Both files share the same MD5 hash.

* Digital Signature systems are vulnerable, as they almost always sign a hashed representation of data rather than the data itself.

* This is an excellent vector for malicious developers to get unsafe code past a group of auditors, perhaps to acquire a required third party signature. Alternatively, build tools themselves could be compromised to embed safe versions of dangerous payloads in each build. At some later point, the embedded payload could be safely "activated", without the MD5 changing. This has implications for Tripwire, DRM, and several package management architectures.

* HMAC's invulnerability has been slightly overstated. It's definitely possible, given the key, to create two datasets with the same HMAC. Attacker possession of the key violates MAC presumptions, so the impact of this is particularly questionable.

* Very interesting possibilities open up once the full attack is made available -- among other things, we can create self-decrypting executables (fire.exe and ice.exe) that exhibit differential behavior based on their internal colliding payloads. They'll still have the same MD5 hash.

* Several doppelgangers may (relatively quickly, as per Joux) be computed within a single multicollision-friendly block. As such, the particular selection of doppelganger sets within a file can itself be made to represent data. It's relatively straightforward to embed a 128 bit signature inside an arbitrary file, in such a way that no matter the value of the signature, a constant MD5 hash is maintained. This is curiously steganographic.

* Many popular P2P networks (and innumerable distributed content databases) use MD5 hashes as both a reliable search handle and a mechanism to ensure file integrity. This makes them blind to any signature embedded within MD5 collisions. We can use this blindness to track MP3 audio data as it propagates from a custom P2P node. "Strikeback" capacity against executable trafficking is even more pronounced -- it's possible to create application installers that self-modify with host identifying characteristics but still successfully retransmit on P2P networks under the global search hash.
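
The structural points in the highlights above (swappable doppelganger blocks, appended payloads that preserve a collision, and data encoded in the choice of doppelgangers) can be illustrated with a toy sketch. This is not MD5 and not the Wang attack: it assumes a deliberately weak iterated hash whose chaining state is truncated to 16 bits, so that colliding blocks can be brute-forced in a fraction of a second. The names compress, toy_hash, find_doppelgangers, build_pairs, encode and decode are hypothetical and exist only for this illustration, and it embeds 8 bits rather than the 128 mentioned above.

```python
import hashlib

BLOCK = 8             # toy block size in bytes (MD5 uses 64)
STATE = 2             # toy chaining state in bytes (MD5 uses 16)
IV = b"\x00" * STATE  # arbitrary fixed initial state (an assumption)


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: SHA-256 of state||block, truncated to STATE bytes."""
    return hashlib.sha256(state + block).digest()[:STATE]


def toy_hash(data: bytes) -> bytes:
    """Iterated (cascade) hash: chain compress() over the blocks, then a length block."""
    if len(data) % BLOCK:
        raise ValueError("toy_hash expects a whole number of blocks")
    state = IV
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return compress(state, len(data).to_bytes(BLOCK, "big"))


def find_doppelgangers(state: bytes):
    """Brute-force two different blocks that map `state` to the same next state."""
    seen = {}
    counter = 0
    while True:
        block = counter.to_bytes(BLOCK, "big")
        out = compress(state, block)
        if out in seen:
            return seen[out], block, out
        seen[out] = block
        counter += 1


def build_pairs(n: int):
    """Chain n doppelganger pairs, each starting from the state the previous pair left."""
    pairs, state = [], IV
    for _ in range(n):
        a, b, state = find_doppelgangers(state)
        pairs.append((a, b))
    return pairs


def encode(bits, pairs) -> bytes:
    """Pick one block from each pair according to the bit to embed."""
    return b"".join(pair[bit] for pair, bit in zip(pairs, bits))


def decode(data: bytes, pairs):
    """Recover the embedded bits by checking which block of each pair is present."""
    return [pair.index(data[i * BLOCK:(i + 1) * BLOCK]) for i, pair in enumerate(pairs)]


if __name__ == "__main__":
    pairs = build_pairs(8)
    payload = b"payload!" * 2  # identical suffix appended to both files
    file_a = encode([0] * 8, pairs) + payload
    file_b = encode([1, 0, 1, 1, 0, 0, 1, 0], pairs) + payload
    print(file_a != file_b, toy_hash(file_a) == toy_hash(file_b))
    print(decode(file_b, pairs))
```

Because every pair leaves the chaining state unchanged whichever block is chosen, all 2^8 selections hash identically, and any suffix appended afterwards is processed from the same state. Against real MD5 the brute-force search would have to be replaced by an actual collision-finding technique, which this sketch does not attempt.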

I hope this paper proves useful to the security community at large, and I welcome feedback.


--Dan Kaminsky
www.doxpara.com
dan () doxpara com
