Full Disclosure mailing list archives
Snowdrop: a leak tracking tool
From: lcamtuf () ghettot org (Michal Zalewski)
Date: Mon, 9 Sep 2002 00:27:02 -0400 (EDT)
Hello list,

First of all, sorry if I picked the wrong place for shameless promotion of my own stuff. Looking at the charter, this seems to be an appropriate forum, and since SECTOOLS () securityfocus com seems to be largely extinct, I don't think there's much choice... :-)

To the point - in my spare time, I have been working (big word) on a project that, while not really a security tool per se, may be of some interest to some readers and to the security industry as such.

I wanted to announce a new tool, called Snowdrop; it is supposed to provide an interesting protection scheme for raw text documents and C sources, so that it is possible to identify and track down a person who disclosed a portion of the document or code to the public, even if the document has been modified, truncated, reformatted or otherwise badly hurt.

Possible applications:

  - internal memos and sensitive documents, even in e-mails,
  - vulnerability data that can be leaked by one of the vendors too early,
  - proprietary sources,
  - non-public exploits,
  - etc, etc.

The goal is to make it possible to accurately determine who disclosed the information, and, if necessary, to demonstrate to the public that the disclosed information originated from you.

The main concept, as you probably guessed, is to embed a specific type of watermark in the document - but this ain't your typical file watermarking utility.
The ideas behind Snowdrop:

  - using the content, instead of the medium; we introduce slight changes to the written text, instead of introducing a payload; this makes the technique much more resistant to conversions, copy-and-paste, and so on,
  - using steganography to make the watermark non-evident and non-intrusive,
  - using several separate channels (synonyms, variable names, formatting, typos, punctuation style, code logic) to make the information less vulnerable to casual modifications, such as reformatting, spell checking, simple edits, etc,
  - using MD5 in a manner that makes watermarks (nearly) impossible to tamper with in a meaningful way - for example, to make the leak look like it's the fault of an innocent third party,
  - using short, highly redundant watermarks to make it possible to recover the watermark even from as little as a single paragraph of text.

While the idea isn't new, I think this is the first open-source project that uses non-trivial watermarking on this level. I realize the description above is painfully vague, and I strongly encourage you to read the documentation before asking "what the heck?".

Yes, as you are most likely aware, it is next to impossible to create a watermark that cannot be purposefully removed or destroyed, and I am not claiming that Snowdrop does that. That is not the point. Since the presence of the watermark is not evident, and the watermark itself is pretty small and fairly difficult to remove by accident, in most cases only people who routinely run some anti-Snowdrop software on all outgoing documents would be safe. So while it is possible that another person would outsmart you and delete the watermark, chances are, this won't be the case.

Of course, since Snowdrop was just something I coded as a PoC in my free time, it's far from being perfect. Current code is pretty much beta, with English language support working fairly well, and C code support still not fully functional.
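To make the synonym channel and the MD5 idea above a bit more concrete, here is a toy sketch in Python. It is not Snowdrop's actual code or algorithm: the synonym table, the function names, the secret, and the scheme of tying each synonym pair to a fixed bit of MD5(secret:recipient) are all assumptions made up for illustration. Because each pair always carries the same bit, a truncated fragment can still be scored without any resynchronization.

```python
import hashlib
import re

# Hypothetical synonym pairs; each pair is one "slot" carrying a single bit.
SYNONYMS = [("big", "large"), ("quick", "fast"), ("begin", "start"),
            ("help", "assist"), ("show", "display"), ("often", "frequently")]
# word -> (slot index, bit value that this word encodes)
LOOKUP = {word: (slot, bit)
          for slot, pair in enumerate(SYNONYMS)
          for bit, word in enumerate(pair)}


def watermark_bits(secret, recipient, n=len(SYNONYMS)):
    """Derive n per-recipient bits from MD5(secret:recipient)."""
    digest = hashlib.md5(f"{secret}:{recipient}".encode()).digest()
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(n)]


def embed(text, secret, recipient):
    """Rewrite every known synonym so it encodes the recipient's bit."""
    bits = watermark_bits(secret, recipient)

    def swap(match):
        word = match.group(0)
        entry = LOOKUP.get(word.lower())
        if entry is None:
            return word  # not a synonym slot, leave untouched
        slot = entry[0]
        new = SYNONYMS[slot][bits[slot]]
        return new.capitalize() if word[0].isupper() else new

    return re.sub(r"[A-Za-z]+", swap, text)


def identify(leaked, secret, recipients):
    """Return (best recipient, matching slots, slots seen in the leak)."""
    seen = {}
    for word in re.findall(r"[A-Za-z]+", leaked):
        entry = LOOKUP.get(word.lower())
        if entry is not None:
            seen[entry[0]] = entry[1]

    def score(recipient):
        bits = watermark_bits(secret, recipient)
        return sum(bits[slot] == bit for slot, bit in seen.items())

    best = max(recipients, key=score)
    return best, score(best), len(seen)


if __name__ == "__main__":
    memo = "We begin with a big change to help users; it should show results fast."
    marked = embed(memo, "s3cret", "alice")
    print(marked)
    # Score a truncated fragment against the list of known recipients.
    print(identify(marked[:40], "s3cret", ["alice", "bob", "carol"]))
```

With only six one-bit slots, different recipients can easily end up with the same pattern, so this only shows the principle; a usable scheme would need many more slots and several channels. The point of keying the bits with a secret is that someone without it cannot work out which word choices would point at another recipient.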
The point of this announcement, as usual, is to probe for interest, feedback and ideas, and to look for developers willing to spend some time on this tool with me.

Download the beta code at http://lcamtuf.coredump.cx/snowdrop.tgz

Things that are broken or nasty in the current version, and where your help is welcome:

  a) awful resynchronization code; it's too slow,
  b) poor man's synonyms; it would be good to support multi-word substitutions and use a better database of entries - the one that is used right now is entirely homebrew,
  c) certain channels are still not supported by the C module; the C module is broken for many language constructs; consider using a smart parser (flex or such) to handle this.

PS. Since the noise ratio on this list is already high, I'd like to ask you to reply directly to me, unless you really think this is a matter others will be interested in :-) In particular, I don't think it makes any sense to report bugs, compilation problems, etc, to the list.

--
Michal Zalewski