Vulnerability Development mailing list archives

Re: [imp] sanitizing html


From: sthen+vulndev () NAIAD ECLIPSE NET UK (Stuart Henderson)
Date: Mon, 21 Feb 2000 23:25:09 +0000


I'm copying this to the vuln-dev list since I think it's a good place
to discuss HTML patterns that need to be, and possibly aren't currently,
filtered in web-based mail systems displaying inline rich-text email.
The Horde developers are primarily concerned with scripting here, though
comments on privacy and related issues may well crop up and be helpful.

This is specifically within the context of securing a GPL web-based mail
client, IMP, though I guess the discussion will be relevant to any similar
system, including the well-known commercial targets, and possibly of help
to anyone building or subverting a script-filtering HTTP proxy (I guess
it's possible that some of those may be fooled by TCP fragment boundaries,
like the recent FW-1 FTP problem).

On Mon, Feb 21, 2000 at 05:00:46PM -0500, chagenbu () wso williams edu wrote:
> I'm attaching the current set of regexps that html messages are passed through
> before being displayed. I know for a fact that they're not perfect, but I'd
> appreciate it if people would take a look through them and see if they miss
> anything important, or destroy something they shouldn't destroy.

## $data = preg_replace('|<([^>]*)&{.*}([^>]*)>|', '<&{;}\3>', $data);

Not sufficiently global, since an attacker can still use escaped
characters, for example hr&#69;f=script:foo -- however, this is tricky
to filter without hitting some legitimate addresses, for example
http://foo.bar.com/womble.cgi?user=someone&page=something.

There have been some problems with web servers parsing escaped
characters which should be invalid; it's possible that (some? :)
browsers do something similar.
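
One way to approach the escaped-character problem is to decode character
references in a copy of each tag and run the check against that copy,
leaving the message text itself untouched. A minimal sketch, assuming a
modern PHP (7 or later) rather than the PHP3 that IMP 2.x runs on;
tag_hides_script is a hypothetical name, not an IMP function, and the
sketch does not try to establish which escapes a given browser honours.

```php
<?php
// Hypothetical helper: does this tag contain "script:" once character
// references such as &#115; or &#x73; have been decoded?
function tag_hides_script(string $tag): bool
{
    // Decode a copy only; the original text is never rewritten here.
    $decoded = html_entity_decode($tag, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    return preg_match('/script:/i', $decoded) === 1;
}

$samples = [
    '<a href="java&#115;cript:foo">',
    '<a href="http://foo.bar.com/womble.cgi?user=someone&page=something">',
];
foreach ($samples as $tag) {
    echo $tag, ' => ', tag_hides_script($tag) ? 'suspicious' : 'ok', "\n";
}
```

The first sample is flagged and the second is not, since the bare & in
the query string is not a character reference and is left alone by the
decoder.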

## $data = preg_replace('|<([^">]*)[Ss][Cc][Rr][Ii][Pp][Tt]|', '<\1 horde_cleaned_script', $data);
## $data = preg_replace('|href="(.*)[Ss][Cc][Rr][Ii][Pp][Tt]:|', 'href="horde_cleaned_script:', $data);

These two overlap less than it first appears: the first pattern
excludes " from the text before "script", so a quoted
href="javascript:foo" is only caught by the second preg_replace,
while an unquoted href=script:foo only matches the first.
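
How the two script patterns divide the work can be checked directly. A
small sketch that runs both patterns, exactly as quoted, against an
unquoted and a quoted href; the constant names and the sample tags are
made up for illustration.

```php
<?php
// The two patterns exactly as quoted above.
const SCRIPT_TAG_PATTERN  = '|<([^">]*)[Ss][Cc][Rr][Ii][Pp][Tt]|';
const SCRIPT_HREF_PATTERN = '|href="(.*)[Ss][Cc][Rr][Ii][Pp][Tt]:|';

foreach (['<a href=script:foo>', '<a href="javascript:foo">'] as $html) {
    // preg_match returns 1 on a match and 0 otherwise.
    printf(
        "%-26s first=%d second=%d\n",
        $html,
        preg_match(SCRIPT_TAG_PATTERN, $html),
        preg_match(SCRIPT_HREF_PATTERN, $html)
    );
}
```

On these inputs the unquoted form matches only the first pattern and the
quoted form only the second, because [^">]* cannot step over the opening
quote.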

## $data = preg_replace('|<([^>]*)[Oo][Nn](.*)=([^>]*)|', '<\1 horde_cleaned_onevent', $data);

This may also need to match the escaped form of = (&#61;, decimal 61).
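
The on-event pattern is also worth checking for what it destroys, since
it fires on any "on" inside a tag that is followed by an = anywhere later
on the line. A rough sketch comparing it with a narrower variant;
narrower_onevent and its pattern are assumptions for illustration, not
IMP code, and it does not handle the escaped = or a / used as a separator.

```php
<?php
// Pattern and replacement exactly as quoted above.
function original_onevent(string $data): string
{
    return preg_replace('|<([^>]*)[Oo][Nn](.*)=([^>]*)|', '<\1 horde_cleaned_onevent', $data);
}

// Hypothetical narrower variant: only rename attributes whose name starts
// with "on" and which follow whitespace inside a tag.
function narrower_onevent(string $data): string
{
    do {
        // Each pass renames at most one handler per tag, so repeat
        // until nothing changes.
        $before = $data;
        $data = preg_replace('/(<[^>]*\s)on(\w+\s*=)/i', '$1horde_cleaned_on$2', $data);
    } while ($data !== $before);
    return $data;
}

$harmless = '<font color=red>text</font>';
$handler  = '<img src=x onerror=alert(1)>';

echo original_onevent($harmless), "\n";
echo narrower_onevent($harmless), "\n";
echo narrower_onevent($handler), "\n";
```

The quoted pattern turns the harmless tag into
<f horde_cleaned_onevent>text</font>, because the "on" in "font" and the =
in color=red are enough to trigger it. The narrower variant leaves that
tag alone and rewrites the handler to horde_cleaned_onerror=.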

## $data = preg_replace('|<([^>]*)[Ee][Mm][Bb][Ee][Dd]|', '<horde_cleaned_embed', $data);
## $data = preg_replace('|<([^>]*)[Jj][Aa][Vv][Aa]|', '<horde_cleaned_java', $data);
## $data = preg_replace('|<([^>]*)[Oo][Bb][Jj][Ee][Cc][Tt]|', '<horde_cleaned_object', $data);
## $data = preg_replace('|<([^>]*)[Ss][Tt][Yy][Ll][Ee]|', '<horde_cleaned_style', $data);
## $data = preg_replace('|<([^>]*)[Mm][Oo][Cc][Hh][Aa]:([^>]*)>|', '<horde_cleaned_mocha:\2>', $data);

I would like to have the option, configurable in mime.php3 (and
I guess at per-user granularity in the 2.3 prefs system), defaulting
to disabled, of allowing the / character or its escaped form anywhere
after the first character in a tag.

Does anyone know how Unicode variants fit in here? I am guessing
that since browsers interpret Unicode strings in HTML tags, these
may require additional filtering. And of course there's <applet>,
which should be reasonably straightforward to filter.
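
For <applet> and the other blocked elements, one option is to anchor the
match on the tag name itself rather than on any occurrence of the word
inside a tag. A minimal sketch; clean_tag_names is a hypothetical helper,
not part of IMP, and it only deals with element names, so style=
attributes and mocha: or script: URLs would still need their own checks.

```php
<?php
// Hypothetical helper: rename blocked elements by tag name, covering both
// opening and closing tags, regardless of case.
function clean_tag_names(string $data): string
{
    return preg_replace(
        '/<(\/?)(applet|embed|object|style)\b/i',
        '<$1horde_cleaned_$2',
        $data
    );
}

echo clean_tag_names('<APPLET code="Foo.class"></APPLET>'), "\n";
echo clean_tag_names('<a href="lifestyle.html">link</a>'), "\n";
```

The applet element comes out as horde_cleaned_APPLET in both its opening
and closing tag, while the link whose URL merely contains "style" passes
through unchanged.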

I will collect any replies from vuln-dev relevant to IMP, and
post a digest to the IMP mailing list, which does not accept posts
from non-members. (The PHP code making up IMP and Horde is at
http://horde.org/cvs/ for anyone interested in taking a look;
the next "proper" release will be from the 2.2 tree.)

--stuart

