Vulnerability Development mailing list archives
Re: [imp] sanitizing html
From: sthen+vulndev () NAIAD ECLIPSE NET UK (Stuart Henderson)
Date: Mon, 21 Feb 2000 23:25:09 +0000
I'm copying this to the vuln-dev list since I think it's a good place to discuss HTML patterns that need to be (and possibly aren't currently) filtered in web-based mail systems that display rich-text email inline. The Horde developers are primarily concerned with scripting here, though comments on privacy and related issues may well crop up and be helpful. This is specifically in the context of securing IMP, a GPL web-based mail client, though I expect the discussion will be relevant to any similar system, including the well-known commercial targets, and possibly of help to anyone building or subverting a script-filtering HTTP proxy (it's possible that some of those may be fooled by TCP fragment boundaries, like the recent FW-1 FTP problem).

On Mon, Feb 21, 2000 at 05:00:46PM -0500, chagenbu () wso williams edu wrote:
I'm attaching the current set of regexps that html messages are passed through before being displayed. I know for a fact that they're not perfect, but I'd appreciate it if people would take a look through them and see if they miss anything important, or destroy something they shouldn't destroy.
> $data = preg_replace('|<([^>]*)&{.*}([^>]*)>|', '<&{;}\3>', $data);

This is not sufficiently global, since an attacker can still use, for example, hrEf=script:foo. However, this is tricky to filter without hitting some legitimate addresses, for example http://foo.bar.com/womble.cgi?user=someone&page=something. There have been problems with web servers parsing escaped characters that should be invalid, and it's possible that (some? :) browsers do something similar.

> $data = preg_replace('|<([^">]*)[Ss][Cc][Rr][Ii][Pp][Tt]|', '<\1 horde_cleaned_script', $data);
> $data = preg_replace('|href="(.*)[Ss][Cc][Rr][Ii][Pp][Tt]:|', 'href="horde_cleaned_script:', $data);

Anything caught by the second preg_replace should already have been cleaned by the first (href=script:foo will only match the first one, however).

> $data = preg_replace('|<([^>]*)[Oo][Nn](.*)=([^>]*)|', '<\1 horde_cleaned_onevent', $data);

This may also need to handle the escaped version of = (decimal 61, i.e. &#61;).

> $data = preg_replace('|<([^>]*)[Ee][Mm][Bb][Ee][Dd]|', '<horde_cleaned_embed', $data);
> $data = preg_replace('|<([^>]*)[Jj][Aa][Vv][Aa]|', '<horde_cleaned_java', $data);
> $data = preg_replace('|<([^>]*)[Oo][Bb][Jj][Ee][Cc][Tt]|', '<horde_cleaned_object', $data);
> $data = preg_replace('|<([^>]*)[Ss][Tt][Yy][Ll][Ee]|', '<horde_cleaned_style', $data);
> $data = preg_replace('|<([^>]*)[Mm][Oo][Cc][Hh][Aa]:([^>]*)>|', '<horde_cleaned_mocha:\2>', $data);

I would like to have the option, configurable in mime.php3 (and I guess at per-user granularity in the 2.3 prefs system) and defaulting to disabled, of allowing the / character, or its escaped form, anywhere after the first character in a tag.

Does anyone know how Unicode variants fit in here? I am guessing that since browsers interpret Unicode strings in HTML tags, these may require additional filtering. And of course there's <applet>, which should be reasonably straightforward to filter.
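To make the mixed-case (hrEf) and escaped-'=' (decimal 61) points concrete, here is a rough PHP sketch. It is not IMP or Horde code, and tag_looks_dangerous() is a hypothetical name. It decodes character references before matching, so &#61; is seen as '=', and uses the PCRE /i modifier instead of spelling out [Ss][Cc][Rr]... by hand. It assumes the tag has already been isolated from the message, which is itself the hard part, and it only flags a tag rather than rewriting it.

```php
<?php
// Rough sketch, not IMP/Horde code: tag_looks_dangerous() is a hypothetical
// helper that flags a single, already-isolated tag such as '<a href=...>'.
function tag_looks_dangerous(string $tag): bool
{
    // Decode character references first, so '&#61;' is seen as '=' and
    // 'java&#x73;cript:' as 'javascript:'. Only references ending in ';'
    // are decoded by html_entity_decode().
    $plain = html_entity_decode($tag, ENT_QUOTES | ENT_HTML401, 'UTF-8');

    // The /i modifier makes each match case-insensitive, replacing classes
    // like [Ss][Cc][Rr][Ii][Pp][Tt] and catching 'hrEf=JavaScript:'.
    $patterns = [
        '/(script|mocha)\s*:/i',                              // javascript:, mocha: URLs
        '/\bon[a-z]+\s*=/i',                                  // onclick=, onerror=, ...
        '/&\{/',                                              // &{...} script entities
        '/^<\s*\/?\s*(script|embed|object|applet|style)\b/i', // whole tag names
    ];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $plain) === 1) {
            return true;
        }
    }
    return false;
}

// Example calls
var_dump(tag_looks_dangerous('<a hrEf=JavaScript:foo()>'));         // bool(true)
var_dump(tag_looks_dangerous('<img src=x onerror&#61;alert(1)>'));  // bool(true)
var_dump(tag_looks_dangerous('<a href="http://foo.bar.com/womble.cgi?user=someone&page=something">')); // bool(false)
```

Flagging rather than rewriting sidesteps the question of what the "cleaned" output should look like. Note that html_entity_decode() expects references to end in ';', while browsers tend to be more lenient, so this is not a complete answer to the escaped-character problem.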
I will collect any replies from vuln-dev that are relevant to IMP and post a digest to the imp mailing list, which does not accept posts from non-members. (For anyone interested in looking at the PHP code that makes up IMP and Horde, see http://horde.org/cvs/; the next "proper" release will be from the 2.2 tree.)

--stuart