WebApp Sec mailing list archives
Re: Preventing cross site scripting
From: Laurian Gridinoc <laur () grapefruitdesign com>
Date: 21 Jun 2003 00:55:18 +0300
On Fri, 2003-06-20 at 20:11, Tim Greer wrote:
Please provide some examples of this. I'd like to see your idea(s) at work and how it would solve this problem. I'm honestly not quite clear on the context in which you mean this to solve this problem and I'm interested knowing. I'm not sure I agree right now, so some examples illustrating it would be great--if you'd be so kind. Thanks.
This thread started with `how to export HTML mail messages safely to the web'. This may require dealing with some of the following issues:

1. broken markup (<ni <foo href="a"d"" bar='> baz> &quot no semicolon)
2. unacceptable entities
3. unacceptable tags (applet, object)
4. unacceptable attributes on acceptable tags (onmouseover, ...)
5. unacceptable attribute values (href="javascript:...", width="100000")
6. unacceptable text tokens (offensive words)

I suggest dealing with them in the stated order, and not treating the HTML as a mere string but dissecting it into markup and content: clean the markup first (elements, then the attributes of the accepted elements), then the text.

[1] is wonderfully solved by filtering the input through tidy with XML (XHTML) output; this is the data for the next steps. The rest of the issues can be controlled by an XSL transformation on the XML generated above.

[2] With a proper DTD you can alter the `rendering' of any unaccepted entity. Let's say that I want to change &Acirc; (capital A, circumflex accent) to a plain capital A; I simply define it in the DTD:

    <!ENTITY Acirc "A">

Note that &lt;, &gt;, &amp; and &quot; cannot be handled this way.

[3] Unacceptable tags. It is preferable to use whitelists here, but let's look at a blacklist solution first:

    <!-- drop script silently -->
    <xsl:template match="script" />

    <!-- or drop script and leave a note -->
    <xsl:template match="script">
      <xsl:comment>here was an evil script</xsl:comment>
    </xsl:template>

    <!-- drop applet, preserving its content (e.g. the `backup' markup
         for user agents that don't understand the applet tag) -->
    <xsl:template match="applet">
      <xsl:apply-templates />
    </xsl:template>

    <!-- and accept everything else, since this is a blacklist solution -->
    <xsl:template match="*|@*|text()|comment()">
      <xsl:copy>
        <xsl:apply-templates select="*|@*|text()|comment()" />
      </xsl:copy>
    </xsl:template>

The whitelist solution matches only the accepted tags:

    <!-- accept only p, ul, li and the attributes on them
         (and text nodes too, and comments) -->
    <xsl:template match="p|ul|li|@*|text()|comment()">
      <xsl:copy>
        <xsl:apply-templates select="*|@*|text()|comment()" />
      </xsl:copy>
    </xsl:template>

Elements that match no template fall through to the built-in rule, which drops the tag but still processes its children, so an element such as script, whose text content must not survive, still needs an explicit empty template.

[4] Unacceptable attributes, blacklist version:

    <!-- accept everything on `a' except on* attributes -->
    <xsl:template match="a">
      <xsl:element name="a">
        <xsl:for-each select="@*">
          <xsl:if test="not(starts-with(name(), 'on'))">
            <!-- name is an attribute value template, hence the braces -->
            <xsl:attribute name="{name()}">
              <xsl:value-of select="." />
            </xsl:attribute>
          </xsl:if>
        </xsl:for-each>
        <xsl:apply-templates />
      </xsl:element>
    </xsl:template>

Whitelist version:

    <!-- accept only href and title on `a' -->
    <xsl:template match="a">
      <xsl:element name="a">
        <xsl:attribute name="href">
          <xsl:value-of select="@href" />
        </xsl:attribute>
        <xsl:attribute name="title">
          <xsl:value-of select="@title" />
        </xsl:attribute>
        <xsl:apply-templates />
      </xsl:element>
    </xsl:template>

[5, 6] Unacceptable attribute and text values. Here it gets interesting: the string manipulation functions in XSL are few and not as powerful as regexes, but it is not impossible to build proper value validation with them. On strings (node and attribute names, attribute and text node values) you have just concat, contains, starts-with, string-length, normalize-space, substring, substring-after, substring-before and translate. That is almost nothing compared to the power of regexes, but in the end this is not a contest of writing it all on one line.
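To make issue 5 a bit more concrete, here is a minimal sketch in Python rather than XSL. It assumes the attribute values have already been pulled out of the parsed tree as separate strings. The names `is_safe_href`, `is_safe_width`, `ALLOWED_SCHEMES` and `MAX_WIDTH_PX`, as well as the limits themselves, are hypothetical and chosen only for illustration; a real policy would be stricter or looser depending on the site.

```python
import re
from urllib.parse import urlsplit

# Assumed policy values, chosen only for illustration.
ALLOWED_SCHEMES = {"http", "https", "mailto"}
MAX_WIDTH_PX = 2000


def is_safe_href(value: str) -> bool:
    """Accept relative references and URLs whose scheme is whitelisted."""
    # Browsers tolerate whitespace and control characters inside a scheme
    # ("java\tscript:"), so drop them before looking at the structure.
    cleaned = "".join(ch for ch in value if ch > " " and ch != "\x7f")
    try:
        parts = urlsplit(cleaned)
    except ValueError:
        return False  # not parseable as a URL at all
    if parts.scheme:
        return parts.scheme.lower() in ALLOWED_SCHEMES
    # No scheme recognised: treat it as a relative reference, but RFC 3986
    # does not allow ":" in the first segment of a relative path.
    first_segment = re.split(r"[/?#]", cleaned, maxsplit=1)[0]
    return ":" not in first_segment


def is_safe_width(value: str) -> bool:
    """Accept a bounded pixel count ("640") or a percentage ("50%")."""
    text = value.strip()
    if text.endswith("%"):
        number, limit = text[:-1], 100
    else:
        number, limit = text, MAX_WIDTH_PX
    # Plain ASCII digits only, and short enough to convert cheaply.
    if not (number.isascii() and number.isdigit() and len(number) <= 5):
        return False
    return int(number) <= limit
```

With this sketch, href="javascript:alert(1)" is rejected while a relative link such as page.html#top passes, and the width="100000" example from issue 5 fails the bound check. The point is only that each value is judged by what it is supposed to be (a URL, a number), not by pattern matching on the whole document.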
I'm not writing this to say that regexes are bad. I'm just stating that not everything that can be held in a string should be treated as one. HTML should be parsed into a DOM tree, where only node and attribute names, attribute values, text nodes and comments are separate strings. Whatever cannot be divided any further into another set of tokens (an atom) should then be validated as a string or a number. An attribute value that represents a URL, however, should be validated with a parser built specifically for that task (based on the URL grammar).

Cheers,
-- 
Laurian Gridinoc
Chief Developer
GRAPEFRUIT DESIGN
tel/fax: +40.232.233068
tel/fax: +1.646.349.2916
mobile: +40.745.304379
e-mail: laur () gd ro
www.grapefruitdesign.com
www.gd.ro