Bugtraq mailing list archives

Re: Cross site scripting: a long term fix


From: Michael Wojcik <Michael.Wojcik () MERANT COM>
Date: Mon, 9 Oct 2000 15:22:02 -0700

From: Tollef Fog Heen [mailto:tollef () ADD NO]

[from Zag Zig:]
| I propose that a solution for quoting markup should be built into
| the HTML specification and therefore made available to all servers
| for use with both static and dynamically generated text.

Which it has been, but was then deprecated and is now obsoleted, from
html-2.1e (from the IETF).

[the "XMP" element]

Odd that it took so long (relatively speaking) for someone to bring XMP up.
Don't people remember the wild, carefree days of HTML 1.0?

It didn't have the same options as yours (adding stuff to the ending
tags etc), and caused problems.

As I understand it, XMP is deprecated because it violates SGML; there's no
way in SGML grammars to define a tag that suppresses all other metadata
interpretation until its own fully-formed end tag.

XMP has the problem Zag Zig notes for unadorned tags of its type (i.e. the
data can unescape itself by inserting the end tag).  But no XMP-style tag,
including Zag Zig's (interesting) improved versions, is likely to be
incorporated into HTML, or XML for that matter, because those languages are
based on SGML.
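
To make the end-tag problem concrete, here is a minimal Python sketch. It is
an illustration only: split_at_first_end_tag is a hypothetical helper that
mimics how a user agent might scan the body of an XMP element for the first
end tag, not code from any real browser.

```python
import re

# Hypothetical model of a user agent: literal text runs up to the FIRST
# case-insensitive </XMP>; everything after that is parsed as markup again.
XMP_END = re.compile(r"</xmp>", re.IGNORECASE)

def split_at_first_end_tag(body: str) -> tuple[str, str]:
    """Return (text shown literally, text handed back to the HTML parser)."""
    match = XMP_END.search(body)
    if match is None:
        return body, ""
    return body[:match.start()], body[match.end():]

# The server wraps untrusted text as <XMP> + text + </XMP>; this is the body
# the user agent sees after the opening tag.
untrusted = "hi</xmp><script>evil()</script>"
literal, markup = split_at_first_end_tag(untrusted + "</XMP>")
print(repr(literal))  # only the text before the attacker's end tag
print(repr(markup))   # the attacker's script, now treated as markup
```

The attacker's own end tag closes the element early, so the rest of the data
is interpreted as ordinary HTML.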

The solution, for those of us who made use of XMP back when it was legal, is
to transform HTML metacharacters when including untrusted text, as Cooper
suggested in another message.  Yes, people have had difficulties doing that
correctly, but that's because they approached it from the wrong direction.
You need to use a least-privileges, minimal-permissions model.  Escape
anything not in the explicit "verbatim" set, which should be as small as is
feasible.
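
A minimal Python sketch of that approach follows. The verbatim set chosen
here (ASCII letters, digits, and the space character) and the function name
escape_untrusted are assumptions made for illustration; a real application
would pick its own set, keeping it as small as it can.

```python
import string

# Assumed "verbatim" set: only these characters pass through unchanged.
VERBATIM = frozenset(string.ascii_letters + string.digits + " ")

def escape_untrusted(text: str) -> str:
    """Replace every character outside VERBATIM with a numeric character
    reference, so nothing unexpected reaches the page as markup."""
    return "".join(
        ch if ch in VERBATIM else f"&#{ord(ch)};"
        for ch in text
    )

print(escape_untrusted('<script>alert("x")</script>'))
```

The point of the design is that the code never enumerates "dangerous"
characters. Anything not explicitly allowed is escaped, so a metacharacter
nobody thought of is still neutralized.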

It is probably better to add a tag which means something like 'get
this URI, insert it here, but treat it like mime/type (or let the
server which returns it decide)'.

That's an interesting suggestion.  Of course, we have to be sure that the
URIs don't contain evil payloads; can't rely on the user agent to sanitize
them.  But that's a smaller problem domain.

On the other hand, I'm not sure that every user agent out there respects,
say, text/plain if it thinks it found HTML markup in it.  And some user
agents (e.g. the execrable Outlook) insist on special processing for any
character string they think is significant, regardless of source or context.
(A perfect example of Microsoft's "embrace and stab in the back"
philosophy.)

Michael Wojcik             michael.wojcik () merant com
MERANT
Department of English, Miami University

