WebApp Sec mailing list archives

Re: Preventing cross site scripting


From: "Tim Greer" <chatmaster () charter net>
Date: Thu, 19 Jun 2003 21:46:52 -0700

Resending my original post to this topic; the original does not appear to
have gone through. This encompasses the idea I was *originally* speaking
of, so the rest of the follow-ups make sense. Hello mod(s), I even fixed some
stuff to be 'nicer'--not that I wasn't being.

Preventing CSS (cross-site scripting) attacks is simple and trivial: just
parse the input. Change all < and > characters to &lt; and &gt; so the tag
displays as text without being parsed. Then, as you stated--and this is the
most basic approach to securing form input--put back together *only* the
HTML tags you want; for example, &lt;br&gt; would be restored to a literal
line break tag, <br>. You can do this easily for almost all HTML tags. For
tags that could potentially be used to inject things, such as anchor tags
for images or hot links, simply control what gets put back together.
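For a language-neutral illustration, here is that escape-then-whitelist idea
sketched in Python (the tag list and the function name are my own choices
for the example, not from any particular library):

```python
import re

# Tags we explicitly allow back after escaping everything (an assumed list).
ALLOWED_TAGS = ("br", "b", "i", "p")

def sanitize(text):
    """Escape all markup, then restore only whitelisted tags."""
    # Step 1: neutralize every tag by escaping < and >.
    text = text.replace("<", "&lt;").replace(">", "&gt;")
    # Step 2: put back only the tags we trust, e.g. &lt;br&gt; -> <br>.
    for tag in ALLOWED_TAGS:
        text = re.sub(r"&lt;(/?)%s&gt;" % tag, r"<\1%s>" % tag,
                      text, flags=re.IGNORECASE)
    return text

print(sanitize("line one<br>line two<script>alert(1)</script>"))
```

Anything not on the whitelist stays escaped, so a <script> tag comes out as
harmless text.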

Such as (a Perl example--I'm just writing this off the top of my head; it
isn't meant to be usable per se):

<.. scratch old example... Below is a simple regex; it can be improved, I'm
sure. It doesn't take much at all to overcome CSS attacks in your scripts. >

Or better yet (everything BUT ', ", < and > in this example--you may want to
deny other things too):

s!&lt;\s*a\s+href\s*=\s*['"]?(https?|ftp)://(\w[\w.\-\@:]+\.\w{1,4})
  (:\d{1,8})?([^'"<>]*)?\s*['"]?&gt;(.*?)&lt;/a&gt;
 !<a href="${1}://${2}${3}${4}">${5}</a>!ix;

I.e. (watch for word wrap--the /x modifier lets the pattern span lines, so
use the s!!ix syntax if you have a problem):

#!/usr/bin/perl
# tgreer.
# This regex should be modified, as it's sloppy--again, I'm just typing
# off the top of my head.
# Change the URL to whatever you want to test.
$_ = '<a href="http://testingthis.com/this?that&this=that">testing...</a>';

# Step 1: escape every tag.
s/</&lt;/g;
s/>/&gt;/g;

# Step 2: put back only a well-formed anchor tag. The /x modifier lets
# the pattern span lines.
s!&lt;\s*a\s+href\s*=\s*['"]?(https?|ftp)://(\w[\w.\-\@:]+\.\w{1,4})
  (:\d{1,8})?([^'"<>]*)?\s*['"]?&gt;(.*?)&lt;/a&gt;
 !<a href="${1}://${2}${3}${4}">${5}</a>!ix;

print "$_\n";

This will pick up valid-*looking* domains (to a point; it's not very
accurate), with or without a port, plus a path, a query string, etc.
However, although it's not a good domain/URL-checking regex, it *does*
refuse to parse/render any HTML tag that contains <, >, ", or ' anywhere
other than at the legitimate opening and closing points of the URL or tag.

You'd want to do better sanity checks on that and check for defined
variables, etc. (of course--and okay, it's not so pretty and not a solid
solution, just a quick idea). You can allow only URLs with characters that
should be valid in almost 100% of the URLs anyone should want or need to
post to a page for others to view, and you remove the possibility of
someone inputting a closing HTML tag and creating their own markup within
the new space, as well as any characters that would otherwise close or end
the anchor tag. No one can slip anything in there--provided the idea is
complete. I.e., only word characters in domains, along with dots and
dashes. Underscores, while only technically legal in domain names, are word
characters anyway and safe.

You can allow the @ character, and maybe a : character, in the domain
portion for links to password-protected web or FTP sites. That's not good
validation of those parts, but it is at least safe, and it will handle IPs
as well as domain names. It also allows people to enter invalid (but safe)
domain names for the http, https, or ftp protocols. Optionally allow a URL
path with whatever characters seem most probable: word characters
(obviously), forward slashes, dots, ? query-string separators, tildes, and
so on--whatever you want. Just be careful about allowing things that should
not be in URLs at all (or are very unlikely), such as <, >, ", and '. You
can then either disallow other characters or create a long list (though
it's sort of foolish to list each character you want at this point--it's
easier to list the ones you do NOT want in the URL/tag). So, allow URLs
with ~, !, @, #, $, %, ^, &, =, *, :, ;, ,, ?, [, ], etc.--anything that
might plausibly appear in a URL pointing to a script at a site someone
wants to link to--but only as a clickable link.
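That character-whitelist check can be sketched like this (in Python for
illustration; the exact allowed set below is my own choice for the example,
not a definitive list--note the deliberate absence of <, >, " and '):

```python
import re

# Default-deny pattern for submitted link URLs: scheme, host, optional path
# built only from characters we have chosen to allow.
URL_OK = re.compile(r"^(https?|ftp)://[\w.\-@:]+(/[\w./~!@#$%^&*=:;,?\[\]-]*)?$")

def link_is_safe(url):
    """Accept a URL only if it is built entirely from the allowed set."""
    return bool(URL_OK.match(url))

print(link_is_safe("http://example.com/cgi-bin/script?a=1&b=2"))
print(link_is_safe('http://example.com/"><script>alert(1)</script>'))
```

Because the quote and angle-bracket characters are simply not in the
pattern, a URL that tries to break out of the anchor tag fails the match.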

This is so sickly simple; it can be done in a matter of minutes with some
not-very-complicated regular expressions. These (in my opinion) fools who
go around screaming about what "security gurus" they are because they found
out some Microsoft service doesn't filter its form input well--that doesn't
mean jack. This is all very easily avoidable; it's hyped up to sound like
something everyone should worry about. Yes, it's real and people should
take it seriously, but when writing your own programs or scripts it's very
easily avoided. I can name a few people who actually think they're big
shots for posting stupid CSS script attempts in anchor tags on every site
and service for years, finally finding one in Hotmail (oh, big surprise),
and trying to publicize themselves off the hype. Not that any issue isn't
legitimate, but you know in what regard I mean.

Really, this is nothing more than checking the submitted data to ensure
things are under control. It's not at all difficult to do. I don't assume
you believe it is, and sorry if it seems like I'm lecturing you--that's not
what I'm doing. I'm just so sick of seeing silly alerts about these topics
as if they were as important as things that actually are important--mostly
from some of these alleged "big shot" security guru firms that don't really
know what they're doing (managing to insert a lame CSS tag into some HTML
source somewhere only shows that simple, unimaginative things are doable
because of a lot of thoughtless programmers who have no business
programming; it doesn't mean anything). Nonetheless, I'm always glad to see
people asking and educating themselves on how to avoid it. It comes down to
a very simple policy that every programmer should implement and adhere to:
disallow everything by default, and then allow only what you want, in a
controlled manner. That way, you control how things work and people can
*not* compromise your scripts/programs. It really is (_that_) simple. :-)
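That default-deny policy applies beyond HTML, too. Here is a sketch for
ordinary form fields (the field names and patterns are illustrative
assumptions, not a standard):

```python
import re

# Default-deny validation: every field must match an explicit pattern,
# and unknown fields are rejected outright.
FIELD_RULES = {
    "username": re.compile(r"^\w{1,32}$"),
    "age": re.compile(r"^\d{1,3}$"),
}

def validate(form):
    """Return the form only if every field is explicitly allowed."""
    for name, value in form.items():
        rule = FIELD_RULES.get(name)
        if rule is None or not rule.match(value):
            raise ValueError("rejected field: %s" % name)
    return form

print(validate({"username": "tim", "age": "30"}))
```

Nothing gets through unless a rule explicitly allows it, which is the
opposite of trying to enumerate every bad input.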
--
Regards,
Tim Greer  chatmaster () charter net
Server administration, security, programming, consulting.


----- Original Message -----
From: "Andrew Beverley" <mail () andybev com>
To: <webappsec () securityfocus com>
Sent: Thursday, June 19, 2003 11:28 AM
Subject: Preventing cross site scripting


I am currently writing a web application that, as a small part of it,
needs to display an email message. Obviously the message is potentially
in HTML format, which could be sent straight to the browser for display.

I would like to know the best way of filtering out undesirable html. I
understand the best way is to only allow acceptable information, in this
case all the different html formatting tags.

However, there are a lot of tags that are acceptable. Another approach
would be to strip out all the bad stuff such as <SCRIPT>, <OBJECT>,
<APPLET>, and <EMBED>, but this is far from ideal because of new tags
becoming available and so on.

Are there any functions available (for PHP) that will take an HTML page
as input and strip out all the nasty stuff? Does anyone have suggestions
on how to do this as easily as possible?

Thanks,

Andrew Beverley




