WebApp Sec mailing list archives

Re: HTML entity bignums


From: Ingo Struck <ingo () ingostruck de>
Date: Tue, 29 Jul 2003 18:15:02 +0200

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi...

> I have found that some popular web browsers allow big numbers to be used in
> HTML's numeric entities. The programs in question store character values in
> 32 bits, so the characters 58, 58 + (2 ** 32), 58 + (2 ** 64) and so on are
> all colons to them.
I don't see that this is a real issue...
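
The wraparound described in the quoted report can be modelled in a few
lines. This is only a sketch: the 32-bit truncation is taken from the report,
and the helper name truncated_code_point is made up for illustration, not
taken from any browser's source.

```python
# Model of a user agent that keeps character values in 32 bits.
# The truncation is an assumption for illustration, not any browser's code.
MASK_32 = 0xFFFFFFFF


def truncated_code_point(n: int) -> int:
    """Return the value left after storing n in an unsigned 32-bit integer."""
    return n & MASK_32


# Each of these numeric references collapses to code point 58, the colon.
for n in (58, 58 + 2**32, 58 + 2**64):
    print(f"&#{n}; -> {chr(truncated_code_point(n))!r}")
```
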
A well-crafted web application will perform two different kinds of filtering:

- - input filtering:
  At this stage anything unknown should be filtered out and discarded.
  If the app encounters anything it doesn't expect, it simply drops it.
  At this stage a browser's handling of numeric character references is
  irrelevant, because it cannot influence the behaviour of the input
  filter.

- - output filtering:
  HTML/XML output is only acceptable from trusted sources, i.e. 
  - tags generated by the app without directly incorporating any input
  - static files that belong to the app itself and are not modifiable from
    outside or interaction with the app (if those are corrupted that means
    that someone took over your server and filtering is pointless anyway)
  FOR ANY OTHER HTML SOURCES, THE CHARACTERS '<' AND '&'
  MUST ALWAYS BE ESCAPED. PERIOD.

  A conforming SGML / HTML parser must not interpret any character other
  than these as the start of a markup delimiter (and it is very likely that
  even non-conforming parsers adhere to that). For backwards compatibility
  you must additionally escape the character '>', and it is nice if you
  escape the apostrophe (apos) and the double quote (quot) too.
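
A minimal sketch of the two stages might look like the following. The
user-name rule and the names accept_username and escape_for_html are
assumptions made up for illustration; html.escape is the Python standard
library call that escapes '&', '<', '>' and both quote characters.

```python
import html
import re

# Hypothetical input rule: a user name is 1-32 ASCII letters, digits or "_".
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def accept_username(raw: str):
    """Input filter: return the value if it has the expected form, else None."""
    return raw if USERNAME_RE.fullmatch(raw) else None


def escape_for_html(text: str) -> str:
    """Output filter: escape &, <, > and both quote characters."""
    return html.escape(text, quote=True)


print(accept_username("ingo_42"))    # expected form, passed through
print(accept_username("<script>"))   # unexpected form, dropped (None)
print(escape_for_html('<a href="javascript&#4294967354;alert(1)">x</a>'))
```

The input filter never looks at character references at all, and the output
filter turns the '&' of the bignum reference into '&amp;', so the reference
reaches the browser as plain text.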

Because conformance to ISO 10646 is required, you can be sure that the
characters that must be escaped are uniformly represented by the one-byte
hex values 0x3c (<) and 0x26 (&), so they are easy to detect. If a user
agent interprets any other character as a markup delimiter, there is no way
to fix that other than replacing the user agent.
(NOTE, however, that you may still run into trouble if you use any
non-ASCII-compatible encoding. But if you do that, you run into a bunch of
other troubles simultaneously, so this is discouraged anyway).
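
The encoding caveat can be illustrated with a byte-level check. This is a
sketch only: has_markup_bytes is a hypothetical helper, and cp037 (an EBCDIC
code page) and UTF-16 merely stand in for "non-ASCII-compatible encodings".

```python
LT, AMP = 0x3C, 0x26  # '<' and '&' in every ASCII-compatible encoding


def has_markup_bytes(data: bytes) -> bool:
    """Report whether the byte string contains the byte 0x3c or 0x26."""
    return LT in data or AMP in data


# ASCII-compatible encodings: the byte scan finds the delimiter.
print(has_markup_bytes("a<b".encode("utf-8")))
print(has_markup_bytes("a<b".encode("iso-8859-1")))

# EBCDIC: '<' is encoded as a different byte, so the scan misses it.
print(has_markup_bytes("a<b".encode("cp037")))

# UTF-16: U+263C is stored as the bytes 0x3c 0x26, a false alarm.
print(has_markup_bytes("\u263c".encode("utf-16-le")))
```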

If you strictly adhere to that rule (that means it must be implemented
somewhere deep down within your app, ideally in the output streams),
then a large number of xss issues disappear, because:
- - any illegal tag within the output is escaped, so it isn't a tag any
  longer; thus any tags that contain script or other executable elements are
  rendered as literal text, as is the script content they include
- - any character reference within the output is escaped, so it isn't a
  reference any longer and cannot form "hidden" script input anymore
A nice side effect is that your XML parsers cannot be messed up by
deliberately malformed document structures.
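
One way to push the rule down into the output stream is a thin wrapper that
separates app-generated markup from everything else. The class and method
names below are hypothetical and only sketch the idea; they are not taken
from any particular framework.

```python
import html
import io


class EscapingWriter:
    """Hypothetical wrapper: untrusted text is escaped on its way out."""

    def __init__(self, stream):
        self._stream = stream

    def write_markup(self, trusted: str) -> None:
        # Only for tags generated by the application itself.
        self._stream.write(trusted)

    def write_text(self, untrusted: str) -> None:
        # Everything else is escaped, so it cannot form tags or references.
        self._stream.write(html.escape(untrusted, quote=True))


out = io.StringIO()
writer = EscapingWriter(out)
writer.write_markup("<p>")
writer.write_text("<script>alert(1)</script> &#4294967354;")
writer.write_markup("</p>")
print(out.getvalue())
```

Because callers have to choose write_markup explicitly, forgetting to think
about a value means it goes through write_text and is escaped by default.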

You should *not* rely on normalization of the output alone, since that
leaves your system open to cross-site scripting. Normalization (removal of
unwanted crap) of the output is a nice-to-have, since you generate more
conformant output, but it does not protect against xss effectively. One
reason is that normalization is a *very* complicated multi-step process and
thus very likely to be implemented incorrectly.
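
To make the difference concrete, here is a deliberately naive normalizer next
to plain escaping. naive_normalize is a made-up example of a blacklist-style
filter, not code from any real application, and the payload only assumes the
32-bit wraparound discussed in this thread.

```python
import html


def naive_normalize(text: str) -> str:
    """Hypothetical blacklist filter: remove the literal 'javascript:'."""
    return text.replace("javascript:", "")


# 4294967354 == 58 + 2**32, a colon to a user agent that truncates to 32 bits.
payload = "javascript&#4294967354;alert(1)"

print(naive_normalize(payload))  # unchanged: the filter never sees a colon
print(html.escape(payload))      # the '&' is escaped, the reference is inert
```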

> I have been able to reproduce this entity bignum behaviour with recent
> versions of Mozilla, Galeon, Opera and w3m - but not with recent versions
> of Internet Explorer, Lynx and Elinks.
Konqueror 3.2 ignores those unknown char refs too...

Kind regards

Ingo Struck

- -- 
ingo () ingostruck de
Use PGP: http://ingostruck.de/ingostruck.gpg with fingerprint
C700 9951 E759 1594 0807  5BBF 8508 AF92 19AA 3D24
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.0 (GNU/Linux)

iD8DBQE/Jp2JhQivkhmqPSQRAhEGAKCzFmy7W6RdLLuQcMRx04v3GB/3GgCfUZK5
pG/8Rod0jScSTzz/fuzdwW4=
=VOtz
-----END PGP SIGNATURE-----

