Bugtraq mailing list archives

'unicode' vs URL encoding.


From: Cris Bailiff <c.bailiff () awayweb com>
Date: Wed, 30 May 2001 23:46:20 +1000

To eDvice Security Services,

Your bugtraq item on the NetGAP appliance incorrectly talks about the NetGAP
system misinterpreting '%65' as a 'unicode' encoding of the letter 'e'.

This misconception has become prevalent in recent bugtraq postings, so I hope to
clear it up for future reference - '%' encoding is used for the encoding of any
'non-legal' characters in URL format strings. '%65' is simply the URL encoding
of the byte 0x65, which is the ASCII letter 'e'. The bug is that NetGAP does not
'URL decode' the string before doing comparisons.
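
As a rough illustration of this class of bug (a minimal sketch only, not
NetGAP's actual code; the blocked word and the function names are hypothetical),
compare a filter that matches the raw request string with one that URL decodes
first:

```python
from urllib.parse import unquote

# Hypothetical blocked substring, chosen only for illustration.
BLOCKED = "secret"


def naive_is_allowed(path: str) -> bool:
    """Flawed check: compares against the raw, still-encoded string."""
    return BLOCKED not in path


def decoded_is_allowed(path: str) -> bool:
    """URL decode first, then compare."""
    return BLOCKED not in unquote(path)


# '%65' is the URL encoding of 'e', so this path names /secret/index.html.
request_path = "/s%65cret/index.html"

print(naive_is_allowed(request_path))    # the raw string never contains "secret"
print(decoded_is_allowed(request_path))  # the decoded string does
```

A real filter would also have to consider repeated decoding and path
normalisation, which this sketch leaves out.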

'%' (URL) Encoding is *not* unicode encoding - unicode is a character set whose
encodings are generally multibyte and use binary values outside the 32-126 range
of printable ASCII. When unicode characters are used in URLs, they are
usually/often expressed in 'utf-8' encoding, which uses a short sequence of
binary values to encode a full unicode character. Many of the values used in
utf-8 encoding of unicode are illegal in URLs without using 'URL encoding'
(% escaping), but not all % escaped characters represent either utf-8 or
unicode...
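
The distinction can be sketched with Python's standard urllib.parse module (the
sample character and the helper name is_valid_utf8 are my own choices for
illustration):

```python
from urllib.parse import quote, unquote_to_bytes

# '%65' is just a % escaped single byte: plain ASCII 'e', nothing unicode about it.
ascii_escape = unquote_to_bytes("%65")

# A non-ASCII unicode character (U+00E9, e with acute accent) becomes two bytes
# in utf-8, and each of those bytes must then be % escaped to appear in a URL.
utf8_bytes = "\u00e9".encode("utf-8")
escaped = quote("\u00e9")

# A % escape can also carry a byte that is not valid utf-8 at all.
lone_byte = unquote_to_bytes("%FF")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly under a strict utf-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(ascii_escape, utf8_bytes, escaped, is_valid_utf8(lone_byte))
```

URL decoding yields bytes; whether those bytes are then interpreted as utf-8 is
a separate step.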

This is often mixed up because a number of Microsoft IIS vulnerabilities recently
have been due to incorrect 'unicode' decoding and/or incorrect detection of utf-8
encoded unicode characters, some of which were due to ambiguities in the
checking/removing of URL encoding. However, many more web server bugs are related
solely to the common mistake of simply not removing URL encoding before doing
security checks, such as the one demonstrated in NetGAP.
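
For contrast, here is a sketch of the other class, a genuine utf-8 decoding
error. It assumes an 'overlong' two-byte sequence as the example input, and the
lenient decoder below is a deliberately flawed, hypothetical one written for
illustration, not any vendor's code:

```python
from urllib.parse import unquote_to_bytes


def lenient_two_byte_decode(data: bytes) -> str:
    """Deliberately flawed decoder: accepts any 2-byte utf-8 style sequence
    without rejecting 'overlong' forms that encode plain ASCII values."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b & 0xE0 == 0xC0 and i + 1 < len(data):
            # Combine 5 bits from the lead byte with 6 bits from the next byte.
            out.append(chr(((b & 0x1F) << 6) | (data[i + 1] & 0x3F)))
            i += 2
        else:
            out.append(chr(b))
            i += 1
    return "".join(out)


def strict_decode_ok(data: bytes) -> bool:
    """Return True only if Python's strict utf-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# URL decoding is done correctly here; the problem lies in the next step.
raw = unquote_to_bytes("..%c0%af..")

print(lenient_two_byte_decode(raw))  # the overlong pair turns into '/'
print(strict_decode_ok(raw))         # a strict decoder rejects the sequence
```

Here the URL decoding step is not at fault: a check for '/' made on the decoded
bytes finds nothing, and the separator only appears once the lenient decoder has
run.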

I feel it important to distinguish these two classes wherever possible, as common
unicode decoding errors are likely to impact a variety of security related
software in future, even when that software has nothing to do with web
applications or URL processing. Care in unicode handling is still required even
when URL encoding issues have been correctly dealt with, and likewise, not using
unicode does not prevent URL encoding from being a security problem...

Cris Bailiff
c.bailiff () devsecure com

