Full Disclosure mailing list archives

Re: [WEB SECURITY] Unicode Left/Right Pointing Double Angel Quotation Mark bypass?


From: "Chris Weber" <chris () casabasec com>
Date: Fri, 05 Jun 2009 00:00:53 -0700

Two patterns in Unicode account for these behaviors:



  1. Normalization (less of what's happening here) and

  2. best-fit mappings (most of what's happening here)



The first is a true Unicode standard; the second is a loosely defined set of
mappings provided as a convenience to software vendors.  In fact, best-fit
mappings are mostly a vendor problem, and arguably not Unicode's issue at
all.



The characters WhiteHat found in their study are a mix of things.  Only two
of the characters you listed have Normalization mappings in Unicode, which
suggests most weren't normalized by some API in the stack.  The two that do
are fullwidth forms, and you can always count on the fullwidth Latin
characters having normalization mappings, because they all do.



U+FF1C and U+FF1E (＜ ＞) normalize to < >, but only under the two
'compatibility' normalization forms, NFKC and NFKD.
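
A quick way to check this is with Python's standard unicodedata module. The
sketch below is only an illustration: the candidate list is assembled from
code points mentioned in this thread, and the helper name is made up.

```python
import unicodedata

# Code points mentioned in this thread (assembled here for illustration).
CANDIDATES = ["\uFF1C", "\uFF1E", "\u00AB", "\u00BB",
              "\u27E8", "\u27E9", "\u3008", "\u2039"]

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def forms_yielding_ascii_bracket(ch):
    """Return the normalization forms under which ch becomes '<' or '>'."""
    return [form for form in FORMS
            if unicodedata.normalize(form, ch) in ("<", ">")]


if __name__ == "__main__":
    for ch in CANDIDATES:
        hits = forms_yielding_ascii_bracket(ch)
        print("U+%04X %-45s %s" % (ord(ch), unicodedata.name(ch),
                                   ", ".join(hits) or "no mapping"))
```

Only the two fullwidth signs should report a mapping, and only under the
compatibility forms.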



In any browser, click the following link containing fullwidth Latin
characters, and you'll see they all get transformed to their ASCII
equivalents.  That's because IDNA calls for Normalization Form KC, which all
browsers implement in URL/IRI handling.  It's useful as a quick-and-dirty
Normalization test.



http://ＡＢＣ１２３.nottrusted.com
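
The same check can be scripted rather than clicked. As a rough sketch,
Python's built-in "idna" codec applies Nameprep (which includes NFKC) to each
label of a host name, so it can stand in for the browser here. That codec
implements the older IDNA 2003 rules, so treating it as equivalent to any
particular browser is an assumption; the helper name is invented.

```python
# The host name from the link above, written with fullwidth Latin letters
# and digits (U+FF21.. and U+FF11..).
FULLWIDTH_HOST = "\uFF21\uFF22\uFF23\uFF11\uFF12\uFF13.nottrusted.com"


def to_ascii_host(host):
    """Run a host name through Python's IDNA (2003) codec and return text."""
    return host.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_host(FULLWIDTH_HOST))
```

Nameprep also case-folds, so the ASCII result comes back in lower case.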



The other characters you guys found point to some other things too, but not
Normalization.



U+00AB and U+00BB have no direct mappings suggested to U+003C and U+003E,
so this may be a case where the Web-app has implemented its own mapping.
The same goes for U+27E8 and U+27E9.  It's uncommon for a developer to go
to that extreme, but I've seen it done.



All of the others, such as U+3008 and U+2039, have best-fit mappings
suggested in the Unicode documentation, located at
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit1252.txt.
It's important to realize that those mappings are not a standard;
they're provided as a convenience.  Chasing these characters down can prove
useful - you've found stuff, I've found stuff - but as you're well aware
it's more complicated than that.  Vendors can implement best-fit mappings
however they want.
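
To make the idea concrete, here is a minimal sketch of how a best-fit
conversion behaves. The table is a tiny hand-written excerpt in the spirit of
bestfit1252.txt, not a copy of it, so the individual entries should be
treated as assumptions to verify against the file. The function name is
likewise invented for illustration.

```python
# Hypothetical excerpt of a best-fit table: Unicode code point -> byte.
# Entries are illustrative assumptions, not an authoritative copy of
# bestfit1252.txt.
BEST_FIT_SAMPLE = {
    0x3008: 0x3C,  # LEFT ANGLE BRACKET          -> '<'
    0x3009: 0x3E,  # RIGHT ANGLE BRACKET         -> '>'
    0xFF1C: 0x3C,  # FULLWIDTH LESS-THAN SIGN    -> '<'
    0xFF1E: 0x3E,  # FULLWIDTH GREATER-THAN SIGN -> '>'
}


def best_fit_encode(text, table=BEST_FIT_SAMPLE, fallback=b"?"):
    """Encode text to single bytes, substituting best-fit look-alikes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(cp)          # ASCII passes through unchanged
        elif cp in table:
            out.append(table[cp])   # silent look-alike substitution
        else:
            out.extend(fallback)    # no mapping: replacement byte
    return bytes(out)


if __name__ == "__main__":
    print(best_fit_encode("\u3008script\u3009"))
```

The point of the sketch is that the substitution is silent: the caller gets
ASCII angle brackets back without any error or warning.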



In fact, many of the major frameworks do implement things differently,
including ICU, Java, .Net, and some of the native Windows libraries.  This
happens unbeknownst to many developers as strings get transformed along the
stack between APIs that use wide chars and others that use native chars.
It also happens when:



  1. a given character doesn't have a direct mapping,

  2. a string has been transcoded to a different character set, or

  3. the API was simply designed to behave that way - see
http://msdn.microsoft.com/en-us/library/dd374047(VS.85).aspx and
http://msdn.microsoft.com/en-us/library/ms404377.aspx.
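
The practical consequence is an ordering problem: a filter that inspects the
wide-character string can pass input that only turns dangerous after a later
narrowing conversion. The sketch below assumes a naive filter and uses a
stand-in conversion (NFKC followed by an ASCII encode) in place of any real
vendor API; all three function names are hypothetical.

```python
import unicodedata


def naive_filter_ok(text):
    """A simplistic check that only looks for ASCII angle brackets."""
    return "<" not in text and ">" not in text


def narrow(text):
    """Stand-in for a lossy wide-to-narrow conversion somewhere downstream.

    Assumption: NFKC plus ASCII encoding models the transformation; real
    APIs may use vendor-specific best-fit tables instead.
    """
    return unicodedata.normalize("NFKC", text).encode("ascii", "replace")


def safe_check(text):
    """Apply the filter to the form the downstream component will see."""
    return naive_filter_ok(narrow(text).decode("ascii"))


if __name__ == "__main__":
    payload = "\uFF1Cscript\uFF1E"
    print(naive_filter_ok(payload))  # filter sees no ASCII brackets
    print(narrow(payload))           # conversion produces them anyway
    print(safe_check(payload))       # checking after conversion catches it
```

Filtering the converted form, rather than the original, is one way to keep
the check and the consumer looking at the same bytes.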



Good finds, fun fun,

Chris



PS. The upcoming release of our Watcher security testing tool includes
detection of character best-fit mappings in Web-apps.





-----Original Message-----
From: arian.evans () gmail com [mailto:arian.evans () gmail com] On Behalf Of
Arian J. Evans
Sent: Thursday, June 04, 2009 4:42 PM
To: Prasad Shenoy
Cc: 3APA3A; Full-Disclosure; websecurity () webappsec org
Subject: Re: [WEB SECURITY] Unicode Left/Right Pointing Double Angel
Quotation Mark bypass?



On Thu, Jun 4, 2009 at 4:22 PM, Prasad Shenoy <prasad.shenoy () gmail com>
wrote:

Has %uff1c %uff1e become very common?



We have seen 44 sites in the last year at WhiteHat Security that were
vulnerable to fullwidth Unicode-encoded attacks. This one tends to be
more ubiquitous than others when you find it. Across those 44
applications we found roughly 200 locations vulnerable to attack, and
each location would have multiple inputs, so you are probably talking
1,000+ inputs vulnerable to attack using this encoding.



I have found a few places where these are still exploitable. Sometime in
the coming week I will post my observation from one particular encounter
of this vulnerability to get some responses on what, why and how it is
happening.





Interesting. I did some research here too, and found a new Unicode
standard that I think might be a culprit.



I won't be posting any more of the data in this thread. There is
simply too much of it.



Jeremiah will be posting some of it at his blog below, and ultimately
there needs to be a good paper on canonicalization. None has yet been
written for the web world. The VXer crowd went through this in the
'90s with all of their encoding-evasion techniques for viruses, and
then K2's Polymorphic Shell Code tool brought similar concepts to
payloads delivered across network protocols.



Now the same notion of multiple representations and re-assemblies of
data, in this case to form exploits, is rearing its head in the
webappsec world. Nothing is new under the sun. :) Attackers already
use encoding in the wild for SQL injection, and in at least one XSS I
have seen.



I cannot even find documentation for probably 50% of the encoding
techniques I know of that can be leveraged to form attacks. So I know
our community has some large knowledge gaps on this subject at the
moment and needs more work here.



-ae







This email gave a good head start.....



Cheers,

Prasad Shenoy



On Thu, Jun 4, 2009 at 6:10 PM, Arian J. Evans
<arian.evans () anachronic com> wrote:



Hello 3APA3A -- Remember this thread you started 2 years ago? Long
time, no discussion on this topic... :)



Turns out you were spot-on. We verified six different variants of
this. Jeremiah Grossman published details on his blog:






http://jeremiahgrossman.blogspot.com/2009/06/results-unicode-leftright-pointing.html



It is important to note that when you read the number counts that say:



11 exploitable XSS in 8 websites:

%u00ABscript%u00BB



The count of "11" is "11 /path/ locations or forms in a web
application", not "11 vulnerable inputs". The location might be a .cgi
or a servlet, with 1 or dozens of inputs in that same location that
are all "vulnerable" to the same attack technique.



(We call the individual inputs "attack vectors" instead of
"vulnerabilities" to help people group them and make them more
actionable. For example, people usually don't go fix one input, but
instead fix the CGI, servlet, form-input/request-handler and all the
associated inputs at once. So reporting each input individually
doesn't provide any benefit besides making reports bigger.)



Anyway, there are many more of these kinds of
false-familiar/transliteral transcoding and canonicalization issues.



I will continue to feed anything interesting to Jeremiah and it will
probably wind up on his blog.



Thanks again for opening my mind up to some new angles for
filter-evasion tricks! :)



ciao



--

Arian Evans

I invest most of my money in motorcycles, mistresses, and martinis.

The rest of it I squander.









On Tue, May 22, 2007 at 9:52 AM, Arian J. Evans <arian () anachronic com>
wrote:



I'll let you know if this hits. I am currently running this test on
about 600+ sites.



-ae



On 5/22/07, 3APA3A < 3APA3A () security nnov ru> wrote:



Dear full-disclosure () lists grok org uk,



  By the way: I saw that Unicode Left Pointing Double Angle Quotation Mark
  (%u00AB) / Unicode Right Pointing Double Angle Quotation Mark (%u00BB)
  are sometimes translated to '<' and '>'. Has anybody experimented
  with

  %u00ABscript%u00BB

  in different environments to bypass filtering in this way?



--

http://securityvulns.com/

         /\_/\
        { , . }     |\
+--oQQo->{ ^ }<-----+ \
|  ZARAZA  U  3APA3A   } You know my name - look up my number (The Beatles)
+-------------o66o--+ /
                    |/






----------------------------------------------------------------------------

Join us on IRC: irc.freenode.net #webappsec



Have a question? Search The Web Security Mailing List Archives:

http://www.webappsec.org/lists/websecurity/archive/



Subscribe via RSS:

http://www.webappsec.org/rss/websecurity.rss [RSS Feed]



Join WASC on LinkedIn

http://www.linkedin.com/e/gis/83336/4B20E4374DBA









--

Thought for the day -

"Emails can hurt feelings. If this one did, please ignore your feelings."






_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/
