Vulnerability Development mailing list archives

Re: CROSS SITE-SCRIPTING Protection with PHP


From: "Sverre H. Huseby" <shh () thathost com>
Date: Wed, 16 Oct 2002 22:48:52 +0200

[b0iler]

|   Also, you are just sending the inputed values of parameters.  What
|   about the names of the parameter (the $key variables)?  They could
|   contain potentially dangerous XSS which is often printed to the
|   client.  Also, user input (GPC) is not the only tainted data in a
|   script.  Any data that comes from an outside source is potientally
|   dangerous. Files, databases, ENV variables, etc.. need to be
|   treated as if it contains the most clever tricks to evade your
|   filtering and protection schemes.

Correct.  And I've tried to say the same quite a few times on several
securityfocus lists over the last two years.

We need to shift the focus away from _input_.  Input is never
troublesome in itself.  It only becomes troublesome when put in a
context in which it is interpreted in some way, and then only when
parts of it are interpreted not as plain data, but as something else.
As b0iler (whoever that is :)  ) correctly states above, data from the
inside may cause  just as much trouble as data  from the outside.  And
it may do so deep inside a multi-tier system, far from the web layer.

It's when  data is  passed somewhere for  interpretation that  it gets
troublesome.  We should thus pay attention to the format of the data
whenever we _pass_it_along_,  rather than when we receive  it from the
outside.  Web applications tend to pass data along all the time:

  * to database servers, often  by concatenating the data with strings
    containing  SQL constructs,  or  by using  some  kind of  prepared
    statement mechanism (much better).

  * to shell command interpreters (yikes!).

  * to the OS by sending file names to file handling functions, host
    names to name resolution libraries and so on.  (a large amount of
    "so on" for the OS.)

  * to legacy systems written in some obscure language using some equ-
    ally obscure protocol.

  * to other web servers (B2B) using XML, URL parameters or whatever.

  * to other processes running on the same server, using some
    internally made protocol.

  * many, many, many more...

  * and  last, but not least, to  the web browser of  the user.  Which
    luckily is just another  sub-system, covered by the same rule as
    the rest.
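
To make the first bullet concrete, here is a minimal sketch of the
difference between concatenating data into SQL and handing it over
through a placeholder.  It uses Python's sqlite3 module with an
in-memory database; the table name "notes" and the sample text are
made up for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

text = "we can't forbid quotes"

# Concatenation: the quote inside `text` ends the SQL string constant
# early, and the remainder is parsed as SQL (here: a syntax error).
broken = "INSERT INTO notes (body) VALUES ('" + text + "')"
try:
    conn.execute(broken)
    concat_failed = False
except sqlite3.Error:
    concat_failed = True

# Placeholder: the value travels next to the statement, not inside it,
# so the quote is never interpreted as anything but data.
conn.execute("INSERT INTO notes (body) VALUES (?)", (text,))
stored = conn.execute("SELECT body FROM notes").fetchone()[0]
```

With harmless text the concatenated statement merely fails.  With
hostile text it may instead parse as valid SQL, which is the case to
worry about.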

And to repeat: "Data" is not only user input.  It is anything, no mat-
ter  the source.  Every  system we  pass data  to has  its own  way of
interpreting  it, and  the  interpretation depends  on context.   Some
examples:

  * when building strings  containing SQL queries, the quote character
    may cause trouble if it  appears prematurely in an SQL string con-
    stant.  _Any_  data passed  as part of  an SQL  statement _that_is
    _to_be_interpreted_as_a_string_constant_ will  need to have quotes
    escaped in some way.  (No,  we can't generally forbid quotes.  How
    would I  be able to write "can't"  a few words back  if you forbid
    the quote?)   And no,  we can't generally  escape quotes  at input
    time  either, because  then they  will look  rather funny  for the
    _other_  sub-systems,  in which  quotes  have  no special  meaning
    (e.g. a text file or the user's browser).

    For  more on  this, see  another vuln-dev-mail  of  mine available
    here:

      http://shh.thathost.com/text/passing-data-03.txt

  * when talking to the OS, null-bytes may create confusion when pass-
    ing strings, as the OS (written in C, normally) treats the '\0' as
    a string terminator.  Most  "modern" languages do not.  We'll gen-
    erally need  to pay attention  to null-bytes when talking  to sub-
    systems written  in C.  The reason  is generally that  our view of
    the string will differ from the view taken by the OS.

    But there are  other things as well.  If we  pass a _file_name_ to
    the OS, we may need to  pay attention to slashes (and for some ob-
    scure  OSes, backslashes) and  double-dots as  well, as  they will
    switch context from _file_ to _directory_.

    And there are hundreds of other examples of how talking to one
    particular sub-function (e.g. open()) of a sub-system (e.g. the
    OS) needs careful handling of a selected set of characters.

  * and then comes the browser again.  The HTML parser in the browser
    gives special meaning to < (tag start), > (tag end) and &
    (character entity).  And inside those < and >, " and ' (both
    attribute value delimiters) take on a special meaning too.  We'll
    need to escape them somehow, so that they are treated not as
    special characters, but as plain characters.  The correct way is
    to use HTML encoding (as most of you know).
    The  wrong way (generally)  is to  replace the  special characters
    with nothing.  Imagine all the complaints you will get if you make
    a discussion forum for mathematicians, and disallow < and > ...
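
The browser case can be sketched in a few lines.  This is only an
illustration, assuming Python and its standard html module; the
function name to_html_text and the sample post are invented here.
It encodes the five characters mentioned above right before the data
is sent to the browser, and compares that with the "replace with
nothing" approach.

```python
import html

def to_html_text(data):
    # html.escape encodes & < > and, with quote=True, also " and '
    return html.escape(data, quote=True)

post = 'if a < b & b > c then "x" holds'

# Encoded: the browser shows the text exactly as written.
encoded = to_html_text(post)

# Stripped: safe as well, but the mathematician's inequality is gone.
stripped = post.replace("<", "").replace(">", "")
```

The encoded version survives a round trip (html.unescape gives back
the original post), while the stripped version has lost information
for good.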

It  is generally  _not_possible_ to  fetch data  from the  request and
start by doing  something to it that will match  all the possible sub-
systems in one go.  Not  without giving severe restrictions as to what
the data may contain.  ("Sorry, Sinead,  but your name will have to be
OConnor for  now").  And  not without introducing  strange appearances
for some of the sub-systems.  ("Welcome, Sinead O\'Connor").
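
A small sketch of the Sinead problem, again assuming Python (html
and sqlite3 from the standard library) and a made-up "users" table.
The name is kept exactly as typed, and each sub-system gets its own
treatment at the moment the data is handed over.

```python
import html
import sqlite3

name = "Sinead O'Connor"          # kept exactly as the user typed it

# Escaping for SQL at input time leaks into the other sub-systems:
escaped_early = name.replace("'", "\\'")
greeting_wrong = "Welcome, " + html.escape(escaped_early)

# Handling the data per sub-system, at hand-over time:
greeting = "Welcome, " + html.escape(name)           # for the browser
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", (name,))  # for SQL
from_db = conn.execute("SELECT name FROM users").fetchone()[0]
```

Rendered by a browser, greeting_wrong shows the stray backslash,
while greeting shows the name as intended, and the database holds the
unmodified name.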

Input validation has been given _far_ too much focus.  It may be good
as a first  measure, to be able to give users  nice feedback when data
don't match the  business rules and other high  level rules ("the file
name is not supposed to contain directory elements"), but it generally
won't solve the low level problems.  In systems over toy size, data is
passed between many different  sub-systems, which often have different
meta-characters that may be abused.  People who believe that input
validation at the web layer will prevent security problems several
layers further down (or when data come back to the first layer again)
have given the issue too little thought, IMNSHO.

Focus on input validation, but focus even more on handling every poss-
ible meta-character,  meta-byte, meta-word or  whatever before passing
the data  along to  the next sub-system,  whatever that is.   And that
rule goes for every layer of the application, not just the web layer.
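
As one possible illustration of that rule, take the file name case
from earlier.  The sketch below (Python; the names checked_file_name
and read_upload are hypothetical) checks for null-bytes, slashes,
backslashes and dot-only names right before the name is passed to
open(), no matter where the name came from.  The set of characters
is an assumption that fits the examples above, not a complete list
for every OS.

```python
import os

def checked_file_name(name):
    """Return name unchanged if it is a plain file name, else raise."""
    if "\0" in name:
        # C code behind open() would stop reading at the null-byte
        raise ValueError("null byte in file name")
    if "/" in name or "\\" in name:
        # separators switch context from file to directory
        raise ValueError("directory separator in file name")
    if name in ("", ".", ".."):
        raise ValueError("not a plain file name")
    return name

def read_upload(base_dir, name):
    # The check sits at the hand-over to the OS, not at the web layer.
    path = os.path.join(base_dir, checked_file_name(name))
    with open(path, "rb") as f:
        return f.read()
```

Because the check lives next to open(), it also covers names that
arrive from a database or another process rather than from the
request.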


Sverre - who feels this discussion  would fit better at webappsec than
         at vuln-dev.

-- 
shh () thathost com             Computer Geek?  Try my Nerd Quiz
http://shh.thathost.com/        http://nerdquiz.thathost.com/

