WebApp Sec mailing list archives

Canonicalization Of User Input In PHP


From: <warnings () envisagement com>
Date: Sun, 16 Jan 2005 22:40:11 -0800

I am working on implementing a basic PHP user input validation scheme
and have come across several references to canonicalizing input before
performing validation. After researching this topic on the net I have finally
reached a point where I feel okay asking for help.

At this point I have found a few basic functions related to this subject, but
I am getting lost in alphabet soup (UTF-8, RFC 2279, ISO 10646, ...) and
I am reaching a momentary saturation point where the learning
curve only gets steeper the more I learn.

For the basic validation I have found the following set of PHP filters via the
owasp.org site.

http://www.owasp.org/software/labs/phpfilters.html
// sanitize.inc.php
// Sanitization functions for PHP
// by: Gavin Zuchlinski, Jamie Pratt, Hokkaido
// webpage: http://libox.net
// Last modified: December 21, 2003

Now these functions are fairly clear and easy to understand, and they have
generally validated what I have come to understand as best practices, as I
have experience with fault-tolerant coding, just not security. But the issue
I am having trouble coming to terms with is canonicalization of the data.
Beyond the above routines, I have also found the urldecode() function in
the PHP manual.

At this point I feel (weakly, not securely) that one should use the following
to canonicalize the data prior to validating any input.

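foreach ($_GET as $key => $value) {
   // Transform to canonical form.
   $ckey = my_utf8_decode(urldecode($key));
   $cvalue = my_utf8_decode(urldecode($value));
   // Reject the request if sanitizing would change either string.
   // Strict comparison avoids PHP's loose numeric-string matching.
   if ($ckey !== sanitize_paranoid_string($ckey) ||
           $cvalue !== sanitize_paranoid_string($cvalue)) {
       // Location needs an absolute URL; exit so nothing else runs.
       header('Location: http://www.somesight.net/index.php');
       exit;
   }
}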

I understand this example is simplistic, but is this a proper way
to canonicalize the input values?  Or am I missing something here?
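
One detail about the loop above can be checked directly: PHP has already
URL-decoded the entries of $_GET by the time a script runs, so calling
urldecode() on them decodes the data a second time. The following is only a
minimal sketch, assuming PHP 7 or later. It uses parse_str() to stand in for
the decoding PHP applies to the query string, and a hypothetical helper,
is_well_formed_utf8(), that rejects malformed UTF-8 instead of decoding it.
That helper is a swapped-in technique for illustration; it is not
my_utf8_decode() and not part of sanitize.inc.php.

```php
<?php
// Hypothetical helper (an assumption, not from sanitize.inc.php):
// returns true only when $s is well-formed UTF-8.
function is_well_formed_utf8(string $s): bool
{
    // With the /u modifier, PCRE validates the subject as UTF-8 first,
    // and preg_match() returns false for malformed input.
    return preg_match('//u', $s) === 1;
}

// parse_str() URL-decodes values the same way PHP does for $_GET.
parse_str('name=%2527', $params);

$once  = $params['name'];   // decoded once: the three characters %27
$twice = urldecode($once);  // decoded again: a single quote character

var_dump($once, $twice);
var_dump(is_well_formed_utf8("caf\xC3\xA9")); // valid two-byte sequence
var_dump(is_well_formed_utf8("\xC0\xAF"));    // overlong encoding of "/"
```

If the validation step only ever sees the once-decoded value, a later
second decode could turn %27 into a quote after the check has passed, which
is why the number of decoding passes matters as much as the filter itself.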

Should I be looking at the following too?

$_SERVER['CONTENT_TYPE'] == 'application/x-www-form-urlencoded'

Is this data even trustworthy? My first guess is that it could be forged in
the header data.
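
As a hedged sketch of how such a check might look: the Content-Type value
comes from the client's request headers, so it can only serve as a
consistency check, never as proof of what the body contains. The value may
also carry parameters such as "; charset=UTF-8", in which case a plain ==
comparison against the bare type would fail. The helper name media_type_of()
below is hypothetical, and the code assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: extract the bare media type from a Content-Type value.
function media_type_of(string $contentType): string
{
    // Drop any parameters after the first ";" and normalise case,
    // since media type names are case-insensitive.
    $type = explode(';', $contentType, 2)[0];
    return strtolower(trim($type));
}

// Client-supplied, so treat the result as a hint only.
$header = $_SERVER['CONTENT_TYPE'] ?? '';
$isForm = media_type_of($header) === 'application/x-www-form-urlencoded';
```

Whatever this check reports, the submitted values still need the same
validation as any other user input.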

Any input would be appreciated.

thanks,

Sean
