WebApp Sec mailing list archives

Re: Canonicalization


From: Andrew van der Stock <vanderaj () greebo net>
Date: Sat, 22 Apr 2006 23:19:32 +1000

Rossen,

In answer to your query, canonicalization to a simplest form can sometimes be very challenging. For example, LDAP data by its nature must cope with names in many languages, including the quirks of English (O'Neill, surnames with spaces, etc). Just because computer programs cannot handle a wide range of inputs safely doesn't mean these people (which includes me btw ... I hate being filed incorrectly under "S", or referred to as Mr Stock) should bend to the will of the computer.

Half the issue comes from bad query / data access patterns. We know that if we intermingle data and instructions/programs, injection is made trivially easy. Why do people promulgate new standards (XPath queries, etc) which encourage a poor pattern?
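As a minimal sketch of a non-intermingled access pattern, here is a parameterized query using Python's standard sqlite3 module. The table, the column names and the sample rows are made up for illustration. The point is only that the surname travels as a bound parameter, so an apostrophe or a space in it is never parsed as part of the query.

```python
import sqlite3

def find_by_surname(conn, surname):
    # The "?" placeholder binds surname as data; it is never parsed as SQL.
    cur = conn.execute(
        "SELECT given, surname FROM people WHERE surname = ?",
        (surname,),
    )
    return cur.fetchall()

# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (given TEXT, surname TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Andrew", "van der Stock"), ("Shaun", "O'Neill")],
)

print(find_by_surname(conn, "O'Neill"))        # apostrophe needs no escaping
print(find_by_surname(conn, "x' OR '1'='1"))   # treated as a literal surname
```

Building the same query by string concatenation would force every caller to escape the apostrophe correctly, which is exactly the pattern that makes injection easy.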

If anyone watches "Little Britain", this is basically along the lines of "The computer says noooo". It's not the computer. It's us.

So basically, what's needed is:

* Frameworks which canonicalize properly by default
* Frameworks that provide only non-intermingled data query / access patterns
* Frameworks that provide easy access to de/encoding functions for a wide variety of data types to clean old data
* System designers to consider the many different needs of human input. For example, airline booking systems really need to stop making a single word out of my surname. For legal reasons, my surname is three words. Making it one word means legally that it's not me flying.
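To make the first and third points concrete, here is a rough sketch of what a canonicalize-by-default helper might look like in Python. The function name, the choice of percent-encoding as the only layer to undo, and the limit of three decoding rounds are all assumptions for illustration. A real framework would have to cover more encodings than this.

```python
import unicodedata
from urllib.parse import unquote

def canonicalize(raw, max_rounds=3):
    """Undo nested percent-encoding, then normalize Unicode to NFC.

    Hypothetical helper: max_rounds is an arbitrary limit chosen here.
    """
    value = raw
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:
            break  # stable: nothing left to decode
        value = decoded
    else:
        # Still changing after max_rounds: refuse rather than guess.
        raise ValueError("input is encoded too many times")
    # NFC makes composed and decomposed forms of a character compare equal.
    return unicodedata.normalize("NFC", value)

print(canonicalize("van%20der%20Stock"))  # spaces survive, surname stays three words
print(canonicalize("O%2527Neill"))        # double-encoded apostrophe is decoded
```

Note that the helper only brings the input to one form. It does not strip apostrophes or spaces, so the name comes out the way its owner wrote it.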

I'd really like to return to the days when it was safe to take any old thing, but that's probably not achievable any time soon.

thanks,
Andrew


On 21/04/2006, at 12:22 PM, Rossen Raykov wrote:

Andrew,

Is that “simplest form” achievable? One can apply many different encodings, which makes the task of decoding them very difficult and resource consuming. Usually it is cheaper and safer to do a semantic check and treat the input as erroneous if it does not conform to the expected input format.

For example, if you are expecting a number, anything other than a number is an error. If you expect alphanumeric input, verify that it is composed only of letters and digits...
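A minimal sketch of that kind of format check in Python, assuming ASCII-only patterns. The pattern names and the require helper are hypothetical.

```python
import re

# Assumed formats: ASCII digits only, and ASCII letters plus digits only.
NUMBER = re.compile(r"[0-9]+")
ALNUM = re.compile(r"[A-Za-z0-9]+")

def require(pattern, value):
    # fullmatch must cover the whole string; anything else is an error.
    if pattern.fullmatch(value) is None:
        raise ValueError("input does not conform to the expected format")
    return value

print(require(NUMBER, "42"))
print(require(ALNUM, "abc123"))
```

With these patterns, require(ALNUM, "O'Neill") raises ValueError, so a strict alphanumeric check only suits fields that really are alphanumeric.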
