WebApp Sec mailing list archives

RE: Input validation


From: "Dawes, Rogan (ZA - Johannesburg)" <rdawes () deloitte co za>
Date: Fri, 20 Jun 2003 09:10:41 +0200

I rather like the approach of performing "boundary quoting" (I don't think
this was the original description, but it serves well enough).

The idea is that you make sure to appropriately "escape" all data crossing
boundaries between systems.

For example, when formulating a SQL query based on data received from the
browser, make sure to escape any quote characters.
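
A minimal sketch of the SQL case in Python, assuming the standard-library
sqlite3 module and a hypothetical "items" table (neither is from the
original discussion). The quote doubling shown is the standard SQL rule for
string literals; other databases may need additional escaping. The bound
parameter at the end is a different technique, where the driver keeps the
value apart from the SQL text, shown only for comparison.

```python
import sqlite3

def quote_sql_literal(value: str) -> str:
    """Double single quotes so the value stays inside a SQL string literal."""
    return "'" + value.replace("'", "''") + "'"

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (title TEXT)")
conn.execute("INSERT INTO items VALUES (?)", ("director's selections",))

term = "director's selections"

# Manual boundary quoting: the apostrophe is doubled, not removed.
escaped_query = "SELECT title FROM items WHERE title = " + quote_sql_literal(term)
rows_escaped = conn.execute(escaped_query).fetchall()

# Bound parameter: the value never becomes part of the SQL text.
rows_bound = conn.execute(
    "SELECT title FROM items WHERE title = ?", (term,)
).fetchall()
```

Both queries find the row, and the search term is stored and matched
exactly as submitted.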

When writing data out to the browser, make sure to escape any angle brackets
as HTML entities.
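
For the browser boundary, a minimal sketch using Python's standard
html.escape (the function name to_html_text is hypothetical). Besides angle
brackets it also escapes ampersands and quotes, which matters when the value
ends up inside an attribute.

```python
import html

def to_html_text(value: str) -> str:
    """Escape &, <, >, and both quote characters as HTML entities."""
    return html.escape(value, quote=True)

submitted = '<script>alert("x")</script>'
safe = to_html_text(submitted)
```

The browser then displays the submitted markup as text instead of
interpreting it.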

When writing data out to an XML stream, make sure to escape any angle
brackets and ampersands.
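
A minimal sketch of the XML case, assuming Python's standard
xml.sax.saxutils helpers; the "note" element and "title" attribute are
made-up names for illustration. escape() handles element text and
quoteattr() handles attribute values, adding the surrounding quotes itself.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

note = "R&D <draft>"

# Element content: &, < and > are replaced by entities.
element = "<note>" + escape(note) + "</note>"

# Attribute value: quoteattr() escapes and wraps the value in quotes.
attribute = "<note title=" + quoteattr(note) + "/>"

# Round trip: a parser recovers exactly what was submitted.
recovered = ET.fromstring(element).text
```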

When writing data out to an error log, make sure to escape any embedded
carriage returns, other control characters, spaces, etc. that could cause
problems when the file is parsed automatically (or even when it is read
manually, cf. xterm escape codes).
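
A minimal sketch of log quoting in Python. The function name, the escape
notation (\xNN and friends), and the also_escape parameter are assumptions
for illustration; any unambiguous, reversible notation would do. Spaces are
escaped only on request, for logs that use them as field separators.

```python
def quote_for_log(value: str, also_escape: str = "") -> str:
    """Escape backslashes, non-printable characters, and extra delimiters."""
    out = []
    for ch in value:
        if ch == "\\":
            # Escape the escape character so the encoding stays reversible.
            out.append("\\\\")
        elif ch.isprintable() and ch not in also_escape:
            out.append(ch)
        elif ord(ch) <= 0xFF:
            out.append("\\x%02x" % ord(ch))
        elif ord(ch) <= 0xFFFF:
            out.append("\\u%04x" % ord(ch))
        else:
            out.append("\\U%08x" % ord(ch))
    return "".join(out)

# A forged line break and a terminal escape sequence both end up as text.
entry = quote_for_log("bob\r\nFAKE ENTRY \x1b[2J")
field = quote_for_log("a b", also_escape=" ")
```

The logged entry stays on one line, and the original bytes can still be
reconstructed for the audit trail.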

That way, you can still operate with the "malformed" data, and have audit
trails that show what was really submitted. You can always decide not to
continue processing, but the data will remain just that - data - and not
cross the line into "metadata" (HTML code, line separators, SQL injections,
XSS, etc.).

This can have a downside in terms of performance, but in terms of
correctness, I think it is a better approach.

So if, as a different poster asked, you need to allow some tags and
disallow others, your HTML output quoting function would need to implement
selective quoting (preferably from a white-list, with specific allowed
tag attributes, with the tag attributes appropriately quoted, etc.).
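
A deliberately small sketch of selective quoting in Python, under stated
assumptions: the white-list contents are hypothetical, only bare tags with
no attributes are re-enabled, and tag balance is not checked. Everything is
escaped first, and only exact white-listed tags are restored afterwards, so
anything unrecognised stays quoted. A real implementation would also need
the per-tag attribute white-list described above.

```python
import html
import re

ALLOWED_TAGS = {"b", "i", "em", "strong"}  # hypothetical white-list

# Matches an escaped bare tag such as &lt;b&gt; or &lt;/b&gt; (no attributes).
_ESCAPED_TAG = re.compile(r"&lt;(/?)([a-zA-Z][a-zA-Z0-9]*)&gt;")

def quote_html_selectively(value: str) -> str:
    """Escape everything, then restore only white-listed bare tags."""
    escaped = html.escape(value, quote=True)

    def restore(match: re.Match) -> str:
        closing, name = match.group(1), match.group(2).lower()
        if name in ALLOWED_TAGS:
            return "<" + closing + name + ">"
        return match.group(0)  # not on the list: leave it quoted

    return _ESCAPED_TAG.sub(restore, escaped)

result = quote_html_selectively("<b>bold</b> <script>alert(1)</script>")
```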

Rogan

-----Original Message-----
From: Kooper, Larry [mailto:Larry.Kooper () metmuseum org] 
Sent: 19 June 2003 07:39 PM
To: 'webappsec () securityfocus com'
Subject: Input validation


I am a newbie to this list - apologies if this question is often asked.
(I don't know if the list has a FAQ).

When securing a web site against attacks such as SQL injection and XSS,
what approach do you recommend following to validate user input?

1) Attempt to massage data so that it becomes valid
2) Reject input that is known to be bad
3) Accept only input that is known to be good

(The three categories are taken from p. 22 of this paper:
http://www.nextgenss.com/papers/advanced_sql_injection.pdf)

The problem with solutions 1 and 2 is that you may miss some forms of bad
input.  Another subtle problem with solutions 1 and 2 is that sometimes
bad input can be embedded in good input.  For example, if someone searches
for "director's selections", the string "select" would be removed (as a
SQL command), resulting in "director's ions".
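
The mangling described above can be reproduced with a naive filter. This is
a hypothetical sketch of approach 1, not code from any real product; the
keyword list is made up.

```python
def strip_sql_keywords(value: str) -> str:
    """Naive 'massage' filter: delete anything that looks like a SQL keyword."""
    for keyword in ("select", "insert", "delete", "drop"):
        value = value.replace(keyword, "")
    return value

# Legitimate input is damaged...
mangled = strip_sql_keywords("director's selections")

# ...while a nested keyword survives a single pass of the filter.
bypass = strip_sql_keywords("seselectlect")
```

Here mangled is "director's ions", with the apostrophe still in place, and
bypass comes out as "select".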

Solution 3 seems like the most secure but also the most expensive to
implement.  And the problem seems more difficult when validating
free-format fields such as a name or an address.  One could reject
non-alphanumeric characters, but then things like # (for apartment number)
or - (hyphen) would be kicked out. Any thoughts?
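
A rough sketch of approach 3 for an address field, in Python. The character
set and the 100-character limit are assumptions chosen for illustration and
would need tuning for real addresses (non-ASCII letters, for one, are not
covered).

```python
import re

# Hypothetical allow-list: letters, digits, space, and common address
# punctuation including # and hyphen.
ADDRESS_PATTERN = re.compile(r"[A-Za-z0-9 #.,'/-]{1,100}")

def is_plausible_address(value: str) -> bool:
    """Accept only values made up entirely of allow-listed characters."""
    return ADDRESS_PATTERN.fullmatch(value) is not None

ok = is_plausible_address("12 Main St #4-B")
also_ok = is_plausible_address("7 O'Neil Rd")
rejected = is_plausible_address("x'; DROP TABLE t;--")
```

Note that the apostrophe has to be on the list for addresses like
"7 O'Neil Rd", so input that passes validation can still contain quote
characters.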

Thanks,

Larry Kooper
Manager, Internet Technologies 
The Metropolitan Museum of Art 



