Secure Coding mailing list archives

Re: Scripting Languages and Secure Coding + code


From: "David M. Wilson" <dw-securecoding.org () botanicus net>
Date: Tue, 09 Dec 2003 14:40:43 +0000

On Sun, Dec 07, 2003 at 06:51:06PM +0100, [EMAIL PROTECTED] wrote:

You should also get rid of the habbit of using "select * from"
statements. There are several reasons for this:

(1) The order of the result columns is unpredictable. (This won't hit
you when you develop the code but later on when you switch to another
database or somebody cleans us your database schema and reorders
columns)

(2) Somebody might add columns to this table in the future and break
your code

This is only true when you are requesting data from your DB API by way
of column number.

'SELECT *' can indeed be wasteful, but it is useful too. If a table is
designed to represent an object, eg.:

   id
   First
   Last
   Email
   Sex


Then hard-wiring the column names could even be considered bad. Imagine
you have a 'get_person()' procedure which is designed to return all
known fields for the given person. If you want to add a new field to
your 'persons' table, you now have to modify both the procedure and your
database.

I tend to think of 'SELECT *' as 'tell me everything you know about...',
and thus it has a different meaning from 'SELECT First,Last' ('tell me
the first and last name of...').

If you think about it like this, you should be able to choose between
'SELECT *' and 'SELECT <cols>' as you code in a rational manner.


(3) Communicating with the database is expensive (network traffic). So
why should you ask for all available columns to be transfered to you,
and then just throw them away?

Once again it is dependant on what you actually want. If you are
actually asking for all columns, then it isn't wasteful.


(4) Reading data from the datafiles is expensive for the server. So
the less data your server has to read in order to compute the result
of the query, the better.

For smaller column types, I'd be surprised to see a modern database
server that attempts to read around a given column, eg.:

   [COL1][COL2][COL3]

If this was the format of a single record on disk, then the SQL server
might simply do a read(..., 15) to get the record. If say, you requested
the following:

   SELECT col1, col3

Do you think the SQL server is going to perform two seperate read()'s to
skip the column you didn't request? The cost (time taken) of the extra
system call is almost certainly more than the cost of skipping the data
once it is in your server's memory.

This makes the assumption that the records are stored on disk
sequentially.. if one of the columns was a BLOB column and your SQL
server stored them seperately, then skipping the extra column would of
course save a lookup, read, etc. in another file, but my point is simply
that just because you are skipping columns doesn't mean they aren't
getting read.

Apologies for the long-winded explanation. :)


David.








Current thread: