Secure Coding mailing list archives

Re: Scripting Languages and Secure Coding + code


From: Carsten Kuckuk <ck () kuckuk com>
Date: Tue, 09 Dec 2003 14:40:43 +0000

DMW> Then hard-wiring the column names could even be considered bad. Imagine
DMW> you have a 'get_person()' procedure which is designed to return all
DMW> known fields for the given person. If you want to add a new field to
DMW> your 'persons' table, you now have to modify both the procedure and your
DMW> database.

It depends on what you do later on in your program with these values.
If you work with a weakly typed language like PHP, you'll run into
problems later on when you end up in a part of your program where you
rely on the existence of a certain column of your table - be it the
name, the oid, an account number, street name, or so. You can only get
away with the "select *" approach if your application is fully generic
- but then it's not useful at all. Applications get their value from
the business logic that is incorporated in them and this business
logic relies on the availability of certain tables and columns - a
certain schema (or part thereof). And as this is the case, you should
as well make this dependence explicit in your code so that your
compilers and preprocessors (in the embedded SQL case) can point out
mistakes as soon as possible, especially when your code and database
evolve over time. The compiler (and the type system of your
programming language) is your friend.

If you want decouple the column names your program uses from the
column names your database schema uses, then you should use database
"views" for mapping names.

DMW> I tend to think of 'SELECT *' as 'tell me everything you know about...',
DMW> and thus it has a different meaning from 'SELECT First,Last' ('tell me
DMW> the first and last name of...').

Unless you write a generic database browser, you should always know
which data you handle and how you handle it. If this is not the case,
the code you're about to write has no well-defined semantics and what
you do is playing around but not programming.

(4) Reading data from the datafiles is expensive for the server. So
the less data your server has to read in order to compute the result
of the query, the better.

DMW> For smaller column types, I'd be surprised to see a modern database
DMW> server that attempts to read around a given column, eg.:

DMW>    [COL1][COL2][COL3]

DMW> If this was the format of a single record on disk, then the SQL server
DMW> might simply do a read(..., 15) to get the record. If say, you requested
DMW> the following:

DMW>    SELECT col1, col3

DMW> Do you think the SQL server is going to perform two seperate read()'s to
DMW> skip the column you didn't request? The cost (time taken) of the extra
DMW> system call is almost certainly more than the cost of skipping the data
DMW> once it is in your server's memory.

DMW> This makes the assumption that the records are stored on disk
DMW> sequentially.. if one of the columns was a BLOB column and your SQL
DMW> server stored them seperately, then skipping the extra column would of
DMW> course save a lookup, read, etc. in another file, but my point is simply
DMW> that just because you are skipping columns doesn't mean they aren't
DMW> getting read.

This is only the case when you deal with simple column types like
numbers, boolean values and strings of fixed length. As soon as you
use Varchars or even Blobs all bets are off. Varchars could be stored
inline which would then cause variable-length records (bad for
direct-access). They could also be implemented as pointers to the end
of the record where all varchars get concatinated in the hope that
there's enough space available there for all strings, but every now
and then you'll need an overflow sector. Or the varchars get stored in
a totally different area of the database. The point here is: (1) It's
implementation-dependent, and (2) The odds are very high that the
presence of Varchars leads to several disk reads.









Current thread: