Secure Coding mailing list archives

BSIMM: Confessions of a Software Security Alchemist (informIT)


From: gem at cigital.com (Gary McGraw)
Date: Wed, 25 Mar 2009 11:42:20 -0400

Hi Andy,

The code/data mix is certainly a problem.  Also a problem is the way stacks grow on many machines, especially with 
common C/C++ compilers.  You noted a Burroughs machine where things were done better.  There are many others.  
C is usually just a sloppy mess by default.
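
To make that concrete, here is a minimal sketch of the usual failure mode (assuming a conventional downward-growing 
stack and no hardening like stack canaries):

  #include <stdio.h>
  #include <string.h>

  /* On a downward-growing stack the saved return address sits just
   * above this local buffer, so an over-long strcpy walks right over it. */
  static void greet(const char *name) {
      char buf[16];          /* fixed-size local on the stack */
      strcpy(buf, name);     /* no bounds check: 16+ chars overflow buf */
      printf("hello, %s\n", buf);
  }

  int main(int argc, char **argv) {
      if (argc > 1)
          greet(argv[1]);    /* the caller controls argv[1] */
      return 0;
  }

Nothing exotic there; that is just C doing what C does by default.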

Language choice can sometimes make up for bad machine architecture, but ultimately, at some level of computational 
abstraction, they come to be the same thing.  You may recall that I am a Scheme guy.  Some years back TI made a Scheme 
machine that never caught on (around the same time as the Lisp machine...like emacs, only with even more bindings, at 
least on the Symbolics <http://en.wikipedia.org/wiki/Lisp_machine>).  Those machines had a fundamentally different 
architecture at the processor level.

In any case, type safety is at the root of these decisions and makes a HUGE difference.  Go back and read your lambda 
calculus, think about closures, symbolic representation, continuations, and first-class objects, and I think you'll see 
what I mean.  http://en.wikipedia.org/wiki/Lambda_calculus
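
For a taste of what "first class" buys you, look at how C has to fake a closure.  Here is a sketch with a made-up 
counter; the void* environment pointer is exactly where the type system stops helping you:

  #include <stdio.h>

  struct counter { int n; };

  /* A "closure" in C: a function pointer plus a bare environment
   * pointer.  Nothing stops a caller from passing the wrong env
   * and reinterpreting the bits. */
  static int incr(void *env) {
      struct counter *c = env;   /* unchecked conversion from void* */
      return ++c->n;
  }

  int main(void) {
      struct counter c = { 0 };
      int (*closure)(void *) = incr;
      printf("%d\n", closure(&c));   /* prints 1 */
      printf("%d\n", closure(&c));   /* prints 2 */
      return 0;
  }

In Scheme the environment is captured for you and the types come along for the ride; here it is all on the honor system.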

gem
(supposedly still on vacation, but it is a rainy day)

http://www.cigital.com/~gem


On 3/24/09 2:50 PM, "Andy Steingruebl" <steingra at gmail.com> wrote:


On Mon, Mar 23, 2009 at 7:22 AM, Gary McGraw <gem at cigital.com> wrote:
hi guys,

I think there is a bit of confusion here WRT "root" problems.  In C, the main problem is not simply strings and string 
representation, but rather that the "sea of bits" can be recast to represent most anything.  The technical term for 
this problem is type safety.  C is not type safe.

Really?  Isn't it that the standard von Neumann architecture doesn't differentiate between data and code?  We've gone 
over this ground before with stack machines like the Burroughs B5500 series, which were not susceptible to buffer 
overflows that changed control flow because code and data were truly distinct chunks of memory.

Sure, it's a different programming/hardware model, but if you want to fix the root cause you'll have to go deeper than 
language choice, right?  You might have other tradeoffs, but the core problem here isn't just type safety.

Just like in the HTML example.  The core problem is that the language/format mixes code and data with no way to 
differentiate between them.
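
A sketch of what I mean (hypothetical names, C just standing in for any page-generating program): markup and user data 
travel down the same channel, so the data can become code:

  #include <stdio.h>

  int main(void) {
      /* hypothetical untrusted input */
      const char *name = "<script>steal()</script>";
      /* data spliced straight into the code stream: the browser has
       * no way to know that name was supposed to be data only */
      printf("<html><body>Hello, %s</body></html>\n", name);
      return 0;
  }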

Or is my brain working too slowly today?


