Secure Coding mailing list archives

Bugs and flaws -- Micro-tainting


From: dwheeler at ida.org (David A. Wheeler)
Date: Wed, 01 Feb 2006 13:36:24 -0500

Crispin Cowan declared:
> ...Validate your inputs.
> There are automatic tools (taint and equivalent) that will check whether
> you have validated your inputs. But they do *not* check the *quality* of
> your validation of the input. Doing a consistency check on the file name
> extension and the data interpreter type for the file is beyond (most?)
> such checkers.

Aleks Kissinger (who interned here) and I have actually
done research in that direction, which we call "micro-tainting".
Instead of declaring an entire field as "tainted" or not,
you track individual units (typically individual characters).
If the character came from an untrusted source, it's tainted;
if it came from a trusted source (e.g., program constants) it's not.
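
To make the idea concrete, here is a minimal sketch of a string type that carries one taint flag per character. The class name TStr and its methods are hypothetical illustrations, not the implementation we or the PHP folks used; the point is only that concatenation and slicing keep each character's own flag.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TStr:
    """Hypothetical string type with one taint flag per character."""
    text: str
    taint: tuple[bool, ...]

    @staticmethod
    def trusted(text: str) -> TStr:
        # e.g. program constants: no character is tainted
        return TStr(text, (False,) * len(text))

    @staticmethod
    def untrusted(text: str) -> TStr:
        # e.g. network or user input: every character is tainted
        return TStr(text, (True,) * len(text))

    def __add__(self, other: TStr) -> TStr:
        # Concatenation keeps each character's own flag.
        return TStr(self.text + other.text, self.taint + other.taint)

    def __getitem__(self, key: slice) -> TStr:
        # Slicing keeps the flags of the characters that survive.
        if not isinstance(key, slice):
            raise TypeError("index with a slice, e.g. s[i:i+1]")
        return TStr(self.text[key], self.taint[key])

    def tainted_positions(self) -> list[int]:
        return [i for i, t in enumerate(self.taint) if t]


greeting = TStr.trusted("Hello, ") + TStr.untrusted("Bob")
```

Here greeting.tainted_positions() reports only the indices of "Bob", even though the whole value is one string.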

You can do this dynamically by using a different string library
or changing the implementation of the standard string library
for the language you're using.  It's particularly easy if
you use 32 bits per Unicode character, because code points
need only 21 bits, so you already have some unused bits.
We've done this, and we're not the only ones.  There's also
a PHP module that does this; it even stores the taint
information in a database, so later data retrievals STILL know
which characters came from untrusted sources.  (The PHP folks
came up with the idea independently of us; we only learned of
each other through USENIX, after we'd both developed the idea
further.)
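
A sketch of the "unused bits" trick, assuming 32-bit code units: since Unicode code points fit in 21 bits, one spare bit can hold the taint flag. Using bit 31 specifically, and the helper names below, are hypothetical choices; a real string library would store these as uint32 values rather than Python ints.

```python
TAINT_BIT = 1 << 31        # hypothetical choice: top bit of a 32-bit code unit
CODE_MASK = TAINT_BIT - 1  # clears only the taint bit


def encode(text: str, tainted: bool) -> list[int]:
    # Pack each code point (at most 0x10FFFF, i.e. 21 bits) plus a taint flag.
    flag = TAINT_BIT if tainted else 0
    return [ord(ch) | flag for ch in text]


def decode(units: list[int]) -> str:
    # Strip the flag to recover the original characters.
    return "".join(chr(u & CODE_MASK) for u in units)


def is_tainted(unit: int) -> bool:
    return bool(unit & TAINT_BIT)


# Concatenation is plain list concatenation; the flags travel with the units.
units = encode("ls ", tainted=False) + encode("x;y", tainted=True)
```

Because the flag lives inside each code unit, every string operation that copies characters also copies their taint, with no side table to keep in sync.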

More interesting to us was doing this _statically_, i.e.,
determining taint before run time.  Aleks' public talk at
ACSAC 2005 primarily discussed how to do it statically.
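
As a rough illustration only (not the analysis from the talk), a static micro-taint pass can represent each string value as a sequence of segments: known constant text, or a placeholder for "some untrusted characters". The toy instruction format, the Untrusted class, and the source name "argv[1]" are all hypothetical, and this sketch handles straight-line code only; branches and loops would need a join of abstract values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Untrusted:
    """Placeholder for zero or more characters from an untrusted source."""
    source: str


def concat(a: tuple, b: tuple) -> tuple:
    # Abstract concatenation; adjacent constant segments are merged.
    out = list(a)
    for seg in b:
        if out and isinstance(seg, str) and isinstance(out[-1], str):
            out[-1] += seg
        else:
            out.append(seg)
    return tuple(out)


def analyze(program: list) -> list:
    """Walk a straight-line program without running it; report sink arguments."""
    env, findings = {}, []
    for stmt in program:
        op = stmt[0]
        if op == "const":
            _, dst, text = stmt
            env[dst] = (text,)
        elif op == "input":
            _, dst, source = stmt
            env[dst] = (Untrusted(source),)
        elif op == "concat":
            _, dst, left, right = stmt
            env[dst] = concat(env[left], env[right])
        elif op == "sink":
            _, func, arg = stmt
            findings.append((func, env[arg]))
        else:
            raise ValueError(f"unknown op {op!r}")
    return findings


# Hypothetical program: cmd = "ls " + argv[1]; system(cmd)
program = [
    ("const", "prefix", "ls "),
    ("input", "user", "argv[1]"),
    ("concat", "cmd", "prefix", "user"),
    ("sink", "system", "cmd"),
]
```

analyze(program) reports that system() receives the trusted text "ls " followed by untrusted characters, which is exactly the per-position information a sink policy needs.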

Of course, just tracking tainting isn't enough... you still need
to know what is acceptable to "output", and to WHERE.
But this only requires identifying specific output functions
that can be dangerous (system(), etc.) and a spec of WHAT
is okay.  We found regular expressions were actually a
simple and convenient way to express where taint would be
allowed (and where it wouldn't), though BNF grammars or any other
language-definition formalism would work just as well.
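
Here is one way a regular expression can say where taint is allowed, sketched as a run-time check. The policy for system(), the group name "taint", and the allowed() helper are hypothetical: the string must match the policy, and every tainted character must fall inside the named group. Characters outside the group (here, the command name) must be untainted.

```python
import re

# Hypothetical policy for system(): a fixed "ls " command, then one argument
# in which untrusted characters are allowed, limited to a safe filename class.
SYSTEM_POLICY = re.compile(r"ls (?P<taint>[A-Za-z0-9._-]+)")


def allowed(policy: re.Pattern, text: str, tainted: list[bool]) -> bool:
    """True if text matches policy and all tainted characters lie in group 'taint'."""
    m = policy.fullmatch(text)
    if m is None:
        return False
    start, end = m.span("taint")
    # Note: this inspects the single parse re finds; an ambiguous policy
    # would need every possible parse to be considered.
    return all(start <= i < end for i, t in enumerate(tainted) if t)


prefix_flags = [False] * 3  # "ls " comes from a program constant
```

With this policy, "ls notes.txt" passes when only the filename is tainted, but fails if the attacker supplied the "ls" itself or slipped in a ";".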

Oh, and here's an interesting tidbit.  For the static case,
if you work backwards, when the check "fails" you can
even trivially derive the input patterns that cause
security failures (and from that information it
should be easy to figure out how to fix it).
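
To show why working backwards can be easy, here is a sketch using Brzozowski derivatives of regular expressions; that technique is our choice for illustration and not necessarily what was presented at ACSAC. It assumes the hypothetical program cmd = "ls " + user feeding system() and a hypothetical sink policy. Taking the derivative of the policy with respect to the known constant "ls " yields the condition the untrusted input must meet; its complement is the set of attacking inputs, and a breadth-first search finds a shortest one.

```python
import string
from collections import deque
from dataclasses import dataclass


# Minimal regular-expression AST (hypothetical, just enough for a policy).
@dataclass(frozen=True)
class Empty: pass            # matches nothing
@dataclass(frozen=True)
class Eps: pass              # matches only ""
@dataclass(frozen=True)
class Chars:                 # one character from `allowed`
    allowed: frozenset
@dataclass(frozen=True)
class Seq:                   # a then b
    a: object
    b: object
@dataclass(frozen=True)
class Alt:                   # a or b
    a: object
    b: object
@dataclass(frozen=True)
class Star:                  # zero or more r
    r: object
@dataclass(frozen=True)
class Not:                   # everything r does not match
    r: object


def seq(a, b):
    # Simplifying constructors keep repeated derivatives small.
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    return a if isinstance(b, Eps) else Seq(a, b)

def alt(a, b):
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty):
        return a
    return a if a == b else Alt(a, b)

def lit(text):
    r = Eps()
    for ch in text:
        r = seq(r, Chars(frozenset(ch)))
    return r

def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chars)):
        return False
    if isinstance(r, Seq):
        return nullable(r.a) and nullable(r.b)
    if isinstance(r, Alt):
        return nullable(r.a) or nullable(r.b)
    return not nullable(r.r)  # Not

def deriv(r, c):
    """Brzozowski derivative: strings s such that c + s matches r."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chars):
        return Eps() if c in r.allowed else Empty()
    if isinstance(r, Seq):
        d = seq(deriv(r.a, c), r.b)
        return alt(d, deriv(r.b, c)) if nullable(r.a) else d
    if isinstance(r, Alt):
        return alt(deriv(r.a, c), deriv(r.b, c))
    if isinstance(r, Star):
        return seq(deriv(r.r, c), r)
    return Not(deriv(r.r, c))  # Not

def deriv_str(r, text):
    for c in text:
        r = deriv(r, c)
    return r

def matches(r, text):
    return nullable(deriv_str(r, text))

def shortest_member(r, alphabet, max_len=8):
    """Breadth-first search for a shortest string over alphabet matched by r."""
    queue, seen = deque([(r, "")]), {r}
    while queue:
        state, s = queue.popleft()
        if nullable(state):
            return s
        if len(s) < max_len:
            for c in alphabet:
                nxt = deriv(state, c)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, s + c))
    return None


SAFE = frozenset(string.ascii_letters + string.digits + "._-")
policy = seq(lit("ls "), Star(Chars(SAFE)))   # hypothetical system() policy
precondition = deriv_str(policy, "ls ")       # what `user` must satisfy
attacks = Not(precondition)                   # inputs that violate the policy
witness = shortest_member(attacks, "ab.;|&$ ")
```

The constant prefix is consumed by the derivative, so only the tainted "hole" ends up constrained; here the shortest failing input found is ";", which points directly at the missing filter.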

This was presented at ACSAC 2005 in the "Works in Progress" session.

Of course, this is all WAY beyond what typical language
implementations provide developers today.  But it's worth
knowing about.

--- David A. Wheeler
