Vulnerability Development mailing list archives

Re: BitchX /ignore bug


From: friedl () MTNDEW COM (Stephen J. Friedl)
Date: Tue, 4 Jul 2000 19:08:36 -0700


BlueBoar asked that I post this to the list as a whole, so I've expanded on
it a bit.

BlueBoar wrote:
I've seen a number of these print string vulnerabilities pop up
lately.  I gather that the programmer writes their printf or equiv
wrong, and these attacks are getting interpreted as formatting strings
somehow.

The printf() functions (with siblings fprintf and sprintf) take a format
string parameter containing little % tokens that describe the type and
interpretation of the parameters that follow. Unlike many other languages,
C permits variadic functions, and its I/O is provided by the library
rather than built into the language.

        printf("string=%s,  number=%d,   float=%f\n",
                "hello, world",
                1234,
                3.1415);

Except on the most unbelievably bizarre platforms, these parameters are
generally passed on the stack in the usual order for that architecture. On
Intel machines, for instance, the params are pushed right to left and the
stack grows down. Other architectures can and do handle either of these
differently.

So this call would be:

        push    high word of double     (a slice of pi?)
        push    low  word of double
        push    1234
        push    addr of "hello, world"
        push    addr of "string=%s, number=%d, float=%f\n"
        call    printf
        add     esp, 20         (caller pops the 20 bytes of args)

Once inside printf(), it essentially takes the address of the one fixed
parameter -- the format string -- and walks up the stack getting parameters
as suggested by the % tokens. Fetch a string (by pointer), a double, or
whatever. The stack looks something like this after the call to printf:

        +------------+
        | local var  |  local to calling function, but not touched here
        +------------+
        | high of pi |
        +------------+
        | low of pi  |
        +------------+
        |   1234     |
        +------------+
        | stringaddr | --------> "hello, world"
        +------------+
FMT->   | stringaddr | --------> "string=%s, number=%d, float=%f\n"
        +------------+
        |   old PC   |  (aka "return address")
        +------------+
        |old frameptr|  (on Intel, it's EBP)
        +------------+
        | local #1   |  local var of *printf* function
        +------------+
        | local #2   |
        +------------+
        |  ,,,,,,,,, |
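To see why printf has to trust the format string, here is a toy version
built on <stdarg.h>. It's only a sketch (mini_sprintf is a made-up name,
and it handles just %s, %d, %f and %%), but like the real thing it fetches
one argument per % token with va_arg and never learns how many arguments
the caller actually passed:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* A toy printf that writes into out (outsz bytes, NUL-terminated).
 * Like the real one, it has no idea how many arguments were passed:
 * each %s, %d or %f in fmt tells it to pull one more argument. */
static int mini_sprintf(char *out, size_t outsz, const char *fmt, ...)
{
    va_list ap;
    size_t len = 0;

    if (outsz == 0)
        return 0;
    out[0] = '\0';
    va_start(ap, fmt);
    for (const char *p = fmt; *p != '\0' && len + 1 < outsz; p++) {
        if (*p != '%') {                /* ordinary character: copy it */
            out[len++] = *p;
            out[len] = '\0';
            continue;
        }
        switch (*++p) {                 /* step onto the conversion letter */
        case 's':                       /* fetch a pointer, follow it */
            snprintf(out + len, outsz - len, "%s", va_arg(ap, char *));
            break;
        case 'd':                       /* fetch an int */
            snprintf(out + len, outsz - len, "%d", va_arg(ap, int));
            break;
        case 'f':                       /* floats arrive promoted to double */
            snprintf(out + len, outsz - len, "%f", va_arg(ap, double));
            break;
        case '%':                       /* "%%" is a literal percent sign */
            snprintf(out + len, outsz - len, "%%");
            break;
        default:                        /* unknown or trailing '%': stop */
            va_end(ap);
            return (int)len;
        }
        len = strlen(out);
    }
    va_end(ap);
    return (int)len;
}
```

Calling mini_sprintf(buf, sizeof buf, "= %s") with no string argument is
undefined behavior for exactly the reason the real printf misbehaves:
va_arg fetches whatever happens to be in the next slot.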

What's important here is that (generally speaking) printf has no way of
knowing how many parameters were *actually* pushed onto the stack. It has
to trust the format string, and it's entirely possible to make a mistake:

        printf("hello, world = %s\n");

        push    addr of "hello, world = %s\n"
        call    printf

Oops!  The argument pointer will be looking at random data for the string:

        +------------+
        | local var  |  local to calling function       * IS THIS A STRING? *
        +------------+
FMT->   | stringaddr | --------> "hello, world = %s\n"
        +------------+
        |   old PC   |  (aka "return address")
        +------------+
        |old frameptr|  (on Intel, it's EBP)
        +------------+
        | local #1   |  local var of *printf* function
        +------------+
        | local #2   |
        +------------+
        |  ,,,,,,,,, |

When printf sees the %s in the format string, it looks next up the stack
and grabs the word found there. That word has nothing to do with the call
(it's essentially random), but printf treats it as a pointer to a string
(an array of characters) and follows it until it finds a NUL byte. You
usually either get garbage or the program dumps core. The latter happens
when the word is zero (a NULL pointer dereference) or points to memory
you're not allowed to read.

If you have the source, you can see what the last local variable is and
make some guesses about what value might be used, but this is hard to
determine without the source, and even harder to turn into something
useful. But in no case does this permit arbitrary code execution. Printf
does have internal buffers, but it's smart enough never to overflow them.

What's happening with these recent bugs is that some extremely sloppy
programmer has passed an unchecked string to printf *as the format
parameter*, and this is simply wrong. You NEVER EVER do something like this:

        printf("Enter your name: "); gets(namebuf);

        printf(namebuf);        /* BOOM */

Here your formats are at the mercy of the user, and that should simply never be done.
This is much, much sloppier than the buffer overflow problem, because doing
this correctly is so easy:

        printf("%s", namebuf);

gets it right every time.
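The fixed pattern, with fgets() standing in for gets() so the read is
bounded too, might look like the sketch below. read_line and print_name
are illustrative names, not library functions:

```c
#include <stdio.h>
#include <string.h>

/* Read one line of user input into buf (n bytes), minus the newline.
 * fgets is used instead of gets because it knows the buffer size.
 * Returns 0 on success, -1 on EOF or error. */
static int read_line(char *buf, size_t n, FILE *in)
{
    if (fgets(buf, (int)n, in) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Print user text safely: the format is a literal the program owns and
 * the user's string is only ever an argument, so any % tokens in it are
 * printed as plain characters. fputs(name, out) would work as well. */
static void print_name(FILE *out, const char *name)
{
    fprintf(out, "%s\n", name);
}
```

A caller would do `if (read_line(namebuf, sizeof namebuf, stdin) == 0)
print_name(stdout, namebuf);`, and input such as %s%s%n comes back out
literally instead of being interpreted.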

In the printf() case (and fprintf, which writes to a FILE pointer) there is
simply no opportunity for anything other than random DoS, though you can
increase your chances by putting lots of %s in the string: "%s%s%s%s%s"
will follow five strings, not just one, and you're much more likely to
break things this way.
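The number of % tokens is the whole story: each one pulls another word
off the stack. A rough helper makes that concrete. count_conversions is
hypothetical, and it deliberately ignores * widths and precisions, which
consume extra arguments in the real printf:

```c
#include <stddef.h>

/* Count how many arguments a format string will try to fetch.
 * Simplified: "%%" is a literal percent sign; every other '%' starts a
 * conversion that consumes one argument. '*' widths are not counted. */
static size_t count_conversions(const char *fmt)
{
    size_t n = 0;

    for (const char *p = fmt; *p != '\0'; p++) {
        if (*p != '%')
            continue;
        if (p[1] == '%') {      /* "%%": literal, consumes nothing */
            p++;
            continue;
        }
        if (p[1] == '\0')       /* stray '%' at the very end */
            break;
        n++;                    /* one more word fetched from the stack */
    }
    return n;
}
```

For "%s%s%s%s%s" it returns 5, the five strings printf will try to
follow; for "hello, world = %s\n" it returns 1.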

Now it is technically possible to get somewhere with the sprintf()
function, which formats to a string buffer instead of a file. What would
happen here is that the random garbage would be copied to the user's
buffer, and that garbage can easily run longer than the buffer itself. If
the buffer is on the stack, you have a buffer overflow exploit, but you
have very little control over the random garbage.

But in any case, I think the sprintf() case is really unlikely. Though the
"printf"-as-"print-string" idiom is common (but wrong), it's just not used
as a copy-string idiom. That's what strcpy() is for, and I don't think I've
ever seen it used in this wrong way.
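If a program really does need to copy user text into a fixed buffer with
the printf family, the safe shape is a literal "%s" format plus a length
limit. A minimal sketch using C99 snprintf() (copy_string is a made-up
helper, not a standard function):

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst (dstsz bytes, dstsz > 0). The literal "%s" keeps src
 * from being read as a format, and snprintf never writes more than dstsz
 * bytes including the terminating NUL.
 * Returns 0 on success, 1 if src was truncated (or on an encoding error). */
static int copy_string(char *dst, size_t dstsz, const char *src)
{
    int needed = snprintf(dst, dstsz, "%s", src);

    return (needed < 0 || (size_t)needed >= dstsz) ? 1 : 0;
}
```

snprintf returns the length it would have written, so comparing that to
the buffer size is how the caller learns that user input was cut short.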

Finally we get to user-defined formatting functions built on vsprintf()
and friends. This is where you write your own printf-like function that
integrates into your application:

        logprintf(LEVEL5, "hello, world = %d", n);

This might format into a fixed-length buffer before sending to a logfile or
something, and this lends itself to the mistaken use of:

        logprintf(LEVEL5, userbuf);

You are much more likely to get an overflow from the *contents* of userbuf
than by including some % token and hoping that random data will happen to
be shellcode.
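A minimal sketch of such a logprintf(), assuming a fixed 256-byte buffer
and stderr as the "logfile" (the buffer size, the "[level N]" prefix and
the value of LEVEL5 are all invented here). vsnprintf() keeps the
formatting inside the buffer, and on GCC or Clang the format attribute
lets the compiler check callers the way it checks printf; with
-Wformat-security it should flag logprintf(LEVEL5, userbuf) while
accepting logprintf(LEVEL5, "%s", userbuf):

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Ask GCC/Clang to type-check calls like printf; no-op elsewhere. */
#if defined(__GNUC__)
#define CHECK_FMT(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define CHECK_FMT(fmt_idx, first_arg)
#endif

#define LEVEL5 5                /* hypothetical log level */

static char logline[256];       /* fixed-length buffer for one message */

static int logprintf(int level, const char *fmt, ...) CHECK_FMT(2, 3);

/* Format "[level N] " plus the caller's message into logline, write it
 * to the log (stderr here), and return its length, or -1 on error.
 * vsnprintf never writes past the end of logline. */
static int logprintf(int level, const char *fmt, ...)
{
    va_list ap;
    int n = snprintf(logline, sizeof logline, "[level %d] ", level);

    if (n < 0 || (size_t)n >= sizeof logline)
        return -1;
    va_start(ap, fmt);
    vsnprintf(logline + n, sizeof logline - (size_t)n, fmt, ap);
    va_end(ap);
    fprintf(stderr, "%s\n", logline);
    return (int)strlen(logline);
}
```

User text then goes in as an argument, logprintf(LEVEL5, "%s", userbuf),
so any % tokens in it are logged literally.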

What's more likely is fishing data out of an application that its authors
don't want visible. Since these phantom % tokens walk up the stack picking
up local variables, if any of them are interesting you may be able to see
them.

On my Intel Linux box:

        $ cat secret.c
        #include <stdio.h>

        int main()
        {
        char    userbuf[254];
        double  pi = 3.1415;
        int     magicnum = 1234;
        char    *secret = "secret password!";

                gets(userbuf);
                printf(userbuf);        // user-controlled format string! Bad!
                return 0;
        }

        $ cc -o secret secret.c
        $ ./secret
        secret="%s" magic=%d pi=%f                          <- typed by me
        secret="secret password!" magic=1234 pi=3.141500   <- output

You can always use %d markers to skip past values you don't care about:

        %s                      is the first string interesting?
        %d %s                   how about the second one?
        %d %d %s                ... and so on
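The skipping trick can be shown without touching a real stack. This is a
simulation, not an exploit: struct slot and walk() are invented here, and
the two sample slots (the integer 1234 and a pointer to "secret
password!") are made-up values, not a real stack layout. Each %d or %s
consumes one slot, just as printf would consume one stack word:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical model of the words printf finds above its format pointer.
 * Real stack words carry no tags; the tag only lets the simulation say
 * what a real %s would do instead of actually crashing. */
struct slot {
    int is_string;              /* does this word happen to point at text? */
    long value;                 /* the word read as a number */
    const char *str;            /* the text it points at, if is_string */
};

/* Expand fmt against the slots the way printf would: every %d or %s
 * consumes the next slot, whether or not the caller supplied one.
 * The result goes to out (outsz bytes, always NUL-terminated). */
static void walk(const char *fmt, const struct slot *slots, size_t nslots,
                 char *out, size_t outsz)
{
    size_t next = 0;

    out[0] = '\0';
    for (const char *p = fmt; *p != '\0'; p++) {
        char piece[128];
        size_t len = strlen(out);

        if (p[0] == '%' && (p[1] == 'd' || p[1] == 's')) {
            int want_string = (*++p == 's');   /* step onto the letter */

            if (next >= nslots)                /* ran past our model */
                snprintf(piece, sizeof piece, "<past end of model>");
            else if (!want_string)             /* %d: print the word */
                snprintf(piece, sizeof piece, "%ld", slots[next].value);
            else if (slots[next].is_string)    /* %s on a real pointer */
                snprintf(piece, sizeof piece, "%s", slots[next].str);
            else                               /* real printf: garbage/crash */
                snprintf(piece, sizeof piece, "<bad pointer %ld>",
                         slots[next].value);
            next++;
        } else {
            piece[0] = *p;
            piece[1] = '\0';
        }
        snprintf(out + len, outsz - len, "%s", piece);
    }
}
```

With those two slots, "%s" lands on 1234 and reports a bad pointer, while
"%d %s" skips the number and prints the secret string.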

Using "0x%08lx" prints the values in hex, which might ring some bells.
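What "0x%08lx" gives you is a fixed-width view of each word. A small
illustrative formatter (hex_words is made up, and the input values in the
usage below are sample numbers, not captured from a real stack):

```c
#include <stdio.h>
#include <string.h>

/* Render words the way a run of "0x%08lx" probes would print them,
 * space-separated. Fixed-width hex makes addresses and small integers
 * easy to tell apart. Output goes to out (outsz bytes, NUL-terminated). */
static void hex_words(const unsigned long *words, size_t n,
                      char *out, size_t outsz)
{
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(out);

        snprintf(out + len, outsz - len, "%s0x%08lx",
                 i > 0 ? " " : "", words[i]);
    }
}
```

Fed the sample words 1234, 0 and 0xbffffa10, it prints
0x000004d2 0x00000000 0xbffffa10. On 32-bit x86 Linux of that era,
values near 0xbfff.... were typically stack addresses and 0x0804....
values typically pointed into the program itself.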

I guess upon reflection somebody will inevitably find a real shellcode
exploit for this, but it's going to be way, way harder than "regular"
buffer overflows.

Steve
Stephen J. Friedl / Software Consultant / Tustin, CA / 714-544-6561

