oss-sec mailing list archives

Re: [PATCH 2/2] execve: check the VM has enough memory at first


From: Linus Torvalds <torvalds () linux-foundation org>
Date: Thu, 16 Sep 2010 08:01:30 -0700

2010/9/15 KOSAKI Motohiro <kosaki.motohiro () jp fujitsu com>:

> Briefly says, to introduce new limit has bad benefit/risk balance. Sadly.

Well, I mostly agree. That said, I do think we could extend the
limiter some ways.

For example, I think the "stack limit / 4" is perfectly sane, but it
would make total sense to perhaps also take into account the AS and
RSS limits.

And I do think that your attempt to use __vm_enough_memory() was good.
It happens to be coded in a way that makes it useless for a one-pass
model, and some of what it does would be too expensive to do up-front
when you can't short-circuit it, but I do think that it would probably
be appropriate to at least try to take the _rough_ code there and use
it as a limit for maximum stack size too.

For example, we could have a function somewhat like

    unsigned long max_stack_size(void)
    {
        unsigned long allowed, used, limit;

        switch (sysctl_overcommit_memory) {
        case OVERCOMMIT_ALWAYS:
                allowed = ULONG_MAX;
                break;
        case OVERCOMMIT_GUESS:
                /* .. maybe we can come up with some upper bound here too .. */
                break;
        default:
                allowed = (totalram_pages - hugetlb_total_pages())
                        * sysctl_overcommit_ratio / 100;
                if (!cap_sys_admin)
                        allowed -= allowed / 32;
                allowed += total_swap_pages;
                /* Don't let a single process grow too big:
                   leave 3% of the size of this process for other processes */
                if (mm)
                        allowed -= mm->total_vm / 32;
                /* What is already committed to? */
                used = percpu_counter_read_positive(&vm_committed_as);
                if (used > allowed)
                        return 0;
                allowed -= used;
                break;
        }
        limit = ACCESS_ONCE(rlim[RLIMIT_STACK].rlim_cur) / 4;
        if (allowed > limit)
                allowed = limit;
        return allowed;
    }

which we'd call once at the beginning of the execve(), and then
remember that result and use it instead of the current 'rlimit/4'
value.
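
For readers who want to experiment with the arithmetic outside the kernel, here is a minimal userspace sketch of the same idea. It is not kernel code. The struct vm_snapshot type and all of its field names are hypothetical stand-ins for the kernel state the function above reads (totalram_pages, vm_committed_as, mm->total_vm and so on). The sketch makes three assumptions of its own: it converts the page count to bytes before comparing against the stack rlimit, it clamps the subtraction instead of letting it wrap, and it leaves OVERCOMMIT_GUESS unbounded because no bound is proposed for it above.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

enum overcommit_mode { OVERCOMMIT_GUESS, OVERCOMMIT_ALWAYS, OVERCOMMIT_NEVER };

/* Hypothetical snapshot of kernel state; counts are in pages unless noted. */
struct vm_snapshot {
    enum overcommit_mode mode;
    unsigned long totalram_pages;
    unsigned long hugetlb_pages;
    unsigned long total_swap_pages;
    unsigned long committed_pages;   /* stand-in for vm_committed_as */
    unsigned long task_total_vm;     /* stand-in for mm->total_vm, 0 if no mm */
    unsigned long overcommit_ratio;  /* percent */
    unsigned long stack_rlimit;      /* RLIMIT_STACK soft limit, in bytes */
    unsigned long page_size;         /* in bytes */
    bool cap_sys_admin;
};

/* Returns an upper bound on the stack size, in bytes. */
unsigned long max_stack_size(const struct vm_snapshot *s)
{
    unsigned long allowed, reserve, limit;

    assert(s->page_size > 0);

    switch (s->mode) {
    case OVERCOMMIT_ALWAYS:
    case OVERCOMMIT_GUESS:           /* assumption: no bound worked out yet */
        allowed = ULONG_MAX;
        break;
    default:
        allowed = (s->totalram_pages - s->hugetlb_pages)
                * s->overcommit_ratio / 100;
        if (!s->cap_sys_admin)
            allowed -= allowed / 32;
        allowed += s->total_swap_pages;
        /* Leave about 3% of this process's size for other processes. */
        reserve = s->task_total_vm / 32;
        allowed = allowed > reserve ? allowed - reserve : 0;
        /* What is already committed to? */
        if (s->committed_pages > allowed)
            return 0;
        allowed -= s->committed_pages;
        /* Pages to bytes, saturating instead of overflowing. */
        if (allowed > ULONG_MAX / s->page_size)
            allowed = ULONG_MAX;
        else
            allowed *= s->page_size;
        break;
    }

    limit = s->stack_rlimit / 4;
    if (allowed > limit)
        allowed = limit;
    return allowed;
}
```

With plenty of uncommitted memory the result is just the usual rlimit / 4. The commit accounting only matters once the remaining headroom drops below that value.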

Now, admittedly the OVERCOMMIT_GUESS case is the interesting one, and
the one that is hard to write efficiently. But maybe we could make
'nr_free_pages()' cheap enough that doing that whole OVERCOMMIT_GUESS
"approximate free pages" thing from __vm_enough_memory would work out
too?
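
As a rough illustration of what an "approximate free pages" estimate might add up, here is a small userspace sketch. It is written from memory of the general shape of the heuristic, not copied from the kernel, so treat every detail as an assumption. The struct guess_inputs type and its fields are hypothetical. The sketch also simplifies one point: when free pages do not exceed the reserve it counts them as zero, and it makes no claim about what the kernel does in that case.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters such an estimate reads, in pages. */
struct guess_inputs {
    unsigned long file_pages;        /* page cache */
    unsigned long free_swap_pages;
    unsigned long slab_reclaimable;
    unsigned long free_pages;        /* what nr_free_pages() would report */
    unsigned long reserve_pages;     /* pages held back for the kernel */
    bool cap_sys_admin;
};

/* Estimate how many pages could plausibly be made available. */
unsigned long guess_available_pages(const struct guess_inputs *g)
{
    unsigned long avail, n;

    /* Memory that is reclaimable or backed by swap. */
    avail = g->file_pages + g->free_swap_pages + g->slab_reclaimable;
    if (!g->cap_sys_admin)
        avail -= avail / 32;         /* keep about 3% back for root */

    /* Truly free pages, minus the reserve (clamped at zero). */
    n = g->free_pages > g->reserve_pages
        ? g->free_pages - g->reserve_pages : 0;
    if (!g->cap_sys_admin)
        n -= n / 32;

    return avail + n;
}
```

The expensive part in the kernel is gathering the free page count, which is why making 'nr_free_pages()' cheap is the open question. The summation itself is trivial.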

I dunno. It doesn't look hopeless.

                      Linus

