From: pageexec@freemail.hu
Date: Thu, 10 Mar 2005 02:53:06 +0100

       technical details on the PaX privilege elevation security bug

to understand the problem, first some introduction to the linux VM is in
order. for simplicity, i'll just talk about the normal 2-level i386 paging
setup; other archs are similarly affected, though.

in linux every task has resources like an address space, file descriptors,
signal handlers, whatnot. of importance here is the address space (mm_struct),
which may or may not be shared between tasks (a shared address space is
what other systems call a process with threads in it). the address space
is a container of various mappings that the task can create/modify/destroy
using various syscalls like mmap/mprotect/munmap. each mapping in the
address space is described by a vm_area_struct structure (mm_struct.mmap
points to the head of the vm_area_struct list). this structure holds the
mapping's start/end addresses, the backing file info, access rights, etc;
you can observe them all in /proc/pid/maps.
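
for reference, here's a stripped-down sketch of the two structures (the
field names follow 2.4, but this is illustrative only, nowhere near the
full definitions):

  /* illustrative sketch, not the full 2.4 definitions */
  typedef unsigned long pgprot_t;         /* stand-in for the kernel type */
  struct file;                            /* backing file, opaque here */

  struct vm_area_struct {
          struct mm_struct *vm_mm;        /* the address space we belong to */
          unsigned long vm_start;         /* first address of the mapping */
          unsigned long vm_end;           /* first address past the mapping */
          struct vm_area_struct *vm_next; /* next vma on the mm->mmap list */
          pgprot_t vm_page_prot;          /* access rights */
          unsigned long vm_flags;         /* VM_READ/VM_WRITE/VM_EXEC/... */
          struct file *vm_file;           /* the backing file, if any */
  };

  struct mm_struct {
          struct vm_area_struct *mmap;    /* head of the vma list */
          /* ... page directory pointer, counters, locks, etc ... */
  };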

the linux VM is demand-paged, that is, the actual virtual-to-physical
mappings are created lazily, on the first access to a given virtual
address/page. during this first access the VM constructs the paging
structures using information from the vm_area_struct. of importance here
are the page table page and the corresponding page directory entry that
the VM allocates for every 4 MB of virtual address space. in a typical
dynamically linked program we would find 3 page table pages allocated on
startup, as the main executable, ld.so/glibc and the stack all fall into
separate 4 MB regions. when a task unmaps a given mapping, the linux VM
performs a quick check to see whether the corresponding page table page
can be freed as well, that is, whether the given 4 MB virtual address
region has just become completely empty.
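
to make the 4 MB granularity concrete, here's a minimal sketch of how a
virtual address decomposes under 2-level i386 paging (the constants are
the standard i386 ones, the helpers and example addresses are mine):

  #include <stdio.h>

  /* 10-bit page directory index, 10-bit page table index, 12-bit page
   * offset: one page table page thus covers 1024 * 4 KB = 4 MB */
  #define PGDIR_SHIFT  22
  #define PAGE_SHIFT   12
  #define PTRS_PER_PTE 1024

  static unsigned long pgd_index(unsigned long addr)
  {
          return addr >> PGDIR_SHIFT;
  }

  static unsigned long pte_index(unsigned long addr)
  {
          return (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
  }

  int main(void)
  {
          /* typical i386 addresses: executable, ld.so/libs, stack */
          unsigned long addrs[] = { 0x08048000, 0x40000000, 0xbffff000 };
          int i;

          /* three different pgd slots -> three page table pages */
          for (i = 0; i < 3; i++)
                  printf("%#10lx: pgd slot %4lu, pte slot %4lu\n",
                         addrs[i], pgd_index(addrs[i]), pte_index(addrs[i]));
          return 0;
  }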

the core of destroying a mapping is in mm/mmap.c:do_munmap(). the original
algorithm is as follows: first, we enumerate all vm_area_struct structures
that fall within the to-be-unmapped region and remove them from the mm->mmap
list. next, we walk through the removed vm_area_struct structures and remove
(free) every non-empty page table entry corresponding to the given mapping's
virtual address region. this process leaves empty page table entries behind.
last but not least comes the above-mentioned attempt at freeing up a page
table page as well (mm/mmap.c:free_pgtables()); the page is freed if it
has become empty because all mappings have been removed from the given
4 MB region.
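
in heavily simplified pseudocode the vanilla flow looks like this
(unlink_vmas() is a made-up helper, and the real zap_page_range() and
free_pgtables() take more arguments; this is just the shape of the code):

  /* sketch of the vanilla 2.4 do_munmap() flow, not the real code */
  int do_munmap(struct mm_struct *mm, unsigned long start, size_t len)
  {
          struct vm_area_struct *vma, *extracted;

          /* 1: unlink every vma overlapping [start, start+len)
           *    from the mm->mmap list */
          extracted = unlink_vmas(mm, start, start + len);

          /* 2: free the page table entries covering each removed
           *    mapping, leaving empty entries behind */
          for (vma = extracted; vma; vma = vma->vm_next)
                  zap_page_range(mm, vma->vm_start,
                                 vma->vm_end - vma->vm_start);

          /* 3: free any page table page whose 4 MB region has become
           *    completely empty - emptiness is judged by consulting
           *    the mm->mmap list alone */
          free_pgtables(mm, start, start + len);
          return 0;
  }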

let's see the bug now. the reference code is in linux 2.4; 2.2 is almost
the same, and 2.6 is quite different (and you shouldn't be using it anyway).

vma mirroring must handle unmapping as well (a vma mirror is a second
vm_area_struct that PaX creates so that SEGMEXEC/RANDEXEC can map the same
physical pages at a second virtual address). the (buggy) algorithm is as
follows: first, while we enumerate the normal vm_area_struct structures,
we also check whether any of them are mirrored; if so, we extract the
mirrors as well and put them on another, temporary list. next, we remove
(free) the page table entries as usual for both sets of vm_area_struct
structures - for the mirrors this happens as they are processed one by
one in the last step.
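
in the same pseudocode as above (has_mirror(), unlink_mirror() and
move_to_list() are made-up helpers), the first steps look like this:

  /* sketch: extract the mirrors onto a second, temporary list */
  struct vm_area_struct *vma, *extracted, *mirrors = NULL;

  extracted = unlink_vmas(mm, start, start + len);
  for (vma = extracted; vma; vma = vma->vm_next)
          if (has_mirror(vma))
                  move_to_list(&mirrors, unlink_mirror(mm, vma));

  /* the normal vmas get their page table entries freed as usual;
   * the mirrors get theirs freed in the per-mirror loop below */
  for (vma = extracted; vma; vma = vma->vm_next)
          zap_page_range(mm, vma->vm_start, vma->vm_end - vma->vm_start);
  free_pgtables(mm, start, start + len);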

so far so good, however there's a catch in the last step: free_pgtables()
takes a virtual address region as its argument and verifies - using the
mm->mmap list - that there are no vm_area_struct structures left in there.
this is not a problem without vma mirroring, as free_pgtables() is called
once per do_munmap(), so by the time free_pgtables() is entered, all page
table entries in the given region have been freed and free_pgtables() will
only free up empty page table pages. not so with the buggy vma mirroring,
however. since i wanted to make the vma mirroring code generic, i didn't
want to assume that all vma mirrors are at the same distance from the
mirrored vmas, therefore i couldn't make a single free_pgtables() call
that would cover all the mirrors (and only them) at once. instead, i chose
to loop through this second vma list and, for each vma in turn, free its
page table entries and call free_pgtables() for it separately. this is
however a very bad idea: free_pgtables() checks only the mm->mmap list for
overlapping vmas, so it will blissfully ignore the vmas extracted and put
on the second list, still waiting to be cleaned up properly.
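
in the same made-up pseudocode, the buggy last step amounts to this:

  /* sketch of the buggy last step: each mirror is zapped and then
   * free_pgtables() is called for its range alone */
  for (vma = mirrors; vma; vma = vma->vm_next) {
          zap_page_range(mm, vma->vm_start, vma->vm_end - vma->vm_start);

          /* BUG: free_pgtables() consults only mm->mmap to decide that
           * a 4 MB region is empty; the mirrors still sitting on our
           * temporary list are invisible to it, so it can free a page
           * table page that still holds live entries of a later mirror */
          free_pgtables(mm, vma->vm_start, vma->vm_end);
  }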

so if we happen to unmap two mirrored vmas at once whose mirrors fall into
the same 4 MB virtual address region, then only the first one will see its
page table entries freed up properly; those of the second will leak and
stay untouched in the freed page table page. this alone amounts to a
refcount leak on every page that was mapped there (and is exploitable in
itself), but there's worse.
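
to make 'the same 4 MB region' concrete, here's the arithmetic with two
made-up mirror addresses (the >> 22 is the pgd slot computation from the
earlier sketch):

  #include <assert.h>

  int main(void)
  {
          /* two hypothetical mirrors, both inside 0x68000000-0x683fffff */
          unsigned long m1 = 0x68048000, m2 = 0x68148000;

          /* same pgd slot (416), hence the same page table page: freeing
           * that page after zapping only m1 leaks m2's live entries */
          assert((m1 >> 22) == (m2 >> 22));
          return 0;
  }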

since the 2.2 days linux has maintained a per-CPU quicklist of freshly
freed page table pages (on the assumption that they will soon be needed
again and that less pressure on the VM page allocator is better for
performance). so in our case we end up putting page table pages onto this
list that are not completely clear. the next time such a page table page
is allocated (which can happen in a completely different task than the one
that freed it), it will unexpectedly contain valid virtual-to-physical
mappings - essentially introducing new data/code into the victim task.
exploiting this is left as an exercise for the reader (this is just to
hide the fact that i suck at writing exploits as well ;P).
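
for reference, the quicklist mechanics look roughly like this (modeled on
the 2.4 i386 code, simplified to the uniprocessor case; note that neither
path clears the rest of the page):

  typedef unsigned long pte_t;    /* stand-in for the kernel type */

  static unsigned long *pte_quicklist;

  static void pte_free_fast(pte_t *pte)
  {
          /* link the page into the quicklist by storing the next pointer
           * in its first word - the other 1023 entries are left as-is */
          *(unsigned long *)pte = (unsigned long)pte_quicklist;
          pte_quicklist = (unsigned long *)pte;
  }

  static pte_t *pte_alloc_one_fast(void)
  {
          unsigned long *ret = pte_quicklist;

          if (ret != NULL) {
                  pte_quicklist = (unsigned long *)*ret;
                  /* patch up entry 0 by copying entry 1 - correct only
                   * if the page really was empty when it was freed */
                  ret[0] = ret[1];
          }
          return (pte_t *)ret;
  }

so a page table page freed with live entries in it resurfaces at the next
allocation with those entries intact - exactly what the leaked mirror
entries end up doing.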

_______________________________________________
Dailydave mailing list
Dailydave@lists.immunitysec.com
https://lists.immunitysec.com/mailman/listinfo/dailydave

