Dailydave mailing list archives
technical details on the PaX privilege elevation security bug
From: pageexec () freemail hu
Date: Thu, 10 Mar 2005 02:53:06 +0100
to understand the problem, first some introduction to the linux VM is in order. for simplicity i'll just talk about the normal 2-level i386 paging setup; other archs are similarly affected though.

in linux every task has resources like an address space, file descriptors, signal handlers, whatnot. of importance here is the address space (mm_struct), which may or may not be shared between tasks (a shared address space is what other systems call a process with threads in it). the address space is a container of various mappings that the task can create/modify/destroy using various syscalls like mmap/mprotect/munmap.

each mapping in the address space is described by a vm_area_struct structure (mm_struct.mmap points to the head of the vm_area_struct list). this structure holds the mapping's start/end addresses, the backing file info, access rights, etc; you can observe them in /proc/pid/maps.

the linux VM is demand-paged, that is, the actual virtual-to-physical mappings are created lazily, on the first access to a given virtual address/page. during this first access the VM constructs the paging structures using information from the vm_area_struct. of importance here is the page table page and the corresponding page directory entry that the VM allocates for every 4 MB of virtual address space. in a typical dynamically linked program we would find 3 page table pages allocated on startup, as the main executable, ld.so/glibc and the stack all fall into separate 4 MB regions.

when a task unmaps a given mapping, the linux VM performs a quick lookup to see whether the corresponding page table page can be freed as well, that is, whether the given 4 MB virtual address region has just become completely empty. the core of destroying a mapping is in mm/mmap.c:do_munmap().
the original algorithm is as follows: first, we enumerate all vm_area_struct structures that fall within the to-be-unmapped region and remove them from the mm->mmap list. next, we walk through the removed vm_area_struct structures and remove (free) every non-empty page table entry corresponding to the given mapping's virtual address region; this process leaves empty page table entries behind. last but not least comes the above mentioned attempt at freeing up a page table page as well (mm/mmap.c:free_pgtables()); this will occur if the page table page becomes empty due to all mappings having been removed from the given 4 MB region.

let's see the bug now. the reference code is in linux 2.4; 2.2 is almost the same, 2.6 is quite different and you shouldn't be using it anyway. vma mirroring must handle unmapping as well. the (buggy) algorithm is as follows: first, while we enumerate the normal vm_area_struct structures we also check if any of them are mirrored; if so, we extract the mirrors as well and put them on another temporary list. next, we remove (free) the page table entries as usual, for both sets of vm_area_struct structures.

so far so good, however there's a catch in the last step: free_pgtables() takes a virtual address region as its argument and verifies that there are no vm_area_struct structures in there - using the mm->mmap list. this is not a problem without vma mirroring, as free_pgtables() is called once per do_munmap(), so by the time free_pgtables() is entered, all page table entries in the given region have been freed and free_pgtables() is going to free up empty page table pages. not so with the buggy vma mirroring, however.
since i wanted to make the vma mirroring code generic, i didn't want to assume that all vma mirrors are at the same distance from the mirrored vmas, therefore i couldn't make a single free_pgtables() call that would cover all the mirrors (and only them) at once. instead, i chose to loop through this second vma list and call free_pgtables() for each vma separately. this is however a very bad idea: free_pgtables() checks only the mm->mmap list for overlapping vmas, and it will blissfully ignore the vmas extracted and put on the second list, waiting to be cleaned up properly.

so if we happen to unmap two mirrored vmas at once that fall into the same 4 MB virtual address region, then only the first one will see its page table entries freed up properly; those of the second will leak and stay untouched in the freed page table page. this alone amounts to a refcount leak on every page that was mapped in there (and is exploitable itself), but there's worse.

since the 2.2 days linux has maintained a per-CPU quicklist of freshly freed page table pages (on the assumption that they are frequently needed and less pressure on the VM page allocator should be better for performance). so in our case we would end up putting page table pages onto this list that are not completely clear. therefore the next time such a page table page is allocated (which can be in a completely different task than the one that freed it), it will unexpectedly contain valid virtual/physical mappings - essentially introducing new data/code into the victim task. exploiting this is left as an exercise for the reader (this is just to hide the fact that i suck at writing exploits as well ;P).