oss-sec mailing list archives
Re: Offset2lib: bypassing full ASLR on 64bit Linux
From: Daniel Micay <danielmicay () gmail com>
Date: Tue, 09 Dec 2014 15:24:39 -0500
On 09/12/14 11:18 AM, Steve Grubb wrote:
I studied this area 2 years ago for a gray hat talk and in preparation to help set the policy going forward for Fedora and RHEL. The general reason I've heard mentioned about why its not used as fully as possible is that it adds memory pages that can't be coalesced or consolidated because they are not the same.
AFAIK, it doesn't cause a significant increase in memory usage. The whole point of position independent code is that it can be reused across processes. Dynamic libraries are already fully position independent. Windows uses a relocation-only model rather than PIC, so their full ASLR has zero runtime cost after start-up. The downside is that use an awful hack to work around the reuse issue. It caches the random base and reuses it for all instances of the library / executable.
For Fedora and RHEL 7, the intended policy is PIE for all daemons, privileged apps, network facing, and parsers of untrusted media. Enforcement is not so easy. How do you identify in an automated way parsers of untrusted media? I have a script that can grade an installed system that uses rpms: http://people.redhat.com/sgrubb/files/rpm-chksec It has options to grade the system or an individual rpm for compliance with the intended policy.
Why not use it across the board on x86_64? The cost is in the range of 0-5% at the moment (usually ~1%) and will be reduced to nearly 0% in every case when the next GCC / binutils versions are released since it won't add indirection to global accesses.
During my research, I found a couple interesting things. These ASLR related tidbits below are pulled from the speech I gave about when open source security mechanisms don't work as intended (all measured on a 64 bit system): 1) On non-PIE applications, the heap doesn't get much randomization. Just 14 bits.
It seems that the dss section (sbrk) isn't randomized at all on a non-PaX kernel. #include <stdio.h> #include <stdlib.h> int main() { printf("%p\n", sbrk(0)); return 0; }
2) Also non-PIE applications seems to be some bias in the numbers chosen. This could be an effect of 14 bits of randomization. It did follow a bell curve such that guessing some addresses was much luckier than others. (I did not get a sample size large enough on PIE apps to see if the same bias could be measured.) 3) When using PIE, you pretty much got 29 bits of randomness everywhere. That lead to the question of why the heap on non-PIE is so limited in address scope. As I remember, there was some coupling with sbrk() that caused this....which might need revisiting.
Linux puts the dss section near the start of the address space because it grows up and the mmap base is chosen near the end of the address space. In an ideal world, dss wouldn't have existed at all. PaX has *much* higher entropy for mmap and I think it occasionally inserts a gap rather than just using a random base.
4) Then I started wondering about the heap when you use other memory manager libraries such as jemalloc. This turned out to be interesting. You get about 19 bits of randomness using it. Its not as bad as non-PIE glibc but not as good as PIE glibc. You also got the same amount of randomness whether the app was PIE or not. This is an area ripe for more experimenting, exploiting, and patching. Supposedly some of these heap managers use mmap as the underlying allocator. So, why aren't they getting 29 bits, too? :-)
In jemalloc, *all* memory is allocated via chunks and in a default build it never unmaps them. It does virtual memory management via a global red-black tree tracking extents of chunks. The default is 4M chunks aligned to 4M boundaries. It can obtain chunks from sbrk until failure and then use mmap, but the default is the opposite. It can distinguish a huge allocation (>=4M) from small/large ones managed inside a chunk with a header by checking for 4M alignment and can then find the metadata for the small/large allocs via an offset - which is how it has ~2% metadata overhead even for small sizes. Reducing the chunk size wouldn't hurt much for small/large allocations, but it will wipe out large size classes and turn them into huge allocs where global locking is required. You've made me realize that jemalloc has a potential security issue here. It uses getenv without an issetugid check for the MALLOC_CONF env variable so an attacker could control some aspects of the allocator for a setuid / setcap binary. I'll send a pull request fixing this. OpenBSD malloc does ASLR at an allocator level, which is quite neat. It uses a similar chunk allocation model but they're always page size and the number of cached chunks is limited.
Attachment:
signature.asc
Description: OpenPGP digital signature
Current thread:
- Re: Offset2lib: bypassing full ASLR on 64bit Linux, (continued)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Seth Arnold (Dec 05)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Daniel Micay (Dec 05)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Hanno Böck (Dec 06)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Pavel Labushev (Dec 05)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Daniel Micay (Dec 05)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Reed Loden (Dec 05)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Daniel Micay (Dec 05)
- Message not available
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Daniel Micay (Dec 05)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Florent Daigniere (Dec 06)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Daniel Micay (Dec 09)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Daniel Micay (Dec 09)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Loganaden Velvindron (Dec 09)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Steve Grubb (Dec 10)
- Re: Offset2lib: bypassing full ASLR on 64bit Linux Daniel Micay (Dec 10)