oss-sec mailing list archives

Re: kernel: Dangerous interaction between clear_child_tid, set_fs(), and kernel oopses


From: Nelson Elhage <nelhage () ksplice com>
Date: Wed, 8 Dec 2010 10:34:38 -0500

On Wed, Dec 08, 2010 at 07:51:18AM +0300, Solar Designer wrote:
Nelson, Dan, Steve -

It's been a few days, so I'll over-quote a little bit.  Please see below:

On Thu, Dec 02, 2010 at 12:21:14AM -0500, Nelson Elhage wrote:
I've discovered an interesting interaction in the Linux kernel between the
clear_child_tid feature of clone(2), and the set_fs() function used internally
in the kernel to temporarily disable access_ok() checking of userspace pointers.

Under some (not totally uncommon) circumstances, it is possible for a user to
leverage this interaction to turn a kernel oops or BUG() into a write of an
integer 0 to a user-controlled address in kernel memory.

I'm not sure if this merits a CVE or not; It is (as far as I can tell) only a
problem in the presence of another security bug, but it potentially makes a
large class of bugs significantly more dangerous (DoS -> privesc).

Reference:
https://lkml.org/lkml/2010/12/1/543

To me, things like this are more important than individual NULL pointer
dereference bugs or the like.  So if those get CVEs, this one definitely
should as well.

Yeah, as you saw, Dan requested a CVE separately and this is CVE-2010-4258.


Nelson - why are you proposing adding set_fs(USER_DS); not to the very
beginning of do_exit(), but below a few calls/checks?  I don't think
there's any performance improvement from that, and it feels
"theoretically safer" to return to the sane/safe state as soon as
possible.  I am currently looking at do_exit() in OpenVZ's RHEL5-based
2.6.18-194.26.1.el5.028stab079.1 - it does a bit more work before
reaching the place you patch.  So I am tempted to introduce
set_fs(USER_DS); as the very first statement in do_exit() instead.

I put the set_fs() after the in_interrupt() check, since set_fs() frobs the
current thread_info, and IIUC, we aren't guaranteed to have one on an interrupt
stack. So I wanted to preserve that check/immediate panic(), rather than
possible triggering a recursive fault or other weird behavior. Other than that,
I stuck it as early as possible.

If I'm wrong about it possibly failing on an interrupt stack, then yeah, it
might make sense to put it even earlier. Or to rearrange things so that the flow
is "check interrupt -> set_fs() -> everything else".


Did you check whether 2.4 kernels are affected as well?

I have not. My man pages claim that CLONE_CHILD_CLEARTID is new since Linux
2.5.49, so that specific hole probably isn't there, though.

- Nelson


Thanks,

Alexander


Current thread: