oss-sec mailing list archives

Re: kernel: Dangerous interaction between clear_child_tid, set_fs(), and kernel oopses


From: Solar Designer <solar () openwall com>
Date: Thu, 9 Dec 2010 12:34:35 +0300

On Wed, Dec 08, 2010 at 10:34:38AM -0500, Nelson Elhage wrote:
On Wed, Dec 08, 2010 at 07:51:18AM +0300, Solar Designer wrote:
Nelson - why are you proposing adding set_fs(USER_DS); not to the very
beginning of do_exit(), but below a few calls/checks?  I don't think
there's any performance improvement from that, and it feels
"theoretically safer" to return to the sane/safe state as soon as
possible.  I am currently looking at do_exit() in OpenVZ's RHEL5-based
2.6.18-194.26.1.el5.028stab079.1 - it does a bit more work before
reaching the place you patch.  So I am tempted to introduce
set_fs(USER_DS); as the very first statement in do_exit() instead.

I put the set_fs() after the in_interrupt() check, since set_fs() frobs the
current thread_info, and IIUC, we aren't guaranteed to have one on an interrupt
stack. So I wanted to preserve that check/immediate panic(), rather than
possible triggering a recursive fault or other weird behavior. Other than that,
I stuck it as early as possible.

If I'm wrong about it possibly failing on an interrupt stack, then yeah, it
might make sense to put it even earlier. Or to rearrange things so that the flow
is "check interrupt -> set_fs() -> everything else".

You have an excellent point.

In practice, it appears that we do have current_thread_info() even in
interrupt context.  It just does not make much sense.  I actually
implemented set_fs() at the very beginning of do_exit() yesterday, and
we put this kernel on a certain machine that keeps crashing in all sorts
of ways (looks like its motherboard has issues supporting 8 GB of RAM).
(We did this before you replied to my question.)  And guess what - after
a few hours, the system was kind enough to crash with "Aiee, killing
interrupt handler!", and another person was kind enough to take a
screenshot via IP-KVM while I was asleep.  The screenshot shows "Process
swapper (pid: 0, veid=0, threadinfo ffffffff80582000, task ffffff..."
and so on.  So I guess current_thread_info() corresponds to whatever
process/context was interrupted.  (The crash itself had nothing to do
with the change.  Just a faulty machine.)

That said, I am going to revise our do_exit() patch to play safe as you
propose.

Thanks,

Alexander


Current thread: