oss-sec mailing list archives

Re: Re: Qualys Security Advisor -- The Stack Clash


From: Daniel Micay <danielmicay () gmail com>
Date: Tue, 20 Jun 2017 02:44:37 -0400

On Mon, 2017-06-19 at 15:15 -0600, Jeff Law wrote:
On 06/19/2017 12:52 PM, Daniel Micay wrote:
On Mon, 2017-06-19 at 11:26 -0600, Jeff Law wrote:
I would consider those two GCC BZs (68065, 66479) a separate an
distinct
issue.

It is far more important to address design issues around the
existing
-fstack-check first.  I think we've got a pretty good handle on
how to
address those problems and discussions with the upstream GCC
community
have already started.

In an ideal world we'll get to a place where the new -fstack-check
does
not change program semantics, never misses probes and is efficient
enough to just turn on and forget everywhere.  The existing
-fstack-check fails all three of those criteria.

Jeff

AFAIK, the main efficiency issue (reserving a register) was fixed
for
GCC 6. I might be missing something but it seems very cheap now, at
least for x86_64. It definitely doesn't really work though.

Is there an example of it changing program semantics? I haven't seen
anything since the generic arch stuff was fixed.

Absolutely -fstack-check, as currently implemented, can change program
semantics.  It's related to -fstack-check moving objects from
statically
allocated space into dynamically allocated space (because the generic
code can't handle large static frames).  It creates the alloca'd
objects
at the wrong scope.  This doesn't happen on all architectures, but it
does happen on architectures I have to care about.

WRT efficiency, -fstack-check is marginal -- even with its clever code
of assuming that it can elide probes into the first two pages of a
static frame (because the caller must have probed those two frames).
Consistently when we looked at code it's over-probing.  Sadly, it's
over-probing in all the wrong places (because it's trying so damn hard
to ensure there's always 2 free pages the signal handler can use).

WRT probing correctness -- -fstack-check skips probes on the
assumption
that an earlier caller in the call chain should have probed those
pages.
 But that's a fundamentally flawed assumption unless the entire
application is compiled with -fstack-check.  In fact, by eliding those
probes, it actually misses the most important cases in mixed
environment!

-fstack-check also has the nasty habit of probing into unallocated
areas.  This tends to cause valgrind problems.  Both in the sense of
getting far too many false positives, but on two platforms the code
generated by -fstack-check actually crashes valgrind.  These issues
are
directly related to -fstack-check wanting to probe all the pages
before
doing any allocations.

FWIW, we initially thought we were going to be able to use -fstack-
check
with some slight tweaks.  But the deeper we got into -fstack-check the
more we ended up rewriting the probe generation from scratch.  In
fact,
we were unable to use the existing -fstack-check probing code from
*any*
target.

jeff

I think it's also worth mentioning the segmented stack support in GCC
and LLVM that was added for Go. It's possible to use it for C with the
__morestack call set up to simply abort when stack space is exhausted.

That's what Rust was doing after it dropped segmented stacks, but they
wanted to move to stack probes for efficiency and prematurely dropped
these function prelude checks.

It's not efficient, but it works, unlike -fstack-check.

I don't think it makes sense for general purpose distributions to adopt
it but it's an available option for others with more concern about this
issue.


Current thread: