[musl] Re: [PATCH] Make abort() AS-safe (Bug 26275).

mailing list of musl libc
 help / color / mirror / code / Atom feed

* [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275).
       [not found]       ` <20200929144207.GD17637@brightrain.aerifal.cx>
@ 2020-10-01  2:30         ` Rich Felker
  2020-10-01  6:08           ` Florian Weimer
  2020-10-10  0:26           ` Rich Felker
  0 siblings, 2 replies; 8+ messages in thread
From: Rich Felker @ 2020-10-01  2:30 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Carlos O'Donell via Libc-alpha, musl

On Tue, Sep 29, 2020 at 10:42:07AM -0400, Rich Felker wrote:
> On Tue, Sep 29, 2020 at 08:54:59AM +0200, Florian Weimer wrote:
> > * Rich Felker:
> > 
> > > Is there a reason to take the lock across fork rather than just
> > > resetting it in the child? After seeing this I'm working on fixing the
> > > same issue in musl and was about to take the lock, but realized ours
> > > isn't actually protecting any userspace data state, just excluding
> > > sigaction on SIGABRT during abort.
> > 
> > It's also necessary to stop the fork because the subprocess could
> > otherwise observe the impossible SIG_DFL state.  In case the signal
> > handler returns, the implementation needs to produce a termination
> > status with SIGABRT as the termination signal, and the only way I can
> > see to achieve that is to remove the signal handler and send the
> > signal again.  This suggests that a lock in sigaction is needed as
> > well.
> 
> Yes, in musl we already have the lock in sigaction -- that's the whole
> point of the lock. To prevent other threads from fighting to change
> the disposition back to SIG_IGN or a signal handler while abort is
> trying to change it to SIG_DFL.
> 
> > But for the fork case, restting the lock in the new subprocess should
> > be sufficient.
> 
> I don't follow. Do you mean taking the lock in the parent, but just
> resetting it in the child? That should work but I don't see how it has
> any advantage over just releasing it in the child.

OK, this is a lot worse than you thought:

Even without fork, execve and posix_spawn can also see the SIGABRT
disposition change made by abort(), passing it on to a process that
should have started with a disposition of SIG_IGN if you hit exactly
the wrong spot in the race.

So, to fix this, these interfaces also have to take the abort lock,
and to make it AS-safe (since execve is required to be), need to block
all signals to take the lock. But execve can't leave signals blocked
or the new process image would inherit that state. So it has to
unblock them after taking the lock. But then a signal handler can
interrupt between taking the lock and the execve syscall, making abort
deadlock if called from the signal handler.

So how to solve this? Having the abort lock be recursive sounds like
it helps (avoid the deadlock above), but then the signal handler that
runs between taking the abort lock and making the execve syscall still
delays abort by other threads for an unbounded length of time, and in
fact it could even longjmp out, leaving a stale lock owner that
prevents any other thread from ever calling abort. Ultimately this
boils down to a general principle: you can't make AS-safe locks that
allow arbitrary application code to run while they're held.

I really don't see any way out without giving abort a mechanism to
"seize" other threads before changing the signal disposition. This
could for example be done with the same mechanism used for
multithreaded set*id (broadcast signal of an implementation-internal,
unblockable signal) or maybe with some seccomp hacks on a recent
enough kernel. Is there some better approach I'm missing??

All of this hell because Linux thought we didn't need a SYS_abort...

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275).
  2020-10-01  2:30         ` [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275) Rich Felker
@ 2020-10-01  6:08           ` Florian Weimer
  2020-10-01 14:39             ` Rich Felker
  2020-10-01 14:49             ` Carlos O'Donell
  2020-10-10  0:26           ` Rich Felker
  1 sibling, 2 replies; 8+ messages in thread
From: Florian Weimer @ 2020-10-01  6:08 UTC (permalink / raw)
  To: Rich Felker; +Cc: Carlos O'Donell via Libc-alpha, musl

* Rich Felker:

> Even without fork, execve and posix_spawn can also see the SIGABRT
> disposition change made by abort(), passing it on to a process that
> should have started with a disposition of SIG_IGN if you hit exactly
> the wrong spot in the race.

My feeling is that it's not worth bothering with this kind of leakage.
We've had this bug forever in glibc, and no one has complained about
it.

Carlos is investigating removal of the abort lock from glibc, I think.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275).
  2020-10-01  6:08           ` Florian Weimer
@ 2020-10-01 14:39             ` Rich Felker
  2020-10-01 15:11               ` Florian Weimer
  2020-10-01 14:49             ` Carlos O'Donell
  1 sibling, 1 reply; 8+ messages in thread
From: Rich Felker @ 2020-10-01 14:39 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Carlos O'Donell via Libc-alpha, musl

On Thu, Oct 01, 2020 at 08:08:24AM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > Even without fork, execve and posix_spawn can also see the SIGABRT
> > disposition change made by abort(), passing it on to a process that
> > should have started with a disposition of SIG_IGN if you hit exactly
> > the wrong spot in the race.
> 
> My feeling is that it's not worth bothering with this kind of leakage.
> We've had this bug forever in glibc, and no one has complained about
> it.
> 
> Carlos is investigating removal of the abort lock from glibc, I think.

I don't think that's a good solution. The lock is really important in
that it protects against serious wrong behavior *within the process*
like an application-installed signal handler for SIGABRT getting
called more than once. The worst outcome of the exec issue discussed
here (and similar for fork) is simply the wrong disposition (SIG_DFL
rather than SIG_IGN) getting passed on to the new process image -- and
it's very rare to actually want SIG_IGN to be inherited. Undoing an
important fix for wrong code execution just because it's an incomplete
fix for wrong disposition inheritance would be bad.

If we want to solve this without relying on broadcast/seizure of all
threads, it looks like it may be a tradeoff between the subtly wrong
behavior for inheritance of SIG_IGN, and subtle wrongness with respect
to POSIX requirements on termination status from abort. It's possible
to just probe whether the old status was SIG_IGN; if not, there's no
issue with the current approach of changing it to SIG_DFL. But if it
was SIG_IGN, you can sacrifice the desired SIGABRT termination and get
termination by SIGSEGV or SIGILL or SIGBUS (still abnormal status,
still generating a core, just not the right signal number) just by
leaving signals masked and executing an instruction that will fault in
one of these ways.

FWIW musl "already does this" (attempting to terminate via a_crash()
with signals still masked) if the SIGABRT after resetting its
disposition somehow fails to terminate the process. It then falls back
to raising SIGKILL, SYS_exit_group, SYS_exit, and for(;;), so we'd get
this behavior automatically just by skipping the sigaction if the old
disposition was SIG_IGN.

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275).
  2020-10-01  6:08           ` Florian Weimer
  2020-10-01 14:39             ` Rich Felker
@ 2020-10-01 14:49             ` Carlos O'Donell
  2020-10-01 14:55               ` Rich Felker
  1 sibling, 1 reply; 8+ messages in thread
From: Carlos O'Donell @ 2020-10-01 14:49 UTC (permalink / raw)
  To: Florian Weimer, Rich Felker; +Cc: musl, Carlos O'Donell via Libc-alpha

On 10/1/20 2:08 AM, Florian Weimer wrote:
> * Rich Felker:
> 
>> Even without fork, execve and posix_spawn can also see the SIGABRT
>> disposition change made by abort(), passing it on to a process that
>> should have started with a disposition of SIG_IGN if you hit exactly
>> the wrong spot in the race.
> 
> My feeling is that it's not worth bothering with this kind of leakage.
> We've had this bug forever in glibc, and no one has complained about
> it.
> 
> Carlos is investigating removal of the abort lock from glibc, I think.
 
I am investigating the removal, but I think the replacement solution
might be needing to have a helper thread carry out specific tasks.

Like Rich I'm still a little worried about the other cases that the
lock is intended to fix.

-- 
Cheers,
Carlos.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275).
  2020-10-01 14:49             ` Carlos O'Donell
@ 2020-10-01 14:55               ` Rich Felker
  0 siblings, 0 replies; 8+ messages in thread
From: Rich Felker @ 2020-10-01 14:55 UTC (permalink / raw)
  To: Carlos O'Donell
  Cc: Florian Weimer, musl, Carlos O'Donell via Libc-alpha

On Thu, Oct 01, 2020 at 10:49:42AM -0400, Carlos O'Donell wrote:
> On 10/1/20 2:08 AM, Florian Weimer wrote:
> > * Rich Felker:
> > 
> >> Even without fork, execve and posix_spawn can also see the SIGABRT
> >> disposition change made by abort(), passing it on to a process that
> >> should have started with a disposition of SIG_IGN if you hit exactly
> >> the wrong spot in the race.
> > 
> > My feeling is that it's not worth bothering with this kind of leakage.
> > We've had this bug forever in glibc, and no one has complained about
> > it.
> > 
> > Carlos is investigating removal of the abort lock from glibc, I think.
>  
> I am investigating the removal, but I think the replacement solution
> might be needing to have a helper thread carry out specific tasks.

I'm confused what a helper thread could achieve here. The underlying
problem is that the kernel forces CLONE_SIGHAND on threads (EINVAL
without it) so that the disposition can't be changed in a thread-local
manner. Any new thread would have that same issue. It also would not
be something you could reliably create at abort time (especially since
abort is most often used on resource exhaustion and other unexpected
failures).

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275).
  2020-10-01 14:39             ` Rich Felker
@ 2020-10-01 15:11               ` Florian Weimer
  2020-10-01 15:28                 ` Rich Felker
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Weimer @ 2020-10-01 15:11 UTC (permalink / raw)
  To: Rich Felker; +Cc: musl, Carlos O'Donell via Libc-alpha

* Rich Felker:

> On Thu, Oct 01, 2020 at 08:08:24AM +0200, Florian Weimer wrote:
>> * Rich Felker:
>> 
>> > Even without fork, execve and posix_spawn can also see the SIGABRT
>> > disposition change made by abort(), passing it on to a process that
>> > should have started with a disposition of SIG_IGN if you hit exactly
>> > the wrong spot in the race.
>> 
>> My feeling is that it's not worth bothering with this kind of leakage.
>> We've had this bug forever in glibc, and no one has complained about
>> it.
>> 
>> Carlos is investigating removal of the abort lock from glibc, I think.
>
> I don't think that's a good solution. The lock is really important in
> that it protects against serious wrong behavior *within the process*
> like an application-installed signal handler for SIGABRT getting
> called more than once.

I think glibc currently has this bug.  We only avoid it for abort, but
I'm not sure if it's a bug to handle the handler multiple times if abort
is called more than once.

But even for the more general case (threads call sigaction to install a
SIGABRT handler): Do we actually need a lock there?  We reach this state
only after raise (SIGABRT) has returned.  At this point, we can set a
flag (not a lock), and every other thread that calls signal or sigaction
would instead perform the late-stage SIG_DFL-for-SIGABRT part of abort?
It probably still needs some fiddling with sigprocmask.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275).
  2020-10-01 15:11               ` Florian Weimer
@ 2020-10-01 15:28                 ` Rich Felker
  0 siblings, 0 replies; 8+ messages in thread
From: Rich Felker @ 2020-10-01 15:28 UTC (permalink / raw)
  To: Florian Weimer; +Cc: musl, Carlos O'Donell via Libc-alpha

On Thu, Oct 01, 2020 at 05:11:19PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Thu, Oct 01, 2020 at 08:08:24AM +0200, Florian Weimer wrote:
> >> * Rich Felker:
> >> 
> >> > Even without fork, execve and posix_spawn can also see the SIGABRT
> >> > disposition change made by abort(), passing it on to a process that
> >> > should have started with a disposition of SIG_IGN if you hit exactly
> >> > the wrong spot in the race.
> >> 
> >> My feeling is that it's not worth bothering with this kind of leakage.
> >> We've had this bug forever in glibc, and no one has complained about
> >> it.
> >> 
> >> Carlos is investigating removal of the abort lock from glibc, I think.
> >
> > I don't think that's a good solution. The lock is really important in
> > that it protects against serious wrong behavior *within the process*
> > like an application-installed signal handler for SIGABRT getting
> > called more than once.
> 
> I think glibc currently has this bug.  We only avoid it for abort, but
> I'm not sure if it's a bug to handle the handler multiple times if abort
> is called more than once.

I don't see anything in the spec that allows for the signal handler to
be called multiple times. The signal is raised (thereby following
normal semantics for if/how signal handler runs), and if a handler
runs and returns, the process is then required to terminate abnormally
as if by SIGABRT. This isn't a license to execute the signal handler
again or do other random observable things.

> But even for the more general case (threads call sigaction to install a
> SIGABRT handler): Do we actually need a lock there?  We reach this state
> only after raise (SIGABRT) has returned.  At this point, we can set a
> flag (not a lock), and every other thread that calls signal or sigaction
> would instead perform the late-stage SIG_DFL-for-SIGABRT part of abort?
> It probably still needs some fiddling with sigprocmask.

There's a race between checking the flag and acting on it. If thread A
has already called signal(SIGABRT,foo) and gotten past the "are we
aborting?" check, then thread B calls abort(), thread A can reset the
disposition of SIGABRT to foo after thread B sets it to SIG_DFL, but
before thread B re-raises, unblocks, and acts on the signal.

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275).
  2020-10-01  2:30         ` [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275) Rich Felker
  2020-10-01  6:08           ` Florian Weimer
@ 2020-10-10  0:26           ` Rich Felker
  1 sibling, 0 replies; 8+ messages in thread
From: Rich Felker @ 2020-10-10  0:26 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Carlos O'Donell via Libc-alpha, musl

On Wed, Sep 30, 2020 at 10:30:19PM -0400, Rich Felker wrote:
> On Tue, Sep 29, 2020 at 10:42:07AM -0400, Rich Felker wrote:
> > On Tue, Sep 29, 2020 at 08:54:59AM +0200, Florian Weimer wrote:
> > > * Rich Felker:
> > > 
> > > > Is there a reason to take the lock across fork rather than just
> > > > resetting it in the child? After seeing this I'm working on fixing the
> > > > same issue in musl and was about to take the lock, but realized ours
> > > > isn't actually protecting any userspace data state, just excluding
> > > > sigaction on SIGABRT during abort.
> > > 
> > > It's also necessary to stop the fork because the subprocess could
> > > otherwise observe the impossible SIG_DFL state.  In case the signal
> > > handler returns, the implementation needs to produce a termination
> > > status with SIGABRT as the termination signal, and the only way I can
> > > see to achieve that is to remove the signal handler and send the
> > > signal again.  This suggests that a lock in sigaction is needed as
> > > well.
> > 
> > Yes, in musl we already have the lock in sigaction -- that's the whole
> > point of the lock. To prevent other threads from fighting to change
> > the disposition back to SIG_IGN or a signal handler while abort is
> > trying to change it to SIG_DFL.
> > 
> > > But for the fork case, restting the lock in the new subprocess should
> > > be sufficient.
> > 
> > I don't follow. Do you mean taking the lock in the parent, but just
> > resetting it in the child? That should work but I don't see how it has
> > any advantage over just releasing it in the child.
> 
> OK, this is a lot worse than you thought:
> 
> Even without fork, execve and posix_spawn can also see the SIGABRT
> disposition change made by abort(), passing it on to a process that
> should have started with a disposition of SIG_IGN if you hit exactly
> the wrong spot in the race.
> 
> So, to fix this, these interfaces also have to take the abort lock,
> and to make it AS-safe (since execve is required to be), need to block
> all signals to take the lock. But execve can't leave signals blocked
> or the new process image would inherit that state. So it has to
> unblock them after taking the lock. But then a signal handler can
> interrupt between taking the lock and the execve syscall, making abort
> deadlock if called from the signal handler.
> 
> So how to solve this? Having the abort lock be recursive sounds like
> it helps (avoid the deadlock above), but then the signal handler that
> runs between taking the abort lock and making the execve syscall still
> delays abort by other threads for an unbounded length of time, and in
> fact it could even longjmp out, leaving a stale lock owner that
> prevents any other thread from ever calling abort. Ultimately this
> boils down to a general principle: you can't make AS-safe locks that
> allow arbitrary application code to run while they're held.
> 
> I really don't see any way out without giving abort a mechanism to
> "seize" other threads before changing the signal disposition. This
> could for example be done with the same mechanism used for
> multithreaded set*id (broadcast signal of an implementation-internal,
> unblockable signal) or maybe with some seccomp hacks on a recent
> enough kernel. Is there some better approach I'm missing??

There is something else I was missing, if we're willing to wrap signal
handlers -- something I've long suspected would be useful. The
primitive needed to solve this problem without a seize-all-threads
operation is restartable critical sections that restart if interrupted
by a signal, something like a "userspace EINTR". Here, any signal
arriving between execve taking the lock and making the execve syscall
would cause the lock to be released and the saved program counter to
revert to just before the lock was taken before the application's
signal handler runs.

How would such a thing be implemented? The wrapper for the signal
handler would examine TLS to observe that there's a restartable
critical section in progress, and modify the saved ucontext's
call-saved registers and uc_sigmask based on a sigjmp_buf saved before
the restartable section. It would then reverse the lock (probably via
a function pointer setup as part of the critical section) and
tail-call to the real signal handler. On return, the kernel then
automatically returns to just before the critical section, and the
lock is re-acquired before moving forward again.

Among other things, this allows implementing arbitrary "atomic unmask
signals and make syscall" operations (ala pselect) without need for
the kernel's help. It could also be used as the main ingredient of an
alternate thread cancellation design.

Is this a good idea? I don't know. But at least it means there seem to
be two (or more) possible solutions available.

Rich

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-10-10  0:26 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200927141952.121047-1-carlos@redhat.com>
     [not found] ` <871rinm1fx.fsf@mid.deneb.enyo.de>
     [not found]   ` <20200928234833.GC17637@brightrain.aerifal.cx>
     [not found]     ` <87d025jcn0.fsf@mid.deneb.enyo.de>
     [not found]       ` <20200929144207.GD17637@brightrain.aerifal.cx>
2020-10-01  2:30         ` [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275) Rich Felker
2020-10-01  6:08           ` Florian Weimer
2020-10-01 14:39             ` Rich Felker
2020-10-01 15:11               ` Florian Weimer
2020-10-01 15:28                 ` Rich Felker
2020-10-01 14:49             ` Carlos O'Donell
2020-10-01 14:55               ` Rich Felker
2020-10-10  0:26           ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).