* [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275). [not found] ` <20200929144207.GD17637@brightrain.aerifal.cx> @ 2020-10-01 2:30 ` Rich Felker 2020-10-01 6:08 ` Florian Weimer 2020-10-10 0:26 ` Rich Felker 0 siblings, 2 replies; 8+ messages in thread From: Rich Felker @ 2020-10-01 2:30 UTC (permalink / raw) To: Florian Weimer; +Cc: Carlos O'Donell via Libc-alpha, musl On Tue, Sep 29, 2020 at 10:42:07AM -0400, Rich Felker wrote: > On Tue, Sep 29, 2020 at 08:54:59AM +0200, Florian Weimer wrote: > > * Rich Felker: > > > > > Is there a reason to take the lock across fork rather than just > > > resetting it in the child? After seeing this I'm working on fixing the > > > same issue in musl and was about to take the lock, but realized ours > > > isn't actually protecting any userspace data state, just excluding > > > sigaction on SIGABRT during abort. > > > > It's also necessary to stop the fork because the subprocess could > > otherwise observe the impossible SIG_DFL state. In case the signal > > handler returns, the implementation needs to produce a termination > > status with SIGABRT as the termination signal, and the only way I can > > see to achieve that is to remove the signal handler and send the > > signal again. This suggests that a lock in sigaction is needed as > > well. > > Yes, in musl we already have the lock in sigaction -- that's the whole > point of the lock. To prevent other threads from fighting to change > the disposition back to SIG_IGN or a signal handler while abort is > trying to change it to SIG_DFL. > > > But for the fork case, restting the lock in the new subprocess should > > be sufficient. > > I don't follow. Do you mean taking the lock in the parent, but just > resetting it in the child? That should work but I don't see how it has > any advantage over just releasing it in the child. OK, this is a lot worse than you thought: Even without fork, execve and posix_spawn can also see the SIGABRT disposition change made by abort(), passing it on to a process that should have started with a disposition of SIG_IGN if you hit exactly the wrong spot in the race. So, to fix this, these interfaces also have to take the abort lock, and to make it AS-safe (since execve is required to be), need to block all signals to take the lock. But execve can't leave signals blocked or the new process image would inherit that state. So it has to unblock them after taking the lock. But then a signal handler can interrupt between taking the lock and the execve syscall, making abort deadlock if called from the signal handler. So how to solve this? Having the abort lock be recursive sounds like it helps (avoid the deadlock above), but then the signal handler that runs between taking the abort lock and making the execve syscall still delays abort by other threads for an unbounded length of time, and in fact it could even longjmp out, leaving a stale lock owner that prevents any other thread from ever calling abort. Ultimately this boils down to a general principle: you can't make AS-safe locks that allow arbitrary application code to run while they're held. I really don't see any way out without giving abort a mechanism to "seize" other threads before changing the signal disposition. This could for example be done with the same mechanism used for multithreaded set*id (broadcast signal of an implementation-internal, unblockable signal) or maybe with some seccomp hacks on a recent enough kernel. Is there some better approach I'm missing?? All of this hell because Linux thought we didn't need a SYS_abort... Rich ^ permalink raw reply [flat|nested] 8+ messages in thread
* [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275). 2020-10-01 2:30 ` [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275) Rich Felker @ 2020-10-01 6:08 ` Florian Weimer 2020-10-01 14:39 ` Rich Felker 2020-10-01 14:49 ` Carlos O'Donell 2020-10-10 0:26 ` Rich Felker 1 sibling, 2 replies; 8+ messages in thread From: Florian Weimer @ 2020-10-01 6:08 UTC (permalink / raw) To: Rich Felker; +Cc: Carlos O'Donell via Libc-alpha, musl * Rich Felker: > Even without fork, execve and posix_spawn can also see the SIGABRT > disposition change made by abort(), passing it on to a process that > should have started with a disposition of SIG_IGN if you hit exactly > the wrong spot in the race. My feeling is that it's not worth bothering with this kind of leakage. We've had this bug forever in glibc, and no one has complained about it. Carlos is investigating removal of the abort lock from glibc, I think. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275). 2020-10-01 6:08 ` Florian Weimer @ 2020-10-01 14:39 ` Rich Felker 2020-10-01 15:11 ` Florian Weimer 2020-10-01 14:49 ` Carlos O'Donell 1 sibling, 1 reply; 8+ messages in thread From: Rich Felker @ 2020-10-01 14:39 UTC (permalink / raw) To: Florian Weimer; +Cc: Carlos O'Donell via Libc-alpha, musl On Thu, Oct 01, 2020 at 08:08:24AM +0200, Florian Weimer wrote: > * Rich Felker: > > > Even without fork, execve and posix_spawn can also see the SIGABRT > > disposition change made by abort(), passing it on to a process that > > should have started with a disposition of SIG_IGN if you hit exactly > > the wrong spot in the race. > > My feeling is that it's not worth bothering with this kind of leakage. > We've had this bug forever in glibc, and no one has complained about > it. > > Carlos is investigating removal of the abort lock from glibc, I think. I don't think that's a good solution. The lock is really important in that it protects against serious wrong behavior *within the process* like an application-installed signal handler for SIGABRT getting called more than once. The worst outcome of the exec issue discussed here (and similar for fork) is simply the wrong disposition (SIG_DFL rather than SIG_IGN) getting passed on to the new process image -- and it's very rare to actually want SIG_IGN to be inherited. Undoing an important fix for wrong code execution just because it's an incomplete fix for wrong disposition inheritance would be bad. If we want to solve this without relying on broadcast/seizure of all threads, it looks like it may be a tradeoff between the subtly wrong behavior for inheritance of SIG_IGN, and subtle wrongness with respect to POSIX requirements on termination status from abort. It's possible to just probe whether the old status was SIG_IGN; if not, there's no issue with the current approach of changing it to SIG_DFL. But if it was SIG_IGN, you can sacrifice the desired SIGABRT termination and get termination by SIGSEGV or SIGILL or SIGBUS (still abnormal status, still generating a core, just not the right signal number) just by leaving signals masked and executing an instruction that will fault in one of these ways. FWIW musl "already does this" (attempting to terminate via a_crash() with signals still masked) if the SIGABRT after resetting its disposition somehow fails to terminate the process. It then falls back to raising SIGKILL, SYS_exit_group, SYS_exit, and for(;;), so we'd get this behavior automatically just by skipping the sigaction if the old disposition was SIG_IGN. Rich ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275). 2020-10-01 14:39 ` Rich Felker @ 2020-10-01 15:11 ` Florian Weimer 2020-10-01 15:28 ` Rich Felker 0 siblings, 1 reply; 8+ messages in thread From: Florian Weimer @ 2020-10-01 15:11 UTC (permalink / raw) To: Rich Felker; +Cc: musl, Carlos O'Donell via Libc-alpha * Rich Felker: > On Thu, Oct 01, 2020 at 08:08:24AM +0200, Florian Weimer wrote: >> * Rich Felker: >> >> > Even without fork, execve and posix_spawn can also see the SIGABRT >> > disposition change made by abort(), passing it on to a process that >> > should have started with a disposition of SIG_IGN if you hit exactly >> > the wrong spot in the race. >> >> My feeling is that it's not worth bothering with this kind of leakage. >> We've had this bug forever in glibc, and no one has complained about >> it. >> >> Carlos is investigating removal of the abort lock from glibc, I think. > > I don't think that's a good solution. The lock is really important in > that it protects against serious wrong behavior *within the process* > like an application-installed signal handler for SIGABRT getting > called more than once. I think glibc currently has this bug. We only avoid it for abort, but I'm not sure if it's a bug to handle the handler multiple times if abort is called more than once. But even for the more general case (threads call sigaction to install a SIGABRT handler): Do we actually need a lock there? We reach this state only after raise (SIGABRT) has returned. At this point, we can set a flag (not a lock), and every other thread that calls signal or sigaction would instead perform the late-stage SIG_DFL-for-SIGABRT part of abort? It probably still needs some fiddling with sigprocmask. Thanks, Florian -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275). 2020-10-01 15:11 ` Florian Weimer @ 2020-10-01 15:28 ` Rich Felker 0 siblings, 0 replies; 8+ messages in thread From: Rich Felker @ 2020-10-01 15:28 UTC (permalink / raw) To: Florian Weimer; +Cc: musl, Carlos O'Donell via Libc-alpha On Thu, Oct 01, 2020 at 05:11:19PM +0200, Florian Weimer wrote: > * Rich Felker: > > > On Thu, Oct 01, 2020 at 08:08:24AM +0200, Florian Weimer wrote: > >> * Rich Felker: > >> > >> > Even without fork, execve and posix_spawn can also see the SIGABRT > >> > disposition change made by abort(), passing it on to a process that > >> > should have started with a disposition of SIG_IGN if you hit exactly > >> > the wrong spot in the race. > >> > >> My feeling is that it's not worth bothering with this kind of leakage. > >> We've had this bug forever in glibc, and no one has complained about > >> it. > >> > >> Carlos is investigating removal of the abort lock from glibc, I think. > > > > I don't think that's a good solution. The lock is really important in > > that it protects against serious wrong behavior *within the process* > > like an application-installed signal handler for SIGABRT getting > > called more than once. > > I think glibc currently has this bug. We only avoid it for abort, but > I'm not sure if it's a bug to handle the handler multiple times if abort > is called more than once. I don't see anything in the spec that allows for the signal handler to be called multiple times. The signal is raised (thereby following normal semantics for if/how signal handler runs), and if a handler runs and returns, the process is then required to terminate abnormally as if by SIGABRT. This isn't a license to execute the signal handler again or do other random observable things. > But even for the more general case (threads call sigaction to install a > SIGABRT handler): Do we actually need a lock there? We reach this state > only after raise (SIGABRT) has returned. At this point, we can set a > flag (not a lock), and every other thread that calls signal or sigaction > would instead perform the late-stage SIG_DFL-for-SIGABRT part of abort? > It probably still needs some fiddling with sigprocmask. There's a race between checking the flag and acting on it. If thread A has already called signal(SIGABRT,foo) and gotten past the "are we aborting?" check, then thread B calls abort(), thread A can reset the disposition of SIGABRT to foo after thread B sets it to SIG_DFL, but before thread B re-raises, unblocks, and acts on the signal. Rich ^ permalink raw reply [flat|nested] 8+ messages in thread
* [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275). 2020-10-01 6:08 ` Florian Weimer 2020-10-01 14:39 ` Rich Felker @ 2020-10-01 14:49 ` Carlos O'Donell 2020-10-01 14:55 ` Rich Felker 1 sibling, 1 reply; 8+ messages in thread From: Carlos O'Donell @ 2020-10-01 14:49 UTC (permalink / raw) To: Florian Weimer, Rich Felker; +Cc: musl, Carlos O'Donell via Libc-alpha On 10/1/20 2:08 AM, Florian Weimer wrote: > * Rich Felker: > >> Even without fork, execve and posix_spawn can also see the SIGABRT >> disposition change made by abort(), passing it on to a process that >> should have started with a disposition of SIG_IGN if you hit exactly >> the wrong spot in the race. > > My feeling is that it's not worth bothering with this kind of leakage. > We've had this bug forever in glibc, and no one has complained about > it. > > Carlos is investigating removal of the abort lock from glibc, I think. I am investigating the removal, but I think the replacement solution might be needing to have a helper thread carry out specific tasks. Like Rich I'm still a little worried about the other cases that the lock is intended to fix. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275). 2020-10-01 14:49 ` Carlos O'Donell @ 2020-10-01 14:55 ` Rich Felker 0 siblings, 0 replies; 8+ messages in thread From: Rich Felker @ 2020-10-01 14:55 UTC (permalink / raw) To: Carlos O'Donell Cc: Florian Weimer, musl, Carlos O'Donell via Libc-alpha On Thu, Oct 01, 2020 at 10:49:42AM -0400, Carlos O'Donell wrote: > On 10/1/20 2:08 AM, Florian Weimer wrote: > > * Rich Felker: > > > >> Even without fork, execve and posix_spawn can also see the SIGABRT > >> disposition change made by abort(), passing it on to a process that > >> should have started with a disposition of SIG_IGN if you hit exactly > >> the wrong spot in the race. > > > > My feeling is that it's not worth bothering with this kind of leakage. > > We've had this bug forever in glibc, and no one has complained about > > it. > > > > Carlos is investigating removal of the abort lock from glibc, I think. > > I am investigating the removal, but I think the replacement solution > might be needing to have a helper thread carry out specific tasks. I'm confused what a helper thread could achieve here. The underlying problem is that the kernel forces CLONE_SIGHAND on threads (EINVAL without it) so that the disposition can't be changed in a thread-local manner. Any new thread would have that same issue. It also would not be something you could reliably create at abort time (especially since abort is most often used on resource exhaustion and other unexpected failures). Rich ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275). 2020-10-01 2:30 ` [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275) Rich Felker 2020-10-01 6:08 ` Florian Weimer @ 2020-10-10 0:26 ` Rich Felker 1 sibling, 0 replies; 8+ messages in thread From: Rich Felker @ 2020-10-10 0:26 UTC (permalink / raw) To: Florian Weimer; +Cc: Carlos O'Donell via Libc-alpha, musl On Wed, Sep 30, 2020 at 10:30:19PM -0400, Rich Felker wrote: > On Tue, Sep 29, 2020 at 10:42:07AM -0400, Rich Felker wrote: > > On Tue, Sep 29, 2020 at 08:54:59AM +0200, Florian Weimer wrote: > > > * Rich Felker: > > > > > > > Is there a reason to take the lock across fork rather than just > > > > resetting it in the child? After seeing this I'm working on fixing the > > > > same issue in musl and was about to take the lock, but realized ours > > > > isn't actually protecting any userspace data state, just excluding > > > > sigaction on SIGABRT during abort. > > > > > > It's also necessary to stop the fork because the subprocess could > > > otherwise observe the impossible SIG_DFL state. In case the signal > > > handler returns, the implementation needs to produce a termination > > > status with SIGABRT as the termination signal, and the only way I can > > > see to achieve that is to remove the signal handler and send the > > > signal again. This suggests that a lock in sigaction is needed as > > > well. > > > > Yes, in musl we already have the lock in sigaction -- that's the whole > > point of the lock. To prevent other threads from fighting to change > > the disposition back to SIG_IGN or a signal handler while abort is > > trying to change it to SIG_DFL. > > > > > But for the fork case, restting the lock in the new subprocess should > > > be sufficient. > > > > I don't follow. Do you mean taking the lock in the parent, but just > > resetting it in the child? That should work but I don't see how it has > > any advantage over just releasing it in the child. > > OK, this is a lot worse than you thought: > > Even without fork, execve and posix_spawn can also see the SIGABRT > disposition change made by abort(), passing it on to a process that > should have started with a disposition of SIG_IGN if you hit exactly > the wrong spot in the race. > > So, to fix this, these interfaces also have to take the abort lock, > and to make it AS-safe (since execve is required to be), need to block > all signals to take the lock. But execve can't leave signals blocked > or the new process image would inherit that state. So it has to > unblock them after taking the lock. But then a signal handler can > interrupt between taking the lock and the execve syscall, making abort > deadlock if called from the signal handler. > > So how to solve this? Having the abort lock be recursive sounds like > it helps (avoid the deadlock above), but then the signal handler that > runs between taking the abort lock and making the execve syscall still > delays abort by other threads for an unbounded length of time, and in > fact it could even longjmp out, leaving a stale lock owner that > prevents any other thread from ever calling abort. Ultimately this > boils down to a general principle: you can't make AS-safe locks that > allow arbitrary application code to run while they're held. > > I really don't see any way out without giving abort a mechanism to > "seize" other threads before changing the signal disposition. This > could for example be done with the same mechanism used for > multithreaded set*id (broadcast signal of an implementation-internal, > unblockable signal) or maybe with some seccomp hacks on a recent > enough kernel. Is there some better approach I'm missing?? There is something else I was missing, if we're willing to wrap signal handlers -- something I've long suspected would be useful. The primitive needed to solve this problem without a seize-all-threads operation is restartable critical sections that restart if interrupted by a signal, something like a "userspace EINTR". Here, any signal arriving between execve taking the lock and making the execve syscall would cause the lock to be released and the saved program counter to revert to just before the lock was taken before the application's signal handler runs. How would such a thing be implemented? The wrapper for the signal handler would examine TLS to observe that there's a restartable critical section in progress, and modify the saved ucontext's call-saved registers and uc_sigmask based on a sigjmp_buf saved before the restartable section. It would then reverse the lock (probably via a function pointer setup as part of the critical section) and tail-call to the real signal handler. On return, the kernel then automatically returns to just before the critical section, and the lock is re-acquired before moving forward again. Among other things, this allows implementing arbitrary "atomic unmask signals and make syscall" operations (ala pselect) without need for the kernel's help. It could also be used as the main ingredient of an alternate thread cancellation design. Is this a good idea? I don't know. But at least it means there seem to be two (or more) possible solutions available. Rich ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2020-10-10 0:26 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20200927141952.121047-1-carlos@redhat.com> [not found] ` <871rinm1fx.fsf@mid.deneb.enyo.de> [not found] ` <20200928234833.GC17637@brightrain.aerifal.cx> [not found] ` <87d025jcn0.fsf@mid.deneb.enyo.de> [not found] ` <20200929144207.GD17637@brightrain.aerifal.cx> 2020-10-01 2:30 ` [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275) Rich Felker 2020-10-01 6:08 ` Florian Weimer 2020-10-01 14:39 ` Rich Felker 2020-10-01 15:11 ` Florian Weimer 2020-10-01 15:28 ` Rich Felker 2020-10-01 14:49 ` Carlos O'Donell 2020-10-01 14:55 ` Rich Felker 2020-10-10 0:26 ` Rich Felker
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/musl/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).