From: Rich Felker
To: Florian Weimer
Cc: Carlos O'Donell via Libc-alpha, musl@lists.openwall.com
Date: Wed, 30 Sep 2020 22:30:19 -0400
Subject: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275).
Message-ID: <20201001023018.GL17637@brightrain.aerifal.cx>
In-Reply-To: <20200929144207.GD17637@brightrain.aerifal.cx>
References: <20200927141952.121047-1-carlos@redhat.com> <871rinm1fx.fsf@mid.deneb.enyo.de> <20200928234833.GC17637@brightrain.aerifal.cx> <87d025jcn0.fsf@mid.deneb.enyo.de> <20200929144207.GD17637@brightrain.aerifal.cx>

On Tue, Sep 29, 2020 at 10:42:07AM -0400, Rich Felker wrote:
> On Tue, Sep 29, 2020 at 08:54:59AM +0200, Florian Weimer wrote:
> > * Rich Felker:
> >
> > > Is there a reason to take the lock across fork rather than just
> > > resetting it in the child? After seeing this I'm working on fixing the
> > > same issue in musl and was about to take the lock, but realized ours
> > > isn't actually protecting any userspace data state, just excluding
> > > sigaction on SIGABRT during abort.
> >
> > It's also necessary to stop the fork because the subprocess could
> > otherwise observe the impossible SIG_DFL state. In case the signal
> > handler returns, the implementation needs to produce a termination
> > status with SIGABRT as the termination signal, and the only way I can
> > see to achieve that is to remove the signal handler and send the
> > signal again. This suggests that a lock in sigaction is needed as
> > well.
>
> Yes, in musl we already have the lock in sigaction -- that's the whole
> point of the lock. To prevent other threads from fighting to change
> the disposition back to SIG_IGN or a signal handler while abort is
> trying to change it to SIG_DFL.
>
> > But for the fork case, resetting the lock in the new subprocess should
> > be sufficient.
>
> I don't follow. Do you mean taking the lock in the parent, but just
> resetting it in the child? That should work but I don't see how it has
> any advantage over just releasing it in the child.

OK, this is a lot worse than you thought. Even without fork, execve and posix_spawn can see the temporary SIGABRT disposition change made by abort() and pass it on to the new process image. An ignored disposition is inherited across exec, so a process that should have started with SIGABRT set to SIG_IGN instead starts with SIG_DFL if you hit exactly the wrong spot in the race.

So, to fix this, these interfaces also have to take the abort lock, and to stay AS-safe (execve is required to be), they need to block all signals while taking it. But execve can't leave signals blocked, or the new process image would inherit that signal mask, so it has to unblock them after taking the lock. Then a signal handler can run between taking the lock and the execve syscall, and if that handler calls abort, abort deadlocks.

So how do we solve this?
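To make the window concrete, here is a minimal sketch of the two-step sequence Florian describes: raise SIGABRT, and if the application's handler returns, force SIG_DFL and raise again. It assumes a single-threaded process and uses only public POSIX calls. `sketch_abort`, `returning_handler` and `abort_sketch_status` are hypothetical names. The internal abort lock, which musl's sigaction also takes and which (per the above) fork, execve and posix_spawn would need too, appears only as a comment. The stretch between the `sigaction(SIGABRT, &dfl, NULL)` call and the final unblock is the temporary SIG_DFL state that a concurrent execve could leak.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Application handler that returns instead of exiting. */
static void returning_handler(int sig)
{
    (void)sig;
}

/* Hypothetical sketch of abort()'s fallback path (not libc code).
 * A real implementation would take its internal abort lock after
 * blocking signals; that lock is deliberately omitted here. */
static void sketch_abort(void)
{
    sigset_t all, abrt;
    struct sigaction dfl;

    /* First attempt: honor whatever disposition the application set. */
    raise(SIGABRT);

    /* The handler returned (or SIGABRT was ignored or blocked), so block
     * every signal and no handler can run while we force SIG_DFL. */
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, NULL);

    /* <-- the abort lock would be taken here */
    dfl.sa_handler = SIG_DFL;
    dfl.sa_flags = 0;
    sigemptyset(&dfl.sa_mask);
    sigaction(SIGABRT, &dfl, NULL);

    /* Re-raise; the signal stays pending until SIGABRT is unblocked. */
    raise(SIGABRT);
    sigemptyset(&abrt);
    sigaddset(&abrt, SIGABRT);
    sigprocmask(SIG_UNBLOCK, &abrt, NULL); /* default action terminates */

    _exit(127); /* should be unreachable */
}

/* Runs sketch_abort() in a child that has a returning SIGABRT handler
 * installed, and returns the child's wait status (or -1 on error). */
static int abort_sketch_status(void)
{
    int status = 0;
    pid_t pid = fork();
    if (pid == 0) {
        struct sigaction sa;
        sa.sa_handler = returning_handler;
        sa.sa_flags = 0;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGABRT, &sa, NULL);
        sketch_abort();
        _exit(127); /* not reached */
    }
    if (pid < 0 || waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

Although the application's handler returns, the child still ends with WIFSIGNALED true and WTERMSIG equal to SIGABRT. That is the termination status the second raise exists to produce.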
Making the abort lock recursive sounds like it helps (it avoids the deadlock above). But a signal handler that runs between taking the abort lock and making the execve syscall still delays abort in other threads for an unbounded length of time. It could even longjmp out, leaving a stale lock owner that prevents any other thread from ever calling abort.

Ultimately this boils down to a general principle: you can't make AS-safe locks that allow arbitrary application code to run while they're held.

I really don't see any way out without giving abort a mechanism to "seize" other threads before changing the signal disposition. This could, for example, be done with the same mechanism used for multithreaded set*id (broadcasting an implementation-internal, unblockable signal), or maybe with some seccomp hacks on a recent enough kernel.

Is there some better approach I'm missing?

All of this hell because Linux thought we didn't need a SYS_abort...

Rich