mailing list of musl libc
 help / color / mirror / code / Atom feed
From: Rich Felker <dalias@libc.org>
To: Florian Weimer <fw@deneb.enyo.de>
Cc: Carlos O'Donell via Libc-alpha <libc-alpha@sourceware.org>,
	musl@lists.openwall.com
Subject: [musl] Re: [PATCH] Make abort() AS-safe (Bug 26275).
Date: Wed, 30 Sep 2020 22:30:19 -0400	[thread overview]
Message-ID: <20201001023018.GL17637@brightrain.aerifal.cx> (raw)
In-Reply-To: <20200929144207.GD17637@brightrain.aerifal.cx>

On Tue, Sep 29, 2020 at 10:42:07AM -0400, Rich Felker wrote:
> On Tue, Sep 29, 2020 at 08:54:59AM +0200, Florian Weimer wrote:
> > * Rich Felker:
> > 
> > > Is there a reason to take the lock across fork rather than just
> > > resetting it in the child? After seeing this I'm working on fixing the
> > > same issue in musl and was about to take the lock, but realized ours
> > > isn't actually protecting any userspace data state, just excluding
> > > sigaction on SIGABRT during abort.
> > 
> > It's also necessary to stop the fork because the subprocess could
> > otherwise observe the impossible SIG_DFL state.  In case the signal
> > handler returns, the implementation needs to produce a termination
> > status with SIGABRT as the termination signal, and the only way I can
> > see to achieve that is to remove the signal handler and send the
> > signal again.  This suggests that a lock in sigaction is needed as
> > well.
> 
> Yes, in musl we already have the lock in sigaction -- that's the whole
> point of the lock. To prevent other threads from fighting to change
> the disposition back to SIG_IGN or a signal handler while abort is
> trying to change it to SIG_DFL.
> 
> > But for the fork case, restting the lock in the new subprocess should
> > be sufficient.
> 
> I don't follow. Do you mean taking the lock in the parent, but just
> resetting it in the child? That should work but I don't see how it has
> any advantage over just releasing it in the child.

OK, this is a lot worse than you thought:

Even without fork, execve and posix_spawn can also see the SIGABRT
disposition change made by abort(), passing it on to a process that
should have started with a disposition of SIG_IGN if you hit exactly
the wrong spot in the race.

So, to fix this, these interfaces also have to take the abort lock,
and to make it AS-safe (since execve is required to be), need to block
all signals to take the lock. But execve can't leave signals blocked
or the new process image would inherit that state. So it has to
unblock them after taking the lock. But then a signal handler can
interrupt between taking the lock and the execve syscall, making abort
deadlock if called from the signal handler.

So how to solve this? Having the abort lock be recursive sounds like
it helps (avoid the deadlock above), but then the signal handler that
runs between taking the abort lock and making the execve syscall still
delays abort by other threads for an unbounded length of time, and in
fact it could even longjmp out, leaving a stale lock owner that
prevents any other thread from ever calling abort. Ultimately this
boils down to a general principle: you can't make AS-safe locks that
allow arbitrary application code to run while they're held.

I really don't see any way out without giving abort a mechanism to
"seize" other threads before changing the signal disposition. This
could for example be done with the same mechanism used for
multithreaded set*id (broadcast signal of an implementation-internal,
unblockable signal) or maybe with some seccomp hacks on a recent
enough kernel. Is there some better approach I'm missing??

All of this hell because Linux thought we didn't need a SYS_abort...

Rich

       reply	other threads:[~2020-10-01  2:30 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20200927141952.121047-1-carlos@redhat.com>
     [not found] ` <871rinm1fx.fsf@mid.deneb.enyo.de>
     [not found]   ` <20200928234833.GC17637@brightrain.aerifal.cx>
     [not found]     ` <87d025jcn0.fsf@mid.deneb.enyo.de>
     [not found]       ` <20200929144207.GD17637@brightrain.aerifal.cx>
2020-10-01  2:30         ` Rich Felker [this message]
2020-10-01  6:08           ` Florian Weimer
2020-10-01 14:39             ` Rich Felker
2020-10-01 15:11               ` Florian Weimer
2020-10-01 15:28                 ` Rich Felker
2020-10-01 14:49             ` Carlos O'Donell
2020-10-01 14:55               ` Rich Felker
2020-10-10  0:26           ` Rich Felker

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201001023018.GL17637@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=fw@deneb.enyo.de \
    --cc=libc-alpha@sourceware.org \
    --cc=musl@lists.openwall.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).