mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com, Alexey Izbyshev <izbyshev@ispras.ru>
Subject: Re: __synccall: deadlock and reliance on racy /proc/self/task
Date: Sun, 10 Feb 2019 10:05:35 -0500
Message-ID: <20190210150535.GF23599@brightrain.aerifal.cx>
In-Reply-To: <20190210123214.GQ21289@port70.net>

On Sun, Feb 10, 2019 at 01:32:14PM +0100, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2019-02-09 23:01:50 -0500]:
> > On Sat, Feb 09, 2019 at 08:20:32PM -0500, Rich Felker wrote:
> > > On Sun, Feb 10, 2019 at 02:16:23AM +0100, Szabolcs Nagy wrote:
> > > > * Rich Felker <dalias@libc.org> [2019-02-09 19:52:50 -0500]:
> > > > > On Sat, Feb 09, 2019 at 10:40:45PM +0100, Szabolcs Nagy wrote:
> > > > > > the assumption is that if /proc/self/task is read twice such that
> > > > > > all tids in it seem to be active and caught, then all the active
> > > > > > threads of the process are caught (no new threads that are already
> > > > > > started but not visible there yet)
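
As a point of reference, the scan being described might look roughly like
the following. This is only an illustrative sketch, not musl's actual
__synccall code, and the handle_tid callback is a hypothetical placeholder:

    #include <dirent.h>
    #include <stdlib.h>

    /* Walk /proc/self/task once and hand each tid to a caller-supplied
     * callback.  A second pass that finds no tid missing from the first
     * pass is the "all threads caught" condition discussed above. */
    static int scan_tasks(void (*handle_tid)(int tid))
    {
        DIR *d = opendir("/proc/self/task");
        struct dirent *de;
        if (!d) return -1;
        while ((de = readdir(d))) {
            if (de->d_name[0] == '.') continue;  /* skip "." and ".." */
            handle_tid(atoi(de->d_name));
        }
        closedir(d);
        return 0;
    }
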
> 
> it seems that if the main thread exits, it is still listed in
> /proc/self/task and has zombie status for the lifetime of the process,
> so the futex lock always fails with ESRCH.
> 
> so my logic of waiting for all exiting threads to exit does not work
> (at least the main thread needs to be special-cased).
> 
> > > > > 
> > > > > I'm skeptical of whether this should work in principle. If the first
> > > > > scan of /proc/self/task misses tid J, and during the next scan, tid J
> > > > > creates tid K then exits, it seems like we could see the same set of
> > > > > tids on both scans.
> > > > > 
> > > > > Maybe it's salvagable though. Since __block_new_threads is true, in
> > > > > order for this to happen, tid J must have been between the
> > > > > __block_new_threads check in pthread_create and the clone syscall at
> > > > > the time __synccall started. The number of threads in such a state
> > > > > seems to be bounded by some small constant (like 2) times
> > > > > libc.threads_minus_1+1, computed at any point after
> > > > > __block_new_threads is set to true, so sufficiently heavy presignaling
> > > > > (heavier than we have now) might suffice to guarantee that all are
> > > > > captured. 
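
As a rough illustration of the bound being proposed (hypothetical
accounting; whether the constant 2 really suffices is the open question):

    /* Illustrative only: how many signals to pre-queue under the
     * "2 * (libc.threads_minus_1 + 1)" bound discussed above, with the
     * count sampled after __block_new_threads is set to true. */
    int nthreads = libc.threads_minus_1 + 1;
    int presignals = 2 * nthreads;
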
> > > > 
> > > > heavier presignaling may catch more threads, but we don't
> > > > know how long we should wait until all signal handlers have
> > > > been invoked (to ensure that all tasks are enqueued on the
> > > > call serializer chain before we start walking that list).
> > > 
> > > That's why reading /proc/self/task is still necessary. However, it
> > > seems useful to be able to prove you've queued enough signals that at
> > > least as many threads as could possibly exist are already in a state
> > > where they cannot return from a syscall with signals unblocked without
> > > entering the signal handler. In that case you would know there's no
> > > more racing going on to create new threads, so reading /proc/self/task
> > > is purely to get the list of threads you're waiting to enqueue
> > > themselves on the chain, not to find new threads you need to signal.
> > 
> > One thing to note: SYS_kill is not required to queue an unlimited
> > number of signals, and might not report failure to do so. We should
> > probably be using SYS_rt_sigqueueinfo, counting the number of signals
> > successfully queued, and continue sending them during the loop that
> > monitors progress building the chain, until the necessary number have
> > been successfully sent, if we're going to rely on the above properties
> > to guarantee that we've caught every thread.
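
A hedged sketch of what counting successfully-queued signals could look
like, using the thread-directed rt_tgsigqueueinfo syscall (illustration
only, not the actual patch; the queue_one helper and its error handling
are assumptions):

    #define _GNU_SOURCE
    #include <signal.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Try to queue one realtime signal to a tid in this process.  Returns
     * 0 if the signal was actually queued, -1 (with errno set, e.g. EAGAIN
     * when the queued-signal limit is hit) otherwise, so successes can be
     * counted -- unlike SYS_kill/tkill, which may silently coalesce. */
    static int queue_one(pid_t tid, int sig)
    {
        siginfo_t si;
        memset(&si, 0, sizeof si);
        si.si_signo = sig;
        si.si_code = SI_QUEUE;
        si.si_pid = getpid();
        si.si_uid = getuid();
        return syscall(SYS_rt_tgsigqueueinfo, getpid(), tid, sig, &si);
    }

The monitoring loop would keep calling something like this, counting zero
returns, until the number of successfully queued signals reaches the bound
above.
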
> 
> yes, but even if we have sent enough signals that cannot be dropped,
> and see that all tasks in /proc/self/task are caught in the handler,
> there might be tasks that have not reached the handler yet and are
> not yet visible in /proc/self/task. if they add themselves to the
> chain after we start processing it, they will wait forever.
> 
> as a duct-tape solution we could sleep a bit after all visible tasks
> are stopped, to give the not-yet-visible ones a chance to run
> (or to show up in /proc/self/task).

This is not going to help on a box that's swapping to hell, where one
of the threads takes 30 seconds to run again (which is a real
possibility!).

> but ideally we would handle non-libc created threads too, so using
> libc.threads_minus_1 and __block_new_threads is already suboptimal,

Non-libc-created threads just can't be supported; they break in all
sorts of ways and have to just be considered totally undefined. The
synccall signal handler couldn't even perform any of the operations it
does, since libc functions all (by contract, if not in practice) rely
on having a valid thread pointer. We bend this rule slightly (and very
carefully) in posix_spawn to make syscalls with a context shared with
the thread in the parent process, but allowing it to be broken in
arbitrary ways by application code is just not practical.

> a mechanism like ptrace or SIGSTOP is needed that affects all tasks.

Yes, that would work, but is incompatible with running in an
already-traced task as far as I know.

Rich


Thread overview: 19+ messages
2019-02-02 21:40 Alexey Izbyshev
2019-02-07 18:36 ` Rich Felker
2019-02-08 18:14   ` Alexey Izbyshev
2019-02-08 18:33     ` Rich Felker
2019-02-09 16:21       ` Szabolcs Nagy
2019-02-09 18:33         ` Alexey Izbyshev
2019-02-09 21:40           ` Szabolcs Nagy
2019-02-09 22:29             ` Alexey Izbyshev
2019-02-10  0:52             ` Rich Felker
2019-02-10  1:16               ` Szabolcs Nagy
2019-02-10  1:20                 ` Rich Felker
2019-02-10  4:01                   ` Rich Felker
2019-02-10 12:32                     ` Szabolcs Nagy
2019-02-10 15:05                       ` Rich Felker [this message]
2019-02-10 12:15                   ` Alexey Izbyshev
2019-02-10 14:57                     ` Rich Felker
2019-02-10 21:04       ` Alexey Izbyshev
2019-02-12 18:48 ` Rich Felker
2019-02-21  0:41   ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/
