mailing list of musl libc
From: Rich Felker <dalias@libc.org>
To: musl@lists.openwall.com, Alexey Izbyshev <izbyshev@ispras.ru>
Subject: Re: __synccall: deadlock and reliance on racy /proc/self/task
Date: Sat, 9 Feb 2019 19:52:50 -0500
Message-ID: <20190210005250.GZ23599@brightrain.aerifal.cx>
In-Reply-To: <20190209214045.GO21289@port70.net>

On Sat, Feb 09, 2019 at 10:40:45PM +0100, Szabolcs Nagy wrote:
> * Alexey Izbyshev <izbyshev@ispras.ru> [2019-02-09 21:33:32 +0300]:
> > On 2019-02-09 19:21, Szabolcs Nagy wrote:
> > > * Rich Felker <dalias@libc.org> [2019-02-08 13:33:57 -0500]:
> > > > On Fri, Feb 08, 2019 at 09:14:48PM +0300, Alexey Izbyshev wrote:
> > > > > On 2/7/19 9:36 PM, Rich Felker wrote:
> > > > > >Does it work if we force two iterations of the readdir loop with no
> > > > > >tasks missed, rather than just one, to catch the case of missed
> > > > > >concurrent additions? I'm not sure. But all this makes me really
> > > > > >uncomfortable with the current approach.
> > > > >
> > > > > I've tested with 0, 1, 2 and 3 retries of the main loop if miss_cnt
> > > > > == 0. The test eventually failed in all cases, with 0 retries
> > > > > requiring only a handful of iterations, 1 -- on the order of 100, 2
> > > > > -- on the order of 10000 and 3 -- on the order of 100000.
> > > > 
> > > > Do you have a theory on the mechanism of failure here? I'm guessing
> > > > it's something like this: there's a thread that goes unseen in the
> > > > first round, and during the second round, it creates a new thread and
> > > > exits itself. The exit gets seen (again, it doesn't show up in the
> > > > dirents) but the new thread it created still doesn't. Is that right?
> > > > 
> > > > In any case, it looks like the whole mechanism we're using is
> > > > unreliable, so something needs to be done. My leaning is to go with
> > > > the global thread list and atomicity of list-unlock with exit.
> > > 
> > > yes that sounds possible, i added some instrumentation to musl
> > > and the trace shows situations like that before the deadlock,
> > > exiting threads can even cause old (previously seen) entries to
> > > disappear from the dir.
> > > 
> > Thanks for the thorough instrumentation! Your traces confirm both my theory
> > about the deadlock and unreliability of /proc/self/task.
> > 
> > I'd also done a very light instrumentation just before I got your email, but
> > it took me a while to understand the output I got (see below).
> 
> the attached patch fixes the issue on my machine.
> i don't know if this is just luck.
> 
> the assumption is that if /proc/self/task is read twice such that
> all tids in it seem to be active and caught, then all the active
> threads of the process are caught (no new threads that are already
> started but not visible there yet)

I'm skeptical of whether this should work in principle. If the first
scan of /proc/self/task misses tid J, and during the next scan, tid J
creates tid K then exits, it seems like we could see the same set of
tids on both scans.
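
A toy model makes the objection concrete. It is a hypothetical illustration, not musl code: the tids (MAIN=1, A=2, J=10, K=11) are made up, and each scan is treated as an atomic snapshot of which entries are visible in /proc/self/task, which the real readdir pass is not. J is live but not yet visible during the first scan; before the second scan J clones K (not yet visible) and exits. Both scans then report the same tid set, so a "two identical scans" check would declare completion while K is still running unsignaled.

```c
#include <stdbool.h>

#define MAX_TID 16

/* Toy model, not musl code. live[t]: thread t exists (and could hold a
   lock); visible[t]: t's entry currently shows up in /proc/self/task.
   The two can disagree around clone and exit. */
struct model { bool live[MAX_TID]; bool visible[MAX_TID]; };

/* Hypothetical tids for the scenario described above. */
enum { TID_MAIN = 1, TID_A = 2, TID_J = 10, TID_K = 11 };

/* A scan is modeled as a bitmask of the tids visible when it is taken. */
static unsigned snapshot(const struct model *m)
{
    unsigned mask = 0;
    for (int t = 0; t < MAX_TID; t++)
        if (m->visible[t])
            mask |= 1u << t;
    return mask;
}

/* Returns the lowest live tid absent from a scan, or -1 if none. */
static int first_missed(const struct model *m, unsigned scan)
{
    for (int t = 0; t < MAX_TID; t++)
        if (m->live[t] && !(scan & (1u << t)))
            return t;
    return -1;
}

/* Plays out the J-creates-K-then-exits interleaving, stores both scans,
   and returns the live thread that the final scan failed to see. */
static int run_scenario(unsigned *scan1, unsigned *scan2)
{
    struct model m = {0};
    m.live[TID_MAIN] = m.visible[TID_MAIN] = true;
    m.live[TID_A] = m.visible[TID_A] = true;
    m.live[TID_J] = true;        /* J runs, but its entry is not visible yet */

    *scan1 = snapshot(&m);       /* first scan misses J */

    m.live[TID_K] = true;        /* J clones K; K's entry is not visible yet */
    m.live[TID_J] = false;       /* J exits before the second scan reaches it */

    *scan2 = snapshot(&m);       /* second scan: same set as the first */

    return first_missed(&m, *scan2);
}
```

Calling run_scenario yields two equal masks ({MAIN, A}) while returning TID_K, i.e. the stability check passes even though a live thread was never seen.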

Maybe it's salvageable though. Since __block_new_threads is true, in
order for this to happen, tid J must have been between the
__block_new_threads check in pthread_create and the clone syscall at
the time __synccall started. The number of threads in such a state
seems to be bounded by some small constant (like 2) times
libc.threads_minus_1+1, computed at any point after
__block_new_threads is set to true, so sufficiently heavy presignaling
(heavier than we have now) might suffice to guarantee that all are
captured. 
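
Written out with explicit precedence, the estimate above reads as follows. Here $T$ stands for libc.threads_minus_1 sampled at any point after __block_new_threads is set, $N$ for the number of threads that are between the __block_new_threads check in pthread_create and the clone syscall, and $c$ for the small constant (guessed at 2 above). The bound is a conjecture in the message, not an established property of musl.

```latex
N \;\le\; c\,(T + 1), \qquad c \approx 2
```

For example, with $T = 3$ (four threads) and $c = 2$, at most 8 threads would be in that window, so the presignaling pass would need to be able to catch that many.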

Rich

