From: Rich Felker
To: Alexey Izbyshev
Cc: musl@lists.openwall.com
Subject: Re: __synccall: deadlock and reliance on racy /proc/self/task
Date: Sun, 10 Feb 2019 09:57:53 -0500
Message-ID: <20190210145753.GE23599@brightrain.aerifal.cx>
In-Reply-To: <07389efbf06ad6903da1f92d37e1eb66@ispras.ru>

On Sun, Feb 10, 2019 at 03:15:55PM +0300, Alexey Izbyshev wrote:
> On 2019-02-10 04:20, Rich Felker wrote:
> >On Sun, Feb 10, 2019 at 02:16:23AM +0100, Szabolcs Nagy wrote:
> >>* Rich Felker [2019-02-09 19:52:50 -0500]:
> >>> Maybe it's salvageable though. Since __block_new_threads is true, in
> >>> order for this to happen, tid J must have been between the
> >>> __block_new_threads check in pthread_create and the clone syscall at
> >>> the time __synccall started. The number of threads in such a state
> >>> seems to be bounded by some small constant (like 2) times
> >>> libc.threads_minus_1+1, computed at any point after
> >>> __block_new_threads is set to true, so sufficiently heavy presignaling
> >>> (heavier than we have now) might suffice to guarantee that all are
> >>> captured.
> >>
> >>heavier presignaling may catch more threads, but we don't
> >>know how long we should wait until all signal handlers are
> >>invoked (to ensure that all tasks are enqueued on the call
> >>serializer chain before we start walking that list)
> >
> >That's why reading /proc/self/task is still necessary. However, it
> >seems useful to be able to prove you've queued enough signals that at
> >least as many threads as could possibly exist are already in a state
> >where they cannot return from a syscall with signals unblocked without
> >entering the signal handler.
> >In that case you would know there's no
> >more racing going on to create new threads, so reading /proc/self/task
> >is purely to get the list of threads you're waiting to enqueue
> >themselves on the chain, not to find new threads you need to signal.
>
> Similar to Szabolcs, I fail to see how heavier presignaling would
> help. Even if we're sure that we'll *eventually* catch all threads
> (including their future children) that were between the
> __block_new_threads check in pthread_create and the clone syscall at
> the time we set __block_new_threads to 1, we still have no means of
> knowing whether we've reached a stable state. In other words, we don't
> know when we should stop spinning in the /proc/self/task loop, because
> we may miss threads that are currently being created.

This seems correct.

> Also, note that __pthread_exit() blocks all signals and decrements
> libc.threads_minus_1 before exiting, so an arbitrary number of
> threads may be exiting while we're in the /proc/self/task loop, and we
> know that concurrently exiting threads are related to the misses.

This too -- there could in theory be unboundedly many threads that have
already decremented threads_minus_1 but haven't yet exited, and this
approach has no way to ensure that we wait for them to exit before
returning from __synccall.

I'm thinking that the problems here are unrecoverable and that we need
the thread list.

Rich
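
A minimal standalone sketch of the second problem (this is not musl
source; the counter `live`, the `worker` function, and the timings are
made-up stand-ins for libc.threads_minus_1 and the early decrement in
__pthread_exit): worker threads announce their exit by decrementing a
counter before their task entry actually disappears, so a scanner that
trusts the counter can see it reach zero while /proc/self/task is still
full of live tids.

/* Standalone demo, not musl code: shows that a thread count decremented
 * before the thread actually exits gives no bound on how long the
 * /proc/self/task listing keeps changing. */
#include <dirent.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

#define NTHREADS 32

static atomic_int live = NTHREADS;   /* stand-in for libc.threads_minus_1 */

static void *worker(void *arg)
{
    (void)arg;
    atomic_fetch_sub(&live, 1);      /* "I'm gone" is announced early...   */
    usleep(50000);                   /* ...but the task entry still exists */
    return 0;                        /* only now does the tid disappear    */
}

/* Count entries currently visible in /proc/self/task. */
static int count_tasks(void)
{
    int n = 0;
    DIR *d = opendir("/proc/self/task");
    if (!d) return -1;
    struct dirent *de;
    while ((de = readdir(d)))
        if (de->d_name[0] != '.') n++;
    closedir(d);
    return n;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], 0, worker, 0);

    /* Spin until the counter claims every worker is gone. */
    while (atomic_load(&live) > 0) ;

    /* The counter is zero, yet the task directory is typically still
     * full of tids: counting alone cannot tell us when to stop scanning. */
    printf("counter=0 but /proc/self/task has %d entries\n", count_tasks());

    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], 0);
    return 0;
}

Built with -pthread, the final printf typically reports dozens of
remaining task entries even though the counter has already reached
zero -- the same gap __synccall would fall into if it tried to use
threads_minus_1 to decide when the /proc/self/task scan may stop.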