From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: __synccall: deadlock and reliance on racy /proc/self/task
Date: Sat, 9 Feb 2019 20:20:32 -0500
Message-ID: <20190210012032.GB23599@brightrain.aerifal.cx>
In-Reply-To: <20190210011623.GP21289@port70.net>
To: musl@lists.openwall.com, Alexey Izbyshev

On Sun, Feb 10, 2019 at 02:16:23AM +0100, Szabolcs Nagy wrote:
> * Rich Felker [2019-02-09 19:52:50 -0500]:
> > On Sat, Feb 09, 2019 at 10:40:45PM +0100, Szabolcs Nagy wrote:
> > > the assumption is that if /proc/self/task is read twice such that
> > > all tids in it seem to be active and caught, then all the active
> > > threads of the process are caught (no new threads that are already
> > > started but not visible there yet)
> >
> > I'm skeptical of whether this should work in principle. If the first
> > scan of /proc/self/task misses tid J, and during the next scan, tid J
> > creates tid K then exits, it seems like we could see the same set of
> > tids on both scans.
> >
> > Maybe it's salvageable though. Since __block_new_threads is true, in
> > order for this to happen, tid J must have been between the
> > __block_new_threads check in pthread_create and the clone syscall at
> > the time __synccall started. The number of threads in such a state
> > seems to be bounded by some small constant (like 2) times
> > libc.threads_minus_1+1, computed at any point after
> > __block_new_threads is set to true, so sufficiently heavy presignaling
> > (heavier than we have now) might suffice to guarantee that all are
> > captured.
>
> heavier presignaling may catch more threads, but we don't
> know how long we should wait until all signal handlers are
> invoked (to ensure that all tasks are enqueued on the call
> serializer chain before we start walking that list)

That's why reading /proc/self/task is still necessary. However, it
seems useful to be able to prove that you've queued enough signals
that at least as many threads as could possibly exist are already in
a state where they cannot return from a syscall with signals unblocked
without entering the signal handler. In that case you would know
there's no more racing going on to create new threads, so reading
/proc/self/task is purely to get the list of threads you're waiting
on to enqueue themselves on the chain, not to find new threads you
need to signal.

Rich
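For illustration only, a minimal C sketch of the "read /proc/self/task until two scans agree" check discussed above. This is not musl's __synccall code: the names scan_tids, same_tid_set, scan_until_stable and MAX_TIDS are hypothetical, and the sketch omits the step where each listed tid is signaled and waited on between scans. As the tid J / tid K counterexample shows, two matching scans do not by themselves prove that no thread was missed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define MAX_TIDS 1024 /* hypothetical cap on the number of tids tracked */

/* Collect the numeric entries of /proc/self/task (one per thread).
   Stores at most cap tids but returns the full count, so a result
   greater than cap signals truncation; returns -1 if the directory
   cannot be opened. */
static int scan_tids(pid_t *tids, int cap)
{
    DIR *d = opendir("/proc/self/task");
    if (!d)
        return -1;
    int n = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        char *end;
        long v = strtol(de->d_name, &end, 10);
        if (end == de->d_name || *end != '\0')
            continue; /* skip "." and ".." */
        if (n < cap)
            tids[n] = (pid_t)v;
        n++;
    }
    closedir(d);
    return n;
}

static int cmp_tid(const void *a, const void *b)
{
    pid_t x = *(const pid_t *)a, y = *(const pid_t *)b;
    return (x > y) - (x < y);
}

/* Order-insensitive comparison of two tid lists; sorts both in place. */
static bool same_tid_set(pid_t *a, int na, pid_t *b, int nb)
{
    assert(na >= 0 && nb >= 0);
    if (na != nb)
        return false;
    qsort(a, (size_t)na, sizeof *a, cmp_tid);
    qsort(b, (size_t)nb, sizeof *b, cmp_tid);
    return memcmp(a, b, (size_t)na * sizeof *a) == 0;
}

/* Rescan until two consecutive scans report the same tid set, giving
   up after max_rounds rescans. out must hold MAX_TIDS entries. Returns
   the tid count, or -1 on error, overflow, or no agreement. In the
   scheme discussed in this thread, each tid would also be signaled and
   waited on between scans; that step is left out here. */
static int scan_until_stable(pid_t *out, int max_rounds)
{
    pid_t prev[MAX_TIDS], cur[MAX_TIDS];
    int np = scan_tids(prev, MAX_TIDS);
    if (np < 0 || np > MAX_TIDS)
        return -1;
    for (int round = 0; round < max_rounds; round++) {
        int nc = scan_tids(cur, MAX_TIDS);
        if (nc < 0 || nc > MAX_TIDS)
            return -1;
        if (same_tid_set(prev, np, cur, nc)) {
            memcpy(out, cur, (size_t)nc * sizeof *cur);
            return nc;
        }
        memcpy(prev, cur, (size_t)nc * sizeof *cur);
        np = nc;
    }
    return -1;
}
```

A caller would pass a buffer of MAX_TIDS entries, e.g. `pid_t tids[MAX_TIDS]; int n = scan_until_stable(tids, 8);`. A negative result means /proc/self/task could not be read, held more than MAX_TIDS entries, or never settled within the given number of rounds.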