From: Szabolcs Nagy
To: musl@lists.openwall.com
Cc: Alexey Izbyshev
Subject: Re: __synccall: deadlock and reliance on racy /proc/self/task
Date: Sun, 10 Feb 2019 02:16:23 +0100
Message-ID: <20190210011623.GP21289@port70.net>
In-Reply-To: <20190210005250.GZ23599@brightrain.aerifal.cx>

* Rich Felker [2019-02-09 19:52:50 -0500]:
> On Sat, Feb 09, 2019 at 10:40:45PM +0100, Szabolcs Nagy wrote:
> > the assumption is that if /proc/self/task is read twice such that
> > all tids in it seem to be active and caught, then all the active
> > threads of the process are caught (no new threads that are already
> > started but not visible there yet)
>
> I'm skeptical of whether this should work in principle. If the first
> scan of /proc/self/task misses tid J, and during the next scan, tid J
> creates tid K then exits, it seems like we could see the same set of
> tids on both scans.
>
> Maybe it's salvageable though. Since __block_new_threads is true, in
> order for this to happen, tid J must have been between the
> __block_new_threads check in pthread_create and the clone syscall at
> the time __synccall started. The number of threads in such a state
> seems to be bounded by some small constant (like 2) times
> libc.threads_minus_1+1, computed at any point after
> __block_new_threads is set to true, so sufficiently heavy presignaling
> (heavier than we have now) might suffice to guarantee that all are
> captured.

heavier presignaling may catch more threads, but we don't
know how long we should wait until all signal handlers are
invoked (to ensure that all tasks are enqueued on the call
serializer chain before we start walking that list)
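
for illustration, a minimal sketch of the "read /proc/self/task twice and
compare" check discussed above. this is not the actual __synccall code:
scan_tasks, scans_agree and MAX_TIDS are made-up names, and the fixed-size
arrays are an assumption to keep the example short.

```c
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define MAX_TIDS 1024 /* arbitrary cap, assumption for this sketch */

/* order tids ascending so two scans can be compared with memcmp */
static int cmp_tid(const void *a, const void *b)
{
    pid_t x = *(const pid_t *)a, y = *(const pid_t *)b;
    return (x > y) - (x < y);
}

/* hypothetical helper: collect the numeric entries of /proc/self/task
 * into tids[], sorted; returns the count, or -1 on error or overflow */
static int scan_tasks(pid_t *tids, int max)
{
    DIR *d = opendir("/proc/self/task");
    struct dirent *de;
    int n = 0;

    if (!d) return -1;
    while ((de = readdir(d))) {
        /* skip "." and ".." (and anything else non-numeric) */
        if (de->d_name[0] < '0' || de->d_name[0] > '9') continue;
        if (n == max) { closedir(d); return -1; }
        tids[n++] = (pid_t)atoi(de->d_name);
    }
    closedir(d);
    qsort(tids, n, sizeof *tids, cmp_tid);
    return n;
}

/* returns 1 if two consecutive scans list exactly the same tids.
 * note: agreement is only a necessary condition. a thread J that
 * neither scan saw may create K and exit in between, so both
 * scans can match and still miss a live thread. */
static int scans_agree(void)
{
    pid_t a[MAX_TIDS], b[MAX_TIDS];
    int na = scan_tasks(a, MAX_TIDS);
    int nb = scan_tasks(b, MAX_TIDS);

    return na > 0 && na == nb && !memcmp(a, b, na * sizeof *a);
}
```

a result of 1 from scans_agree() only says the directory listing looked
stable; it does not say how long to wait for the signal handlers of the
listed tids to run, which is the open question above.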