From: Alexey Izbyshev
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: __synccall: deadlock and reliance on racy /proc/self/task
Date: Sun, 10 Feb 2019 01:29:32 +0300
To: musl@lists.openwall.com
Cc: Szabolcs Nagy
Reply-To: musl@lists.openwall.com
In-Reply-To: <20190209214045.GO21289@port70.net>
References: <1cc54dbe2e4832d804184f33cda0bdd1@ispras.ru>
 <20190207183626.GQ23599@brightrain.aerifal.cx>
 <20190208183357.GX23599@brightrain.aerifal.cx>
 <20190209162101.GN21289@port70.net>
 <6e0306699add531af519843de20c343a@ispras.ru>
 <20190209214045.GO21289@port70.net>

On 2019-02-10 00:40, Szabolcs Nagy wrote:
> the attached patch fixes the issue on my machine.
> i don't know if this is just luck.
>
> the assumption is that if /proc/self/task is read twice such that
> all tids in it seem to be active and caught, then all the active
> threads of the process are caught (no new threads that are already
> started but not visible there yet)
>
> anyway i had to retry until there are no exiting threads in dir to
> reliably fix the deadlock.

Unfortunately, on a 4.15.x kernel I got both the deadlock (after ~23000
iterations) and the mismatch (after I removed the kill() loop; ~19000
iterations). On 4.4.x, it took ~30 million iterations to get the
mismatch (on the deadlock-free version):

--iter: 30198000
--iter: 30199000
mismatch: tid 539: 1000 != 0

Alexey
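
For readers following the thread, the "read /proc/self/task twice" assumption quoted above can be sketched roughly as below. This is only an illustration, not the attached patch: the names list_tids and stable_snapshot and the max_tries parameter are hypothetical, and the sketch covers only the "two consecutive scans agree" part, not the check that every tid is active and caught. As the results above show, agreement between two scans did not prevent the mismatch in practice.

```python
import os


def list_tids(task_dir="/proc/self/task"):
    """Return the set of numeric entries (thread ids) found in task_dir."""
    return {int(name) for name in os.listdir(task_dir) if name.isdigit()}


def stable_snapshot(scan=list_tids, max_tries=100):
    """Rescan until two consecutive scans return the same tid set.

    `scan` is any zero-argument callable returning a set of tids; it is
    injectable so the retry logic can be exercised without /proc.
    Returns the agreed set, or None if no two consecutive scans agreed
    within max_tries scans.
    """
    prev = scan()
    for _ in range(max_tries - 1):
        cur = scan()
        if cur == prev:
            return cur
        prev = cur
    return None
```

On Linux, calling stable_snapshot() with the defaults scans the calling process's own task directory. Two matching scans only show that the directory listing did not change between reads; they say nothing about a thread that has started but is not yet visible there, which is the race this thread is about.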