From: Alexey Izbyshev
Subject: Re: __synccall: deadlock and reliance on racy /proc/self/task
Date: Sat, 09 Feb 2019 21:33:32 +0300
To: musl@lists.openwall.com, Szabolcs Nagy
Message-ID: <6e0306699add531af519843de20c343a@ispras.ru>
In-Reply-To: <20190209162101.GN21289@port70.net>
References: <1cc54dbe2e4832d804184f33cda0bdd1@ispras.ru> <20190207183626.GQ23599@brightrain.aerifal.cx> <20190208183357.GX23599@brightrain.aerifal.cx> <20190209162101.GN21289@port70.net>

On 2019-02-09 19:21, Szabolcs Nagy wrote:
> * Rich Felker [2019-02-08 13:33:57 -0500]:
>> On Fri, Feb 08, 2019 at 09:14:48PM +0300, Alexey Izbyshev wrote:
>> > On 2/7/19 9:36 PM, Rich Felker wrote:
>> > >Does it work if we force two iterations of the readdir loop with no
>> > >tasks missed, rather than just one, to catch the case of missed
>> > >concurrent additions? I'm not sure. But all this makes me really
>> > >uncomfortable with the current approach.
>> >
>> > I've tested with 0, 1, 2 and 3 retries of the main loop if miss_cnt
>> > == 0. The test eventually failed in all cases, with 0 retries
>> > requiring only a handful of iterations, 1 -- on the order of 100, 2
>> > -- on the order of 10000 and 3 -- on the order of 100000.
>>
>> Do you have a theory on the mechanism of failure here? I'm guessing
>> it's something like this: there's a thread that goes unseen in the
>> first round, and during the second round, it creates a new thread and
>> exits itself. The exit gets seen (again, it doesn't show up in the
>> dirents) but the new thread it created still doesn't. Is that right?
>>
>> In any case, it looks like the whole mechanism we're using is
>> unreliable, so something needs to be done. My leaning is to go with
>> the global thread list and atomicity of list-unlock with exit.
>
> yes that sounds possible, i added some instrumentation to musl
> and the trace shows situations like that before the deadlock,
> exiting threads can even cause old (previously seen) entries to
> disappear from the dir.
>

Thanks for the thorough instrumentation! Your traces confirm both my theory about the deadlock and the unreliability of /proc/self/task.

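(Aside, for anyone reading along without the code handy: __synccall's readdir loop drives everything off this directory, so the whole mechanism is only as good as the enumeration itself. Below is a standalone sketch, not the musl code, of what that enumeration amounts to; the kernel gives no consistent snapshot, so tasks created or exiting during the scan may simply not show up.)

#include <dirent.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void *idle_thread(void *arg)
{
    sleep(1);                /* just exist while main scans the directory */
    return 0;
}

int main(void)
{
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], 0, idle_thread, 0);

    DIR *d = opendir("/proc/self/task");
    if (!d) return 1;
    struct dirent *de;
    while ((de = readdir(d))) {
        if (de->d_name[0] == '.') continue;          /* skip "." and ".." */
        int tid = atoi(de->d_name);
        if (tid != getpid())
            printf("other task: tid %d\n", tid);     /* what __synccall would signal */
    }
    closedir(d);

    for (int i = 0; i < 2; i++)
        pthread_join(t[i], 0);
    return 0;
}

(Compile with -pthread; this toy scan doesn't try to provoke the racy cases, it just shows the primitive everything else rests on.)
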
I'd also done a very light instrumentation just before I got your email, but it took me a while to understand the output I got (see below). I've used test-setuid-mismatch.c from my first email, with musl modified to avoid the deadlock (by removing the kill() loop from __synccall) and to retry the readdir() loop several times when miss_cnt == 0. I've also added tracing at several points:

* pthread_exit (just before __unmapself)
* __synccall: before the readdir() loop (prints the retry count)
* __synccall: the tid of the current dentry (only if tid != pid and we haven't seen it in the chain)
* __synccall: errors from tgkill() and futex() in the readdir() loop

A couple of traces with max retry == 1:

--iter: 43
retry 0
tid 21089
tid 21091
retry 1
exit 21091
--iter: 44
retry 0
tid 21091
futex: ESRCH
tid 21093
exit 21093
retry 1
mismatch: tid 21094: 0 != 23517
------------------
--iter: 3
retry 0
tid 15974
futex: ESRCH
tid 15977
retry 1
tid 15978
--iter: 4
exit 15977
retry 0
tid 15977
tid 15978
exit 15978
retry 1
tid 15978
tgkill: ESRCH
mismatch: tid 15979: 0 != 23517

And with max retry == 2:

--iter: 19812
retry 0
tid 29606
retry 1
retry 2
exit 29606
--iter: 19813
retry 0
tid 29606
futex: ESRCH
tid 29609
exit 29609
retry 1
tid 29609
tgkill: ESRCH
retry 2
mismatch: tid 29610: 23517 != 0
------------------
--iter: 9762
retry 0
tid 14859
tid 14860
retry 1
retry 2
--iter: 9763
exit 14859
retry 0
tid 14859
exit 14860
tid 14860
retry 1
retry 2
mismatch: tid 14862: 23517 != 0

So it's clear that /proc/self/task can easily miss concurrently created threads, at least when other threads are exiting at the same time. And Szabolcs's traces indicate that even threads already running in userspace can be missed.

Now, about the strange output I mentioned. Consider one of the fragments above:

--iter: 4
exit 15977
retry 0
tid 15977
tid 15978
exit 15978
retry 1
tid 15978
tgkill: ESRCH
mismatch: tid 15979: 0 != 23517

Note that "tid 15978" is printed twice. Recall that it's printed only if we haven't seen the tid in the chain. But how is it possible that we hadn't seen it *either* time? Since we waited on the futex the first time and got the lock, the signal handler must have unlocked it. There is even a comment before the futex() call:

/* Obtaining the lock means the thread responded. ESRCH
 * means the target thread exited, which is okay too. */

If the signal handler reached the futex unlock code, it must have updated the chain, and we should see the tid in the chain on the next retry and not print it. Apparently, there is another reason why futex(FUTEX_LOCK_PI) can succeed: the owner is exiting concurrently (which is also indicated by the subsequent failure of tgkill() with ESRCH). So obtaining the lock doesn't necessarily mean that the owner responded: it may also mean that the owner is (about to be?) dead.

Alexey
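
P.S. If I'm reading the PI-futex machinery right, that last guess matches the owner-died handling described in futex(2): when the task that owns a PI futex exits, the kernel hands the lock to the next waiter and sets FUTEX_OWNER_DIED in the futex word, so the waiter's FUTEX_LOCK_PI returns success even though nobody ever ran the unlock path. Here is a small, timing-based sketch (not the musl code; the names and the 200 ms sleep are just for illustration) that tries to show this in isolation:

#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int futex_word;   /* PI futex word: holds the owner's tid while "locked" */
static sem_t locked;

static void *owner(void *arg)
{
    /* Take the lock in userspace by storing our tid, tell main, then
     * exit without ever unlocking -- like a thread dying mid-__synccall. */
    futex_word = syscall(SYS_gettid);
    sem_post(&locked);
    usleep(200000);       /* give main time to block in FUTEX_LOCK_PI */
    return 0;             /* the kernel task exits while still owning the futex */
}

int main(void)
{
    pthread_t t;
    sem_init(&locked, 0, 0);
    pthread_create(&t, 0, owner, 0);
    sem_wait(&locked);

    /* The owner never unlocks, so if this returns 0 it can only be the
     * kernel's owner-died handling granting us the lock. */
    int r = syscall(SYS_futex, &futex_word, FUTEX_LOCK_PI, 0, NULL, NULL, 0);
    printf("lock_pi: %s, word=%#x, OWNER_DIED=%d\n",
           r ? strerror(errno) : "acquired",
           (unsigned)futex_word, !!(futex_word & FUTEX_OWNER_DIED));

    pthread_join(t, 0);
    return 0;
}

On my reading, the futex word afterwards should contain the main thread's tid with FUTEX_OWNER_DIED set; if the owner thread happens to have fully exited before main blocks, the call fails with ESRCH instead, which is the other case the __synccall comment mentions.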