From mboxrd@z Thu Jan 1 00:00:00 1970 X-Msuck: nntp://news.gmane.org/gmane.linux.lib.musl.general/13739 Path: news.gmane.org!.POSTED.blaine.gmane.org!not-for-mail From: Szabolcs Nagy Newsgroups: gmane.linux.lib.musl.general Subject: Re: __synccall: deadlock and reliance on racy /proc/self/task Date: Sat, 9 Feb 2019 17:21:02 +0100 Message-ID: <20190209162101.GN21289@port70.net> References: <1cc54dbe2e4832d804184f33cda0bdd1@ispras.ru> <20190207183626.GQ23599@brightrain.aerifal.cx> <20190208183357.GX23599@brightrain.aerifal.cx> Reply-To: musl@lists.openwall.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Injection-Info: blaine.gmane.org; posting-host="blaine.gmane.org:195.159.176.226"; logging-data="18951"; mail-complaints-to="usenet@blaine.gmane.org" User-Agent: Mutt/1.10.1 (2018-07-13) Cc: Alexey Izbyshev To: musl@lists.openwall.com Original-X-From: musl-return-13755-gllmg-musl=m.gmane.org@lists.openwall.com Sat Feb 09 17:21:18 2019 Return-path: Envelope-to: gllmg-musl@m.gmane.org Original-Received: from mother.openwall.net ([195.42.179.200]) by blaine.gmane.org with smtp (Exim 4.89) (envelope-from ) id 1gsVNU-0004q3-UI for gllmg-musl@m.gmane.org; Sat, 09 Feb 2019 17:21:17 +0100 Original-Received: (qmail 20147 invoked by uid 550); 9 Feb 2019 16:21:14 -0000 Mailing-List: contact musl-help@lists.openwall.com; run by ezmlm Precedence: bulk List-Post: List-Help: List-Unsubscribe: List-Subscribe: List-ID: Original-Received: (qmail 20125 invoked from network); 9 Feb 2019 16:21:14 -0000 Mail-Followup-To: musl@lists.openwall.com, Alexey Izbyshev Content-Disposition: inline In-Reply-To: <20190208183357.GX23599@brightrain.aerifal.cx> Xref: news.gmane.org gmane.linux.lib.musl.general:13739 Archived-At: * Rich Felker [2019-02-08 13:33:57 -0500]: > On Fri, Feb 08, 2019 at 09:14:48PM +0300, Alexey Izbyshev wrote: > > On 2/7/19 9:36 PM, Rich Felker wrote: > > >>Of course, the larger problem remains: if we may miss some threads > > >>because of /proc, we may fail to call setuid() syscall in those > > >>threads. And that's indeed easily happens in my second test > > >>(attached: test-setuid-mismatch.c; expected to be run as a suid > > >>binary; note that I tested both with and without "presignalling"). > > > > > >Does it work if we force two iterations of the readdir loop with no > > >tasks missed, rather than just one, to catch the case of missed > > >concurrent additions? I'm not sure. But all this makes me really > > >uncomfortable with the current approach. > > > > I've tested with 0, 1, 2 and 3 retries of the main loop if miss_cnt > > == 0. The test eventually failed in all cases, with 0 retries > > requiring only a handful of iterations, 1 -- on the order of 100, 2 > > -- on the order of 10000 and 3 -- on the order of 100000. > > Do you have a theory on the mechanism of failure here? I'm guessing > it's something like this: there's a thread that goes unseen in the > first round, and during the second round, it creates a new thread and > exits itself. The exit gets seen (again, it doesn't show up in the > dirents) but the new thread it created still doesn't. Is that right? > > In any case, it looks like the whole mechanism we're using is > unreliable, so something needs to be done. My leaning is to go with > the global thread list and atomicity of list-unlock with exit. yes that sounds possible, i added some instrumentation to musl and the trace shows situations like that before the deadlock, exiting threads can even cause old (previously seen) entries to disappear from the dir. tm1 is libc.threads_minus_1, tt is target_tid, r==3 is ESRCH, r==110 is ETIMEDOUT my comments are indented: trace 1: 16:01:45.473004716 [10825] __pthread_exit tm1 2 -> 1 16:01:45.473007835 [10827] start tm1 1 pid 2922 tid 10827 current threads: 2922, 10827, (10825 is exiting) new iter 16:01:45.473012387 [2922] __synccall begin tm1 1 16:01:45.473022059 [2922] __synccall begin dir 16:01:45.473030024 [2922] __synccall tgkill 10825 r 3 tm1 1 16:01:45.473033342 [2922] __synccall end dir cnt 1 miss 0 r 3 tt 10825 (10825) tm1 1 bad: dir had 10825, but not 10827 (which already made a write syscall before signals were sent) 16:01:45.473037483 [2922] __synccall end loop tm1 1 timedout 0 head 0 16:01:45.473040958 [2922] __synccall single_threaded tm1 1 head 0x7ffff7f19310 16:01:45.473040736 [10827] handler new head 0x7ffff7f19310 tt 10825 (10825) chain node: tid 10827 target_sem 0 0 caller_sem 0 0 bad: late signal 16:01:45.473046943 [2922] __synccall finish chain node tid 10827 target_sem 0 0 caller_sem 0 0 new iter 16:01:45.473063328 [2922] __synccall begin tm1 1 16:01:45.473067031 [2922] __synccall begin dir 16:01:45.473073274 [2922] __synccall tgkill 10827 r 0 tm1 1 16:01:45.483124243 [2922] __synccall end dir cnt 1 miss 1 r 110 tt -2147472821 (10827) tm1 1 16:01:45.483137678 [2922] __synccall begin dir 16:01:45.483148355 [2922] __synccall tgkill 10827 r 0 tm1 1 16:01:45.493243161 [2922] __synccall end dir cnt 1 miss 1 r 110 tt -2147472821 (10827) tm1 1 trace 2: 15:09:57.926599655 [19672] __pthread_create tm1 2 -> 3 flags 0x5d0f00 detached printed right before __clone, current threads: 19667, 19670, 19672 new iter: 15:09:57.926603357 [19667] __synccall begin tm1 3 15:09:57.926610882 [19667] __synccall begin dir 15:09:57.926618254 [19667] __synccall tgkill 19670 r 0 tm1 3 15:09:57.926617745 [19672] handler new head 0x7ffff7f5f310 tt 19670 (19670) chain node: tid 19672 target_sem 0 0 caller_sem 0 0 15:09:57.926589725 [19670] handler done 15:09:57.926633870 [19670] handler new head 0x7ffff7f3bcd0 tt -2147463981 (19667) chain node: tid 19670 target_sem 0 0 caller_sem 0 0 15:09:57.926637148 [19667] __synccall already caught tid 19672 tm1 3 15:09:57.926642781 [19667] __synccall end dir cnt 2 miss 0 r 0 tt 19672 (19672) tm1 3 ok: dir had 19670, 19672 (new child of 19672 may not be created yet) 15:09:57.926649375 [19667] __synccall end loop tm1 3 timedout 0 head 0x7ffff7f3bcd0 15:09:57.926653791 [19667] __synccall call chain node tid 19670 target_sem -1 1 caller_sem 0 0 15:09:57.926668175 [19667] __synccall call chain node tid 19672 target_sem -1 1 caller_sem 0 0 15:09:57.926683042 [19667] __synccall single_threaded tm1 3 head 0x7ffff7f3bcd0 15:09:57.926688375 [19667] __synccall finish chain node tid 19670 target_sem -1 1 caller_sem 0 0 15:09:57.926695762 [19667] __synccall finish chain node tid 19672 target_sem -1 1 caller_sem 0 0 15:09:57.926698084 [19670] handler done 15:09:57.926476014 [19670] __pthread_create clone ret 19672 non-monotonic timestamp: the instrumentation code got interrupted 15:09:57.926707218 [19670] __pthread_exit tm1 3 -> 2 new iter: 15:09:57.926721373 [19667] __synccall begin tm1 2 15:09:57.926727938 [19667] __synccall begin dir 15:09:57.926738572 [19667] __synccall tgkill 19670 r 3 tm1 2 15:09:57.926743677 [19667] __synccall end dir cnt 1 miss 0 r 3 tt 19670 (19670) tm1 2 bad: dir had 19670 (which exited) but 19672 is missing ??? 15:09:57.926748951 [19672] handler done 15:09:57.926749685 [19667] __synccall end loop tm1 2 timedout 0 head 0 15:09:57.926756402 [19667] __synccall single_threaded tm1 2 head 0 15:09:57.926757551 [19672] handler new head 0x7ffff7f5f310 tt 19670 (19670) chain node: tid 19672 target_sem 0 0 caller_sem 0 0 bad: late signal 15:09:57.926761019 [19667] __synccall finish chain node tid 19672 target_sem 0 0 caller_sem 0 0 bad: got one sem_post, will still wait on another new iter: 15:09:57.926781060 [19667] __synccall begin tm1 2 15:09:57.926786600 [19667] __synccall begin dir 15:09:57.926793026 [19667] __synccall tgkill 19672 r 0 tm1 2 15:09:57.936881342 [19667] __synccall end dir cnt 1 miss 1 r 110 tt -2147463976 (19672) tm1 2 now dir has 19672, but it's already waiting in the handler.. 15:09:57.936902178 [19667] __synccall begin dir 15:09:57.936921680 [19667] __synccall tgkill 19672 r 0 tm1 2 15:09:57.947024323 [19667] __synccall end dir cnt 1 miss 1 r 110 tt -2147463976 (19672) tm1 2 15:09:57.947045505 [19667] __synccall begin dir 15:09:57.947064716 [19667] __synccall tgkill 19672 r 0 tm1 2 15:09:57.957161709 [19667] __synccall end dir cnt 1 miss 1 r 110 tt -2147463976 (19672) tm1 2 15:09:57.957196705 [19667] __synccall end loop tm1 2 timedout 3 head 0