From mboxrd@z Thu Jan 1 00:00:00 1970
List-Id: Zsh Workers List
X-Seq: 36009
In-Reply-To: <150806085451.ZM402@torch.brasslantern.com>
References: <150803085228.ZM24837@torch.brasslantern.com>
 <150803135818.ZM24977@torch.brasslantern.com>
 <150804235400.ZM9958@torch.brasslantern.com>
 <150805085258.ZM17673@torch.brasslantern.com>
 <150805115249.ZM7158@torch.brasslantern.com>
 <150805132014.ZM7746@torch.brasslantern.com>
 <150805220656.ZM18545@torch.brasslantern.com>
 <150806085451.ZM402@torch.brasslantern.com>
From: Mathias Fredriksson
Date: Fri, 7 Aug 2015 03:45:35 +0300
Subject: Re: Deadlock when receiving kill-signal from child process
To: zsh-workers@zsh.org

On Thu, Aug 6, 2015 at 6:54 PM, Bart Schaefer wrote:
}
} This is the stdio thing again.  Anyone reading this familiar enough with
} the POSIX or C standards to point to whether stdio is required to be
} signal-safe with pthreads?  I.e., is this our bug or someone else's?

Sadly I can't be of much assistance here. I believe you can't use
pthread mutexes from a signal handler, but that isn't what's happening
here? If I understand correctly, a signal is received while a mutex
lock is (being) acquired.

}
} NO_TRAPS_ASYNC ?

Yes, my bad, that was a typo in the email message; I used the correct
setopt.

}
} As with the previous dotrapargs() patch, I'm a little nervous about
} the dont_queue_signals() bits, but that's the only safe way to do
} the disabling part of signal queueing when the enabling part is not
} in local scope.

I'm not quite sure I understand what these changes do, but at least
this last patch made it a lot harder for me to get zsh to lock up. I
had to leave my script running in a while true; do ...; done loop
(eventually, after anywhere from 30 seconds to 10 minutes, it would hit
a lock).

#0  0x00007fff8abfe72a in __sigsuspend ()
#1  0x0000000107b59287 in signal_suspend ()
#2  0x0000000107b30671 in zwaitjob ()
#3  0x0000000107b304c4 in waitjobs ()
#4  0x0000000107b130e8 in execpline ()
#5  0x0000000107b122ce in execlist ()
#6  0x0000000107b120f6 in execode ()
#7  0x0000000107b15ebf in runshfunc ()
#8  0x0000000107b157f4 in doshfunc ()
#9  0x0000000107b5a70b in dotrapargs ()
#10 0x0000000107b597b2 in handletrap ()
#11 0x0000000107b590b0 in zhandler ()
#12 <signal handler called>
#13 0x00007fff82ce43a8 in ferror ()
#14 0x0000000107b2abc2 in loop ()
#15 0x0000000107b2d7e0 in zsh_main ()
#16 0x00007fff8610c5c9 in start ()

This just seems like the same mutex stuff again:

#0  0x00007fff8abfe166 in __psynch_mutexwait ()
#1  0x00007fff8e4b578a in _pthread_mutex_lock ()
#2  0x00007fff82ce5750 in fputc ()
#3  0x0000000102c20cd5 in zputs ()
#4  0x0000000102c20b3c in mb_niceformat ()
#5  0x0000000102c201cd in zwarning ()
#6  0x0000000102c20376 in zwarn ()
#7  0x0000000102c143da in wait_for_processes ()
#8  0x0000000102c140a6 in zhandler ()
#9  <signal handler called>
#10 0x00007fff8abfe72a in __sigsuspend ()

This also looks vaguely familiar but might as well post it:

#0  0x00007fff8abf95da in syscall_thread_switch ()
#1  0x00007fff853a982d in _OSSpinLockLockSlow ()
#2  0x00007fff896d771b in szone_malloc_should_clear ()
#3  0x00007fff896d7667 in malloc_zone_malloc ()
#4  0x00007fff896d6187 in malloc ()
#5  0x0000000101ccdeaf in zalloc ()
#6  0x0000000101cee2ca in ztrdup ()
#7  0x0000000101cf89b3 in mb_niceformat ()
#8  0x0000000101cf81cd in zwarning ()
#9  0x0000000101cf8376 in zwarn ()
#10 0x0000000101cec3da in wait_for_processes ()
#11 0x0000000101cec0a6 in zhandler ()
#12 <signal handler called>
#13 0x00007fff853aacd1 in _os_lock_spin_lock ()
#14 0x00007fff896d98d6 in szone_free_definite_size ()
#15 0x0000000101ca5874 in execlist ()
#16 0x0000000101ca50f6 in execode ()
#17 0x0000000101ca8ebf in runshfunc ()
#18 0x0000000101ca87f4 in doshfunc ()
#19 0x0000000101ced70b in dotrapargs ()
#20 0x0000000101cec7b2 in handletrap ()
#21 0x0000000101ced9ec in unqueue_traps ()
#22 0x0000000101cc372b in zwaitjob ()
#23 0x0000000101cc34c4 in waitjobs ()
#24 0x0000000101ca60e8 in execpline ()
#25 0x0000000101ca52ce in execlist ()
#26 0x0000000101ccaad0 in execwhile ()
#27 0x0000000101cac0d9 in execcmd ()
#28 0x0000000101ca5d24 in execpline ()
#29 0x0000000101ca52ce in execlist ()
#30 0x0000000101ca50f6 in execode ()
#31 0x0000000101cbdb8f in loop ()
#32 0x0000000101cc07e0 in zsh_main ()
#33 0x00007fff8610c5c9 in start ()

Bonus NO_TRAPS_ASYNC:

#0  0x00007fff8abfe72a in __sigsuspend ()
#1  0x0000000107509287 in signal_suspend ()
#2  0x00000001074e0671 in zwaitjob ()
#3  0x00000001074e04c4 in waitjobs ()
#4  0x00000001074c30e8 in execpline ()
#5  0x00000001074c22ce in execlist ()
#6  0x00000001074c20f6 in execode ()
#7  0x00000001074c5ebf in runshfunc ()
#8  0x00000001074c57f4 in doshfunc ()
#9  0x000000010750a70b in dotrapargs ()
#10 0x00000001075097b2 in handletrap ()
#11 0x00000001075090b0 in zhandler ()
#12 <signal handler called>
#13 0x00007fff8abfe97a in write$NOCANCEL ()
#14 0x00007fff82ceb9ed in _swrite ()
#15 0x00007fff82ce44a7 in __sflush ()
#16 0x00007fff82ce43f5 in fflush ()
#17 0x0000000107515376 in zwarn ()
#18 0x00000001075093da in wait_for_processes ()
#19 0x00000001075090a6 in zhandler ()
#20 <signal handler called>
#21 0x00007fff8abffa12 in sigprocmask ()
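
For anyone following along, the general idea behind deferring signals around non-reentrant code (the situation in the traces above, where zhandler runs while stdio or malloc holds a lock) can be sketched roughly as below. This is only a Python stand-in under stated assumptions, not zsh's queue_signals()/dont_queue_signals() code: it blocks the signal with pthread_sigmask() instead of queueing it, and the names on_usr1, run_with_signal_deferred and critical_section are made up for the illustration. It assumes a Unix system and that it runs in the main thread.

```python
import signal
import threading

received = []  # signal numbers seen by the handler, in arrival order


def on_usr1(signum, frame):
    # Hypothetical handler: only records the signal, does no I/O.
    received.append(signum)


def run_with_signal_deferred(section, signum=signal.SIGUSR1):
    """Run section() with signum blocked, then restore the previous mask.

    A signal arriving while blocked stays pending and is delivered only
    once the mask is restored, i.e. after the section has finished.
    """
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        return section()
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)


def critical_section():
    # Stand-in for code that holds a lock (stdio, malloc, ...).
    # Send the signal to this very thread while it is blocked.
    signal.pthread_kill(threading.get_ident(), signal.SIGUSR1)
    pending = signal.SIGUSR1 in signal.sigpending()
    seen_inside = len(received)  # handler should not have run yet
    return pending, seen_inside


signal.signal(signal.SIGUSR1, on_usr1)
pending_inside, seen_inside = run_with_signal_deferred(critical_section)
seen_after = len(received)
print(pending_inside, seen_inside, seen_after)
```

The point of the sketch is only the ordering: the handler cannot interrupt the section, so it can never try to take a lock that the interrupted code already holds. Whether that matches what the patch does inside zsh is not something this example claims.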