From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 15441 invoked by alias); 15 Jun 2011 03:00:20 -0000
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List
List-Post:
List-Help:
X-Seq: 29481
Received: (qmail 7096 invoked from network); 15 Jun 2011 03:00:16 -0000
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on f.primenet.com.au
X-Spam-Level:
X-Spam-Status: No, score=-1.9 required=5.0
	tests=BAYES_00,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1
Received-SPF: none (ns1.primenet.com.au: domain at closedmail.com does not designate permitted sender hosts)
From: Bart Schaefer
Message-id: <110614195955.ZM10555@torch.brasslantern.com>
Date: Tue, 14 Jun 2011 19:59:53 -0700
In-reply-to: <20110614195458.67af06e2@pws-pc.ntlworld.com>
Comments: In reply to Peter Stephenson
	"Re: killing suspended jobs makes zsh hang after 47d1215" (Jun 14, 7:54pm)
References: <86aadnwtl2.fsf@gmail.com>
	<110612072211.ZM26399@torch.brasslantern.com>
	<110612075958.ZM27334@torch.brasslantern.com>
	<8662oaha3g.fsf@gmail.com>
	<110612185339.ZM28551@torch.brasslantern.com>
	<20110613120747.2f018471@pwslap01u.europe.root.pri>
	<110613073748.ZM2701@torch.brasslantern.com>
	<20110614195458.67af06e2@pws-pc.ntlworld.com>
X-Mailer: OpenZMail Classic (0.9.2 24April2005)
To:
Subject: Re: killing suspended jobs makes zsh hang after 47d1215
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii

On Jun 14, 7:54pm, Peter Stephenson wrote:
} Subject: Re: killing suspended jobs makes zsh hang after 47d1215
}
} On Mon, 13 Jun 2011 07:37:48 -0700
} Bart Schaefer wrote:
} > In the 28965 case we might be able to fix it by having findproc()
} > continue to scan the table for running jobs any time it encounters
} > one that matches but is not running, as long as it eventually does
} > return the first one it found if there are no others.
}
} Possibly I'm being dozy but this is the first thing I've heard that
} sounds like a robust fix, if it's the case that we always find an
} appropriate running process in the case that was causing the original
} problem.

The only remaining glitch could be that we find the wrong process in
the event that somehow we recycled the whole range of PID values without
ever managing to handle the signal for the state change of the first one
to exit.  I suppose one could concoct a scenario in which that's
possible, but it'd be even harder to reproduce than the original bug
from years ago.

Anyway, that change looks something like this (second hunk just for
completeness):

Index: Src/jobs.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/jobs.c,v
retrieving revision 1.83
diff -u -r1.83 jobs.c
--- Src/jobs.c	12 Jun 2011 15:06:37 -0000	1.83
+++ Src/jobs.c	15 Jun 2011 02:56:08 -0000
@@ -160,6 +160,8 @@
     Process pn;
     int i;
 
+    *jptr = NULL;
+    *pptr = NULL;
     for (i = 1; i <= maxjob; i++)
     {
 	/*
@@ -189,16 +191,16 @@
 	     * the termination of the process which pid we were supposed
 	     * to return in a different job.
 	     */
-	    if (pn->pid == pid && (pn->status == SP_RUNNING ||
-				   WIFSTOPPED(pn->status))) {
+	    if (pn->pid == pid) {
 		*pptr = pn;
 		*jptr = jobtab + i;
-		return 1;
+		if (pn->status == SP_RUNNING)
+		    return 1;
 	    }
 	}
     }
 
-    return 0;
+    return (*pptr && *jptr);
 }
 
 /* Does the given job number have any processes? */
Index: Src/signals.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/signals.c,v
retrieving revision 1.61
diff -u -r1.61 signals.c
--- Src/signals.c	14 Sep 2010 12:52:31 -0000	1.61
+++ Src/signals.c	15 Jun 2011 02:56:08 -0000
@@ -489,7 +489,6 @@
 	     * Find the process and job containing this pid and
 	     * update it.
 	     */
-	    pn = NULL;
 	    if (findproc(pid, &jn, &pn, 0)) {
 #if defined(HAVE_WAIT3) && defined(HAVE_GETRUSAGE)
 		struct timezone dummy_tz;
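
For anyone who wants to poke at the scan logic outside of zsh, here is a
minimal standalone sketch of what the jobs.c hunk does.  Everything in
it is a simplified stand-in and an assumption on my part: struct proc,
struct job and find_proc_sketch() are hypothetical names, the table is
a plain 0-based array, SP_RUNNING is just given a value of -1 for the
model, and the aux-process argument of the real findproc() is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real zsh structures. */
enum { SP_RUNNING = -1 };       /* value assumed for this model only */

struct proc {
    int pid;
    int status;                 /* SP_RUNNING, or any other value */
    struct proc *next;
};

struct job {
    struct proc *procs;         /* linked list of processes in the job */
};

/*
 * Model of the patched scan: a running match returns at once; a
 * non-running match is remembered and the scan continues.  Returns 1
 * if any match was recorded, 0 otherwise.
 */
static int
find_proc_sketch(struct job *tab, int njobs, int pid,
                 struct job **jptr, struct proc **pptr)
{
    struct proc *pn;
    int i;

    assert(jptr && pptr);
    *jptr = NULL;               /* callers need not pre-clear these */
    *pptr = NULL;
    for (i = 0; i < njobs; i++) {
        for (pn = tab[i].procs; pn; pn = pn->next) {
            if (pn->pid == pid) {
                *pptr = pn;
                *jptr = tab + i;
                if (pn->status == SP_RUNNING)
                    return 1;   /* prefer the running process */
            }
        }
    }
    /* No running match: report whatever non-running match was kept. */
    return (*pptr && *jptr);
}
```

With a table holding an exited entry for a PID in one job and a running
entry for the same (recycled) PID in a later job, the sketch hands back
the running one.  Note that in this model, when no match is running,
each match overwrites the previous one, so the caller gets the last
match scanned rather than the first; guarding the two assignments with
"if (!*pptr)" would keep the first instead.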