From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-return-29481-mason-zsh=primenet.com.au@zsh.org>
Received: (qmail 15441 invoked by alias); 15 Jun 2011 03:00:20 -0000
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List <zsh-workers.zsh.org>
List-Post: <mailto:zsh-workers@zsh.org>
List-Help: <mailto:zsh-workers-help@zsh.org>
X-Seq: 29481
Received: (qmail 7096 invoked from network); 15 Jun 2011 03:00:16 -0000
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on f.primenet.com.au
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_NONE
	autolearn=ham version=3.3.1
Received-SPF: none (ns1.primenet.com.au: domain at closedmail.com does not designate permitted sender hosts)
From: Bart Schaefer <schaefer@brasslantern.com>
Message-id: <110614195955.ZM10555@torch.brasslantern.com>
Date: Tue, 14 Jun 2011 19:59:53 -0700
In-reply-to: <20110614195458.67af06e2@pws-pc.ntlworld.com>
Comments: In reply to Peter Stephenson <p.w.stephenson@ntlworld.com>
 "Re: killing suspended jobs makes zsh hang after 47d1215" (Jun 14,  7:54pm)
References: <86aadnwtl2.fsf@gmail.com>
	<110612072211.ZM26399@torch.brasslantern.com>
	<110612075958.ZM27334@torch.brasslantern.com>	<8662oaha3g.fsf@gmail.com>
	<110612185339.ZM28551@torch.brasslantern.com>
	<20110613120747.2f018471@pwslap01u.europe.root.pri>
	<110613073748.ZM2701@torch.brasslantern.com>
	<20110614195458.67af06e2@pws-pc.ntlworld.com>
X-Mailer: OpenZMail Classic (0.9.2 24April2005)
To: <zsh-workers@zsh.org>
Subject: Re: killing suspended jobs makes zsh hang after 47d1215
MIME-version: 1.0
Content-type: text/plain; charset=us-ascii

On Jun 14,  7:54pm, Peter Stephenson wrote:
} Subject: Re: killing suspended jobs makes zsh hang after 47d1215
}
} On Mon, 13 Jun 2011 07:37:48 -0700
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} > In the 28965 case we might be able to fix it by having findproc()
} > continue to scan the table for running jobs any time it encounters
} > one that matches but is not running, as long as it eventually does
} > return the first one it found if there are no others.
} 
} Possibly I'm being dozy but this is the first thing I've heard that
} sounds like a robust fix, if it's the case that we always find an
} appropriate running process in the case that was causing the original
} problem.

The only remaining glitch is that we could still find the wrong
process if we somehow recycled the whole range of PID values without
ever managing to handle the signal for the state change of the first
one to exit.  I suppose one could concoct a scenario in which that's
possible, but it'd be even harder to reproduce than the original bug
from years ago.

Anyway, that change looks something like this (second hunk just for
completeness):

Index: Src/jobs.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/jobs.c,v
retrieving revision 1.83
diff -u -r1.83 jobs.c
--- Src/jobs.c	12 Jun 2011 15:06:37 -0000	1.83
+++ Src/jobs.c	15 Jun 2011 02:56:08 -0000
@@ -160,6 +160,8 @@
     Process pn;
     int i;
 
+    *jptr = NULL;
+    *pptr = NULL;
     for (i = 1; i <= maxjob; i++)
     {
 	/*
@@ -189,16 +191,16 @@
 	     * the termination of the process which pid we were supposed
 	     * to return in a different job.
 	     */
-	    if (pn->pid == pid && (pn->status == SP_RUNNING ||
-				   WIFSTOPPED(pn->status))) {
+	    if (pn->pid == pid) {
 		*pptr = pn;
 		*jptr = jobtab + i;
-		return 1;
+		if (pn->status == SP_RUNNING) 
+		    return 1;
 	    }
 	}
     }
 
-    return 0;
+    return (*pptr && *jptr);
 }
 
 /* Does the given job number have any processes? */
Index: Src/signals.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/signals.c,v
retrieving revision 1.61
diff -u -r1.61 signals.c
--- Src/signals.c	14 Sep 2010 12:52:31 -0000	1.61
+++ Src/signals.c	15 Jun 2011 02:56:08 -0000
@@ -489,7 +489,6 @@
 	 * Find the process and job containing this pid and
 	 * update it.
 	 */
-	pn = NULL;
 	if (findproc(pid, &jn, &pn, 0)) {
 #if defined(HAVE_WAIT3) && defined(HAVE_GETRUSAGE)
 	    struct timezone dummy_tz;
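
For anyone who wants to poke at the lookup logic outside of zsh, here
is a minimal standalone sketch of the scan in the first hunk.  It is
not the real code: demo_proc, demo_job, DEMO_RUNNING and demo_findproc
are hypothetical stand-ins for the job table types, SP_RUNNING and
findproc(), and the auxiliary-process handling is left out entirely.
Note that, mirroring the hunk as posted, the fallback when no running
match exists is the most recent matching entry, because each match
overwrites the previous one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the job table types. */
enum { DEMO_RUNNING = -1 };     /* plays the role of SP_RUNNING */

struct demo_proc {
    int pid;
    int status;                 /* DEMO_RUNNING, or some exit/stop status */
    struct demo_proc *next;
};

struct demo_job {
    struct demo_proc *procs;    /* linked list of processes in this job */
};

/*
 * Scan jobs[0..njobs-1] for pid.  A running match is returned at once.
 * A match that is not running is remembered while the scan continues,
 * so a stale entry with a recycled PID cannot hide a live process.
 * Returns 1 if any match was found, 0 otherwise.
 */
int
demo_findproc(int pid, struct demo_job *jobs, size_t njobs,
              struct demo_job **jptr, struct demo_proc **pptr)
{
    size_t i;
    struct demo_proc *pn;

    assert(jptr != NULL && pptr != NULL);

    /* Clear the outputs so the final test below is meaningful. */
    *jptr = NULL;
    *pptr = NULL;
    for (i = 0; i < njobs; i++) {
        for (pn = jobs[i].procs; pn != NULL; pn = pn->next) {
            if (pn->pid == pid) {
                *pptr = pn;
                *jptr = &jobs[i];
                if (pn->status == DEMO_RUNNING)
                    return 1;   /* best possible answer, stop here */
            }
        }
    }

    /* No running match: report the remembered entry, if there was one. */
    return (*pptr != NULL && *jptr != NULL);
}
```

With two jobs that both contain PID 42, one already exited and one
still running, this should hand back the running entry regardless of
which job comes first in the table, and should fall back to an exited
entry only when nothing with that PID is still running.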

--