X-Seq: 12815
Date: Fri, 2 May 2008 23:44:21 +0100
From: Peter Stephenson
To: Kamil Jońca
Cc: zsh-users@sunsite.dk
Subject: Re: Zsh hangs sometimes?
Message-ID: <20080502234421.1382582d@pws-pc>
In-Reply-To: <20080502173943.GA8824@alfa.kjonca>
References: <1209745744.25440.ezmlm@sunsite.dk> <20080502173943.GA8824@alfa.kjonca>

On Fri, 2 May 2008 19:39:43 +0200
Kamil Jońca wrote:
> And sometimes this hangs :(
> there's not child processes.

OK, as you already know this isn't a very good way of doing things but
it does tickle a bug and I've managed to get it to happen.  With a bit
of luck, I think this fixes it.

On attaching to the shell I happened to notice that there were two jobs
in the job table with the same process number.  One was marked as done,
and the other wasn't: the shell was waiting for the second job to
finish.  Further probing confirmed my suspicion that the shell had
decided to fish out the PID for a job that was already done and mark it
as, well, even more well done.  So the newly done job never got cleared
and the shell waited for it for ever.

This would only happen if you had lots and lots and lots of processes
so that there was a chance that the process numbers would wrap before
an entry in the table was cleared (an essentially random process
depending on when children died)---not something that happens with most
programmes.

The fix is not to fish out jobs that are already done.  I think the use
of findproc() is restricted to occasions when you've just found out, or
want to find out, something new about a process and hence ignoring
those attached to done and dusted jobs is OK.

I've tidied up zhandler() because the comments were all over the place
and I took against the way a goto in a for loop was written, and I've
tidied up the STAT_ definitions because I can't do powers of two in my
head well enough.
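As an illustration of the PID-reuse problem and the fix described above, here is a minimal sketch in C. It is not zsh code: toy_job, toy_findproc, toy_lookup_after_wrap, TOY_MAXJOB and TOY_STAT_DONE are hypothetical names invented for the example, each toy job holds a single PID, and the table is assumed to be in the state observed in the hung shell (a done job and a live job sharing one process number).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the job table; not zsh's real types. */
#define TOY_STAT_DONE 0x0008   /* same bit value as STAT_DONE in the patch */
#define TOY_MAXJOB    4

struct toy_job {
    int stat;   /* bit flags; only TOY_STAT_DONE is used here */
    int pid;    /* one process per job in this sketch; 0 means unused slot */
};

/*
 * Return the index of the first job that owns pid, or -1 if none does.
 * When skip_done is non-zero, jobs already flagged as done are ignored,
 * which is the idea behind the new test in the first hunk of the patch.
 */
int toy_findproc(const struct toy_job *tab, int pid, int skip_done)
{
    assert(tab != NULL);
    for (int i = 0; i < TOY_MAXJOB; i++) {
        if (tab[i].pid == 0)
            continue;               /* empty slot */
        if (skip_done && (tab[i].stat & TOY_STAT_DONE))
            continue;               /* finished job not yet deleted */
        if (tab[i].pid == pid)
            return i;
    }
    return -1;
}

/*
 * Build the assumed table state after PIDs have wrapped: slot 0 is a
 * done job not yet cleared, slot 1 is a live job that reused its PID.
 */
int toy_lookup_after_wrap(int skip_done)
{
    struct toy_job tab[TOY_MAXJOB] = {
        { TOY_STAT_DONE, 4242 },    /* old job, already done */
        { 0,             4242 },    /* new job, still running */
    };
    return toy_findproc(tab, 4242, skip_done);
}
```

Without the check, toy_lookup_after_wrap(0) finds slot 0, so the exit status would be credited to the job that is already done and the live job would never be marked finished. With the check, toy_lookup_after_wrap(1) finds slot 1, the live job.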
The only actual fix is the new test in the first hunk, so you can ignore
the rest until I commit it.

I'm extremely relieved this wasn't a horrible race.

Index: Src/jobs.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/jobs.c,v
retrieving revision 1.62
diff -u -r1.62 jobs.c
--- Src/jobs.c	25 Mar 2008 18:17:08 -0000	1.62
+++ Src/jobs.c	2 May 2008 22:33:14 -0000
@@ -153,6 +153,15 @@
 
     for (i = 1; i <= maxjob; i++)
     {
+        /*
+         * We are only interested in jobs with processes still
+         * marked as live.  Careful in case there's an identical
+         * process number in a job we haven't quite got around
+         * to deleting.
+         */
+        if (jobtab[i].stat & STAT_DONE)
+            continue;
+
         for (pn = aux ? jobtab[i].auxprocs : jobtab[i].procs;
              pn; pn = pn->next)
             if (pn->pid == pid) {
Index: Src/signals.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/signals.c,v
retrieving revision 1.45
diff -u -r1.45 signals.c
--- Src/signals.c	1 May 2007 09:35:05 -0000	1.45
+++ Src/signals.c	2 May 2008 22:33:14 -0000
@@ -408,15 +408,21 @@
     signal_process(sig);
 
     sigfillset(&newmask);
-    oldmask = signal_block(newmask);    /* Block all signals temporarily */
+    /* Block all signals temporarily */
+    oldmask = signal_block(newmask);
 
 #if defined(NO_SIGNAL_BLOCKING)
-    do_jump = suspend_longjmp;    /* do we need to longjmp to signal_suspend */
-    suspend_longjmp = 0;          /* In case a SIGCHLD somehow arrives */
-
-    if (sig == SIGCHLD) {         /* Traps can cause nested signal_suspend() */
-        if (do_jump)
-            jump_to = suspend_jmp_buf;    /* Copy suspend_jmp_buf */
+    /* do we need to longjmp to signal_suspend */
+    do_jump = suspend_longjmp;
+    /* In case a SIGCHLD somehow arrives */
+    suspend_longjmp = 0;
+
+    /* Traps can cause nested signal_suspend() */
+    if (sig == SIGCHLD) {
+        if (do_jump) {
+            /* Copy suspend_jmp_buf */
+            jump_to = suspend_jmp_buf;
+        }
     }
 #endif
 
@@ -425,30 +431,36 @@
         int temp_rear = ++queue_rear % MAX_QUEUE_SIZE;
 
         DPUTS(temp_rear == queue_front, "BUG: signal queue full");
-        if (temp_rear != queue_front) { /* Make sure it's not full (extremely unlikely) */
-            queue_rear = temp_rear;                  /* ok, not full, so add to queue */
-            signal_queue[queue_rear] = sig;          /* save signal caught */
-            signal_mask_queue[queue_rear] = oldmask; /* save current signal mask */
+        /* Make sure it's not full (extremely unlikely) */
+        if (temp_rear != queue_front) {
+            /* ok, not full, so add to queue */
+            queue_rear = temp_rear;
+            /* save signal caught */
+            signal_queue[queue_rear] = sig;
+            /* save current signal mask */
+            signal_mask_queue[queue_rear] = oldmask;
         }
         signal_reset(sig);
         return;
     }
 
-    signal_setmask(oldmask);    /* Reset signal mask, signal traps ok now */
+    /* Reset signal mask, signal traps ok now */
+    signal_setmask(oldmask);
 
     switch (sig) {
     case SIGCHLD:
 
         /* keep WAITING until no more child processes to reap */
-        for (;;)
-          cont: {
-            int old_errno = errno;    /* save the errno, since WAIT may change it */
+        for (;;) {
+            /* save the errno, since WAIT may change it */
+            int old_errno = errno;
             int status;
             Job jn;
             Process pn;
-            pid_t pid;
+            pid_t pid;
             pid_t *procsubpid = &cmdoutpid;
             int *procsubval = &cmdoutval;
+            int cont = 0;
             struct execstack *es = exstack;
 
             /*
@@ -471,8 +483,8 @@
 # endif
 #endif
 
-            if (!pid)    /* no more children to reap */
-                break;
+            if (!pid)    /* no more children to reap */
+                break;
 
             /* check if child returned was from process substitution */
             for (;;) {
@@ -483,7 +495,8 @@
                     else
                         *procsubval = WEXITSTATUS(status);
                     get_usage();
-                    goto cont;
+                    cont = 1;
+                    break;
                 }
                 if (!es)
                     break;
@@ -491,16 +504,22 @@
                 procsubval = &es->cmdoutval;
                 es = es->next;
             }
+            if (cont)
+                continue;
 
             /* check for WAIT error */
-            if (pid == -1) {
-                if (errno != ECHILD)
-                    zerr("wait failed: %e", errno);
-                errno = old_errno;    /* WAIT changed errno, so restore the original */
-                break;
-            }
+            if (pid == -1) {
+                if (errno != ECHILD)
+                    zerr("wait failed: %e", errno);
+                /* WAIT changed errno, so restore the original */
+                errno = old_errno;
+                break;
+            }
 
-            /* Find the process and job containing this pid and update it. */
+            /*
+             * Find the process and job containing this pid and
+             * update it.
+             */
             if (findproc(pid, &jn, &pn, 0)) {
 #if defined(HAVE_WAIT3) && defined(HAVE_GETRUSAGE)
                 struct timezone dummy_tz;
@@ -517,11 +536,12 @@
             } else {
                 /* If not found, update the shell record of time spent by
                  * children in sub processes anyway:  otherwise, this
-                 * will get added on to the next found process that terminates.
+                 * will get added on to the next found process that
+                 * terminates.
                  */
                 get_usage();
             }
-        }
+        }
         break;
 
     case SIGHUP:
Index: Src/zsh.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/zsh.h,v
retrieving revision 1.128
diff -u -r1.128 zsh.h
--- Src/zsh.h	29 Apr 2008 17:19:26 -0000	1.128
+++ Src/zsh.h	2 May 2008 22:33:14 -0000
@@ -857,24 +857,24 @@
     struct ttyinfo *ty;    /* the modes specified by STTY */
 };
 
-#define STAT_CHANGED    (1<<0)  /* status changed and not reported      */
-#define STAT_STOPPED    (1<<1)  /* all procs stopped or exited          */
-#define STAT_TIMED      (1<<2)  /* job is being timed                   */
-#define STAT_DONE       (1<<3)  /* job is done                          */
-#define STAT_LOCKED     (1<<4)  /* shell is finished creating this job, */
-                                /*   may be deleted from job table      */
-#define STAT_NOPRINT    (1<<5)  /* job was killed internally,           */
-                                /*   we don't want to show that         */
-#define STAT_INUSE      (1<<6)  /* this job entry is in use             */
-#define STAT_SUPERJOB   (1<<7)  /* job has a subjob                     */
-#define STAT_SUBJOB     (1<<8)  /* job is a subjob                      */
-#define STAT_WASSUPER   (1<<9)  /* was a super-job, sub-job needs to be */
-                                /* deleted */
-#define STAT_CURSH      (1<<10) /* last command is in current shell     */
-#define STAT_NOSTTY     (1<<11) /* the tty settings are not inherited   */
-                                /* from this job when it exits.         */
-#define STAT_ATTACH     (1<<12) /* delay reattaching shell to tty       */
-#define STAT_SUBLEADER  (1<<13) /* is super-job, but leader is sub-shell */
+#define STAT_CHANGED    (0x0001) /* status changed and not reported      */
+#define STAT_STOPPED    (0x0002) /* all procs stopped or exited          */
+#define STAT_TIMED      (0x0004) /* job is being timed                   */
+#define STAT_DONE       (0x0008) /* job is done                          */
+#define STAT_LOCKED     (0x0010) /* shell is finished creating this job, */
+                                 /*   may be deleted from job table      */
+#define STAT_NOPRINT    (0x0020) /* job was killed internally,           */
+                                 /*   we don't want to show that         */
+#define STAT_INUSE      (0x0040) /* this job entry is in use             */
+#define STAT_SUPERJOB   (0x0080) /* job has a subjob                     */
+#define STAT_SUBJOB     (0x0100) /* job is a subjob                      */
+#define STAT_WASSUPER   (0x0200) /* was a super-job, sub-job needs to be */
+                                 /* deleted */
+#define STAT_CURSH      (0x0400) /* last command is in current shell     */
+#define STAT_NOSTTY     (0x0800) /* the tty settings are not inherited   */
+                                 /* from this job when it exits.         */
+#define STAT_ATTACH     (0x1000) /* delay reattaching shell to tty       */
+#define STAT_SUBLEADER  (0x2000) /* is super-job, but leader is sub-shell */
 
 #define SP_RUNNING -1    /* fake status for jobs currently running */
 
-- 
Peter Stephenson
Web page now at http://homepage.ntlworld.com/p.w.stephenson/