From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-return-10016-mason-zsh=primenet.com.au@sunsite.auc.dk>
Received: (qmail 21217 invoked from network); 9 Mar 2000 12:26:19 -0000
Received: from sunsite.auc.dk (130.225.51.30)
  by ns1.primenet.com.au with SMTP; 9 Mar 2000 12:26:19 -0000
Received: (qmail 13559 invoked by alias); 9 Mar 2000 12:25:42 -0000
Mailing-List: contact zsh-workers-help@sunsite.auc.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 10016
Received: (qmail 13546 invoked from network); 9 Mar 2000 12:25:41 -0000
Date: Thu, 9 Mar 2000 13:01:32 +0100 (MET)
Message-Id: <200003091201.NAA25784@beta.informatik.hu-berlin.de>
From: Sven Wischnowsky <wischnow@informatik.hu-berlin.de>
To: zsh-workers@sunsite.auc.dk
In-reply-to: "Bart Schaefer"'s message of Wed, 8 Mar 2000 07:30:38 +0000
Subject: Re: Bogus "no such job" (Re: Preliminary release of 3.0.8 - please test)


Bart Schaefer wrote:

> On Mar 3,  2:55am, Geoff Wing wrote:
> } Subject: Re: Preliminary release of 3.0.8 - please test
> }
> } Bart Schaefer <schaefer@candle.brasslantern.com> typed:
> } :On Feb 29, 10:33am, Geoff Wing wrote:
> } :} Subject: Re: Preliminary release of 3.0.8 - please test
> } :} After some initial usage, got it into a state of:
> } :} % %
> } :} fg: no such job: 3
> } :} % %%
> } :} fg: no such job: 3
> } :} % fg
> } :} fg: no current job
> } :} % jobs
> } :} %
> } :Hrm.  The job handling code is now identical to 3.1.6-dev-19, so if you
> } :can get 3.0.8 into that state there's a problem for 3.1.6 as well.
> } 
> } I'm thinking that getjob() may need a setcurjob() before it checks curjob.
> 
> Since Sven has been incommunicado for a couple of days, I tried to look
> into this myself in more detail.  The only two places where getjob() is
> called are from bin_kill(), and from bin_fg() *after* the setcurjob()
> that you noted.
> 
> I can believe that a race condition might cause "no such job: 3" once,
> but twice in a row is impossible.  So the only possible answer is that
> the one and only job has STAT_NOPRINT set but *not* STAT_SUBJOB, which
> in turn happens only at exec.c:768 and 806 (in 3.0.8; in 3.1.6-dev-19,
> exec.c:993 and 1031), both in execpline().  See jobs.c:setprevjob(),
> which is called from setcurjob().

The one at line 1031 isn't interesting here: it only keeps the
sub-shells created for stopped lists from reporting their jobs
(list_pipe_child is non-zero only in those sub-shells). That leaves us
with the one at line 993. It is used to make sure that jobs started
for commands which are not the first one in a pipeline, and jobs
started from some kind of pipeline nesting (e.g. in a loop in a
pipeline), are not shown.

Given that, your suggestion:

> Now, it may be that the right solution is to have setprevjob() ignore
> jobs that have STAT_NOPRINT set, but I wouldn't want that to mask some
> more serious job-state problem.  If you have any insights, share 'em.

seems sensible. But... how can such a job survive when the super-job
of the (main) pipeline is dead? I wish I could find a way to
reproduce it.
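
To make the suggestion concrete, here is a minimal, self-contained
sketch of a "pick the previous job" scan that skips hidden jobs. It is
not the code from jobs.c: the struct, the function name pick_prev_job(),
the STAT_INUSE check and the flag values are all assumptions made for
illustration. Only the flag names STAT_NOPRINT and STAT_SUBJOB come
from the discussion above.

```c
#include <stddef.h>

/* Illustrative flag values only; the real definitions in zsh differ. */
#define STAT_INUSE   (1 << 0)   /* slot holds a live job (assumed flag) */
#define STAT_NOPRINT (1 << 1)   /* job must not be shown to the user    */
#define STAT_SUBJOB  (1 << 2)   /* job belongs to a super-job           */

/* Hypothetical, stripped-down job table entry. */
struct job_sketch {
    int stat;
};

/*
 * Scan the table from the highest slot down and return the index of
 * the first job that is in use, is not `cur`, and is visible to the
 * user. Slot 0 is treated as unused, so job numbers start at 1.
 * Returns -1 when no such job exists.
 */
static int pick_prev_job(const struct job_sketch *tab, int n, int cur)
{
    for (int i = n - 1; i > 0; i--) {
        if (i == cur)
            continue;
        if (!(tab[i].stat & STAT_INUSE))
            continue;
        /* The proposed change: hidden jobs are never candidates. */
        if (tab[i].stat & (STAT_NOPRINT | STAT_SUBJOB))
            continue;
        return i;
    }
    return -1;
}
```

With a check like this, a leftover job 3 that has STAT_NOPRINT set
could no longer be chosen, so "fg" would report that there is no
current job instead of "no such job: 3". It would not explain why the
hidden job outlived its pipeline in the first place.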

Bye
 Sven


--
Sven Wischnowsky                         wischnow@informatik.hu-berlin.de