zsh-workers
 help / color / mirror / code / Atom feed
* Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
@ 2013-12-20 19:24 Axel Beckert
  2013-12-20 20:27 ` Bart Schaefer
  0 siblings, 1 reply; 17+ messages in thread
From: Axel Beckert @ 2013-12-20 19:24 UTC (permalink / raw)
  To: zsh-workers

Hi,

this has been reported[1] against zsh 5.0.3 in Debian today. Looks like
some regression in 5.0.3 on a first glance. Maybe related to the
test-suite freezes which have been reported on a few architectures.

[1] http://bugs.debian.org/732726

Since 5.0.4 seems close, I thought I'd better forward it soon despite
I haven't looked at it closer yet.

----- Forwarded message from Vincent Lefevre <vincent@vinc17.net> -----
Date: Fri, 20 Dec 2013 19:14:36 +0100
From: Vincent Lefevre <vincent@vinc17.net>
Subject: [Pkg-zsh-devel] Bug#732726: zsh function freeze
Reply-To: Vincent Lefevre <vincent@vinc17.net>, 732726@bugs.debian.org

Package: zsh
Version: 5.0.3-1
Severity: important

This is an important regression (no problems after downgrading to
5.0.2-6), always reproducible.

With a private SVN repository, "svncdiff -c65935 ~/wd | head" freezes, where
svncdiff is the following zsh function (I have "autoload -U svncdiff"):

------------------------------------------------------------------------
#!/usr/bin/env zsh

# Wrapper to "svn diff", written by Vincent Lefevre <vincent@vinc17.org>

# Needs my tdiff utility to process the diff output; otherwise you need
# to remove the "| tdiff" at the end.

# Example of svncdiff usage:
#   svncdiff -5 -x -p file
# for 5 lines of unified context and function information.

emulate -LR zsh
local -a args xopt
setopt EXTENDED_GLOB

while [[ $# -ge 1 ]]
do
  if [[ "x$1" == x-[0-9]# ]] then
    args=($args --diff-cmd diff)
    xopt=($xopt -U${1[2,-1]})
  elif [[ $# -ge 2 && "x$1" == x-x ]] then
    shift
    xopt=($xopt $1)
  else
    args=($args $1)
  fi
  shift
done

[[ $#xopt -ge 1 ]] && args=(-x "$xopt" $args)
svnwrapper diff "$args[@]" | tdiff

# $Id: svncdiff 38442 2010-08-05 11:41:16Z vinc17/ypig $
------------------------------------------------------------------------

The dependencies can be found on <https://www.vinc17.net/unix/>.

When svncdiff is called as a script, this is no such problem.

With my tps utility, I can observe:

10987  sshd: vlefevre@pts/2
└─> 10988  -zsh
  └─> 11262  -zsh
    ├─> 11264  zsh -f -- /home/vlefevre/bin/svnwrapper diff -c65935 /home/vlefevre/wd
    │ └─> 11269  svn diff -c65935 /home/vlefevre/wd
    │   ├─> 11273  zsh /home/vlefevre/scripts/ssh mysvn svnserve -t
    │   │ ├─> 11295  cat
    │   │ └─> 11298  ssh -F /home/vlefevre/.ssh/config -C mysvn svnserve -t
    │   └─> 11299  zsh /home/vlefevre/scripts/ssh mysvn svnserve -t
    │     ├─> 11317  cat
    │     └─> 11320  ssh -F /home/vlefevre/.ssh/config -C mysvn svnserve -t
    └─> 11265  perl /home/vlefevre/bin/tdiff

I think that this occurs on big changesets.

$ svncdiff -c65935 ~/wd | wc
   2166   14746  212680

I'll try to investigate, but any idea about which zsh change could
trigger the problem?

-- System Information:
Debian Release: jessie/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.11-2-amd64 (SMP w/8 CPU cores)
Locale: LANG=POSIX, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages zsh depends on:
ii  libc6       2.17-97
ii  libcap2     1:2.22-1.2
ii  libtinfo5   5.9+20130608-1
ii  zsh-common  5.0.3-1

Versions of packages zsh recommends:
ii  libncursesw5  5.9+20130608-1
ii  libpcre3      1:8.31-2

Versions of packages zsh suggests:
ii  zsh-doc  5.0.3-1

-- no debconf information
----- End forwarded message -----

----- Forwarded message from Vincent Lefevre <vincent@vinc17.net> -----
Date: Fri, 20 Dec 2013 19:19:18 +0100
From: Vincent Lefevre <vincent@vinc17.net>
To: 732726@bugs.debian.org
Subject: [Pkg-zsh-devel] Bug#732726: zsh function freeze
Reply-To: Vincent Lefevre <vincent@vinc17.net>, 732726@bugs.debian.org

On 2013-12-20 19:14:36 +0100, Vincent Lefevre wrote:
> I think that this occurs on big changesets.

Indeed I can reproduce it with:

  svncdiff -c 8540 svn://scm.gforge.inria.fr/svn/mpfr/trunk | head
----- End forwarded message -----

		Kind regards, Axel
-- 
/~\  Plain Text Ribbon Campaign                   | Axel Beckert
\ /  Say No to HTML in E-Mail and News            | abe@deuxchevaux.org  (Mail)
 X   See http://www.asciiribbon.org/              | abe@noone.org (Mail+Jabber)
/ \  I love long mails: http://email.is-not-s.ms/ | http://noone.org/abe/ (Web)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-20 19:24 Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze Axel Beckert
@ 2013-12-20 20:27 ` Bart Schaefer
  2013-12-20 20:49   ` Axel Beckert
  2013-12-20 23:51   ` Vincent Lefevre
  0 siblings, 2 replies; 17+ messages in thread
From: Bart Schaefer @ 2013-12-20 20:27 UTC (permalink / raw)
  To: Axel Beckert, zsh-workers

On Dec 20,  8:24pm, Axel Beckert wrote:
}
} Since 5.0.4 seems close, I thought I'd better forward it soon despite
} I haven't looked at it closer yet.

You're just a tiny bit too late -- PWS announced zsh-5.0.4 about 25
minutes before your email was sent. :-(

} this has been reported[1] against zsh 5.0.3 in Debian today. Looks like
} some regression in 5.0.3 on a first glance. Maybe related to the
} test-suite freezes which have been reported on a few architectures.

I think that's unlikely unless svnwrapper is using zpty.  I'm not a user
of SVN at this point so I don't have a reference.

Best at this point to see if you can still reproduce it with the current
zsh source from git.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-20 20:27 ` Bart Schaefer
@ 2013-12-20 20:49   ` Axel Beckert
  2013-12-20 23:51   ` Vincent Lefevre
  1 sibling, 0 replies; 17+ messages in thread
From: Axel Beckert @ 2013-12-20 20:49 UTC (permalink / raw)
  To: zsh-workers

Hi,

On Fri, Dec 20, 2013 at 12:27:00PM -0800, Bart Schaefer wrote:
> On Dec 20,  8:24pm, Axel Beckert wrote:
> } Since 5.0.4 seems close, I thought I'd better forward it soon despite
> } I haven't looked at it closer yet.
> 
> You're just a tiny bit too late -- PWS announced zsh-5.0.4 about 25
> minutes before your email was sent. :-(

Hrm, I just saw the git tags so far, but neither me nor Frank Terbeck
have received such an announcement via e-mail yet. Well, maybe it's
hanging in Greylisting or such.

Thanks for the information anyway. :-)

		Kind regards, Axel
-- 
/~\  Plain Text Ribbon Campaign                   | Axel Beckert
\ /  Say No to HTML in E-Mail and News            | abe@deuxchevaux.org  (Mail)
 X   See http://www.asciiribbon.org/              | abe@noone.org (Mail+Jabber)
/ \  I love long mails: http://email.is-not-s.ms/ | http://noone.org/abe/ (Web)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-20 20:27 ` Bart Schaefer
  2013-12-20 20:49   ` Axel Beckert
@ 2013-12-20 23:51   ` Vincent Lefevre
  2013-12-20 23:59     ` Vincent Lefevre
  1 sibling, 1 reply; 17+ messages in thread
From: Vincent Lefevre @ 2013-12-20 23:51 UTC (permalink / raw)
  To: zsh-workers; +Cc: 732726

On 2013-12-20 12:27:00 -0800, Bart Schaefer wrote:
> I think that's unlikely unless svnwrapper is using zpty.  I'm not a user
> of SVN at this point so I don't have a reference.

That's the pipe itself that is completely broken.
The problem can be reproduced with "zsh -f" (version 5.0.3) and:

ypig% foo() { ls -R / }
ypig% foo | head

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-20 23:51   ` Vincent Lefevre
@ 2013-12-20 23:59     ` Vincent Lefevre
  2013-12-21  0:12       ` Vincent Lefevre
  0 siblings, 1 reply; 17+ messages in thread
From: Vincent Lefevre @ 2013-12-20 23:59 UTC (permalink / raw)
  To: zsh-workers

On 2013-12-21 00:51:49 +0100, Vincent Lefevre wrote:
> The problem can be reproduced with "zsh -f" (version 5.0.3) and:
> 
> ypig% foo() { ls -R / }
> ypig% foo | head

And also with zsh 5.0.4, which has just been released.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-20 23:59     ` Vincent Lefevre
@ 2013-12-21  0:12       ` Vincent Lefevre
  2013-12-21  2:19         ` Bart Schaefer
  0 siblings, 1 reply; 17+ messages in thread
From: Vincent Lefevre @ 2013-12-21  0:12 UTC (permalink / raw)
  To: zsh-workers

A better example:

foo() { printf "%d\n" {1..20000} }

"foo > /dev/null" terminates immediately.
"foo" terminates after a few seconds (output takes a bit time via ssh).
"foo | head" freezes.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / AriC project (LIP, ENS-Lyon)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-21  0:12       ` Vincent Lefevre
@ 2013-12-21  2:19         ` Bart Schaefer
  2013-12-21  3:42           ` Bart Schaefer
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Schaefer @ 2013-12-21  2:19 UTC (permalink / raw)
  To: Vincent Lefevre, zsh-workers

On Dec 21,  1:12am, Vincent Lefevre wrote:
} Subject: Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function
}
} A better example:
} 
} foo() { printf "%d\n" {1..20000} }

Actually the "ls -R /" example was better because it ruled out problems
with SIGPIPE handling and the "print" builtin.

This has to be result of either

+       * 31549: Src/exec,c, Src/zsh.h: replace ad-hoc subsh_close file
+       descriptor for pipes with new addfilelist() job-based mechanism.

(which is the tail of a whole thread starting with 31528) or else it's

+       * 31919: Src/exec.c, Src/init.c: fix deadlock when a shell builtin
+       with a multio redirection is used on the left side of a pipeline,
+       by making sure stdin/out/err file descriptors are closed for the
+       multio copy process, which means not re-using those descriptors
+       after they are closed and marked FDT_UNUSED in fdtable[].  For
+       completeness, initialize their fdtable[] state to FDT_EXTERNAL.

because the problem is that the parent shell still holds open the file
descriptor that is the standard output of the shell function, so the
write is blocked rather than getting SIGPIPE or other failure.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-21  2:19         ` Bart Schaefer
@ 2013-12-21  3:42           ` Bart Schaefer
  2013-12-21  7:16             ` Bart Schaefer
  2013-12-21 18:08             ` Peter Stephenson
  0 siblings, 2 replies; 17+ messages in thread
From: Bart Schaefer @ 2013-12-21  3:42 UTC (permalink / raw)
  To: Vincent Lefevre, zsh-workers

On Dec 20,  6:19pm, Bart Schaefer wrote:
}
} +       * 31549: Src/exec,c, Src/zsh.h: replace ad-hoc subsh_close file
} +       descriptor for pipes with new addfilelist() job-based mechanism.

Yep, that's the one.

I compiled commit 709dbbbda82efde2020d9d67a19687c101b91570 and was
able to reproduce the problem.  The immediately preceding commit
39ab9952e8255cb99e9c0abcc8bbec43158a55d7 does not show the bug.

Non-interactive shells also do not show the bug, which may be why
"make check" didn't flag anything.

The problem seems to be with this hunk specifically:

@@ -1744,8 +1742,6 @@ execpline2(Estate state, wordcode pcode,
 	execpline2(state, *state->pc++, how, pipes[0], output, last1);
 	list_pipe = old_list_pipe;
 	cmdpop();
-	zclose(pipes[0]);
-	subsh_close = -1;
     }
 }
 

The problem is that pipes[0] isn't always added to the list of files
for the job; sometimes it really does need to be closed there.


diff --git a/Src/exec.c b/Src/exec.c
index dccdc2b..4480033 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1691,6 +1691,7 @@ execpline2(Estate state, wordcode pcode,
 	execcmd(state, input, output, how, last1 ? 1 : 2);
     else {
 	int old_list_pipe = list_pipe;
+	int subsh_close = -1;
 	Wordcode next = state->pc + (*state->pc), pc;
 	wordcode code;
 
@@ -1738,6 +1739,7 @@ execpline2(Estate state, wordcode pcode,
 	} else {
 	    /* otherwise just do the pipeline normally. */
 	    addfilelist(NULL, pipes[0]);
+	    subsh_close = pipes[0];
 	    execcmd(state, input, pipes[1], how, 0);
 	}
 	zclose(pipes[1]);
@@ -1750,6 +1752,8 @@ execpline2(Estate state, wordcode pcode,
 	execpline2(state, *state->pc++, how, pipes[0], output, last1);
 	list_pipe = old_list_pipe;
 	cmdpop();
+	if (subsh_close != pipes[0])
+	    zclose(pipes[0]);
     }
 }
 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-21  3:42           ` Bart Schaefer
@ 2013-12-21  7:16             ` Bart Schaefer
  2013-12-21 18:08             ` Peter Stephenson
  1 sibling, 0 replies; 17+ messages in thread
From: Bart Schaefer @ 2013-12-21  7:16 UTC (permalink / raw)
  To: zsh-workers

On Dec 20,  7:42pm, Bart Schaefer wrote:
}
} The problem is that pipes[0] isn't always added to the list of files
} for the job; sometimes it really does need to be closed there.

Here's a regression test.  I assume it's OK to do this coproc trick,
since if it were not tests A01 and A04 would already have failed.

I verified that this test *fails* with 5.0.3 and 5.0.4 (sigh).


diff --git a/Test/A05execution.ztst b/Test/A05execution.ztst
index c8320a1..61d24fe 100644
--- a/Test/A05execution.ztst
+++ b/Test/A05execution.ztst
@@ -202,3 +202,15 @@ F:the bug is still there or it reappeared. See workers-29973 for details.
 0:Check $pipestatus with a known difficult case
 >1 0 1 0 0
 F:This similar test was triggering a reproducible failure with pipestatus.
+
+  { unsetopt MONITOR } 2>/dev/null
+  coproc { read -Et 5 || kill -INT $$ }
+  print -u $ZTST_fd 'This test takes 5 seconds to fail...'
+  { printf "%d\n" {1..20000} } | ( read -E )
+  print -p done
+  read -Ep
+0:Bug regression: piping a shell construct to an external process may hang
+>1
+>done
+F:This test checks for a file descriptor leak that could cause the left
+F:side of a pipe to block on write after the right side has exited


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-21  3:42           ` Bart Schaefer
  2013-12-21  7:16             ` Bart Schaefer
@ 2013-12-21 18:08             ` Peter Stephenson
  2013-12-21 20:57               ` Bart Schaefer
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Stephenson @ 2013-12-21 18:08 UTC (permalink / raw)
  To: zsh-workers

On Fri, 20 Dec 2013 19:42:23 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> The problem is that pipes[0] isn't always added to the list of files
> for the job; sometimes it really does need to be closed there.

Thanks for the quick work.

Are we happy this fixes the particular problem (the regression test looks
plausible, don't see why we shouldn't use coproc)?  If so I'll make a
new version immediately.  This evening is my last opportunity for a
while so if this is a definite improvement I'd like to get on with it.

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-21 18:08             ` Peter Stephenson
@ 2013-12-21 20:57               ` Bart Schaefer
  2013-12-21 22:34                 ` Peter Stephenson
  0 siblings, 1 reply; 17+ messages in thread
From: Bart Schaefer @ 2013-12-21 20:57 UTC (permalink / raw)
  To: zsh-workers

On Dec 21,  6:08pm, Peter Stephenson wrote:
}
} Thanks for the quick work.

The timing was convenient and I was afraid it might have been my fault.

} Are we happy this fixes the particular problem (the regression test looks
} plausible, don't see why we shouldn't use coproc)?  If so I'll make a
} new version immediately.

As far as I can tell this fixes the problem.  It'd be nice to hear from
Vincent for confirmation but as that may not happen in time I'd say you
should go ahead.

-- 
Barton E. Schaefer


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-21 20:57               ` Bart Schaefer
@ 2013-12-21 22:34                 ` Peter Stephenson
  2013-12-22  1:34                   ` Bart Schaefer
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Stephenson @ 2013-12-21 22:34 UTC (permalink / raw)
  To: zsh-workers

On Sat, 21 Dec 2013 12:57:18 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> As far as I can tell this fixes the problem.  It'd be nice to hear from
> Vincent for confirmation but as that may not happen in time I'd say you
> should go ahead.

Unfortunately it (the patch in 32171) doesn't fix the problem for me.
With

foo() { ls -R / }
foo | head

I'm still getting a hang.

I think we're forking inside execcmd() after adding pipes[0] to the
filelist for thisjob.  This subshell is what's going to form the LHS of
the pipeline --- and we entersubsh() which will clear the job table.
So I think we need to salvage the filelist from the job table and remove
the pipe file descriptors in the danger cases, which I take to be the
places where we were handling subsh_close in the old version of the code
(where we are handling nested shell constructs of some sort).

The following does seem to fix the hang here and not cause any new test
failures.  Note it includes your code change, but not your regression
test (you hadn't pushed either yet last I looked).

The change to zwaitjob() is just pulling out the common code shared with
the two new cases.

I'm not going to commit this and make a new release just on the basis of
my guesses, however.

diff --git a/Src/exec.c b/Src/exec.c
index dccdc2b..f16cfd3 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -1691,6 +1691,7 @@ execpline2(Estate state, wordcode pcode,
 	execcmd(state, input, output, how, last1 ? 1 : 2);
     else {
 	int old_list_pipe = list_pipe;
+	int subsh_close = -1;
 	Wordcode next = state->pc + (*state->pc), pc;
 	wordcode code;
 
@@ -1738,6 +1739,7 @@ execpline2(Estate state, wordcode pcode,
 	} else {
 	    /* otherwise just do the pipeline normally. */
 	    addfilelist(NULL, pipes[0]);
+	    subsh_close = pipes[0];
 	    execcmd(state, input, pipes[1], how, 0);
 	}
 	zclose(pipes[1]);
@@ -1750,6 +1752,8 @@ execpline2(Estate state, wordcode pcode,
 	execpline2(state, *state->pc++, how, pipes[0], output, last1);
 	list_pipe = old_list_pipe;
 	cmdpop();
+	if (subsh_close != pipes[0])
+	    zclose(pipes[0]);
     }
 }
 
@@ -2385,7 +2389,7 @@ static void
 execcmd(Estate state, int input, int output, int how, int last1)
 {
     HashNode hn = NULL;
-    LinkList args;
+    LinkList args, filelist = NULL;
     LinkNode node;
     Redir fn;
     struct multio *mfds[10];
@@ -2907,6 +2911,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 	    flags |= ESUB_KEEPTRAP;
 	if (type == WC_SUBSH && !(how & Z_ASYNC))
 	    flags |= ESUB_JOB_CONTROL;
+	filelist = jobtab[thisjob].filelist;
 	entersubsh(flags);
 	close(synch[1]);
 	forked = 1;
@@ -3260,6 +3265,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 
 	    if (is_shfunc) {
 		/* It's a shell function */
+		pipecleanfilelist(filelist);
 		execshfunc((Shfunc) hn, args);
 	    } else {
 		/* It's a builtin */
@@ -3338,6 +3344,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 		DPUTS(varspc,
 		      "BUG: assignment before complex command");
 		list_pipe = 0;
+		pipecleanfilelist(filelist);
 		/* If we're forked (and we should be), no need to return */
 		DPUTS(last1 != 1 && !forked, "BUG: not exiting?");
 		DPUTS(type != WC_SUBSH, "Not sure what we're doing.");
diff --git a/Src/jobs.c b/Src/jobs.c
index 371b8eb..a321172 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -1173,6 +1173,30 @@ addfilelist(const char *name, int fd)
     zaddlinknode(ll, jf);
 }
 
+/* Clean up pipes no longer needed associated with a job */
+
+/**/
+void
+pipecleanfilelist(LinkList filelist)
+{
+    LinkNode node;
+
+    if (!filelist)
+	return;
+    node = firstnode(filelist);
+    while (node) {
+	Jobfile jf = (Jobfile)getdata(node);
+	if (jf->is_fd) {
+	    LinkNode next = nextnode(node);
+	    zclose(jf->u.fd);
+	    (void)remnode(filelist, node);
+	    zfree(jf, sizeof(*jf));
+	    node = next;
+	} else
+	    incnode(node);
+    }
+}
+
 /* Finished with list of files for a job */
 
 /**/
@@ -1415,19 +1439,7 @@ zwaitjob(int job, int wait_cmd)
 	     * we can't deadlock on the fact that those still exist, so
 	     * that's not a problem.
 	     */
-	    LinkNode node = firstnode(jn->filelist);
-	    while (node) {
-		Jobfile jf = (Jobfile)getdata(node);
-		if (jf->is_fd) {
-		    LinkNode next = nextnode(node);
-		    (void)remnode(jn->filelist, node);
-		    zclose(jf->u.fd);
-		    zfree(jf, sizeof(*jf));
-		    node = next;
-		} else {
-		    incnode(node);
-		}
-	    }
+	    pipecleanfilelist(jn->filelist);
 	}
 	while (!errflag && jn->stat &&
 	       !(jn->stat & STAT_DONE) &&

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-21 22:34                 ` Peter Stephenson
@ 2013-12-22  1:34                   ` Bart Schaefer
  2013-12-23  2:06                     ` (potential regression in 5.0.3) Paul Ackersviller
  2014-01-02 18:06                     ` Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze Peter Stephenson
  0 siblings, 2 replies; 17+ messages in thread
From: Bart Schaefer @ 2013-12-22  1:34 UTC (permalink / raw)
  To: zsh-workers

On Dec 21, 10:34pm, Peter Stephenson wrote:
}
} Unfortunately it (the patch in 32171) doesn't fix the problem for me.

Drat.

} I think we're forking inside execcmd() after adding pipes[0] to the
} filelist for thisjob.  This subshell is what's going to form the LHS of
} the pipeline --- and we entersubsh() which will clear the job table.
} So I think we need to salvage the filelist from the job table and remove
} the pipe file descriptors in the danger cases, which I take to be the
} places where we were handling subsh_close in the old version of the code
} (where we are handling nested shell constructs of some sort).

I wondered if that would be necessary, but couldn't ever manage to get
DPUTS() statements in those two places to print anything, so concluded
that the issue was in the place that I did patch.

What concerns me is whether we might be closing too many file descriptors
if we remove all is_fd entries from filelist at that point, but if that's
in the parent that's going to do nothing but wait for the child it should
be OK.
 
} The following does seem to fix the hang here and not cause any new test
} failures.  Note it includes your code change, but not your regression
} test (you hadn't pushed either yet last I looked).

Argh!  Did "git commit" but forgot "git push".  Did that now.

Here's your new bit of 32175 plus an additional regression test (fails
with just 32171 but succeeds after 32175).


diff --git a/Src/exec.c b/Src/exec.c
index 4480033..f16cfd3 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -2389,7 +2389,7 @@ static void
 execcmd(Estate state, int input, int output, int how, int last1)
 {
     HashNode hn = NULL;
-    LinkList args;
+    LinkList args, filelist = NULL;
     LinkNode node;
     Redir fn;
     struct multio *mfds[10];
@@ -2911,6 +2911,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 	    flags |= ESUB_KEEPTRAP;
 	if (type == WC_SUBSH && !(how & Z_ASYNC))
 	    flags |= ESUB_JOB_CONTROL;
+	filelist = jobtab[thisjob].filelist;
 	entersubsh(flags);
 	close(synch[1]);
 	forked = 1;
@@ -3264,6 +3265,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 
 	    if (is_shfunc) {
 		/* It's a shell function */
+		pipecleanfilelist(filelist);
 		execshfunc((Shfunc) hn, args);
 	    } else {
 		/* It's a builtin */
@@ -3342,6 +3344,7 @@ execcmd(Estate state, int input, int output, int how, int last1)
 		DPUTS(varspc,
 		      "BUG: assignment before complex command");
 		list_pipe = 0;
+		pipecleanfilelist(filelist);
 		/* If we're forked (and we should be), no need to return */
 		DPUTS(last1 != 1 && !forked, "BUG: not exiting?");
 		DPUTS(type != WC_SUBSH, "Not sure what we're doing.");
diff --git a/Src/jobs.c b/Src/jobs.c
index 371b8eb..a321172 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -1173,6 +1173,30 @@ addfilelist(const char *name, int fd)
     zaddlinknode(ll, jf);
 }
 
+/* Clean up pipes no longer needed associated with a job */
+
+/**/
+void
+pipecleanfilelist(LinkList filelist)
+{
+    LinkNode node;
+
+    if (!filelist)
+	return;
+    node = firstnode(filelist);
+    while (node) {
+	Jobfile jf = (Jobfile)getdata(node);
+	if (jf->is_fd) {
+	    LinkNode next = nextnode(node);
+	    zclose(jf->u.fd);
+	    (void)remnode(filelist, node);
+	    zfree(jf, sizeof(*jf));
+	    node = next;
+	} else
+	    incnode(node);
+    }
+}
+
 /* Finished with list of files for a job */
 
 /**/
@@ -1415,19 +1439,7 @@ zwaitjob(int job, int wait_cmd)
 	     * we can't deadlock on the fact that those still exist, so
 	     * that's not a problem.
 	     */
-	    LinkNode node = firstnode(jn->filelist);
-	    while (node) {
-		Jobfile jf = (Jobfile)getdata(node);
-		if (jf->is_fd) {
-		    LinkNode next = nextnode(node);
-		    (void)remnode(jn->filelist, node);
-		    zclose(jf->u.fd);
-		    zfree(jf, sizeof(*jf));
-		    node = next;
-		} else {
-		    incnode(node);
-		}
-	    }
+	    pipecleanfilelist(jn->filelist);
 	}
 	while (!errflag && jn->stat &&
 	       !(jn->stat & STAT_DONE) &&
diff --git a/Test/A05execution.ztst b/Test/A05execution.ztst
index 61d24fe..6abfd8b 100644
--- a/Test/A05execution.ztst
+++ b/Test/A05execution.ztst
@@ -207,10 +207,12 @@ F:This similar test was triggering a reproducible failure with pipestatus.
   coproc { read -Et 5 || kill -INT $$ }
   print -u $ZTST_fd 'This test takes 5 seconds to fail...'
   { printf "%d\n" {1..20000} } | ( read -E )
+  hang(){ printf "%d\n" {2..20000} | cat }; hang | ( read -E )
   print -p done
   read -Ep
 0:Bug regression: piping a shell construct to an external process may hang
 >1
+>2
 >done
 F:This test checks for a file descriptor leak that could cause the left
 F:side of a pipe to block on write after the right side has exited


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: (potential regression in 5.0.3)
  2013-12-22  1:34                   ` Bart Schaefer
@ 2013-12-23  2:06                     ` Paul Ackersviller
  2013-12-23  5:47                       ` Bart Schaefer
  2014-01-02 18:06                     ` Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze Peter Stephenson
  1 sibling, 1 reply; 17+ messages in thread
From: Paul Ackersviller @ 2013-12-23  2:06 UTC (permalink / raw)
  To: zsh-workers

I've ran across something else that first looked like a regression,
but upon further inspection Test/A05execution.ztst was doing this
for a couple of months.  Strangely, it goes into an infinite loop,
but only when run in the background.  It does this for me on both
Linux and FreeBSD, starting from the commit
c3114a7735c85b79771e08bd156470bde1a36950, but on 5.0.2 too.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: (potential regression in 5.0.3)
  2013-12-23  2:06                     ` (potential regression in 5.0.3) Paul Ackersviller
@ 2013-12-23  5:47                       ` Bart Schaefer
  0 siblings, 0 replies; 17+ messages in thread
From: Bart Schaefer @ 2013-12-23  5:47 UTC (permalink / raw)
  To: zsh-workers

On Dec 23,  2:06am, Paul Ackersviller wrote:
}
} I've ran across something else that first looked like a regression,
} but upon further inspection Test/A05execution.ztst was doing this
} for a couple of months.  Strangely, it goes into an infinite loop,
} but only when run in the background.  It does this for me on both
} Linux and FreeBSD, starting from the commit
} c3114a7735c85b79771e08bd156470bde1a36950, but on 5.0.2 too.

It's a known issue that changing the MONITOR option from off to on
can cause the acquire_pgrp() routine to go into an infinite loop,
depending on the shape of the process tree from the foreground.
The jobs.c diff in c3114a7 partly fixed that, but the new A05execution
test needs setopt MONITOR to create the conditions it is regressing,
so it is still possible for that to trigger a loop.

Here's an attempt to resolve the remaining infinite-loop condition:

diff --git a/Src/jobs.c b/Src/jobs.c
index a321172..8719465 100644
--- a/Src/jobs.c
+++ b/Src/jobs.c
@@ -2619,6 +2619,7 @@ acquire_pgrp(void)
     sigset_t blockset, oldset;
 
     if ((mypgrp = GETPGRP()) > 0) {
+	long lastpgrp = mypgrp;
 	sigemptyset(&blockset);
 	sigaddset(&blockset, SIGTTIN);
 	sigaddset(&blockset, SIGTTOU);
@@ -2639,6 +2640,9 @@ acquire_pgrp(void)
 	    if (read(0, NULL, 0) != 0) {} /* Might generate SIGT* */
 	    signal_block(blockset);
 	    mypgrp = GETPGRP();
+	    if (mypgrp == lastpgrp && !interact)
+		break; /* Unlikely that pgrp will ever change */
+	    lastpgrp = mypgrp;
 	}
 	if (mypgrp != mypid) {
 	    if (setpgrp(0, 0) == 0) {


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2013-12-22  1:34                   ` Bart Schaefer
  2013-12-23  2:06                     ` (potential regression in 5.0.3) Paul Ackersviller
@ 2014-01-02 18:06                     ` Peter Stephenson
  2014-01-02 20:40                       ` Bart Schaefer
  1 sibling, 1 reply; 17+ messages in thread
From: Peter Stephenson @ 2014-01-02 18:06 UTC (permalink / raw)
  To: zsh-workers

I'm now back and looking through the mail, most of which I've already
seen as I managed to get it onto my phone while the network was
networking (it was not just that it was 2G, I think the backhaul was a
woodpecker with a dodgy beak and a morse key it kept missing).

It looks like it was either a good or a bad time for me to be away,
depending whether you're me or Bart.  As far as I can see most of the
stuff has been sorted out, by Bart with assistance.  If there's nothing
to indicate otherwise over the next couple of days I'll make a 5.0.5
with the fixes in at the weekend.

First...

On Sat, 21 Dec 2013 17:34:59 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> } I think we're forking inside execcmd() after adding pipes[0] to the
> } filelist for thisjob.  This subshell is what's going to form the LHS of
> } the pipeline --- and we entersubsh() which will clear the job table.
> } So I think we need to salvage the filelist from the job table and remove
> } the pipe file descriptors in the danger cases, which I take to be the
> } places where we were handling subsh_close in the old version of the code
> } (where we are handling nested shell constructs of some sort).
> 
> I wondered if that would be necessary, but couldn't ever manage to get
> DPUTS() statements in those two places to print anything, so concluded
> that the issue was in the place that I did patch.
> 
> What concerns me is whether we might be closing too many file descriptors
> if we remove all is_fd entries from filelist at that point, but if that's
> in the parent that's going to do nothing but wait for the child it should
> be OK.

Yes, I don't think there should be too much problem with what we *are*
doing, and the regression test seems fine.

I'm a little bit worried about what we're *not* doing.  The closure is
tied to thisjob.  That's probably better than the previous version where
it was simply signalled by a static, neither associated with the current
job nor with the call hierarchy, but I wonder if we should be closing
even more in the subshell (only); should we, in fact, be closing all
f.d.s associated with pipes now that we're no longer in a shell that has
any interest in them?  Or doesn't this actually cause a problem in
practice?  (We clearly shouldn't be removing files, the other purpose of
the filelist, since that's needed at exactly one point after the final
use --- it might be cleaner having a separate list to emphasise the
different behaviour of files and f.d.s but it's probably not worth the
extra pointer now.)

Anyway, that's for future study (FFS, as a colleague slightly
ambiguously puts it).

pws


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze
  2014-01-02 18:06                     ` Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze Peter Stephenson
@ 2014-01-02 20:40                       ` Bart Schaefer
  0 siblings, 0 replies; 17+ messages in thread
From: Bart Schaefer @ 2014-01-02 20:40 UTC (permalink / raw)
  To: zsh-workers

On Jan 2,  6:06pm, Peter Stephenson wrote:
} Subject: Re: Fwd (potential regression in 5.0.3): Bug#732726: zsh function
}
} It looks like it was either a good or a bad time for me to be away,
} depending whether you're me or Bart.

I was at home with a few days off, so it really wasn't a problem.

} If there's nothing to indicate otherwise over the next couple of days
} I'll make a 5.0.5 with the fixes in at the weekend.

There was one thing I left for your attention: Carl's patch in 32196 for
copy-prev-shell-word.


} > } So I think we need to salvage the filelist from the job table and remove
} > } the pipe file descriptors in the danger cases, which I take to be the
} > } places where we were handling subsh_close in the old version of the code
} > } (where we are handling nested shell constructs of some sort).
[...]
} > What concerns me is whether we might be closing too many file descriptors
} > if we remove all is_fd entries from filelist at that point
} 
} Yes, I don't think there should be too much problem with what we *are*
} doing, and the regression test seems fine.

Is it only pipes in that filelist?  I was concerned that there might be
descriptors for real files there, that shouldn't be closed.  But again I
think it's OK because in that context all the shell is going to do is
wait.
 
} I wonder if we should be closing even more in the subshell (only);
} should we, in fact, be closing all f.d.s associated with pipes now
} that we're no longer in a shell that has any interest in them?

Probably yes, which might in fact fix some of the nits with coprocs
(though zsh's coproc doesn't seem to have any nits of that particular
flavor that e.g. ksh's does not also have).  To do that I think we'd
need to start keeping track of what kind of fd is in the filelist,
and whether it's read or write, not just that it's a descriptor
associated with a particular job.

} Or doesn't this actually cause a problem in practice?

As you pointed out, we're doing better now than we were, and even the
old way it hasn't been especially noticable for many years so far.


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2014-01-02 20:40 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-20 19:24 Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze Axel Beckert
2013-12-20 20:27 ` Bart Schaefer
2013-12-20 20:49   ` Axel Beckert
2013-12-20 23:51   ` Vincent Lefevre
2013-12-20 23:59     ` Vincent Lefevre
2013-12-21  0:12       ` Vincent Lefevre
2013-12-21  2:19         ` Bart Schaefer
2013-12-21  3:42           ` Bart Schaefer
2013-12-21  7:16             ` Bart Schaefer
2013-12-21 18:08             ` Peter Stephenson
2013-12-21 20:57               ` Bart Schaefer
2013-12-21 22:34                 ` Peter Stephenson
2013-12-22  1:34                   ` Bart Schaefer
2013-12-23  2:06                     ` (potential regression in 5.0.3) Paul Ackersviller
2013-12-23  5:47                       ` Bart Schaefer
2014-01-02 18:06                     ` Fwd (potential regression in 5.0.3): Bug#732726: zsh function freeze Peter Stephenson
2014-01-02 20:40                       ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).