zsh-workers
* Zsh killed when autoloaded function calls mislinked program
@ 2004-12-20 20:20 Travis Spencer
  2004-12-21 11:30 ` Peter Stephenson
  0 siblings, 1 reply; 3+ messages in thread
From: Travis Spencer @ 2004-12-20 20:20 UTC
  To: zsh-workers

I've found that invoking an autoloaded function that calls a program
that isn't linked correctly kills zsh.

Here is the function that does this:


#!/usr/bin/zsh

local grep

if [[ -x /stash/travissu/bin/grep ]] &&
    /stash/travissu/bin/grep test /dev/null 2>/dev/null ; then

    grep=(/stash/travissu/bin/grep)

else

    grep=(grep)
fi

command $grep $*


Here is what happens when I run /stash/travissu/bin/grep from the
shell:


zsh> /stash/travissu/bin/grep
ld.so.1: /stash/travissu/bin/grep: fatal: libgcc_s.so.1: open failed:
No such file or directory
zsh: killed     /stash/travissu/bin/grep
zsh>


Now, if I chmod the script so it has execute permission, I get this:


zsh> echo $path | ./fn test
zsh> 


However, if I autoload it like this:


zsh> autoload -U fn
zsh> fn
Usage: grep [OPTION]... PATTERN [FILE]...
Try `grep --help' for more information.
zsh>


When I execute the autoloaded function at the end of a pipeline, zsh
itself is killed:


zsh> echo $path | fn
Killed
tcsh>


Where tcsh is the shell I started zsh from.  

I am using zsh 4.2.1 on Solaris 9.

-- 

Regards,

Travis Spencer



* Re: Zsh killed when autoloaded function calls mislinked program
  2004-12-20 20:20 Zsh killed when autoloaded function calls mislinked program Travis Spencer
@ 2004-12-21 11:30 ` Peter Stephenson
  2005-01-03 15:57   ` Bart Schaefer
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Stephenson @ 2004-12-21 11:30 UTC
  To: zsh-workers

Travis Spencer wrote:
> I've found that invoking an autoloaded function that calls a program
> that isn't linked correctly kills zsh.

I get this too, in my case on Solaris 2.6, where I have lots of
Solaris 8 binaries lying around that conveniently fail to load.  I've
simplified it to this:

% fn() { if ~/solaris8/bin/touch /dev/null 2>/dev/null; then true; fi }
% echo | fn
zsh: killed     TEST_MODULES=1 ./zsh

The "if" and the function are both crucial.

You can get the same effect on Linux (and therefore presumably more
generally) with the following code:

% fn() { if sh -c 'kill -9 $$'; then true; fi }
% echo | fn
zsh: killed     zsh

so this is quite bad.

Good news... I think I've found out what's doing it.

Bad news... it's in Sven's hacks for being clever with jobs when stuff
is running in the last part of a pipeline and I've only a vague idea
what's going on.

The culprit appears to be this chunk in execpline, around line 1236 of
exec.c:

	    if (list_pipe && (lastval & 0200) && pj >= 0 &&
		(!(jn->stat & STAT_INUSE) || (jn->stat & STAT_DONE))) {
		deletejob(jn);
		jn = jobtab + pj;
		killjb(jn, lastval & ~0200);
	    }

pj is the old value of "thisjob" at the start of execpline(). jn refers
to the job created with the new process.  list_pipe is the extra special
Sven flag indicating we are doing something extra special with the
current process.

In that call to killjb, we send the signal which killed the failed
process (touch in my case, grep in Travis's) to the process group
including that process (the PID of the group leader).  This is
presumably some hack to pass the signal to a group when the shell
assumes it should get it.  I don't know why it assumes that here.
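
To make the bit arithmetic concrete: the quoted chunk treats a lastval
with the 0200 bit set as "killed by a signal" and recovers the signal
number with lastval & ~0200.  Here is a minimal C sketch of that
encoding, assuming the usual shell convention of 0200 | signal number.
The function names are made up for illustration and are not taken from
zsh's source:

```c
#define _POSIX_C_SOURCE 200809L  /* expose kill() under strict -std= modes */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child that sends `sig` to itself and return a shell-style status:
 * 0200 | signal number if the child died from a signal, otherwise its
 * exit code.  Returns -1 if fork() or waitpid() fails. */
int shell_status_of_killed_child(int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        kill(getpid(), sig);   /* for SIGKILL this never returns */
        _exit(0);              /* reached only if sig did not terminate us */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return 0200 | WTERMSIG(status);
    return WEXITSTATUS(status);
}

/* Recover the signal number from such a status, or 0 if the 0200 bit is
 * clear (the process exited normally). */
int signal_from_status(int lastval)
{
    return (lastval & 0200) ? (lastval & ~0200) : 0;
}
```

With SIGKILL (signal 9) the first function yields 0211 octal, i.e. 137,
and the second recovers 9 from it.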

In this case the group leader is PID 0.  This is presumably the current
process group (the killpg documentation for Solaris isn't explicit but
this is normal) including the shell.  The signal is 9 (SIGKILL).  From
this point on it's all easy to understand.
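
The effect of a zero group ID can be checked in isolation.  POSIX
specifies that kill(0, sig) signals every process in the caller's own
process group, and Linux documents the same behaviour for
killpg(0, sig).  The following C sketch (an illustration with
hypothetical names, not code from zsh) forks a child into a fresh
process group, so that nothing else gets hit, and has it send SIGUSR1
to group 0 to see whether its own handler fires:

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() and kill() under strict -std= modes */
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;

static void on_usr1(int sig)
{
    (void)sig;
    got_signal = 1;
}

/* Returns 1 if a process that signals group 0 receives the signal itself,
 * 0 if it does not, and -1 on any setup error. */
int group_zero_hits_caller(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction sa;

        /* Child: become leader of a new process group so the signal
         * below cannot reach the parent or anything else. */
        if (setpgid(0, 0) != 0)
            _exit(2);
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_usr1;
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGUSR1, &sa, NULL) != 0)
            _exit(2);
        if (kill(0, SIGUSR1) != 0)   /* group 0 means the caller's own group */
            _exit(2);
        _exit(got_signal ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

With SIGKILL in place of SIGUSR1 there is no handler to run, and a
caller that has not moved into its own group takes its whole group down
with it, which matches the behaviour described above.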

The patch below seems to fix the immediate problem, but I don't even
know if it's in the right target area.  Do we ever want to kill a process
group where the group leader is marked as 0?  Or is this working because
it's not killing things that should be killed?  Or is that entire chunk I
quoted misguided?  What has the old "thisjob", to which jn is being set,
got to do with the preceding jn at this point anyway, such that it needs
killing?

Help.

Index: Src/exec.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/exec.c,v
retrieving revision 1.79
diff -u -r1.79 exec.c
--- Src/exec.c	7 Dec 2004 16:55:03 -0000	1.79
+++ Src/exec.c	21 Dec 2004 11:03:29 -0000
@@ -1233,7 +1233,8 @@
 		(!(jn->stat & STAT_INUSE) || (jn->stat & STAT_DONE))) {
 		deletejob(jn);
 		jn = jobtab + pj;
-		killjb(jn, lastval & ~0200);
+		if (jn->gleader)
+		    killjb(jn, lastval & ~0200);
 	    }
 	    if (list_pipe_child ||
 		((jn->stat & STAT_DONE) &&

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070



* Re: Zsh killed when autoloaded function calls mislinked program
  2004-12-21 11:30 ` Peter Stephenson
@ 2005-01-03 15:57   ` Bart Schaefer
  0 siblings, 0 replies; 3+ messages in thread
From: Bart Schaefer @ 2005-01-03 15:57 UTC
  To: zsh-workers

[I'm back from holiday travel.]

On Tue, 21 Dec 2004, Peter Stephenson wrote:

> -		killjb(jn, lastval & ~0200);
> +		if (jn->gleader)
> +		    killjb(jn, lastval & ~0200);


This looks right to me, but I haven't groveled through that code for a 
while.  The only case where it might be wrong is in a subshell if the 
subshell is to die with the same signal as the job, but I *think* that's
handled elsewhere by the value passed to exit() ... and AFAICT from some
attempts at testing, it is.

