[BUG] Solaris-specific program flow corruption after subshell error exit

zsh-workers
 help / color / mirror / code / Atom feed

* [BUG] Solaris-specific program flow corruption after subshell error exit
@ 2017-02-26  5:36 Martijn Dekker
  2017-02-26 20:29 ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Martijn Dekker @ 2017-02-26  5:36 UTC (permalink / raw)
  To: Zsh hackers list

Modernish (cross-platform POSIX shell library with ambitions to become
modernizr/jQuery for the shell) is finally getting near a first testing
release, so I'm doing testing on all shell/OS combinations I can get my
hands on.

In the course of that testing I've come across a zsh bug that *only*
manifests on Solaris, at least version 11.3. (A free VirtualBox VM for
evaluation purposes is available from Oracle.)

If a subshell exits due to an error in a special builtin or redirection,
execution flow is corrupted in such a manner that, when end of file is
reached without an explicit 'return' or 'exit' being encountered,
execution of the file does not end but restarts at the point exactly
after the subshell was exited. The second time around, if program
specifics allow it, execution ends normally.

The bug only manifests if POSIXBUILTINS is active, and only on Solaris.
I confirmed the bug on zsh 5.0.7 (as shipped by default), zsh 5.2
(package available from Oracle), *and* today's current git version
(compiled myself, obviously). So it appears to be long-standing.

Test script:

# Bug only occurs with POSIXBUILTINS active.
setopt POSIXBUILTINS
# Execution counter.
count=0
# Exiting from a subshell due to an error triggers the bug.
(set -o nonexistent_@_option) 2>/dev/null
# With the bug, this will be executed twice so 'let' returns true.
let "(count += 1) > 1" && echo "BUG DETECTED"
# EOF. To trigger the bug, don't explicitly exit or return.

Save and run with "zsh test.zsh". On Solaris, it outputs "BUG DETECTED".
On any other OS, it outputs nothing.

Interestingly, a sourced dot script will trigger the bug just as cleanly
as a standalone script, so it is possible to test for the bug from
another program without affecting that program.

Actually, things get *really* interesting if you add "return" to the end
of the test script and source it from another script as a dot script. In
that case, the bug appears to "move up" in the calling hierarchy; that
is, if the file sourcing this test script (with the extra "return") ends
execution due to end of file (i.e. no "return" or "exit"), its execution
resumes to just after the command that sourced this file.

(This is how I initially encountered the bug: when I tried 'modernish
--test', zsh 5.0.7 on Solaris would mysteriously try to run the test
suite twice. Which was "interesting" to track down, to say the least.)

Good luck with this one. Let me know if you need me to do anything
specific to help track it down.

- M.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] Solaris-specific program flow corruption after subshell error exit
  2017-02-26  5:36 [BUG] Solaris-specific program flow corruption after subshell error exit Martijn Dekker
@ 2017-02-26 20:29 ` Bart Schaefer
  2017-02-26 22:55   ` Martijn Dekker
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Schaefer @ 2017-02-26 20:29 UTC (permalink / raw)
  To: Martijn Dekker, Zsh hackers list

On Feb 26,  6:36am, Martijn Dekker wrote:
}
} In the course of that testing I've come across a zsh bug that *only*
} manifests on Solaris, at least version 11.3. (A free VirtualBox VM for
} evaluation purposes is available from Oracle.)
} 
} If a subshell exits due to an error in a special builtin or redirection,
} execution flow is corrupted in such a manner that, when end of file is
} reached without an explicit 'return' or 'exit' being encountered,
} execution of the file does not end but restarts at the point exactly
} after the subshell was exited. The second time around, if program
} specifics allow it, execution ends normally.

Given that this is specific to Solaris and has to do with execution of
script files, I'm going to guess that it's related to file descriptor
management and buffering within STREAMS modules.

In particular I'm guessing that the POSIX-compatible behavior referenced
at the "fatal:" label in exec.c near line 4000 is leaving some kind of
shared stdin state between the parent and the subshell, because "set"
is a special builtin so will invoke the exit at that point.  Without
actually running the code, I'd expect we're going through the _exit()
branch rather than the exit() branch which under normal circumstances is
deliberately to avoid having such shared state mess up the parent file
descriptor positions, but maybe that's not sufficient.

Whether this is revealing a bug in zsh or a bug in Solaris 11.3 I can't
say.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] Solaris-specific program flow corruption after subshell error exit
  2017-02-26 20:29 ` Bart Schaefer
@ 2017-02-26 22:55   ` Martijn Dekker
  2017-02-27  7:39     ` Bart Schaefer
  0 siblings, 1 reply; 6+ messages in thread
From: Martijn Dekker @ 2017-02-26 22:55 UTC (permalink / raw)
  To: Bart Schaefer, Zsh hackers list

Op 26-02-17 om 21:29 schreef Bart Schaefer:
> In particular I'm guessing that the POSIX-compatible behavior referenced
> at the "fatal:" label in exec.c near line 4000 is leaving some kind of
> shared stdin state between the parent and the subshell, because "set"
> is a special builtin so will invoke the exit at that point.  Without
> actually running the code, I'd expect we're going through the _exit()
> branch rather than the exit() branch which under normal circumstances is
> deliberately to avoid having such shared state mess up the parent file
> descriptor positions, but maybe that's not sufficient.

I can't comment on that, but I did a little uninformed experimenting.

>From line 4002:

    if (forked)
        _exit(1);
    else
        exit(1);

if I change the second "exit(1)" to "_exit(1)", the bug disappears and
the shell *appears* to act normally, at least with all the POSIX options
active. It passes the modernish test suite and runs all the example
programs without a hitch.

So that appears to suggest that the problem is something to do with
invoking exit() and not _exit() upon exiting a subshell on error. Could
it be that the 'forked' flag contains the wrong value?

- M.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] Solaris-specific program flow corruption after subshell error exit
  2017-02-26 22:55   ` Martijn Dekker
@ 2017-02-27  7:39     ` Bart Schaefer
  2017-02-27 14:00       ` Martijn Dekker
  0 siblings, 1 reply; 6+ messages in thread
From: Bart Schaefer @ 2017-02-27  7:39 UTC (permalink / raw)
  To: Martijn Dekker, Zsh hackers list

On Feb 26, 11:55pm, Martijn Dekker wrote:
}
} So that appears to suggest that the problem is something to do with
} invoking exit() and not _exit() upon exiting a subshell on error. Could
} it be that the 'forked' flag contains the wrong value?

"forked" is local to that function and is correct in that the "set"
command has not forked from the (sub)shell in which it is running.

The problem seems to be that this test is not sufficient in the case
where we're exiting due to failure of a special builtin.  Instead we
also need to know whether the surrounding context is a subshell.

You might think the global "subsh" in exec.c would record that, but
it doesn't.

The following is a hack and there should probably be another way to
handle this, but try the patch below to see if it fixes the issue?
And then maybe somebody else can chime in with the right thing to be
testing here.

diff --git a/Src/exec.c b/Src/exec.c
index 83d1513..6af4ddb 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -3996,6 +3996,7 @@ execcmd_exec(Estate state, Execcmd_params eparams,
 	 * classify as a builtin) we treat all errors as fatal.
 	 * The "command" builtin is not special so resets this behaviour.
 	 */
+	forked |= zsh_subshell;
     fatal:
 	if (redir_err || errflag) {
 	    if (!isset(INTERACTIVE)) {

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] Solaris-specific program flow corruption after subshell error exit
  2017-02-27  7:39     ` Bart Schaefer
@ 2017-02-27 14:00       ` Martijn Dekker
  2017-07-05  1:37         ` Martijn Dekker
  0 siblings, 1 reply; 6+ messages in thread
From: Martijn Dekker @ 2017-02-27 14:00 UTC (permalink / raw)
  To: Bart Schaefer, Zsh hackers list

Op 27-02-17 om 08:39 schreef Bart Schaefer:
> The following is a hack and there should probably be another way to
> handle this, but try the patch below to see if it fixes the issue?

Indeed, that fixes it nicely.

> And then maybe somebody else can chime in with the right thing to be
> testing here.

I'll await that with interest.

- M.


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] Solaris-specific program flow corruption after subshell error exit
  2017-02-27 14:00       ` Martijn Dekker
@ 2017-07-05  1:37         ` Martijn Dekker
  0 siblings, 0 replies; 6+ messages in thread
From: Martijn Dekker @ 2017-07-05  1:37 UTC (permalink / raw)
  To: Bart Schaefer, Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 1020 bytes --]

Op 27-02-17 om 15:00 schreef Martijn Dekker:
> Op 27-02-17 om 08:39 schreef Bart Schaefer:
>> The following is a hack and there should probably be another way to
>> handle this, but try the patch below to see if it fixes the issue?
> 
> Indeed, that fixes it nicely.
> 
>> And then maybe somebody else can chime in with the right thing to be
>> testing here.
> 
> I'll await that with interest.

I was distracted by real life for a while and didn't notice you
committed that patch on the 4th of March, because the list remained
silent on it.

Meanwhile, just FYI: I just finally got around to trying Slackware Linux
14.2 and zsh (any version before 4th of March) has the same bug on that
system! On Slackware 14.1 and earlier, the bug does not happen. Your
patch fixes it on Slackware 14.2 as it does on Solaris. So it is not
Solaris-specific after all; perhaps it's an interaction with certain
versions of (g)libc.

Meanwhile I adapted my test script for this bug to the zsh test suite
(see attached patch).

- Martijn

[-- Attachment #2: test-40645.patch --]
[-- Type: text/plain, Size: 787 bytes --]

diff --git a/Test/A05execution.ztst b/Test/A05execution.ztst
index 0804691..35efbc0 100644
--- a/Test/A05execution.ztst
+++ b/Test/A05execution.ztst
@@ -310,3 +310,17 @@ F:anonymous function, and a descriptor leak when backgrounding a pipeline
 >17
 >19
 
+# Regression test for workers/40645
+  # Bug only occurs with POSIXBUILTINS active.
+  setopt POSIXBUILTINS
+  # A dot script is needed to trigger the bug.
+  printf '%s' '
+    # Execution counter.
+    count=0
+    # Exiting from a subshell due to an error triggers the bug.
+    (set -o nonexistent_@_option) 2>/dev/null
+    # With the bug, this will be executed twice so "let" returns true (0).
+    let "(count += 1) > 1"
+  ' > 40645.t
+  . ./40645.t
+1:program flow corruption with POSIXBUILTINS after subshell error exit

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-07-05  1:45 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-26  5:36 [BUG] Solaris-specific program flow corruption after subshell error exit Martijn Dekker
2017-02-26 20:29 ` Bart Schaefer
2017-02-26 22:55   ` Martijn Dekker
2017-02-27  7:39     ` Bart Schaefer
2017-02-27 14:00       ` Martijn Dekker
2017-07-05  1:37         ` Martijn Dekker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).