Maybe this gives you a clue. In the code below, the only difference between fooLoop1 and fooLoop2 is that the latter uses "false && true" instead of "false". That should not immediately trigger an ERR_EXIT. Instead, the failure should bubble up to the calling function and trigger an ERR_EXIT there.

function fooLoop1() {
    init;
    v=2;
    while [[ $v -ne 0 ]]; do
        echo "Loop with v=$v" >&2;
        v=$((v-1));
        false;
    done
}
function fooLoop2() {
    init;
    v=2;
    while [[ $v -ne 0 ]]; do
        echo "Loop with v=$v" >&2;
        v=$((v-1));
        false && true;
    done
}

In the latest Zsh, fooLoop1 triggers an ERR_EXIT at the line of the "false", which is expected and correct. fooLoop2 also triggers an ERR_EXIT, but only after two loop iterations and at the line of the "while".

Now let's remove the following code from exec.c:

       if (!(oldnoerrexit & NOERREXIT_UNTIL_EXEC))
           noerrexit = oldnoerrexit;

With that change, the two functions still behave exactly the same. However, if you replace the while statement with any other compound statement (if, case, for, ...), no ERR_EXIT is triggered in foo. Instead, the non-zero exit status correctly bubbles up to the caller and triggers an ERR_EXIT in bar.

I still don't understand why the change above fixes the problem for if, case, for, ... statements. I understand even less why it doesn't fix it for while statements.

Philippe


On Sun, Nov 13, 2022 at 3:24 PM Philippe Altherr <philippe.altherr@gmail.com> wrote:
Commenting out that code also fixes the problem for case statements and braces (i.e., for "{ ... }"). It works even if loop.c is reverted to its previous state with the "this_noerrexit = 1" statements, which seem more correct to me.

Apparently execwhile, like execif, needs more complicated logic for resetting noerrexit. I still don't understand what that logic does in execif, or why it's needed there.

Philippe


On Sun, Nov 13, 2022 at 2:55 PM Philippe Altherr <philippe.altherr@gmail.com> wrote:
> You shouldn't even be bothering with 5.8.1, it's been wrong all along;
> it blindly never errexits at the end of an if/then/fi.

I think that this isn't necessarily wrong. My understanding of the code so far is that the decision to trigger an ERR_EXIT is pushed down the evaluation stack. In "if cmd1; then cmd2; else cmd3; fi", only the evaluations of the (word codes representing the) commands "cmd1", "cmd2", or "cmd3" can ever trigger an ERR_EXIT. The evaluation of the (word codes representing the) if/then/else itself never triggers an ERR_EXIT. In other words, only (the word codes representing) "basic commands", like function calls or UNIX commands, can ever trigger an ERR_EXIT. The benefit of this strategy is that ERR_EXIT is triggered exactly where the fatal non-zero exit status was produced.

00 if
01   cmd1
02 then
03   cmd2
04 else
05   cmd3
06 fi

In the example above, given the strategy I described and given that the if condition never triggers an ERR_EXIT, an ERR_EXIT can only ever be triggered at line 03 or 05. If the triggering of the ERR_EXIT were sometimes delayed and delegated to the if/then/else, then ERR_EXIT could also be triggered at line 02 or 04. Worse, it could be triggered at line 01 or 06, and you would then not know whether the non-zero status came from "cmd2" or from "cmd3". Delayed or delegated triggering seems undesirable because it gives you less information about where the error came from. My understanding is also that it's never needed.
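To check that reasoning, here is a minimal sketch in C. It is a toy model, NOT zsh's exec.c. The names `basic`, `run_if` and `errexit_line` are hypothetical, and the only assumptions it encodes are the ones above: noerrexit is raised while the condition runs, and the if/then/else node never checks its own status. It covers only an if that has an else branch, and it records the line where a real shell would exit instead of exiting.

```c
#include <assert.h>

/* Toy model, NOT zsh's exec.c: only basic commands can trigger ERR_EXIT,
 * and the if/then/else node itself never checks its own status. */
static int noerrexit;          /* > 0 while the if condition is evaluated */
static int errexit_line = -1;  /* line where ERR_EXIT fired, -1 if none */

/* A basic command on `line` that returns `status`. A real shell would
 * exit when the check fires; the toy only records the line. */
static int basic(int line, int status) {
    if (status != 0 && noerrexit == 0 && errexit_line < 0)
        errexit_line = line;
    return status;
}

/* if cmd1; then cmd2; else cmd3; fi, using lines 01, 03 and 05 above. */
static int run_if(int status1, int status2, int status3) {
    int cond;
    errexit_line = -1;
    noerrexit++;                     /* the condition never triggers ERR_EXIT */
    cond = basic(1, status1);
    noerrexit--;
    assert(noerrexit == 0);          /* suppression ends with the condition */
    /* Only the chosen branch's command is checked. The if node adds no check
     * of its own, so ERR_EXIT can only be recorded at line 3 or line 5. */
    return cond == 0 ? basic(3, status2) : basic(5, status3);
}
```

For example, `run_if(1, 0, 1)` returns 1 with `errexit_line` set to 5. The failing condition on line 01 is suppressed, and the failure in the else branch is reported where it happened.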

The behavior of ERR_EXIT is controlled by the variables "noerrexit" and "local_noerrexit". My understanding of these variables is the following:

- noerrexit: This variable is set to indicate that the triggering of ERR_EXIT must be disabled in the evaluation of any word code from the point where it's set until it's reset. For example, execif sets it before evaluating the condition and resets it in four different places after evaluating the condition. I don't really understand why the resetting is so complicated. It's much more straightforward in execwhile.

- local_noerrexit: This variable is set to indicate that the triggering of ERR_EXIT must be disabled for the remainder of the evaluation of the current word code. For example, it's set at the end of each compound command. This used to be a plain "this_noerrexit = 1", which I don't think was wrong.
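As a sanity check of this reading of the two variables, here is a toy model in C. It is NOT zsh's exec.c. The node kinds, `command`, `run`, `errexit_site` and the `compound_sets_local` switch are all hypothetical. The variables only mimic the semantics described above. noerrexit is raised around the left side of `&&`. local_noerrexit is set at the end of each compound command, and also, as an extra assumption, by an `&&` list whose left side failed. Under these assumptions only basic commands ever report ERR_EXIT, and a function call counts as a basic command from its caller's point of view.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, NOT zsh's exec.c: names mirror the thread's noerrexit and
 * local_noerrexit, but the logic is a simplified assumption. */
typedef enum { SIMPLE, AND, BLOCK, CALL } Kind;

typedef struct Node {
    Kind kind;
    const char *where;          /* reported if ERR_EXIT fires at this node */
    int status;                 /* SIMPLE only: exit status it returns */
    const struct Node *a, *b;   /* AND: a && b; BLOCK: { a; b; } (b optional); CALL: body a */
} Node;

static int noerrexit;               /* > 0: ERR_EXIT disabled until restored */
static int local_noerrexit;         /* 1: skip the check for the current command only */
static int compound_sets_local = 1; /* 0 mimics a compound that forgets to set it */
static const char *fired_at;        /* where ERR_EXIT fired, NULL if nowhere */

static int run(const Node *n);

/* Every command, simple or compound, returns through this check. */
static int command(const Node *n) {
    int st;
    if (fired_at != NULL)
        return 1;                   /* the shell would already have exited */
    local_noerrexit = 0;
    st = run(n);
    if (st != 0 && fired_at == NULL && noerrexit == 0 && !local_noerrexit)
        fired_at = n->where;
    local_noerrexit = 0;            /* it only ever applies to one command */
    return st;
}

static int run(const Node *n) {
    int st;
    switch (n->kind) {
    case SIMPLE:
        return n->status;
    case AND:
        noerrexit++;                /* a failing left side never triggers */
        st = command(n->a);
        noerrexit--;
        if (st != 0) {
            local_noerrexit = 1;    /* nor does the list it made fail */
            return st;
        }
        return command(n->b);       /* the right side is checked normally */
    case BLOCK:                     /* stands in for a brace group or a loop run once */
        st = command(n->a);
        if (n->b != NULL && fired_at == NULL)
            st = command(n->b);
        if (compound_sets_local)
            local_noerrexit = 1;    /* inner commands were already checked */
        return st;
    case CALL:
        return command(n->a);       /* the call itself is checked by the caller */
    }
    return 0;
}

/* Run `top` from a clean state and report where ERR_EXIT fired, or NULL. */
static const char *errexit_site(const Node *top, int sets_local) {
    noerrexit = 0;
    fired_at = NULL;
    compound_sets_local = sets_local;
    command(top);
    assert(noerrexit == 0);         /* every increment was undone */
    return fired_at;
}
```

Under these assumptions, a function whose body is a block ending in `false && true` reports ERR_EXIT at the call site in the caller. Setting `compound_sets_local` to 0 makes it fire at the block itself. That resembles the "at the line of the while" symptom described for fooLoop2, but the model does not show that this is the actual cause.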

> I think my patches so far have uncovered a different bug that was
> already present but was masked by the foregoing, which is, that
> noerrexit is unwound in cases where it should not be.  I think this is
> happening at lines 1530-1531 of exec.c, right under the comment about
> "hairy code near the end of execif()".  That's an area I didn't touch,
> but I'm pretty sure it's restoring noerrexit to its state before
> entering the "if" (oldnoerrexit) when it should be preserving the
> state from the "&&" conditional.  In 5.8.1 this gets reversed again
> via this_noerrexit.

I must admit that I understand neither the NOERREXIT_UNTIL_EXEC logic there nor the complicated resetting of noerrexit at the end of execif. I was about to say that this doesn't seem to be the source of the problem, because if, while, and for statements all behave the same in Zsh.

function fooIf1()    { init; cond=true; if    $cond; then cond=false; false        ; fi  ; }
function fooIf2()    { init; cond=true; if    $cond; then cond=false; false && true; fi  ; }

function fooWhile1() { init; cond=true; while $cond; do   cond=false; false        ; done; }
function fooWhile2() { init; cond=true; while $cond; do   cond=false; false && true; done; }

function fooFor1()   { init; cond=true; for v in x ; do   cond=false; false        ; done; }
function fooFor2()   { init; cond=true; for v in x ; do   cond=false; false && true; done; }

In the examples above, fooIf1, fooWhile1, and fooFor1 all work correctly. fooIf2, fooWhile2, and fooFor2 fail to trigger ERR_EXIT in Zsh 5.8, and in Zsh 5.9 they trigger it too early (in foo instead of in bar).

However, if I comment out the NOERREXIT_UNTIL_EXEC logic in exec.c (or remove the negation), fooIf2 works correctly in Zsh 5.9, and surprisingly so does fooFor2. fooWhile2 doesn't!?! It still triggers too early.

So it looks like this may indeed be the start of the answer. But I'm still scratching my head over why that is.

Philippe