Maybe this gives you a clue. In the code below the only difference between
fooLoop1 and fooLoop2 is that the latter uses "false && true" instead of
"false", which should not immediately trigger an ERR_EXIT but instead
bubble up to the calling function and trigger an ERR_EXIT there.

function fooLoop1() {
>     init;
>     v=2;
>     while [[ $v -ne 0 ]]; do
>         echo "Loop with v=$v" >&2;
>         v=$((v-1));
>         false;
>     done
> }
> function fooLoop2() {
>     init;
>     v=2;
>     while [[ $v -ne 0 ]]; do
>         echo "Loop with v=$v" >&2;
>         v=$((v-1));
>         false && true;
>     done
> }


In the latest Zsh, fooLoop1 triggers an ERR_EXIT at the line of the "false"
(expected and correct). fooLoop2 also triggers an ERR_EXIT but after two
loop iterations and at the line of the while.

Now let's remove the following code from exec.c

       if (!(oldnoerrexit & NOERREXIT_UNTIL_EXEC))
>            noerrexit = oldnoerrexit;


With that change the two functions still behave exactly the same but if you
replace the while statement with any other statement (if, case, for, ...)
then no ERR_EXIT is triggered in foo. Instead the non-zero exit status
correctly bubbles up to the caller and triggers an ERR_EXIT in bar.

I still don't understand why the change above fixes the problem for if,
case, for, ... statements. I understand even less why it doesn't fix it for
while statements.

Philippe


On Sun, Nov 13, 2022 at 3:24 PM Philippe Altherr <philippe.altherr@gmail.com>
wrote:

> The commenting out also fixes the problem for case statements and braces
> (i.e., for "{ ... }"). It works even if loop.c is reverted to the previous
> state with "this_noerrexit = 1" statements, which seem more correct to me.
>
> Apparently execwhile, like execif, needs more complicated
> noerrexit resetting logic, even though I still don't understand what it's
> doing in execif and why it's needed.
>
> Philippe
>
>
> On Sun, Nov 13, 2022 at 2:55 PM Philippe Altherr <
> philippe.altherr@gmail.com> wrote:
>
>> You shouldn't even be bothering with 5.8.1, it's been wrong all along;
>>> it blindly never errexits at the end of an if/then/fi.
>>
>>
>> I think that this isn't necessarily wrong. My understanding of the code
>> so far is that the decision to trigger an ERR_EXIT is pushed down the
>> evaluation stack. In "if cmd1; then cmd2; else cmd3; fi", only the
>> evaluations of the (word codes representing the) commands "cmd1", "cmd2",
>> or "cmd3" can ever trigger an ERR_EXIT. The evaluation of the (word codes
>> representing the) if/then/else itself never triggers an ERR_EXIT. In other
>> words only (the word codes representing) "basic commands", like function
>> calls or UNIX commands, can ever trigger an ERR_EXIT. This strategy has the
>> benefit that ERR_EXIT will be triggered exactly at the point where the
>> fatal non-zero exit status was produced.
>>
>> 00 if
>> 01   cmd1
>> 02  then
>> 03   cmd2
>> 04. else
>> 05.   cmd3
>> 06 fi
>>
>> In the example above, with the strategy I described, and with the
>> knowledge that the if condition never triggers an ERR_EXIT, it's guaranteed
>> that an ERR_EXIT will only ever be thrown at line 03 or 05. If the
>> triggering of the ERR_EXIT was sometimes delayed and delegated to the
>> if/then/else, then ERR_EXIT could also be triggered at line 02 or 04, or
>> worse at line 01 or 06, which wouldn't let you know whether the non-zero
>> status originated from "cmd2" or from "cmd3". The delayed/delegated
>> triggering looks undesirable because it gives you less information on the
>> origin of the error. My understanding is that it's also never needed.
>>
>> The behavior of ERR_EXIT is controlled by the variables "noerrexit" and
>> "local_noerrexit". My understanding of these variables is the following:
>>
>> - noerrexit: This variable is set to indicate that the triggering of
>> ERR_EXIT must be disabled in the evaluation of any word code from the point
>> where it's set until it's reset. For example it's set here
>> <https://github.com/zsh-users/zsh/blob/b1533066ca7d50c88b37ce72093c12cf19807818/Src/loop.c#L550>
>> in execif before the evaluation of the condition and reset here
>> <https://github.com/zsh-users/zsh/blob/b1533066ca7d50c88b37ce72093c12cf19807818/Src/loop.c#L576>,
>> here
>> <https://github.com/zsh-users/zsh/blob/b1533066ca7d50c88b37ce72093c12cf19807818/Src/loop.c#L578>,
>> here
>> <https://github.com/zsh-users/zsh/blob/b1533066ca7d50c88b37ce72093c12cf19807818/Src/loop.c#L580>,
>> and here
>> <https://github.com/zsh-users/zsh/blob/b1533066ca7d50c88b37ce72093c12cf19807818/Src/loop.c#L585>
>> after the evaluation of the condition. I don't really understand why the
>> reseting is so complicated. It's much more straightforward in execwhile (
>> here
>> <https://github.com/zsh-users/zsh/blob/b1533066ca7d50c88b37ce72093c12cf19807818/Src/loop.c#L440>
>> ).
>>
>> - local_noerrexit: This variable is set to indicate that the triggering
>> of ERR_EXIT must be disabled in the remainder of the evaluation of the
>> current word code. For example it's set at the end of each compound
>> command, like here
>> <https://github.com/zsh-users/zsh/blob/b1533066ca7d50c88b37ce72093c12cf19807818/Src/loop.c#L481>.
>> This used to be a plain "this_noerrexit = 1", which I don't think was wrong.
>>
>> I think my patches so far have uncovered a different bug that was
>>> already present but was masked by the foregoing, which is, that
>>> noerrexit is unwound in cases where it should not be.  I think this is
>>> happening at lines 1530-1531 of exec.c, right under the comment about
>>> "hairy code near the end of execif()".  That's an area I didn't touch,
>>> but I'm pretty sure it's restoring noerrexit to its state before
>>> entering the "if" (oldnoerrexit) when it should be preserving the
>>> state from the "&&" conditional.  In 5.8.1 this gets reversed again
>>> via this_noerrexit.
>>
>>
>> I must admit that I don't understand the NOERREXIT_UNTIL_EXEC logic here
>> <https://github.com/zsh-users/zsh/blob/b1533066ca7d50c88b37ce72093c12cf19807818/Src/exec.c#L1529>,
>> nor the complicated resetting logic of noerrexit at the end of execif. I
>> was about to say that this doesn't seem to be the source of the problem
>> because if, while, and for statements all behave the same in Zsh.
>>
>> function fooIf1()    { init; cond=true; if    $cond; then cond=false;
>>> false        ; fi  ; }
>>> function fooIf2()    { init; cond=true; if    $cond; then cond=false;
>>> false && true; fi  ; }
>>
>>
>>
>> function fooWhile1() { init; cond=true; while $cond; do   cond=false;
>>> false        ; done; }
>>> function fooWhile2() { init; cond=true; while $cond; do   cond=false;
>>> false && true; done; }
>>
>>
>>
>> function fooFor1()   { init; cond=true; for v in x ; do   cond=false;
>>> false        ; done; }
>>> function fooFor2()   { init; cond=true; for v in x ; do   cond=false;
>>> false && true; done; }
>>
>>
>> In the examples above fooIf1, fooWhile1, and fooFor1 all work correctly
>> but fooIf2, fooWhile2, and fooFor2 fail to trigger ERR_EXIT in Zsh 5.8 and
>> trigger it too early (in foo instead of in bar) in Zsh 5.9.
>>
>> However, if I comment out the NOERREXIT_UNTIL_EXEC logic in exec.c (or
>> remove the negation), then fooIf2 and surprisingly also fooFor2 work
>> correctly in Zsh 5.9 but not fooWhile2!?! fooWhile2 still triggers too
>> early.
>>
>> So it looks like this may indeed be the start of the answer. But I'm
>> still scratching my head on why that is.
>>
>> Philippe
>>
>>