From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-return-32632-mason-zsh=primenet.com.au@zsh.org>
Received: (qmail 10248 invoked by alias); 30 May 2014 21:15:06 -0000
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List <zsh-workers.zsh.org>
List-Post: <mailto:zsh-workers@zsh.org>
List-Help: <mailto:zsh-workers-help@zsh.org>
X-Seq: 32632
Received: (qmail 28398 invoked from network); 30 May 2014 21:14:50 -0000
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on f.primenet.com.au
X-Spam-Level: 
X-Spam-Status: No, score=-4.2 required=5.0 tests=BAYES_00,RCVD_IN_DNSWL_MED
	autolearn=ham version=3.3.2
X-Submitted: to socket.bbn.com (Postfix) with ESMTPSA id 383EB403E8
Message-ID: <5388F4C3.6070801@bbn.com>
Date: Fri, 30 May 2014 17:14:43 -0400
From: Richard Hansen <rhansen@bbn.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.5.0
MIME-Version: 1.0
To: Bart Schaefer <schaefer@brasslantern.com>
CC: zsh-workers@zsh.org
Subject: Re: 'emulate sh -c' and $0
References: <5387BD0D.8090202@bbn.com>	<140529204533.ZM5362@torch.brasslantern.com>	<5388461D.8060203@bbn.com> <140530100050.ZM18382@torch.brasslantern.com>
In-Reply-To: <140530100050.ZM18382@torch.brasslantern.com>
X-Enigmail-Version: 1.6
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit

On 2014-05-30 13:00, Bart Schaefer wrote:
> Finally the entry for the emulate builtin says:
> 
>      With single argument set up zsh options to emulate the specified
>      shell as much as possible.  `csh' will never be fully emulated.
>      If the argument is not one of the shells listed above, zsh will be
>      used as a default; more precisely, the tests performed on the
>      argument are the same as those used to determine the emulation at
>      startup based on the shell name, see Compatibility.

Thanks for the documentation references.  I had read "to emulate the
specified shell as much as possible" without paying enough attention to
the qualifying "set up zsh options".  So you are right:  The
documentation says that emulate only toggles options, and the behavior
of $0 with FUNCTION_ARGZERO is clear, so there's no reason to expect Zsh
to reset $0 to the original $0 when in sh emulation mode.
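
For reference, the difference under discussion can be seen with a short
script.  This is only a minimal sketch, and the function name
report_argzero is made up for illustration.  Under a POSIX sh the
function sees the same $0 as the top level, while under zsh in its
native mode (FUNCTION_ARGZERO set) it sees its own name:

```shell
# report_argzero is a hypothetical name chosen for this illustration.
report_argzero() {
    # POSIX sh: $0 keeps the name of the shell or script.
    # zsh with FUNCTION_ARGZERO set: $0 is the function name instead.
    printf '%s\n' "$0"
}

top_level=$0
in_function=$(report_argzero)

if [ "$in_function" = "$top_level" ]; then
    printf '%s\n' 'argzero is unchanged inside the function'
else
    printf '%s\n' "argzero was localized to: $in_function"
fi
```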

That being said, I still think there's value in changing Zsh's behavior.

> 
>> (I am aware of the documentation for the FUNCTION_ARGZERO option.  I'm
>> more interested in what it really means to be running in sh emulation
>> mode, as that's where I think the bug is.)
> 
> In general, emulation is at its most complete if and only if the shell
> is actually started as an emulator (e.g., the path name to the shell
> binary itself is not zsh, or ARGV0 is set in the environment).  The
> "emulate" builtin only changes setopts to the closest possible.

Would it add too much complexity to the code or documentation if the
emulate builtin did more than just toggle options (specifically:
temporarily change the binding of $0 to the original value)?

Perhaps the behavior of FUNCTION_ARGZERO could be altered so that $0
expands as follows:

    If option FUNCTION_ARGZERO is enabled and $0 is expanded inside the
    body of a function, $0 expands to the name of the enclosing
    function.

    Otherwise, if option FUNCTION_ARGZERO is enabled and $0 is expanded
    inside a sourced file, $0 expands to the pathname given to the
    'source' or '.' builtin command.

    Otherwise, if the shell was invoked with an argument naming a
    script containing shell commands to be executed, $0 expands to the
    value of that argument.

    Otherwise, if the shell was invoked with the '-c' flag and at least
    one non-option non-flag argument was given, $0 expands to the value
    of the first non-option non-flag argument.

    Otherwise, $0 expands to the value of the first argument passed to
    zsh from its parent (argv[0] in C).

This modification would make it possible to toggle the setting back and
forth to examine the local or original value as desired, even within the
same function.  I wouldn't expect this change to break many scripts, but
maybe any backward incompatibility is unacceptable.
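
The lookup order proposed above can be modelled as a plain shell
function.  This is a sketch of the proposal only, not of how zsh
resolves $0 today; expand_argzero and its six positional parameters are
invented for the illustration, and the caller is assumed to pass only
the innermost function or sourced file:

```shell
# Hypothetical model of the proposed $0 lookup.  Nothing here is zsh API.
#   $1  FUNCTION_ARGZERO state, "on" or "off"
#   $2  name of the enclosing function, empty if none
#   $3  pathname given to 'source' or '.', empty if none
#   $4  script argument given to the shell, empty if none
#   $5  first non-option argument after -c, empty if none
#   $6  argv[0] as passed by the parent process
expand_argzero() {
    if [ "$1" = on ] && [ -n "$2" ]; then
        printf '%s\n' "$2"
    elif [ "$1" = on ] && [ -n "$3" ]; then
        printf '%s\n' "$3"
    elif [ -n "$4" ]; then
        printf '%s\n' "$4"
    elif [ -n "$5" ]; then
        printf '%s\n' "$5"
    else
        printf '%s\n' "$6"
    fi
}

# Same call site, option toggled: the localized and original values.
expand_argzero on  my_func '' script.zsh '' zsh   # prints my_func
expand_argzero off my_func '' script.zsh '' zsh   # prints script.zsh
```

Because the option is consulted at expansion time in this model,
flipping it inside a function is enough to see either value.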

>>> I don't find those examples particularly compelling,
>> 
>> Here's the real-world problem that motivated my bug report; perhaps it
>> is a more compelling example (or perhaps you'll think of a better way to
>> solve the problem I was addressing):
>> 
>> http://article.gmane.org/gmane.comp.version-control.git/250409
> 
> Instead of "compelling" I perhaps should have said "likely to come up
> in common usage."  You have a fairly rare special case there.

Good point.  :)

> In that example,
> 
>     ARGV0=sh exec zsh "$0" "$@"
> 
> might do what you want, but I'm not entirely following from the diff
> context what's intended.

Some more context if you're curious:  The Git distribution comes with
t/test-lib.sh, a file containing POSIX shell code implementing common
test infrastructure (print error messages, declare and run test cases,
etc.).  The test scripts are POSIX shell scripts that source this shared
file, with two exceptions:

  * t/t9903-bash-prompt.sh starts off running under /bin/sh, but it
    does the following early on:

        exec bash "$0" "$@"

    so that it can run and test Bash-specific shell code.  After
    reinvoking itself under Bash, the code sources test-lib.sh in order
    to reuse the shared test infrastructure code.  (The code in
    test-lib.sh is interpreted as Bash code, not POSIX shell code, but
    that doesn't really matter because the code is compatible with both
    shells.)

  * t/t9904-zsh-prompt.sh (new in that linked patch series) is similar
    to t9903, except it restarts itself under Zsh instead of Bash.
    Like t9903, it sources test-lib.sh, but because the code in
    test-lib.sh is incompatible with Zsh, it uses Zsh's sh emulation to
    source test-lib.sh.

The point of these two test scripts is to run Bash and Zsh in their
native modes as much as possible -- emulation is explicitly avoided
except as necessary to run the shared test infrastructure.

So 'ARGV0=sh exec zsh "$0" "$@"' doesn't work for two reasons:

  * at the time that line is executed, the script is being interpreted
    by /bin/sh and not Zsh, so the ARGV0 assignment won't have the
    desired effect

  * we want as little as possible to run in sh emulation mode so that
    we can test Zsh-specific code
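
To show where the emulation boundary sits, here is a rough outline of
the structure described above.  It is not the code from the patch
series: the function names are hypothetical, './test-lib.sh' is a
placeholder path, and the two actions are only echoed so the outline
can be run anywhere:

```shell
# All function names below are hypothetical; this is only an outline.

# Succeeds while the script is still being run by a shell other than zsh.
needs_zsh_reexec() {
    [ -z "${ZSH_VERSION-}" ]
}

# Re-invoke the script under zsh in its native mode (no ARGV0=sh).
reexec_under_zsh() {
    exec zsh "$0" "$@"
}

# Only the shared POSIX infrastructure is read in sh emulation.
# 'emulate' is a zsh builtin, so this is only meaningful after the
# re-exec.  The path is a placeholder.
source_shared_lib() {
    emulate sh -c '. ./test-lib.sh'
}

if needs_zsh_reexec; then
    echo "would re-exec under zsh here"   # reexec_under_zsh "$@"
else
    echo "already running under zsh"      # source_shared_lib
fi
```

Everything after the call to source_shared_lib would run as native Zsh
code, which is the part the test is meant to exercise.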

>  
>>> but the original
>>> value of $0 is already stashed; what would need to change is that the
>>> *local* value of $0 gets temporarily replaced by the global one.
>> 
>> That's good news; that should make it easier to write a patch that
>> temporarily replaces the local value with the global value.
> 
> Unfortunately the way the local value is implemented is usually to use
> C local variable scoping to stash and restore the contents of a C global
> pointer, so this would mean at least one additional C global.
> 
>> Would you (or anyone else in the community) be opposed to such a patch?
> 
> The use cases in both directions seem pretty unusual to me.  Losing the
> ability to "localize" $0 for scripts feels almost as likely to create
> questions as does your situation.

I'm not sure what you mean by losing the ability to localize $0.

I see a few workable options:

  * Option #1:
    1. Add a new global variable 'orig_argzero' to hold the original
       value of $0.  This variable is never modified once set.
    2. The existing global variable 'argzero' continues to serve its
       current role of holding the "localized" value of $0 (it is
       updated when executing functions or sourcing files if
       FUNCTION_ARGZERO is enabled).
    3. When 'emulate sh' starts, temporarily set argzero to
       orig_argzero.  Restore argzero when 'emulate sh' returns.

Option #1 would leave the current behavior unchanged, except that $0
would match the POSIX spec when in sh emulation mode (and only in sh
emulation mode).

  * Option #2:
    1. Add a new global variable 'orig_argzero' to hold the original
       value of $0.  This variable is never modified once set.
    2. The existing global variable 'argzero' continues to serve its
       current role of holding the "localized" value of $0 (it is
       updated when executing functions or sourcing files if
       FUNCTION_ARGZERO is enabled).
    3. Add a new option; let's call it LOCALIZE_ARGZERO for now.  If
       LOCALIZE_ARGZERO is enabled, use argzero to expand $0.  If
       LOCALIZE_ARGZERO is disabled, use orig_argzero to expand $0.
    4. Enable LOCALIZE_ARGZERO by default, but disable it in sh
       emulation mode.
    5. Stop disabling FUNCTION_ARGZERO by default in sh emulation mode.

  * Option #3:
    1. Add a new global variable 'orig_argzero' to hold the original
       value of $0.  This variable is never modified once set.
    2. Whenever a function is called or a file sourced, update the
       global variable holding the "localized" $0 ('argzero'), even if
       FUNCTION_ARGZERO is disabled.
    3. Modify the expansion rules for $0 as follows:  If
       FUNCTION_ARGZERO is enabled, use argzero to expand $0.  If
       FUNCTION_ARGZERO is disabled, use orig_argzero to expand $0.
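
Step 3 of options #2 and #3 amounts to choosing between the two globals
at expansion time rather than at function entry.  A toy model using
shell variables (these are not zsh internals; option_enabled stands in
for LOCALIZE_ARGZERO in option #2 or FUNCTION_ARGZERO in option #3, and
the values are made up):

```shell
orig_argzero=t9904-zsh-prompt.sh   # original $0, set once
argzero=my_function                # localized $0, as if inside my_function
option_enabled=on                  # hypothetical stand-in for the option

# Pick the value to expand at the moment of expansion.
expand_dollar_zero() {
    if [ "$option_enabled" = on ]; then
        printf '%s\n' "$argzero"
    else
        printf '%s\n' "$orig_argzero"
    fi
}

expand_dollar_zero        # prints my_function
option_enabled=off
expand_dollar_zero        # prints t9904-zsh-prompt.sh
option_enabled=on
```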

Pros and cons:

Option #1 is the simplest to implement, simple for users, and (mostly)
backward compatible, but it is less powerful than options #2 and #3, and
'emulate' would no longer just set options.

Option #2 is complex but powerful (scripts can read both the original $0
and the localized $0 in the same chunk of code) and (mostly) backward
compatible.  Note that option #1 can be used as a stepping stone to
option #2.

Option #3 is simple for users but not backward compatible.

I think my preference is to go with option #1 with a possible future
step to option #2 (at which time FUNCTION_ARGZERO can be deprecated in
favor of LOCALIZE_ARGZERO).
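
To make option #1 concrete, here is a small model of its three steps in
plain shell.  It only imitates the bookkeeping; in_function and
in_emulate_sh are hypothetical stand-ins for a function call and for
'emulate sh -c', and the subshells play the role of the save/restore
that the C code would have to do:

```shell
orig_argzero=t9904-zsh-prompt.sh   # step 1: never modified once set
argzero=$orig_argzero              # step 2: the localized value

# Stand-in for a function call with FUNCTION_ARGZERO enabled.
# The subshell restores argzero automatically on return.
in_function() {
    ( argzero=$1; shift; "$@" )
}

# Stand-in for 'emulate sh -c ...' under option #1 (step 3).
in_emulate_sh() {
    ( argzero=$orig_argzero; "$@" )
}

show_argzero() {
    printf '%s\n' "$argzero"
}

in_function my_function show_argzero                 # my_function
in_function my_function in_emulate_sh show_argzero   # t9904-zsh-prompt.sh
show_argzero                                         # t9904-zsh-prompt.sh
```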

> I suppose if both values were in the
> C global state, it would be possible to have the "correct" one appear
> at the instant functionargzero changes, instead of being determined by
> the setting at the time the function is entered.  OTOH that would be a
> larger behavior difference / lack of backward compatibility.

Oops, I should have thoroughly read your email before proposing the same
thing but with more words.  :)

> 
>> If not, can you point me to the relevant bits of code to help me get
>> started?
> 
> Search Src/*.c for references to "argzero", with particular attention to
> builtin.c:bin_emulate.

Thanks.  No promises that I'll have the time to submit a patch soon (or
even at all), but I plan on taking a crack at it this weekend.

-Richard