From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-return-43923-ml=inbox.vuxu.org@zsh.org>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on inbox.vuxu.org
X-Spam-Level: 
X-Spam-Status: No, score=-1.0 required=5.0 tests=MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.2
Received: from primenet.com.au (ns1.primenet.com.au [203.24.36.2])
	by inbox.vuxu.org (OpenSMTPD) with ESMTP id 17b58630
	for <ml@inbox.vuxu.org>;
	Fri, 21 Dec 2018 11:32:18 +0000 (UTC)
Received: (qmail 25072 invoked by alias); 21 Dec 2018 11:32:01 -0000
Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List <zsh-workers.zsh.org>
List-Post: <mailto:zsh-workers@zsh.org>
List-Help: <mailto:zsh-workers-help@zsh.org>
List-Unsubscribe: <mailto:zsh-workers-unsubscribe@zsh.org>
X-Seq: 43923
Received: (qmail 29766 invoked by uid 1010); 21 Dec 2018 11:32:01 -0000
X-Qmail-Scanner-Diagnostics: from munnari.oz.au by f.primenet.com.au (envelope-from <kre@munnari.OZ.AU>, uid 7791) with qmail-scanner-2.11 
 (clamdscan: 0.100.2/25112. spamassassin: 3.4.2.  
 Clear:RC:0(202.29.151.3):SA:0(-1.9/5.0):. 
 Processed in 1.55904 secs); 21 Dec 2018 11:32:01 -0000
X-Envelope-From: kre@munnari.OZ.AU
X-Qmail-Scanner-Mime-Attachments: |
X-Qmail-Scanner-Zip-Files: |
From: Robert Elz <kre@munnari.OZ.AU>
To: Bart Schaefer <schaefer@brasslantern.com>
cc: Martijn Dekker <martijn@inlv.org>,
        "zsh-workers@zsh.org" <zsh-workers@zsh.org>
Subject: Re: The big kre zsh bug report
In-Reply-To: <CAH+w=7bthv3E3Lr3UC9FvroXFcS1W+fDRuS6CLPxy_eyX0szqw@mail.gmail.com>
References: <CAH+w=7bthv3E3Lr3UC9FvroXFcS1W+fDRuS6CLPxy_eyX0szqw@mail.gmail.com> <d7b0451f90bdfe61f48cc1361690180e07158900.camel@ntlworld.com> <b8851c3a50bd8bceba1961f2f764e1a6869481ac.camel@ntlworld.com> <18f684a8-2fec-4ebe-a63e-cf6688ae519f@inlv.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Fri, 21 Dec 2018 18:30:52 +0700
Message-ID: <16681.1545391852@jinx.noi.kre.to>

    Date:        Thu, 20 Dec 2018 23:53:52 -0800
    From:        Bart Schaefer <schaefer@brasslantern.com>
    Message-ID:  <CAH+w=7bthv3E3Lr3UC9FvroXFcS1W+fDRuS6CLPxy_eyX0szqw@mail.gmail.com>


  | > Looks logical to me: in that the ${X+$@} parameter substitution
  | > substitutes $@, within quotes, leaving "$@", which is definitely removed
  | > completely if there are no positional parameters.

  | I'd prefer that this continue to act like bash and ksh than to follow
  | the abstract spec.

This is a case where it can actually affect some real scripts, even
though it is rare.   That's why I changed what the NetBSD sh does.
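
A minimal sketch for checking this case in whatever shell is at hand.
The helper name count_fields is made up for illustration.  No expected
output is given for the first call, since that is exactly the point on
which implementations can differ; only the plain "$@" call is certain
to report 0.

```shell
# count_fields prints how many arguments it received.
count_fields() { echo "$#"; }

set --            # no positional parameters
X=set             # X is set, so the + alternative is used

count_fields "${X+$@}"   # the construct under discussion
count_fields "$@"        # plain "$@" with no parameters yields no fields
```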

  | It seems to me that changing this for $=t in zsh native mode might
  | break a lot of things, so I'll leave it open for discussion as to
  | whether it's feasible to change it only for emulation mode.  However,
  | it does differ from the most recent version of bash I have handy.

bash -c 't=" x";     IFS=" x"; set $t; IFS=":"; r="$*"; IFS=; echo $# $r'
1

That isn't even -o posix mode, nor is it the most recent bash available.
(ksh93 does the same, so does mksh/dash/...).

>From the version in the prompt string where you illustrated bash
behaviour later, you seem to be testing against a very old bash
(though apparently bash 3 still exists in the wild).   bash has had
a lot of bugs fixed in the interim (and more keep getting fixed).

  | For the lazy or very busy reader:
  |
  | -h  Locate and remember utilities invoked by functions as those
  | functions are defined (the utilities are normally located when the
  | function is executed).

Yes, I know what it means.   I cannot think of a use for it, nor have
I ever seen any script that would ever use it, nor does anything I
know of implement it (though I have not tried ksh88).  ksh93 might
implement some variant, but not what is specified:

ksh93 -c 'X=$PATH; PATH=/no/such/place; set -h; g() { grep "$@";}; PATH=$X;  g fff /tmp/bbb'
grep: /tmp/bbb: No such file or directory

If the path lookups were done (only) during fn definition, then
that should fail to find grep rather than run it.   If -h is simply some
kind of optimisation, it is worthless (its purpose had to be as some
kind of way of guaranteeing what PATH would be used for
library functions, so they could be standalone, and work in any
environment, which standard sh "dynamic everything" does not
make easy).

[Aside: it is entirely possible -h will be removed from POSIX sometime,
I think a request for that has been lodged already, but that doesn't
mean that other shells will simply delete their current "support" for it,
ie: allowing it to be set/reset and otherwise ignoring it.]


  | > > tc-se:case_matching[147]: Test of 'var='\z'; case ${var} in (${var}) printf M;; (*) printf X;; esac' failed.
  | > >
  | > > The word to match is two chars, backslash and z, the pattern is
  | > > a quoted 'z' (the backslash becomes a quoting character).
  |
  | I don't think this is going to get fixed.  I went looking for this
  | test case but didn't find it:
  |
  | var='"z"'; case ${var} in (${var}) printf M;; (*) printf X;; esac
  |
  | This also prints M.  If backslash-z should become a quoted z,
  | shouldn't the above case also become a quoted z?

No.   It all comes from the use of RE's to define how glob works
(and the ancient implementation) - '\' quotes magic in an RE, other
forms of shell quoting do not.   So, for the string given to the matcher,
a \ is the one and only quote char.   Where it gets really messy is
that, to remain consistent with itself, chars that the shell has already
quoted are never interpreted as magic, even when they would be magic
in an RE once the sh quoting is no longer there.   That is how, in a
literal pattern, it is possible to use sh quoting to quote the '-' in a
[] match, whereas in an RE only putting the - first or last removes its
"range of chars" meaning.

All this flows from the way the original Bourne shell implemented
quoting ( ch |= 0x80 ) and that quote removal doesn't happen until
after glob or parameter expansion (for the ${var%pattern} etc stuff),
and never happens in case matching (not needed, as all that ever
wants is a match/no-match - no-one cares what matched).  Some of
it is hideous, but we are stuck with it...
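
A minimal sketch of the quoted '-' case described above, assuming a
POSIX-style sh.  The function name in_set is made up for illustration.
Because the '-' is quoted by the shell, the bracket expression should
behave as the three-member set {a, -, z} rather than the range a-z.

```shell
# in_set CH: report whether CH matches the bracket expression below.
in_set() {
    case $1 in
        [a"-"z]) echo match ;;
        *)       echo no-match ;;
    esac
}

in_set a    # 'a' is a member either way
in_set -    # the quoted '-' is a literal member of the set
in_set m    # inside a-z as a range, but not in {a, -, z}
```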

In native zsh mode if you want to change that to be more like csh,
which did complain about invalid patterns, that's fine, but for emulating
sh it really isn't.

  | So that means in
  | case statements all variable references have to be treated as if they
  | were ${(Q)var} (to use zsh-speke)?  What if the quotes aren't
  | balanced?

I don't speak zsh, so I can only guess at what that means, but no, I
don't think that is correct.   Quotes are just chars (the same as they
are any other time a variable is expanded), so they don't need to be
balanced.
\ is only special because it is defined to be that way for matching (and
because something needs to be for uses of glob matching in other
utilities, like find, which don't do anything like sh quoting, but need to
be able to distinguish a literal '*' from "match anything" somehow, and \
is the way it is done there too.)
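
A minimal sketch of the find case, assuming a find that follows the
usual pattern matching rules for -name.  The helper count_matches and
the file names are made up for illustration; everything happens in a
scratch directory.

```shell
# count_matches DIR PATTERN: number of regular files in DIR whose
# name matches PATTERN, as judged by find's own matcher.
count_matches() {
    find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

dir=$(mktemp -d)
touch "$dir/star" "$dir/*"     # one ordinary name, one literally named '*'

count_matches "$dir" '*'       # '*' is magic: both files match
count_matches "$dir" '\*'      # '\*' is literal: only the file named '*'

rm -rf "$dir"
```

The single quotes only stop the shell from touching the pattern; it is
the backslash, seen by find itself, that makes the '*' literal.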

  | > I'm not sure the zsh authors aim to make emulation modes quite that
  | > exact, but I'll just leave this here for their consideration.
  |
  | There isn't any --emulate bash, really, it's merely a synonym for
  | --emulate sh.

In that case, I would suggest deleting it.   It gives a false impression.
bash has lots of stuff that sh does not have, how close some of that
is to zsh native mode I have no idea.

  | And therefore no, it's not intended to be perfect.

It depends what you're aiming for with --emulate ... if it is just so you
can run native sh scripts, then most of this does not matter.  But an
alternative use is to allow zsh users to check if their scripts will
work when run with some other sh (ie: if they are portable scripts or
not) and for that, while perfection is not required, allowing too many
variations would render the --emulate stuff useless.

  | That's not true, it just doesn't handle redirection to descriptors >9
  | without the use of variant syntax.

OK.  That's better -- but again, in sh/bash mode, it really should
use sh/bash syntax (again, otherwise the emulation isn't very
useful).


  | % case in in (esac|cat
  | case> zsh: parse error near `(esac|cat'
  |
  | So this has apparently been special-cased to be silent when the shell
  | is not interactive.

I have no knowledge of zsh internals - but in many other shells when
this kind of thing happens, it is more the interactive mode that is
special cased - normally when the shell reads EOF it simply exits
(after running any EXIT trap) and the logic to do that is buried deep
in the input routines.   That is why in a lot of shells all kinds of missing
termination is ignored (which doesn't really affect any valid script,
naturally).


And somehow in all of that I managed to just delete the one case that
caused me to start to reply, so here it is again out of order ...

[Sorry about ugly formatting, it is a side effect of the way I got
this text back into this message...]

  | > > tc-se:shell_params[13]: Test of 'set -- a b c d; echo ${4294967297}' failed.
  | > > tc-se:[13] Expected output '', received 'a'
  | > >
  | > > This indicates that 32 bit arith overflow occurred, and wasn't detected.
  | > > Confirmed (also on ksh93 and dash).

  | Checking for overflow here seems like a lot of computational expense for a
  | case that probably only happens in test suites.  Since zsh implements arrays
  | as actual non-sparse C arrays, memory is going to explode long before
  | anything manages to assign that many positional parameters. 

Don't bet on it.   An array with 2^32 unset elements, followed by one set
(assuming it is implemented in a rational way) is just 8 * (2^32 + 1) bytes,
plus noise, which is just 32 GiB (plus change) - my laptop has 32GiB today
- so we are already very very close to average consumer systems being
able to do that, lots of desktop/server type systems can install 128GiB
(or more) which would be plenty to allow this to work - especially in zsh
which apparently allows
	4294967297=hello
which other shells do not...

jinx$ zsh --emulate sh -c '1=hello; echo ${1}'
hello
jinx$ zsh --emulate sh -c '4294967297=hello; echo ${4294967297}'
hello
jinx$ zsh --emulate sh -c '4294967297=hello; echo ${1}'
hello

But even with other shells:
	set -- '' '' '' '' [repeat 2^32 times] hello
is going to be feasible (if not fast) quite soon now.   Similarly the
even slower, but feasible to write:

	set -- ''
	for i in $(seq 1 32)
	do
		set -- "$@" "$@"
	done
	set -- "$@" hello

[aside: not sure if the args to seq are correct there, didn't actually test
that... there may be an off by 1], and I know these versions need more
space to store all the empty strings .. but that should be less than
another factor of 2.
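
A small-scale sketch of the arithmetic and of the doubling loop,
assuming the shell's arithmetic is at least 64 bits wide.  The function
name doubled_count is made up for illustration, and it uses a counter
instead of seq.  n doublings of a one-element list give 2^n elements,
so "hello" lands at position 2^n + 1; with n = 32 that is 4294967297,
the index used in the test.

```shell
# Bytes for 2^32 + 1 slots of 8 bytes each, as estimated above.
echo $(( 8 * (4294967296 + 1) ))     # 34359738376, just over 32 GiB

# doubled_count N: double a one-element list N times, append "hello",
# and print the resulting number of positional parameters.
doubled_count() {
    n=$1
    set -- ''
    i=0
    while [ "$i" -lt "$n" ]; do
        set -- "$@" "$@"
        i=$((i + 1))
    done
    set -- "$@" hello
    echo "$#"
}

doubled_count 5                      # 2^5 + 1 = 33
```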

On the NetBSD lists (because we cannot currently handle it - in the
kernel) there was recent mention of an HP system, available today, with
48TiB of ram.  For a system like that, a few tens of GiB of memory
per process is in the noise ...  (handling processes that big is why
systems that big are designed, built, and desired).

The overflow test has negligible cost -- strtoxxx() is not really any more
expensive than atoi() and really ought to be used everywhere, it is
future proof against all kinds of bugs that "cannot happen in real life"
today, but will tomorrow.  The few extra nano-secs the test takes are in
the noise when compared with everything else that is happening.

I am not suggesting there is any need to make setting that many args
actually work (we certainly don't), just that the overflow be detected.

When that happens here, we just treat the positional parameter as unset
and continue; in other places (such as the arg to shift) we treat it just
the same as any other time the value is too big (shift count exceeds $#
in that case), and in some other places it simply becomes an error (sh
cannot handle that, so error("Number out of range") or something).
Anything but silent truncation of the value.
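
A minimal sketch of both halves of this, written in sh rather than
with strtoxxx() in C.  It assumes 64-bit shell arithmetic, and the
signed 32-bit limit and the function name valid_index are assumptions
chosen for illustration, not taken from any shell's source.  The first
line shows why a truncated ${4294967297} ends up as ${1}; the function
shows a range check that cannot itself overflow, because the digit
count is tested before any numeric comparison.

```shell
# Reduced modulo 2^32, the index 4294967297 wraps around to 1.
echo $(( 4294967297 % 4294967296 ))     # 1

# valid_index N: succeed only if N is a plain decimal number that
# fits in a signed 32-bit int (assumed limit: 2147483647).
valid_index() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;        # empty or not all digits
    esac
    n=$1
    while [ "${#n}" -gt 1 ] && [ "${n#0}" != "$n" ]; do
        n=${n#0}                        # drop leading zeros
    done
    [ "${#n}" -le 10 ] && [ "$n" -le 2147483647 ]
}

valid_index 4 && echo ok
valid_index 4294967297 || echo "out of range"
```

What to do on failure is a separate choice: treat the parameter as
unset, or raise an error, as described above.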

kre