From: Robert Elz
To: Bart Schaefer
cc: Martijn Dekker, "zsh-workers@zsh.org"
Subject: Re: The big kre zsh bug report
List-Id: Zsh Workers List
X-Seq: 43923
References: <18f684a8-2fec-4ebe-a63e-cf6688ae519f@inlv.org>
Date: Fri, 21 Dec 2018 18:30:52 +0700
Message-ID: <16681.1545391852@jinx.noi.kre.to>

    Date:        Thu, 20 Dec 2018 23:53:52 -0800
    From:        Bart Schaefer

| > Looks logical to me: in that the ${X+$@} parameter substitution
| > substitutes $@, within quotes, leaving "$@", which is definitely removed
| > completely if there are no positional parameters.

| I'd prefer that this continue to act like bash and ksh than to follow
| the abstract spec.

This is a case where it can actually affect some real scripts, even
though it is rare.  That's why I changed what the NetBSD sh does.
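
A minimal sketch of the kind of test being discussed, assuming it expands "$@" inside a quoted ${X+...} with X set and no positional parameters (the exact test text is not quoted in this message, so the variable value and layout here are guesses). It only reports the resulting count: 0 is what the reasoning quoted above predicts, while 1 would mean the shell kept a single empty field.

```shell
# Assumed reconstruction, not the original test case.
X=                  # X is set (to the empty string), so ${X+word} yields word
set --              # no positional parameters
set -- "${X+$@}"    # the expansion under discussion
count=$#
echo "$count"       # 0 or 1, depending on the shell
```

Running the same lines under each shell of interest (or under zsh --emulate sh) shows which way it goes.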
| It seems to me that changing this for $=t in zsh native mode might
| break a lot of things, so I'll leave it open for discussion as to
| whether it's feasible to change it only for emulation mode.  However,
| it does differ from the most recent version of bash I have handy.

bash -c 't=" x"; IFS=" x"; set $t; IFS=":"; r="$*"; IFS=; echo $# $r'
1

That isn't even -o posix mode, nor is it the most recent bash available.
(ksh93 does the same, so do mksh/dash/...).

From the version in the prompt string where you illustrated bash behaviour
later, you seem to be testing against a very old bash (though apparently
bash 3 still exists in the wild).  bash has had a lot of bugs fixed in the
interim (and more keep getting fixed).

| For the lazy or very busy reader:
|
| -h  Locate and remember utilities invoked by functions as those
|     functions are defined (the utilities are normally located when the
|     function is executed).

Yes, I know what it means.  I cannot think of a use for it, nor have I
ever seen any script that would use it, nor does anything I know of
implement it (though I have not tried ksh88).  ksh93 might implement some
variant, but not what is specified:

ksh93 -c 'X=$PATH; PATH=/no/such/place; set -h; g() { grep "$@";}; PATH=$X; g fff /tmp/bbb'
grep: /tmp/bbb: No such file or directory

If the path lookups were done (only) during fn definition, then that
should fail to find grep rather than run it.

If -h is simply some kind of optimisation, it is worthless (its purpose had
to be as some kind of way of guaranteeing what PATH would be used for library
functions, so they could be standalone, and work in any environment, which
standard sh "dynamic everything" does not make easy).

[Aside: it is entirely possible -h will be removed from POSIX sometime,
I think a request for that has been lodged already, but that doesn't mean
that other shells will simply delete their current "support" for it, ie:
allowing it to be set/reset and otherwise ignoring it.]
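
The default behaviour mentioned in the quoted -h text (utilities are located when the function is executed, not when it is defined) can be seen with a sketch like this. The function name first_line and the choice of head are arbitrary; -h itself is deliberately not set, since what it does varies between shells.

```shell
# Define the function while PATH points nowhere useful.
saved_path=$PATH
PATH=/no/such/place
first_line() { head -n 1; }    # 'head' could not be found at this point
PATH=$saved_path

# The lookup happens here, at call time, using the restored PATH.
result=$(printf 'alpha\nbeta\n' | first_line)
echo "$result"
```

If the lookup were bound at definition time, the call would fail to find head, which is the effect the ksh93 test above was probing for.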
| > > tc-se:case_matching[147]: Test of 'var='\z'; case ${var} in (${var}) printf M;; (*) printf X;; esac' failed.
| > >
| > > The word to match is two chars, backslash and z, the pattern is
| > > a quoted 'z' (the backslash becomes a quoting character).
|
| I don't think this is going to get fixed.  I went looking for this
| test case but didn't find it:
|
| var='"z"'; case ${var} in (${var}) printf M;; (*) printf X;; esac
|
| This also prints M.  If backslash-z should become a quoted z,
| shouldn't the above case also become a quoted z?

No.  It all comes from the use of REs to define how glob works (and the
ancient implementation) - '\' quotes magic in an RE, other forms of shell
quoting do not.  So, for the string given to the matcher, a \ is the one
and only quote char.

Where it gets really messy is that, to remain consistent with itself, chars
that the shell has already quoted are never interpreted as magic, even when
they would be magic in an RE once the sh quoting is no longer there.  That
is how in a literal pattern it is possible to use sh quoting to quote the
'-' in a [] match, whereas in an RE only putting the - first or last removes
its "range of chars" meaning.

All this flows from the way the original Bourne shell implemented quoting
( ch |= 0x80 ) and from the fact that quote removal doesn't happen until
after glob or parameter expansion (for the ${var%pattern} etc stuff), and
never happens in case matching (not needed, as all that ever wants is a
match/no-match - no-one cares what matched).

Some of it is hideous, but we are stuck with it...

In native zsh mode if you want to change that to be more like csh, which
did complain about invalid patterns, that's fine, but for emulating sh
it really isn't.

| So that means in
| case statements all variable references have to be treated as if they
| were ${(Q)var} (to use zsh-speke)?  What if the quotes aren't
| balanced?
I don't speak zsh, so this is hard, but I can guess: no, I don't think
that is correct.  As quotes are just chars (the same as they are any
other time a variable is expanded), they don't need to be balanced.

\ is only special because it is defined to be that way for matching (and
because something needs to be for uses of glob matching in other utilities,
like find, which don't do anything like sh quoting, but need to be able to
distinguish a literal '*' from "match anything" somehow, and \ is the way
it is done there too.)

| > I'm not sure the zsh authors aim to make emulation modes quite that
| > exact, but I'll just leave this here for their consideration.
|
| There isn't any --emulate bash, really, it's merely a synonym for
| --emulate sh.

In that case, I would suggest deleting it.  It gives a false impression.
bash has lots of stuff that sh does not have; how close some of that is
to zsh native mode I have no idea.

| And therefore no, it's not intended to be perfect.

It depends what you're aiming for with --emulate ... if it is just so you
can run native sh scripts, then most of this does not matter.  But an
alternative use is to allow zsh users to check if their scripts will work
when run with some other sh (ie: if they are portable scripts or not) and
for that, while perfection is not required, allowing too many variations
would render the --emulate stuff useless.

| That's not true, it just doesn't handle redirection to descriptors >9
| without the use of variant syntax.

OK.  That's better -- but again, in sh/bash mode, it really should use
sh/bash syntax (again, otherwise the emulation isn't very useful).

| % case in in (esac|cat
| case> zsh: parse error near `(esac|cat'
|
| So this has apparently been special-cased to be silent when the shell
| is not interactive.
I have no knowledge of zsh internals - but in many other shells when this
kind of thing happens, it is more the interactive mode that is special
cased - normally when the shell reads EOF it simply exits (after running
any EXIT trap) and the logic to do that is buried deep in the input
routines.  That is why in a lot of shells all kinds of missing termination
is ignored (which doesn't really affect any valid script, naturally).

And somehow in all of that I managed to just delete the one case
that caused me to start to reply, so here it is again out of order ...
[Sorry about ugly formatting, it is a side effect of the way I got
this text back into this message...]

| > > tc-se:shell_params[13]: Test of 'set -- a b c d; echo ${4294967297}'
| failed. > > tc-se:[13] Expected output '', received 'a' > > > > This
| indicates that 32 bit arith overflow occurred, and wasn't detected. > >
| Confirmed (also on ksh93 and dash).

| Checking for overflow here seems like a lot of computational expense for a
| case that probably only happens in test suites. Since zsh implements arrays
| as actual non-sparse C arrays, memory is going to explode long before
| anything manages to assign that many positional parameters.

Don't bet on it.  An array with 2^32 unset elements, followed by one
set (assuming it is implemented in a rational way) is just 8 * (2^32 + 1)
bytes, plus noise, which is just 32 GiB (plus change) - my laptop has
32GiB today - so we are already very very close to average consumer
systems being able to do that.  Lots of desktop/server type systems can
install 128GiB (or more), which would be plenty to allow this to work -
especially in zsh which apparently allows 4294967297=hello which other
shells do not...
jinx$ zsh --emulate sh -c '1=hello; echo ${1}'
hello
jinx$ zsh --emulate sh -c '4294967297=hello; echo ${4294967297}'
hello
jinx$ zsh --emulate sh -c '4294967297=hello; echo ${1}'
hello

But even with other shells:

	set -- '' '' '' '' [repeat 2^32 times] hello

is going to be feasible (if not fast) quite soon now.  Similarly the
even slower, but feasible to write:

	set -- ''
	for i in $(seq 1 32)
	do
		set -- "$@" "$@"
	done
	set -- "$@" hello

[aside: not sure if the args to seq are correct there, didn't actually
test that... there may be an off by 1], and I know these versions need
more space to store all the empty strings .. but that should be less
than another factor of 2.

On the NetBSD lists (because we cannot currently handle it - in the kernel)
there was recent mention of an HP system, available today, with 48TiB of ram.
For a system like that, a few tens of GiB of memory per process is in the
noise ... (handling processes that big is why systems that big are designed,
built, and desired).

The overflow test has negligible cost -- strtoxxx() is not really any
more expensive than atoi() and really ought to be used everywhere; it is
future proof against all kinds of bugs that "cannot happen in real life"
today, but will tomorrow.  The few extra nano-secs the test takes are in
the noise when compared with everything else that is happening.

I am not suggesting there is any need to make being able to set that
many args actually work (we certainly don't), just to detect the overflow.
When that happens here, we just treat the positional parameter as unset
and continue.  In other places (such as the arg to shift) we treat it just
the same as any other time the value is too big (shift count exceeds $# in
that case), and in some other places it simply becomes an error (sh cannot
handle that, so error("Number out of range") or something.)

Anything but silent truncation of the value.

kre
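
The doubling loop in the message can be checked at a small scale without needing tens of GiB. This sketch uses 5 doublings instead of 32 (an arbitrary choice, just small enough to run instantly) and a plain while loop instead of seq, so it depends on nothing beyond the shell itself.

```shell
# Scaled-down version of the doubling loop: 5 passes instead of 32.
set -- ''
i=0
while [ "$i" -lt 5 ]
do
	set -- "$@" "$@"	# each pass doubles the number of parameters
	i=$((i + 1))
done
before=$#			# 1 parameter doubled 5 times: 2^5
set -- "$@" hello
after=$#			# hello lands at position 2^5 + 1

eval "last=\${$after}"		# fetch the parameter at that position
echo "$before $after $last"	# prints: 32 33 hello
```

By the same arithmetic, 32 passes starting from one empty parameter give 2^32 parameters, so the appended hello would sit at position 4294967297, which suggests the `seq 1 32` bounds are right as written.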