Date: Tue, 23 Nov 2010 11:14:23 +0000
From: Peter Stephenson
To: zsh-workers@zsh.org (Zsh hackers list)
Subject: Re: PATCH: bash-style substrings & subarrays
Message-ID: <20101123111423.60a04caf@pwslap01u.europe.root.pri>
In-Reply-To: <201011211702.oALH2ci6003141@pws-pc.ntlworld.com>
References: <101120223401.ZM6950@torch.brasslantern.com> <201011211702.oALH2ci6003141@pws-pc.ntlworld.com>
Organization: Cambridge Silicon Radio
List-Id: Zsh Workers List
X-Seq: 28434

On Sun, 21 Nov 2010 17:02:38 +0000
Peter Stephenson wrote:
> Should ${foo:1} always start 1 character/element beyond the
> first one, regardless which subscripting rules are in use?  I'm now
> inclining in that direction.

Nobody commented but this is the change with some more careful
documentation.
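
For illustration, here is a minimal sketch of the offset behaviour the
documentation below describes. It assumes bash, or a zsh with this patch
applied. The names foo, arr and first_arg are made up for the example and
do not come from the patch.

```shell
# Illustrative values only; foo, arr and first_arg are hypothetical names.
foo=123456789
arr=(1 2 3 4 5 6 7 8 9)

# Offset 3 skips three characters, so the result starts at the fourth.
scalar_rest=${foo:3}
# A length of 1 keeps a single character from that position.
scalar_one=${foo:3:1}
# The space stops ":-" being read as the ${name:-word} form.
scalar_last=${foo: -1}
# The same offset applied to array elements rather than characters.
array_rest="${arr[@]:3}"

# Positional parameters: offset 1 is $1 (offset 0 would be $0).
first_arg() { echo "${@:1:1}"; }

echo "$scalar_rest"      # 456789
echo "$scalar_one"       # 4
echo "$scalar_last"      # 9
echo "$array_rest"       # 4 5 6 7 8 9
first_arg a b c          # a
```

The array case uses the explicit [@] subscript so that the same line behaves
alike in both shells; in native zsh mode a plain ${arr:3} would also select
elements.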

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.123
diff -p -u -r1.123 expn.yo
--- Doc/Zsh/expn.yo	18 Nov 2010 13:57:19 -0000	1.123
+++ Doc/Zsh/expn.yo	23 Nov 2010 11:09:33 -0000
@@ -588,23 +588,29 @@ remove the non-matched elements).
 xitem(tt(${)var(name)tt(:)var(offset)tt(}))
 item(tt(${)var(name)tt(:)var(offset)tt(:)var(length)tt(}))(
 This syntax gives effects similar to parameter subscripting
-in the form tt($)var(name)tt({)var(offset)tt(,)var(end)tt(}) but in
-a form compatible with other shells.
+in the form tt($)var(name)tt({)var(start)tt(,)var(end)tt(}), but is
+compatible with other shells; note that both var(offset) and var(length)
+are interpreted differently from the components of a subscript.
+
+If var(offset) is non-negative, then if the variable var(name) is a
+scalar substitute the contents starting var(offset) characters from the
+first character of the string, and if var(name) is an array substitute
+elements starting var(offset) elements from the first element.  If
+var(length) is given, substitute that many characters or elements,
+otherwise the entire rest of the scalar or array.
+
+A positive var(offset) is always treated as the offset of a character or
+element in var(name) from the first character or element of the array
+(this is different from native zsh subscript notation).  Hence 0
+refers to the first character or element regardless of the setting of
+the option tt(KSH_ARRAYS).
 
-If the variable var(name) is a scalar, substitute the contents
-starting from offset var(offset); if var(name) is an array,
-substitute elements from element var(offset).  If var(length) is
-given, substitute that many characters or elements, otherwise the
-entire rest of the scalar or array.
-
-var(offset) is treated similarly to a parameter subscript:
-the offset of the first character or element in var(name)
-is 0 if the option tt(KSH_ARRAYS) is set, else 1; a negative
-subscript counts backwards so that -1 corresponds to the last
-character or element.
+A negative offset counts backwards from the end of the scalar or array,
+so that -1 corresponds to the last character or element, and so on.
 
 var(length) is always treated directly as a length and hence may not be
-negative.
+negative.  The option tt(MULTIBYTE) is obeyed, i.e. the offset and length
+count multibyte characters where appropriate.
 
 var(offset) and var(length) undergo the same set of shell substitutions
 as for scalar assignment; in addition, they are then subject to arithmetic
@@ -615,19 +621,29 @@ print ${foo: 1 + 2}
 print ${foo:$(( 1 + 2))}
 print ${foo:$(echo 1 + 2)})
 
-all have the same effect.
+all have the same effect, extracting the string starting at the fourth
+character of tt($foo) if the substitution would otherwise return a scalar,
+or the array starting at the fourth element if tt($foo) would return an
+array.  Note that with the option tt(KSH_ARRAYS) tt($foo) always returns
+a scalar (regardless of the use of the offset syntax) and a form
+such as tt($foo[*]:3) is required to extract elements of an array named
+tt(foo).
 
-Note that if var(offset) is negative, the tt(-) may not appear immediately
+If var(offset) is negative, the tt(-) may not appear immediately
 after the tt(:) as this indicates the
-tt(${)var(name)tt(:-)var(word)tt(}) form of substitution; a space
+tt(${)var(name)tt(:-)var(word)tt(}) form of substitution.  Instead, a space
 may be inserted before the tt(-).  Furthermore, neither var(offset) nor
 var(length) may begin with an alphabetic character or tt(&) as these are
-used to indicate history-style modifiers.
+used to indicate history-style modifiers.  To substitute a value from a
+variable, the recommended approach is to precede it with a tt($) as this
+signifies the intention (parameter substitution can easily be rendered
+unreadable); however, as arithmetic substitution is performed, the
+expression tt(${var: offs}) does work, retrieving the offset from
+tt($offs).
 
 For further compatibility with other shells there is a special case
-when the tt(KSH_ARRAYS) option is active, as in emulation of
-Bourne-style shells.  In this case array subscript 0 usually refers to the
-first element of the array.  However, if the substitution refers to the
+for array offset 0.  This usually accesses the
+first element of the array.  However, if the substitution refers to the
 positional parameter array, e.g. tt($@) or tt($*), then offset 0
 instead refers to tt($0), offset 1 refers to tt($1), and so on.  In other
 words, the positional parameter array is effectively extended by
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.111
diff -p -u -r1.111 subst.c
--- Src/subst.c	20 Nov 2010 23:46:26 -0000	1.111
+++ Src/subst.c	23 Nov 2010 11:09:34 -0000
@@ -1640,7 +1640,7 @@ paramsubst(LinkList l, LinkNode n, char
     int subexp;
     /*
      * If we're referring to the positional parameters, then
-     * e.g ${*:1:1} refers to $1 even if KSH_ARRAYS is in effect.
+     * e.g ${*:1:1} refers to $1.
      * This is for compatibility.
      */
     int horrible_offset_hack = 0;
@@ -2768,16 +2768,15 @@ paramsubst(LinkList l, LinkNode n, char
                     return NULL;
                 }
             }
-            if (!isset(KSHARRAYS) || horrible_offset_hack) {
+            if (horrible_offset_hack) {
                 /*
                  * As part of the 'orrible hoffset 'ack,
                  * (what hare you? Han 'orrible hoffset 'ack,
                  * sergeant major), if we are given a ksh/bash/POSIX
-                 * style array which includes offset 0, we use
-                 * $0.
+                 * style positional parameter array which includes
+                 * offset 0, we use $0.
                  */
-                if (isset(KSHARRAYS) && horrible_offset_hack &&
-                    offset == 0 && isarr) {
+                if (offset == 0 && isarr) {
                     offset_hack_argzero = 1;
                 } else if (offset > 0) {
                     offset--;
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.46
diff -p -u -r1.46 D04parameter.ztst
--- Test/D04parameter.ztst	18 Nov 2010 13:57:19 -0000	1.46
+++ Test/D04parameter.ztst	23 Nov 2010 11:09:34 -0000
@@ -1268,15 +1268,15 @@
    print ${foo:$(echo 3 + 3):`echo 4 - 3`}
    print ${foo: -1}
    print ${foo: -10}
-0:Bash-style subscripts, scalar
->3456789
+0:Bash-style offsets, scalar
 >456789
 >56789
 >6789
->3
+>789
 >4
 >5
 >6
+>7
 >9
 >123456789
 
@@ -1291,15 +1291,15 @@
    print ${foo:$(echo 3 + 3):`echo 4 - 3`}
    print ${foo: -1}
    print ${foo: -10}
-0:Bash-style subscripts, array
->3 4 5 6 7 8 9
+0:Bash-style offsets, array
 >4 5 6 7 8 9
 >5 6 7 8 9
 >6 7 8 9
->3
+>7 8 9
 >4
 >5
 >6
+>7
 >9
 >1 2 3 4 5 6 7 8 9
 
@@ -1321,7 +1321,7 @@
      echo ${str: -1:1}
    }
    testfn
-0:Bash-style subscripts, Bourne-style indexing
+0:Bash-style offsets, Bourne-style indexing
 >1
 >2
 >3

-- 
Peter Stephenson                      Software Engineer
Tel: +44 (0)1223 692070               Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road,
Cambridge, CB4 0WZ, UK

Member of the CSR plc group of companies. CSR plc registered in England and
Wales, registered number 4187346, registered office Churchill House, Cambridge
Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom