From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
X-Seq: 24731
Date: Tue, 25 Mar 2008 17:24:09 +0000
From: Peter Stephenson
To: Zsh hackers list
Subject: Re: ${a[(i)pattern]} if a=()
Message-ID: <20080325172409.4357aa10@pws-pc>
In-Reply-To: <080318084728.ZM12523@torch.brasslantern.com>
References: <200803181213.m2ICDULc004081@pws-pc.ntlworld.com>
  <080318084728.ZM12523@torch.brasslantern.com>

On Tue, 18 Mar 2008 08:47:28 -0700
Bart Schaefer wrote:
> Looking at documentation for this, I was reminded about this recent bit:
>
>   Note that in subscripts with both `r' and `R' pattern characters
>   are active even if they were substituted for a parameter
>   (regardless of the setting of GLOB_SUBST which controls this
>   feature in normal pattern matching). It is therefore necessary to
>   quote pattern characters for an exact string match.
>
> Maybe we could press the (e) flag into service here? I haven't looked
> at how hard that would be to do, but it's semantically similar to the
> existing use

Yes, that seems perfectly reasonable, and it was easy to do (except that
I've just got back from holiday, so it's appeared a week late).

It might look a little bizarre that in one case we untokenize() and in
the other we tokenize(): you might think we'd need just one or the
other. The difference arises because whether the subscript arrives
tokenized depends on whether the substitution is inside double quotes.
Without (e), we tokenize so that pattern matching still happens inside
double quotes; with (e), we untokenize so that pattern matching doesn't
happen outside them.

It's still necessary to use a parameter as the key to guarantee that all
characters are interpreted literally. The issue is that we don't do full
argument parsing on the subscript; it's handled a bit like a special
case of double quoting (but with a different terminator), so single and
double quotes don't have their usual quoting effect there. I don't think
we want to change this in a hurry.

Meanwhile, I noticed that the optimization for strings with no pattern
characters was being defeated in multibyte mode: the GF_MULTIBYTE flag
in patglobflags stopped the pure-string shortcut in pattern.c from being
taken. The only effect was on speed, so it's unlikely anybody would have
noticed.
Index: Doc/Zsh/params.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/params.yo,v
retrieving revision 1.41
diff -u -r1.41 params.yo
--- Doc/Zsh/params.yo 25 Oct 2007 09:33:01 -0000 1.41
+++ Doc/Zsh/params.yo 25 Mar 2008 17:08:38 -0000
@@ -227,16 +227,14 @@
 If tt(KSH_ARRAYS) is in effect, the tt(-le) should be replaced by tt(-lt).
 
 Note that in subscripts with both `tt(r)' and `tt(R)' pattern characters
-are active even if they were substituted for a parameter (regardless
-of the setting of tt(GLOB_SUBST) which controls this feature in normal
-pattern matching). It is therefore necessary to quote pattern characters
-for an exact string match. Given a string in tt($key), and assuming
-the tt(EXTENDED_GLOB) option is set, the following is sufficient to
-match an element of an array tt($array) containing exactly the value of
-tt($key):
+are active even if they were substituted for a parameter (regardless of the
+setting of tt(GLOB_SUBST) which controls this feature in normal pattern
+matching). The flag `tt(e)' can be added to inhibit pattern matching. As
+this flag does not inhibit other forms of substitution, care is still
+required; using a parameter to hold the key has the desired effect:
 
-example(key2=${key//(#m)[\][+LPAR()+RPAR()\\*?#<>~^]/\\$MATCH}
-print ${array[(R)$key2]})
+example(key2='original key'
+print ${array[(Re)$key2]})
 )
 item(tt(R))(
 Like `tt(r)', but gives the last match. For associative arrays, gives
@@ -283,11 +281,15 @@
 The delimiter character tt(:) is arbitrary; see above.
 )
 item(tt(e))(
-This flag has no effect and for ordinary arrays is retained for backward
-compatibility only. For associative arrays, this flag can be used to
-force tt(*) or tt(@) to be interpreted as a single key rather than as a
-reference to all values. This flag may be used on the left side of an
-assignment.
+This flag causes any pattern matching that would be performed on the
+subscript to use plain string matching instead. Hence
+`tt(${array[(re)*]})' matches only the array element whose value is tt(*).
+Note that other forms of substitution such as parameter substitution are
+not inhibited.
+
+This flag can also be used to force tt(*) or tt(@) to be interpreted as
+a single key rather than as a reference to all values. It may be used
+for either purpose on the left side of an assignment.
 )
 enditem()
 
Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.141
diff -u -r1.141 params.c
--- Src/params.c 10 Jan 2008 10:25:31 -0000 1.141
+++ Src/params.c 25 Mar 2008 17:08:38 -0000
@@ -1007,7 +1007,7 @@
     int hasbeg = 0, word = 0, rev = 0, ind = 0, down = 0, l, i, ishash;
     int keymatch = 0, needtok = 0, arglen, len;
     char *s = *str, *sep = NULL, *t, sav, *d, **ta, **p, *tt, c;
-    zlong num = 1, beg = 0, r = 0;
+    zlong num = 1, beg = 0, r = 0, quote_arg = 0;
     Patprog pprog = NULL;
 
     ishash = (v->pm && PM_TYPE(v->pm->node.flags) == PM_HASHED);
@@ -1058,8 +1058,7 @@
                 sep = "\n";
                 break;
             case 'e':
-                /* Compatibility flag with no effect except to prevent *
-                 * special interpretation by getindex() of `*' or `@'. */
+                quote_arg = 1;
                 break;
             case 'n':
                 t = get_strarg(++s, &arglen);
@@ -1286,7 +1285,10 @@
             }
         }
         if (!keymatch) {
-            tokenize(s);
+            if (quote_arg)
+                untokenize(s);
+            else
+                tokenize(s);
             remnulargs(s);
             pprog = patcompile(s, 0, NULL);
         } else
Index: Src/pattern.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/pattern.c,v
retrieving revision 1.41
diff -u -r1.41 pattern.c
--- Src/pattern.c 23 Oct 2007 16:09:10 -0000 1.41
+++ Src/pattern.c 25 Mar 2008 17:08:38 -0000
@@ -511,7 +511,7 @@
 
     if (!(patflags & PAT_ANY)) {
         /* Look for a really pure string, with no tokens at all. */
-        if (!patglobflags
+        if (!(patglobflags & ~GF_MULTIBYTE)
 #ifdef __CYGWIN__
             /*
              * If the OS treats files case-insensitively and we
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.32
diff -u -r1.32 D04parameter.ztst
--- Test/D04parameter.ztst 11 Mar 2008 10:00:39 -0000 1.32
+++ Test/D04parameter.ztst 25 Mar 2008 17:08:43 -0000
@@ -282,6 +282,7 @@
   print ${(P)bar}
 0:${(P)...}
 >I'm nearly out of my mind with tedium
+#' deconfuse emacs
 
   foo=(I could be watching that programme I recorded)
   print ${(o)foo}
@@ -375,6 +376,7 @@
   print ${(QX)foo}
 1:${(QX)...}
 ?(eval):2: unmatched "
+# " deconfuse emacs
 
   array=(characters in an array)
   print ${(c)#array}
@@ -411,6 +413,7 @@
   print ${(pl.10..\x22..X.)foo}
 0:${(pl...)...}
 >Xresulting """"Xwords roariously """Xpadded
+#" deconfuse emacs
 
   print ${(l.5..X.r.5..Y.)foo}
   print ${(l.6..X.r.4..Y.)foo}
@@ -870,6 +873,7 @@
 0:Parameters associated with backreferences
 >match 12 16 match
 >1 1 1
+#' deconfuse emacs
 
   string='and look for a MATCH in here'
   if [[ ${(S)string%%(#m)M*H} = "and look for a in here" ]]; then
@@ -1010,3 +1014,36 @@
 >fields
 >in
 >it
+
+  array=('%' '$' 'j' '*' '$foo')
+  print ${array[(i)*]} "${array[(i)*]}"
+  print ${array[(ie)*]} "${array[(ie)*]}"
+  key='$foo'
+  print ${array[(ie)$key]} "${array[(ie)$key]}"
+  key='*'
+  print ${array[(ie)$key]} "${array[(ie)$key]}"
+0:Matching array indices with and without quoting
+>1 1
+>4 4
+>5 5
+>4 4
+
+# Ordering of associative arrays is arbitrary, so we need to use
+# patterns that only match one element.
+  typeset -A assoc_r
+  assoc_r=(star '*' of '*this*' and '!that!' or '(the|other)')
+  print ${(kv)assoc_r[(re)*]}
+  print ${(kv)assoc_r[(re)*this*]}
+  print ${(kv)assoc_r[(re)!that!]}
+  print ${(kv)assoc_r[(re)(the|other)]}
+  print ${(kv)assoc_r[(r)*at*]}
+  print ${(kv)assoc_r[(r)*(ywis|bliss|kiss|miss|this)*]}
+  print ${(kv)assoc_r[(r)(this|that|\(the\|other\))]}
+0:Reverse subscripting associative arrays with literal matching
+>star *
+>of *this*
+>and !that!
+>or (the|other)
+>and !that!
+>of *this*
+>or (the|other)

-- 
Peter Stephenson
Web page now at http://homepage.ntlworld.com/p.w.stephenson/