From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
X-Seq: 24234
Date: Thu, 13 Dec 2007 20:43:18 +0000
From: Peter Stephenson
To: Zsh Hackers' List
Subject: PATCH: internal parameter flags (resend)
Message-Id:
 <20071213204318.2ff3e43c.p.w.stephenson@ntlworld.com>
X-Mailer: Sylpheed 2.4.7 (GTK+ 2.12.1; x86_64-redhat-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

I sent this yesterday evening, but it seems to have disappeared and I
didn't keep the original. Mail via smtp.ntlworld.com appears to be a bit
flaky at the moment.

Handling of internal parameter flags, by which I mean ones defined with
typeset rather than applied during the substitution, is flaky.

% typeset -i 16 -Z 6 val
% val=0xa
% print $val
16#00A
% print $val[3,4]
#0000A

Everything is OK until the last output. (Zero padding with a radix is
documented to fill with zeros at the right point.) The problem is that
the subscript is applied before the flags. This seems plain wrong to me:
the flags are an internal feature of the parameter, so the subscript
should be applied to what the parameter produces.

Another example of where this goes funny is

% typeset -u param=upper
% UPPER=VALUE
% print ${(P)param}

which prints nothing, even though $param outputs UPPER, because of the way
flags are handled in the wrong place.

I propose to move handling of flags inside the parameter code where it
should be. I even made a note about this some time ago. I also noted "bet
that's easier said than done", but it did seem to be straightforward.

The documentation puts internal parameter flags into the order of
substitution.

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.83
diff -u -r1.83 expn.yo
--- Doc/Zsh/expn.yo 30 Oct 2007 14:01:34 -0000 1.83
+++ Doc/Zsh/expn.yo 12 Dec 2007 22:35:11 -0000
@@ -1062,7 +1062,12 @@
 substitution then applies the modifier tt(:h) and takes the directory part
 of the path.)
 )
-item(tt(2.) em(Parameter Subscripting))(
+item(tt(2.) em(Internal Parameter Flags))(
+Any parameter flags set by one of the tt(typeset) family of commands,
+in particular the tt(L), tt(R), tt(Z), tt(u) and tt(l) flags for padding
+and capitalization, are applied directly to the parameter value.
+)
+item(tt(3.) em(Parameter Subscripting))(
 If the value is a raw parameter reference with a subscript, such as
 tt(${)var(var)tt([3]}), the effect of subscripting is applied directly to
 the parameter. Subscripts are evaluated left to right; subsequent
@@ -1072,11 +1077,11 @@
 word (the second word of the range of words two through four of the
 original array). Any number of subscripts may appear.
 )
-item(tt(3.) em(Parameter Name Replacement))(
+item(tt(4.) em(Parameter Name Replacement))(
 The effect of any tt((P)) flag, which treats the value so far as a
 parameter name and replaces it with the corresponding value, is applied.
 )
-item(tt(4.) em(Double-Quoted Joining))(
+item(tt(5.) em(Double-Quoted Joining))(
 If the value after this process is an array, and the substitution
 appears in double quotes, and no tt((@)) flag is present at the current
 level, the words of the value are joined with the first character of the
@@ -1084,7 +1089,7 @@
 arrays are not modified). If the tt((j)) flag is present, that is used for
 joining instead of tt($IFS).
 )
-item(tt(5.) em(Nested Subscripting))(
+item(tt(6.) em(Nested Subscripting))(
 Any remaining subscripts (i.e. of a nested substitution) are evaluated at
 this point, based on whether the value is an array or a scalar. As with
 tt(2.), multiple subscripts can appear. Note that tt(${foo[2,4][2]}) is
@@ -1093,13 +1098,13 @@
 both cases), but not to tt("${${foo[2,4]}[2]}") (the nested substitution
 returns a scalar because of the quotes).
 )
-item(tt(6.) em(Modifiers))(
+item(tt(7.) em(Modifiers))(
 Any modifiers, as specified by a trailing `tt(#)', `tt(%)', `tt(/)'
 (possibly doubled) or by a set of modifiers of the form tt(:...)
 (see noderef(Modifiers) in noderef(History Expansion)), are applied to the words
 of the value at this level.
 )
-item(tt(7.) em(Forced Joining))(
+item(tt(8.) em(Forced Joining))(
 If the `tt((j))' flag is present, or no `tt((j))' flag is present but
 the string is to be split as given by rules tt(8.) or tt(9.), and joining
 did not take place at step tt(4.), any words in the value are joined
@@ -1107,36 +1112,36 @@
 Note that the `tt((F))' flag implicitly supplies a string for joining in this
 manner.
 )
-item(tt(8.) em(Forced Splitting))(
+item(tt(9.) em(Forced Splitting))(
 If one of the `tt((s))', `tt((f))' or `tt((z))' flags are present, or the `tt(=)'
 specifier was present (e.g. tt(${=)var(var)tt(})), the word is split on
 occurrences of the specified string, or (for tt(=) with neither of the two
 flags present) any of the characters in tt($IFS).
 )
-item(tt(9.) em(Shell Word Splitting))(
+item(tt(10.) em(Shell Word Splitting))(
 If no `tt((s))', `tt((f))' or `tt(=)' was given, but the word is not
 quoted and the option tt(SH_WORD_SPLIT) is set, the word is split on
 occurrences of any of the characters in tt($IFS). Note this step, too,
 takes place at all levels of a nested substitution.
 )
-item(tt(10.) em(Uniqueness))(
+item(tt(11.) em(Uniqueness))(
 If the result is an array and the `tt((u))' flag was present, duplicate
 elements are removed from the array.
 )
-item(tt(11.) em(Ordering))(
+item(tt(12.) em(Ordering))(
 If the result is still an array and one of the `tt((o))' or `tt((O))' flags
 was present, the array is reordered.
 )
-item(tt(12.) em(Re-Evaluation))(
+item(tt(13.) em(Re-Evaluation))(
 Any `tt((e))' flag is applied to the value, forcing it to be re-examined
 for new parameter substitutions, but also for command and arithmetic
 substitutions.
 )
-item(tt(13.) em(Padding))(
+item(tt(14.) em(Padding))(
 Any padding of the value by the `tt(LPAR()l.)var(fill)tt(.RPAR())' or
 `tt(LPAR()r.)var(fill)tt(.RPAR())' flags is applied.
 )
-item(tt(14.) em(Semantic Joining))(
+item(tt(15.) em(Semantic Joining))(
 In contexts where expansion semantics requires a single word to
 result, all words are rejoined with the first character of tt(IFS)
 between. So in `tt(${LPAR()P)tt(RPAR()${LPAR()f)tt(RPAR()lines}})'
Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.137
diff -u -r1.137 params.c
--- Src/params.c 23 Nov 2007 02:32:58 -0000 1.137
+++ Src/params.c 12 Dec 2007 22:35:15 -0000
@@ -1884,11 +1884,134 @@
         s = v->pm->gsu.s->getfn(v->pm);
         break;
     default:
-        s = NULL;
+        s = "";
         DPUTS(1, "BUG: param node without valid type");
         break;
     }

+    if (v->pm->node.flags & (PM_LEFT|PM_RIGHT_B|PM_RIGHT_Z)) {
+        int fwidth = v->pm->width ? v->pm->width : MB_METASTRLEN(s);
+        switch (v->pm->node.flags & (PM_LEFT | PM_RIGHT_B | PM_RIGHT_Z)) {
+            char *t, *tend;
+            unsigned int t0;
+
+        case PM_LEFT:
+        case PM_LEFT | PM_RIGHT_Z:
+            t = s;
+            if (v->pm->node.flags & PM_RIGHT_Z)
+                while (*t == '0')
+                    t++;
+            else
+                while (iblank(*t))
+                    t++;
+            MB_METACHARINIT();
+            for (tend = t, t0 = 0; t0 < fwidth && *tend; t0++)
+                tend += MB_METACHARLEN(tend);
+            /*
+             * t0 is the number of characters from t used,
+             * hence (fwidth - t0) is the number of padding
+             * characters. fwidth is a misnomer: we use
+             * character counts, not character widths.
+             *
+             * (tend - t) is the number of bytes we need
+             * to get fwidth characters or the entire string;
+             * the characters may be multiple bytes.
+             */
+            fwidth -= t0; /* padding chars remaining */
+            t0 = tend - t; /* bytes to copy from string */
+            s = (char *) hcalloc(t0 + fwidth + 1);
+            memcpy(s, t, t0);
+            if (fwidth)
+                memset(s + t0, ' ', fwidth);
+            s[t0 + fwidth] = '\0';
+            break;
+        case PM_RIGHT_B:
+        case PM_RIGHT_Z:
+        case PM_RIGHT_Z | PM_RIGHT_B:
+            {
+                int zero = 1;
+                /* Calculate length in possibly multibyte chars */
+                unsigned int charlen = MB_METASTRLEN(s);
+
+                if (charlen < fwidth) {
+                    char *valprefend = s;
+                    int preflen;
+                    if (v->pm->node.flags & PM_RIGHT_Z) {
+                        /*
+                         * This is a documented feature: when deciding
+                         * whether to pad with zeroes, ignore
+                         * leading blanks already in the value;
+                         * only look for numbers after that.
+                         * Not sure how useful this really is.
+                         * It's certainly confusing to code around.
+                         */
+                        for (t = s; iblank(*t); t++)
+                            ;
+                        /*
+                         * Allow padding after initial minus
+                         * for numeric variables.
+                         */
+                        if ((v->pm->node.flags &
+                             (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) &&
+                            *t == '-')
+                            t++;
+                        /*
+                         * Allow padding after initial 0x or
+                         * base# for integer variables.
+                         */
+                        if (v->pm->node.flags & PM_INTEGER) {
+                            if (isset(CBASES) &&
+                                t[0] == '0' && t[1] == 'x')
+                                t += 2;
+                            else if ((valprefend = strchr(t, '#')))
+                                t = valprefend + 1;
+                        }
+                        valprefend = t;
+                        if (!*t)
+                            zero = 0;
+                        else if (v->pm->node.flags &
+                                 (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) {
+                            /* zero always OK */
+                        } else if (!idigit(*t))
+                            zero = 0;
+                    }
+                    /* number of characters needed for padding */
+                    fwidth -= charlen;
+                    /* bytes from original string */
+                    t0 = strlen(s);
+                    t = (char *) hcalloc(fwidth + t0 + 1);
+                    /* prefix guaranteed to be single byte chars */
+                    preflen = valprefend - s;
+                    memset(t + preflen,
+                           (((v->pm->node.flags & PM_RIGHT_B)
+                             || !zero) ? ' ' : '0'), fwidth);
+                    /*
+                     * Copy - or 0x or base# before any padding
+                     * zeroes.
+                     */
+                    if (preflen)
+                        memcpy(t, s, preflen);
+                    memcpy(t + preflen + fwidth,
+                           valprefend, t0 - preflen);
+                    t[fwidth + t0] = '\0';
+                    s = t;
+                } else {
+                    /* Need to skip (charlen - fwidth) chars */
+                    for (t0 = charlen - fwidth; t0; t0--)
+                        s += MB_METACHARLEN(s);
+                }
+            }
+            break;
+        }
+    }
+    switch (v->pm->node.flags & (PM_LOWER | PM_UPPER)) {
+    case PM_LOWER:
+        s = casemodify(s, CASMOD_LOWER);
+        break;
+    case PM_UPPER:
+        s = casemodify(s, CASMOD_UPPER);
+        break;
+    }
     if (v->start == 0 && v->end == -1)
         return s;

Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.80
diff -u -r1.80 subst.c
--- Src/subst.c 30 Oct 2007 14:01:34 -0000 1.80
+++ Src/subst.c 12 Dec 2007 22:35:16 -0000
@@ -1320,11 +1320,6 @@
     /* Scalar and array value, see isarr above */
     char *val = NULL, **aval = NULL;
     /*
-     * Padding based on setting in parameter rather than substitution
-     * flags. This is only used locally.
-     */
-    unsigned int fwidth = 0;
-    /*
      * vbuf and v are both used to retrieve parameter values; this
      * is a kludge, we pass down vbuf and it may or may not return v.
      */
@@ -2061,143 +2056,12 @@
         }
         if (!vunset) {
             /*
-             * There really is a value. Apply any necessary
-             * padding or case transformation. Note these
-             * are the per-parameter transformations specified
-             * with typeset, not the per-substitution ones set
-             * by flags. TODO: maybe therefore this would
-             * be more consistent if moved into getstrvalue()?
-             * Bet that's easier said than done.
-             *
-             * TODO: use string widths. In fact, shouldn't the
-             * strlen()s be ztrlen()s anyway?
+             * There really is a value. Padding and case
+             * transformations used to be handled here, but
+             * are now handled in getstrvalue() for greater
+             * consistency.
              */
             val = getstrvalue(v);
-            fwidth = v->pm->width ? v->pm->width : (int)strlen(val);
-            switch (v->pm->node.flags & (PM_LEFT | PM_RIGHT_B | PM_RIGHT_Z)) {
-                char *t, *tend;
-                unsigned int t0;
-
-            case PM_LEFT:
-            case PM_LEFT | PM_RIGHT_Z:
-                t = val;
-                if (v->pm->node.flags & PM_RIGHT_Z)
-                    while (*t == '0')
-                        t++;
-                else
-                    while (iblank(*t))
-                        t++;
-                MB_METACHARINIT();
-                for (tend = t, t0 = 0; t0 < fwidth && *tend; t0++)
-                    tend += MB_METACHARLEN(tend);
-                /*
-                 * t0 is the number of characters from t used,
-                 * hence (fwidth - t0) is the number of padding
-                 * characters. fwidth is a misnomer: we use
-                 * character counts, not character widths.
-                 *
-                 * (tend - t) is the number of bytes we need
-                 * to get fwidth characters or the entire string;
-                 * the characters may be multiple bytes.
-                 */
-                fwidth -= t0; /* padding chars remaining */
-                t0 = tend - t; /* bytes to copy from string */
-                val = (char *) hcalloc(t0 + fwidth + 1);
-                memcpy(val, t, t0);
-                if (fwidth)
-                    memset(val + t0, ' ', fwidth);
-                val[t0 + fwidth] = '\0';
-                copied = 1;
-                break;
-            case PM_RIGHT_B:
-            case PM_RIGHT_Z:
-            case PM_RIGHT_Z | PM_RIGHT_B:
-                {
-                    int zero = 1;
-                    /* Calculate length in possibly multibyte chars */
-                    unsigned int charlen = MB_METASTRLEN(val);
-
-                    if (charlen < fwidth) {
-                        char *valprefend = val;
-                        int preflen;
-                        if (v->pm->node.flags & PM_RIGHT_Z) {
-                            /*
-                             * This is a documented feature: when deciding
-                             * whether to pad with zeroes, ignore
-                             * leading blanks already in the value;
-                             * only look for numbers after that.
-                             * Not sure how useful this really is.
-                             * It's certainly confusing to code around.
-                             */
-                            for (t = val; iblank(*t); t++)
-                                ;
-                            /*
-                             * Allow padding after initial minus
-                             * for numeric variables.
-                             */
-                            if ((v->pm->node.flags &
-                                 (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) &&
-                                *t == '-')
-                                t++;
-                            /*
-                             * Allow padding after initial 0x or
-                             * base# for integer variables.
-                             */
-                            if (v->pm->node.flags & PM_INTEGER) {
-                                if (isset(CBASES) &&
-                                    t[0] == '0' && t[1] == 'x')
-                                    t += 2;
-                                else if ((valprefend = strchr(t, '#')))
-                                    t = valprefend + 1;
-                            }
-                            valprefend = t;
-                            if (!*t)
-                                zero = 0;
-                            else if (v->pm->node.flags &
-                                     (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) {
-                                /* zero always OK */
-                            } else if (!idigit(*t))
-                                zero = 0;
-                        }
-                        /* number of characters needed for padding */
-                        fwidth -= charlen;
-                        /* bytes from original string */
-                        t0 = strlen(val);
-                        t = (char *) hcalloc(fwidth + t0 + 1);
-                        /* prefix guaranteed to be single byte chars */
-                        preflen = valprefend - val;
-                        memset(t + preflen,
-                               (((v->pm->node.flags & PM_RIGHT_B)
-                                 || !zero) ? ' ' : '0'), fwidth);
-                        /*
-                         * Copy - or 0x or base# before any padding
-                         * zeroes.
-                         */
-                        if (preflen)
-                            memcpy(t, val, preflen);
-                        memcpy(t + preflen + fwidth,
-                               valprefend, t0 - preflen);
-                        t[fwidth + t0] = '\0';
-                        val = t;
-                        copied = 1;
-                    } else {
-                        /* Need to skip (charlen - fwidth) chars */
-                        for (t0 = charlen - fwidth; t0; t0--)
-                            val += MB_METACHARLEN(val);
-                    }
-                }
-                break;
-            }
-            switch (v->pm->node.flags & (PM_LOWER | PM_UPPER)) {
-            case PM_LOWER:
-                val = casemodify(val, CASMOD_LOWER);
-                copied = 1;
-                break;
-            case PM_UPPER:
-                val = casemodify(val, CASMOD_UPPER);
-                copied = 1;
-                break;
-            }
         }
     }
     /*
Index: Test/B02typeset.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/B02typeset.ztst,v
retrieving revision 1.16
diff -u -r1.16 B02typeset.ztst
--- Test/B02typeset.ztst 31 Jul 2007 14:24:26 -0000 1.16
+++ Test/B02typeset.ztst 12 Dec 2007 22:35:16 -0000
@@ -18,7 +18,6 @@
 # Function tracing (typeset -ft) E02xtrace

 # Not yet tested:
-# Case conversion (-l, -u)
 # Assorted illegal flag combinations

 %prep
@@ -339,6 +338,28 @@
 >'0x0000002B'
 >'-0x000002B'

+  setopt cbases
+  integer -Z 10 -i 16 foozi16c
+  for foozi16c in 0x1234 -0x1234; do
+    for (( i = 1; i <= 5; i++ )); do
+      print "'${foozi16c[i,11-i]}'"
+    done
+    print "'${foozi16c[-2]}'"
+  done
+0:Extracting substrings from padded integers
+>'0x00001234'
+>'x0000123'
+>'000012'
+>'0001'
+>'00'
+>'3'
+>'-0x0001234'
+>'0x000123'
+>'x00012'
+>'0001'
+>'00'
+>'3'
+
   typeset -F 3 -Z 10 foozf
   for foozf in 3.14159 -3.14159 4 -4; do
     print "'$foozf'"
@@ -405,3 +426,21 @@
 >FOOENV=BAR
 >Exec
 >Unset
+
+  local case1=upper
+  typeset -u case1
+  print $case1
+  UPPER="VALUE OF \$UPPER"
+  print ${(P)case1}
+0:Upper case conversion
+>UPPER
+>VALUE OF $UPPER
+
+  local case2=LOWER
+  typeset -l case2
+  print $case2
+  lower="value of \$lower"
+  print ${(P)case2}
+0:Lower case conversion
+>lower
+>value of $lower
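
For anyone following the ordering argument rather than the patch itself,
here is a minimal standalone sketch in plain C. It is not taken from the
patch: pad_zero() and subscript() are hypothetical helpers that only
imitate what -Z and [first,last] do, and they assume single-byte
characters and an optional "base#" prefix. The point is just that the two
orders of application give different strings for the 16#A example above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not zsh code: right-justify val in width columns,
 * inserting zeroes after any "base#" prefix.  Single-byte chars only.
 */
static void pad_zero(const char *val, size_t width, char *out, size_t outsz)
{
    size_t len = strlen(val);
    const char *hash = strchr(val, '#');
    size_t preflen = hash ? (size_t)(hash - val) + 1 : 0;

    assert(outsz > width && outsz > len);
    if (len >= width) {
        /* Already wide enough: keep the rightmost width characters. */
        strcpy(out, val + (len - width));
        return;
    }
    memcpy(out, val, preflen);                  /* e.g. "16#" */
    memset(out + preflen, '0', width - len);    /* zero padding */
    strcpy(out + preflen + (width - len), val + preflen); /* digits */
}

/* Hypothetical helper: 1-based inclusive substring, like $val[first,last]. */
static void subscript(const char *val, size_t first, size_t last,
                      char *out, size_t outsz)
{
    size_t len = strlen(val);

    if (first < 1 || first > len || last < first) {
        out[0] = '\0';
        return;
    }
    if (last > len)
        last = len;
    assert(outsz > last - first + 1);
    memcpy(out, val + first - 1, last - first + 1);
    out[last - first + 1] = '\0';
}
```

Under these assumptions, padding "16#A" to width 6 and then taking [3,4]
gives "#0", which is the order the patch moves to. Taking [3,4] first
gives "#A", and padding that to width 6 gives "#0000A", the odd output
shown at the top of this mail.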