PATCH: internal parameter flags (resend)

zsh-workers
 help / color / mirror / code / Atom feed

* PATCH: internal parameter flags (resend)
@ 2007-12-13 20:43 Peter Stephenson
  2007-12-14  4:46 ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Stephenson @ 2007-12-13 20:43 UTC (permalink / raw)
  To: Zsh Hackers' List

I sent this yesterday evening, but it seems to have disappeared and I
didn't keep the original.  Mail via smtp.ntlworld.com appears to be a
bit flaky at the moment.

Handling of internal parameter flags, by which I means ones defined with
typeset rather than applied during the subsitution, is flaky.

% typeset -i 16 -Z 6 val
% val=0xa
% print $val
16#00A
% print $val[3,4]
#0000A

Everything is OK until the last output.  (Zero padding with a radix is
documented to fill with zeros at the right point.)

The problem is that the subscript is applied before the flags.  This
seems plain wrong to me:  the flags are an internal feature of the
parameter, the subscript should be applied to what the parameter
produces.

Another example of where this goes funny is

% typeset -u param=upper
% UPPER=VALUE
% print ${(P)param}

prints nothing, even though $param outputs UPPER, because of the way
flags are handled in the wrong place.

I propose to move handling of flags inside the parameter code where it
should be.  I even made a note about this some time ago.  I also noted
"bet that's easier said than done", but it did seem to be
straighforward.

The documentation puts internal parameter flags into the order of
substitution.

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.83
diff -u -r1.83 expn.yo
--- Doc/Zsh/expn.yo	30 Oct 2007 14:01:34 -0000	1.83
+++ Doc/Zsh/expn.yo	12 Dec 2007 22:35:11 -0000
@@ -1062,7 +1062,12 @@
 substitution then applies the modifier tt(:h) and takes the directory part
 of the path.)
 )
-item(tt(2.) em(Parameter Subscripting))(
+time(tt(2.) em(Internal Parameter Flags))(
+Any parameter flags set by one of the tt(typeset) family of commands,
+in particular the tt(L), tt(R), tt(Z), tt(u) and tt(l) flags for padding
+and capitalization, are applied directly to the parameter value.
+)
+item(tt(3.) em(Parameter Subscripting))(
 If the value is a raw parameter reference with a subscript, such as
 tt(${)var(var)tt([3]}), the effect of subscripting is applied directly to
 the parameter.  Subscripts are evaluated left to right; subsequent
@@ -1072,11 +1077,11 @@
 word (the second word of the range of words two through four of the
 original array).  Any number of subscripts may appear.
 )
-item(tt(3.) em(Parameter Name Replacement))(
+item(tt(4.) em(Parameter Name Replacement))(
 The effect of any tt((P)) flag, which treats the value so far as a
 parameter name and replaces it with the corresponding value, is applied.
 )
-item(tt(4.) em(Double-Quoted Joining))(
+item(tt(5.) em(Double-Quoted Joining))(
 If the value after this process is an array, and the substitution
 appears in double quotes, and no tt((@)) flag is present at the current
 level, the words of the value are joined with the first character of the
@@ -1084,7 +1089,7 @@
 arrays are not modified).  If the tt((j)) flag is present, that is used for
 joining instead of tt($IFS).
 )
-item(tt(5.) em(Nested Subscripting))(
+item(tt(6.) em(Nested Subscripting))(
 Any remaining subscripts (i.e. of a nested substitution) are evaluated at
 this point, based on whether the value is an array or a scalar.  As with
 tt(2.), multiple subscripts can appear.  Note that tt(${foo[2,4][2]}) is
@@ -1093,13 +1098,13 @@
 both cases), but not to tt("${${foo[2,4]}[2]}") (the nested substitution
 returns a scalar because of the quotes).
 )
-item(tt(6.) em(Modifiers))(
+item(tt(7.) em(Modifiers))(
 Any modifiers, as specified by a trailing `tt(#)', `tt(%)', `tt(/)'
 (possibly doubled) or by a set of modifiers of the form tt(:...) (see
 noderef(Modifiers) in noderef(History Expansion)), are applied to the words
 of the value at this level.
 )
-item(tt(7.) em(Forced Joining))(
+item(tt(8.) em(Forced Joining))(
 If the `tt((j))' flag is present, or no `tt((j))' flag is present but
 the string is to be split as given by rules tt(8.) or tt(9.), and joining
 did not take place at step tt(4.), any words in the value are joined
@@ -1107,36 +1112,36 @@
 Note that the `tt((F))' flag implicitly supplies a string for joining in this
 manner.
 )
-item(tt(8.) em(Forced Splitting))(
+item(tt(9.) em(Forced Splitting))(
 If one of the `tt((s))', `tt((f))' or `tt((z))' flags are present, or the `tt(=)'
 specifier was present (e.g. tt(${=)var(var)tt(})), the word is split on
 occurrences of the specified string, or (for tt(=) with neither of the two
 flags present) any of the characters in tt($IFS).
 )
-item(tt(9.) em(Shell Word Splitting))(
+item(tt(10.) em(Shell Word Splitting))(
 If no `tt((s))', `tt((f))' or `tt(=)' was given, but the word is not
 quoted and the option tt(SH_WORD_SPLIT) is set, the word is split on
 occurrences of any of the characters in tt($IFS).  Note this step, too,
 takes place at all levels of a nested substitution.
 )
-item(tt(10.) em(Uniqueness))(
+item(tt(11.) em(Uniqueness))(
 If the result is an array and the `tt((u))' flag was present, duplicate
 elements are removed from the array.
 )
-item(tt(11.) em(Ordering))(
+item(tt(12.) em(Ordering))(
 If the result is still an array and one of the `tt((o))' or `tt((O))' flags
 was present, the array is reordered.
 )
-item(tt(12.) em(Re-Evaluation))(
+item(tt(13.) em(Re-Evaluation))(
 Any `tt((e))' flag is applied to the value, forcing it to be re-examined
 for new parameter substitutions, but also for command and arithmetic
 substitutions.
 )
-item(tt(13.) em(Padding))(
+item(tt(14.) em(Padding))(
 Any padding of the value by the `tt(LPAR()l.)var(fill)tt(.RPAR())' or
 `tt(LPAR()r.)var(fill)tt(.RPAR())' flags is applied.
 )
-item(tt(14.) em(Semantic Joining))(
+item(tt(15.) em(Semantic Joining))(
 In contexts where expansion semantics requires a single word to
 result, all words are rejoined with the first character of tt(IFS)
 between.  So in `tt(${LPAR()P)tt(RPAR()${LPAR()f)tt(RPAR()lines}})'
Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.137
diff -u -r1.137 params.c
--- Src/params.c	23 Nov 2007 02:32:58 -0000	1.137
+++ Src/params.c	12 Dec 2007 22:35:15 -0000
@@ -1884,11 +1884,134 @@
 	s = v->pm->gsu.s->getfn(v->pm);
 	break;
     default:
-	s = NULL;
+	s = "";
 	DPUTS(1, "BUG: param node without valid type");
 	break;
     }
 
+    if (v->pm->node.flags & (PM_LEFT|PM_RIGHT_B|PM_RIGHT_Z)) {
+	int fwidth = v->pm->width ? v->pm->width : MB_METASTRLEN(s);
+	switch (v->pm->node.flags & (PM_LEFT | PM_RIGHT_B | PM_RIGHT_Z)) {
+	    char *t, *tend;
+	    unsigned int t0;
+
+	case PM_LEFT:
+	case PM_LEFT | PM_RIGHT_Z:
+	    t = s;
+	    if (v->pm->node.flags & PM_RIGHT_Z)
+		while (*t == '0')
+		    t++;
+	    else
+		while (iblank(*t))
+		    t++;
+	    MB_METACHARINIT();
+	    for (tend = t, t0 = 0; t0 < fwidth && *tend; t0++)
+		tend += MB_METACHARLEN(tend);
+	    /*
+	     * t0 is the number of characters from t used,
+	     * hence (fwidth - t0) is the number of padding
+	     * characters.  fwidth is a misnomer: we use
+	     * character counts, not character widths.
+	     *
+	     * (tend - t) is the number of bytes we need
+	     * to get fwidth characters or the entire string;
+	     * the characters may be multiple bytes.
+	     */
+	    fwidth -= t0; /* padding chars remaining */
+	    t0 = tend - t; /* bytes to copy from string */
+	    s = (char *) hcalloc(t0 + fwidth + 1);
+	    memcpy(s, t, t0);
+	    if (fwidth)
+		memset(s + t0, ' ', fwidth);
+	    s[t0 + fwidth] = '\0';
+	    break;
+	case PM_RIGHT_B:
+	case PM_RIGHT_Z:
+	case PM_RIGHT_Z | PM_RIGHT_B:
+	    {
+		int zero = 1;
+		/* Calculate length in possibly multibyte chars */
+		unsigned int charlen = MB_METASTRLEN(s);
+
+		if (charlen < fwidth) {
+		    char *valprefend = s;
+		    int preflen;
+		    if (v->pm->node.flags & PM_RIGHT_Z) {
+			/*
+			 * This is a documented feature: when deciding
+			 * whether to pad with zeroes, ignore
+			 * leading blanks already in the value;
+			 * only look for numbers after that.
+			 * Not sure how useful this really is.
+			 * It's certainly confusing to code around.
+			 */
+			for (t = s; iblank(*t); t++)
+			    ;
+			/*
+			 * Allow padding after initial minus
+			 * for numeric variables.
+			 */
+			if ((v->pm->node.flags &
+			     (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) &&
+			    *t == '-')
+			    t++;
+			/*
+			 * Allow padding after initial 0x or
+			 * base# for integer variables.
+			 */
+			if (v->pm->node.flags & PM_INTEGER) {
+			    if (isset(CBASES) &&
+				t[0] == '0' && t[1] == 'x')
+				t += 2;
+			    else if ((valprefend = strchr(t, '#')))
+				t = valprefend + 1;
+			}
+			valprefend = t;
+			if (!*t)
+			    zero = 0;
+			else if (v->pm->node.flags &
+				 (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) {
+			    /* zero always OK */
+			} else if (!idigit(*t))
+			    zero = 0;
+		    }
+		    /* number of characters needed for padding */
+		    fwidth -= charlen;
+		    /* bytes from original string */
+		    t0 = strlen(s);
+		    t = (char *) hcalloc(fwidth + t0 + 1);
+		    /* prefix guaranteed to be single byte chars */
+		    preflen = valprefend - s;
+		    memset(t + preflen, 
+			   (((v->pm->node.flags & PM_RIGHT_B)
+			     || !zero) ?       ' ' : '0'), fwidth);
+		    /*
+		     * Copy - or 0x or base# before any padding
+		     * zeroes.
+		     */
+		    if (preflen)
+			memcpy(t, s, preflen);
+		    memcpy(t + preflen + fwidth,
+			   valprefend, t0 - preflen);
+		    t[fwidth + t0] = '\0';
+		    s = t;
+		} else {
+		    /* Need to skip (charlen - fwidth) chars */
+		    for (t0 = charlen - fwidth; t0; t0--)
+			s += MB_METACHARLEN(s);
+		}
+	    }
+	    break;
+	}
+    }
+    switch (v->pm->node.flags & (PM_LOWER | PM_UPPER)) {
+    case PM_LOWER:
+	s = casemodify(s, CASMOD_LOWER);
+	break;
+    case PM_UPPER:
+	s = casemodify(s, CASMOD_UPPER);
+	break;
+    }
     if (v->start == 0 && v->end == -1)
 	return s;
 
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.80
diff -u -r1.80 subst.c
--- Src/subst.c	30 Oct 2007 14:01:34 -0000	1.80
+++ Src/subst.c	12 Dec 2007 22:35:16 -0000
@@ -1320,11 +1320,6 @@
     /* Scalar and array value, see isarr above */
     char *val = NULL, **aval = NULL;
     /*
-     * Padding based on setting in parameter rather than substitution
-     * flags.  This is only used locally.
-     */
-    unsigned int fwidth = 0;
-    /*
      * vbuf and v are both used to retrieve parameter values; this
      * is a kludge, we pass down vbuf and it may or may not return v.
      */
@@ -2061,143 +2056,12 @@
 	    }
 	    if (!vunset) {
 		/*
-		 * There really is a value.  Apply any necessary
-		 * padding or case transformation.  Note these
-		 * are the per-parameter transformations specified
-		 * with typeset, not the per-substitution ones set
-		 * by flags.  TODO: maybe therefore this would
-		 * be more consistent if moved into getstrvalue()?
-		 * Bet that's easier said than done.
-		 *
-		 * TODO: use string widths.  In fact, shouldn't the
-		 * strlen()s be ztrlen()s anyway?
+		 * There really is a value.  Padding and case
+		 * transformations used to be handled here, but
+		 * are now handled in getstrvalue() for greater
+		 * consistency.
 		 */
 		val = getstrvalue(v);
-		fwidth = v->pm->width ? v->pm->width : (int)strlen(val);
-		switch (v->pm->node.flags & (PM_LEFT | PM_RIGHT_B | PM_RIGHT_Z)) {
-		    char *t, *tend;
-		    unsigned int t0;
-
-		case PM_LEFT:
-		case PM_LEFT | PM_RIGHT_Z:
-		    t = val;
-		    if (v->pm->node.flags & PM_RIGHT_Z)
-			while (*t == '0')
-			    t++;
-		    else
-			while (iblank(*t))
-			    t++;
-		    MB_METACHARINIT();
-		    for (tend = t, t0 = 0; t0 < fwidth && *tend; t0++)
-			tend += MB_METACHARLEN(tend);
-		    /*
-		     * t0 is the number of characters from t used,
-		     * hence (fwidth - t0) is the number of padding
-		     * characters.  fwidth is a misnomer: we use
-		     * character counts, not character widths.
-		     *
-		     * (tend - t) is the number of bytes we need
-		     * to get fwidth characters or the entire string;
-		     * the characters may be multiple bytes.
-		     */
-		    fwidth -= t0; /* padding chars remaining */
-		    t0 = tend - t; /* bytes to copy from string */
-		    val = (char *) hcalloc(t0 + fwidth + 1);
-		    memcpy(val, t, t0);
-		    if (fwidth)
-			memset(val + t0, ' ', fwidth);
-		    val[t0 + fwidth] = '\0';
-		    copied = 1;
-		    break;
-		case PM_RIGHT_B:
-		case PM_RIGHT_Z:
-		case PM_RIGHT_Z | PM_RIGHT_B:
-		    {
-			int zero = 1;
-			/* Calculate length in possibly multibyte chars */
-			unsigned int charlen = MB_METASTRLEN(val);
-
-			if (charlen < fwidth) {
-			    char *valprefend = val;
-			    int preflen;
-			    if (v->pm->node.flags & PM_RIGHT_Z) {
-				/*
-				 * This is a documented feature: when deciding
-				 * whether to pad with zeroes, ignore
-				 * leading blanks already in the value;
-				 * only look for numbers after that.
-				 * Not sure how useful this really is.
-				 * It's certainly confusing to code around.
-				 */
-				for (t = val; iblank(*t); t++)
-				    ;
-				/*
-				 * Allow padding after initial minus
-				 * for numeric variables.
-				 */
-				if ((v->pm->node.flags &
-				     (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) &&
-				    *t == '-')
-				    t++;
-				/*
-				 * Allow padding after initial 0x or
-				 * base# for integer variables.
-				 */
-				if (v->pm->node.flags & PM_INTEGER) {
-				    if (isset(CBASES) &&
-					t[0] == '0' && t[1] == 'x')
-					t += 2;
-				    else if ((valprefend = strchr(t, '#')))
-					t = valprefend + 1;
-				}
-				valprefend = t;
-				if (!*t)
-				    zero = 0;
-				else if (v->pm->node.flags &
-					 (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) {
-				    /* zero always OK */
-				} else if (!idigit(*t))
-				    zero = 0;
-			    }
-			    /* number of characters needed for padding */
-			    fwidth -= charlen;
-			    /* bytes from original string */
-			    t0 = strlen(val);
-			    t = (char *) hcalloc(fwidth + t0 + 1);
-			    /* prefix guaranteed to be single byte chars */
-			    preflen = valprefend - val;
-			    memset(t + preflen, 
-				   (((v->pm->node.flags & PM_RIGHT_B)
-				     || !zero) ?       ' ' : '0'), fwidth);
-			    /*
-			     * Copy - or 0x or base# before any padding
-			     * zeroes.
-			     */
-			    if (preflen)
-				memcpy(t, val, preflen);
-			    memcpy(t + preflen + fwidth,
-				   valprefend, t0 - preflen);
-			    t[fwidth + t0] = '\0';
-			    val = t;
-			    copied = 1;
-			} else {
-			    /* Need to skip (charlen - fwidth) chars */
-			    for (t0 = charlen - fwidth; t0; t0--)
-				val += MB_METACHARLEN(val);
-			}
-		    }
-		    break;
-		}
-		switch (v->pm->node.flags & (PM_LOWER | PM_UPPER)) {
-		case PM_LOWER:
-		    val = casemodify(val, CASMOD_LOWER);
-		    copied = 1;
-		    break;
-		case PM_UPPER:
-		    val = casemodify(val, CASMOD_UPPER);
-		    copied = 1;
-		    break;
-		}
 	    }
 	}
 	/*
Index: Test/B02typeset.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/B02typeset.ztst,v
retrieving revision 1.16
diff -u -r1.16 B02typeset.ztst
--- Test/B02typeset.ztst	31 Jul 2007 14:24:26 -0000	1.16
+++ Test/B02typeset.ztst	12 Dec 2007 22:35:16 -0000
@@ -18,7 +18,6 @@
 #  Function tracing (typeset -ft)		E02xtrace
 
 # Not yet tested:
-#  Case conversion (-l, -u)
 #  Assorted illegal flag combinations
 
 %prep
@@ -339,6 +338,28 @@
 >'0x0000002B'
 >'-0x000002B'
 
+ setopt cbases
+ integer -Z 10 -i 16 foozi16c
+ for foozi16c in 0x1234 -0x1234; do
+   for (( i = 1; i <= 5; i++ )); do
+       print "'${foozi16c[i,11-i]}'"
+   done
+   print "'${foozi16c[-2]}'"
+ done
+0:Extracting substrings from padded integers
+>'0x00001234'
+>'x0000123'
+>'000012'
+>'0001'
+>'00'
+>'3'
+>'-0x0001234'
+>'0x000123'
+>'x00012'
+>'0001'
+>'00'
+>'3'
+
  typeset -F 3 -Z 10 foozf
  for foozf in 3.14159 -3.14159 4 -4; do
    print "'$foozf'"
@@ -405,3 +426,21 @@
 >FOOENV=BAR
 >Exec
 >Unset
+
+ local case1=upper
+ typeset -u case1
+ print $case1
+ UPPER="VALUE OF \$UPPER"
+ print ${(P)case1}
+0:Upper case conversion
+>UPPER
+>VALUE OF $UPPER
+
+ local case2=LOWER
+ typeset -l case2
+ print $case2
+ lower="value of \$lower"
+ print ${(P)case2}
+0:Lower case conversion
+>lower
+>value of $lower


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PATCH: internal parameter flags (resend)
  2007-12-13 20:43 PATCH: internal parameter flags (resend) Peter Stephenson
@ 2007-12-14  4:46 ` Bart Schaefer
  2007-12-14 10:14   ` Peter Stephenson
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2007-12-14  4:46 UTC (permalink / raw)
  To: Zsh Hackers' List

On Dec 13,  8:43pm, Peter Stephenson wrote:
}
} Handling of internal parameter flags, by which I means ones defined with
} typeset rather than applied during the subsitution, is flaky.

Hmm.

I seem to recall some discussion of this in the past.  If I recall
correctly, the upshot was that parameter flags affect the "display"
of a value when it finally comes to a point that's the functional
equivalent of inserting the value into a command line argument, and
that it specifically was not the intention that flags affect the
*actual* value stored in the parameter.

The context of the discussion as I remember it was whether the flags
should be applied immediately when assigning to the parameter (or
when changing the flags of an existing parameter with typeset), vs.
being applied when the value is retrieved, and the resolution was in
favor of the latter.  This means, for example, that you can change
the -R -L and -Z flags without affecting the underlying value.

(However, not all flags are created equal in this respect.  The -U
flag of arrays, for instance, applies at assignment time.)

Put another way, the parameter flags are substitution flags, not
value flags, and apply only when a parameter is substituted, not
every time its value is accessed.

Note particulars of this effect:

schaefer<506> typeset -Z 5 -T ARY ary
schaefer<507> ARY=5
schaefer<508> echo $ARY
00005
schaefer<509> echo $ary
5
schaefer<510> ary=(1 2)
schaefer<511> echo $ARY
001:2

So far this still works with the patch, but here's the tricky bit:

schaefer<512> echo $ARY[3]
00002

With PWS's patch applied, $ARY[3] becomes "1" in this example, which
I believe is wrong.  (I don't really think 00002 is correct either;
in fact I'd have said it should just be "2".  Why should the padding
flags of the entire string apply to a slice of it?)

This is distinct from ${${ARY}[3]}, which always was "1".

} % typeset -u param=upper
} % UPPER=VALUE
} % print ${(P)param}
} 
} prints nothing, even though $param outputs UPPER, because of the way
} flags are handled in the wrong place.

Again I think this was correct as it was.  The *value* of param is
"upper", not "UPPER", regardless of the flags; in this case
${(P)param} and ${(P)${param}} should be distinct, because the first
uses the value of $param as a name whereas the second treats the
substitution of $param as a name.  And the pre-patch zsh does just
that:

torch% typeset -u param=upper
torch% UPPER=VALUE
torch% print ${(P)param}

torch% print ${(P)${param}}
VALUE

In summary I think this change actually removes useful and intended
functionality and should be revised or backed out.

-- 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PATCH: internal parameter flags (resend)
  2007-12-14  4:46 ` Bart Schaefer
@ 2007-12-14 10:14   ` Peter Stephenson
  2007-12-15  5:56     ` Bart Schaefer
  0 siblings, 1 reply; 5+ messages in thread
From: Peter Stephenson @ 2007-12-14 10:14 UTC (permalink / raw)
  To: Zsh Hackers' List

Bart Schaefer wrote:
> I seem to recall some discussion of this in the past.  If I recall
> correctly, the upshot was that parameter flags affect the "display"
> of a value when it finally comes to a point that's the functional
> equivalent of inserting the value into a command line argument, and
> that it specifically was not the intention that flags affect the
> *actual* value stored in the parameter.
> 
> The context of the discussion as I remember it was whether the flags
> should be applied immediately when assigning to the parameter (or
> when changing the flags of an existing parameter with typeset), vs.
> being applied when the value is retrieved, and the resolution was in
> favor of the latter.  This means, for example, that you can change
> the -R -L and -Z flags without affecting the underlying value.

I vaguely remember the discussion you refer to, but it's a different
issue.  It is still the case that the flags can be changed without
affecting the value, only when it's retrieved.  You're trying to apply
an additional rationale to what constitutes the value being retrieved
that I don't think holds water (and I don't think was ever discussed).

> Note particulars of this effect:
> 
> schaefer<506> typeset -Z 5 -T ARY ary
> schaefer<507> ARY=5
> schaefer<508> echo $ARY
> 00005
> schaefer<509> echo $ary
> 5
> schaefer<510> ary=(1 2)
> schaefer<511> echo $ARY
> 001:2
> 
> So far this still works with the patch, but here's the tricky bit:
> 
> schaefer<512> echo $ARY[3]
> 00002
> 
> With PWS's patch applied, $ARY[3] becomes "1" in this example, which
> I believe is wrong.  (I don't really think 00002 is correct either;
> in fact I'd have said it should just be "2".  Why should the padding
> flags of the entire string apply to a slice of it?)

I agree that the old value is wrong, but I don't see why the new value
is.  You're now trying to add yet another rule that the padding flags
are sometimes ignored.  I chanced on the problem because I was
specifically expecting the new behaviour (in a simpler version, but with
the new order of padding and subscripting).

It is possible to compromise here, I think, by simply swapping the order
of padding and subscripting in the substitution code (and actually I
think the new description of the substitution code in the manual would
still apply), but I haven't understood why that should be.

> This is distinct from ${${ARY}[3]}, which always was "1".
> 
> } % typeset -u param=upper
> } % UPPER=VALUE
> } % print ${(P)param}
> } 
> } prints nothing, even though $param outputs UPPER, because of the way
> } flags are handled in the wrong place.
> 
> Again I think this was correct as it was.

I really can't see how.  I think it just creates a confusing distinction
on how parameters are accessed.  Why would you set the -u flag for a
parameter and then expect it not to be applied?  What's the point in having
both flags on substitution and on parameters themselves if the latter
only apply at the same time?

> "upper", not "UPPER", regardless of the flags; in this case
> ${(P)param} and ${(P)${param}} should be distinct, because the first
> uses the value of $param as a name whereas the second treats the
> substitution of $param as a name.

To me that's just plain confusing.

> In summary I think this change actually removes useful and intended
> functionality and should be revised or backed out.

I haven't seen any useful functionality in there, but certainly if there
are practical examples of things that should work the old way I'd be
glad to hear them.  It's possible there actually are cases where you
need a back door to what you originally set without altering the flags,
but I can't actually think of one; all the examples I can think of would
suggest you want consistency.  In other words, I'm not saying there's
definitely no rationale for the old way, only that I haven't seen it yet
and it surprised me when I discovered how it worked.

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PATCH: internal parameter flags (resend)
  2007-12-14 10:14   ` Peter Stephenson
@ 2007-12-15  5:56     ` Bart Schaefer
  2007-12-16 13:52       ` Peter Stephenson
  0 siblings, 1 reply; 5+ messages in thread
From: Bart Schaefer @ 2007-12-15  5:56 UTC (permalink / raw)
  To: Zsh Hackers' List

On Dec 14, 10:14am, Peter Stephenson wrote:
} Subject: Re: PATCH: internal parameter flags (resend)
}
} Bart Schaefer wrote:
} > schaefer<506> typeset -Z 5 -T ARY ary
} > schaefer<510> ary=(1 2)
} > schaefer<512> echo $ARY[3]
} > 00002
} > 
} > With PWS's patch applied, $ARY[3] becomes "1" in this example, which
} > I believe is wrong.  (I don't really think 00002 is correct either;
} > in fact I'd have said it should just be "2".  Why should the padding
} > flags of the entire string apply to a slice of it?)
} 
} I agree that the old value is wrong, but I don't see why the new value
} is.  You're now trying to add yet another rule that the padding flags
} are sometimes ignored.

Sorry; no, I'm not; that was just a remark in passing, it has nothing
really to do with the point I was making.  Please ignore it for this
thread.

} > This is distinct from ${${ARY}[3]}, which always was "1".
} > 
} > } % typeset -u param=upper
} > } % UPPER=VALUE
} > } % print ${(P)param}
} > } 
} > } prints nothing, even though $param outputs UPPER, because of the way
} > } flags are handled in the wrong place.
} > 
} > Again I think this was correct as it was.
} 
} I really can't see how. I think it just creates a confusing
} distinction on how parameters are accessed. Why would you set the -u
} flag for a parameter and then expect it not to be applied?  What's
} the point in having both flags on substitution and on parameters
} themselves if the latter only apply at the same time?

I would set it to save me from having to remember to do ${(U)param}
every time I mention $param, regardless of what order the flags are
applied ... but mentioning $param and mentioning ${(P)param} aren't
the same thing.

} > ${(P)param} and ${(P)${param}} should be distinct, because the first
} > uses the value of $param as a name whereas the second treats the
} > substitution of $param as a name.
} 
} To me that's just plain confusing.

Let's consider a parallel situation:  Parameter references in math
expressions.  If I write:

    setopt octalzeroes
    integer -Z5 x=9

Then $(( x )) has the value 9 but $(( $x )) is "bad math expression".
Why wasn't the parameter flag applied to x in the first case?  Do you
think I should have expected it to be?  What if x was not an integer?

So, although I still think that ${(P)param} should use the un-altered
value [*], I'm willing to let the other part of this go because of the
following bad inconsistency with the "old way":

torch% typeset -Z5 x=6
torch% print $#x
5
torch% print $x[4]

torch% 

Either $#x should report the "real" length, or $x[4] should index into
the string whose length was counted.  I suspect that changing $#x in
this case would break a lot more things than changing the subscript.

[*] In part because they're the closest thing we have to namerefs, and
namerefs would not allow these kinds of transformations on the value.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: PATCH: internal parameter flags (resend)
  2007-12-15  5:56     ` Bart Schaefer
@ 2007-12-16 13:52       ` Peter Stephenson
  0 siblings, 0 replies; 5+ messages in thread
From: Peter Stephenson @ 2007-12-16 13:52 UTC (permalink / raw)
  To: Zsh Hackers' List

On Fri, 14 Dec 2007 21:56:19 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> So, although I still think that ${(P)param} should use the un-altered
> value

In any case, I've thought of a killer argument why at least one point
where we fetch a value internally must retrieve the unaltered value,
which is vared.  There's no argument in that case.  As my whole idea was
to avoid complicated rules about what happens where, applying the old
behaviour to (P) etc. as well is about the only neat way out.

> I'm willing to let the other part of this go because of the
> following bad inconsistency with the "old way":
> 
> torch% typeset -Z5 x=6
> torch% print $#x
> 5
> torch% print $x[4]
> 
> torch% 
> 
> Either $#x should report the "real" length, or $x[4] should index into
> the string whose length was counted.  I suspect that changing $#x in
> this case would break a lot more things than changing the subscript.

I'm much more convinced about this part than the other part.

What I've done is keep the ordering of flag application and subscripting
from the new patch, but added an extra value flag to indicate a call
from the substitution code at the point where the flags used to be
applied.  So this restricts the effect to something that looks
uncontroversial.

I've also (for the first time that I can see) tried to document the
point at which the flags are applied.  The simplest way of doing this I
could see was to say that generally they are only applied with $param
expansions, but explicitly to say they're not applied with the
substitution (P) flag.


Ismail Dönmez <ismail@pardus.org.tr> wrote>
>Indeed only one warning left:
>
>expn.yo:1065: No macro: time(...)

This was a typo in the original patch.


Index: Doc/Zsh/builtins.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/builtins.yo,v
retrieving revision 1.101
diff -u -r1.101 builtins.yo
--- Doc/Zsh/builtins.yo	12 Dec 2007 18:43:29 -0000	1.101
+++ Doc/Zsh/builtins.yo	16 Dec 2007 13:42:30 -0000
@@ -1429,6 +1429,11 @@
 flags, and all those flags are introduced with tt(PLUS()), the matching
 parameter names are printed but their values are not.
 
+Attribute flags that transform the final value (tt(-L), tt(-R), tt(-Z),
+tt(-l), tt(u)) are only applied to the expanded value at the point
+of a `tt($)' expansion.  They are not applied when a parameter is
+retrieved internally by the shell for any purpose.
+
 The following attribute flags may be specified:
 
 startitem()
Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.85
diff -u -r1.85 expn.yo
--- Doc/Zsh/expn.yo	13 Dec 2007 21:57:18 -0000	1.85
+++ Doc/Zsh/expn.yo	16 Dec 2007 13:42:32 -0000
@@ -785,8 +785,12 @@
 )
 item(tt(P))(
 This forces the value of the parameter var(name) to be interpreted as a
-further parameter name, whose value will be used where appropriate. If
-used with a nested parameter or command substitution, the result of that
+further parameter name, whose value will be used where appropriate.
+Note that flags set with one of the tt(typeset) family of commands
+(in particular case transformations) are not applied to the value of
+var(name) used in htis fashion.
+
+If used with a nested parameter or command substitution, the result of that
 will be taken as a parameter name in the same way.  For example, if you
 have `tt(foo=bar)' and `tt(bar=baz)', the strings tt(${(P)foo}),
 tt(${(P)${foo}}), and tt(${(P)$(echo bar)}) will be expanded to `tt(baz)'.
@@ -1062,7 +1066,7 @@
 substitution then applies the modifier tt(:h) and takes the directory part
 of the path.)
 )
-time(tt(2.) em(Internal Parameter Flags))(
+item(tt(2.) em(Internal Parameter Flags))(
 Any parameter flags set by one of the tt(typeset) family of commands,
 in particular the tt(L), tt(R), tt(Z), tt(u) and tt(l) flags for padding
 and capitalization, are applied directly to the parameter value.
Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.138
diff -u -r1.138 params.c
--- Src/params.c	13 Dec 2007 20:52:56 -0000	1.138
+++ Src/params.c	16 Dec 2007 13:42:35 -0000
@@ -1889,129 +1889,131 @@
 	break;
     }
 
-    if (v->pm->node.flags & (PM_LEFT|PM_RIGHT_B|PM_RIGHT_Z)) {
-	int fwidth = v->pm->width ? v->pm->width : MB_METASTRLEN(s);
-	switch (v->pm->node.flags & (PM_LEFT | PM_RIGHT_B | PM_RIGHT_Z)) {
-	    char *t, *tend;
-	    unsigned int t0;
-
-	case PM_LEFT:
-	case PM_LEFT | PM_RIGHT_Z:
-	    t = s;
-	    if (v->pm->node.flags & PM_RIGHT_Z)
-		while (*t == '0')
-		    t++;
-	    else
-		while (iblank(*t))
-		    t++;
-	    MB_METACHARINIT();
-	    for (tend = t, t0 = 0; t0 < fwidth && *tend; t0++)
-		tend += MB_METACHARLEN(tend);
-	    /*
-	     * t0 is the number of characters from t used,
-	     * hence (fwidth - t0) is the number of padding
-	     * characters.  fwidth is a misnomer: we use
-	     * character counts, not character widths.
-	     *
-	     * (tend - t) is the number of bytes we need
-	     * to get fwidth characters or the entire string;
-	     * the characters may be multiple bytes.
-	     */
-	    fwidth -= t0; /* padding chars remaining */
-	    t0 = tend - t; /* bytes to copy from string */
-	    s = (char *) hcalloc(t0 + fwidth + 1);
-	    memcpy(s, t, t0);
-	    if (fwidth)
-		memset(s + t0, ' ', fwidth);
-	    s[t0 + fwidth] = '\0';
-	    break;
-	case PM_RIGHT_B:
-	case PM_RIGHT_Z:
-	case PM_RIGHT_Z | PM_RIGHT_B:
-	    {
-		int zero = 1;
-		/* Calculate length in possibly multibyte chars */
-		unsigned int charlen = MB_METASTRLEN(s);
-
-		if (charlen < fwidth) {
-		    char *valprefend = s;
-		    int preflen;
-		    if (v->pm->node.flags & PM_RIGHT_Z) {
-			/*
-			 * This is a documented feature: when deciding
-			 * whether to pad with zeroes, ignore
-			 * leading blanks already in the value;
-			 * only look for numbers after that.
-			 * Not sure how useful this really is.
-			 * It's certainly confusing to code around.
-			 */
-			for (t = s; iblank(*t); t++)
-			    ;
-			/*
-			 * Allow padding after initial minus
-			 * for numeric variables.
-			 */
-			if ((v->pm->node.flags &
-			     (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) &&
-			    *t == '-')
-			    t++;
+    if (v->flags & VALFLAG_SUBST) {
+	if (v->pm->node.flags & (PM_LEFT|PM_RIGHT_B|PM_RIGHT_Z)) {
+	    int fwidth = v->pm->width ? v->pm->width : MB_METASTRLEN(s);
+	    switch (v->pm->node.flags & (PM_LEFT | PM_RIGHT_B | PM_RIGHT_Z)) {
+		char *t, *tend;
+		unsigned int t0;
+
+	    case PM_LEFT:
+	    case PM_LEFT | PM_RIGHT_Z:
+		t = s;
+		if (v->pm->node.flags & PM_RIGHT_Z)
+		    while (*t == '0')
+			t++;
+		else
+		    while (iblank(*t))
+			t++;
+		MB_METACHARINIT();
+		for (tend = t, t0 = 0; t0 < fwidth && *tend; t0++)
+		    tend += MB_METACHARLEN(tend);
+		/*
+		 * t0 is the number of characters from t used,
+		 * hence (fwidth - t0) is the number of padding
+		 * characters.  fwidth is a misnomer: we use
+		 * character counts, not character widths.
+		 *
+		 * (tend - t) is the number of bytes we need
+		 * to get fwidth characters or the entire string;
+		 * the characters may be multiple bytes.
+		 */
+		fwidth -= t0; /* padding chars remaining */
+		t0 = tend - t; /* bytes to copy from string */
+		s = (char *) hcalloc(t0 + fwidth + 1);
+		memcpy(s, t, t0);
+		if (fwidth)
+		    memset(s + t0, ' ', fwidth);
+		s[t0 + fwidth] = '\0';
+		break;
+	    case PM_RIGHT_B:
+	    case PM_RIGHT_Z:
+	    case PM_RIGHT_Z | PM_RIGHT_B:
+		{
+		    int zero = 1;
+		    /* Calculate length in possibly multibyte chars */
+		    unsigned int charlen = MB_METASTRLEN(s);
+
+		    if (charlen < fwidth) {
+			char *valprefend = s;
+			int preflen;
+			if (v->pm->node.flags & PM_RIGHT_Z) {
+			    /*
+			     * This is a documented feature: when deciding
+			     * whether to pad with zeroes, ignore
+			     * leading blanks already in the value;
+			     * only look for numbers after that.
+			     * Not sure how useful this really is.
+			     * It's certainly confusing to code around.
+			     */
+			    for (t = s; iblank(*t); t++)
+				;
+			    /*
+			     * Allow padding after initial minus
+			     * for numeric variables.
+			     */
+			    if ((v->pm->node.flags &
+				 (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) &&
+				*t == '-')
+				t++;
+			    /*
+			     * Allow padding after initial 0x or
+			     * base# for integer variables.
+			     */
+			    if (v->pm->node.flags & PM_INTEGER) {
+				if (isset(CBASES) &&
+				    t[0] == '0' && t[1] == 'x')
+				    t += 2;
+				else if ((valprefend = strchr(t, '#')))
+				    t = valprefend + 1;
+			    }
+			    valprefend = t;
+			    if (!*t)
+				zero = 0;
+			    else if (v->pm->node.flags &
+				     (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) {
+				/* zero always OK */
+			    } else if (!idigit(*t))
+				zero = 0;
+			}
+			/* number of characters needed for padding */
+			fwidth -= charlen;
+			/* bytes from original string */
+			t0 = strlen(s);
+			t = (char *) hcalloc(fwidth + t0 + 1);
+			/* prefix guaranteed to be single byte chars */
+			preflen = valprefend - s;
+			memset(t + preflen, 
+			       (((v->pm->node.flags & PM_RIGHT_B)
+				 || !zero) ?       ' ' : '0'), fwidth);
 			/*
-			 * Allow padding after initial 0x or
-			 * base# for integer variables.
+			 * Copy - or 0x or base# before any padding
+			 * zeroes.
 			 */
-			if (v->pm->node.flags & PM_INTEGER) {
-			    if (isset(CBASES) &&
-				t[0] == '0' && t[1] == 'x')
-				t += 2;
-			    else if ((valprefend = strchr(t, '#')))
-				t = valprefend + 1;
-			}
-			valprefend = t;
-			if (!*t)
-			    zero = 0;
-			else if (v->pm->node.flags &
-				 (PM_INTEGER|PM_EFLOAT|PM_FFLOAT)) {
-			    /* zero always OK */
-			} else if (!idigit(*t))
-			    zero = 0;
+			if (preflen)
+			    memcpy(t, s, preflen);
+			memcpy(t + preflen + fwidth,
+			       valprefend, t0 - preflen);
+			t[fwidth + t0] = '\0';
+			s = t;
+		    } else {
+			/* Need to skip (charlen - fwidth) chars */
+			for (t0 = charlen - fwidth; t0; t0--)
+			    s += MB_METACHARLEN(s);
 		    }
-		    /* number of characters needed for padding */
-		    fwidth -= charlen;
-		    /* bytes from original string */
-		    t0 = strlen(s);
-		    t = (char *) hcalloc(fwidth + t0 + 1);
-		    /* prefix guaranteed to be single byte chars */
-		    preflen = valprefend - s;
-		    memset(t + preflen, 
-			   (((v->pm->node.flags & PM_RIGHT_B)
-			     || !zero) ?       ' ' : '0'), fwidth);
-		    /*
-		     * Copy - or 0x or base# before any padding
-		     * zeroes.
-		     */
-		    if (preflen)
-			memcpy(t, s, preflen);
-		    memcpy(t + preflen + fwidth,
-			   valprefend, t0 - preflen);
-		    t[fwidth + t0] = '\0';
-		    s = t;
-		} else {
-		    /* Need to skip (charlen - fwidth) chars */
-		    for (t0 = charlen - fwidth; t0; t0--)
-			s += MB_METACHARLEN(s);
 		}
+		break;
 	    }
+	}
+	switch (v->pm->node.flags & (PM_LOWER | PM_UPPER)) {
+	case PM_LOWER:
+	    s = casemodify(s, CASMOD_LOWER);
+	    break;
+	case PM_UPPER:
+	    s = casemodify(s, CASMOD_UPPER);
 	    break;
 	}
     }
-    switch (v->pm->node.flags & (PM_LOWER | PM_UPPER)) {
-    case PM_LOWER:
-	s = casemodify(s, CASMOD_LOWER);
-	break;
-    case PM_UPPER:
-	s = casemodify(s, CASMOD_UPPER);
-	break;
-    }
     if (v->start == 0 && v->end == -1)
 	return s;
 
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.81
diff -u -r1.81 subst.c
--- Src/subst.c	13 Dec 2007 20:52:56 -0000	1.81
+++ Src/subst.c	16 Dec 2007 13:42:36 -0000
@@ -2059,8 +2059,11 @@
 		 * There really is a value.  Padding and case
 		 * transformations used to be handled here, but
 		 * are now handled in getstrvalue() for greater
-		 * consistency.
+		 * consistency.  However, we get unexpected effects
+		 * if we allow them to applied on every call, so
+		 * set the flag that allows them to be substituted.
 		 */
+		v->flags |= VALFLAG_SUBST;
 		val = getstrvalue(v);
 	    }
 	}
Index: Src/zsh.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/zsh.h,v
retrieving revision 1.117
diff -u -r1.117 zsh.h
--- Src/zsh.h	6 Jul 2007 21:52:39 -0000	1.117
+++ Src/zsh.h	16 Dec 2007 13:42:37 -0000
@@ -599,7 +599,8 @@
 
 enum {
     VALFLAG_INV =	0x0001,	/* We are performing inverse subscripting */
-    VALFLAG_EMPTY =	0x0002	/* Subscripted range is empty */
+    VALFLAG_EMPTY =	0x0002,	/* Subscripted range is empty */
+    VALFLAG_SUBST =     0x0004  /* Substitution, so apply padding, case flags */
 };
 
 #define MAX_ARRLEN    262144
Index: Test/B02typeset.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/B02typeset.ztst,v
retrieving revision 1.17
diff -u -r1.17 B02typeset.ztst
--- Test/B02typeset.ztst	13 Dec 2007 20:52:56 -0000	1.17
+++ Test/B02typeset.ztst	16 Dec 2007 13:42:37 -0000
@@ -430,17 +430,17 @@
  local case1=upper
  typeset -u case1
  print $case1
- UPPER="VALUE OF \$UPPER"
+ upper="VALUE OF \$UPPER"
  print ${(P)case1}
-0:Upper case conversion
+0:Upper case conversion, does not apply to values used internally
 >UPPER
 >VALUE OF $UPPER
 
  local case2=LOWER
  typeset -l case2
  print $case2
- lower="value of \$lower"
+ LOWER="value of \$lower"
  print ${(P)case2}
-0:Lower case conversion
+0:Lower case conversion, does not apply to values used internally
 >lower
 >value of $lower

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-12-16 13:54 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-12-13 20:43 PATCH: internal parameter flags (resend) Peter Stephenson
2007-12-14  4:46 ` Bart Schaefer
2007-12-14 10:14   ` Peter Stephenson
2007-12-15  5:56     ` Bart Schaefer
2007-12-16 13:52       ` Peter Stephenson

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).