zsh-workers
* PATCH: Array subscript documentation
@ 2001-04-22 18:51 Bart Schaefer
  2001-04-22 22:40 ` Peter Stephenson
  0 siblings, 1 reply; 3+ messages in thread
From: Bart Schaefer @ 2001-04-22 18:51 UTC (permalink / raw)
  To: zsh-workers

As usually happens when writing documentation, I also found a small bug in
the interpretation of the (k) and (K) subscript flags, so a patch and a test
for that are included.

I added some additional cross-references between the parameter expansion and
array parameter sections, plus a few `quotes' in the expansion section to
match the style that had already existed in the array section.

Hopefully this doc covers all the questions that have come up recently; in
particular, using `assoc[(e)*]=star' to assign to the element with a literal
key `*' is something I should have thought of long ago (it works in any
version of zsh that has associative arrays; I didn't have to change any
code for that at all).

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.27
diff -u -r1.27 expn.yo
--- Doc/Zsh/expn.yo	2001/04/01 16:19:17	1.27
+++ Doc/Zsh/expn.yo	2001/04/22 18:44:10
@@ -556,11 +556,15 @@
 substitutes the value of tt($foo) with both `tt(head)' and `tt(tail)'
 deleted.  The form with tt($LPAR())...tt(RPAR()) is often useful in
 combination with the flags described next; see the examples below.
+Each var(name) or nested tt(${)...tt(}) in a parameter expansion may
+also be followed by a subscript expression as described in
+ifzman(em(Array Parameters) in zmanref(zshparam))\
+ifnzman(noderef(Array Parameters)).
 
-Note that double quotes may appear around nested substitutions, in which
+Note that double quotes may appear around nested expressions, in which
 case only the part inside is treated as quoted; for example,
 tt(${(f)"$(foo)"}) quotes the result of tt($(foo)), but the flag `tt((f))'
-(see below) is applied using the rules for unquoted substitutions.  Note
+(see below) is applied using the rules for unquoted expansions.  Note
 further that quotes are themselves nested in this context; for example, in
 tt("${(@f)"$(foo)"}"), there are two sets of quotes, one surrounding the
 whole expression, the other (redundant) surrounding the tt($(foo)) as
@@ -579,19 +583,19 @@
 
 startitem()
 item(tt(A))(
-Create an array parameter with tt(${)...tt(=)...tt(}),
-tt(${)...tt(:=)...tt(}) or tt(${)...tt(::=)...tt(}).
-If this flag is repeated (as in tt(AA)), create an associative
+Create an array parameter with `tt(${)...tt(=)...tt(})',
+`tt(${)...tt(:=)...tt(})' or `tt(${)...tt(::=)...tt(})'.
+If this flag is repeated (as in `tt(AA)'), create an associative
 array parameter.  Assignment is made before sorting or padding.
 The var(name) part may be a subscripted range for ordinary
 arrays; the var(word) part em(must) be converted to an array, for
-example by using tt(${(AA)=)var(name)tt(=)...tt(}) to activate word
+example by using `tt(${(AA)=)var(name)tt(=)...tt(})' to activate word
 splitting, when creating an associative array.
 )
 item(tt(@))(
 In double quotes, array elements are put into separate words.
-E.g., tt("${(@)foo}") is equivalent to tt("${foo[@]}") and
-tt("${(@)foo[1,2]}") is the same as tt("$foo[1]" "$foo[2]").
+E.g., `tt("${(@)foo}")' is equivalent to `tt("${foo[@]}")' and
+`tt("${(@)foo[1,2]}")' is the same as `tt("$foo[1]" "$foo[2]")'.
 )
 item(tt(e))(
 Perform em(parameter expansion), em(command substitution) and
Index: Doc/Zsh/params.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/params.yo,v
retrieving revision 1.6
diff -u -r1.6 params.yo
--- Doc/Zsh/params.yo	2000/12/04 12:03:26	1.6
+++ Doc/Zsh/params.yo	2001/04/22 18:44:24
@@ -8,13 +8,14 @@
 `tt(*)', `tt(@)', `tt(#)', `tt(?)', `tt(-)', `tt($)', or `tt(!)'.
 The value may be a em(scalar) (a string),
 an integer, an array (indexed numerically), or an em(associative)
-array (an unordered set of name-value pairs, indexed by name).
-To assign a scalar or integer value to a parameter,
-use the tt(typeset) builtin.
+array (an unordered set of name-value pairs, indexed by name).  To declare
+the type of a parameter, or to assign a scalar or integer value to a
+parameter, use the tt(typeset) builtin.
 findex(typeset, use of)
-To assign an array value, use `tt(set -A) var(name) var(value) ...'.
-findex(set, use of)
-The value of a parameter may also be assigned by writing:
+
+The value of a scalar or integer parameter may also be assigned by
+writing:
+cindex(assignment)
 
 indent(var(name)tt(=)var(value))
 
@@ -22,6 +23,12 @@
 is subject to arithmetic evaluation.  See noderef(Array Parameters)
 for additional forms of assignment.
 
+To refer to the value of a parameter, write `tt($)var(name)' or
+`tt(${)var(name)tt(})'.  See
+ifzman(em(Parameter Expansion) in zmanref(zshexpn))\
+ifnzman(noderef(Parameter Expansion))
+for complete details.
+
 In the parameter lists that follow, the mark `<S>' indicates that the
 parameter is special.
 Special parameters cannot have their type changed, and they stay special even
@@ -36,40 +43,74 @@
 endmenu()
 texinode(Array Parameters)(Positional Parameters)()(Parameters)
 sect(Array Parameters)
-The value of an array parameter may be assigned by writing:
+To assign an array value, write one of:
+findex(set, use of)
+cindex(array assignment)
 
+indent(tt(set -A) var(name) var(value) ...)
 indent(var(name)tt(=LPAR())var(value) ...tt(RPAR()))
 
 If no parameter var(name) exists, an ordinary array parameter is created.
-Associative arrays must be declared first, by `tt(typeset -A) var(name)'.
-When var(name) refers to an associative array, the parenthesized list is
-interpreted as alternating keys and values:
+If the parameter var(name) exists and is a scalar, it is replaced by a new
+array.  Ordinary array parameters may also be explicitly declared with:
+findex(typeset, use of)
+
+indent(tt(typeset -a) var(name))
+
+Associative arrays em(must) be declared before assignment, by using:
+
+indent(tt(typeset -A) var(name))
 
+When var(name) refers to an associative array, the list in an assignment
+is interpreted as alternating keys and values:
+
+indent(tt(set -A) var(name) var(key) var(value) ...)
 indent(var(name)tt(=LPAR())var(key) var(value) ...tt(RPAR()))
+
+Every var(key) must have a var(value) in this case.  Note that this
+assigns to the entire array, deleting any elements that do not appear
+in the list.
 
-Every var(key) must have a var(value) in this case.  To create an empty
-array or associative array, use:
+To create an empty array (including associative arrays), use one of:
 
+indent(tt(set -A) var(name))
 indent(var(name)tt(=LPAR()RPAR()))
 
-Individual elements of an array may be selected using a
-subscript.  A subscript of the form `tt([)var(exp)tt(])'
-selects the single element var(exp), where var(exp) is
-an arithmetic expression which will be subject to arithmetic
-expansion as if it were surrounded by `tt($LPAR()LPAR())...tt(RPAR()RPAR())'.
-The elements are numbered beginning with 1 unless the
-tt(KSH_ARRAYS) option is set when they are numbered from zero.
+subsect(Array Subscripts)
 cindex(subscripts)
+
+Individual elements of an array may be selected using a subscript.  A
+subscript of the form `tt([)var(exp)tt(])' selects the single element
+var(exp), where var(exp) is an arithmetic expression which will be subject
+to arithmetic expansion as if it were surrounded by
+`tt($LPAR()LPAR())...tt(RPAR()RPAR())'.  The elements are numbered
+beginning with 1, unless the tt(KSH_ARRAYS) option is set in which case
+they are numbered from zero.
 pindex(KSH_ARRAYS, use of)
 
-The same subscripting syntax is used for associative arrays,
-except that no arithmetic expansion is applied to var(exp).
+Subscripts may be used inside braces used to delimit a parameter name, thus
+`tt(${foo[2]})' is equivalent to `tt($foo[2])'.  If the tt(KSH_ARRAYS)
+option is set, the braced form is the only one that works, as bracketed
+expressions otherwise are not treated as subscripts.
 
-A subscript of the form `tt([*])' or `tt([@])' evaluates to all
-elements of an array; there is no difference between the two
-except when they appear within double quotes.
-`tt("$foo[*]")' evaluates to `tt("$foo[1] $foo[2] )...tt(")', while
-`tt("$foo[@]")' evaluates to `tt("$foo[1]" "$foo[2]")', etc.
+The same subscripting syntax is used for associative arrays, except that
+no arithmetic expansion is applied to var(exp).  However, the parsing
+rules for arithmetic expressions still apply, which affects the way that
+certain special characters must be protected from interpretation.  See
+em(Subscript Parsing) below for details.
+
+A subscript of the form `tt([*])' or `tt([@])' evaluates to all elements
+of an array; there is no difference between the two except when they
+appear within double quotes.
+`tt("$foo[*]")' evaluates to `tt("$foo[1] $foo[2] )...tt(")', whereas
+`tt("$foo[@]")' evaluates to `tt("$foo[1]" "$foo[2]" )...'.  For
+associative arrays, `tt([*])' or `tt([@])' evaluate to all the values (not
+the keys, but see em(Subscript Flags) below), in no particular order.
+When an array parameter is referenced as `tt($)var(name)' (with no
+subscript) it evaluates to `tt($)var(name)tt([*])', unless the tt(KSH_ARRAYS)
+option is set in which case it evaluates to `tt(${)var(name)tt([0]})' (for
+an associative array, this means the value of the key `tt(0)', which may
+not exist even if there are values for other keys).
 
 A subscript of the form `tt([)var(exp1)tt(,)var(exp2)tt(])'
 selects all elements in the range var(exp1) to var(exp2),
@@ -85,27 +126,45 @@
 For example, if tt(FOO) is set to `tt(foobar)', then
 `tt(echo $FOO[2,5])' prints `tt(ooba)'.
 
-Subscripts may be used inside braces used to delimit a parameter name, thus
-`tt(${foo[2]})' is equivalent to `tt($foo[2])'.  If the tt(KSH_ARRAYS)
-option is set, the braced form is the only one that will
-work, the subscript otherwise not being treated specially.
+subsect(Array Element Assignment)
+
+A subscript may be used on the left side of an assignment like so:
 
-If a subscript is used on the left side of an assignment the selected
-element or range is replaced by the expression on the right side.  An
-array (but not an associative array) may be created by assignment to a
-range or element.  Arrays do not nest, so assigning a parenthesized list
-of values to an element or range changes the number of elements in the
-array, shifting the other elements to accommodate the new values.  (This
-is not supported for associative arrays.)
+indent(var(name)tt([)var(exp)tt(]=)var(value))
 
+In this form of assignment the element or range specified by var(exp)
+is replaced by the expression on the right side.  An array (but not an
+associative array) may be created by assignment to a range or element.
+Arrays do not nest, so assigning a parenthesized list of values to an
+element or range changes the number of elements in the array, shifting the
+other elements to accommodate the new values.  (This is not supported for
+associative arrays.)
+
+This syntax also works as an argument to the tt(typeset) command:
+
+indent(tt(typeset) tt(")var(name)tt([)var(exp)tt(]"=)var(value))
+
+The var(value) may em(not) be a parenthesized list in this case; only
+single-element assignments may be made with tt(typeset).  Note that quotes
+are necessary in this case to prevent the brackets from being interpreted
+as filename generation operators.  The tt(noglob) precommand modifier
+could be used instead.
+
 To delete an element of an ordinary array, assign `tt(LPAR()RPAR())' to
-that element.
-To delete an element of an associative array, use the tt(unset) command.
+that element.  To delete an element of an associative array, use the
+tt(unset) command:
 
-If the opening bracket or the comma is directly followed by an opening
-parentheses the string up to the matching closing one is considered to
-be a list of flags. The flags currently understood are:
+indent(tt(unset) tt(")var(name)tt([)var(exp)tt(]"))
 
+subsect(Subscript Flags)
+cindex(subscript flags)
+
+If the opening bracket, or the comma in a range, in any subscript
+expression is directly followed by an opening parenthesis, the string up
+to the matching closing one is considered to be a list of flags, as in
+`var(name)tt([LPAR())var(flags)tt(RPAR())var(exp)tt(])'.  The flags
+currently understood are:
+
 startitem()
 item(tt(w))(
 If the parameter subscripted is a scalar than this flag makes
@@ -126,54 +185,176 @@
 separated by newlines.  This is a shorthand for `tt(pws:\n:)'.
 )
 item(tt(r))(
-Reverse subscripting:  if this flag is given, the var(exp) is taken as a
-pattern and the  result is the first matching array element, substring or
-word (if the parameter is an array, if it is a scalar, or if it is a scalar
-and the `tt(w)' flag is given, respectively).  The subscript used is the
-number of the matching element, so that pairs of subscripts such as
-`tt($foo[(r))var(??)tt(,3])' and `tt($foo[(r))var(??)tt(,(r)f*])'
-are possible.  If the parameter is an associative array, only the value part
-of each pair is compared to the pattern.
+Reverse subscripting: if this flag is given, the var(exp) is taken as a
+pattern and the result is the first matching array element, substring or
+word (if the parameter is an array, if it is a scalar, or if it is a
+scalar and the `tt(w)' flag is given, respectively).  The subscript used
+is the number of the matching element, so that pairs of subscripts such as
+`tt($foo[(r))var(??)tt(,3])' and `tt($foo[(r))var(??)tt(,(r)f*])' are
+possible.  If the parameter is an associative array, only the value part
+of each pair is compared to the pattern, and the result is that value.
+Reverse subscripts may be used for assigning to ordinary array elements,
+but not for assigning to associative arrays.
 )
 item(tt(R))(
 Like `tt(r)', but gives the last match.  For associative arrays, gives
 all possible matches.
 )
-item(tt(k))(
-If used in a subscript on a parameter that is not an associative
-array, this behaves like `tt(r)', but if used on an association, it
-makes the keys be interpreted as patterns and returns the first value
-whose key matches the var(exp).
-)
-item(tt(K))(
-On an association this is like `tt(k)' but returns all values whose
-keys match the var(exp). On other types of parameters this has the
-same effect as `tt(R)'.
-)
 item(tt(i))(
-like `tt(r)', but gives the index of the match instead; this may not
-be combined with a second argument.  For associative arrays, the key
-part of each pair is compared to the pattern, and the first matching
-key found is used.
+Like `tt(r)', but gives the index of the match instead; this may not be
+combined with a second argument.  On the left side of an assignment,
+behaves like `tt(r)'.  For associative arrays, the key part of each pair
+is compared to the pattern, and the first matching key found is the
+result.
 )
 item(tt(I))(
-like `tt(i)', but gives the index of the last match, or all possible
+Like `tt(i)', but gives the index of the last match, or all possible
 matching keys in an associative array.
 )
+item(tt(k))(
+If used in a subscript on an associative array, this flag causes the keys
+to be interpreted as patterns, and returns the value for the first key
+found where var(exp) is matched by the key.  This flag does not work on
+the left side of an assignment to an associative array element.  If used
+on another type of parameter, this behaves like `tt(r)'.
+)
+item(tt(K))(
+On an associative array this is like `tt(k)' but returns all values where
+var(exp) is matched by the keys.  On other types of parameters this has
+the same effect as `tt(R)'.
+)
 item(tt(n:)var(expr)tt(:))(
-if combined with `tt(r)', `tt(R)', `tt(i)' or `tt(I)', makes them give
+If combined with `tt(r)', `tt(R)', `tt(i)' or `tt(I)', makes them give
 the var(n)th or var(n)th last match (if var(expr) evaluates to
 var(n)).  This flag is ignored when the array is associative.
 )
 item(tt(b:)var(expr)tt(:))(
-if combined with `tt(r)', `tt(R)', `tt(i)' or `tt(I)', makes them begin
+If combined with `tt(r)', `tt(R)', `tt(i)' or `tt(I)', makes them begin
 at the var(n)th or var(n)th last element, word, or character (if var(expr)
 evaluates to var(n)).  This flag is ignored when the array is associative.
 )
 item(tt(e))(
-This option has no effect and retained for backward compatibility only.
+For ordinary arrays, this flag has no effect and is retained for backward
+compatibility only.  For associative arrays, this flag can be used to
+force tt(*) or tt(@) to be interpreted as a single key rather than as a
+reference to all values.  This flag may be used on the left side of an
+assignment.
 )
 enditem()
+
+See em(Parameter Expansion Flags) (\
+ifzman(zmanref(zshexpn))\
+ifnzman(noderef(Parameter Expansion))\
+) for additional ways to manipulate the results of array subscripting.
+
+subsect(Subscript Parsing)
+
+This discussion applies mainly to associative array key strings and to
+patterns used for reverse subscripting (the `tt(r)', `tt(R)', `tt(i)',
+etc. flags), but it may also affect parameter substitutions that appear
+as part of an arithmetic expression in an ordinary subscript.
+
+The basic rule to remember when writing a subscript expression is that all
+text between the opening `tt([)' and the closing `tt(])' is interpreted
+em(as if) it were in double quotes (\
+ifzman(see zmanref(zshmisc))\
+ifnzman(noderef(Quoting))\
+).  However, unlike double quotes which normally cannot nest, subscript
+expressions may appear inside double-quoted strings or inside other
+subscript expressions (or both!), so the rules have two important
+differences.
+
+The first difference is that brackets (`tt([)' and `tt(])') must appear as
+balanced pairs in a subscript expression unless they are preceded by a
+backslash (`tt(\)').  Therefore, within a subscript expression (and unlike
+true double-quoting) the sequence `tt(\[)' becomes `tt([)', and similarly
+`tt(\])' becomes `tt(])'.  This applies even in cases where a backslash is
+not normally required; for example, the pattern `tt([^[])' (to match any
+character other than an open bracket) should be written `tt([^\[])' in a
+reverse-subscript pattern.  However, note that `tt(\[^\[\])' and even
+`tt(\[^[])' mean the em(same) thing, because backslashes are always
+stripped when they appear before brackets!
+
+The same rule applies to parentheses (`tt(LPAR())' and `tt(RPAR())') and
+braces (`tt({)' and `tt(})'): they must appear either in balanced pairs or
+preceded by a backslash, and backslashes that protect parentheses or
+braces are removed during parsing.  This is because parameter expansions
+may be surrounded by balanced braces, and subscript flags are introduced by
+balanced parens.
+
+The second difference is that a double-quote (`tt(")') may appear as part
+of a subscript expression without being preceded by a backslash, and
+therefore that the two characters `tt(\")' remain as two characters in the
+subscript (in true double-quoting, `tt(\")' becomes `tt(")').  However,
+because of the standard shell quoting rules, any double-quotes that appear
+must occur in balanced pairs unless preceded by a backslash.  This makes
+it more difficult to write a subscript expression that contains an odd
+number of double-quote characters, but the reason for this difference is
+so that when a subscript expression appears inside true double-quotes, one
+can still write `tt(\")' (rather than `tt(\\\")') for `tt(")'.
+
+To use an odd number of double quotes as a key in an assignment, use the
+tt(typeset) builtin and an enclosing pair of double quotes; to refer to
+the value of that key, again use double quotes:
+
+example(typeset -A aa
+typeset "aa[one\"two\"three\"quotes]"=QQQ
+print "$aa[one\"two\"three\"quotes]")
+
+It is important to note that the quoting rules do not change when a
+parameter expansion with a subscript is nested inside another subscript
+expression.  That is, it is not necessary to use additional backslashes
+within the inner subscript expression; they are removed only once, from
+the innermost subscript outwards.  Parameters are also expanded from the
+innermost subscript first, as each expansion is encountered left to right
+in the outer expression.
+
+A further complication arises from a way in which subscript parsing is
+em(not) different from double quote parsing.  As in true double-quoting,
+the sequences `tt(\*)' and `tt(\@)' remain as two characters when they
+appear in a subscript expression.  To use a literal `tt(*)' or `tt(@)' as
+an associative array key, the `tt(e)' flag must be used:
+
+example(typeset -A aa
+aa[(e)*]=star
+print $aa[(e)*])
+
+A last detail must be considered when reverse subscripting is performed.
+Parameters appearing in the subscript expression are first expanded and
+then the complete expression is interpreted as a pattern.  This has two
+effects: first, parameters behave as if tt(GLOB_SUBST) were on (and it
+cannot be turned off); second, backslashes are interpreted twice, once
+when parsing the array subscript and again when parsing the pattern.  In a
+reverse subscript, it's necessary to use em(four) backslashes to cause a
+single backslash to match literally in the pattern.  For complex patterns,
+it is often easiest to assign the desired pattern to a parameter and then
+refer to that parameter in the subscript, because then the backslashes,
+brackets, parentheses, etc., are seen only when the complete expression is
+converted to a pattern.  To match the value of a parameter literally in a
+reverse subscript, rather than as a pattern,
+use `tt(${LPAR()q)tt(RPAR())var(name)tt(})' (\
+ifzman(see zmanref(zshexpn))\
+ifnzman(noderef(Parameter Expansion))\
+) to quote the expanded value.
+
+Note that the `tt(k)' and `tt(K)' flags are reverse subscripting for an
+ordinary array, but are em(not) reverse subscripting for an associative
+array!  (For an associative array, the keys in the array itself are
+interpreted as patterns by those flags; the subscript is a plain string
+in that case.)
+
+One final note, not directly related to subscripting: the numeric names
+of positional parameters (\
+ifzman(described below)\
+ifnzman(noderef(Positional Parameters))\
+) are parsed specially, so for example `tt($2foo)' is equivalent to
+`tt(${2}foo)'.  Therefore, to use subscript syntax to extract a substring
+from a positional parameter, the expansion must be surrounded by braces;
+for example, `tt(${2[3,5]})' evaluates to the third through fifth
+characters of the second positional parameter, but `tt($2[3,5])' is the
+entire second parameter concatenated with the filename generation pattern
+`tt([3,5])'.
+
 texinode(Positional Parameters)(Local Parameters)(Array Parameters)(Parameters)
 sect(Positional Parameters)
 The positional parameters provide access to the command-line arguments
Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.39
diff -u -r1.39 params.c
--- Src/params.c	2001/04/22 14:46:59	1.39
+++ Src/params.c	2001/04/22 18:44:54
@@ -924,14 +924,17 @@
 		       (ishash || c != ',')) || i); t++) {
 	/* Untokenize INULL() except before brackets, for parsestr() */
 	if (INULL(c)) {
-	    if (t[1] == '[' || t[1] == ']') {
+	    c = t[1];
+	    if (c == '[' || c == ']' ||
+		c == '(' || c == ')' ||
+		c == '{' || c == '}') {
 		/* This test handles nested subscripts in hash keys */
 		if (ishash && i)
-		    *t = ztokens[c - Pound];
+		    *t = ztokens[*t - Pound];
 		needtok = 1;
 		++t;
 	    } else
-		*t = ztokens[c - Pound];
+		*t = ztokens[*t - Pound];
 	    continue;
 	}
 	/* Inbrack and Outbrack are probably never found here ... */
@@ -950,7 +953,7 @@
      * are not backslashed after parsestr().  Otherwise leave them alone *
      * so that the brackets will be escaped when we patcompile() or when *
      * subscript arithmetic is performed (for nested subscripts).        */
-    if (ishash && !rev)
+    if (ishash && (keymatch || !rev))
 	remnulargs(s);
     if (needtok) {
 	if (parsestr(s))
@@ -1034,8 +1037,10 @@
 		}
 	    }
 	}
-	tokenize(s);
-	remnulargs(s);
+	if (!keymatch) {
+	    tokenize(s);
+	    remnulargs(s);
+	}
 
 	if (keymatch || (pprog = patcompile(s, 0, NULL))) {
 	    int len;
@@ -1044,10 +1049,9 @@
 		if (ishash) {
 		    scanprog = pprog;
 		    scanstr = s;
-		    if (keymatch) {
-			untokenize(s);
+		    if (keymatch)
 			v->isarr |= SCANPM_KEYMATCH;
-		    } else if (ind)
+		    else if (ind)
 			v->isarr |= SCANPM_MATCHKEY;
 		    else
 			v->isarr |= SCANPM_MATCHVAL;
Index: Test/D06subscript.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D06subscript.ztst,v
retrieving revision 1.2
diff -u -r1.2 D06subscript.ztst
--- Test/D06subscript.ztst	2001/04/21 18:49:13	1.2
+++ Test/D06subscript.ztst	2001/04/22 18:44:56
@@ -100,13 +100,16 @@
   print -R ${(k)A[(r)qxstar]} $A[${(q)x}]
   # A[*] is interpreted specially, assignment to it fails silently (oops)
   A[*]=star
-  A[\*]=backstar
   print -R ${(k)A[(r)star]} $A[$x]
+  A[(e)*]=star
+  A[\*]=backstar
+  print -R ${(k)A[(r)star]} $A[(e)*]
   print -R ${(k)A[(r)backstar]} $A[\*]
 0:Associative array assignment
 >* xstar
 >\* qxstar
 >xstar
+>* star
 >\* backstar
 
   o='['
@@ -136,3 +139,9 @@
 >zounds
 >zounds
 >zounds
+
+  print -R ${(o)A[(K)\]]}
+  print -R ${(o)A[(K)\\\]]}
+0:Associative array keys interpreted as patterns
+>\2 backcbrack cbrack star
+>\\\4 \\\? star zounds

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com

Zsh: http://www.zsh.org | PHPerl Project: http://phperl.sourceforge.net   



* Re: PATCH: Array subscript documentation
  2001-04-22 18:51 PATCH: Array subscript documentation Bart Schaefer
@ 2001-04-22 22:40 ` Peter Stephenson
  2001-04-23 15:05   ` Bart Schaefer
  0 siblings, 1 reply; 3+ messages in thread
From: Peter Stephenson @ 2001-04-22 22:40 UTC (permalink / raw)
  To: Zsh hackers list

> +example(typeset -A aa
> +typeset "aa[one\"two\"three\"quotes]"=QQQ
> +print "$aa[one\"two\"three\"quotes]")

Unless there's something remaining uncommitted, the last line still doesn't
work.  The assignment strips the backslashes, but the expansion doesn't.

I suppose that's because the Bnull's don't get stripped till after the end
of the parameter expansion.  But I don't really understand.

-- 
Peter Stephenson <pws@pwstephenson.fsnet.co.uk>
Work: pws@csr.com
Web: http://www.pwstephenson.fsnet.co.uk



* Re: PATCH: Array subscript documentation
  2001-04-22 22:40 ` Peter Stephenson
@ 2001-04-23 15:05   ` Bart Schaefer
  0 siblings, 0 replies; 3+ messages in thread
From: Bart Schaefer @ 2001-04-23 15:05 UTC (permalink / raw)
  To: Zsh hackers list

On Apr 22, 11:40pm, Peter Stephenson wrote:
} Subject: Re: PATCH: Array subscript documentation
}
} > +example(typeset -A aa
} > +typeset "aa[one\"two\"three\"quotes]"=QQQ
} > +print "$aa[one\"two\"three\"quotes]")
} 
} Unless there's something remaining uncommitted, the last line still
} doesn't work. The assignment strips the backslashes, but the expansion
} doesn't.

Oh, dear.  When I tested this, I forgot to do the `typeset -A' first, so
naturally it appeared to work because both strings evaluate to zero as
arithmetic on an ordinary array.

} I suppose that's because the Bnull's don't get stripped till after the
} end of the parameter expansion. But I don't really understand.

Dnulls, actually, and yes, that's exactly it.  Turns out getindex() needs
to know whether it's being called from inside double quotes.  I'm going
to commit the following patch, which fixes the bug above, and then look
at ways to eliminate the extra strchr(), as the caller of getindex()
ought to be equipped to supply this information.

I was distracted for quite a while trying to fix this bug:

% typeset -A aa
% typeset "aa[one\"two\"three\"quotes]"=QQQ
% print $aa[one"two"three"quotes]"
QQQ

Note that in the print line the quotes are balanced, but the fourth quote
is outside the brackets.  This should be a parse error.  However, this bug
is present in 4.0.1-pre-2 and earlier versions of zsh, so I eventually
gave up on it.

Incidentally, an extra thank you goes to everyone who contributed to the 
test suite, especially PWS and Sven.  I wouldn't have been willing/able
to fiddle with this whole subscripting issue if there hadn't been a way
to thoroughly check that I wasn't breaking a vital bit of shell parsing.

diff -ru -x CVS zsh-forge/current/Src/lex.c zsh-4.0/Src/lex.c
--- zsh-forge/current/Src/lex.c	Sat Apr 21 11:40:35 2001
+++ zsh-4.0/Src/lex.c	Sun Apr 22 20:58:02 2001
@@ -1305,7 +1305,8 @@
 		    c == endchar || c == '`' ||
 		    (endchar == ']' && (c == '[' || c == ']' ||
 					c == '(' || c == ')' ||
-					c == '{' || c == '}')))
+					c == '{' || c == '}' ||
+					(c == '"' && sub))))
 		    add(Bnull);
 		else {
 		    /* lexstop is implicitly handled here */
@@ -1390,7 +1391,7 @@
 		err = (!brct-- && math);
 	    break;
 	case '"':
-	    if (intick || endchar == ']' || (!endchar && !bct))
+	    if (intick || ((endchar == ']' || !endchar) && !bct))
 		break;
 	    if (bct) {
 		add(Dnull);
@@ -1463,7 +1464,7 @@
 
 /**/
 mod_export char *
-parse_subscript(char *s)
+parse_subscript(char *s, int sub)
 {
     int l = strlen(s), err;
     char *t;
@@ -1477,7 +1478,7 @@
     len = 0;
     bptr = tokstr = s;
     bsiz = l + 1;
-    err = dquote_parse(']', 1);
+    err = dquote_parse(']', sub);
     if (err) {
 	err = *bptr;
 	*bptr = 0;
diff -ru -x CVS zsh-forge/current/Src/params.c zsh-4.0/Src/params.c
--- zsh-forge/current/Src/params.c	Sun Apr 22 11:43:22 2001
+++ zsh-4.0/Src/params.c	Mon Apr 23 07:46:42 2001
@@ -785,7 +785,7 @@
 	return 0;
 
     /* Require balanced [ ] pairs with something between */
-    if (!(ss = parse_subscript(++ss)))
+    if (!(ss = parse_subscript(++ss, 1)))
 	return 0;
     untokenize(s);
     return !ss[1];
@@ -922,18 +922,18 @@
     for (t = s, i = 0;
 	 (c = *t) && ((c != Outbrack &&
 		       (ishash || c != ',')) || i); t++) {
-	/* Untokenize INULL() except before brackets, for parsestr() */
+	/* Untokenize INULL() except before brackets and double-quotes */
 	if (INULL(c)) {
 	    c = t[1];
 	    if (c == '[' || c == ']' ||
 		c == '(' || c == ')' ||
 		c == '{' || c == '}') {
 		/* This test handles nested subscripts in hash keys */
-		if (ishash && i)
+		if (ishash && i)
 		    *t = ztokens[*t - Pound];
 		needtok = 1;
 		++t;
-	    } else
+	    } else if (c != '"')
 		*t = ztokens[*t - Pound];
 	    continue;
 	}
@@ -1181,16 +1181,17 @@
 {
     int start, end, inv = 0;
     char *s = *pptr, *tbrack;
+    int dq = !!strchr(s, Dnull);
 
     *s++ = '[';
-    s = parse_subscript(s);	/* Error handled after untokenizing */
+    s = parse_subscript(s, dq);	/* Error handled after untokenizing */
     /* Now we untokenize everthing except INULL() markers so we can check *
      * for the '*' and '@' special subscripts.  The INULL()s are removed  *
      * in getarg() after we know whether we're doing reverse indexing.    */
     for (tbrack = *pptr + 1; *tbrack && tbrack != s; tbrack++) {
 	if (INULL(*tbrack) && !*++tbrack)
 	    break;
-	if (itok(*tbrack))
+	if (itok(*tbrack))	/* Need to check for Nularg here? */
 	    *tbrack = ztokens[*tbrack - Pound];
     }
     /* If we reached the end of the string (s == NULL) we have an error */
diff -ru -x CVS zsh-forge/current/Test/D06subscript.ztst zsh-4.0/Test/D06subscript.ztst
--- zsh-forge/current/Test/D06subscript.ztst	Sun Apr 22 11:43:22 2001
+++ zsh-4.0/Test/D06subscript.ztst	Sun Apr 22 21:44:07 2001
@@ -145,3 +145,18 @@
 0:Associative array keys interpreted as patterns
 >\2 backcbrack cbrack star
 >\\\4 \\\? star zounds
+
+  typeset "A[one\"two\"three\"quotes]"=QQQ
+  typeset 'A[one\"two\"three\"quotes]'=qqq
+  print -R "$A[one\"two\"three\"quotes]"
+  print -R $A[one\"two\"three\"quotes]
+  A[one"two"three"four"quotes]=QqQq
+  print -R $A[one"two"three"four"quotes]
+  print -R $A[$A[(i)one\"two\"three\"quotes]]
+  print -R "$A[$A[(i)one\"two\"three\"quotes]]"
+0:Associative array keys with double quotes
+>QQQ
+>qqq
+>QqQq
+>qqq
+>QQQ

-- 
Bart Schaefer                                 Brass Lantern Enterprises
http://www.well.com/user/barts              http://www.brasslantern.com

Zsh: http://www.zsh.org | PHPerl Project: http://phperl.sourceforge.net   

