zsh-workers
* PATCH: bash-style substrings & subarrays
@ 2010-11-17 16:54 Peter Stephenson
  2010-11-18 12:44 ` Peter Stephenson
  2010-11-19 18:01 ` Bart Schaefer
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Stephenson @ 2010-11-17 16:54 UTC (permalink / raw)
  To: Zsh hackers list

This implements the ${NAME:OFFSET} and ${NAME:OFFSET:LENGTH} syntax.
This is basically for compatibility; we don't need the extra
functionality, but it's a syntax people are nowadays assuming they can
use.  The clash with what we've got is minor and probably mostly
negligible: modifiers take precedence, but this only applies when the
first character after the colon is alphabetic or &, which you wouldn't
obviously need, and the clash with ${NAME:-WORD} when OFFSET starts with
a - is not specific to zsh.
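
For reference, a minimal sketch of the syntax as it behaves in bash, the
shell being imitated.  The variable names are purely illustrative.  Note
that bash counts offsets from 0, whereas the tests in the patch below
count from 1 unless KSH_ARRAYS is set, so the patched zsh output differs
by one position.

```shell
#!/usr/bin/env bash
# Illustration only: these are bash results, not output from the patched zsh.
foo=123456789

tail=${foo:3}     # everything from offset 3 onwards
mid=${foo:3:2}    # two characters starting at offset 3
last=${foo: -1}   # the space stops this being read as ${foo:-1}
dflt=${foo:-1}    # ${NAME:-WORD}: foo is set, so its own value is used

printf '%s\n' "$tail" "$mid" "$last" "$dflt"
```

The last two lines show the clash mentioned above: without the space, a
leading - selects the default-value form rather than a negative offset.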

One thing I have not yet tried to handle is that in bash the offset is
shifted by 1 when the variable is * or @ (bash otherwise behaves as if
KSH_ARRAYS were set, except it doesn't this time), i.e. ${*:1:1} gives
you $1 not $2.  Yech.
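
A short sketch of that bash quirk, assuming bash is the shell running
it; the names arr, from_array and from_params are illustrative only.

```shell
#!/usr/bin/env bash
# Illustration of the bash behaviour described above.
set -- a b c
arr=(a b c)

from_array=${arr[*]:1:1}   # ordinary arrays count from 0: second element
from_params=${*:1:1}       # for * (or @), offset 1 selects $1 instead

printf '%s\n' "$from_array" "$from_params"
```

With an ordinary array, offset 1 skips the first element; with the
positional parameters it does not, because offset 0 is taken by $0.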

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.121
diff -p -u -r1.121 expn.yo
--- Doc/Zsh/expn.yo	15 Oct 2010 18:56:17 -0000	1.121
+++ Doc/Zsh/expn.yo	17 Nov 2010 16:46:32 -0000
@@ -585,6 +585,45 @@ If var(name) is an array
 the matching array elements are removed (use the `tt((M))' flag to
 remove the non-matched elements).
 )
+xitem(tt(${)var(name)tt(:)var(offset)tt(}))
+item(tt(${)var(name)tt(:)var(offset)tt(:)var(length)tt(}))(
+This syntax gives effects similar to parameter subscripting
+in the form tt($)var(name)tt({)var(offset)tt(,)var(end)tt(}) but in
+a form compatible with other shells.
+
+If the variable var(name) is a scalar, substitute the contents
+starting from offset var(offset); if var(name) is an array,
+substitute elements from element var(offset).  If var(length) is
+given, substitute that many characters or elements, otherwise the
+entire rest of the scalar or array.
+
+var(offset) is treated similarly to a parameter subscript:
+the offset of the first character or element in var(name)
+is 0 if the option tt(KSH_ARRAYS) is set, else 1; a negative
+subscript counts backwards so that -1 corresponds to the last
+character or element.
+
+var(length) is always treated directly as a length and hence may not be
+negative.
+
+var(offset) and var(length) undergo the same set of shell substitutions
+as for scalar assignment; in addition, they are then subject to arithmetic
+evaluation.  Hence, for example
+
+example(print ${foo:3}
+print ${foo: 1 + 2}
+print ${foo:$(( 1 + 2))}
+print ${foo:$(echo 1 + 2)})
+
+all have the same effect.
+
+Note that if offset is negative, the tt(-) may not appear immediately
+after the tt(:) as this indicates the
+tt(${)var(name)tt(:-)var(word)tt(}) form of substitution; a space
+may be inserted before the tt(-).  Furthermore, neither var(offset) nor
+var(length) may begin with an alphabetic character or tt(&) as these are
+used to indicate history-style modifiers.
+)
 xitem(tt(${)var(name)tt(/)var(pattern)tt(/)var(repl)tt(}))
 item(tt(${)var(name)tt(//)var(pattern)tt(/)var(repl)tt(}))(
 Replace the longest possible match of var(pattern) in the expansion of
Index: Src/lex.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/lex.c,v
retrieving revision 1.56
diff -p -u -r1.56 lex.c
--- Src/lex.c	14 Sep 2010 14:46:26 -0000	1.56
+++ Src/lex.c	17 Nov 2010 16:46:32 -0000
@@ -1398,7 +1398,12 @@ gettokstr(int c, int sub)
 }
 
 
-/* Return non-zero for error (character to unget), else zero */
+/*
+ * Parse input as if in double quotes.
+ * endchar is the end character to expect.
+ * sub has got something to do with whether we are doing quoted substitution.
+ * Return non-zero for error (character to unget), else zero
+ */
 
 /**/
 static int
@@ -1591,14 +1596,20 @@ parsestrnoerr(char *s)
     return err;
 }
 
+/*
+ * Parse a subscript in string s.
+ * sub is passed down to dquote_parse().
+ * endchar is the final character.
+ * Return the next character, or NULL.
+ */
 /**/
 mod_export char *
-parse_subscript(char *s, int sub)
+parse_subscript(char *s, int sub, int endchar)
 {
     int l = strlen(s), err;
     char *t;
 
-    if (!*s || *s == ']')
+    if (!*s || *s == endchar)
 	return 0;
     lexsave();
     untokenize(t = dupstring(s));
@@ -1607,15 +1618,16 @@ parse_subscript(char *s, int sub)
     len = 0;
     bptr = tokstr = s;
     bsiz = l + 1;
-    err = dquote_parse(']', sub);
+    err = dquote_parse(endchar, sub);
     if (err) {
 	err = *bptr;
-	*bptr = 0;
+	*bptr = '\0';
 	untokenize(s);
 	*bptr = err;
-	s = 0;
-    } else
+	s = NULL;
+    } else {
 	s = bptr;
+    }
     strinend();
     inpop();
     DPUTS(cmdsp, "BUG: parse_subscript: cmdstack not empty.");
Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.164
diff -p -u -r1.164 params.c
--- Src/params.c	3 Nov 2010 22:40:34 -0000	1.164
+++ Src/params.c	17 Nov 2010 16:46:33 -0000
@@ -1013,7 +1013,7 @@ isident(char *s)
 	return 0;
 
     /* Require balanced [ ] pairs with something between */
-    if (!(ss = parse_subscript(++ss, 1)))
+    if (!(ss = parse_subscript(++ss, 1, ']')))
 	return 0;
     untokenize(s);
     return !ss[1];
@@ -1628,7 +1628,7 @@ getindex(char **pptr, Value v, int flags
 
     *s++ = '[';
     /* Error handled after untokenizing */
-    s = parse_subscript(s, flags & SCANPM_DQUOTED);
+    s = parse_subscript(s, flags & SCANPM_DQUOTED, ']');
     /* Now we untokenize everything except inull() markers so we can check *
      * for the '*' and '@' special subscripts.  The inull()s are removed  *
      * in getarg() after we know whether we're doing reverse indexing.    */
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.108
diff -p -u -r1.108 subst.c
--- Src/subst.c	22 Oct 2010 16:32:36 -0000	1.108
+++ Src/subst.c	17 Nov 2010 16:46:33 -0000
@@ -1371,6 +1371,43 @@ untok_and_escape(char *s, int escapes, i
     return dst;
 }
 
+/*
+ * See if an argument str looks like a subscript or length following
+ * a colon and parse it.  It must be followed by a ':' or nothing.
+ * If this succeeds, expand and return the evaluated expression if
+ * found, else return NULL.
+ *
+ * We assume this is what is meant if the first character is not
+ * an alphabetic character or '&', which signify modifiers.
+ *
+ * Set *endp to point to the next character following.
+ */
+static char *
+check_colon_subscript(char *str, char **endp)
+{
+    int sav;
+
+    /* Could this be a modifier (or empty)? */
+    if (!*str || ialpha(*str) || *str == '&')
+	return NULL;
+
+    *endp = parse_subscript(str, 0, ':');
+    if (!*endp) {
+	/* No trailing colon? */
+	*endp = parse_subscript(str, 0, '\0');
+	if (!*endp)
+	    return NULL;
+    }
+    sav = **endp;
+    **endp = '\0';
+    if (parsestr(str = dupstring(str)))
+	return NULL;
+    singsub(&str);
+
+    **endp = sav;
+    return str;
+}
+
 /* parameter substitution */
 
 #define	isstring(c) ((c) == '$' || (char)(c) == String || (char)(c) == Qstring)
@@ -2683,6 +2720,97 @@ paramsubst(LinkList l, LinkNode n, char 
 	    }
 	    val = dupstring("");
 	}
+	if (colf && inbrace) {
+	    /*
+	     * Look for ${PARAM:OFFSET} or ${PARAM:OFFSET:LENGTH}.
+	     * This must appear before modifiers.  For compatibility
+	     * with bash we perform both standard string substitutions
+	     * and math eval.
+	     */
+	    char *check_offset2;
+	    char *check_offset = check_colon_subscript(s, &check_offset2);
+	    if (check_offset) {
+		zlong offset = mathevali(check_offset);
+		zlong length = (zlong)-1;
+		if (errflag)
+		    return NULL;
+		if ((*check_offset2 && *check_offset2 != ':')) {
+		    zerr("invalid subscript: %s", check_offset);
+		    return NULL;
+		}
+		if (*check_offset2) {
+		    check_offset = check_colon_subscript(check_offset2 + 1,
+							 &check_offset2);
+		    if (*check_offset2 && *check_offset2 != ':') {
+			zerr("invalid length: %s", check_offset);
+			return NULL;
+		    }
+		    length = mathevali(check_offset);
+		    if (errflag)
+			return NULL;
+		    if (length < (zlong)0) {
+			zerr("invalid length: %s", check_offset);
+			return NULL;
+		    }
+		}
+		if (!isset(KSHARRAYS) && offset > 0)
+		    offset--;
+		if (isarr) {
+		    int alen = arrlen(aval), count;
+		    char **srcptr, **dstptr, **newarr;
+
+		    if (offset < 0) {
+			offset += alen;
+			if (offset < 0)
+			    offset = 0;
+		    }
+		    if (length < 0)
+		      length = alen;
+		    if (offset > alen)
+			offset = alen;
+		    if (offset + length > alen)
+			length = alen - offset;
+		    count = length;
+		    srcptr = aval + offset;
+		    newarr = dstptr = (char **)
+			zhalloc((length+1)*sizeof(char *));
+		    while (count--)
+			*dstptr++ = dupstring(*srcptr++);
+		    *dstptr = (char *)NULL;
+		    aval = newarr;
+		} else {
+		    char *sptr, *eptr;
+		    if (offset < 0) {
+			MB_METACHARINIT();
+			for (sptr = val; *sptr; ) {
+			    sptr += MB_METACHARLEN(sptr);
+			    offset++;
+			}
+			if (offset < 0)
+			    offset = 0;
+		    }
+		    MB_METACHARINIT();
+		    for (sptr = val; *sptr && offset; ) {
+			sptr += MB_METACHARLEN(sptr);
+			offset--;
+		    }
+		    if (length >= 0) {
+			for (eptr = sptr; *eptr && length; ) {
+			    eptr += MB_METACHARLEN(eptr);
+			    length--;
+			}
+			val = dupstrpfx(sptr, eptr - sptr);
+		    } else {
+			val = dupstring(sptr);
+		    }
+		}
+		if (!*check_offset2) {
+		    colf = 0;
+		} else {
+		    s = check_offset2 + 1;
+		}
+	    }
+	}
 	if (colf) {
 	    /*
 	     * History style colon modifiers.  May need to apply
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.44
diff -p -u -r1.44 D04parameter.ztst
--- Test/D04parameter.ztst	6 Oct 2010 08:27:10 -0000	1.44
+++ Test/D04parameter.ztst	17 Nov 2010 16:46:33 -0000
@@ -1256,3 +1256,49 @@
 0:$ZSH_EVAL_CONTEXT and $zsh_eval_context
 >toplevel
 >shfunc cmdsubst
+
+   foo="123456789"
+   print ${foo:3}
+   print ${foo: 1 + 3}
+   print ${foo:$(( 2 + 3))}
+   print ${foo:$(echo 3 + 3)}
+   print ${foo:3:1}
+   print ${foo: 1 + 3:(4-2)/2}
+   print ${foo:$(( 2 + 3)):$(( 7 - 6 ))}
+   print ${foo:$(echo 3 + 3):`echo 4 - 3`}
+   print ${foo: -1}
+   print ${foo: -10}
+0:Bash-style subscripts, scalar
+>3456789
+>456789
+>56789
+>6789
+>3
+>4
+>5
+>6
+>9
+>123456789
+
+   foo=(1 2 3 4 5 6 7 8 9)
+   print ${foo:3}
+   print ${foo: 1 + 3}
+   print ${foo:$(( 2 + 3))}
+   print ${foo:$(echo 3 + 3)}
+   print ${foo:3:1}
+   print ${foo: 1 + 3:(4-2)/2}
+   print ${foo:$(( 2 + 3)):$(( 7 - 6 ))}
+   print ${foo:$(echo 3 + 3):`echo 4 - 3`}
+   print ${foo: -1}
+   print ${foo: -10}
+0:Bash-style subscripts, array
+>3 4 5 6 7 8 9
+>4 5 6 7 8 9
+>5 6 7 8 9
+>6 7 8 9
+>3
+>4
+>5
+>6
+>9
+>1 2 3 4 5 6 7 8 9

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK


Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom



* Re: PATCH: bash-style substrings & subarrays
  2010-11-17 16:54 PATCH: bash-style substrings & subarrays Peter Stephenson
@ 2010-11-18 12:44 ` Peter Stephenson
  2010-11-19 18:01 ` Bart Schaefer
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Stephenson @ 2010-11-18 12:44 UTC (permalink / raw)
  To: Zsh hackers list

On Wed, 17 Nov 2010 16:54:17 +0000
Peter Stephenson <pws@csr.com> wrote:
> One thing I have not yet tried to do is the fact that the offset is
> offset by 1 when the variable is * or @ in bash (i.e. corresponding to
> having KSH_ARRAYS set, except it doesn't this time), i.e. ${*:1:1}
> gives you $1 not $2.

I refer honourable members to the answer I gave on a previous occasion.

> Yech.

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.122
diff -p -u -r1.122 expn.yo
--- Doc/Zsh/expn.yo	18 Nov 2010 10:07:56 -0000	1.122
+++ Doc/Zsh/expn.yo	18 Nov 2010 12:38:42 -0000
@@ -623,6 +623,16 @@ tt(${)var(name)tt(:-)var(word)tt(}) form
 may be inserted before the tt(-).  Furthermore, neither var(offset) nor
 var(length) may begin with an alphabetic character or tt(&) as these are
 used to indicate history-style modifiers.
+
+For further compatibility with other shells there is a special case
+when the tt(KSH_ARRAYS) option is active, as in emulation of
+Bourne-style shells.  In this case array subscript 0 usually refers to the
+first element of the array.  However, if the substitution refers to the
+positional parameter array, e.g. tt($@) or tt($*), then offset 0
+instead refers to tt($0), offset 1 refers to tt($1), and so on.  In
+other words, the positional parameter array is effectively extended by
+prepending tt($0).  Hence tt(${*:0:1}) substitutes tt($0) and
+tt(${*:1:1}) substitutes tt($1).
 )
 xitem(tt(${)var(name)tt(/)var(pattern)tt(/)var(repl)tt(}))
 item(tt(${)var(name)tt(//)var(pattern)tt(/)var(repl)tt(}))(
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.109
diff -p -u -r1.109 subst.c
--- Src/subst.c	18 Nov 2010 10:07:56 -0000	1.109
+++ Src/subst.c	18 Nov 2010 12:38:42 -0000
@@ -1636,6 +1636,12 @@ paramsubst(LinkList l, LinkNode n, char 
      * and the argument passing to fetchvalue has another kludge.
      */
     int subexp;
+    /*
+     * If we're referring to the positional parameters, then
+     * e.g ${*:1:1} refers to $1 even if KSH_ARRAYS is in effect.
+     * This is for compatibility.
+     */
+    int horrible_offset_hack = 0;
 
     *s++ = '\0';
     /*
@@ -2281,6 +2287,12 @@ paramsubst(LinkList l, LinkNode n, char 
 		val = getstrvalue(v);
 	    }
 	}
+	/* See if this is a reference to the positional parameters. */
+	if (v && v->pm && v->pm->gsu.a == &vararray_gsu &&
+	    (char ***)v->pm->u.data == &pparams)
+	    horrible_offset_hack = 1;
+	else
+	    horrible_offset_hack = 0;
 	/*
 	 * Finished with the original parameter and its indices;
 	 * carry on looping to see if we need to do more indexing.
@@ -2732,6 +2744,7 @@ paramsubst(LinkList l, LinkNode n, char 
 	    if (check_offset) {
 		zlong offset = mathevali(check_offset);
 		zlong length = (zlong)-1;
+		int offset_hack_argzero = 0;
 		if (errflag)
 		    return NULL;
 		if ((*check_offset2 && *check_offset2 != ':')) {
@@ -2753,8 +2766,21 @@ paramsubst(LinkList l, LinkNode n, char 
 			return NULL;
 		    }
 		}
-		if (!isset(KSHARRAYS) && offset > 0)
-		    offset--;
+		if (!isset(KSHARRAYS) || horrible_offset_hack) {
+		    /*
+		     * As part of the 'orrible hoffset 'ack,
+		     * (what hare you? Han 'orrible hoffset 'ack,
+		     * sergeant major), if we are given a ksh/bash/POSIX
+		     * style array which includes offset 0, we use
+		     * $0.
+		     */
+		    if (isset(KSHARRAYS) && horrible_offset_hack &&
+			offset == 0 && isarr) {
+			offset_hack_argzero = 1;
+		    } else if (offset > 0) {
+			offset--;
+		    }
+		}
 		if (isarr) {
 		    int alen = arrlen(aval), count;
 		    char **srcptr, **dstptr, **newarr;
@@ -2764,6 +2790,8 @@ paramsubst(LinkList l, LinkNode n, char 
 			if (offset < 0)
 			    offset = 0;
 		    }
+		    if (offset_hack_argzero)
+			alen++;
 		    if (length < 0)
 		      length = alen;
 		    if (offset > alen)
@@ -2774,6 +2802,10 @@ paramsubst(LinkList l, LinkNode n, char 
 		    srcptr = aval + offset;
 		    newarr = dstptr = (char **)
 			zhalloc((length+1)*sizeof(char *));
+		    if (count && offset_hack_argzero) {
+			*dstptr++ = dupstring(argzero);
+			count--;
+		    }
 		    while (count--)
 			*dstptr++ = dupstring(*srcptr++);
 		    *dstptr = (char *)NULL;
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.45
diff -p -u -r1.45 D04parameter.ztst
--- Test/D04parameter.ztst	18 Nov 2010 10:07:56 -0000	1.45
+++ Test/D04parameter.ztst	18 Nov 2010 12:38:42 -0000
@@ -1302,3 +1302,32 @@
 >6
 >9
 >1 2 3 4 5 6 7 8 9
+
+   testfn() {
+     emulate -L sh
+     set -A foo 1 2 3
+     set -- 1 2 3
+     str=abc
+     echo ${foo[*]:0:1}
+     echo ${foo[*]:1:1}
+     echo ${foo[*]: -1:1}
+     :
+     echo ${*:0:1}
+     echo ${*:1:1}
+     echo ${*: -1:1}
+     :
+     echo ${str:0:1}
+     echo ${str:1:1}
+     echo ${str: -1:1}
+   }
+   testfn
+0:Bash-style subscripts, Bourne-style indexing
+>1
+>2
+>3
+>testfn
+>1
+>3
+>a
+>b
+>c

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK





* Re: PATCH: bash-style substrings & subarrays
  2010-11-17 16:54 PATCH: bash-style substrings & subarrays Peter Stephenson
  2010-11-18 12:44 ` Peter Stephenson
@ 2010-11-19 18:01 ` Bart Schaefer
  2010-11-20 21:15   ` Peter Stephenson
  1 sibling, 1 reply; 10+ messages in thread
From: Bart Schaefer @ 2010-11-19 18:01 UTC (permalink / raw)
  To: Zsh hackers list

On Nov 17,  4:54pm, Peter Stephenson wrote:
}
} This implements the ${NAME:OFFSET} and ${NAME:OFFSET:LENGTH} syntax.
} This is basically for compatibility; we don't need the extra
} functionality, but it's a syntax people are nowadays assuming they can
} use.

I'm wondering whether :OFFSET:LENGTH shouldn't always use KSH_ARRAYS
semantics, or be a valid syntax only when KSH_ARRAYS is set?  If it's
for compatibility with people who are assuming it works, those people
are also going to assume it has zero-offset, aren't they?

There may be some error cases not yet caught:

schaefer<508> foo=123456789
schaefer<509> unset y x
schaefer<510> echo ${foo:$y:$x} 
zsh: bad math expression: illegal character: Ý

That Ý looks like uninitialized memory garbage.

-- 



* Re: PATCH: bash-style substrings & subarrays
  2010-11-19 18:01 ` Bart Schaefer
@ 2010-11-20 21:15   ` Peter Stephenson
  2010-11-21  6:34     ` Bart Schaefer
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2010-11-20 21:15 UTC (permalink / raw)
  To: Zsh hackers list

On Fri, 19 Nov 2010 10:01:45 -0800
Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Nov 17,  4:54pm, Peter Stephenson wrote:
> }
> } This implements the ${NAME:OFFSET} and ${NAME:OFFSET:LENGTH} syntax.
> } This is basically for compatibility; we don't need the extra
> } functionality, but it's a syntax people are nowadays assuming they can
> } use.
> 
> I'm wondering whether :OFFSET:LENGTH shouldn't always use KSH_ARRAYS
> semantics, or be a valid syntax only when KSH_ARRAYS is set?  If it's
> for compatibility with people who are assuming it works, those people
> are also going to assume it has zero-offset, aren't they?

I can see that if you interpret the word "OFFSET" literally it's
different from a subscript and you might interpret it as starting from 0
in any case.  I still think on balance consistency with normal
subscripting is preferable.  I don't think half-measures compatibility
with other shells is particularly useful; in the end it's probably more
confusing when you find some things work the way you expect and some
things don't.  However, with a stress in the documentation on the fact
that it's an offset, not a subscript, I can see there's an argument for
the other way.

> There may be some error cases not yet caught:
> 
> schaefer<508> foo=123456789
> schaefer<509> unset y x
> schaefer<510> echo ${foo:$y:$x} 
> zsh: bad math expression: illegal character: Ý
> 
> That Ý looks like uninitialized memory garbage.

It's because empty strings turn into Nularg.  I should be tidying
up the string after the expansion.  After the patch, empty expansions
evaluate to 0, consistent with the effect of

% print $(( ))
0
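
For comparison, a sketch of how the same empty-expression cases come out
in bash.  Empty values stand in for the unset parameters in the report,
and the names zero, head and none are illustrative only.

```shell
#!/usr/bin/env bash
# Illustration only: bash treats an empty offset or length as 0.
foo=123456789
y=
x=

zero=$(( ))          # empty arithmetic expression
head=${foo:$y:2}     # empty offset counts as 0
none=${foo:$y:$x}    # empty length counts as 0, so nothing is selected

printf '[%s]\n' "$zero" "$head" "$none"
```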

Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.110
diff -p -u -r1.110 subst.c
--- Src/subst.c	18 Nov 2010 13:57:19 -0000	1.110
+++ Src/subst.c	20 Nov 2010 20:58:22 -0000
@@ -1403,6 +1403,8 @@ check_colon_subscript(char *str, char **
     if (parsestr(str = dupstring(str)))
 	return NULL;
     singsub(&str);
+    remnulargs(str);
+    untokenize(str);
 
     **endp = sav;
     return str;

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/



* Re: PATCH: bash-style substrings & subarrays
  2010-11-20 21:15   ` Peter Stephenson
@ 2010-11-21  6:34     ` Bart Schaefer
  2010-11-21 17:02       ` Peter Stephenson
  0 siblings, 1 reply; 10+ messages in thread
From: Bart Schaefer @ 2010-11-21  6:34 UTC (permalink / raw)
  To: Zsh hackers list

On Nov 20,  9:15pm, Peter Stephenson wrote:
} Subject: Re: PATCH: bash-style substrings & subarrays
}
} On Fri, 19 Nov 2010 10:01:45 -0800
} Bart Schaefer <schaefer@brasslantern.com> wrote:
} > On Nov 17,  4:54pm, Peter Stephenson wrote:
} > }
} > } This implements the ${NAME:OFFSET} and ${NAME:OFFSET:LENGTH} syntax.
} > 
} > I'm wondering whether :OFFSET:LENGTH shouldn't always use KSH_ARRAYS
} > semantics, or be a valid syntax only when KSH_ARRAYS is set?
} 
} I can see that if you interpret the word "OFFSET" literally it's
} different from a subscript and you might interpret it as starting
} from 0 in any case. I still think on balance consistency with normal
} subscripting is preferable.

I agree that consistency with normal subscripting is preferable, but
subscripting behaves like a pair of offsets only when KSH_ARRAYS is
set (hence the second alternative I suggested).

However, I'm mostly indifferent.

} I don't think half-measures compatibility with other shells is
} particularly useful, in the end is probably more confusing when you
} find some things work the way you expect and some things don't.

I'm confused about how that relates to the foregoing, sorry ...?

Thanks for the additional patch.



* Re: PATCH: bash-style substrings & subarrays
  2010-11-21  6:34     ` Bart Schaefer
@ 2010-11-21 17:02       ` Peter Stephenson
  2010-11-21 20:11         ` Bart Schaefer
  2010-11-23 11:14         ` Peter Stephenson
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Stephenson @ 2010-11-21 17:02 UTC (permalink / raw)
  To: Zsh hackers list

Bart Schaefer wrote:
> } I don't think half-measures compatibility with other shells is
> } particularly useful, in the end is probably more confusing when you
> } find some things work the way you expect and some things don't.
> 
> I'm confused about how that relates to the foregoing, sorry ...?

If KSH_ARRAYS is not set, lots of aspects of arrays don't work in a
fashion consistent with other shells, so the fact that one feature you
chanced upon does so doesn't help you write scripts properly.  If you
want proper compatibility you need a full emulation.

This doesn't impinge on the argument for saying the newly imitated
syntax only deals with offsets and for that reason shouldn't be affected
by KSH_ARRAYS, however.  I'd be happy to get any other views.  Should
${foo:1} always start 1 character/element beyond the first one,
regardless of which subscripting rules are in use?  I'm now inclining in
that direction.
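
A sketch of what "always start 1 beyond the first one" means, shown in
bash, where this is already the rule; the variable names are
illustrative only.

```shell
#!/usr/bin/env bash
# Illustration only: a pure offset, independent of any subscripting rules.
foo=123456789
arr=(1 2 3 4 5 6 7 8 9)

str_rest=${foo:1}        # skips one character
arr_rest="${arr[*]:1}"   # skips one element
str_all=${foo:0}         # offset 0 is the start, so nothing is skipped

printf '%s\n' "$str_rest" "$arr_rest" "$str_all"
```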

-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/



* Re: PATCH: bash-style substrings & subarrays
  2010-11-21 17:02       ` Peter Stephenson
@ 2010-11-21 20:11         ` Bart Schaefer
  2010-11-21 20:51           ` Greg Klanderman
  2010-11-23 11:14         ` Peter Stephenson
  1 sibling, 1 reply; 10+ messages in thread
From: Bart Schaefer @ 2010-11-21 20:11 UTC (permalink / raw)
  To: Zsh hackers list

On Nov 21,  5:02pm, Peter Stephenson wrote:
}
} If KSH_ARRAYS is not set, lots of aspects of arrays don't work in a
} fashion consistent with other shells, so the fact that one feature you
} chanced upon does so, doesn't help you write scripts properly.  If you
} want proper compatibility you need a full emulation.

That would seem to me to be an argument for making the feature
unavailable when not emulating, rather than for providing it but with
zsh-specific semantics.

Akin to how @(...) or +(...) have no special meaning without KSH_GLOB.

Speaking of full emulation, has anyone looked at ksh's "typeset -T" ?
http://www2.research.att.com/~gsf/man/man1/ksh-man.html#Variable%20Assignments
http://www2.research.att.com/~gsf/man/man1/ksh-man.html#Type%20Variables

-- 



* Re: PATCH: bash-style substrings & subarrays
  2010-11-21 20:11         ` Bart Schaefer
@ 2010-11-21 20:51           ` Greg Klanderman
  0 siblings, 0 replies; 10+ messages in thread
From: Greg Klanderman @ 2010-11-21 20:51 UTC (permalink / raw)
  To: zsh-workers

>>>>> On November 21, 2010 Bart Schaefer <schaefer@brasslantern.com> wrote:

> Speaking of full emulation, has anyone looked at ksh's "typeset -T" ?
> http://www2.research.att.com/~gsf/man/man1/ksh-man.html#Variable%20Assignments
> http://www2.research.att.com/~gsf/man/man1/ksh-man.html#Type%20Variables

Cool; having even simple structures would make fixing the fake-files
and fake-dirs completion styles to not actually stat fake things a lot
easier whenever I get around to tackling that.

Greg



* Re: PATCH: bash-style substrings & subarrays
  2010-11-21 17:02       ` Peter Stephenson
  2010-11-21 20:11         ` Bart Schaefer
@ 2010-11-23 11:14         ` Peter Stephenson
  2010-11-25 10:35           ` Peter Stephenson
  1 sibling, 1 reply; 10+ messages in thread
From: Peter Stephenson @ 2010-11-23 11:14 UTC (permalink / raw)
  To: Zsh hackers list

On Sun, 21 Nov 2010 17:02:38 +0000
Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> Should ${foo:1} always start 1 character/element beyond the
> first one, regardless which subscripting rules are in use?  I'm now
> inclining in that direction.

Nobody commented, but here is the change, with some more careful
documentation.

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.123
diff -p -u -r1.123 expn.yo
--- Doc/Zsh/expn.yo	18 Nov 2010 13:57:19 -0000	1.123
+++ Doc/Zsh/expn.yo	23 Nov 2010 11:09:33 -0000
@@ -588,23 +588,29 @@ remove the non-matched elements).
 xitem(tt(${)var(name)tt(:)var(offset)tt(}))
 item(tt(${)var(name)tt(:)var(offset)tt(:)var(length)tt(}))(
 This syntax gives effects similar to parameter subscripting
-in the form tt($)var(name)tt({)var(offset)tt(,)var(end)tt(}) but in
-a form compatible with other shells.
+in the form tt($)var(name)tt({)var(start)tt(,)var(end)tt(}), but is
+compatible with other shells; note that both var(offset) and var(length)
+are interpreted differently from the components of a subscript.
+
+If var(offset) is non-negative, then if the variable var(name) is a
+scalar substitute the contents starting var(offset) characters from the
+first character of the string, and if var(name) is an array substitute
+elements starting var(offset) elements from the first element.  If
+var(length) is given, substitute that many characters or elements,
+otherwise the entire rest of the scalar or array.
+
+A positive var(offset) is always treated as the offset of a character or
+element in var(name) from the first character or element of the array
+(this is different from native zsh subscript notation).  Hence 0
+refers to the first character or element regardless of the setting of
+the option tt(KSH_ARRAYS).
 
-If the variable var(name) is a scalar, substitute the contents
-starting from offset var(offset); if var(name) is an array,
-substitute elements from element var(offset).  If var(length) is
-given, substitute that many characters or elements, otherwise the
-entire rest of the scalar or array.
-
-var(offset) is treated similarly to a parameter subscript:
-the offset of the first character or element in var(name)
-is 0 if the option tt(KSH_ARRAYS) is set, else 1; a negative
-subscript counts backwards so that -1 corresponds to the last
-character or element.
+A negative offset counts backwards from the end of the scalar or array,
+so that -1 corresponds to the last character or element, and so on.
 
 var(length) is always treated directly as a length and hence may not be
-negative.
+negative.  The option tt(MULTIBYTE) is obeyed, i.e. the offset and length
+count multibyte characters where appropriate.
 
 var(offset) and var(length) undergo the same set of shell substitutions
 as for scalar assignment; in addition, they are then subject to arithmetic
@@ -615,19 +621,29 @@ print ${foo: 1 + 2}
 print ${foo:$(( 1 + 2))}
 print ${foo:$(echo 1 + 2)})
 
-all have the same effect.
+all have the same effect, extracting the string starting at the fourth
+character of tt($foo) if the substitution would otherwise return a scalar,
+or the array starting at the fourth element if tt($foo) would return an
+array.  Note that with the option tt(KSH_ARRAYS) tt($foo) always returns
+a scalar (regardless of the use of the offset syntax) and a form
+such as tt($foo[*]:3) is required to extract elements of an array named
+tt(foo).
 
-Note that if var(offset) is negative, the tt(-) may not appear immediately
+If var(offset) is negative, the tt(-) may not appear immediately
 after the tt(:) as this indicates the
-tt(${)var(name)tt(:-)var(word)tt(}) form of substitution; a space
+tt(${)var(name)tt(:-)var(word)tt(}) form of substitution.  Instead, a space
 may be inserted before the tt(-).  Furthermore, neither var(offset) nor
 var(length) may begin with an alphabetic character or tt(&) as these are
-used to indicate history-style modifiers.
+used to indicate history-style modifiers.  To substitute a value from a
+variable, the recommended approach is to precede it with a tt($) as this
+signifies the intention (parameter substitution can easily be rendered
+unreadable); however, as arithmetic substitution is performed, the
+expression tt(${var: offs}) does work, retrieving the offset from
+tt($offs).
 
 For further compatibility with other shells there is a special case
-when the tt(KSH_ARRAYS) option is active, as in emulation of
-Bourne-style shells.  In this case array subscript 0 usually refers to the
-first element of the array.  However, if the substitution refers to the
+for array offset 0.  This usually accesses the
+first element of the array.  However, if the substitution refers to the
 positional parameter array, e.g. tt($@) or tt($*), then offset 0
 instead refers to tt($0), offset 1 refers to tt($1), and so on.  In
 other words, the positional parameter array is effectively extended by
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.111
diff -p -u -r1.111 subst.c
--- Src/subst.c	20 Nov 2010 23:46:26 -0000	1.111
+++ Src/subst.c	23 Nov 2010 11:09:34 -0000
@@ -1640,7 +1640,7 @@ paramsubst(LinkList l, LinkNode n, char 
     int subexp;
     /*
      * If we're referring to the positional parameters, then
-     * e.g ${*:1:1} refers to $1 even if KSH_ARRAYS is in effect.
+     * e.g ${*:1:1} refers to $1.
      * This is for compatibility.
      */
     int horrible_offset_hack = 0;
@@ -2768,16 +2768,15 @@ paramsubst(LinkList l, LinkNode n, char 
 			return NULL;
 		    }
 		}
-		if (!isset(KSHARRAYS) || horrible_offset_hack) {
+		if (horrible_offset_hack) {
 		    /*
 		     * As part of the 'orrible hoffset 'ack,
 		     * (what hare you? Han 'orrible hoffset 'ack,
 		     * sergeant major), if we are given a ksh/bash/POSIX
-		     * style array which includes offset 0, we use
-		     * $0.
+		     * style positional parameter array which includes
+		     * offset 0, we use $0.
 		     */
-		    if (isset(KSHARRAYS) && horrible_offset_hack &&
-			offset == 0 && isarr) {
+		    if (offset == 0 && isarr) {
 			offset_hack_argzero = 1;
 		    } else if (offset > 0) {
 			offset--;
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.46
diff -p -u -r1.46 D04parameter.ztst
--- Test/D04parameter.ztst	18 Nov 2010 13:57:19 -0000	1.46
+++ Test/D04parameter.ztst	23 Nov 2010 11:09:34 -0000
@@ -1268,15 +1268,15 @@
    print ${foo:$(echo 3 + 3):`echo 4 - 3`}
    print ${foo: -1}
    print ${foo: -10}
-0:Bash-style subscripts, scalar
->3456789
+0:Bash-style offsets, scalar
 >456789
 >56789
 >6789
->3
+>789
 >4
 >5
 >6
+>7
 >9
 >123456789
 
@@ -1291,15 +1291,15 @@
    print ${foo:$(echo 3 + 3):`echo 4 - 3`}
    print ${foo: -1}
    print ${foo: -10}
-0:Bash-style subscripts, array
->3 4 5 6 7 8 9
+0:Bash-style offsets, array
 >4 5 6 7 8 9
 >5 6 7 8 9
 >6 7 8 9
->3
+>7 8 9
 >4
 >5
 >6
+>7
 >9
 >1 2 3 4 5 6 7 8 9
 
@@ -1321,7 +1321,7 @@
      echo ${str: -1:1}
    }
    testfn
-0:Bash-style subscripts, Bourne-style indexing
+0:Bash-style offsets, Bourne-style indexing
 >1
 >2
 >3
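
For readers skimming the test changes, here is a minimal sketch of the offset behaviour the documentation hunk describes. It is not part of the patch. The names foo, arr and offs and the helper functions are illustrative assumptions, and the forms are restricted to ones that should behave the same in bash and in a zsh that includes this change.

```shell
# Sketch only: foo, arr, offs and the helper names are illustrative.
foo=123456789
arr=(1 2 3 4 5 6 7 8 9)
offs=3

# Offset 3 skips three characters, so the result starts at the fourth.
scalar_from_offset() { echo "${foo:3}"; }

# Offset 3, length 1: a single character.
scalar_one_char() { echo "${foo:3:1}"; }

# The space before the minus sign keeps this distinct from ${foo:-1}.
scalar_last_char() { echo "${foo: -1}"; }

# Offset taken from a variable, written with an explicit $.
scalar_var_offset() { echo "${foo:$offs:2}"; }

# For an array the offset counts elements instead of characters.
array_from_offset() { echo "${arr[@]:6}"; }
```

The explicit [@] subscript is used for the array so that the same line works whether or not KSH_ARRAYS is set.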

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK


Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom



* Re: PATCH: bash-style substrings & subarrays
  2010-11-23 11:14         ` Peter Stephenson
@ 2010-11-25 10:35           ` Peter Stephenson
  0 siblings, 0 replies; 10+ messages in thread
From: Peter Stephenson @ 2010-11-25 10:35 UTC (permalink / raw)
  To: Zsh hackers list

On Tue, 23 Nov 2010 11:14:23 +0000
Peter Stephenson <Peter.Stephenson@csr.com> wrote:
> On Sun, 21 Nov 2010 17:02:38 +0000
> Peter Stephenson <p.w.stephenson@ntlworld.com> wrote:
> > Should ${foo:1} always start 1 character/element beyond the
> > first one, regardless which subscripting rules are in use?  I'm now
> > inclining in that direction.
> 
> Nobody commented but this is the change with some more careful
> documentation.

I've committed this, doesn't seem controversial.
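
As a rough illustration of the rule settled on above (a sketch under stated assumptions, not text from the commit): an offset of 1 skips exactly one character or element of an ordinary variable, while for the positional parameters offset 1 refers to $1. The helpers show_args and skip_first are hypothetical names.

```shell
# Sketch only: show_args and skip_first are hypothetical helpers.
show_args() {
  echo "${@:1:1}"   # offset 1 of the positional parameters is $1
  echo "${@:2}"     # everything from $2 onwards
}

skip_first() {
  local s=$1
  echo "${s:1}"     # drops exactly one leading character
}

show_args a b c     # prints "a", then "b c"
skip_first hello    # prints "ello"
```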

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK





end of thread, other threads:[~2010-11-25 11:28 UTC | newest]

Thread overview: 10+ messages
2010-11-17 16:54 PATCH: bash-style substrings & subarrays Peter Stephenson
2010-11-18 12:44 ` Peter Stephenson
2010-11-19 18:01 ` Bart Schaefer
2010-11-20 21:15   ` Peter Stephenson
2010-11-21  6:34     ` Bart Schaefer
2010-11-21 17:02       ` Peter Stephenson
2010-11-21 20:11         ` Bart Schaefer
2010-11-21 20:51           ` Greg Klanderman
2010-11-23 11:14         ` Peter Stephenson
2010-11-25 10:35           ` Peter Stephenson
