zsh-workers
 help / color / mirror / code / Atom feed
* Obscure UTF-8 bug in parameter expansion?
@ 2005-09-05 19:25 Bart Schaefer
  2005-09-05 21:42 ` Obscure [not UTF-8] " Bart Schaefer
  0 siblings, 1 reply; 4+ messages in thread
From: Bart Schaefer @ 2005-09-05 19:25 UTC (permalink / raw)
  To: zsh-workers

(See zsh-users/9389 for background.)

Using the latest zsh from CVS, I was just trying out the command

	zargs -- sur pur -- echo {} -foo bar

and zargs went into an infinite loop.  I found this rather baffling,
as it had been pretty thoroughly tested, so I ran it with `set -x' and
encountered this:

[...]
: zargs:233; s=20480 
: zargs:234; l=ÿ 
[...]

The assignment to `l' there should always be a number, unless one has
abused the --max-lines option.  The actual declaration/assignment is:

local -a opts eof n s l P i
# An intervening zparseopts call may assign to l or may not
l=${${${l##*-(l|L|-max-lines(=|))}[-1]}:-${${l[1]:+1}:-$ARGC}}

So apparently when `l' should be the empty array (as it was declared on
line 68) it is instead the garbage string y-with-an-umlaut.  I'm pretty
sure this did not occur before --enable-multibyte became the default.
However, recompiling with --disable-multibyte doesn't change anything,
so the problem may be in code that changed independent of the multibyte
support.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Obscure [not UTF-8] bug in parameter expansion?
  2005-09-05 19:25 Obscure UTF-8 bug in parameter expansion? Bart Schaefer
@ 2005-09-05 21:42 ` Bart Schaefer
  2005-09-06 10:16   ` Peter Stephenson
  0 siblings, 1 reply; 4+ messages in thread
From: Bart Schaefer @ 2005-09-05 21:42 UTC (permalink / raw)
  To: zsh-workers

On Sep 5,  7:25pm, Bart Schaefer wrote:
} 
} [...]
} : zargs:233; s=20480 
} : zargs:234; l=ÿ 
} [...]

I've narrowed this down to a test case:

foo() { typeset -a l; echo X${${l##*}[-1]}X }

It looks like some recent change has broken negative indexing on an
empty match result (it happens with a match result on either an array
or a string, but not on a plain empty string).

This is broken in 4.2.5 and 4.2.3 as well, but not in 4.2.0.  I don't
have 4.2.1, 4.2.2 or 4.2.4 handy to test against.


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Obscure [not UTF-8] bug in parameter expansion?
  2005-09-05 21:42 ` Obscure [not UTF-8] " Bart Schaefer
@ 2005-09-06 10:16   ` Peter Stephenson
  2005-09-06 15:06     ` Bart Schaefer
  0 siblings, 1 reply; 4+ messages in thread
From: Peter Stephenson @ 2005-09-06 10:16 UTC (permalink / raw)
  To: zsh-workers

Bart Schaefer <schaefer@brasslantern.com> wrote:
> On Sep 5,  7:25pm, Bart Schaefer wrote:
> } 
> } [...]
> } : zargs:233; s=20480 
> } : zargs:234; l=ÿ 
> } [...]
> 
> I've narrowed this down to a test case:
> 
> foo() { typeset -a l; echo X${${l##*}[-1]}X }
> 
> It looks like some recent change has broken negative indexing on an
> empty match result (it happens with a match result on either an array
> or a string, but not on a plain empty string).
> 
> This is broken in 4.2.5 and 4.2.3 as well, but not in 4.2.0.  I don't
> have 4.2.1, 4.2.2 or 4.2.4 handy to test against.

We usually take account of the length of a value in getstrvalue() and
siblings, so it may be this chunk in getstrvalue() that needs extending
somehow (s is the full string, here an empty string):

    if (v->start == 0 && v->end == -1)
	return s;

    if (v->start < 0)
	v->start += strlen(s);
    if (v->end < 0)
	v->end += strlen(s) + 1;
    s = (v->start > (int)strlen(s)) ? dupstring("") : dupstring(s + v->start);
    if (v->end <= v->start)
	s[0] = '\0';
    else if (v->end - v->start <= (int)strlen(s))
	s[v->end - v->start + (s[v->end - v->start - 1] == Meta)] = '\0';

(If you wondering why it's a string not an array, ${l##*} is treated as a
scalar on return when l is empty.  I don't know if this is right or not
because multsub() is a black box to me.  A strict reading of rule 1 of the
parameter substitution rules suggests it might be wrong, but on the other
hand the user shouldn't really see the effect.  However, the bug in this
chunk of code is present anyway, see below.)

You can see that if strlen(s) is 0 and v->start is -1, which I believe is
correct up to this point, you can end up with an invalid index.  v->end
starts as -1 (this is normal) and so gets massaged to 0.

I think the right fix is to sanitise v->start to 0 if it is still negative.
This affects this example:

l=(foo)
print ${${l}[1][-4,1]}

With this fix, you get "f"; if instead you rejected the whole thing if
v->start were invalid you would get an empty string.  The version
here seems to fit better with the usual zsh philosophy of treating
nonexistent bits as empty.  (The example is different because l is
returned as an array.)

You can see the underlying bug in the current code with a simple example:

% l=foo
% print ${l[-4,1]}
 f

The stuff before the f was garbage.

There was a flurry of activity on this chunk of code back in 2000, but
nothing since.  Maybe it previously worked as a side effect of something
else (possibly a change in multsub()), but as far as I can see this is the
right place to fix it.

If this looks OK I will patch it 4.2.

Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.101
diff -u -r1.101 params.c
--- Src/params.c	22 Aug 2005 11:43:36 -0000	1.101
+++ Src/params.c	6 Sep 2005 10:08:14 -0000
@@ -1565,8 +1565,11 @@
     if (v->start == 0 && v->end == -1)
 	return s;
 
-    if (v->start < 0)
+    if (v->start < 0) {
 	v->start += strlen(s);
+	if (v->start < 0)
+	    v->start = 0;
+    }
     if (v->end < 0)
 	v->end += strlen(s) + 1;
     s = (v->start > (int)strlen(s)) ? dupstring("") : dupstring(s + v->start);
Index: Test/D06subscript.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D06subscript.ztst,v
retrieving revision 1.8
diff -u -r1.8 D06subscript.ztst
--- Test/D06subscript.ztst	13 Aug 2001 17:45:03 -0000	1.8
+++ Test/D06subscript.ztst	6 Sep 2005 10:08:14 -0000
@@ -178,3 +178,8 @@
 0:Keys with double quotes and the (P) expansion flag
 >lower
 >upper
+
+  typeset -a empty_array
+  echo X${${l##*}[-1]}X
+0:Negative index applied to substition result from empty array
+>XX


-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

**********************************************************************


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: Obscure [not UTF-8] bug in parameter expansion?
  2005-09-06 10:16   ` Peter Stephenson
@ 2005-09-06 15:06     ` Bart Schaefer
  0 siblings, 0 replies; 4+ messages in thread
From: Bart Schaefer @ 2005-09-06 15:06 UTC (permalink / raw)
  To: zsh-workers

On Sep 6, 11:16am, Peter Stephenson wrote:
} 
} If this looks OK I will patch it 4.2.

It looks right to me.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2005-09-06 15:06 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-09-05 19:25 Obscure UTF-8 bug in parameter expansion? Bart Schaefer
2005-09-05 21:42 ` Obscure [not UTF-8] " Bart Schaefer
2005-09-06 10:16   ` Peter Stephenson
2005-09-06 15:06     ` Bart Schaefer

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).