zsh-workers
* zsh crashes on completeion of utf-8 file-names.
@ 2003-12-21 14:44 Zvi Har'El
  2004-01-05 14:17 ` Peter Stephenson
  0 siblings, 1 reply; 8+ messages in thread
From: Zvi Har'El @ 2003-12-21 14:44 UTC (permalink / raw)
  To: Zsh hackers list

Hi,

I know that zsh-4.1.1 still doesn't support UTF-8, but as released it could
do completion on UTF-8 file names. However, I recently updated from CVS,
and now zsh crashes on completion of names when I have two candidates of the
form RA and RB and I hit R<TAB>. This happens when R=U+05E8 (0xd7 0xa8) or
U+05E9 (0xd7 0xa9) and A and B are U+05D0 and U+05D1. This is the Hebrew
range. I tried to recreate the problem in the Latin-1 Supplement range
(U+0080..U+00FF) and didn't succeed. I produced a debug trace by configuring
zsh with CFLAGS=-g and LDFLAGS=-g, and here it is:

/usr/local/src/build/zsh$ gdb Src/zsh
GNU gdb Red Hat Linux (5.3post-0.20021129.18rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(gdb) run
Starting program: /usr/local/src/build/zsh/Src/zsh 
/usr/local/src/build/zsh$ echo $ZSH_VERSION
4.1.1-dev-1
/usr/local/src/build/zsh$ ls tmp
doit  רא  רב  שא  שב
/usr/local/src/build/zsh$ ls tmp/ר
Program received signal SIGSEGV, Segmentation fault.
0x080b6b3a in ztrsub (t=0x81419b3 "", 
    s=0x8249001 <Address 0x8249001 out of bounds>) at utils.c:2875
2875            if (*s++ == Meta) {
(gdb) bt
#0  0x080b6b3a in ztrsub (t=0x81419b3 "", 
    s=0x8249001 <Address 0x8249001 out of bounds>) at utils.c:2875
#1  0x402af206 in unmetafy_line () at zle_tricky.c:918
#2  0x402aed72 in docomplete (lst=0) at zle_tricky.c:820
#3  0x402ad97a in expandorcomplete (args=0x402bf250) at zle_tricky.c:288
#4  0x402ad53d in completecall (args=0x402bf250) at zle_tricky.c:182
#5  0x402a1912 in execzlefunc (func=0x402bd648, args=0x402bf250)
    at zle_main.c:903
#6  0x402a1047 in zlecore () at zle_main.c:696
#7  0x402a1609 in zleread (lp=0x80e77c0 "%~%(#.#.$) ", rp=0x0, flags=7, 
    context=0) at zle_main.c:840
#8  0x0807c161 in inputline () at input.c:277
#9  0x0807c02b in ingetc () at input.c:214
#10 0x08074079 in ihgetc () at hist.c:241
#11 0x08082b59 in gettok () at lex.c:631
#12 0x08082461 in yylex () at lex.c:347
#13 0x080991f1 in parse_event () at parse.c:449
#14 0x08079332 in loop (toplevel=1, justonce=0) at init.c:128
#15 0x0807bcb1 in zsh_main (argc=1, argv=0xbfffe8d4) at init.c:1272
#16 0x08052226 in main (argc=1, argv=0xbfffe8d4) at main.c:37
#17 0x42015704 in __libc_start_main () from /lib/tls/libc.so.6

(gdb) p line
$1 = (unsigned char *) 0x81419a8 "ls tmp/ר\327\203"
(gdb) p line[6]
$5 = 47 '/'
(gdb) p line[7]  <==== This and the following one are the UTF-8 bytes 0xd7 0xa8
$6 = 215 '\327'
(gdb) p line[8] 
$7 = 168 '\250'
(gdb) p line[9]  <==== This is a UTF-8 0xd7 byte without the following one
$8 = 215 '\327'
(gdb) p line[10] <==== This is zsh's "Meta", at the end of the string!!!
$9 = 131 '\203'
(gdb) p line[11]
$10 = 0 '\0'


One can easily see that the code of ztrsub at Src/utils.c line 2870 is really
buggy: if DEBUG is not set, it never checks for the end of the string, and
if a Meta falls at the end, we are screwed. However, this code was already
there when the tag zsh-4_1_1 was generated, so I cannot see what triggered the
problem. Really, this "Meta" stuff shouldn't be the last character in the
string!
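
To make the failure mode concrete, here is a minimal sketch (not the zsh
source). It assumes, from the trace and the description above, that the
counting loop consumes two bytes whenever it sees a Meta and only stops when
its pointer equals the end pointer, so a Meta in the final position makes it
step over the end. The names count_unmeta and META are invented for
illustration; a bounds-checked variant might look like this:

```c
#define META 0x83 /* byte value 131, as printed for line[10] in the trace */

/*
 * Hypothetical bounds-checked counterpart of a ztrsub-style count: the
 * number of logical characters in [s, t), where a Meta byte and the byte
 * after it count as one.  Returns -1 if a Meta is the last byte of the
 * span, instead of stepping past t.
 */
static int count_unmeta(const unsigned char *s, const unsigned char *t)
{
    int n = 0;

    while (s < t) {
        if (*s == META) {
            if (t - s < 2)
                return -1;      /* widowed Meta: nothing follows it */
            s += 2;             /* skip the Meta and the byte it escapes */
        } else {
            s++;
        }
        n++;
    }
    return n;
}
```

Called on the 11 bytes of the line buffer shown above, this sketch would
report the widowed Meta rather than run off the end of the string.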


-- 
Dr. Zvi Har'El     mailto:rl@math.technion.ac.il     Department of Mathematics
tel:+972-54-227607 icq:179294841     Technion - Israel Institute of Technology
fax:+972-4-8293388 http://www.math.technion.ac.il/~rl/     Haifa 32000, ISRAEL
"If you can't say somethin' nice, don't say nothin' at all." -- Thumper (1942)
                             Sunday, 26 Kislev 5764, 21 December 2003,  4:18PM



* Re: zsh crashes on completeion of utf-8 file-names.
  2003-12-21 14:44 zsh crashes on completeion of utf-8 file-names Zvi Har'El
@ 2004-01-05 14:17 ` Peter Stephenson
  2004-01-05 16:07   ` Peter Stephenson
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Stephenson @ 2004-01-05 14:17 UTC (permalink / raw)
  To: Zsh hackers list

"Zvi Har'El" wrote:
> I know that zsh-4.1.1 still doesn't support utf-8, but as released it could
> do completion on utf-8 file names. However, I recently updated from the cvs
> and now zsh crashes on completions of names, when I have two candidates of the
> form RA and RB, and I hit R<TAB>. This happens when R=U+05E8 (0xd7 0xa8) or
> U+05E9 (0xd7 0xa9) and A and B are U+05D0 and U+05D1.

I managed to reproduce this with zsh -f on a directory containing only
the files

touch $'\xd7\xa8\xd7\x90'
touch $'\xd7\xa8\xd7\x91'

As you say, it shouldn't be possible to have a Meta as the last byte of
the string.  If there was a Meta or a NUL in the input, it should
have been XORed with 32 and had a Meta stuck in front of it.  This suggests
the bug is actually earlier than ztrsub.
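
The encoding rule just described can be sketched as follows. This is a
simplified illustration, not zsh's own routine: it escapes only the two byte
values named above (NUL and Meta itself), whereas the real code may escape
further values. The names metafy_sketch, needs_meta and META are invented for
this example.

```c
#include <stddef.h>

#define META 0x83 /* byte value 131, as printed for line[10] in the trace */

/* Hypothetical predicate: only NUL and Meta are escaped in this sketch. */
static int needs_meta(unsigned char c)
{
    return c == 0 || c == META;
}

/*
 * Copy in[0..n) to out, replacing each byte that needs escaping with the
 * pair (Meta, byte ^ 32).  out must have room for 2 * n bytes.
 * Returns the number of bytes written.
 */
static size_t metafy_sketch(const unsigned char *in, size_t n,
                            unsigned char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        if (needs_meta(in[i])) {
            out[o++] = META;
            out[o++] = (unsigned char)(in[i] ^ 32);
        } else {
            out[o++] = in[i];
        }
    }
    return o;
}
```

Under this rule every Meta in a well-formed string is followed by the byte it
escapes, which is why a string ending in Meta points to damage done after
encoding.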

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR Ltd., Science Park, Milton Road,
Cambridge, CB4 0WH, UK                          Tel: +44 (0)1223 692070


**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

This footnote also confirms that this email message has been swept by
MIMEsweeper for the presence of computer viruses.

www.mimesweeper.com
**********************************************************************



* Re: zsh crashes on completeion of utf-8 file-names.
  2004-01-05 14:17 ` Peter Stephenson
@ 2004-01-05 16:07   ` Peter Stephenson
  2004-01-05 17:08     ` Wayne Davison
  0 siblings, 1 reply; 8+ messages in thread
From: Peter Stephenson @ 2004-01-05 16:07 UTC (permalink / raw)
  To: Zsh hackers list

Peter Stephenson wrote:
> "Zvi Har'El" wrote:
> > I know that zsh-4.1.1 still doesn't support utf-8, but as released it could
> > do completion on utf-8 file names. However, I recently updated from the cvs
> > and now zsh crashes on completions of names, when I have two candidates of the
> > form RA and RB, and I hit R<TAB>. This happens when R=U+05E8 (0xd7 0xa8) or
> > U+05E9 (0xd7 0xa9) and A and B are U+05D0 and U+05D1.
> 
> I managed to reproduce this with zsh -f on a directory containing only
> the files
> 
> touch $'\xd7\xa8\xd7\x90'
> touch $'\xd7\xa8\xd7\x91'

Phew.  I hope this really is the fix because I don't want to have to
trawl around down there again...

Deep inside the completion code, it works out whether two matches have a
common prefix.  It was treating Meta bytes as ordinary characters, so when
two matches first differed in the byte following a Meta, the unambiguous
part of the string to insert got truncated just after the Meta.

This would have shown up any time the only possible matches differed by
a metafied character.  Unless you have a lot of strings with 8-bit
characters, that's probably not very common.
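
A small sketch of the effect, outside zsh: the two functions below compare a
plain byte-wise prefix scan with one that accepts a Meta only when the byte
after it matches as well, which is the condition the patch below adds. The
function names and META are invented for illustration. The sample byte
strings in the note after the code assume that the second bytes of the two
file names (0x90 and 0x91) are stored as Meta followed by the byte XORed
with 32.

```c
#include <stddef.h>

#define META 0x83 /* byte value 131, as printed for line[10] in the trace */

/* Byte-wise common prefix that treats Meta like any other byte. */
static size_t prefix_naive(const unsigned char *p, size_t plen,
                           const unsigned char *q, size_t qlen)
{
    size_t l = 0;

    while (l < plen && l < qlen && p[l] == q[l])
        l++;
    return l;
}

/* Same scan, but a Meta is only accepted if the following byte matches too. */
static size_t prefix_meta_aware(const unsigned char *p, size_t plen,
                                const unsigned char *q, size_t qlen)
{
    size_t l = 0;

    while (l < plen && l < qlen && p[l] == q[l]
           && (p[l] != META
               || (l + 1 < plen && l + 1 < qlen && p[l + 1] == q[l + 1])))
        l++;
    return l;
}
```

For the byte strings d7 a8 d7 83 b0 and d7 a8 d7 83 b1, the first function
returns 4, a prefix ending in a widowed Meta, while the second returns 3.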

This seems to fix it here.

(By the way, every time I send a message saying I think setting lastval
= 0 in bin_eval is the correct fix for the eval "" bug, it disappears.
So let's see if tacking it on here confuses the system sufficiently.)

Index: Src/Zle/compmatch.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/compmatch.c,v
retrieving revision 1.37
diff -u -r1.37 compmatch.c
--- Src/Zle/compmatch.c	8 Aug 2001 07:41:01 -0000	1.37
+++ Src/Zle/compmatch.c	5 Jan 2004 16:04:39 -0000
@@ -1584,8 +1584,15 @@
 	if (check_cmdata(md, sfx))
 	    return ret;
 
+	/*
+	 * Look for a common prefix.  Be careful not to include
+	 * a widowed Meta in the prefix.  If we do include metafied
+	 * characters, at this stage we still need the overall length
+	 * including Meta's as separate characters.
+	 */
 	for (l = 0, p = str, q = md->str;
-	     l < len && l < md->len && p[ind] == q[ind];
+	     l < len && l < md->len && p[ind] == q[ind]
+		 && (p[ind] != Meta || p[ind+1] == q[ind+1]);
 	     l++, p += add, q += add);
 
 	if (l) {

-- 
Peter Stephenson <pws@csr.com>                  Software Engineer
CSR Ltd., Science Park, Milton Road,
Cambridge, CB4 0WH, UK                          Tel: +44 (0)1223 692070





* Re: zsh crashes on completeion of utf-8 file-names.
  2004-01-05 16:07   ` Peter Stephenson
@ 2004-01-05 17:08     ` Wayne Davison
  2004-02-06 16:57       ` Wayne Davison
  0 siblings, 1 reply; 8+ messages in thread
From: Wayne Davison @ 2004-01-05 17:08 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh hackers list

On Mon, Jan 05, 2004 at 04:07:54PM +0000, Peter Stephenson wrote:
> (By the way, every time I send a message saying I think setting lastval
> = 0 in bin_eval is the correct fix for the eval "" bug, it disappears.
> So let's see if tacking it on here confuses the system sufficiently.)

Looks like you managed to fool it that time!  I checked this in.

> +	     l < len && l < md->len && p[ind] == q[ind]
> +		 && (p[ind] != Meta || p[ind+1] == q[ind+1]);
>  	     l++, p += add, q += add);

I think it would be more optimal to sanity-check the last value after
the loop finishes, like this (untested):

--- Src/Zle/compmatch.c	8 Aug 2001 07:41:01 -0000	1.37
+++ Src/Zle/compmatch.c	5 Jan 2004 17:03:07 -0000
@@ -1589,7 +1589,9 @@
 	     l++, p += add, q += add);
 
 	if (l) {
-	    /* There was a common prefix, use it. */
+	    /* There was a common prefix, use it, but don't end on meta. */
+	    if (p[ind-add] == Meta)
+		l--;
 	    md->len -= l; len -= l;
 	    if (sfx) {
 		md->str -= l; str -= l;

..wayne..



* Re: zsh crashes on completeion of utf-8 file-names.
  2004-01-05 17:08     ` Wayne Davison
@ 2004-02-06 16:57       ` Wayne Davison
  2004-02-06 19:03         ` Bart Schaefer
  2004-02-09 22:34         ` Wayne Davison
  0 siblings, 2 replies; 8+ messages in thread
From: Wayne Davison @ 2004-02-06 16:57 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 283 bytes --]

On Mon, Jan 05, 2004 at 09:08:50AM -0800, Wayne Davison wrote:
> I think it would be more optimal to sanity-check the last value after
> the loop finishes, like this (untested):

Attached is a patch that should actually work (unlike the previous one).
See if you like it.

..wayne..

[-- Attachment #2: meta.patch --]
[-- Type: text/plain, Size: 982 bytes --]

--- Src/Zle/compmatch.c	20 Jan 2004 10:55:28 -0000	1.39
+++ Src/Zle/compmatch.c	6 Feb 2004 16:46:42 -0000
@@ -1585,16 +1585,18 @@ sub_match(Cmdata md, char *str, int len,
 	    return ret;
 
 	/*
-	 * Look for a common prefix.  Be careful not to include
-	 * a widowed Meta in the prefix.  If we do include metafied
+	 * Look for a common prefix.  If we do include metafied
 	 * characters, at this stage we still need the overall length
 	 * including Meta's as separate characters.
 	 */
 	for (l = 0, p = str, q = md->str;
-	     l < len && l < md->len && p[ind] == q[ind]
-		 && (p[ind] != Meta || p[ind+1] == q[ind+1]);
-	     l++, p += add, q += add);
+	     l < len && l < md->len && p[ind] == q[ind];
+	     l++, p += add, q += add) {}
 
+	/* Make sure we don't end with a widowed Meta (which can only
+	 * happen in a forward scan). */
+	if (l && add == 1 && p[-1] == Meta)
+	    l--;
 	if (l) {
 	    /* There was a common prefix, use it. */
 	    md->len -= l; len -= l;


* Re: zsh crashes on completeion of utf-8 file-names.
  2004-02-06 16:57       ` Wayne Davison
@ 2004-02-06 19:03         ` Bart Schaefer
  2004-02-09 22:34         ` Wayne Davison
  1 sibling, 0 replies; 8+ messages in thread
From: Bart Schaefer @ 2004-02-06 19:03 UTC (permalink / raw)
  To: Zsh hackers list

On Feb 6,  8:57am, Wayne Davison wrote:
}
} Attached is a patch that should actually work (unlike the previous one).
} See if you like it.

Looks good to me ...



* Re: zsh crashes on completeion of utf-8 file-names.
  2004-02-06 16:57       ` Wayne Davison
  2004-02-06 19:03         ` Bart Schaefer
@ 2004-02-09 22:34         ` Wayne Davison
  2004-02-09 22:36           ` Wayne Davison
  1 sibling, 1 reply; 8+ messages in thread
From: Wayne Davison @ 2004-02-09 22:34 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh hackers list

[-- Attachment #1: Type: text/plain, Size: 475 bytes --]

I was thinking about the case of the reverse scan, and it occurred to me
that if the character that followed the Meta matched a normal character
in the other string, a reverse scan could end up in the middle of a meta
sequence.  This patch fixes this.  Note that this code depends on the
(apparent) fact that a meta char cannot be followed by the same value
(e.g. we must be sure that when we see a meta value, it is the start of
a meta sequence and not the end).
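
As a standalone illustration of the reverse-scan case (a sketch using array
indices rather than the p/q/ind/add variables of sub_match; the function name
and META are invented here), the check amounts to looking at the byte just
before the common suffix in each string:

```c
#include <stddef.h>

#define META 0x83 /* byte value 131, as printed for line[10] in the trace */

/*
 * Length of the common suffix of p and q, shortened by one byte if the
 * suffix would otherwise begin on the second byte of a Meta pair in
 * either string.
 */
static size_t suffix_meta_safe(const unsigned char *p, size_t plen,
                               const unsigned char *q, size_t qlen)
{
    size_t l = 0;

    /* Plain byte-wise scan from the end of both strings. */
    while (l < plen && l < qlen && p[plen - 1 - l] == q[qlen - 1 - l])
        l++;

    /* If the byte just before the suffix is a Meta, give one byte back. */
    if (l && ((l < plen && p[plen - 1 - l] == META)
              || (l < qlen && q[qlen - 1 - l] == META)))
        l--;
    return l;
}
```

Like the patch, this relies on a Meta byte seen at the start of the suffix
being the start of a pair rather than the escaped byte of an earlier one.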

..wayne..

[-- Attachment #2: meta.patch --]
[-- Type: text/plain, Size: 706 bytes --]

--- compmatch.c	9 Feb 2004 05:49:52 -0000	1.40
+++ compmatch.c	9 Feb 2004 22:03:27 -0000
@@ -1593,10 +1593,15 @@ sub_match(Cmdata md, char *str, int len,
 	     l < len && l < md->len && p[ind] == q[ind];
 	     l++, p += add, q += add) {}
 
-	/* Make sure we don't end with a widowed Meta (which can only
-	 * happen in a forward scan). */
-	if (l && add == 1 && p[-1] == Meta)
-	    l--;
+	/* Make sure we don't end in the middle of a Meta sequence. */
+	if (add = 1) {
+	    if (l && p[-1] == Meta)
+		l--;
+	} else {
+	    if (l && ((l < len && p[-1] == Meta)
+		   || (l < md->len && q[-1] == Meta)))
+		l--;
+	}
 	if (l) {
 	    /* There was a common prefix, use it. */
 	    md->len -= l; len -= l;


* Re: zsh crashes on completeion of utf-8 file-names.
  2004-02-09 22:34         ` Wayne Davison
@ 2004-02-09 22:36           ` Wayne Davison
  0 siblings, 0 replies; 8+ messages in thread
From: Wayne Davison @ 2004-02-09 22:36 UTC (permalink / raw)
  To: Peter Stephenson; +Cc: Zsh hackers list

On Mon, Feb 09, 2004 at 02:34:40PM -0800, Wayne Davison wrote:
> +	if (add = 1) {

Note also that I had already fixed this line to use "==", but failed to
update my patch before sending it.

..wayne..

