zsh-workers
 help / color / mirror / code / Atom feed
* zsh 4.3.10 history-incremental-search-backward freezes in UTF-8
@ 2010-03-18 10:50 Vincent Lefevre
  2010-03-18 14:30 ` Peter Stephenson
  2010-03-18 14:55 ` Bart Schaefer
  0 siblings, 2 replies; 8+ messages in thread
From: Vincent Lefevre @ 2010-03-18 10:50 UTC (permalink / raw)
  To: zsh-workers

Hi,

I get a reproducible freeze of zsh 4.3.10 when searching the history
backward (history-incremental-search-backward). I think the problem
comes from the fact that the history contains command lines encoded
in ISO-8859-1 while I'm currently under UTF-8 locales (the freeze is
not reproducible under ISO-8859-1 locales).

Is this a known problem? Has it been fixed?

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8
  2010-03-18 10:50 zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre
@ 2010-03-18 14:30 ` Peter Stephenson
  2010-03-18 15:20   ` Vincent Lefevre
  2010-03-18 14:55 ` Bart Schaefer
  1 sibling, 1 reply; 8+ messages in thread
From: Peter Stephenson @ 2010-03-18 14:30 UTC (permalink / raw)
  To: zsh-workers

Vincent Lefevre wrote:
> Hi,
> 
> I get a reproducible freeze of zsh 4.3.10 when searching the history
> backward (history-incremental-search-backward). I think the problem
> comes from the fact that the history contains command lines encoded
> in ISO-8859-1 while I'm currently under UTF-8 locales (the freeze is
> not reproducible under ISO-8859-1 locales).
> 
> Is this a known problem? Has it been fixed?

I can't think of anything that's changed that would have fixed this (but
it's been a long time now and we're due for another releases).  I won't
be doing any work on it until I get a reproducible case I can run.  It
sounds like it should be fairly trivial to fix if I can find out what
(misinterpreted) characters are causing a freeze at what point in the
code.

-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK


Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8
  2010-03-18 10:50 zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre
  2010-03-18 14:30 ` Peter Stephenson
@ 2010-03-18 14:55 ` Bart Schaefer
  2010-03-18 15:24   ` Vincent Lefevre
  1 sibling, 1 reply; 8+ messages in thread
From: Bart Schaefer @ 2010-03-18 14:55 UTC (permalink / raw)
  To: zsh-workers

On Mar 18, 11:50am, Vincent Lefevre wrote:
} Subject: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8
} 
} Is this a known problem? Has it been fixed?

I guess it's "known" since you reported it about 10 days ago. :-/

Looking back quickly at that report ... I ignored it because I have
no idea how to interpret output from "sample".  I'd never even heard
of "sample" until you referenced it there.

} I get a reproducible freeze of zsh 4.3.10 when searching the history
} backward (history-incremental-search-backward). I think the problem
} comes from the fact that the history contains command lines encoded
} in ISO-8859-1 while I'm currently under UTF-8 locales (the freeze is
} not reproducible under ISO-8859-1 locales).

Combining this information with what I see in the "sample" output, my
guess is that an attempt to iterate over a multibyte string using
the multibyte libraries is hitting a spot where interpreting ISO-8859
as UTF-8 results in some kind of failure to advance the character
index, which then causes the search to keep scanning the same history
entry repeatedly.

Probably here:

	    while (charpos < end_pos) {
		ret = mb_metacharlenconv_r(zlemetaline + charpos, &wc, &mbs);
		if (charpos <= pos && pos < charpos + ret)
		    isearch_startpos = charcount;
		charcount++;
		charpos += ret;
	    }

There needs to be some kind of sane behavior if ret == 0, or the
mb_metacharlenconv_r() function in utils.c must assure that it never
does return zero:

    if (ptr > s) {
	return 1 + (*s == Meta);	/* Treat as single byte character */
    } else
	return 0;		/* Probably shouldn't happen */

Possibly related to this thread back in November:

http://www.zsh.org/mla/workers/2009/msg01103.html

There doesn't seem to have been any resolution to that, but it's hard
to follow all the way to the end with the archive.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8
  2010-03-18 14:30 ` Peter Stephenson
@ 2010-03-18 15:20   ` Vincent Lefevre
  2010-03-18 15:44     ` Peter Stephenson
  0 siblings, 1 reply; 8+ messages in thread
From: Vincent Lefevre @ 2010-03-18 15:20 UTC (permalink / raw)
  To: zsh-workers

On 2010-03-18 14:30:20 +0000, Peter Stephenson wrote:
> I can't think of anything that's changed that would have fixed this (but
> it's been a long time now and we're due for another releases).  I won't
> be doing any work on it until I get a reproducible case I can run.  It
> sounds like it should be fairly trivial to fix if I can find out what
> (misinterpreted) characters are causing a freeze at what point in the
> code.

I think this should be reproducible with shared history file:

1. Type in an ISO-8859-1 terminal: echo zzé zz
2. In a UTF-8 terminal: history-incremental-search-backward then type z

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8
  2010-03-18 14:55 ` Bart Schaefer
@ 2010-03-18 15:24   ` Vincent Lefevre
  0 siblings, 0 replies; 8+ messages in thread
From: Vincent Lefevre @ 2010-03-18 15:24 UTC (permalink / raw)
  To: zsh-workers

On 2010-03-18 07:55:23 -0700, Bart Schaefer wrote:
> On Mar 18, 11:50am, Vincent Lefevre wrote:
> } Subject: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8
> } 
> } Is this a known problem? Has it been fixed?
> 
> I guess it's "known" since you reported it about 10 days ago. :-/

I didn't remember, and at that time, I didn't know that this was
an encoding problem. I can now confirm this problem, under both
Mac OS X and Linux. See my other mail for a testcase.

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8
  2010-03-18 15:20   ` Vincent Lefevre
@ 2010-03-18 15:44     ` Peter Stephenson
  2010-03-20  0:29       ` PATCH: Displaying invalid characters Peter Stephenson
  2010-03-25 16:43       ` zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Stephenson @ 2010-03-18 15:44 UTC (permalink / raw)
  To: zsh-workers

On Thu, 18 Mar 2010 16:20:56 +0100
Vincent Lefevre <vincent@vinc17.net> wrote:
> On 2010-03-18 14:30:20 +0000, Peter Stephenson wrote:
> > I can't think of anything that's changed that would have fixed this (but
> > it's been a long time now and we're due for another releases).  I won't
> > be doing any work on it until I get a reproducible case I can run.  It
> > sounds like it should be fairly trivial to fix if I can find out what
> > (misinterpreted) characters are causing a freeze at what point in the
> > code.
> 
> I think this should be reproducible with shared history file:
> 
> 1. Type in an ISO-8859-1 terminal: echo zzé zz
> 2. In a UTF-8 terminal: history-incremental-search-backward then type z

Yes, I can see that.  All that's really on offer with the wrong character
set is it doesn't hang...

Index: Src/Zle/zle_hist.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_hist.c,v
retrieving revision 1.63
diff -p -u -r1.63 zle_hist.c
--- Src/Zle/zle_hist.c	6 Sep 2009 21:37:14 -0000	1.63
+++ Src/Zle/zle_hist.c	18 Mar 2010 15:42:53 -0000
@@ -1454,6 +1454,8 @@ doisearch(char **args, int dir, int patt
 	    memset(&mbs, 0, sizeof(mbs));
 	    while (charpos < end_pos) {
 		ret = mb_metacharlenconv_r(zlemetaline + charpos, &wc, &mbs);
+		if (ret <= 0) /* Unrecognised, treat as single char */
+		    ret = 1;
 		if (charpos <= pos && pos < charpos + ret)
 		    isearch_startpos = charcount;
 		charcount++;


-- 
Peter Stephenson <pws@csr.com>            Software Engineer
Tel: +44 (0)1223 692070                   Cambridge Silicon Radio Limited
Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK


Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom


^ permalink raw reply	[flat|nested] 8+ messages in thread

* PATCH: Displaying invalid characters
  2010-03-18 15:44     ` Peter Stephenson
@ 2010-03-20  0:29       ` Peter Stephenson
  2010-03-25 16:43       ` zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre
  1 sibling, 0 replies; 8+ messages in thread
From: Peter Stephenson @ 2010-03-20  0:29 UTC (permalink / raw)
  To: zsh-workers

On Thu, 18 Mar 2010 15:44:24 +0000
Peter Stephenson <pws@csr.com> wrote:
> All that's really on offer with the wrong character
> set is it doesn't hang...

Up to now the shell has given up processing a line when it encounters an
invalid character, i.e. a multibyte sequence that doesn't convert to a
wide character. Although it can't show the character correctly
(it simply doesn't know enough), it should be able to show the
character specially in a similar manner to unprintable characters
and continue looking at the line.

All we need to fix this efficiently is a range of 256 entries in wchar_t
that's guaranteed to be invalid.  I find from the Linux Unicode manual
entry that if __STDC_ISO_10646__ is defined, so that wchar_t is a
Unicode code point, the immediate range starting at 0xE000 is private to
the specific application, so we can use that.  So invalid characters
now appear highlighted as two hex digits in angle brackets and the rest
of the line is properly processed (unless you got seriously unlucky with
clashes between the two character sets, which is beyond our power to
fix---of course in the reverse of the case Vincent had the chunks of
UTF-8 characters are often valid ISO-8859-1, but that's relatively
benign).

It doesn't cover all systems, but as you'll see this was a really rather
easy change, so it's worth doing.  It's likely this doesn't work
properly in completion which I haven't touched.

If there are other systems that don't define __STDC_ISO_10646__ but are
known to have similar ranges in wchar_t I'm happy to add the appropriate
definitions.

Index: Doc/Zsh/zle.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/zle.yo,v
retrieving revision 1.81
diff -p -u -r1.81 zle.yo
--- Doc/Zsh/zle.yo	5 Dec 2009 19:38:07 -0000	1.81
+++ Doc/Zsh/zle.yo	20 Mar 2010 00:21:06 -0000
@@ -2286,6 +2286,20 @@ angle brackets.  The number is the code 
 character set; this may or may not be Unicode, depending on the operating
 system.
 )
+item(Invalid multibyte characters)(
+If the tt(MULTIBYTE) option is in effect, any sequence of one or more
+bytes that does not form a valid character in the current character
+set is treated as a series of bytes each shown as a special character.
+This case can be distinguished from other unprintable characters
+as the bytes are represented as two hexadecimal digits between angle
+brackets, as distinct from the four or eight digits that are used for
+unprintable characters that are nonetheless valid in the current
+character set.
+
+Not all systems support this: for it to work, the system's representation of
+wide characters must be code values from the Universal Character Set,
+as defined by IS0 10646 (also known as Unicode).
+)
 enditem()
 
 If tt(zle_highlight) is not set or no value applies to a particular
Index: Src/Zle/zle.h
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle.h,v
retrieving revision 1.41
diff -p -u -r1.41 zle.h
--- Src/Zle/zle.h	24 Apr 2009 09:00:38 -0000	1.41
+++ Src/Zle/zle.h	20 Mar 2010 00:21:06 -0000
@@ -419,6 +419,20 @@ typedef struct {
 typedef REFRESH_ELEMENT *REFRESH_STRING;
 
 
+#if defined(MULTIBYTE_SUPPORT) && defined(__STDC_ISO_10646__)
+#define ZSH_INVALID_WCHAR_BASE	(0xe000U)
+#define ZSH_INVALID_WCHAR_TEST(x)			\
+    ((unsigned)(x) >= ZSH_INVALID_WCHAR_BASE &&		\
+     (unsigned)(x) <= (ZSH_INVALID_WCHAR_BASE + 255u))
+#define ZSH_INVALID_WCHAR_TO_CHAR(x)			\
+    ((char)((unsigned)(x) - ZSH_INVALID_WCHAR_BASE))
+#define ZSH_INVALID_WCHAR_TO_INT(x)			\
+    ((int)((unsigned)(x) - ZSH_INVALID_WCHAR_BASE))
+#define ZSH_CHAR_TO_INVALID_WCHAR(x)		\
+    ((wchar_t)(STOUC(x) + ZSH_INVALID_WCHAR_BASE))
+#endif
+
+
 #ifdef DEBUG
 #define METACHECK()		\
 	DPUTS(zlemetaline == NULL, "line not metafied")
Index: Src/Zle/zle_refresh.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_refresh.c,v
retrieving revision 1.77
diff -p -u -r1.77 zle_refresh.c
--- Src/Zle/zle_refresh.c	5 Dec 2009 19:38:07 -0000	1.77
+++ Src/Zle/zle_refresh.c	20 Mar 2010 00:21:07 -0000
@@ -1263,7 +1263,11 @@ zrefresh(void)
 	    }
 	}
 #ifdef MULTIBYTE_SUPPORT
-	else if (iswprint(*t) && (width = WCWIDTH(*t)) > 0) {
+	else if (
+#ifdef __STDC_ISO_10646__
+		 !ZSH_INVALID_WCHAR_TEST(*t) &&
+#endif
+		 iswprint(*t) && (width = WCWIDTH(*t)) > 0) {
 	    int ichars;
 	    if (width > rpms.sen - rpms.s) {
 		int started = 0;
@@ -1367,6 +1371,12 @@ zrefresh(void)
 	    wchar_t wc;
 	    int started = 0;
 
+#ifdef __STDC_ISO_10646__
+	    if (ZSH_INVALID_WCHAR_TEST(*t)) {
+		int c = ZSH_INVALID_WCHAR_TO_INT(*t);
+		sprintf(dispchars, "<%.02x>", c);
+	    } else
+#endif
 	    if ((unsigned)*t > 0xffffU) {
 		sprintf(dispchars, "<%.08x>", (unsigned)*t);
 	    } else {
Index: Src/Zle/zle_utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_utils.c,v
retrieving revision 1.55
diff -p -u -r1.55 zle_utils.c
--- Src/Zle/zle_utils.c	3 Jan 2009 18:12:15 -0000	1.55
+++ Src/Zle/zle_utils.c	20 Mar 2010 00:21:07 -0000
@@ -120,11 +120,19 @@ zlecharasstring(ZLE_CHAR_T inchar, char 
     size_t ret;
     char *ptr;
 
-    ret = wctomb(buf, inchar);
-    if (ret <= 0) {
-	/* Ick. */
-	buf[0] = '?';
-	return 1;
+#ifdef __STDC_ISO_10646__
+    if (ZSH_INVALID_WCHAR_TEST(inchar)) {
+	buf[0] = ZSH_INVALID_WCHAR_TO_CHAR(inchar);
+	ret = 1;
+    } else
+#endif
+    {
+	ret = wctomb(buf, inchar);
+	if (ret <= 0) {
+	    /* Ick. */
+	    buf[0] = '?';
+	    return 1;
+	}
     }
     ptr = buf + ret - 1;
     for (;;) {
@@ -196,13 +204,20 @@ zlelineasstring(ZLE_STRING_T instr, int 
     for (i=0; i < inll; i++, incs--) {
 	if (incs == 0)
 	    outcs = mb_len;
-	j = wcrtomb(s + mb_len, instr[i], &mbs);
-	if (j == -1) {
-	    /* invalid char; what to do? */
-	    s[mb_len++] = ZWC('?');
-	    memset(&mbs, 0, sizeof(mbs));
-	} else {
-	    mb_len += j;
+#ifdef __STDC_ISO_10646__
+	if (ZSH_INVALID_WCHAR_TEST(instr[i])) {
+	    s[mb_len++] = ZSH_INVALID_WCHAR_TO_CHAR(instr[i]);
+	} else
+#endif
+	{
+	    j = wcrtomb(s + mb_len, instr[i], &mbs);
+	    if (j == -1) {
+		/* invalid char */
+		s[mb_len++] = ZWC('?');
+		memset(&mbs, 0, sizeof(mbs));
+	    } else {
+		mb_len += j;
+	    }
 	}
     }
     if (incs == 0)
@@ -332,6 +347,13 @@ stringaszleline(char *instr, int incs, i
 	while (ll > 0) {
 	    size_t cnt = mbrtowc(outptr, inptr, ll, &mbs);
 
+#ifdef __STDC_ISO_10646__
+	    if (cnt == MB_INCOMPLETE || cnt == MB_INVALID) {
+		/* Use private encoding for invalid single byte */
+		*outptr = ZSH_CHAR_TO_INVALID_WCHAR(*inptr);
+		cnt = 1;
+	    }
+#else
 	    /*
 	     * At this point we don't handle either incomplete (-2) or
 	     * invalid (-1) multibyte sequences.  Use the current length
@@ -339,6 +361,7 @@ stringaszleline(char *instr, int incs, i
 	     */
 	    if (cnt == MB_INCOMPLETE || cnt == MB_INVALID)
 		break;
+#endif
 
 	    if (cnt == 0) {
 		/* Converting '\0' returns 0, but a '\0' is a real


-- 
Peter Stephenson <p.w.stephenson@ntlworld.com>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8
  2010-03-18 15:44     ` Peter Stephenson
  2010-03-20  0:29       ` PATCH: Displaying invalid characters Peter Stephenson
@ 2010-03-25 16:43       ` Vincent Lefevre
  1 sibling, 0 replies; 8+ messages in thread
From: Vincent Lefevre @ 2010-03-25 16:43 UTC (permalink / raw)
  To: zsh-workers

On 2010-03-18 15:44:24 +0000, Peter Stephenson wrote:
> Index: Src/Zle/zle_hist.c
> ===================================================================
> RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_hist.c,v
> retrieving revision 1.63
> diff -p -u -r1.63 zle_hist.c
> --- Src/Zle/zle_hist.c	6 Sep 2009 21:37:14 -0000	1.63
> +++ Src/Zle/zle_hist.c	18 Mar 2010 15:42:53 -0000
> @@ -1454,6 +1454,8 @@ doisearch(char **args, int dir, int patt
>  	    memset(&mbs, 0, sizeof(mbs));
>  	    while (charpos < end_pos) {
>  		ret = mb_metacharlenconv_r(zlemetaline + charpos, &wc, &mbs);
> +		if (ret <= 0) /* Unrecognised, treat as single char */
> +		    ret = 1;
>  		if (charpos <= pos && pos < charpos + ret)
>  		    isearch_startpos = charcount;
>  		charcount++;

Since I've recompiled zsh to use this patch, zsh crashes more and
more often under Mac OS X (under ISO-8859-1 locales). I'll try the
new one...

-- 
Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2010-03-25 16:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-03-18 10:50 zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre
2010-03-18 14:30 ` Peter Stephenson
2010-03-18 15:20   ` Vincent Lefevre
2010-03-18 15:44     ` Peter Stephenson
2010-03-20  0:29       ` PATCH: Displaying invalid characters Peter Stephenson
2010-03-25 16:43       ` zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre
2010-03-18 14:55 ` Bart Schaefer
2010-03-18 15:24   ` Vincent Lefevre

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/zsh/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).