* zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 @ 2010-03-18 10:50 Vincent Lefevre 2010-03-18 14:30 ` Peter Stephenson 2010-03-18 14:55 ` Bart Schaefer 0 siblings, 2 replies; 8+ messages in thread From: Vincent Lefevre @ 2010-03-18 10:50 UTC (permalink / raw) To: zsh-workers Hi, I get a reproducible freeze of zsh 4.3.10 when searching the history backward (history-incremental-search-backward). I think the problem comes from the fact that the history contains command lines encoded in ISO-8859-1 while I'm currently under UTF-8 locales (the freeze is not reproducible under ISO-8859-1 locales). Is this a known problem? Has it been fixed? -- Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/> 100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/> Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 2010-03-18 10:50 zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre @ 2010-03-18 14:30 ` Peter Stephenson 2010-03-18 15:20 ` Vincent Lefevre 2010-03-18 14:55 ` Bart Schaefer 1 sibling, 1 reply; 8+ messages in thread From: Peter Stephenson @ 2010-03-18 14:30 UTC (permalink / raw) To: zsh-workers Vincent Lefevre wrote: > Hi, > > I get a reproducible freeze of zsh 4.3.10 when searching the history > backward (history-incremental-search-backward). I think the problem > comes from the fact that the history contains command lines encoded > in ISO-8859-1 while I'm currently under UTF-8 locales (the freeze is > not reproducible under ISO-8859-1 locales). > > Is this a known problem? Has it been fixed? I can't think of anything that's changed that would have fixed this (but it's been a long time now and we're due for another releases). I won't be doing any work on it until I get a reproducible case I can run. It sounds like it should be fairly trivial to fix if I can find out what (misinterpreted) characters are causing a freeze at what point in the code. -- Peter Stephenson <pws@csr.com> Software Engineer Tel: +44 (0)1223 692070 Cambridge Silicon Radio Limited Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 2010-03-18 14:30 ` Peter Stephenson @ 2010-03-18 15:20 ` Vincent Lefevre 2010-03-18 15:44 ` Peter Stephenson 0 siblings, 1 reply; 8+ messages in thread From: Vincent Lefevre @ 2010-03-18 15:20 UTC (permalink / raw) To: zsh-workers On 2010-03-18 14:30:20 +0000, Peter Stephenson wrote: > I can't think of anything that's changed that would have fixed this (but > it's been a long time now and we're due for another releases). I won't > be doing any work on it until I get a reproducible case I can run. It > sounds like it should be fairly trivial to fix if I can find out what > (misinterpreted) characters are causing a freeze at what point in the > code. I think this should be reproducible with shared history file: 1. Type in an ISO-8859-1 terminal: echo zzé zz 2. In a UTF-8 terminal: history-incremental-search-backward then type z -- Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/> 100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/> Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 2010-03-18 15:20 ` Vincent Lefevre @ 2010-03-18 15:44 ` Peter Stephenson 2010-03-20 0:29 ` PATCH: Displaying invalid characters Peter Stephenson 2010-03-25 16:43 ` zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre 0 siblings, 2 replies; 8+ messages in thread From: Peter Stephenson @ 2010-03-18 15:44 UTC (permalink / raw) To: zsh-workers On Thu, 18 Mar 2010 16:20:56 +0100 Vincent Lefevre <vincent@vinc17.net> wrote: > On 2010-03-18 14:30:20 +0000, Peter Stephenson wrote: > > I can't think of anything that's changed that would have fixed this (but > > it's been a long time now and we're due for another releases). I won't > > be doing any work on it until I get a reproducible case I can run. It > > sounds like it should be fairly trivial to fix if I can find out what > > (misinterpreted) characters are causing a freeze at what point in the > > code. > > I think this should be reproducible with shared history file: > > 1. Type in an ISO-8859-1 terminal: echo zzé zz > 2. In a UTF-8 terminal: history-incremental-search-backward then type z Yes, I can see that. All that's really on offer with the wrong character set is it doesn't hang... Index: Src/Zle/zle_hist.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_hist.c,v retrieving revision 1.63 diff -p -u -r1.63 zle_hist.c --- Src/Zle/zle_hist.c 6 Sep 2009 21:37:14 -0000 1.63 +++ Src/Zle/zle_hist.c 18 Mar 2010 15:42:53 -0000 @@ -1454,6 +1454,8 @@ doisearch(char **args, int dir, int patt memset(&mbs, 0, sizeof(mbs)); while (charpos < end_pos) { ret = mb_metacharlenconv_r(zlemetaline + charpos, &wc, &mbs); + if (ret <= 0) /* Unrecognised, treat as single char */ + ret = 1; if (charpos <= pos && pos < charpos + ret) isearch_startpos = charcount; charcount++; -- Peter Stephenson <pws@csr.com> Software Engineer Tel: +44 (0)1223 692070 Cambridge Silicon Radio Limited Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, UK Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom ^ permalink raw reply [flat|nested] 8+ messages in thread
* PATCH: Displaying invalid characters 2010-03-18 15:44 ` Peter Stephenson @ 2010-03-20 0:29 ` Peter Stephenson 2010-03-25 16:43 ` zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre 1 sibling, 0 replies; 8+ messages in thread From: Peter Stephenson @ 2010-03-20 0:29 UTC (permalink / raw) To: zsh-workers On Thu, 18 Mar 2010 15:44:24 +0000 Peter Stephenson <pws@csr.com> wrote: > All that's really on offer with the wrong character > set is it doesn't hang... Up to now the shell has given up processing a line when it encounters an invalid character, i.e. a multibyte sequence that doesn't convert to a wide character. Although it can't show the character correctly (it simply doesn't know enough), it should be able to show the character specially in a similar manner to unprintable characters and continue looking at the line. All we need to fix this efficiently is a range of 256 entries in wchar_t that's guaranteed to be invalid. I find from the Linux Unicode manual entry that if __STDC_ISO_10646__ is defined, so that wchar_t is a Unicode code point, the immediate range starting at 0xE000 is private to the specific application, so we can use that. So invalid characters now appear highlighted as two hex digits in angle brackets and the rest of the line is properly processed (unless you got seriously unlucky with clashes between the two character sets, which is beyond our power to fix---of course in the reverse of the case Vincent had the chunks of UTF-8 characters are often valid ISO-8859-1, but that's relatively benign). It doesn't cover all systems, but as you'll see this was a really rather easy change, so it's worth doing. It's likely this doesn't work properly in completion which I haven't touched. If there are other systems that don't define __STDC_ISO_10646__ but are known to have similar ranges in wchar_t I'm happy to add the appropriate definitions. Index: Doc/Zsh/zle.yo =================================================================== RCS file: /cvsroot/zsh/zsh/Doc/Zsh/zle.yo,v retrieving revision 1.81 diff -p -u -r1.81 zle.yo --- Doc/Zsh/zle.yo 5 Dec 2009 19:38:07 -0000 1.81 +++ Doc/Zsh/zle.yo 20 Mar 2010 00:21:06 -0000 @@ -2286,6 +2286,20 @@ angle brackets. The number is the code character set; this may or may not be Unicode, depending on the operating system. ) +item(Invalid multibyte characters)( +If the tt(MULTIBYTE) option is in effect, any sequence of one or more +bytes that does not form a valid character in the current character +set is treated as a series of bytes each shown as a special character. +This case can be distinguished from other unprintable characters +as the bytes are represented as two hexadecimal digits between angle +brackets, as distinct from the four or eight digits that are used for +unprintable characters that are nonetheless valid in the current +character set. + +Not all systems support this: for it to work, the system's representation of +wide characters must be code values from the Universal Character Set, +as defined by IS0 10646 (also known as Unicode). +) enditem() If tt(zle_highlight) is not set or no value applies to a particular Index: Src/Zle/zle.h =================================================================== RCS file: /cvsroot/zsh/zsh/Src/Zle/zle.h,v retrieving revision 1.41 diff -p -u -r1.41 zle.h --- Src/Zle/zle.h 24 Apr 2009 09:00:38 -0000 1.41 +++ Src/Zle/zle.h 20 Mar 2010 00:21:06 -0000 @@ -419,6 +419,20 @@ typedef struct { typedef REFRESH_ELEMENT *REFRESH_STRING; +#if defined(MULTIBYTE_SUPPORT) && defined(__STDC_ISO_10646__) +#define ZSH_INVALID_WCHAR_BASE (0xe000U) +#define ZSH_INVALID_WCHAR_TEST(x) \ + ((unsigned)(x) >= ZSH_INVALID_WCHAR_BASE && \ + (unsigned)(x) <= (ZSH_INVALID_WCHAR_BASE + 255u)) +#define ZSH_INVALID_WCHAR_TO_CHAR(x) \ + ((char)((unsigned)(x) - ZSH_INVALID_WCHAR_BASE)) +#define ZSH_INVALID_WCHAR_TO_INT(x) \ + ((int)((unsigned)(x) - ZSH_INVALID_WCHAR_BASE)) +#define ZSH_CHAR_TO_INVALID_WCHAR(x) \ + ((wchar_t)(STOUC(x) + ZSH_INVALID_WCHAR_BASE)) +#endif + + #ifdef DEBUG #define METACHECK() \ DPUTS(zlemetaline == NULL, "line not metafied") Index: Src/Zle/zle_refresh.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_refresh.c,v retrieving revision 1.77 diff -p -u -r1.77 zle_refresh.c --- Src/Zle/zle_refresh.c 5 Dec 2009 19:38:07 -0000 1.77 +++ Src/Zle/zle_refresh.c 20 Mar 2010 00:21:07 -0000 @@ -1263,7 +1263,11 @@ zrefresh(void) } } #ifdef MULTIBYTE_SUPPORT - else if (iswprint(*t) && (width = WCWIDTH(*t)) > 0) { + else if ( +#ifdef __STDC_ISO_10646__ + !ZSH_INVALID_WCHAR_TEST(*t) && +#endif + iswprint(*t) && (width = WCWIDTH(*t)) > 0) { int ichars; if (width > rpms.sen - rpms.s) { int started = 0; @@ -1367,6 +1371,12 @@ zrefresh(void) wchar_t wc; int started = 0; +#ifdef __STDC_ISO_10646__ + if (ZSH_INVALID_WCHAR_TEST(*t)) { + int c = ZSH_INVALID_WCHAR_TO_INT(*t); + sprintf(dispchars, "<%.02x>", c); + } else +#endif if ((unsigned)*t > 0xffffU) { sprintf(dispchars, "<%.08x>", (unsigned)*t); } else { Index: Src/Zle/zle_utils.c =================================================================== RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_utils.c,v retrieving revision 1.55 diff -p -u -r1.55 zle_utils.c --- Src/Zle/zle_utils.c 3 Jan 2009 18:12:15 -0000 1.55 +++ Src/Zle/zle_utils.c 20 Mar 2010 00:21:07 -0000 @@ -120,11 +120,19 @@ zlecharasstring(ZLE_CHAR_T inchar, char size_t ret; char *ptr; - ret = wctomb(buf, inchar); - if (ret <= 0) { - /* Ick. */ - buf[0] = '?'; - return 1; +#ifdef __STDC_ISO_10646__ + if (ZSH_INVALID_WCHAR_TEST(inchar)) { + buf[0] = ZSH_INVALID_WCHAR_TO_CHAR(inchar); + ret = 1; + } else +#endif + { + ret = wctomb(buf, inchar); + if (ret <= 0) { + /* Ick. */ + buf[0] = '?'; + return 1; + } } ptr = buf + ret - 1; for (;;) { @@ -196,13 +204,20 @@ zlelineasstring(ZLE_STRING_T instr, int for (i=0; i < inll; i++, incs--) { if (incs == 0) outcs = mb_len; - j = wcrtomb(s + mb_len, instr[i], &mbs); - if (j == -1) { - /* invalid char; what to do? */ - s[mb_len++] = ZWC('?'); - memset(&mbs, 0, sizeof(mbs)); - } else { - mb_len += j; +#ifdef __STDC_ISO_10646__ + if (ZSH_INVALID_WCHAR_TEST(instr[i])) { + s[mb_len++] = ZSH_INVALID_WCHAR_TO_CHAR(instr[i]); + } else +#endif + { + j = wcrtomb(s + mb_len, instr[i], &mbs); + if (j == -1) { + /* invalid char */ + s[mb_len++] = ZWC('?'); + memset(&mbs, 0, sizeof(mbs)); + } else { + mb_len += j; + } } } if (incs == 0) @@ -332,6 +347,13 @@ stringaszleline(char *instr, int incs, i while (ll > 0) { size_t cnt = mbrtowc(outptr, inptr, ll, &mbs); +#ifdef __STDC_ISO_10646__ + if (cnt == MB_INCOMPLETE || cnt == MB_INVALID) { + /* Use private encoding for invalid single byte */ + *outptr = ZSH_CHAR_TO_INVALID_WCHAR(*inptr); + cnt = 1; + } +#else /* * At this point we don't handle either incomplete (-2) or * invalid (-1) multibyte sequences. Use the current length @@ -339,6 +361,7 @@ stringaszleline(char *instr, int incs, i */ if (cnt == MB_INCOMPLETE || cnt == MB_INVALID) break; +#endif if (cnt == 0) { /* Converting '\0' returns 0, but a '\0' is a real -- Peter Stephenson <p.w.stephenson@ntlworld.com> Web page now at http://homepage.ntlworld.com/p.w.stephenson/ ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 2010-03-18 15:44 ` Peter Stephenson 2010-03-20 0:29 ` PATCH: Displaying invalid characters Peter Stephenson @ 2010-03-25 16:43 ` Vincent Lefevre 1 sibling, 0 replies; 8+ messages in thread From: Vincent Lefevre @ 2010-03-25 16:43 UTC (permalink / raw) To: zsh-workers On 2010-03-18 15:44:24 +0000, Peter Stephenson wrote: > Index: Src/Zle/zle_hist.c > =================================================================== > RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_hist.c,v > retrieving revision 1.63 > diff -p -u -r1.63 zle_hist.c > --- Src/Zle/zle_hist.c 6 Sep 2009 21:37:14 -0000 1.63 > +++ Src/Zle/zle_hist.c 18 Mar 2010 15:42:53 -0000 > @@ -1454,6 +1454,8 @@ doisearch(char **args, int dir, int patt > memset(&mbs, 0, sizeof(mbs)); > while (charpos < end_pos) { > ret = mb_metacharlenconv_r(zlemetaline + charpos, &wc, &mbs); > + if (ret <= 0) /* Unrecognised, treat as single char */ > + ret = 1; > if (charpos <= pos && pos < charpos + ret) > isearch_startpos = charcount; > charcount++; Since I've recompiled zsh to use this patch, zsh crashes more and more often under Mac OS X (under ISO-8859-1 locales). I'll try the new one... -- Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/> 100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/> Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon) ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 2010-03-18 10:50 zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre 2010-03-18 14:30 ` Peter Stephenson @ 2010-03-18 14:55 ` Bart Schaefer 2010-03-18 15:24 ` Vincent Lefevre 1 sibling, 1 reply; 8+ messages in thread From: Bart Schaefer @ 2010-03-18 14:55 UTC (permalink / raw) To: zsh-workers On Mar 18, 11:50am, Vincent Lefevre wrote: } Subject: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 } } Is this a known problem? Has it been fixed? I guess it's "known" since you reported it about 10 days ago. :-/ Looking back quickly at that report ... I ignored it because I have no idea how to interpret output from "sample". I'd never even heard of "sample" until you referenced it there. } I get a reproducible freeze of zsh 4.3.10 when searching the history } backward (history-incremental-search-backward). I think the problem } comes from the fact that the history contains command lines encoded } in ISO-8859-1 while I'm currently under UTF-8 locales (the freeze is } not reproducible under ISO-8859-1 locales). Combining this information with what I see in the "sample" output, my guess is that an attempt to iterate over a multibyte string using the multibyte libraries is hitting a spot where interpreting ISO-8859 as UTF-8 results in some kind of failure to advance the character index, which then causes the search to keep scanning the same history entry repeatedly. Probably here: while (charpos < end_pos) { ret = mb_metacharlenconv_r(zlemetaline + charpos, &wc, &mbs); if (charpos <= pos && pos < charpos + ret) isearch_startpos = charcount; charcount++; charpos += ret; } There needs to be some kind of sane behavior if ret == 0, or the mb_metacharlenconv_r() function in utils.c must assure that it never does return zero: if (ptr > s) { return 1 + (*s == Meta); /* Treat as single byte character */ } else return 0; /* Probably shouldn't happen */ Possibly related to this thread back in November: http://www.zsh.org/mla/workers/2009/msg01103.html There doesn't seem to have been any resolution to that, but it's hard to follow all the way to the end with the archive. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 2010-03-18 14:55 ` Bart Schaefer @ 2010-03-18 15:24 ` Vincent Lefevre 0 siblings, 0 replies; 8+ messages in thread From: Vincent Lefevre @ 2010-03-18 15:24 UTC (permalink / raw) To: zsh-workers On 2010-03-18 07:55:23 -0700, Bart Schaefer wrote: > On Mar 18, 11:50am, Vincent Lefevre wrote: > } Subject: zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 > } > } Is this a known problem? Has it been fixed? > > I guess it's "known" since you reported it about 10 days ago. :-/ I didn't remember, and at that time, I didn't know that this was an encoding problem. I can now confirm this problem, under both Mac OS X and Linux. See my other mail for a testcase. -- Vincent Lefèvre <vincent@vinc17.net> - Web: <http://www.vinc17.net/> 100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/> Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon) ^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2010-03-25 16:50 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2010-03-18 10:50 zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre 2010-03-18 14:30 ` Peter Stephenson 2010-03-18 15:20 ` Vincent Lefevre 2010-03-18 15:44 ` Peter Stephenson 2010-03-20 0:29 ` PATCH: Displaying invalid characters Peter Stephenson 2010-03-25 16:43 ` zsh 4.3.10 history-incremental-search-backward freezes in UTF-8 Vincent Lefevre 2010-03-18 14:55 ` Bart Schaefer 2010-03-18 15:24 ` Vincent Lefevre
Code repositories for project(s) associated with this public inbox https://git.vuxu.org/mirror/zsh/ This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).