From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 9841 invoked from network); 8 Jan 2006 08:06:32 -0000
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on f.primenet.com.au
X-Spam-Level:
X-Spam-Status: No, score=-2.5 required=5.0 tests=AWL,BAYES_00,
	FORGED_RCVD_HELO autolearn=ham version=3.1.0
Received: from news.dotsrc.org (HELO a.mx.sunsite.dk) (130.225.247.88)
  by ns1.primenet.com.au with SMTP; 8 Jan 2006 08:06:32 -0000
Received: (qmail 56224 invoked from network); 8 Jan 2006 08:06:26 -0000
Received: from sunsite.dk (130.225.247.90)
  by a.mx.sunsite.dk with SMTP; 8 Jan 2006 08:06:26 -0000
Received: (qmail 15428 invoked by alias); 8 Jan 2006 08:06:24 -0000
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 22140
Received: (qmail 15419 invoked from network); 8 Jan 2006 08:06:23 -0000
Received: from news.dotsrc.org (HELO a.mx.sunsite.dk) (130.225.247.88)
  by sunsite.dk with SMTP; 8 Jan 2006 08:06:23 -0000
Received: (qmail 56014 invoked from network); 8 Jan 2006 08:06:23 -0000
Received: from dsl3-63-249-88-2.cruzio.com (HELO dot.blorf.net) (63.249.88.2)
  by a.mx.sunsite.dk with SMTP; 8 Jan 2006 08:06:22 -0000
Received: by dot.blorf.net (Postfix, from userid 1000)
	id 41C1A8E41; Sun, 8 Jan 2006 00:06:21 -0800 (PST)
Date: Sun, 8 Jan 2006 00:06:21 -0800
From: Wayne Davison
To: Bart Schaefer
Cc: zsh-workers@sunsite.dk
Subject: Re: bug in completion/expansion of files with LANG=C
Message-ID: <20060108080621.GA32692@dot.blorf.net>
References: <20060106215829.GG10111@dot.blorf.net> <20060107224447.GA30232@dot.blorf.net> <1060108055620.ZM15382@candle.brasslantern.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="6TrnltStXW4iwmi0"
Content-Disposition: inline
In-Reply-To: <1060108055620.ZM15382@candle.brasslantern.com>
User-Agent: Mutt/1.5.11

--6TrnltStXW4iwmi0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

On Sun, Jan 08, 2006 at 05:56:20AM +0000, Bart Schaefer
wrote:
> I prefer the \M- form because it gives you some clue what you should do
> to generate the equivalent value from the keyboard.

Fair enough -- let's just leave it alone, then.

As for my patch in the grandparent email, I noticed some problems with
it:  the manpage for mbrtowc() says that the state of the mbstate_t
object is undefined after the function returns -1, so the code should
reset it to a known state.  When the function returns -2, it means the
code scanned to the end of the string without finding the end of a wide
character, so perhaps we should treat all the remaining characters as
invalid?  I'm not certain that's the correct thing to do, so I'll leave
the code handling -2 the same way as -1 for now.  Finally, I wasn't
setting the right visible width for the \M-... string (I had mistakenly
hardwired it to "1").

While twiddling these things I noticed a couple other things that I
think could be improved:

 1. It looks to me like the code in wcs_nicechar() that calls
    wcswidth(&c, 1) could really just call wcwidth(c), right?  If not,
    what am I missing?

 2. The code in mb_niceformat() calls strlen() on the "fmt" string
    returned by wcs_nicechar(), but it seems to me that it could just
    use the width that wcs_nicechar() returned, right?

Attached is an updated version of my patch that fixes the aforementioned
bugs and implements the 2 improvements.

..wayne..

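The mbrtowc() handling described above can be sketched in isolation. This is a minimal illustration, not zsh code: the function name scan_multibyte() and its parameters are hypothetical, and it only counts characters and unconvertible bytes rather than formatting them. It assumes, as the message does, that -2 is handled the same way as -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not part of zsh): walk the first len bytes of s,
 * decoding with mbrtowc() in the current locale.  Returns the number of
 * bytes that failed to convert; if nchars is non-NULL, stores the number
 * of characters that converted successfully.
 */
size_t scan_multibyte(const char *s, size_t len, size_t *nchars)
{
    mbstate_t ps;
    size_t bad = 0, good = 0;

    assert(s != NULL);
    memset(&ps, 0, sizeof ps);          /* start in the initial state */

    while (len > 0) {
        wchar_t c;
        size_t ret = mbrtowc(&c, s, len, &ps);

        if (ret == (size_t)-1 || ret == (size_t)-2) {
            /*
             * After -1 the conversion state is undefined, so put it
             * back into the initial state.  -2 (incomplete character
             * at end of input) is treated the same way here, which is
             * an assumption carried over from the message.
             */
            memset(&ps, 0, sizeof ps);
            ret = 1;                    /* skip the offending byte */
            bad++;
        } else {
            if (ret == 0)
                ret = 1;                /* a '\0' still consumes one byte */
            good++;
        }
        s += ret;
        len -= ret;
    }

    if (nchars)
        *nchars = good;
    return bad;
}
```

A caller would normally run setlocale(LC_CTYPE, "") first so that mbrtowc() decodes according to the user's locale. Which bytes count as invalid depends on that locale and on the C library, so the sketch makes no claim about specific byte values.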
--6TrnltStXW4iwmi0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="mb_niceformat.patch"

--- Src/utils.c	15 Dec 2005 14:51:41 -0000	1.108
+++ Src/utils.c	8 Jan 2006 07:55:56 -0000
@@ -375,7 +375,7 @@ wcs_nicechar(wchar_t c, size_t *widthp, 
     }
 
     if (widthp)
-        *widthp = (s - buf) + wcswidth(&c, 1);
+        *widthp = (s - buf) + wcwidth(c);
     if (swidep)
         *swidep = s;
     for (mbptr = mbstr; ret; s++, mbptr++, ret--) {
@@ -3446,8 +3446,8 @@ niceztrlen(char const *s)
 mod_export size_t
 mb_niceformat(const char *s, FILE *stream, char **outstrp, int heap)
 {
-    size_t l = 0, newl, ret;
-    int umlen, outalloc, outleft;
+    size_t l = 0, outlen, outleft, ret;
+    int umlen, outalloc;
     wchar_t c;
     char *ums, *ptr, *fmt, *outstr, *outptr;
     mbstate_t ps;
@@ -3473,31 +3473,31 @@ mb_niceformat(const char *s, FILE *strea
 
     while (umlen > 0) {
         ret = mbrtowc(&c, ptr, umlen, &ps);
-        if (ret == (size_t)-1 || ret == (size_t)-2)
-        {
-            /*
-             * We're a bit stuck here. I suppose we could
-             * just stick with \M-... for the individual bytes.
-             */
-            break;
-        }
-        /*
-         * careful in case converting NULL returned 0: NULLs are real
-         * characters for us.
-         */
-        if (c == L'\0' && ret == 0)
+        if (ret != (size_t)-1 && ret != (size_t)-2) {
+            /* Careful: converting '\0' returns 0, but a '\0' is a
+             * real character for us, so we should consume 1 byte. */
+            if (c == L'\0')
+                ret = 1;
+
+            fmt = wcs_nicechar(c, &outlen, NULL);
+        } else {
+            /* Get ps out of its undefined state. */
+            memset(&ps, 0, sizeof ps);
             ret = 1;
+
+            /* The byte didn't convert, so output it as a \M-... sequence. */
+            fmt = nicechar(*(unsigned char*)ptr);
+            outlen = strlen(fmt);
+        }
+
         umlen -= ret;
         ptr += ret;
-
-        fmt = wcs_nicechar(c, &newl, NULL);
-        l += newl;
+        l += outlen;
 
         if (stream)
             zputs(fmt, stream);
         if (outstr) {
             /* Append to output string */
-            int outlen = strlen(fmt);
             if (outlen >= outleft) {
                 /* Reallocate to twice the length */
                 int outoffset = outptr - outstr;

--6TrnltStXW4iwmi0--