From mboxrd@z Thu Jan  1 00:00:00 1970
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
X-Seq: 24434
Date: Tue, 22 Jan 2008 09:57:28 +0000
From: Peter Stephenson
To: "Zsh Hackers' List"
Subject: Re: Unicode problem
Message-ID: <20080122095728.62fc8e70@news01>
In-Reply-To: <080121101649.ZM14116@torch.brasslantern.com>
References: <20080117120932.4458d35a@news01>
	<200801211415.m0LEFbbU017355@news01.csr.com>
	<237967ef0801210629t201392a6h4d28fe80fcb5ab44@mail.gmail.com>
	<200801211445.m0LEjLS7029582@news01.csr.com>
	<080121101649.ZM14116@torch.brasslantern.com>
Organization: CSR
X-Mailer: Claws Mail 3.2.0 (GTK+ 2.10.14; i386-redhat-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Mon, 21 Jan 2008 10:16:49 -0800
Bart Schaefer wrote:
> In the line editor I'm not so sure.  Treating it like a non-printable
> character seems like a good first step.

OK, here is a first step.  It turns out we haven't done very well with
unprintable wide characters anyway: the only special handling is for
control characters, and the code is the same as for ASCII control
characters, which doesn't really work for wide characters.

So this covers any zero-width or unprintable characters not in the range
0 to 255 when multibyte support is enabled.  Note it uses the native wide
character type, not necessarily Unicode; I don't think it's appropriate
at this level to assume Unicode.

The character shows up as hex digits in angle brackets.  Suggest
improvements if you like, but it needs to be short.

Play with this and see if it works: you can use insert-unicode-char to
insert character 0xfeff.

A possible way forward for the future is that I'd quite like to add
functionality for highlighting parts of the command line after 4.3.5.
(To be more accurate, I'd quite like someone else to add it, but I don't
think that's going to happen.)  Doing this within zle_refresh.c is the
easy (or easiest) bit.  Then the non-printable character could be shown
in reverse video, which is clearer.

This obviously doesn't preclude adding combining character support, but
that's not going to happen today.
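For illustration only, here is a minimal standalone sketch of the display
rule described above (hex digits in angle brackets, four digits up to
0xffff and eight above that).  The names format_placeholder,
placeholder_cells and PLACEHOLDER_MAX are hypothetical and do not appear
in zsh; unlike the patch, the sketch takes an unsigned long rather than
the native wide character and uses snprintf rather than sprintf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical constant: room for '<', eight hex digits, '>' and NUL. */
#define PLACEHOLDER_MAX 11

/*
 * Format the placeholder for an undisplayable character value.
 * Values up to 0xffff get four hex digits, larger values get eight.
 * The value is masked to 32 bits so the result always fits in buf.
 */
const char *
format_placeholder(unsigned long value, char buf[PLACEHOLDER_MAX])
{
    assert(buf != NULL);
    value &= 0xffffffffUL;
    if (value > 0xffffUL)
        snprintf(buf, PLACEHOLDER_MAX, "<%.08lx>", value);
    else
        snprintf(buf, PLACEHOLDER_MAX, "<%.04lx>", value);
    return buf;
}

/*
 * Number of screen cells the placeholder occupies.  Every character in
 * it is ASCII, so this is simply the string length.
 */
size_t
placeholder_cells(unsigned long value)
{
    char buf[PLACEHOLDER_MAX];

    return strlen(format_placeholder(value, buf));
}
```

With this sketch, the value 0xfeff formats as "<feff>" and takes six
cells, while a value above 0xffff such as 0x1f600 formats as "<0001f600>"
and takes ten.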
Index: Src/Zle/zle_refresh.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_refresh.c,v
retrieving revision 1.52
diff -u -r1.52 zle_refresh.c
--- Src/Zle/zle_refresh.c	8 Jan 2008 15:07:02 -0000	1.52
+++ Src/Zle/zle_refresh.c	22 Jan 2008 09:54:30 -0000
@@ -447,6 +447,10 @@
     int tmpalloced;		/* flag to free tmpline when finished */
     int remetafy;		/* flag that zle line is metafied */
     struct rparams rpms;
+#ifdef MULTIBYTE_SUPPORT
+    int width;			/* width of wide character */
+#endif
+
 
     /* If this is called from listmatches() (indirectly via trashzle()), and *
      * that was called from the end of zrefresh(), then we don't need to do  *
@@ -633,8 +637,7 @@
 		while ((++t0) & 7);
 	    }
 #ifdef MULTIBYTE_SUPPORT
-	    else if (iswprint(*t)) {
-		int width = wcwidth(*t);
+	    else if (iswprint(*t) && (width = wcwidth(*t)) > 0) {
 		if (width > rpms.sen - rpms.s) {
 		    /*
 		     * Too wide to fit.  Insert spaces to end of current line.
@@ -649,7 +652,7 @@
 			rpms.nvcs = rpms.s - nbuf[rpms.nvln = rpms.ln];
 		    }
 		}
-		if (width > rpms.sen - rpms.s) {
+		if (width > rpms.sen - rpms.s || width == 0) {
 		    /*
 		     * The screen width is too small to fit even one
 		     * occurrence.
@@ -663,7 +666,11 @@
 		}
 	    }
 #endif
-	    else if (ZC_icntrl(*t)) { /* other control character */
+	    else if (ZC_icntrl(*t)
+#ifdef MULTIBYTE_SUPPORT
+		     && (unsigned)*t <= 0xffU
+#endif
+		) { /* other control character */
 		*rpms.s++ = ZWC('^');
 		if (rpms.s == rpms.sen) {
 		    /* text wrapped */
@@ -671,9 +678,42 @@
 			break;
 		}
 		*rpms.s++ = (((unsigned int)*t & ~0x80u) > 31) ? ZWC('?') : (*t | ZWC('@'));
-	    } else { /* normal character */
+	    }
+#ifdef MULTIBYTE_SUPPORT
+	    else {
+		/*
+		 * Not printable or zero width.
+		 * Resort to hackery.
+		 */
+		char dispchars[11];
+		char *dispptr = dispchars;
+		wchar_t wc;
+
+		if ((unsigned)*t > 0xffffU) {
+		    sprintf(dispchars, "<%.08x>", (unsigned)*t);
+		} else {
+		    sprintf(dispchars, "<%.04x>", (unsigned)*t);
+		}
+		while (*dispptr) {
+		    if (mbtowc(&wc, dispptr, 1) == 1 /* paranoia */)
+		    {
+			*rpms.s++ = wc;
+			if (rpms.s == rpms.sen) {
+			    /* text wrapped */
+			    if (nextline(&rpms, 1))
+				break;
+			}
+		    }
+		    dispptr++;
+		}
+		if (*dispptr) /* nextline said stop processing */
+		    break;
+	    }
+#else
+	    else { /* normal character */
 		*rpms.s++ = *t;
 	    }
+#endif
 	    if (rpms.s == rpms.sen) {
 		/* text wrapped */
 		if (nextline(&rpms, 1))

-- 
Peter Stephenson                              Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                       Tel: +44 (0)1223 692070