From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 16082 invoked from network); 21 Oct 2001 18:52:55 -0000
Received: from sunsite.dk (130.225.247.90)
  by ns1.primenet.com.au with SMTP; 21 Oct 2001 18:52:55 -0000
Received: (qmail 25426 invoked by alias); 21 Oct 2001 18:52:49 -0000
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 16093
Received: (qmail 25409 invoked from network); 21 Oct 2001 18:52:48 -0000
Date: Sun, 21 Oct 2001 14:21:06 -0400
From: Clint Adams
To: Bart Schaefer
Cc: zsh-workers@sunsite.dk
Subject: Re: multibyte backwarddeletechar
Message-ID: <20011021142106.A19465@dman.com>
References: <20011021114254.A17952@dman.com> <1011021171339.ZM14059@candle.brasslantern.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <1011021171339.ZM14059@candle.brasslantern.com>; from schaefer@brasslantern.com on Sun, Oct 21, 2001 at 05:13:38PM +0000

> I'm a bit surprised that this wouldn't cause significant confusion in
> the ZLE display code.  How did the multi-byte character get input in
> the first place?  Is it displayed as occupying one character position
> on the screen, or several?  If only one, doesn't the cursor end up in
> the wrong place on most word- or line-oriented motions that cross it?

That depends on the terminal emulator and font.

If I run

    LANG=zh_TW.Big5 crxvt -ls -fm taipei16 -fn 8x16 -km big5

each BIG5 character (2 octets) appears to take up the vertical space of
one ASCII character and the horizontal space of two ASCII characters.

If I run

    LANG=zh_TW.Big5 crxvt -ls -fm taipei14 -fn 8x16 -km big5

each BIG5 character (2 octets) appears to take up the vertical space of
one ASCII character and the horizontal space of two and a half (2.5)
ASCII characters, although crxvt does some ugly overlapping, with the
result that ZLE does not get confused.

If I run

    LANG=ja_JP.UTF-8 xterm -class UXTerm

each UTF-8 Kanji character (3 octets) appears to take up the same
(2 horizontal, 1 vertical) space.  In this case, ZLE does get horribly
confused.

If I run

    LANG=ru_RU.UTF-8 xterm -class UXTerm

each UTF-8 Cyrillic character (2 octets) appears to take up the
horizontal and vertical space of one ASCII character.  This also makes
ZLE horribly confused.

If I run

    LANG=fr_FR.UTF-8 xterm -class UXTerm

each UTF-8 French non-ASCII character (2 octets) appears to take up the
horizontal and vertical space of one ASCII character.  Again, this
confuses ZLE.

I imagine that 6-byte characters will generally take up less horizontal
space than 6 ASCII characters as well.

> If we're going to support wide and/or multi-byte characters, I think we
> should Do It Right, not by pasting a zillion workarounds into individual
> editor functions.

I suspect that Doing It Right involves changing char *line to
wchar_t *wline, and modifying all dependencies accordingly.
Additionally, we'd need to figure out how many screen columns each
individual character occupies, since the octet count does not tell us.
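
To make the last paragraph a bit more concrete, here is a rough sketch of
the two pieces it mentions: converting a multibyte line to wide characters,
and asking the C library how many columns each character takes.  This is
not zsh code; to_wide() and display_columns() are hypothetical names made
up for illustration, and the sketch assumes the platform provides the
XSI wcwidth() function.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* wcwidth() is an XSI function; declare it explicitly so this sketch
 * compiles regardless of which feature-test macros are in effect. */
int wcwidth(wchar_t wc);

/* Hypothetical helper (not part of zsh): convert a multibyte string in
 * the current locale's encoding to a newly allocated wide string.
 * Stores the number of wide characters in *len.  Returns NULL on an
 * invalid multibyte sequence or on allocation failure. */
wchar_t *to_wide(const char *line, size_t *len)
{
    /* A string never has more wide characters than bytes, so
     * strlen(line) + 1 elements always leave room for the terminator. */
    size_t max = strlen(line) + 1;
    wchar_t *wline = malloc(max * sizeof *wline);
    if (wline == NULL)
        return NULL;

    size_t n = mbstowcs(wline, line, max);
    if (n == (size_t)-1) {          /* invalid sequence in this locale */
        free(wline);
        return NULL;
    }
    *len = n;
    return wline;
}

/* Hypothetical helper: number of screen columns occupied by the first
 * len characters of wline, according to wcwidth().  Returns -1 if any
 * character is non-printable in the current locale. */
int display_columns(const wchar_t *wline, size_t len)
{
    int cols = 0;
    for (size_t i = 0; i < len; i++) {
        int w = wcwidth(wline[i]);
        if (w < 0)
            return -1;
        cols += w;
    }
    return cols;
}
```

A caller would presumably run setlocale(LC_ALL, "") first so that the
conversion uses the user's encoding.  With a wide buffer, the cursor can be
an index into wline (one step per character, whatever its octet count),
while the display code uses the column total to place the terminal cursor.
Whether wcwidth() agrees with what a given terminal and font actually draw
(the 2.5-column crxvt case, for instance) is a separate question.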