From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <zsh-workers-return-16093-mason-zsh=primenet.com.au@sunsite.dk>
Received: (qmail 16082 invoked from network); 21 Oct 2001 18:52:55 -0000
Received: from sunsite.dk (130.225.247.90)
  by ns1.primenet.com.au with SMTP; 21 Oct 2001 18:52:55 -0000
Received: (qmail 25426 invoked by alias); 21 Oct 2001 18:52:49 -0000
Mailing-List: contact zsh-workers-help@sunsite.dk; run by ezmlm
Precedence: bulk
X-No-Archive: yes
X-Seq: 16093
Received: (qmail 25409 invoked from network); 21 Oct 2001 18:52:48 -0000
Date: Sun, 21 Oct 2001 14:21:06 -0400
From: Clint Adams <clint@zsh.org>
To: Bart Schaefer <schaefer@brasslantern.com>
Cc: zsh-workers@sunsite.dk
Subject: Re: multibyte backwarddeletechar
Message-ID: <20011021142106.A19465@dman.com>
References: <20011021114254.A17952@dman.com> <1011021171339.ZM14059@candle.brasslantern.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <1011021171339.ZM14059@candle.brasslantern.com>; from schaefer@brasslantern.com on Sun, Oct 21, 2001 at 05:13:38PM +0000

> I'm a bit surprised that this wouldn't cause significant confusion in
> the ZLE display code.  How did the multi-byte character get input in
> the first place?  Is it displayed as occupying one character position
> on the screen, or several?  If only one, doesn't the cursor end up in
> the wrong place on most word- or line-oriented motions that cross it?

That depends on the terminal emulator and font.  If I run
LANG=zh_TW.Big5 crxvt -ls -fm taipei16 -fn 8x16 -km big5 ,
each BIG5 character (2 octets) appears to take up the
vertical space of one ASCII character and the horizontal space
of two ASCII characters.  If I run
LANG=zh_TW.Big5 crxvt -ls -fm taipei14 -fn 8x16 -km big5 ,
each BIG5 character (2 octets) appears to take up the
vertical space of one ASCII character and the horizontal space
of two and a half (2.5) ASCII characters, although crxvt
does some ugly overlapping, with the result that ZLE does not
get confused.
If I run LANG=ja_JP.UTF-8 xterm -class UXTerm ,
each UTF-8 Kanji character (3 octets) appears to take up
the same (2 horizontal, 1 vertical) space as in the first
case.  In this case,
ZLE does get horribly confused.  If I run
LANG=ru_RU.UTF-8 xterm -class UXTerm ,
each UTF-8 Cyrillic character (2 octets) appears to take
up the horizontal and vertical space of one ASCII character.
This also makes ZLE horribly confused.  If I run
LANG=fr_FR.UTF-8 xterm -class UXTerm ,
each UTF-8 French non-ASCII character (2 octets)
appears to take up the horizontal and vertical space of one
ASCII character.  Again, this confuses ZLE.

I imagine that 6-byte characters will generally take up
less horizontal space than 6 ASCII characters as well.
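
To make the octet/character mismatch concrete, here is a minimal
sketch in C.  It is not ZLE code: the function names are hypothetical,
it assumes the line is UTF-8 (it would not work for Big5), and it does
no validation.  It only shows that a cursor kept as a byte offset has
to skip continuation octets to move back by one character.

```c
#include <assert.h>
#include <stddef.h>

/* True if c is a UTF-8 continuation octet (bit pattern 10xxxxxx). */
static int is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* Count characters (not octets) in a NUL-terminated UTF-8 string by
 * counting every octet that is not a continuation octet. */
size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (!is_continuation((unsigned char)*s))
            n++;
    return n;
}

/* Given a cursor stored as byte offset cs into line, return the byte
 * offset at which the character just before the cursor starts.
 * A backward-delete would then remove the octets [result, cs). */
size_t utf8_prev_char(const char *line, size_t cs)
{
    assert(line != NULL);
    if (cs == 0)
        return 0;
    cs--;
    while (cs > 0 && is_continuation((unsigned char)line[cs]))
        cs--;
    return cs;
}
```

For "caf" followed by a 2-octet e-acute, the string is 5 octets but
4 characters, and stepping back from offset 5 lands on offset 3
rather than 4.  Note that this says nothing yet about how many
columns each character occupies.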

> If we're going to support wide and/or multi-byte characters, I think we
> should Do It Right, not by pasting a zillion workarounds into individual
> editor functions.

I suspect that Doing It Right involves changing char *line to
wchar_t *wline, and modifying everything that depends on it
accordingly.  Additionally, we'd need to figure out how many
screen columns each individual character occupies.
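
A rough sketch of those two steps (decode octets into wide values,
then sum per-character widths) might look like the following.
Everything here is an assumption for illustration: the names are
made up, the decoder is hard-wired to UTF-8 and handles at most
4 octets, and toy_width() is a crude stand-in for what the C
library's wcwidth() would report under the user's locale.  Real
code would presumably use mbrtowc() and wcwidth() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s (n octets available).
 * Stores the code point in *cp and returns the number of octets
 * consumed, or 0 if the sequence is truncated or malformed.
 * Overlong forms and surrogates are NOT rejected (sketch only). */
size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c;

    assert(s != NULL && cp != NULL);
    if (n == 0)
        return 0;
    if (s[0] < 0x80)                { len = 1; c = s[0]; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; }
    else
        return 0;               /* stray continuation or bad lead */
    if (len > n)
        return 0;               /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    *cp = c;
    return len;
}

/* HYPOTHETICAL width table: 2 columns for a few East Asian blocks,
 * 1 column for everything else.  Far cruder than wcwidth(). */
int toy_width(uint32_t cp)
{
    if ((cp >= 0x1100 && cp <= 0x115F) ||   /* Hangul Jamo */
        (cp >= 0x2E80 && cp <= 0xA4CF) ||   /* CJK, Kana, Yi */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* Hangul syllables */
        (cp >= 0xF900 && cp <= 0xFAFF) ||   /* CJK compatibility */
        (cp >= 0xFF00 && cp <= 0xFF60))     /* fullwidth forms */
        return 2;
    return 1;
}

/* Columns occupied by the first n octets of line, or -1 if the
 * input cannot be decoded. */
long line_columns(const char *line, size_t n)
{
    size_t i = 0;
    long cols = 0;

    assert(line != NULL);
    while (i < n) {
        uint32_t cp;
        size_t len = utf8_decode((const unsigned char *)line + i,
                                 n - i, &cp);
        if (len == 0)
            return -1;
        cols += toy_width(cp);
        i += len;
    }
    return cols;
}
```

With this, a 3-octet Kanji counts as 2 columns and a 2-octet
Cyrillic or accented Latin letter as 1, which matches what xterm
showed in the UTF-8 cases above.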