From: Danek Duvall <duvall@comfychair.org>
To: "Jun T." <takimoto-j@kba.biglobe.ne.jp>
Cc: zsh-users@zsh.org
Subject: Re: zsh doesn't understand some multibyte characters
Date: Thu, 14 May 2015 10:32:50 -0700
Message-ID: <20150514173250.GB14025@lorien.comfychair.org>
In-Reply-To: <CF275011-9284-4528-9027-26762F7EFE17@kba.biglobe.ne.jp>

On Fri, May 15, 2015 at 01:43:45AM +0900, Jun T. wrote:

> 
> 2015/05/14 03:29, Danek Duvall <duvall@comfychair.org> wrote
> > 
> > If I set
> > 
> >    comb_acute_mb[] = { (char)0xe2, (char)0x80, (char)0xa6 };
> > 
> > in the test, it thinks that character's wcwidth() is 2, not 1.
> 
> U+2026 is one of the characters whose "East Asian Width" property
> is set to "Ambiguous". Widths of these characters are *really* ambiguous;
> in western (monospaced) fonts they have a single width,
> while in (most of?) CJK fonts they have double width.
> 
> Usually, wcwidth() returns 1 for these characters so they are not
> displayed correctly in CJK fonts, unless applications take special care of
> them. For example, xterm has an option -cjk to handle this problem.
> 
> Your report indicates that Solaris is one of the rare systems in
> which wcwidth() returns 2 for U+2026.
> 
> Are there any fonts in which U+2026 has double width on Solaris?

Likely, but I don't know for sure, and I'm not sure how to tell.

As one of our globalization folks explained in a long-open bug against
Solaris' "broken" wcwidth(), we currently have a single width table, and
the ambiguous-width characters all(?) come back as width 2.  They're
proposing two tables, switched based on the locale: in an East Asian
locale you'd get 2 for these, and otherwise 1, similar to the way that
gnome-terminal uses VTE_CJK_WIDTH.
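
To make the locale-switched proposal concrete, here is a minimal sketch of
the selection step only. The function name and the list of language codes
(ja, ko, zh) are assumptions for illustration; this is not the Solaris
implementation, which I haven't seen.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: pick the width to use for East Asian Ambiguous
 * characters from a locale name such as "ja_JP.UTF-8".
 * Returns 2 for locales whose language code is ja, ko or zh, else 1.
 */
int ambiguous_width_for_locale(const char *locale_name)
{
    /* Assumed list of "East Asian" language codes; a real table may differ. */
    static const char *const cjk_langs[] = { "ja", "ko", "zh" };
    size_t i;

    if (locale_name == NULL)
        return 1;

    for (i = 0; i < sizeof cjk_langs / sizeof cjk_langs[0]; i++) {
        size_t n = strlen(cjk_langs[i]);

        if (strncmp(locale_name, cjk_langs[i], n) == 0) {
            /* Require a separator so "kok_IN" does not match "ko". */
            char next = locale_name[n];
            if (next == '\0' || next == '_' || next == '.' || next == '@')
                return 2;
        }
    }
    return 1;
}
```

A caller would presumably feed it the current LC_CTYPE name, e.g. the
string returned by setlocale(LC_CTYPE, NULL).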

The only commentary mk_wcwidth() has about ambiguous character widths is in
the alternate _cjk implementation, which its author (Markus Kuhn) doesn't
recommend for general use.  I don't know if the Solaris approach
(double-width in CJK locales, single-width elsewhere) is common enough to
want to make this runtime-configurable in programs that care; for instance,
zsh could have a setopt flag to switch to double-width when the user knows
they are in that environment.
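
A runtime switch of that kind could be sketched roughly as below. This is
only an illustration: adjust_width() and is_ambiguous() are made-up names,
the range table is a tiny excerpt of the East Asian Ambiguous set (not the
full list), and the caller is assumed to pass in whatever the system
wcwidth() returned for the code point.

```c
#include <stddef.h>

/* Excerpt only: a few East Asian Ambiguous ranges, including U+2026. */
static const struct { unsigned long first, last; } ambiguous_ranges[] = {
    { 0x00A7, 0x00A8 },
    { 0x00B0, 0x00B1 },
    { 0x2018, 0x2019 },
    { 0x201C, 0x201D },
    { 0x2024, 0x2027 },
};

/* Returns 1 if the code point falls in the (partial) table above. */
static int is_ambiguous(unsigned long cp)
{
    size_t i;

    for (i = 0; i < sizeof ambiguous_ranges / sizeof ambiguous_ranges[0]; i++)
        if (cp >= ambiguous_ranges[i].first && cp <= ambiguous_ranges[i].last)
            return 1;
    return 0;
}

/*
 * Hypothetical wrapper: system_width is what wcwidth() reported,
 * ambiguous_wide is the user's setting (nonzero means "CJK environment").
 * Non-printable (-1) and zero-width results are passed through untouched,
 * as is anything outside the ambiguous table.
 */
int adjust_width(unsigned long cp, int system_width, int ambiguous_wide)
{
    if (system_width <= 0 || !is_ambiguous(cp))
        return system_width;
    return ambiguous_wide ? 2 : 1;
}
```

With this shape the same option covers both directions: forcing 1 on a
system whose wcwidth() says 2 for U+2026, and forcing 2 on one that says 1.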

I'm a bit surprised that xterm's -cjk option isn't automatic -- shouldn't
it know whether the font it's loading is double-width or not?  Either way,
it could respond to some escape code that programs which care (or even
wcwidth() itself or a standard replacement) could use to query it about the
current width.  Perhaps that's the ideal solution?
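
Short of a dedicated query, the closest existing mechanism I know of is
the standard Device Status Report: send ESC [ 6 n, and the terminal
answers with a Cursor Position Report, ESC [ row ; col R.  Asking once
before and once after printing the character gives the width the terminal
actually used.  The sketch below covers only the reply parsing; the
function names are invented for illustration, and the terminal I/O (raw
mode, writing the request, reading the reply) is left out.

```c
#include <stdio.h>

/*
 * Parse a Cursor Position Report of the form ESC [ row ; col R.
 * Returns 1 and fills *row and *col on success, 0 otherwise.
 */
int parse_cpr(const char *reply, int *row, int *col)
{
    char terminator = 0;

    if (sscanf(reply, "\033[%d;%d%c", row, col, &terminator) != 3)
        return 0;
    return terminator == 'R';
}

/*
 * Given the reports captured before and after printing one character,
 * return how many columns the cursor advanced, or -1 if either report
 * is malformed or the cursor moved to another row (e.g. it wrapped).
 */
int width_from_reports(const char *before, const char *after)
{
    int r0, c0, r1, c1;

    if (!parse_cpr(before, &r0, &c0) || !parse_cpr(after, &r1, &c1))
        return -1;
    if (r0 != r1)
        return -1;
    return c1 - c0;
}
```

This measures one character at a time and costs a round trip to the
terminal, so it is more of a startup probe than a wcwidth() replacement.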

I'd started talking to Thomas Dickey about this a couple of years ago (I
keep running into this problem, start talking to people about it, decide
it's too hard and I don't have enough time, and drop it until the next time
around); perhaps I could pick that thread up again with that suggestion?

FWIW, I tried xterm -cjk, both with my normal western font and with a CJK
font, and in both cases it handles U+2026 fine, putting it in a double-wide
box.  Vim seemed to handle it, too.

> > I don't know why the zero-width combining character was chosen as the
> > test.
> 
> The test was first introduced to detect a broken wcwidth() on Mac OS X,
> where wcwidth() returns 1 for combining characters.

Which seems unambiguously broken, unlike the one on Solaris.

Thanks,
Danek

