From: Danek Duvall <duvall@comfychair.org>
To: "Jun T." <takimoto-j@kba.biglobe.ne.jp>
Cc: zsh-users@zsh.org
Subject: Re: zsh doesn't understand some multibyte characters
Date: Thu, 14 May 2015 10:32:50 -0700
Message-ID: <20150514173250.GB14025@lorien.comfychair.org>
In-Reply-To: <CF275011-9284-4528-9027-26762F7EFE17@kba.biglobe.ne.jp>

On Fri, May 15, 2015 at 01:43:45AM +0900, Jun T. wrote:
>
> 2015/05/14 03:29, Danek Duvall <duvall@comfychair.org> wrote
> >
> > If I set
> >
> > comb_acute_mb[] = { (char)0xe2, (char)0x80, (char)0xa6 };
> >
> > in the test, it thinks that character's wcwidth() is 2, not 1.
>
> U+2026 is one of the characters whose "East Asian Width" property
> is set to "Ambiguous". Widths of these characters are *really* ambiguous;
> in western (monospaced) fonts they have a single width,
> while in (most of?) CJK fonts they have double width.
>
> Usually, wcwidth() returns 1 for these characters, so they are not
> displayed correctly in CJK fonts unless applications take special care of
> them. For example, xterm has an option -cjk to handle this problem.
>
> Your report indicates that Solaris is one of the rare systems in
> which wcwidth() returns 2 for U+2026.
>
> Are there any fonts in which U+2026 has double width on Solaris?

Likely, but I don't know for sure, and I'm not sure how to tell.
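For anyone who wants to check what their own platform reports, here is a
minimal C sketch. It decodes the three bytes from the test (0xe2 0x80 0xa6)
by hand, which needs no locale and shows that they are U+2026 HORIZONTAL
ELLIPSIS, and then asks the C library for wcwidth() of the same bytes under
whatever locale the caller has set. The function names are made up for
illustration, and the hand decoder only handles sequences of up to three
bytes, with no overlong-encoding checks.

```c
#define _XOPEN_SOURCE 700   /* glibc only declares wcwidth() under X/Open */
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decode one UTF-8 sequence of 1 to 3 bytes by hand.
 * Returns the code point, or -1 if the bytes are not a sequence it handles.
 * No checks for overlong forms or surrogates; this is only a sketch. */
long utf8_decode(const unsigned char *s, size_t n)
{
    if (n >= 1 && s[0] < 0x80)
        return s[0];
    if (n >= 2 && (s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80)
        return ((long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
    if (n >= 3 && (s[0] & 0xF0) == 0xE0 &&
        (s[1] & 0xC0) == 0x80 && (s[2] & 0xC0) == 0x80)
        return ((long)(s[0] & 0x0F) << 12) |
               ((long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
    return -1;
}

/* Hypothetical helper: what width does this C library report?
 * Uses the current LC_CTYPE, so call setlocale(LC_ALL, "") first.
 * Returns -2 if the bytes are not a character in this locale,
 * otherwise whatever wcwidth() returns (-1 for non-printable). */
int libc_width(const char *bytes, size_t n)
{
    mbstate_t st;
    wchar_t wc;

    memset(&st, 0, sizeof st);
    size_t r = mbrtowc(&wc, bytes, n, &st);
    if (r == (size_t)-1 || r == (size_t)-2)
        return -2;
    return wcwidth(wc);
}
```

After setlocale(LC_ALL, "") in a UTF-8 locale, libc_width("\xe2\x80\xa6", 3)
should show the behaviour discussed here: 1 on most systems, 2 on Solaris.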

As one of our globalization folks explained in a long-open bug against
Solaris' "broken" wcwidth(), we currently have a single width table, and
the ambiguous-width characters all(?) come back as width 2. They're
proposing two tables, switched based on the locale -- if you're in an East
Asian locale, you'll get 2 for these, and otherwise 1, similar to the way
that gnome-terminal uses VTE_CJK_WIDTH.
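An application could approximate that locale-switched behaviour along these
lines. This is a hypothetical sketch, not the Solaris implementation: it
treats any LC_CTYPE locale whose name starts with the Japanese, Korean or
Chinese language code as "East Asian", which is a crude heuristic (it knows
nothing about locale aliases, for example).

```c
#include <locale.h>
#include <string.h>

/* Hypothetical heuristic: is this locale name an East Asian one?
 * Matches "ja", "ko" or "zh" followed by '_', '.', '@' or end of string,
 * e.g. "ja_JP.UTF-8" or "zh_TW.UTF-8". */
int locale_is_east_asian(const char *name)
{
    static const char *const langs[] = { "ja", "ko", "zh" };

    if (name == NULL)
        return 0;
    for (size_t i = 0; i < sizeof langs / sizeof langs[0]; i++) {
        if (strncmp(name, langs[i], 2) == 0) {
            char next = name[2];
            if (next == '\0' || next == '_' || next == '.' || next == '@')
                return 1;
        }
    }
    return 0;
}

/* Width to use for an East Asian Ambiguous character under the
 * current LC_CTYPE: 2 in East Asian locales, 1 elsewhere. */
int ambiguous_width_for_current_locale(void)
{
    return locale_is_east_asian(setlocale(LC_CTYPE, NULL)) ? 2 : 1;
}
```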

The only commentary in Markus Kuhn's mk_wcwidth() about ambiguous-width
characters is in the alternate mk_wcwidth_cjk() implementation, which he
doesn't recommend for general use. I don't know whether the Solaris approach
(double width in CJK locales, single width elsewhere) is common enough to
want to make this runtime-configurable in programs that care; for instance,
zsh could have a setopt option to switch to double width when the user knew
they were in that environment.
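The structure of Kuhn's mk_wcwidth_cjk() is simple: binary-search a sorted
table of ambiguous ranges and answer 2 on a hit, otherwise defer to the
normal width function. A runtime switch like the one suggested for zsh could
wrap the same idea. In this sketch the table is only a short excerpt of the
East Asian Ambiguous ranges (the real list in EastAsianWidth.txt is much
longer), `ambiguous_is_wide` stands in for a hypothetical shell option, and
the fallback is the system wcwidth() rather than mk_wcwidth().

```c
#define _XOPEN_SOURCE 700   /* for wcwidth() on glibc */
#include <stddef.h>
#include <wchar.h>

struct interval { unsigned long first, last; };

/* Excerpt only: a few East Asian Ambiguous ranges, sorted by first. */
static const struct interval ambiguous[] = {
    { 0x00A1, 0x00A1 }, { 0x00A7, 0x00A8 }, { 0x00B0, 0x00B4 },
    { 0x00D7, 0x00D7 }, { 0x00F7, 0x00F7 }, { 0x2018, 0x2019 },
    { 0x201C, 0x201D }, { 0x2020, 0x2022 }, { 0x2024, 0x2027 },
    { 0x25A0, 0x25A1 }
};

/* Binary search over sorted, non-overlapping intervals. */
static int in_table(unsigned long ucs, const struct interval *t, size_t n)
{
    size_t lo = 0, hi = n;          /* search the half-open range [lo, hi) */

    if (n == 0 || ucs < t[0].first || ucs > t[n - 1].last)
        return 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ucs > t[mid].last)
            lo = mid + 1;
        else if (ucs < t[mid].first)
            hi = mid;
        else
            return 1;
    }
    return 0;
}

/* Hypothetical user setting, e.g. flipped by a shell option. */
int ambiguous_is_wide = 0;

/* Ambiguous characters follow the setting; everything else is
 * whatever the C library's wcwidth() says. */
int display_width(unsigned long ucs)
{
    if (in_table(ucs, ambiguous, sizeof ambiguous / sizeof ambiguous[0]))
        return ambiguous_is_wide ? 2 : 1;
    return wcwidth((wchar_t)ucs);
}
```

Keeping the override separate from the table means the same lookup serves
both the "narrow everywhere" and "wide in CJK environments" policies.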

I'm a bit surprised that xterm's -cjk option isn't automatic -- shouldn't it
know whether the font it's loading is double-width or not? Either way, it
could respond to an escape sequence that interested programs (or even
wcwidth() itself, or a standard replacement) could use to query the current
width. Perhaps that's the ideal solution?
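Short of a new escape sequence, a program can already ask the terminal
indirectly: write the character at a known column, then send the standard
Device Status Report request (ESC [ 6 n), to which xterm and most other
terminals reply with a Cursor Position Report of the form ESC [ row ; col R.
The column difference is the width the terminal actually used. Below is a
minimal POSIX sketch of that idea; the function names are made up, it
assumes stdin and stdout are the terminal, and it has no timeout, so it
would hang on a terminal that never answers.

```c
#include <stdio.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Parse a Cursor Position Report "ESC [ row ; col R".
 * Returns 0 on success, -1 if buf is not a complete CPR. */
int parse_cpr(const char *buf, int *row, int *col)
{
    char r;

    if (sscanf(buf, "\033[%d;%d%c", row, col, &r) != 3 || r != 'R')
        return -1;
    return 0;
}

/* Hypothetical helper: print the UTF-8 string s at column 1 and return
 * how many columns the terminal advanced, or -1 on error. */
int measure_width(const char *s)
{
    struct termios saved, raw;
    char buf[32];
    size_t len = 0;
    int row, col;

    if (tcgetattr(STDIN_FILENO, &saved) != 0)
        return -1;
    raw = saved;
    raw.c_lflag &= ~(ICANON | ECHO);   /* read the reply unechoed, byte by byte */
    raw.c_cc[VMIN] = 1;
    raw.c_cc[VTIME] = 0;
    if (tcsetattr(STDIN_FILENO, TCSANOW, &raw) != 0)
        return -1;

    /* Go to column 1, print the text, then request the cursor position. */
    printf("\r%s\033[6n", s);
    fflush(stdout);

    while (len < sizeof buf - 1) {
        char c;
        if (read(STDIN_FILENO, &c, 1) != 1)
            break;
        buf[len++] = c;
        if (c == 'R')
            break;
    }
    buf[len] = '\0';
    tcsetattr(STDIN_FILENO, TCSANOW, &saved);

    printf("\r\033[K");                /* erase the probe from the line */
    fflush(stdout);

    if (parse_cpr(buf, &row, &col) != 0)
        return -1;
    return col - 1;                    /* we started at column 1 */
}
```

On a terminal, measure_width("\xe2\x80\xa6") would report how wide U+2026
was actually drawn, independent of what wcwidth() believes.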

I'd started talking to Thomas Dickey (xterm's maintainer) about this a
couple of years ago (I keep running into this problem, start talking to
people about it, decide it's too hard and I don't have enough time, and drop
it until the next time around); perhaps I could pick that thread up again
with that suggestion?

FWIW, I tried xterm -cjk, both with my normal western font and with a CJK
font, and in both cases it handled U+2026 fine, putting it in a double-wide
box. Vim seemed to handle it, too.

> > I don't know why the zero-width combining character was chosen as the
> > test.
>
> The test was first introduced to detect a broken wcwidth() on Mac OS X,
> where wcwidth() returns 1 for combining characters.

Which seems unambiguously broken, unlike the one on Solaris.
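The Mac OS X problem is easy to probe for at configure or run time: decode
U+0301 COMBINING ACUTE ACCENT (UTF-8 bytes 0xcc 0x81) in a UTF-8 locale and
check that wcwidth() reports 0. Unlike an ambiguous-width character such as
U+2026, a combining mark has only one correct answer, so a failure here
really does mean wcwidth() is broken. This is a sketch in the spirit of such
a test, not zsh's actual configure check; the list of locale names to try is
an assumption, since which UTF-8 locales exist varies by system.

```c
#define _XOPEN_SOURCE 700   /* for wcwidth() on glibc */
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Returns 1 if wcwidth() gives 0 for a combining character,
 * 0 if it gives anything else (the broken case), or -1 if none of
 * the (assumed) UTF-8 locale names could be set. */
int combining_width_ok(void)
{
    static const char *const names[] = {
        "C.UTF-8", "en_US.UTF-8", "en_US.utf8"
    };
    static const char comb_acute_mb[] = { (char)0xcc, (char)0x81 };
    char saved[256];
    const char *cur = setlocale(LC_CTYPE, NULL);
    int result = -1;

    /* Remember the current LC_CTYPE so it can be restored afterwards. */
    strncpy(saved, cur ? cur : "C", sizeof saved - 1);
    saved[sizeof saved - 1] = '\0';

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        mbstate_t st;
        wchar_t wc;

        if (setlocale(LC_CTYPE, names[i]) == NULL)
            continue;
        memset(&st, 0, sizeof st);
        if (mbrtowc(&wc, comb_acute_mb, sizeof comb_acute_mb, &st) !=
            sizeof comb_acute_mb)
            continue;               /* did not decode as UTF-8; try the next */
        result = (wcwidth(wc) == 0);
        break;
    }
    setlocale(LC_CTYPE, saved);
    return result;
}
```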

Thanks,
Danek