* [musl] Hangul Jamo vowels and trailing consonants should probably be 0 width
From: Luis Javier Merino @ 2021-12-27 22:38 UTC
To: musl
Hello,
I've been looking at widths reported for Hangul Jamo in wcwidth implementations.
In glibc and MirBSD xterm, U+1160..U+11FF and U+D7B0..U+D7FF have 0 width.
In xterm/ncurses, GLib (g_unichar_iszerowidth), and Rust's
unicode-width, U+1160..U+11FF have 0 width.
Konsole gave U+1160..U+11FF width 0 until October 2018, when it moved
from a wcwidth() based on Markus Kuhn's implementation to one generated
from the Unicode data files; since then it returns width 1
(https://bugs.kde.org/show_bug.cgi?id=396435#c21).
libunistring, Vim/Neovim, and ridiculousfish/widecharwidth seem to have
no special handling for Hangul Jamo, and return width 1 for these code points.
Some context follows:
Korean Hangul is a writing system which uses syllable blocks
consisting of alphabetic components. A syllable consists of one or
more leading consonants, one or more vowels, and zero or more trailing
consonants.
Unicode has 11172 precomposed syllable blocks at U+AC00..U+D7A3.
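As a worked illustration of where the 11172 figure comes from: the
precomposed block is laid out as 19 modern leading consonants x 21
vowels x 28 trailing positions (27 consonants plus "none"). The sketch
below follows the standard Unicode Hangul composition arithmetic. The
macro and function names are made up for this example, and it only
handles the modern jamo subsets (U+1100..U+1112, U+1161..U+1175,
U+11A8..U+11C2), not the archaic jamo in the rest of the blocks.

```c
#include <stdint.h>

/* Names below are illustrative, not taken from any library. */
#define S_BASE  0xAC00u /* first precomposed syllable */
#define L_BASE  0x1100u /* first modern leading consonant */
#define V_BASE  0x1161u /* first modern vowel */
#define T_BASE  0x11A7u /* one below the first trailing consonant */
#define L_COUNT 19u
#define V_COUNT 21u
#define T_COUNT 28u     /* 27 trailing consonants plus "none" */
#define S_COUNT (L_COUNT * V_COUNT * T_COUNT) /* 19 * 21 * 28 = 11172 */

/* Compose modern jamo into a precomposed syllable code point.
   Pass t == 0 for "no trailing consonant".
   Returns 0 if any argument is outside the modern jamo subsets. */
uint32_t hangul_compose(uint32_t l, uint32_t v, uint32_t t)
{
    uint32_t t_index = 0;

    if (l < L_BASE || l >= L_BASE + L_COUNT) return 0;
    if (v < V_BASE || v >= V_BASE + V_COUNT) return 0;
    if (t != 0) {
        if (t <= T_BASE || t >= T_BASE + T_COUNT) return 0;
        t_index = t - T_BASE;
    }
    return S_BASE + ((l - L_BASE) * V_COUNT + (v - V_BASE)) * T_COUNT + t_index;
}
```

For instance, U+1112 U+1161 U+11AB gives index (18 * 21 + 0) * 28 + 4 =
10588, that is U+AC00 + 0x295C = U+D55C.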
There are also component Jamos:
Hangul Jamo (U+1100..U+11FF):
  U+1100..U+115F Choseong (initial, leading consonants) have
    East_Asian_Width=Wide and Hangul_Syllable_Type=Leading_Jamo.
  U+1160..U+11A7 Jungseong (medial, vowels) have
    East_Asian_Width=Neutral and Hangul_Syllable_Type=Vowel_Jamo.
  U+11A8..U+11FF Jongseong (final, trailing consonants) have
    East_Asian_Width=Neutral and Hangul_Syllable_Type=Trailing_Jamo.
U+A960..U+A97F Hangul Jamo Extended-A (choseong) have East_Asian_Width=Wide.
U+D7B0..U+D7FF Hangul Jamo Extended-B (jungseong and jongseong) have
  East_Asian_Width=Neutral.
U+3130..U+318F Hangul Compatibility Jamo have no conjoining behavior.
U+FFA0..U+FFDF half-width forms have no conjoining behavior.
U+1100..U+11FF, U+A960..U+A97F, and U+D7B0..U+D7FF have conjoining
behavior: a sequence of the form L+ V+ T* gets rendered as a single
syllable block.
wcwidth() implementations tend to give U+1100..U+115F width 2, and
U+1160..U+11FF width 0, so the resulting syllable block has the
correct total width of 2.
U+D7B0..U+D7FF should also have width 0.
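The width rule described above can be sketched in C as follows. This is
only an illustration of the proposal, not code from musl or glibc; the
function names are hypothetical, and whole blocks are treated uniformly,
including their unassigned code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Proposed column width for a conjoining Hangul jamo,
   or -1 if cp is not in one of the conjoining jamo blocks. */
int conjoining_jamo_width(uint32_t cp)
{
    if (cp >= 0x1100 && cp <= 0x115F) return 2; /* choseong */
    if (cp >= 0xA960 && cp <= 0xA97F) return 2; /* Extended-A choseong */
    if (cp >= 0x1160 && cp <= 0x11FF) return 0; /* jungseong, jongseong */
    if (cp >= 0xD7B0 && cp <= 0xD7FF) return 0; /* Extended-B */
    return -1;
}

/* Sum of the per-code-point widths of a jamo sequence,
   or -1 if the sequence contains a non-jamo code point. */
int jamo_sequence_width(const uint32_t *s, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        int w = conjoining_jamo_width(s[i]);
        if (w < 0) return -1;
        total += w;
    }
    return total;
}
```

With these widths, the sequence U+1112 U+1161 U+11AB sums to 2 + 0 + 0 =
2, the same width as the precomposed syllable U+D55C.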
glibc gave width 0 to conjoining jungseong and jongseong at:
commit 7a79e321c6f85b204036c33d85f6b2aa794e7c76
Author: Thorsten Glaser <tg@mirbsd.de>
Date: Fri Jul 14 14:02:50 2017 +0200
Refresh generated charmap data and ChangeLog
[BZ #21750]
* charmaps/UTF-8: Refresh.
diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 04ef5ad071..9e05b4a652 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,17 @@
+2017-07-14 Thorsten Glaser <tg@mirbsd.de>
+
+ [BZ #21750]
+ * charmaps/UTF-8: Refresh.
+ * unicode-gen/utf8_gen.py (U+00AD): Set width to 1.
+ * unicode-gen/utf8_gen.py (U+1160..U+11FF): Set width to 0.
+ * unicode-gen/utf8_gen.py (U+3248..U+324F): Set width to 2.
+ * unicode-gen/utf8_gen.py (U+4DC0..U+4DFF): Likewise.
+ * unicode-gen/utf8_gen.py: Treat category Me and Mn as combining.
+ [BZ #19852]
+ * unicode-gen/utf8_gen.py: Process EastAsianWidth lines before
+ UnicodeData lines so the latter have precedence; remove hack
+ to group output by EastAsianWidth ranges.
+
[ ... snip ...]
commit 6e540caa21616d5ec5511fafb22819204525138e
Author: Mike FABIAN <mfabian@redhat.com>
Date: Tue Jun 16 08:29:40 2020 +0200
Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to
0 [BZ #26120]
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..8cce47cd97 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,8 @@ WIDTH
<UABE8> 0
<UABED> 0
<UAC00>...<UD7A3> 2
+<UD7B0>...<UD7C6> 0
+<UD7CB>...<UD7FB> 0
<UF900>...<UFA6D> 2
<UFA70>...<UFAD9> 2
<UFB1E> 0
Regards,
--
Luis Javier Merino Morán
* Re: [musl] Hangul Jamo vowels and trailing consonants should probably be 0 width
From: Rich Felker @ 2021-12-27 23:43 UTC
To: Luis Javier Merino; +Cc: musl
On Mon, Dec 27, 2021 at 11:38:06PM +0100, Luis Javier Merino wrote:
> Hello,
>
> I've been looking at widths reported for Hangul Jamo in wcwidth implementations.
>
> In glibc and MirBSD xterm, U+1160..U+11FF and U+D7B0..U+D7FF have 0 width.
Thanks for reporting! This is indeed a bug, and possibly even a
regression, since I thought it used to be right. It looks like it
happened in commit 1b0ce9af6d2aa7b92edaf3e9c631cb635bae22bd, "new wcwidth
implementation (fast table-based)", thanks to the Unicode data not
having this right. Indeed:
- R(0x1160, 0x11FF, 0),
I'll update the tools that generate the tables to account for the
omission.
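One way a table generator could account for such an omission is to
apply a short list of explicit overrides on top of the widths derived
from the Unicode data. The sketch below is purely illustrative: it is
not musl's generator or table format, the struct and function names are
invented, and the second range is an assumption taken from the original
report rather than from the line quoted above.

```c
#include <stddef.h>
#include <stdint.h>

struct width_override { uint32_t first, last; int width; };

/* Shaped after the R(first, last, width) entry quoted above. */
#define R(a, b, w) { (a), (b), (w) }

static const struct width_override overrides[] = {
    R(0x1160, 0x11FF, 0), /* the entry quoted above */
    R(0xD7B0, 0xD7FF, 0), /* assumption: Extended-B, per the report */
};

/* Return the override width for cp if one exists; otherwise return
   the width that was derived from the Unicode data files. */
int apply_overrides(uint32_t cp, int derived_width)
{
    for (size_t i = 0; i < sizeof overrides / sizeof overrides[0]; i++)
        if (cp >= overrides[i].first && cp <= overrides[i].last)
            return overrides[i].width;
    return derived_width;
}
```

A code point such as U+1160, which the data files alone would leave at
width 1, comes out as 0, while anything outside the listed ranges keeps
its derived width.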
Rich