From: Luis Javier Merino
Date: Mon, 27 Dec 2021 23:38:06 +0100
To: musl@lists.openwall.com
Subject: [musl] Hangul Jamo vowels and trailing consonants should probably be 0 width

Hello,

I've been looking at the widths reported for Hangul Jamo in wcwidth implementations.

In glibc and MirBSD xterm, U+1160..U+11FF and U+D7B0..U+D7FF have 0 width.

In xterm/ncurses, glib (g_unichar_iszerowidth), and Rust's unicode-width, U+1160..U+11FF have 0 width.

Konsole gave U+1160..U+11FF 0 width until October 2018, when it switched from a wcwidth() based on Markus Kuhn's implementation to one generated from the Unicode data files; since then it returns width 1 (https://bugs.kde.org/show_bug.cgi?id=396435#c21).

libunistring, vim/NeoVim, and ridiculousfish/widecharwidth seem to know nothing about Hangul Jamo and return width 1.

Some context follows:

Korean Hangul is a writing system that uses syllable blocks made of alphabetic components. A syllable consists of one or more Leading Consonants, one or more Vowels, and zero or more Trailing Consonants.

Unicode has precomposed syllable blocks at U+AC00..U+D7A3 (11,172 syllables).

There are also component Jamos:
Hangul Jamo (U+1100..U+11FF).
  U+1100..U+115F Choseong (initial, Leading Consonants) have East_Asian_Width=Wide and Hangul_Syllable_Type=Leading_Jamo
  U+1160..U+11A7 Jungseong (medial, Vowels) have East_Asian_Width=Neutral and Hangul_Syllable_Type=Vowel_Jamo
  U+11A8..U+11FF Jongseong (final, Trailing Consonants) have East_Asian_Width=Neutral and Hangul_Syllable_Type=Trailing_Jamo
U+A960..U+A97F Hangul Jamo Extended-A (choseong) have East_Asian_Width=Wide
U+D7B0..U+D7FF Hangul Jamo Extended-B (jungseong and jongseong) have East_Asian_Width=Neutral
U+3130..U+318F Hangul Compatibility Jamo have no conjoining behavior
U+FFA0..U+FFDF half-width forms have no conjoining behavior

U+1100..U+11FF, U+A960..U+A97F, and U+D7B0..U+D7FF have conjoining behavior: a sequence of L+ V+ T* is rendered as a single syllable block.

wcwidth() implementations tend to give U+1100..U+115F width 2 and U+1160..U+11FF width 0, so the resulting syllable block has the correct total width of 2 columns. For the same reason, U+D7B0..U+D7FF should also have width 0.

glibc gave width 0 to conjoining jungseong and jongseong in:

commit 7a79e321c6f85b204036c33d85f6b2aa794e7c76
Author: Thorsten Glaser
Date:   Fri Jul 14 14:02:50 2017 +0200

    Refresh generated charmap data and ChangeLog

    [BZ #21750]
    * charmaps/UTF-8: Refresh.

diff --git a/localedata/ChangeLog b/localedata/ChangeLog
index 04ef5ad071..9e05b4a652 100644
--- a/localedata/ChangeLog
+++ b/localedata/ChangeLog
@@ -1,3 +1,17 @@
+2017-07-14  Thorsten Glaser
+
+	[BZ #21750]
+	* charmaps/UTF-8: Refresh.
+	* unicode-gen/utf8_gen.py (U+00AD): Set width to 1.
+	* unicode-gen/utf8_gen.py (U+1160..U+11FF): Set width to 0.
+	* unicode-gen/utf8_gen.py (U+3248..U+324F): Set width to 2.
+	* unicode-gen/utf8_gen.py (U+4DC0..U+4DFF): Likewise.
+	* unicode-gen/utf8_gen.py: Treat category Me and Mn as combining.
+	[BZ #19852]
+	* unicode-gen/utf8_gen.py: Process EastAsianWidth lines before
+	UnicodeData lines so the latter have precedence; remove hack
+	to group output by EastAsianWidth ranges.
+
[ ... snip ...]
glibc then gave width 0 to the Hangul Jamo Extended-B jungseong and jongseong in:

commit 6e540caa21616d5ec5511fafb22819204525138e
Author: Mike FABIAN
Date:   Tue Jun 16 08:29:40 2020 +0200

    Set width of JUNGSEONG/JONGSEONG characters from UD7B0 to UD7FB to 0 [BZ #26120]

    Reviewed-by: Carlos O'Donell

diff --git a/localedata/charmaps/UTF-8 b/localedata/charmaps/UTF-8
index 14c5d4fa33..8cce47cd97 100644
--- a/localedata/charmaps/UTF-8
+++ b/localedata/charmaps/UTF-8
@@ -48920,6 +48920,8 @@ WIDTH
 0
 0
 ... 2
+... 0
+... 0
 ... 2
 ... 2
 0

Regards,
--
Luis Javier Merino Morán