From: Eric Pruitt
Newsgroups: gmane.linux.lib.musl.general
Subject: Updating Unicode support
Date: Mon, 22 Jan 2018 17:54:49 -0800
Message-ID: <20180123015446.vera7ocpvgaqvkss@sinister.lan.codevat.com>
To: musl@lists.openwall.com

NOTE: When I first started writing this email, I didn't realize that musl's Unicode property table had recently been updated; I only noticed when I was looking up commit IDs to cite. I'm leaving most of the text below unchanged since I think it adds useful context.

The Unicode property data used by musl has not been updated in quite some time, and because of changes introduced in recent versions of the Unicode standard, musl's width data is incorrect for many symbols, notably emoji. This can lead to rendering glitches in terminals when not every program involved is built with musl. For example, my terminal emulator is dynamically linked against a version of GNU libc that supports Unicode 9 (released June 21, 2016), whereas musl's table was last updated in 2011 or 2012 (commit 1b0ce9a).
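To illustrate why stale property data yields wrong widths, here is a minimal sketch of a table-driven, wcwidth-style lookup. The function name sketch_wcwidth and its range tables are hypothetical and cover only a handful of blocks; a real implementation (musl's, or one backed by utf8proc) generates its tables from the full Unicode data files. The emoticon range U+1F600..U+1F64F is listed as wide because Unicode 9 assigned it East_Asian_Width W; a table built from pre-9 data would not list that range as wide, so the same lookup would return 1 for those code points while a Unicode 9 aware libc returns 2. The sketch takes a uint32_t code point rather than wchar_t so it does not depend on the platform's wchar_t size.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive code point range. */
struct range { uint32_t first, last; };

/* Hypothetical, heavily abbreviated tables (NOT complete Unicode data).
 * Each table must be sorted and non-overlapping for the binary search. */
static const struct range zero_width[] = {
    { 0x0300, 0x036F },   /* Combining Diacritical Marks */
};

static const struct range wide[] = {
    { 0x1100, 0x115F },   /* Hangul Jamo leading consonants */
    { 0x4E00, 0x9FFF },   /* CJK Unified Ideographs */
    { 0xFF01, 0xFF60 },   /* Fullwidth ASCII variants and punctuation */
    { 0x1F600, 0x1F64F }, /* Emoticons: East_Asian_Width W since Unicode 9 */
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Binary search for cp in a sorted table of ranges; returns 1 if found. */
static int in_table(uint32_t cp, const struct range *t, size_t n)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < t[mid].first)
            hi = mid;
        else if (cp > t[mid].last)
            lo = mid + 1;
        else
            return 1;
    }
    return 0;
}

/* Returns the column width of cp: -1 for control characters,
 * 0 for NUL and combining marks, 2 for wide characters, else 1. */
int sketch_wcwidth(uint32_t cp)
{
    if (cp == 0)
        return 0;
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0))
        return -1; /* C0 controls, DEL, C1 controls */
    if (in_table(cp, zero_width, COUNT(zero_width)))
        return 0;
    if (in_table(cp, wide, COUNT(wide)))
        return 2;
    return 1;
}
```

With these tables, 'A' has width 1, U+0301 (combining acute accent) width 0, and both U+4E2D and U+1F600 width 2. Updating the Unicode version then amounts to regenerating the tables, which is why the choice of data source (musl's own generated tables versus utf8proc) matters.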
To resolve this problem, I wrote a drop-in replacement for musl's wcwidth(3) implementation that uses utf8proc (https://github.com/JuliaLang/utf8proc) as the source of truth. You can find the code for this at . I am wondering whether the musl developers would consider accepting a patch that implements optional, configurable support for utf8proc.

The utf8proc-wcwidth.c file I linked to includes some additional code unrelated to musl that makes it possible to use the file as an LD_PRELOAD library. The LD_PRELOAD code would **not** be included in the proposed patch.

I'm also investigating implementing the Unicode Collation Algorithm (https://unicode.org/reports/tr10/) for wcscoll(3); would that be of interest?

Thanks,
Eric