From: Eleftherios Kritikos
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: [PATCH] Update ctype data to Unicode 12.1.0
Date: Fri, 25 Oct 2019 17:29:22 +0300
To: musl@lists.openwall.com, Rich Felker
Reply-To: musl@lists.openwall.com
References: <20191012212742.29880-1-el01049@gmail.com> <20191012223947.GH16318@brightrain.aerifal.cx> <20191014130709.GL16318@brightrain.aerifal.cx> <20191020145915.GD16318@brightrain.aerifal.cx> <20191025141514.GU16318@brightrain.aerifal.cx>

Thanks for all this good work!

There is no time pressure for me, btw, as I am not depending on this patch. I am using musl for static binaries, though, and that's great!


On Fri, 25 Oct 2019, 5:15 pm Rich Felker, <dalias@libc.org> wrote:
On Wed, Oct 23, 2019 at 07:21:35PM +0300, Eleftherios Kritikos wrote:
> Hi all,
>
> I wanted to mention that I have used the code for `wcwidth`[1] and for
> generating Unicode data tables[2] from musl in the Haskell library
> vty[3] (an ncurses-style library).
>
> Relevant files in the MR:
>  * https://github.com/jtdaugherty/vty/pull/179/files#diff-ab3908e00d1c13397ed03e5c2213ad8bR5
>  * https://github.com/jtdaugherty/vty/pull/179/files#diff-a06fd5aeeca6d7dac0278c2537eb1950R1
>  * https://github.com/jtdaugherty/vty/pull/179/files#diff-86acb7ffecd1a09c5f55892bd0ce13b1R1
>  * https://github.com/jtdaugherty/vty/pull/179/files#diff-dc77683ad25ad6f509fb58a397c93f4aR1
>  * https://github.com/jtdaugherty/vty/pull/179/files#diff-9879d6db96fd29134fc802214163b95aR32
>
> Thanks Rich Felker and everyone else for all the good work that has
> gone into musl!
>
> Please let me know if you think attribution was not properly given.
>
> 1. http://git.musl-libc.org/cgit/musl/tree/src/ctype/wcwidth.c?id=9b2921bea1d5017832e1b45d1fd64220047a9802
> 2. https://github.com/richfelker/musl-chartable-tools/tree/master/ctype
> 3. https://github.com/jtdaugherty/vty

Great! I love seeing code/concepts from musl getting adopted elsewhere,
especially in places where the classic solutions were all much larger.

Just a quick update on why I haven't merged this yet: I went to do the
case mappings too, and found that at least one range, I believe the
one that would be CASEMAP(0x1c90,0x1cba,0x10d0), is not representable
in the current code, which requires updating by hand (it could be done
on a char-by-char basis, but continuing to expand that part makes the
file grow larger and slower very quickly).
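A range can fail to fit a compact table when its offset overflows the field that stores it. The C sketch below shows that failure under an assumed layout, chosen purely for illustration: a 16-bit starting code point, a signed 8-bit upper-to-lower offset and an 8-bit length. It reads CASEMAP(u1,u2,l) as "uppercase u1..u2 maps to lowercase starting at l". `struct range_map`, `encode_range` and `range_tolower` are hypothetical names, not musl's code. Under that layout a Latin-1 style range such as 0xc0..0xde -> 0xe0 (offset +32) encodes fine. The Georgian Mtavruli range 0x1c90..0x1cba -> 0x10d0 needs an offset of 0x10d0 - 0x1c90 = -3008, far outside -128..127.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical compact range entry (illustrative layout, not taken from
 * musl): first uppercase code point, signed 8-bit offset to the matching
 * lowercase code point, and number of consecutive code points covered. */
struct range_map {
	uint16_t upper;
	int8_t delta;
	uint8_t len;
};

/* Try to encode "uppercase u1..u2 maps to lowercase l.." as one entry.
 * Returns 1 on success, 0 if any field would overflow its storage. */
static int encode_range(unsigned u1, unsigned u2, unsigned l,
                        struct range_map *out)
{
	long delta = (long)l - (long)u1;

	if (u2 < u1 || u2 > UINT16_MAX)
		return 0;
	if (delta < INT8_MIN || delta > INT8_MAX || u2 - u1 + 1 > UINT8_MAX)
		return 0;
	out->upper = (uint16_t)u1;
	out->delta = (int8_t)delta;
	out->len = (uint8_t)(u2 - u1 + 1);
	return 1;
}

/* Apply an entry to a code point it covers (precondition is asserted). */
static unsigned range_tolower(const struct range_map *m, unsigned wc)
{
	assert(wc - (unsigned)m->upper < (unsigned)m->len);
	return (unsigned)((long)wc + m->delta);
}
```

With these inputs, `encode_range(0xc0, 0xde, 0xe0, &r)` returns 1 and `encode_range(0x1c90, 0x1cba, 0x10d0, &r)` returns 0. Entries like the second one then have to be listed one by one or handled by special-case code, which is the char-by-char expansion the message warns about.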

So, I'm pulling back up the proposed replacement code from April 2018 that never got finished and merged. The old thread is here:
https://www.openwall.com/lists/musl/2018/04/05/1

It's moderately larger (~4.8k instead of ~1.5k for Unicode 10), but
O(1) rather than O(n) (n = # of case mappings), about 10x faster, and
programmatically generated from UnicodeData.txt. I'll add the (awful, ugly, just like everything else in musl-chartable-tools) code for
generating the table to musl-chartable-tools when I merge it so it's not a black box.
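A generic two-level lookup table illustrates the O(n) versus O(1) distinction. This is a hypothetical sketch, not the table format of the 2018 proposal; the names, the 256-entry block size and the BMP-only limit are all assumptions. A generator step expands CASEMAP-style ranges into per-code-point offsets stored in 256-entry blocks. `to_lower_table` then costs two array reads however many mappings exist, while `to_lower_scan` walks the range list on every call.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_BITS 8
#define BLOCK_SIZE (1u << BLOCK_BITS)          /* code points per block */
#define NUM_BLOCKS (0x10000u >> BLOCK_BITS)    /* BMP only, for brevity */
#define MAX_TABLES 16                          /* distinct blocks, incl. identity */

/* CASEMAP-style range: uppercase u1..u2 maps to lowercase l.. */
struct range { unsigned u1, u2, l; };

static uint8_t block_index[NUM_BLOCKS];        /* 0 selects the identity block */
static int32_t block_delta[MAX_TABLES][BLOCK_SIZE];
static unsigned used_tables = 1;               /* block_delta[0] stays all zero */

/* O(n) reference: scan the range list on every call. */
static unsigned to_lower_scan(const struct range *r, unsigned n, unsigned wc)
{
	for (unsigned i = 0; i < n; i++)
		if (wc >= r[i].u1 && wc <= r[i].u2)
			return r[i].l + (wc - r[i].u1);
	return wc;
}

/* Generator step (run once, e.g. offline): expand ranges into
 * per-code-point deltas, allocating a block only where mappings exist. */
static void build_tables(const struct range *r, unsigned n)
{
	for (unsigned i = 0; i < n; i++) {
		for (unsigned wc = r[i].u1; wc <= r[i].u2; wc++) {
			assert(wc < 0x10000);
			unsigned hi = wc >> BLOCK_BITS;
			if (block_index[hi] == 0) {
				assert(used_tables < MAX_TABLES);
				block_index[hi] = (uint8_t)used_tables++;
			}
			block_delta[block_index[hi]][wc & (BLOCK_SIZE - 1)] =
				(int32_t)(r[i].l + (wc - r[i].u1)) - (int32_t)wc;
		}
	}
}

/* O(1) lookup: two array reads, independent of the number of ranges. */
static unsigned to_lower_table(unsigned wc)
{
	if (wc >= 0x10000)
		return wc;
	return (unsigned)((int32_t)wc +
		block_delta[block_index[wc >> BLOCK_BITS]][wc & (BLOCK_SIZE - 1)]);
}
```

After `build_tables` runs on a list containing {0x1c90, 0x1cba, 0x10d0}, `to_lower_table(0x1c90)` returns 0x10d0, matching the scan. The cost is space. Every block that contains a mapping needs a full 256-entry array, which is one reason such tables come out larger than a range list. A real generator would at least share identical blocks and use narrower entries where the offsets allow.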

I have it working now, so as long as I don't hit any unexpected
problems testing I'll get this (and your patch, and updating case
mappings to Unicode 12) merged soon.

Thanks again for sending the patch and pinging this.

Rich