From: Rich Felker
Newsgroups: gmane.linux.lib.musl.general
Subject: Re: Re: [PATCH] towupper/towlower: Update to Unicode 9.0
Date: Wed, 13 Sep 2017 14:13:34 -0400
Message-ID: <20170913181334.GT1627@brightrain.aerifal.cx>
In-Reply-To: <13F34D7B-8E99-483A-A5F5-F139D0D906B9@cpan.org>
To: musl@lists.openwall.com

On Wed, Sep 13, 2017 at 12:05:19PM +0200, Reini Urban wrote:
> Wait a bit with that. I think I found some more Unicode 9.0 issues with the tables,
> and I’ve found a huge performance opportunity by sorting the 3 tables (mostly pairs),
> and break the loops earlier.
> This should come close to glibc table performance then, without the huge memory costs they have.
>
> I’ll write a perl regression testing script not to miss any more mappings, and maybe
> improve the current musl logic. This will need 1-2 days.
> I’ll also use it for cperl then.

Thanks for the update. I still need to publish the table generation
code for all the other tables -- I got it mostly dug up and cleaned up
but got interrupted last time so it's still not posted. With that it
will be possible to update other things too, not just case mappings.

A few of the existing tables are using an older version of the
tabulation code that formats the big arrays differently, so I'll
probably first make a commit to reformat them, so that it's possible
to mechanically check that this commit does not change the generated
.o files, then use the uniform formatting as the basis for the
subsequent update to Unicode 9.0. That should not affect the case
mapping file though, since it's not machine-generated.

Rich
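
For readers following the thread, the "sort the tables and break the loops
earlier" idea from the quoted message can be sketched roughly as below. This
is only an illustration under stated assumptions: the table layout, the name
demo_pairs, and the function demo_toupper are hypothetical and are not musl's
actual data or code. The four mappings shown are real Unicode case pairs, but
the table is deliberately tiny.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical pair table (not musl's), sorted ascending by the
 * lowercase code point so that a linear scan can stop early. */
struct case_pair { unsigned short lower, upper; };

static const struct case_pair demo_pairs[] = {
    { 0x00FF, 0x0178 },  /* LATIN SMALL LETTER Y WITH DIAERESIS */
    { 0x0180, 0x0243 },  /* LATIN SMALL LETTER B WITH STROKE */
    { 0x019A, 0x023D },  /* LATIN SMALL LETTER L WITH BAR */
    { 0x0250, 0x2C6F },  /* LATIN SMALL LETTER TURNED A */
};

/* Hypothetical lookup: returns the uppercase mapping if wc is in the
 * table, otherwise returns wc unchanged. */
static wint_t demo_toupper(wint_t wc)
{
    size_t n = sizeof demo_pairs / sizeof demo_pairs[0];
    for (size_t i = 0; i < n; i++) {
        if (demo_pairs[i].lower == wc)
            return demo_pairs[i].upper;
        /* Table is sorted, so no later entry can match once we pass wc. */
        if (demo_pairs[i].lower > wc)
            break;
    }
    return wc;
}
```

With a sorted table, a miss on a small code point exits after the first
comparison instead of walking the whole array; a binary search would be the
obvious next step for larger tables.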