On Wed, Sep 13, 2017 at 8:13 PM, Rich Felker wrote:
On Wed, Sep 13, 2017 at 12:05:19PM +0200, Reini Urban wrote:
> Wait a bit with that. I think I found some more Unicode 9.0 issues with the tables,
> and I’ve found a huge performance opportunity by sorting the 3 tables (mostly pairs),
> and breaking out of the loops earlier.
> This should come close to glibc table performance then, without the huge memory costs they have.
>
> I’ll write a perl regression testing script not to miss any more mappings, and maybe
> improve the current musl logic. This will need 1-2 days.
> I’ll also use it for cperl then.

Thanks for the update. I still need to publish the table generation
code for all the other tables -- I got it mostly dug up and cleaned up
but got interrupted last time so it's still not posted. With that it
will be possible to update other things too, not just case mappings.

A few of the existing tables are using an older version of the
tabulation code that formats the big arrays differently, so I'll
probably first make a commit to reformat them, so that it's possible
to mechanically check that this commit does not change the generated
.o files, then use the uniform formatting as the basis for the subsequent
update to Unicode 9.0. That should not affect the case mapping file
though since it's not machine-generated.


I haven't yet seen your table generator, so I updated the tables with my version, as I
use them in safeclib.
This adds Unicode 10.0 support and sorts the tables for double the search speed.
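
To illustrate the idea only (this is a rough sketch, not the actual patch): once a pair table is sorted by its first member, a linear scan can stop as soon as it passes the code point it is looking for. The table contents and the name lookup_lower below are made up for the example and are not musl's real data or API.

```c
#include <stddef.h>

/* Hypothetical (upper, lower) pairs, sorted ascending by the first
   member. Illustrative entries only, not musl's actual table. */
static const unsigned short pairs[][2] = {
    { 0x0178, 0x00ff },
    { 0x0181, 0x0253 },
    { 0x0186, 0x0254 },
    { 0x0189, 0x0256 },
    { 0x018e, 0x01dd },
};

/* Return the lowercase mapping of c, or c itself if it has no entry. */
unsigned lookup_lower(unsigned c)
{
    size_t n = sizeof pairs / sizeof pairs[0];
    for (size_t i = 0; i < n; i++) {
        if (pairs[i][0] == c)
            return pairs[i][1];
        if (pairs[i][0] > c)
            break; /* sorted table: no later entry can match */
    }
    return c;
}
```

With an unsorted table every miss has to walk the whole array; with the sorted one a miss stops at the first larger entry, which is where the speedup for unmapped characters would come from. A binary search would be the obvious next step, but that is not what is described here.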

I also included a harmless patch adding a check-syntax target for Emacs flymake support.

-- Reini