mailing list of musl libc
 help / color / mirror / code / Atom feed
* Efficient case mapping
@ 2018-03-06 20:27 Rich Felker
  2018-03-07  3:48 ` Rich Felker
  2018-03-07 15:44 ` Rich Felker
  0 siblings, 2 replies; 5+ messages in thread
From: Rich Felker @ 2018-03-06 20:27 UTC (permalink / raw)
  To: musl

Replacing the compact but slow linear-search-based case mapping tables
with something comparably compact but fast has been on the hidden
roadmap for a while now, and I think I have a good solution:

Analysis of Unicode blocks (256-char) shows that most blocks have
either no case mappings at all, or just a single common delta between
lower and upper case for the whole block. There are about 11 blocks
that break this rule and have more. The most unique deltas with
nontrivial frequency (same delta used by more than a single-digit
number of characters) in any block is 3, appearing only in block 04.

As such, a single-direction case mapping can be achieved as follows.
Use two multi-level bit tables (same format as ctype bits) to assign a
2-bit number to each codepoint, and 2 integer deltas (the two most
common) to each block. The meanings are:

00 - no case mapping
01 - map via block's first delta
10 - map via block's second delta
11 - exception; binary search tiny table of exceptions

Obviously this can be extended to two directions via a second pair of
tables, but I think we can do better, just 3 bits total:

000 - no case mappings
001 - is lowercase; map to uppercase via block's first toupper delta
010 - is uppercase; map to lowercase via block's first tolower delta
011 - exception; binary search tiny table of exceptions
100 - is lowercase; map to uppercase via block's second toupper delta
101 - is lowercase; map to uppercase via block's third toupper delta
110 - is uppercase; map to lowercase via block's second tolower delta
111 - is uppercase; map to lowercase via block's third tolower delta

Most blocks will use no bits at all (no case mappings). A few blocks
that only contain lowercase with a single delta will use just one bit.
Many blocks with case mappings will use 2 bits (only one frequent
delta for each direction). The remaining few will use 3 bits, and now
they can code not just 2 but 3 frequent deltas in each direction,
getting rid of a lot of exceptions.

Some quick size estimates (hope I didn't mess this up):

11 complex blocks -- 11 blocks * 256/8 bytes * 3 bits
7 simple blocks -- 7 blocks * 256/8 bytes * 1 bit
Top-level tables -- 512 blocks * 1 byte * 3 tables
Exceptions -- ~96 pairs * 4 bytes * 2 sortings

Looks like it comes to about 3k total.

Further optimizing: We're not using all the data we already have. Only
alphabetic characters can have case mappings, and we already have an
iswalpha table; the above tables are highly correlated with it.

If we short-circuit out the whole operation with iswalpha, then the
"000 - no case mappings" value is no longer common. Instead, "zero is
one of the common deltas" can be included or not on a per-block basis.

We can also add, along with the per-block common deltas, a table of
which cases each delta goes with, and assign new bit meanings:

00 - is case[0] for block, map to ~case[0] via delta[0]
01 - is case[1] for block, map to ~case[1] via delta[1]
10 - is case[2] for block, map to ~case[2] via delta[2]
11 - is exception, map via binary search of exceptions table

If needed we could add back a third bit here, but I think we can get
by without it.

Note that, because iswalpha is now short-circuiting non-alphabetic
characters, we can just assign an arbitrary table value (best: 00) to
all non-alphabetic (including unassigned) characters. This drops most
of the "simple blocks" (blocks with just 0 or a single delta) down to
trivial (elided) blocks.

These last ideas ("further optimizing:" and beyond) were worked out
while writing this email, and I don't yet have any analysis/test code
to go with them. Will follow up later with results. With some luck,
final version will be barely-larger than current code, extensible, and
fast. :)

Rich


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Efficient case mapping
  2018-03-06 20:27 Efficient case mapping Rich Felker
@ 2018-03-07  3:48 ` Rich Felker
  2018-03-07 18:25   ` Rich Felker
  2018-03-07 15:44 ` Rich Felker
  1 sibling, 1 reply; 5+ messages in thread
From: Rich Felker @ 2018-03-07  3:48 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 3761 bytes --]

On Tue, Mar 06, 2018 at 03:27:41PM -0500, Rich Felker wrote:
> Further optimizing: We're not using all the data we already have. Only
> alphabetic characters can have case mappings, and we already have an
> iswalpha table; the above tables are highly correlated with it.
> 
> If we short-circuit out the whole operation with iswalpha, then the
> "000 - no case mappings" value is no longer common. Instead, "zero is
> one of the common deltas" can be included or not on a per-block basis.
> 
> We can also add, along with the per-block common deltas, a table of
> which cases each delta goes with, and assign new bit meanings:
> 
> 00 - is case[0] for block, map to ~case[0] via delta[0]
> 01 - is case[1] for block, map to ~case[1] via delta[1]
> 10 - is case[2] for block, map to ~case[2] via delta[2]
> 11 - is exception, map via binary search of exceptions table
> 
> If needed we could add back a third bit here, but I think we can get
> by without it.
> 
> Note that, because iswalpha is now short-circuiting non-alphabetic
> characters, we can just assign an arbitrary table value (best: 00) to
> all non-alphabetic (including unassigned) characters. This drops most
> of the "simple blocks" (blocks with just 0 or a single delta) down to
> trivial (elided) blocks.
> 
> These last ideas ("further optimizing:" and beyond) were worked out
> while writing this email, and I don't yet have any analysis/test code
> to go with them. Will follow up later with results. With some luck,
> final version will be barely-larger than current code, extensible, and
> fast. :)

I now have results for this approach which I'm attaching. The output
is from my block analysis tool. For each block, it displays case
mapping rules sorted by decreasing frequency, with characters that
don't matter (because they're non-alphabetic) merged into the most
frequent rule. The syntax is:

  rule d(t) [f]

where t is the character type (n = no case, u = uppercase, l =
lowercase, x = exceptional/titlecase), d is the delta to the opposite
case character, and f is the frequency of the rule (again, counting
merging of dontcare into the most frequent).

While the vast majority of blocks are fine with 0-2 bits of rule data,
a few do have more than 3 fairly-frequent rules, the worst being
blocks 04, 05, and 2C. Having an extra bit would save at least 169
exceptions in those, and thus ~676 bytes. The fixed cost of an extra
table bit is 512 bytes (for the first level), plus 32 bytes per block
that contains data, so it's pretty close to break-even, but keeping
the exception table small does help performance too. My leaning would
be to add the extra bit, but write a cost function so that it's only
used where the 4th-7th frequencies for the block total more than 8
(the number of exceptions to break even at 32 bytes).

With that approach it looks like we have about 220 entries in the
exception tables, 4-5 blocks with a third bit, 21 blocks with a second
bit, 24 blocks with a first bit, and 3 table headers. That comes to
about 880+160+672+768+1536 = 4k.

One thing I haven't accounted for is storing the 3 (or 7) deltas per
block. The raw deltas are signed 17-bit, but if we assume mappings
don't cross planes they can be thought of as unsigned 16-bit on the
low 16 bits of the codepoint. But that's still pretty large -- 3
16-bit numbers by 512 blocks is 3k. Fortunately I think we can do a
lot better. First, have a table of all the deltas that ever appear
(about 20). Then, for each block, have 0-7 single-byte indices into
this table, and a single-byte index to the sequence of indices. Now we
just have about 2*20+512+5*7+16*3+3*1 = 638 bytes.

4.6k is a good bit more than I was planning on spending here, so maybe
there are still improvements to be made...

Rich

[-- Attachment #2: caseblocks1.txt --]
[-- Type: text/plain, Size: 4405 bytes --]

block 0:
  rule 32(u) [195]
  rule -32(l) [56]
  rule 0(n) [3]
  rule 7615(l) [1]
  rule 121(l) [1]
block 1:
  rule 1(u) [99]
  rule -1(l) [99]
  rule 0(n) [14]
  rule 2(u) [4]
  rule 0(x) [4]
  rule -2(l) [4]
  rule 218(u) [3]
  rule 205(u) [3]
  rule 217(u) [2]
  rule 211(u) [2]
  rule 219(u) [1]
  rule 214(u) [1]
  rule 213(u) [1]
  rule 210(u) [1]
  rule 209(u) [1]
  rule 207(u) [1]
  rule 206(u) [1]
  rule 203(u) [1]
  rule 202(u) [1]
  rule 195(l) [1]
  rule 163(l) [1]
  rule 130(l) [1]
  rule 97(l) [1]
  rule 79(u) [1]
  rule 56(l) [1]
  rule -56(u) [1]
  rule -79(l) [1]
  rule -97(u) [1]
  rule -121(u) [1]
  rule -199(u) [1]
  rule -232(l) [1]
block 2:
  rule 0(n) [147]
  rule 1(u) [32]
  rule -1(l) [32]
  rule -205(l) [3]
  rule -218(l) [3]
  rule 42308(l) [2]
  rule 10815(l) [2]
  rule -211(l) [2]
  rule -217(l) [2]
  rule 42319(l) [1]
  rule 42315(l) [1]
  rule 42305(l) [1]
  rule 42282(l) [1]
  rule 42280(l) [1]
  rule 42261(l) [1]
  rule 42258(l) [1]
  rule 10795(u) [1]
  rule 10792(u) [1]
  rule 10783(l) [1]
  rule 10782(l) [1]
  rule 10780(l) [1]
  rule 10749(l) [1]
  rule 10743(l) [1]
  rule 10727(l) [1]
  rule 71(u) [1]
  rule 69(u) [1]
  rule -69(l) [1]
  rule -71(l) [1]
  rule -130(u) [1]
  rule -163(u) [1]
  rule -195(u) [1]
  rule -202(l) [1]
  rule -203(l) [1]
  rule -206(l) [1]
  rule -207(l) [1]
  rule -209(l) [1]
  rule -210(l) [1]
  rule -213(l) [1]
  rule -214(l) [1]
  rule -219(l) [1]
block 3:
  rule 32(u) [152]
  rule -32(l) [26]
  rule 1(u) [17]
  rule -1(l) [17]
  rule 0(n) [8]
  rule 130(l) [3]
  rule 37(u) [3]
  rule -37(l) [3]
  rule -130(u) [3]
  rule 63(u) [2]
  rule -63(l) [2]
  rule 116(u) [1]
  rule 84(l) [1]
  rule 64(u) [1]
  rule 38(u) [1]
  rule 8(u) [1]
  rule 7(l) [1]
  rule -7(u) [1]
  rule -8(l) [1]
  rule -31(l) [1]
  rule -38(l) [1]
  rule -47(l) [1]
  rule -54(l) [1]
  rule -57(l) [1]
  rule -60(u) [1]
  rule -62(l) [1]
  rule -64(l) [1]
  rule -80(l) [1]
  rule -86(l) [1]
  rule -96(l) [1]
  rule -116(l) [1]
block 4:
  rule 1(u) [83]
  rule -1(l) [75]
  rule 32(u) [32]
  rule -32(l) [32]
  rule 80(u) [16]
  rule -80(l) [16]
  rule 15(u) [1]
  rule -15(l) [1]
block 5:
  rule 0(n) [132]
  rule 48(u) [38]
  rule -48(l) [38]
  rule 1(u) [24]
  rule -1(l) [24]
block 10:
  rule 0(n) [216]
  rule 7264(u) [40]
block 13:
  rule 0(n) [164]
  rule 38864(u) [80]
  rule 8(u) [6]
  rule -8(l) [6]
block 1c:
  rule 0(n) [247]
  rule -6242(l) [2]
  rule 35266(l) [1]
  rule -6181(l) [1]
  rule -6236(l) [1]
  rule -6243(l) [1]
  rule -6244(l) [1]
  rule -6253(l) [1]
  rule -6254(l) [1]
block 1d:
  rule 0(n) [254]
  rule 35332(l) [1]
  rule 3814(l) [1]
block 1e:
  rule 1(u) [123]
  rule -1(l) [123]
  rule 0(n) [8]
  rule -59(l) [1]
block 1f:
  rule 8(l) [116]
  rule -8(u) [78]
  rule 0(n) [25]
  rule 86(l) [4]
  rule -86(u) [4]
  rule 9(l) [3]
  rule -9(u) [3]
  rule 128(l) [2]
  rule 126(l) [2]
  rule 112(l) [2]
  rule 100(l) [2]
  rule 74(l) [2]
  rule -74(u) [2]
  rule -100(u) [2]
  rule -112(u) [2]
  rule -126(u) [2]
  rule -128(u) [2]
  rule 7(l) [1]
  rule -7(u) [1]
  rule -7205(l) [1]
block 21:
  rule 0(n) [217]
  rule 16(u) [16]
  rule -16(l) [16]
  rule 28(u) [1]
  rule 1(u) [1]
  rule -1(l) [1]
  rule -28(l) [1]
  rule -7517(u) [1]
  rule -8262(u) [1]
  rule -8383(u) [1]
block 24:
  rule 26(u) [230]
  rule -26(l) [26]
block 2c:
  rule 1(u) [82]
  rule -1(l) [59]
  rule 48(u) [47]
  rule -48(l) [47]
  rule 0(n) [10]
  rule -10815(u) [2]
  rule -3814(u) [1]
  rule -10727(u) [1]
  rule -10743(u) [1]
  rule -10749(u) [1]
  rule -10780(u) [1]
  rule -10782(u) [1]
  rule -10783(u) [1]
  rule -10792(l) [1]
  rule -10795(l) [1]
block 2d:
  rule 0(n) [216]
  rule -7264(l) [40]
block a6:
  rule 0(n) [182]
  rule 1(u) [37]
  rule -1(l) [37]
block a7:
  rule 1(u) [151]
  rule -1(l) [60]
  rule 0(n) [34]
  rule -42308(u) [2]
  rule 928(u) [1]
  rule -35332(u) [1]
  rule -42258(u) [1]
  rule -42261(u) [1]
  rule -42280(u) [1]
  rule -42282(u) [1]
  rule -42305(u) [1]
  rule -42315(u) [1]
  rule -42319(u) [1]
block ab:
  rule 0(n) [175]
  rule -38864(l) [80]
  rule -928(l) [1]
block ff:
  rule 0(n) [204]
  rule 32(u) [26]
  rule -32(l) [26]
block 104:
  rule 0(n) [104]
  rule 40(u) [76]
  rule -40(l) [76]
block 10c:
  rule 0(n) [154]
  rule 64(u) [51]
  rule -64(l) [51]
block 118:
  rule 32(u) [223]
  rule -32(l) [32]
  rule 0(n) [1]
block 1e9:
  rule 34(u) [221]
  rule -34(l) [34]
  rule 0(n) [1]

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Efficient case mapping
  2018-03-06 20:27 Efficient case mapping Rich Felker
  2018-03-07  3:48 ` Rich Felker
@ 2018-03-07 15:44 ` Rich Felker
  1 sibling, 0 replies; 5+ messages in thread
From: Rich Felker @ 2018-03-07 15:44 UTC (permalink / raw)
  To: musl

On Tue, Mar 06, 2018 at 03:27:41PM -0500, Rich Felker wrote:
> Replacing the compact but slow linear-search-based case mapping tables
> with something comparably compact but fast has been on the hidden
> roadmap for a while now, and I think I have a good solution:
> 
> Analysis of Unicode blocks (256-char) shows that most blocks have
> either no case mappings at all, or just a single common delta between
> lower and upper case for the whole block. There are about 11 blocks
> that break this rule and have more. The most unique deltas with
> nontrivial frequency (same delta used by more than a single-digit
> number of characters) in any block is 3, appearing only in block 04.
> 
> As such, a single-direction case mapping can be achieved as follows.
> Use two multi-level bit tables (same format as ctype bits) to assign a
> 2-bit number to each codepoint, and 2 integer deltas (the two most
> common) to each block. The meanings are:
> 
> 00 - no case mapping
> 01 - map via block's first delta
> 10 - map via block's second delta
> 11 - exception; binary search tiny table of exceptions
> 
> Obviously this can be extended to two directions via a second pair of
> tables, but I think we can do better, just 3 bits total:
> 
> 000 - no case mappings
> 001 - is lowercase; map to uppercase via block's first toupper delta
> 010 - is uppercase; map to lowercase via block's first tolower delta
> 011 - exception; binary search tiny table of exceptions
> 100 - is lowercase; map to uppercase via block's second toupper delta
> 101 - is lowercase; map to uppercase via block's third toupper delta
> 110 - is uppercase; map to lowercase via block's second tolower delta
> 111 - is uppercase; map to lowercase via block's third tolower delta
> 
> Most blocks will use no bits at all (no case mappings). A few blocks
> that only contain lowercase with a single delta will use just one bit.
> Many blocks with case mappings will use 2 bits (only one frequent
> delta for each direction). The remaining few will use 3 bits, and now
> they can code not just 2 but 3 frequent deltas in each direction,
> getting rid of a lot of exceptions.
> 
> Some quick size estimates (hope I didn't mess this up):
> 
> 11 complex blocks -- 11 blocks * 256/8 bytes * 3 bits
> 7 simple blocks -- 7 blocks * 256/8 bytes * 1 bit
> Top-level tables -- 512 blocks * 1 byte * 3 tables
> Exceptions -- ~96 pairs * 4 bytes * 2 sortings
> 
> Looks like it comes to about 3k total.

Note that this was a significant underestimate. Of course it omitted
any accounting for storage of per-block deltas (up to 6 of them for
each block) like my first total for the improved scheme, but there
seem to be other significant errors too. The counts for "complex" and
"simple" blocks were based on just one direction of case mapping,
upper to lower. Once you bring in both directions, very little is
simple and the number of blocks increases quite a bit.

> Further optimizing: We're not using all the data we already have. Only
> alphabetic characters can have case mappings, and we already have an
> iswalpha table; the above tables are highly correlated with it.
> 
> If we short-circuit out the whole operation with iswalpha, then the
> "000 - no case mappings" value is no longer common. Instead, "zero is
> one of the common deltas" can be included or not on a per-block basis.

An interesting find after implementing this (and reporting the
results, which had a minor bug, in the previous reply to the quoted
email): ahort-circuiting with iswalpha barely helps at all. There are
only two blocks 04 and 24, that have fewer rules after folding the
nonalphabetic characters into another rule as "dontcare". Of those, 04
has loads of exceptional cases and saving a rule has minor benefit.
Only 24 actually goes from needing 2 bits down to needing only 1 bit.
So the "further optimizing" really has only marginal benefit, saving
about 72 bytes of table at the (size and time) expense of a call to
iswalpha.

Arguably a better optimization might be to optimize iswalpha (3.8k) as
iswlower||iswupper||small_table_of_caseless_alpha.

What this really comes down to is that having a multi-level bit table
per-property for character properties is always going to be locally
cheaper (for static linking when you just want one property) than a
table mapping characters to a bit vector representing all their
properties and just picking out the property you want, but if you have
a variable-width table like what I'm doing with case mappings, where
the width (and thus number/set of possible output combinations) varies
per block, it may be more efficient to combine all properties in to a
single variable-width-bit-vector table (mainly/especially if the
properties are correlated). If so, it may make sense to make such a
change eventually -- to have a single musl-internal character
properties lookup function that returns a bit vector representing:

- wcwidth: -1, 0, 1, or 2
- type: lower[delta], upper[delta], casing-exception, nocase-alpha,
  punct, control, or space

Then iswalpha would be represented as reading the table and masking
for lower||upper||casing-exception||nocase-alpha, etc.

The key to how this is different, and much more efficient, than glibc
and other implementations is the multilevel aspect of the table and
the variable-width of the encoding, whereby blocks with only a few
possible outputs only need 1 or 2 bits.

At 128 blocks needing 3 bits on average (a number I made up, assuming
over half of the 512 blocks are CJK or unassigned and some others are
simpler) this would be 13k of tables, which is comparable to the
current size of the character type data (iswalpha+iswpunct+wcwidth)
without any case mappings. So it looks like a minor win at best, but
maybe something to keep in mind for the future.

Rich


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Efficient case mapping
  2018-03-07  3:48 ` Rich Felker
@ 2018-03-07 18:25   ` Rich Felker
  2018-04-05 17:09     ` Rich Felker
  0 siblings, 1 reply; 5+ messages in thread
From: Rich Felker @ 2018-03-07 18:25 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 4201 bytes --]

On Tue, Mar 06, 2018 at 10:48:52PM -0500, Rich Felker wrote:
> On Tue, Mar 06, 2018 at 03:27:41PM -0500, Rich Felker wrote:
> > Further optimizing: We're not using all the data we already have. Only
> > alphabetic characters can have case mappings, and we already have an
> > iswalpha table; the above tables are highly correlated with it.
> > 
> > If we short-circuit out the whole operation with iswalpha, then the
> > "000 - no case mappings" value is no longer common. Instead, "zero is
> > one of the common deltas" can be included or not on a per-block basis.
> > 
> > We can also add, along with the per-block common deltas, a table of
> > which cases each delta goes with, and assign new bit meanings:
> > 
> > 00 - is case[0] for block, map to ~case[0] via delta[0]
> > 01 - is case[1] for block, map to ~case[1] via delta[1]
> > 10 - is case[2] for block, map to ~case[2] via delta[2]
> > 11 - is exception, map via binary search of exceptions table
> > 
> > If needed we could add back a third bit here, but I think we can get
> > by without it.
> > 
> > Note that, because iswalpha is now short-circuiting non-alphabetic
> > characters, we can just assign an arbitrary table value (best: 00) to
> > all non-alphabetic (including unassigned) characters. This drops most
> > of the "simple blocks" (blocks with just 0 or a single delta) down to
> > trivial (elided) blocks.
> > 
> > These last ideas ("further optimizing:" and beyond) were worked out
> > while writing this email, and I don't yet have any analysis/test code
> > to go with them. Will follow up later with results. With some luck,
> > final version will be barely-larger than current code, extensible, and
> > fast. :)
> 
> I now have results for this approach which I'm attaching. The output
> is from my block analysis tool. For each block, it displays case
> mapping rules sorted by decreasing frequency, with characters that
> don't matter (because they're non-alphabetic) merged into the most
> frequent rule. The syntax is:
> 
>   rule d(t) [f]
> 
> where t is the character type (n = no case, u = uppercase, l =
> lowercase, x = exceptional/titlecase), d is the delta to the opposite
> case character, and f is the frequency of the rule (again, counting
> merging of dontcare into the most frequent).
> 
> While the vast majority of blocks are fine with 0-2 bits of rule data,
> a few do have more than 3 fairly-frequent rules, the worst being
> blocks 04, 05, and 2C. Having an extra bit would save at least 169
> exceptions in those, and thus ~676 bytes. The fixed cost of an extra
> table bit is 512 bytes (for the first level), plus 32 bytes per block
> that contains data, so it's pretty close to break-even, but keeping
> the exception table small does help performance too. My leaning would
> be to add the extra bit, but write a cost function so that it's only
> used where the 4th-7th frequencies for the block total more than 8
> (the number of exceptions to break even at 32 bytes).
> 
> With that approach it looks like we have about 220 entries in the
> exception tables, 4-5 blocks with a third bit, 21 blocks with a second
> bit, 24 blocks with a first bit, and 3 table headers. That comes to
> about 880+160+672+768+1536 = 4k.
> 
> One thing I haven't accounted for is storing the 3 (or 7) deltas per
> block. The raw deltas are signed 17-bit, but if we assume mappings
> don't cross planes they can be thought of as unsigned 16-bit on the
> low 16 bits of the codepoint. But that's still pretty large -- 3
> 16-bit numbers by 512 blocks is 3k. Fortunately I think we can do a
> lot better. First, have a table of all the deltas that ever appear
> (about 20). Then, for each block, have 0-7 single-byte indices into
> this table, and a single-byte index to the sequence of indices. Now we
> just have about 2*20+512+5*7+16*3+3*1 = 638 bytes.
> 
> 4.6k is a good bit more than I was planning on spending here, so maybe
> there are still improvements to be made...

And here is a draft of the code that would use the mapping. The
#ifdef'd external tables are to allow compiling the code without
actually having table data yet; otherwise gcc optimizes out the whole
translation unit. :-)

Comments?

Rich

[-- Attachment #2: casemap.c --]
[-- Type: text/plain, Size: 1803 bytes --]

#include <wchar.h>
#include <wctype.h>

#if 0
static const unsigned char tab0[] = {0};
static const unsigned char tab1[] = {0};
static const unsigned char tab2[] = {0};

static const unsigned char rulebases[512] = {0};
static const int rules[] = {0};

static const unsigned char exceptions[][2] = {0};
#else
extern const unsigned char tab0[];
extern const unsigned char tab1[];
extern const unsigned char tab2[];

extern const unsigned char rulebases[512];
extern const int rules[];

extern const unsigned char exceptions[][2];
#endif

static int casing_delta(unsigned c, int dir)
{
	unsigned b, x, y, v, rt, xb, xn;
	int r, rd;

	if (c >= 0x20000) return 0;

	b = c>>8;
	c &= 255;
	x = c>>3;
	y = c&7;
	v = (tab0[tab0[b]*32+x]>>y&1)
	  + (tab1[tab1[b]*32+x]>>y&1)*2
	  + (tab2[tab2[b]*32+x]>>y&1)*4;

	/* use the bit vector out of the tables as an index into
	 * a block-specific set of rules and decode the rule into
	 * a type and a case-mapping delta. */
	r = rules[rulebases[b]+v];
	rt = r & 255;
	rd = r >> 8;

	/* rules 0/1 are simple lower/upper case with a delta.
	 * apply according to desired mapping direction. */
	if (rt < 2) return rd & -(rt^dir);

	/* binary search. endpoints of the binary search for
	 * this block are stored in the rule delta field. */
	xn = rd & 0xff;
	xb = (unsigned)rd >> 8;
	while (xn) {
		unsigned try = exceptions[xb+xn/2][0];
		if (try == c) {
			r = rules[exceptions[xb+xn/2][1]];
			rt = r & 255;
			rd = r >> 8;
			if (rt < 2) return rd & -(rt^dir);
			/* Hard-coded for the four exceptional titlecase */
			return dir ? -1 : 1;
		} else if (try > c) {
			xn /= 2;
		} else {
			xb += xn/2;
			xn -= xn/2;
		}
	}
	return 0;
}

wint_t towlower(wint_t wc)
{
	return wc + casing_delta(wc, 0);
}

wint_t towupper(wint_t wc)
{
	return wc + casing_delta(wc, 1);
}

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Efficient case mapping
  2018-03-07 18:25   ` Rich Felker
@ 2018-04-05 17:09     ` Rich Felker
  0 siblings, 0 replies; 5+ messages in thread
From: Rich Felker @ 2018-04-05 17:09 UTC (permalink / raw)
  To: musl

[-- Attachment #1: Type: text/plain, Size: 5769 bytes --]

On Wed, Mar 07, 2018 at 01:25:36PM -0500, Rich Felker wrote:
> On Tue, Mar 06, 2018 at 10:48:52PM -0500, Rich Felker wrote:
> > On Tue, Mar 06, 2018 at 03:27:41PM -0500, Rich Felker wrote:
> > > Further optimizing: We're not using all the data we already have. Only
> > > alphabetic characters can have case mappings, and we already have an
> > > iswalpha table; the above tables are highly correlated with it.
> > > 
> > > If we short-circuit out the whole operation with iswalpha, then the
> > > "000 - no case mappings" value is no longer common. Instead, "zero is
> > > one of the common deltas" can be included or not on a per-block basis.
> > > 
> > > We can also add, along with the per-block common deltas, a table of
> > > which cases each delta goes with, and assign new bit meanings:
> > > 
> > > 00 - is case[0] for block, map to ~case[0] via delta[0]
> > > 01 - is case[1] for block, map to ~case[1] via delta[1]
> > > 10 - is case[2] for block, map to ~case[2] via delta[2]
> > > 11 - is exception, map via binary search of exceptions table
> > > 
> > > If needed we could add back a third bit here, but I think we can get
> > > by without it.
> > > 
> > > Note that, because iswalpha is now short-circuiting non-alphabetic
> > > characters, we can just assign an arbitrary table value (best: 00) to
> > > all non-alphabetic (including unassigned) characters. This drops most
> > > of the "simple blocks" (blocks with just 0 or a single delta) down to
> > > trivial (elided) blocks.
> > > 
> > > These last ideas ("further optimizing:" and beyond) were worked out
> > > while writing this email, and I don't yet have any analysis/test code
> > > to go with them. Will follow up later with results. With some luck,
> > > final version will be barely-larger than current code, extensible, and
> > > fast. :)
> > 
> > I now have results for this approach which I'm attaching. The output
> > is from my block analysis tool. For each block, it displays case
> > mapping rules sorted by decreasing frequency, with characters that
> > don't matter (because they're non-alphabetic) merged into the most
> > frequent rule. The syntax is:
> > 
> >   rule d(t) [f]
> > 
> > where t is the character type (n = no case, u = uppercase, l =
> > lowercase, x = exceptional/titlecase), d is the delta to the opposite
> > case character, and f is the frequency of the rule (again, counting
> > merging of dontcare into the most frequent).
> > 
> > While the vast majority of blocks are fine with 0-2 bits of rule data,
> > a few do have more than 3 fairly-frequent rules, the worst being
> > blocks 04, 05, and 2C. Having an extra bit would save at least 169
> > exceptions in those, and thus ~676 bytes. The fixed cost of an extra
> > table bit is 512 bytes (for the first level), plus 32 bytes per block
> > that contains data, so it's pretty close to break-even, but keeping
> > the exception table small does help performance too. My leaning would
> > be to add the extra bit, but write a cost function so that it's only
> > used where the 4th-7th frequencies for the block total more than 8
> > (the number of exceptions to break even at 32 bytes).
> > 
> > With that approach it looks like we have about 220 entries in the
> > exception tables, 4-5 blocks with a third bit, 21 blocks with a second
> > bit, 24 blocks with a first bit, and 3 table headers. That comes to
> > about 880+160+672+768+1536 = 4k.
> > 
> > One thing I haven't accounted for is storing the 3 (or 7) deltas per
> > block. The raw deltas are signed 17-bit, but if we assume mappings
> > don't cross planes they can be thought of as unsigned 16-bit on the
> > low 16 bits of the codepoint. But that's still pretty large -- 3
> > 16-bit numbers by 512 blocks is 3k. Fortunately I think we can do a
> > lot better. First, have a table of all the deltas that ever appear
> > (about 20). Then, for each block, have 0-7 single-byte indices into
> > this table, and a single-byte index to the sequence of indices. Now we
> > just have about 2*20+512+5*7+16*3+3*1 = 638 bytes.
> > 
> > 4.6k is a good bit more than I was planning on spending here, so maybe
> > there are still improvements to be made...
> 
> And here is a draft of the code that would use the mapping. The
> #ifdef'd external tables are to allow compiling the code without
> actually having table data yet; otherwise gcc optimizes out the whole
> translation unit. :-)

Updated with table output, usable.

It comes out to about 5.5k of rodata+text, compared to about 1.5k for
the awful, slow, manually-maintained code we have now.

In terms of performance, it's about 10x faster on average across all
characters with case mappings (iterating and mapping them each), and
also 10x faster iterating over all characters and mapping them,
presumably since the cost of mapping most non-casing characters is
cheap either way.

In actual numbers, on my S1260, it averages about 50ns for each case
mapping with the new code, and about 500ns with the old. Replacing the
case mapping functions with the identity function (still an external
call), the overhead seems to be 7-10ns per call, so the difference is
more like 40ns vs 490 ns.

At this point I'm fairly happy with the results except that the size
and logical complexit are somewhat higher than I had hoped for. I
wonder if it might be comparable (maybe even smaller) size to do away
with the variable sized (per-block) table entries and just always have
2-bit or 3-bit entries. It would certainly make the code simpler and
faster. I can probably work out the delta-size without actually
writing any code.

BTW this work seems to have turned up one omission in musl's current
case mappings: U+037F/U+03F3. Any ideas if there's a reason it was
omitted, or if this is just a mistake?

Rich

[-- Attachment #2: casemap.c --]
[-- Type: text/plain, Size: 1550 bytes --]

#include <wchar.h>
#include <wctype.h>

static const unsigned char tab0[];
static const unsigned char tab1[];
static const unsigned char tab2[];

static const unsigned char rulebases[512];
static const int rules[];

static const unsigned char exceptions[][2];

#include "casemap.h"

static int casing_delta(unsigned c, int dir)
{
	unsigned b, x, y, v, rt, xb, xn;
	int r, rd;

	if (c >= 0x20000) return 0;

	b = c>>8;
	c &= 255;
	x = c>>3;
	y = c&7;
	v = (tab0[tab0[b]*32+x]>>y&1)
	  + (tab1[tab1[b]*32+x]>>y&1)*2
	  + (tab2[tab2[b]*32+x]>>y&1)*4;

	/* use the bit vector out of the tables as an index into
	 * a block-specific set of rules and decode the rule into
	 * a type and a case-mapping delta. */
	r = rules[rulebases[b]+v];
	rt = r & 255;
	rd = r >> 8;

	/* rules 0/1 are simple lower/upper case with a delta.
	 * apply according to desired mapping direction. */
	if (rt < 2) return rd & -(rt^dir);

	/* binary search. endpoints of the binary search for
	 * this block are stored in the rule delta field. */
	xn = rd & 0xff;
	xb = (unsigned)rd >> 8;
	while (xn) {
		unsigned try = exceptions[xb+xn/2][0];
		if (try == c) {
			r = rules[exceptions[xb+xn/2][1]];
			rt = r & 255;
			rd = r >> 8;
			if (rt < 2) return rd & -(rt^dir);
			/* Hard-coded for the four exceptional titlecase */
			return dir ? -1 : 1;
		} else if (try > c) {
			xn /= 2;
		} else {
			xb += xn/2;
			xn -= xn/2;
		}
	}
	return 0;
}

wint_t towlower(wint_t wc)
{
	return wc + casing_delta(wc, 0);
}

wint_t towupper(wint_t wc)
{
	return wc + casing_delta(wc, 1);
}

[-- Attachment #3: casemap.h --]
[-- Type: text/plain, Size: 19097 bytes --]

static const unsigned char tab0[] = {
	17, 18, 19, 20, 21, 22, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	23, 16, 16, 24, 16, 16, 16, 16, 16, 16, 16, 16, 25, 26, 27, 28,
	16, 29, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 30, 31, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 32, 33, 16, 16, 16, 34, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 35,
	16, 16, 16, 16, 36, 16, 16, 16, 16, 16, 16, 16, 37, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 38, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 39, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	255, 255, 255, 255, 255, 255, 255, 255, 1, 0, 0, 248, 255, 255, 255, 255,
	255, 255, 255, 255, 255, 255, 223, 255, 0, 0, 128, 0, 255, 255, 255, 127,
	85, 85, 85, 85, 85, 85, 84, 171, 170, 86, 85, 85, 85, 85, 85, 42,
	148, 40, 2, 9, 149, 156, 40, 93, 15, 160, 170, 74, 85, 85, 17, 85,
	170, 170, 170, 170, 170, 170, 250, 19, 132, 170, 32, 229, 148, 96, 217, 223,
	118, 224, 251, 159, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
	255, 255, 255, 255, 255, 255, 255, 255, 223, 255, 255, 255, 255, 255, 186, 71,
	191, 47, 1, 0, 4, 0, 255, 255, 251, 15, 28, 170, 170, 170, 64, 25,
	255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 85, 85, 85, 85,
	253, 87, 85, 85, 85, 85, 85, 85, 170, 42, 85, 85, 85, 85, 85, 85,
	85, 85, 85, 85, 85, 85, 254, 255, 255, 255, 127, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 255, 255, 255, 255, 191, 32, 0, 0, 0, 0, 0, 0,
	255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
	255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 192,
	255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
	0, 254, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
	255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 221,
	255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
	85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85,
	85, 85, 213, 183, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85,
	255, 0, 255, 192, 255, 0, 255, 0, 255, 192, 255, 85, 255, 0, 0, 192,
	255, 0, 255, 0, 255, 0, 247, 160, 247, 224, 255, 240, 223, 224, 247, 224,
	255, 255, 255, 255, 191, 243, 251, 255, 255, 191, 255, 255, 0, 0, 255, 255,
	231, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
	255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 128, 137, 10, 182, 63,
	85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 85, 245, 175, 247, 255,
	255, 255, 255, 255, 191, 32, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 85, 85, 85, 85, 85, 21, 0, 0,
	85, 85, 85, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	255, 255, 255, 255, 171, 170, 171, 170, 170, 170, 170, 170, 170, 170, 255, 149,
	170, 215, 186, 170, 170, 130, 160, 255, 255, 255, 255, 255, 255, 255, 255, 255,
	255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255,
	0, 0, 0, 0, 254, 255, 255, 7, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 15, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	255, 255, 255, 255, 255, 255, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0,
	255, 255, 255, 255, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
};
static const unsigned char tab1[] = {
	17, 18, 19, 20, 21, 22, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 23, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 24, 25,
	16, 26, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 27, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 28, 29, 16, 16, 16, 30, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 31,
	16, 16, 16, 16, 32, 16, 16, 16, 16, 16, 16, 16, 33, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 34, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 35, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 254, 255, 255, 7, 254, 255, 255, 7,
	0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 127, 127, 255, 255, 127, 127,
	170, 170, 170, 170, 170, 170, 168, 85, 85, 171, 170, 170, 170, 170, 170, 84,
	40, 49, 4, 10, 42, 45, 81, 110, 15, 64, 85, 149, 170, 170, 33, 170,
	255, 255, 255, 255, 252, 255, 15, 24, 198, 255, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 56,
	0, 7, 254, 255, 251, 15, 254, 255, 251, 15, 0, 0, 0, 0, 0, 0,
	0, 0, 255, 255, 255, 255, 0, 0, 0, 0, 255, 255, 170, 170, 170, 170,
	254, 171, 170, 170, 170, 170, 170, 170, 84, 85, 170, 170, 170, 170, 170, 170,
	85, 85, 85, 85, 85, 85, 0, 0, 0, 0, 0, 0, 254, 255, 255, 255,
	127, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 63, 0,
	170, 170, 170, 170, 170, 170, 170, 170, 170, 170, 170, 170, 170, 170, 170, 170,
	170, 170, 234, 183, 170, 170, 170, 170, 170, 170, 170, 170, 170, 170, 170, 170,
	0, 255, 192, 255, 0, 255, 0, 255, 192, 255, 85, 255, 0, 255, 0, 192,
	0, 255, 0, 255, 0, 255, 244, 163, 247, 224, 252, 243, 220, 227, 247, 224,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	255, 255, 255, 255, 255, 127, 0, 0, 0, 0, 0, 0, 10, 21, 72, 192,
	170, 170, 170, 170, 170, 170, 170, 170, 170, 170, 170, 170, 10, 80, 8, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 170, 170, 170, 170, 170, 42, 0, 0,
	170, 170, 170, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 252, 255, 252, 255, 255, 255, 255, 255, 255, 255, 0, 222,
	255, 24, 207, 255, 255, 3, 240, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0, 255, 255,
	255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 254, 255, 255, 7, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 15,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 7, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 0,
	0, 0, 0, 0, 252, 255, 255, 255, 15, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
};
static const unsigned char tab2[] = {
	16, 16, 16, 17, 18, 19, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 20, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 207, 56,
	0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 255, 255, 255, 128, 13,
	255, 255, 0, 0, 0, 0, 255, 255, 255, 255, 255, 255, 0, 0, 0, 0,
	252, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	170, 170, 170, 170, 170, 170, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
	0, 0, 0, 0, 0, 128, 255, 255, 255, 255, 255, 255, 8, 0, 146, 255,
	0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 240, 135, 243, 255,
};
static const int rules[] = {
	0x302, 0x0, 0x2001, -0x2000, 0x2e700, 0x1dbf00,
	0x7900, 0x32c02, 0x101, -0x100, 0x0, -0xc6ff,
	-0xe800, -0x78ff, -0x12c00, 0xc300, 0xd201, 0xce01,
	0xcd01, 0x4f01, 0xca01, 0xcb01, 0xcf01, 0x6100,
	0xd301, 0xd101, 0xa300, 0xd501, 0x8200, 0xd601,
	0xda01, 0xd901, 0xdb01, 0x3800, 0x201, 0x2,
	-0x200, -0x4f00, -0x60ff, -0x37ff, 0x2f2d02, 0x0,
	0x101, -0x100, -0x81ff, 0x2a2b01, -0xa2ff, 0x2a2801,
	0x2a3f00, -0xc2ff, 0x4501, 0x4701, 0x2a1f00, 0x2a1c00,
	0x2a1e00, -0xd200, -0xce00, -0xcd00, -0xca00, -0xcb00,
	0xa54f00, 0xa54b00, -0xcf00, 0xa52800, 0xa54400, -0xd100,
	-0xd300, 0x29f700, 0xa54100, 0x29fd00, -0xd500, -0xd600,
	0x29e700, -0xda00, 0xa52a00, -0x4500, -0xd900, -0x4700,
	-0xdb00, 0xa51500, 0xa51200, 0x5c1e02, 0x0, 0x2001,
	-0x2000, 0x101, -0x100, 0x8200, 0x2501, 0x5400,
	0x7401, 0x2601, 0x4001, 0x3f01, -0x2600, -0x2500,
	-0x1f00, -0x4000, -0x3f00, 0x801, -0x3e00, -0x3900,
	-0x2f00, -0x3600, -0x800, -0x5600, -0x5000, 0x700,
	-0x7400, -0x3bff, -0x6000, -0x6ff, 0x7a0202, 0x101,
	-0x100, 0x2001, -0x2000, 0x5001, -0x5000, 0x0,
	0xf01, -0xf00, 0x0, 0x3001, -0x3000, 0x101,
	-0x100, 0x0, 0x1c6001, 0x7c0602, 0x0, 0x97d001,
	0x801, 0x820902, 0x0, -0x186e00, -0x186d00, -0x186400,
	-0x186200, -0x186300, -0x185c00, -0x182500, 0x89c200, 0x8b0202,
	0x0, 0x8a0400, 0xee600, 0x8d0202, 0x101, -0x100,
	0x0, -0x3b00, -0x1dbeff, 0x8f2502, 0x800, -0x7ff,
	0x0, 0x4a00, 0x5600, 0x6400, 0x8000, 0x7000,
	0x7e00, 0x900, -0x49ff, -0x8ff, -0x1c2500, -0x55ff,
	-0x63ff, -0x6fff, -0x7fff, -0x7dff, 0xb40702, 0x0,
	0x1001, -0x1000, -0x1d5cff, -0x20beff, -0x2045ff, 0x1c01,
	-0x1c00, 0xbb0802, 0x101, -0x100, 0x3001, -0x3000,
	0x0, -0x2a3eff, -0xee5ff, -0x29f6ff, -0x29e6ff, -0x2a2b00,
	-0x2a2800, -0x2a1bff, -0x29fcff, -0x2a1eff, -0x2a1dff, 0x0,
	-0x1c6000, 0x0, 0x101, -0x100, 0xc30b02, 0x0,
	0x101, -0x100, -0x8a03ff, -0xa527ff, -0xa543ff, -0xa54eff,
	-0xa54aff, -0xa540ff, -0xa511ff, -0xa529ff, -0xa514ff, 0x3a001,
	0xce0002, 0x0, -0x97d000, -0x3a000, 0x0, 0x2001,
	-0x2000, 0x0, 0x2801, -0x2800, 0x0, 0x4001,
	-0x4000, 0x0, 0x2001, -0x2000, 0x0, 0x2201,
	-0x2200,
};
static const unsigned char rulebases[] = {
	0, 7, 40, 81, 112, 122, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	127, 1, 1, 129, 1, 1, 1, 1, 1, 1, 1, 1, 133, 143, 147, 153,
	1, 172, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 181, 197, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 199, 202, 1, 1, 1, 216, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 220,
	1, 1, 1, 1, 223, 1, 1, 1, 1, 1, 1, 1, 226, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 229, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 232, 1, 1, 1, 1, 1, 1,
	1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
};
static const unsigned char exceptions[][2] = {
	{ 181, 4 }, { 223, 5 }, { 255, 6 }, { 48, 11 },
	{ 49, 12 }, { 120, 13 }, { 127, 14 }, { 128, 15 },
	{ 129, 16 }, { 134, 17 }, { 137, 18 }, { 138, 18 },
	{ 142, 19 }, { 143, 20 }, { 144, 21 }, { 147, 18 },
	{ 148, 22 }, { 149, 23 }, { 150, 24 }, { 151, 25 },
	{ 154, 26 }, { 156, 24 }, { 157, 27 }, { 158, 28 },
	{ 159, 29 }, { 166, 30 }, { 169, 30 }, { 174, 30 },
	{ 177, 31 }, { 178, 31 }, { 183, 32 }, { 191, 33 },
	{ 196, 34 }, { 197, 35 }, { 198, 36 }, { 199, 34 },
	{ 200, 35 }, { 201, 36 }, { 202, 34 }, { 203, 35 },
	{ 204, 36 }, { 221, 37 }, { 241, 34 }, { 242, 35 },
	{ 243, 36 }, { 246, 38 }, { 247, 39 }, { 32, 44 },
	{ 58, 45 }, { 61, 46 }, { 62, 47 }, { 63, 48 },
	{ 64, 48 }, { 67, 49 }, { 68, 50 }, { 69, 51 },
	{ 80, 52 }, { 81, 53 }, { 82, 54 }, { 83, 55 },
	{ 84, 56 }, { 86, 57 }, { 87, 57 }, { 89, 58 },
	{ 91, 59 }, { 92, 60 }, { 96, 57 }, { 97, 61 },
	{ 99, 62 }, { 101, 63 }, { 102, 64 }, { 104, 65 },
	{ 105, 66 }, { 106, 64 }, { 107, 67 }, { 108, 68 },
	{ 111, 66 }, { 113, 69 }, { 114, 70 }, { 117, 71 },
	{ 125, 72 }, { 128, 73 }, { 131, 73 }, { 135, 74 },
	{ 136, 73 }, { 137, 75 }, { 138, 76 }, { 139, 76 },
	{ 140, 77 }, { 146, 78 }, { 157, 79 }, { 158, 80 },
	{ 69, 89 }, { 127, 90 }, { 134, 91 }, { 140, 92 },
	{ 142, 93 }, { 143, 93 }, { 172, 94 }, { 173, 95 },
	{ 174, 95 }, { 175, 95 }, { 194, 96 }, { 204, 97 },
	{ 205, 98 }, { 206, 98 }, { 207, 99 }, { 208, 100 },
	{ 209, 101 }, { 213, 102 }, { 214, 103 }, { 215, 104 },
	{ 240, 105 }, { 241, 106 }, { 242, 107 }, { 243, 108 },
	{ 244, 109 }, { 245, 110 }, { 249, 111 }, { 253, 44 },
	{ 254, 44 }, { 255, 44 }, { 192, 120 }, { 207, 121 },
	{ 248, 104 }, { 249, 104 }, { 250, 104 }, { 251, 104 },
	{ 252, 104 }, { 253, 104 }, { 128, 135 }, { 129, 136 },
	{ 130, 137 }, { 131, 138 }, { 132, 138 }, { 133, 139 },
	{ 134, 140 }, { 135, 141 }, { 136, 142 }, { 121, 145 },
	{ 125, 146 }, { 155, 151 }, { 158, 152 }, { 112, 157 },
	{ 113, 157 }, { 114, 158 }, { 115, 158 }, { 116, 158 },
	{ 117, 158 }, { 118, 159 }, { 119, 159 }, { 120, 160 },
	{ 121, 160 }, { 122, 161 }, { 123, 161 }, { 124, 162 },
	{ 125, 162 }, { 179, 163 }, { 186, 164 }, { 187, 164 },
	{ 188, 165 }, { 190, 166 }, { 195, 163 }, { 200, 167 },
	{ 201, 167 }, { 202, 167 }, { 203, 167 }, { 204, 165 },
	{ 218, 168 }, { 219, 168 }, { 229, 107 }, { 234, 169 },
	{ 235, 169 }, { 236, 111 }, { 243, 163 }, { 248, 170 },
	{ 249, 170 }, { 250, 171 }, { 251, 171 }, { 252, 165 },
	{ 38, 176 }, { 42, 177 }, { 43, 178 }, { 50, 179 },
	{ 78, 180 }, { 131, 8 }, { 132, 9 }, { 98, 189 },
	{ 100, 190 }, { 101, 191 }, { 102, 192 }, { 109, 193 },
	{ 110, 194 }, { 111, 195 }, { 112, 196 }, { 125, 206 },
	{ 141, 207 }, { 170, 208 }, { 171, 209 }, { 172, 210 },
	{ 173, 211 }, { 174, 208 }, { 176, 212 }, { 177, 213 },
	{ 178, 214 }, { 179, 215 },
};

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-04-05 17:09 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-06 20:27 Efficient case mapping Rich Felker
2018-03-07  3:48 ` Rich Felker
2018-03-07 18:25   ` Rich Felker
2018-04-05 17:09     ` Rich Felker
2018-03-07 15:44 ` Rich Felker

Code repositories for project(s) associated with this public inbox

	https://git.vuxu.org/mirror/musl/

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).