On Tue, Mar 06, 2018 at 10:48:52PM -0500, Rich Felker wrote: > On Tue, Mar 06, 2018 at 03:27:41PM -0500, Rich Felker wrote: > > Further optimizing: We're not using all the data we already have. Only > > alphabetic characters can have case mappings, and we already have an > > iswalpha table; the above tables are highly correlated with it. > > > > If we short-circuit out the whole operation with iswalpha, then the > > "000 - no case mappings" value is no longer common. Instead, "zero is > > one of the common deltas" can be included or not on a per-block basis. > > > > We can also add, along with the per-block common deltas, a table of > > which cases each delta goes with, and assign new bit meanings: > > > > 00 - is case[0] for block, map to ~case[0] via delta[0] > > 01 - is case[1] for block, map to ~case[1] via delta[1] > > 10 - is case[2] for block, map to ~case[2] via delta[2] > > 11 - is exception, map via binary search of exceptions table > > > > If needed we could add back a third bit here, but I think we can get > > by without it. > > > > Note that, because iswalpha is now short-circuiting non-alphabetic > > characters, we can just assign an arbitrary table value (best: 00) to > > all non-alphabetic (including unassigned) characters. This drops most > > of the "simple blocks" (blocks with just 0 or a single delta) down to > > trivial (elided) blocks. > > > > These last ideas ("further optimizing:" and beyond) were worked out > > while writing this email, and I don't yet have any analysis/test code > > to go with them. Will follow up later with results. With some luck, > > final version will be barely-larger than current code, extensible, and > > fast. :) > > I now have results for this approach which I'm attaching. The output > is from my block analysis tool. For each block, it displays case > mapping rules sorted by decreasing frequency, with characters that > don't matter (because they're non-alphabetic) merged into the most > frequent rule. The syntax is: > > rule d(t) [f] > > where t is the character type (n = no case, u = uppercase, l = > lowercase, x = exceptional/titlecase), d is the delta to the opposite > case character, and f is the frequency of the rule (again, counting > merging of dontcare into the most frequent). > > While the vast majority of blocks are fine with 0-2 bits of rule data, > a few do have more than 3 fairly-frequent rules, the worst being > blocks 04, 05, and 2C. Having an extra bit would save at least 169 > exceptions in those, and thus ~676 bytes. The fixed cost of an extra > table bit is 512 bytes (for the first level), plus 32 bytes per block > that contains data, so it's pretty close to break-even, but keeping > the exception table small does help performance too. My leaning would > be to add the extra bit, but write a cost function so that it's only > used where the 4th-7th frequencies for the block total more than 8 > (the number of exceptions to break even at 32 bytes). > > With that approach it looks like we have about 220 entries in the > exception tables, 4-5 blocks with a third bit, 21 blocks with a second > bit, 24 blocks with a first bit, and 3 table headers. That comes to > about 880+160+672+768+1536 = 4k. > > One thing I haven't accounted for is storing the 3 (or 7) deltas per > block. The raw deltas are signed 17-bit, but if we assume mappings > don't cross planes they can be thought of as unsigned 16-bit on the > low 16 bits of the codepoint. But that's still pretty large -- 3 > 16-bit numbers by 512 blocks is 3k. Fortunately I think we can do a > lot better. First, have a table of all the deltas that ever appear > (about 20). Then, for each block, have 0-7 single-byte indices into > this table, and a single-byte index to the sequence of indices. Now we > just have about 2*20+512+5*7+16*3+3*1 = 638 bytes. > > 4.6k is a good bit more than I was planning on spending here, so maybe > there are still improvements to be made... And here is a draft of the code that would use the mapping. The #ifdef'd external tables are to allow compiling the code without actually having table data yet; otherwise gcc optimizes out the whole translation unit. :-) Comments? Rich