* [musl][PATCH v1] wctype: reduce text size of iswalpha & iswpunct by 53%
@ 2026-01-19 14:35 Xan Phung
2026-01-19 19:54 ` Szabolcs Nagy
2026-01-20 15:09 ` Rich Felker
0 siblings, 2 replies; 9+ messages in thread
From: Xan Phung @ 2026-01-19 14:35 UTC
To: musl; +Cc: Xan Phung
Currently iswalpha and iswpunct have a combined text size of over
8kb. A more efficient encoding reduces the total to 3.75kb
(about 2kb and 1.8kb respectively), a 53% reduction.
The new encoding remains a (mostly) branchless table lookup, but now
requires 3 memory accesses instead of 2. It is still optimized for
random-access decoding. The top level is much the same as before,
providing 8-bit offsets into codepage units (256-codepoint
granularity). The second level has a fixed size of one 32-bit word
per codepage, where each 2-bit pair in the word classifies a block
of 16 codepoints as all 0, all 1, or mixed (two mixed encodings).
The third level is a variable-length series of extension bytes, one
per mixed block, indexed by the popcount of the set high bits of the
pairs that precede the target block in the second-level word. Pages
are stored in pairs: the even page reads its extension bytes forwards
and the odd page reads them backwards. Mixed blocks whose pattern
does not fit in one extension byte use that byte as an index into a
small dictionary of 16-bit masks. The popcount is calculated with
nearly the same latency as a 32-bit multiply, so it is comparable
with the cost of indexing a 2D array whose row size is not a power
of 2.
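As an illustration of the third-level indexing, the sketch below reproduces the patch's popcount sequence (shift the target lane's high bit out, fold the high bit of each earlier 2-bit pair into per-nibble counts, sum the nibbles with a multiply) and checks it against a plain loop. It also spells out the byte offset that the patch's `popc^rev` index produces for even and odd pages. The helper names `popc_before`, `popc_naive` and `ext_offset` are hypothetical; only the bit tricks come from the patch. The test uses the page-0 word of iswalpha_table.h, 0x99a0bb00 (bytes 0x00,0xbb,0xa0,0x99 read as a little-endian word).

```c
/* Number of lanes 0..lane-1 in 'huff' whose high bit is set, i.e. the
 * mixed lanes (types 2 and 3) that own an extension byte. Same
 * shift/mask/multiply sequence as iswalpha() in the patch. */
static unsigned popc_before(unsigned huff, unsigned lane)
{
	/* Shift out the target lane's high bit and all later lanes; the
	 * high bits of earlier lanes end up on even bit positions. */
	unsigned p = huff << (31 - 2 * lane);
	/* Fold bits 4k and 4k+2 into one 0..2 count per nibble. */
	p = (p & 0x11111111) + ((p & 0x44444444) >> 2);
	/* Multiplying by 0x11111111 sums all nibbles into the top one;
	 * the total is at most 15, so no carries disturb it. */
	return (p * 0x11111111) >> 28;
}

/* Straightforward loop for comparison. */
static unsigned popc_naive(unsigned huff, unsigned lane)
{
	unsigned n = 0;
	for (unsigned k = 0; k < lane; k++)
		n += huff >> (2 * k + 1) & 1;
	return n;
}

/* Byte offset, relative to the start of the 32-bit word area, of the
 * extension byte with index 'popc' for a page whose second-level word
 * is word number 'word'. Even pages read forwards starting at the next
 * word; odd pages read backwards from the byte just before their own
 * word (the patch computes this as 4*(word+rev+1) + (int)(popc^rev)). */
static long ext_offset(unsigned page, unsigned word, unsigned popc)
{
	if (page & 1)
		return 4 * (long)word - 1 - (long)popc;
	return 4 * ((long)word + 1) + (long)popc;
}
```

For the page-0 word, lanes 10, 11, 13 and 15 get popcounts 4, 5, 6 and 7, i.e. byte offsets 16 to 19 of the word area, which hold 0x00,0x02,0x04,0x04 in the table.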
Results have been tested against the first 0x20000 codepoints and
match those returned by the existing musl implementation.
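A comparison like this amounts to an exhaustive loop over the range. The following is a minimal sketch. `count_mismatches` is a hypothetical helper. It assumes the old and new implementations have already been built into one test program (for example by renaming one copy of each function), and that part is not shown.

```c
#include <wctype.h>

/* Hypothetical helper: count codepoints in [0, limit) on which two
 * classifiers disagree, treating any nonzero return value as true. */
static unsigned long count_mismatches(int (*a)(wint_t), int (*b)(wint_t),
                                      wint_t limit)
{
	unsigned long bad = 0;
	for (wint_t wc = 0; wc < limit; wc++)
		if (!a(wc) != !b(wc))
			bad++;
	return bad;
}
```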
Signed-off-by: Xan Phung <xan.phung@gmail.com>
---
src/ctype/alpha.h | 172 -----------------------------
src/ctype/iswalpha.c | 56 ++++++++--
src/ctype/iswalpha_dict.h | 16 +++
src/ctype/iswalpha_table.h | 217 +++++++++++++++++++++++++++++++++++++
src/ctype/iswpunct.c | 50 +++++++--
src/ctype/iswpunct_dict.h | 12 ++
src/ctype/iswpunct_table.h | 217 +++++++++++++++++++++++++++++++++++++
src/ctype/punct.h | 141 ------------------------
8 files changed, 552 insertions(+), 329 deletions(-)
delete mode 100644 src/ctype/alpha.h
create mode 100644 src/ctype/iswalpha_dict.h
create mode 100644 src/ctype/iswalpha_table.h
create mode 100644 src/ctype/iswpunct_dict.h
create mode 100644 src/ctype/iswpunct_table.h
delete mode 100644 src/ctype/punct.h
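The branchless fast path in the new iswalpha() below is compact but hard to check by eye. This sketch writes out, with an explicit switch, what each 2-bit lane type appears to mean. It then compares that reading against the patch's shift arithmetic for every type, extension byte and offset within the lane. The rules in `lane_ref` are inferred from the code and are not taken from any documentation. `lane_ref`, `lane_fast` and the `dict` parameter are illustrative names. `lane_fast` shifts `(unsigned)ext` rather than `ext` because left-shifting a negative `int` is undefined in C. iswpunct() appears to use the mirror image of type 2: the first eight codepoints are clear, and the dictionary is used when `ext` is odd.

```c
/* Reference decoding of one 16-codepoint lane of the iswalpha table.
 * 's' is the codepoint's offset within the lane (0..15); 'dict'
 * stands in for the patch's dict[] array (128 entries assumed). */
static int lane_ref(unsigned type, signed char ext, unsigned s,
                    const unsigned short *dict)
{
	unsigned e = (unsigned char)ext;

	switch (type) {
	case 0:                         /* no codepoint in the lane */
		return 0;
	case 1:                         /* every codepoint in the lane */
		return 1;
	case 3:                         /* ext bits 0..7, then bit 7 repeated */
		return e >> (s < 8 ? s : 7) & 1;
	default:                        /* type 2 */
		if (!(e & 1))               /* even ext: dictionary index */
			return dict[e >> 1] >> s & 1;
		if (s < 8)                  /* odd ext: first 8 set, */
			return 1;
		/* then ext bits 1..7, with bit 7 repeated for s == 15 */
		return e >> (s < 15 ? s - 7 : 7) & 1;
	}
}

/* The patch's branchless fast path for iswalpha(), unchanged except
 * that ext is converted to unsigned before the left shift. */
static int lane_fast(unsigned type, signed char ext, unsigned s,
                     const unsigned short *dict)
{
	if ((type != 2) | (ext & 1)) {
		s = (s + 5) & -(type >= 2);           /* 0 for types 0 and 1 */
		s = s - (6 & -(type == 2)) + type;    /* type 2: s+1, type 3: s+8 */
		return ((unsigned)ext << 8 | 0xfe) >> s & 1;
	}
	return dict[ext >> 1 & 0x7f] >> s & 1;
}
```

The sign extension of `ext` is what repeats bit 7 for the upper codepoints of types 2 and 3. This lets a single byte cover 16 codepoints whenever the tail of the lane is uniform.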
diff --git a/src/ctype/alpha.h b/src/ctype/alpha.h
deleted file mode 100644
index 4167f38..0000000
--- a/src/ctype/alpha.h
+++ /dev/null
@@ -1,172 +0,0 @@
-18,17,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,17,34,35,36,17,37,38,39,40,
-41,42,43,44,17,45,46,47,16,16,48,16,16,16,16,16,16,16,49,50,51,16,52,53,16,16,
-17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,54,
-17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,
-17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,
-17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,
-17,17,17,55,17,17,17,17,56,17,57,58,59,60,61,62,17,17,17,17,17,17,17,17,17,17,
-17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,17,
-17,17,17,17,17,17,17,63,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,17,64,65,17,66,67,
-68,69,70,71,72,73,74,17,75,76,77,78,79,80,81,16,82,83,84,85,86,87,88,89,90,91,
-92,93,16,94,95,96,16,17,17,17,97,98,99,16,16,16,16,16,16,16,16,16,16,17,17,17,
-17,100,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,17,17,101,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,17,17,102,103,16,16,104,105,17,17,17,17,17,17,17,17,17,17,17,17,17,17,
-17,17,17,17,17,17,17,17,17,106,17,17,107,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,17,
-108,109,16,16,16,16,16,16,16,16,16,110,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,111,112,113,114,16,16,16,16,16,16,16,16,115,116,
-117,16,16,16,16,16,118,119,16,16,16,16,120,16,16,121,16,16,16,16,16,16,16,16,
-16,16,16,16,16,
-16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,0,0,0,0,254,255,255,7,254,
-255,255,7,0,0,0,0,0,4,32,4,255,255,127,255,255,255,127,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,195,255,3,0,31,80,0,0,0,0,0,0,0,0,0,0,32,0,0,0,0,0,223,188,64,215,255,255,
-251,255,255,255,255,255,255,255,255,255,191,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,3,252,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,254,255,255,255,127,2,255,255,255,
-255,255,1,0,0,0,0,255,191,182,0,255,255,255,135,7,0,0,0,255,7,255,255,255,255,
-255,255,255,254,255,195,255,255,255,255,255,255,255,255,255,255,255,255,239,
-31,254,225,255,
-159,0,0,255,255,255,255,255,255,0,224,255,255,255,255,255,255,255,255,255,255,
-255,255,3,0,255,255,255,255,255,7,48,4,255,255,255,252,255,31,0,0,255,255,255,
-1,255,7,0,0,0,0,0,0,255,255,223,63,0,0,240,255,248,3,255,255,255,255,255,255,
-255,255,255,239,255,223,225,255,207,255,254,255,239,159,249,255,255,253,197,
-227,159,89,128,176,207,255,3,16,238,135,249,255,255,253,109,195,135,25,2,94,
-192,255,63,0,238,191,251,255,255,253,237,227,191,27,1,0,207,255,0,30,238,159,
-249,255,255,253,237,227,159,25,192,176,207,255,2,0,236,199,61,214,24,199,255,
-195,199,29,129,0,192,255,0,0,239,223,253,255,255,253,255,227,223,29,96,7,207,
-255,0,0,239,223,253,255,255,253,239,227,223,29,96,64,207,255,6,0,239,223,253,
-255,255,255,255,231,223,93,240,128,207,255,0,252,236,255,127,252,255,255,251,
-47,127,128,95,255,192,255,12,0,254,255,255,255,255,127,255,7,63,32,255,3,0,0,
-0,0,214,247,255,255,175,255,255,59,95,32,255,243,0,0,0,
-0,1,0,0,0,255,3,0,0,255,254,255,255,255,31,254,255,3,255,255,254,255,255,255,
-31,0,0,0,0,0,0,0,0,255,255,255,255,255,255,127,249,255,3,255,255,255,255,255,
-255,255,255,255,63,255,255,255,255,191,32,255,255,255,255,255,247,255,255,255,
-255,255,255,255,255,255,61,127,61,255,255,255,255,255,61,255,255,255,255,61,
-127,61,255,127,255,255,255,255,255,255,255,61,255,255,255,255,255,255,255,255,
-7,0,0,0,0,255,255,0,0,255,255,255,255,255,255,255,255,255,255,63,63,254,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,159,255,255,254,255,255,7,255,255,255,255,255,255,255,255,
-255,199,255,1,255,223,15,0,255,255,15,0,255,255,15,0,255,223,13,0,255,255,255,
-255,255,255,207,255,255,1,128,16,255,3,0,0,0,0,255,3,255,255,255,255,255,255,
-255,255,255,255,255,1,255,255,255,255,255,7,255,255,255,255,255,255,255,255,
-63,
-0,255,255,255,127,255,15,255,1,192,255,255,255,255,63,31,0,255,255,255,255,
-255,15,255,255,255,3,255,3,0,0,0,0,255,255,255,15,255,255,255,255,255,255,255,
-127,254,255,31,0,255,3,255,3,128,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,255,
-255,239,255,239,15,255,3,0,0,0,0,255,255,255,255,255,243,255,255,255,255,255,
-255,191,255,3,0,255,255,255,255,255,255,127,0,255,227,255,255,255,255,255,63,
-255,1,255,255,255,255,255,231,0,0,0,0,0,222,111,4,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,
-128,255,31,0,255,255,63,63,255,255,255,255,63,63,255,170,255,255,255,63,255,
-255,255,255,255,255,223,95,220,31,207,15,255,31,220,31,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,2,128,0,0,255,31,0,0,0,0,0,0,0,0,0,0,0,0,132,252,47,62,80,189,255,243,
-224,67,0,0,255,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,192,255,255,255,255,255,255,3,0,
-0,255,255,255,255,255,127,255,255,255,255,255,127,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,31,120,12,0,255,255,255,255,191,32,255,
-255,255,255,255,255,255,128,0,0,255,255,127,0,127,127,127,127,127,127,127,127,
-255,255,255,255,0,0,0,0,0,128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,224,0,0,0,254,3,62,31,254,255,255,255,255,255,255,255,255,255,127,224,254,
-255,255,255,255,255,255,255,255,255,255,247,224,255,255,255,255,255,254,255,
-255,255,255,255,255,255,255,255,255,127,0,0,255,255,255,7,0,0,0,0,0,0,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,63,0,0,0,0,0,0,0,0,0,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,
-0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,31,0,0,
-0,0,0,0,0,0,255,255,255,255,255,63,255,31,255,255,255,15,0,0,255,255,255,255,
-255,127,240,143,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0,0,0,
-0,128,255,252,255,255,255,255,255,255,255,255,255,255,255,255,249,255,255,255,
-255,255,255,124,0,0,0,0,0,128,255,191,255,255,255,255,0,0,0,255,255,255,255,
-255,255,15,0,255,255,255,255,255,255,255,255,47,0,255,3,0,0,252,232,255,255,
-255,255,255,7,255,255,255,255,7,0,255,255,255,31,255,255,255,255,255,255,247,
-255,0,128,255,3,255,255,255,127,255,255,255,255,255,255,127,0,255,63,255,3,
-255,255,127,252,255,255,255,255,255,255,255,127,5,0,0,56,255,255,60,0,126,126,
-126,0,127,127,255,255,255,255,255,247,255,0,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,7,255,3,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,15,0,255,255,127,248,255,255,255,255,
-255,
-15,255,255,255,255,255,255,255,255,255,255,255,255,255,63,255,255,255,255,255,
-255,255,255,255,255,255,255,255,3,0,0,0,0,127,0,248,224,255,253,127,95,219,
-255,255,255,255,255,255,255,255,255,255,255,255,255,3,0,0,0,248,255,255,255,
-255,255,255,255,255,255,255,255,255,63,0,0,255,255,255,255,255,255,255,255,
-252,255,255,255,255,255,255,0,0,0,0,0,255,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,223,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,31,0,0,255,3,
-254,255,255,7,254,255,255,7,192,255,255,255,255,255,255,255,255,255,255,127,
-252,252,252,28,0,0,0,0,255,239,255,255,127,255,255,183,255,63,255,63,0,0,0,0,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,7,0,0,0,0,0,0,0,0,
-255,255,255,255,255,255,31,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,255,255,255,31,255,255,255,255,255,255,1,0,0,0,0,
-0,255,255,255,255,0,224,255,255,255,7,255,255,255,255,255,7,255,255,255,63,
-255,255,255,255,15,255,62,0,0,0,0,0,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,63,255,3,255,255,255,255,15,255,255,255,
-255,15,255,255,255,255,255,0,255,255,255,255,255,255,15,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,255,255,255,255,255,255,127,0,255,255,63,0,255,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,63,253,255,255,255,255,191,145,255,255,63,0,255,255,
-127,0,255,255,255,127,0,0,0,0,0,0,0,0,255,255,55,0,255,255,63,0,255,255,255,3,
-0,0,0,0,0,0,0,0,255,255,255,255,255,255,255,192,0,0,0,0,0,0,0,0,111,240,239,
-254,255,255,63,0,0,0,0,0,255,255,255,31,255,255,255,31,0,0,0,0,255,254,255,
-255,31,0,0,0,255,255,255,255,255,255,63,0,255,255,63,0,255,255,7,0,255,255,3,
-0,0,0,0,0,0,0,0,0,0,0,0,
-0,255,255,255,255,255,255,255,255,255,1,0,0,0,0,0,0,255,255,255,255,255,255,7,
-0,255,255,255,255,255,255,7,0,255,255,255,255,255,0,255,3,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,31,128,0,255,255,63,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,255,255,127,0,255,255,255,255,255,255,255,255,63,0,0,0,
-192,255,0,0,252,255,255,255,255,255,255,1,0,0,255,255,255,1,255,3,255,255,255,
-255,255,255,199,255,112,0,255,255,255,255,71,0,255,255,255,255,255,255,255,
-255,30,0,255,23,0,0,0,0,255,255,251,255,255,255,159,64,0,0,0,0,0,0,0,0,127,
-189,255,191,255,1,255,255,255,255,255,255,255,1,255,3,239,159,249,255,255,253,
-237,227,159,25,129,224,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,
-255,255,255,255,255,187,7,255,131,0,0,0,0,255,255,255,255,255,255,255,255,179,
-0,255,3,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,255,255,63,127,0,0,0,63,0,0,
-0,0,255,255,255,255,255,255,255,127,17,0,255,3,0,0,0,0,255,255,255,255,255,
-255,63,1,255,3,0,0,0,0,0,0,255,255,255,231,255,7,255,3,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,
-0,255,255,255,255,255,255,255,255,255,3,0,128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,255,252,255,255,255,255,255,252,26,0,0,0,255,255,255,255,255,255,231,
-127,0,0,255,255,255,255,255,255,255,255,255,32,0,0,0,0,255,255,255,255,255,
-255,255,1,255,253,255,255,255,255,127,127,1,0,255,3,0,0,252,255,255,255,252,
-255,255,254,127,0,0,0,0,0,0,0,0,0,127,251,255,255,255,255,127,180,203,0,255,3,
-191,253,255,255,255,127,123,1,255,3,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,127,0,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,3,0,0,
-0,0,0,0,0,0,0,0,0,0,255,255,255,255,255,255,255,255,255,255,255,255,255,127,0,
-0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,
-255,255,255,127,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,
-255,255,255,255,255,255,127,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,
-255,255,255,255,255,255,1,255,255,255,127,255,3,0,0,0,0,0,0,0,0,0,0,0,0,255,
-255,255,63,0,0,255,255,255,255,255,255,0,0,15,0,255,3,248,255,255,224,255,255,
-0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,255,255,255,255,255,255,255,255,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,255,255,255,255,255,255,255,255,255,135,255,255,255,255,255,255,255,128,
-255,255,0,0,0,0,0,0,0,0,11,0,0,0,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,7,0,255,255,255,127,0,0,0,0,0,
-0,7,0,240,0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,15,255,255,255,255,
-255,255,255,255,255,255,255,255,255,7,255,31,255,1,255,67,0,0,0,0,0,0,0,0,0,0,
-0,0,255,255,255,255,255,255,255,255,255,255,223,255,255,255,255,255,255,255,
-255,223,100,222,255,235,239,255,255,255,255,255,255,
-255,191,231,223,223,255,255,255,123,95,252,253,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,63,255,255,255,
-253,255,255,247,255,255,255,247,255,255,223,255,255,255,223,255,255,127,255,
-255,255,127,255,255,255,253,255,255,255,253,255,255,247,207,255,255,255,255,
-255,255,127,255,255,249,219,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,255,255,255,255,255,31,128,63,255,67,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,255,
-15,255,3,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,31,0,0,0,0,0,0,0,255,255,255,255,255,255,255,255,
-143,8,255,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,239,255,255,255,150,254,247,10,132,234,150,170,150,247,247,94,255,251,255,
-15,238,251,255,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,3,255,255,255,3,255,
-255,255,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
diff --git a/src/ctype/iswalpha.c b/src/ctype/iswalpha.c
index 1c5485d..e3b7037 100644
--- a/src/ctype/iswalpha.c
+++ b/src/ctype/iswalpha.c
@@ -1,16 +1,54 @@
#include <wctype.h>
-static const unsigned char table[] = {
-#include "alpha.h"
+#define PAGE_SH 8
+#define PAGE_MAX (1u << PAGE_SH)
+#define PAGES (0x20000 / PAGE_MAX)
+#define PAGEH (0x200/8 + PAGES) /* 0x200 direct mapped codepts */
+
+static const unsigned char table[PAGEH + 227*sizeof(unsigned)] = {
+#include "iswalpha_table.h"
};
-int iswalpha(wint_t wc)
-{
- if (wc<0x20000U)
- return (table[table[wc>>8]*32+((wc&255)>>3)]>>(wc&7))&1;
- if (wc<0x2fffeU)
- return 1;
- return 0;
+static const unsigned short dict[119] = {
+#include "iswalpha_dict.h"
+};
+
+int iswalpha(wint_t wc) {
+ const unsigned char *huffm = table + PAGEH;
+ unsigned page, shfr, lane, base, rev, target;
+ unsigned huff, type, popc, fast;
+ signed char ext;
+
+ /* Uncommon codepoints, skipped by branch predictors */
+ if ((unsigned)wc >= 0x20000)
+ return (unsigned) wc < 0x2fffe;
+ if ((unsigned)wc-0xa00 < 0x200)
+ return table[PAGES + ((wc-0xa00)>>3)] >> (wc & 7) & 1;
+
+ /* Three level lookup, final level index = popc^rev_direction */
+ target = wc & (PAGE_MAX-1);
+ page = wc >> PAGE_SH;
+ shfr = target & 15;
+ lane = target >> 4;
+ base = table[page];
+ huff = huffm[4*base] | huffm[4*base+1]<<8 | huffm[4*base+2]<<16 | (unsigned)huffm[4*base+3]<<24;
+ base+= (rev = -(page & 1)) + 1;
+ type = (huff >> (2 * lane)) & 3;
+ popc = (huff << (31 - 2 * lane));
+ popc = (popc & 0x11111111) + ((popc & 0x44444444) >> 2);
+ popc = (popc * 0x11111111) >> 28;
+ ext = (table + PAGEH + base*4)[(int)(popc^rev)];
+
+ /* Fast (1st) path precalcs shfr before ext loaded from mem */
+ /* Dictionary lookup slow (2nd) path is only 1% of codepoints */
+ fast = (type != 2 | ext & 1);
+ if (fast) {
+ shfr = (shfr + 5) & -(type >= 2);
+ shfr = (shfr - (6 & -(type == 2)) + type);
+ return ((unsigned)ext << 8 | 0xfe) >> shfr & 1;
+ } else {
+ return dict[ext >> 1 & 0x7f] >> shfr & 1;
+ }
}
int __iswalpha_l(wint_t c, locale_t l)
diff --git a/src/ctype/iswalpha_dict.h b/src/ctype/iswalpha_dict.h
new file mode 100644
index 0000000..7d1ef92
--- /dev/null
+++ b/src/ctype/iswalpha_dict.h
@@ -0,0 +1,16 @@
+/* wctype: dictionary 119 entries */
+0x0400,0x0420,0xff7f,0x501f,0xbcdf,0xd740,0xfc03,0x027f,
+0xbfff,0x00b6,0x87ff,0x1fef,0xe1fe,0x9fff,0xe000,0x0430,
+0x3fdf,0x03f8,0x9fef,0xe3c5,0x599f,0xb080,0x1003,0xdfef,
+0x1ddf,0x0760,0xe3ef,0x4060,0x5ddf,0x80f0,0xfc00,0xfc7f,
+0x2ffb,0x807f,0xff5f,0x7fff,0x203f,0xf7d6,0x205f,0xff03,
+0xf97f,0x20bf,0x3d7f,0x7f3d,0xff3d,0x3f3f,0x1080,0x0080,
+0x0fef,0xde00,0x046f,0xaaff,0x5fdf,0x1fdc,0x0fcf,0x8002,
+0xfc84,0x3e2f,0xbd50,0x43e0,0x781f,0x80ff,0x7f7f,0x8000,
+0x00e0,0x03fe,0x1f3e,0xe07f,0x8ff0,0xe8fc,0x3800,0x7e7e,
+0xf87f,0xe0f8,0x5f7f,0xfcfc,0x1cfc,0xb7ff,0xff0f,0xfd3f,
+0x91bf,0xf06f,0xfeef,0x409f,0xbd7f,0xe3ed,0x199f,0xe081,
+0x07bb,0x83ff,0x00b3,0x7f3f,0x3f00,0x013f,0x7fe7,0xfb7f,
+0xb47f,0x00cb,0xfdbf,0x017b,0x00f0,0x43ff,0xde64,0xe7bf,
+0xdfdf,0x7bff,0xfc5f,0xff3f,0xcff7,0x07db,0x3f80,0x088f,
+0xfe96,0x0af7,0xea84,0xaa96,0xf796,0x5ef7,0xfbee,
diff --git a/src/ctype/iswalpha_table.h b/src/ctype/iswalpha_table.h
new file mode 100644
index 0000000..c9840b5
--- /dev/null
+++ b/src/ctype/iswalpha_table.h
@@ -0,0 +1,217 @@
+/* wctype: table 512 x 256 codepoints */
+0x02,0x01,0x07,0x0a,0x0b,0x0e,0x0f,0x06,0x16,0x1c,0x1d,0x1e,0x1f,0x15,0x27,0x44,
+0x2e,0x01,0x32,0x31,0x37,0x01,0x3b,0x2d,0x40,0x26,0x45,0x36,0x4b,0x51,0x01,0x4a,
+0x4f,0x55,0x00,0x00,0x52,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x56,0x3a,0x5a,0x00,
+0x5e,0x59,0x00,0x00,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,
+0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x4e,0x01,0x01,
+0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,
+0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,
+0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,
+0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,
+0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x69,
+0x01,0x01,0x01,0x01,0x63,0x01,0x67,0x62,0x6a,0x3f,0x6e,0x5d,0x01,0x01,0x01,0x01,
+0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,
+0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,
+0x01,0x01,0x01,0x01,0x01,0x01,0x01,0xa1,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01,0x73,0xb3,0x01,0x72,0x76,0x81,
+0x79,0x89,0x7d,0x66,0x82,0x6d,0x01,0xb6,0x86,0xb9,0x8a,0x85,0x8f,0x75,0x00,0x9d,
+0x92,0x8e,0x96,0xbd,0x9a,0x78,0x9e,0xc1,0xa2,0xc4,0xa6,0x00,0xab,0xaa,0xb0,0x00,
+0x01,0x01,0x01,0x91,0xb4,0x95,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x01,0x01,0x01,0x01,0xb7,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x01,0x01,0xba,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x01,0x01,0xbe,0xaf,0x00,0x00,0xc2,0xc7,
+0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x01,
+0x01,0x01,0x01,0x01,0x01,0x01,0x01,0x99,0x01,0x01,0xc5,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x01,0xca,0xc8,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xcb,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0xd0,0xa5,0xd4,0xcf,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0xd8,0xd3,0xda,0x00,0x00,0x00,0x00,0x00,0xdc,0x7c,0x00,0x00,0x00,0x00,0xde,0x00,
+0x00,0xd7,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0xee,0x87,0xf9,0xff,0xff,0xfd,0x6d,0xc3,0x87,0x19,0x02,0x5e,0xc0,0xff,0x3f,0x00,
+0xee,0xbf,0xfb,0xff,0xff,0xfd,0xed,0xe3,0xbf,0x1b,0x01,0x00,0xcf,0xff,0x00,0x1e,
+0xee,0x9f,0xf9,0xff,0xff,0xfd,0xed,0xe3,0x9f,0x19,0xc0,0xb0,0xcf,0xff,0x02,0x00,
+0xec,0xc7,0x3d,0xd6,0x18,0xc7,0xff,0xc3,0xc7,0x1d,0x81,0x00,0xc0,0xff,0x00,0x00,
+0x00,0x00,0x00,0x00,0x55,0x55,0x55,0x55,
+
+/* wctype: page pair U+00000 [words 2-6] */
+0x00,0xbb,0xa0,0x99,0xfe,0x0f,0xfe,0x0f,0x00,0x02,0x04,0x04,0x1e,0x0f,0x03,0x1c,
+0x54,0x56,0xd5,0xa5,
+
+/* wctype: page pair U+00200 [words 7-10] */
+0x55,0x55,0x55,0x2f,0xc3,0x03,0x06,0xbf,0xfb,0x0a,0x08,0x20,0x00,0x83,0x76,0xd5,
+
+/* wctype: page pair U+00400 [words 11-14] */
+0x55,0x55,0x56,0x55,0x0c,0x07,0x14,0x12,0x10,0x03,0x0e,0xfe,0xd5,0x59,0x82,0xe6,
+
+/* wctype: page pair U+00600 [words 15-21] */
+0x58,0x69,0x55,0xa9,0x0f,0xfd,0x87,0x16,0x18,0x1a,0x0c,0xc0,0x44,0x42,0x40,0x3e,
+0xec,0x3c,0xcf,0x3a,0x38,0xcf,0xfd,0x2e,0x9e,0xba,0x9b,0xfa,
+
+/* wctype: page pair U+00800 [words 22-28] */
+0x29,0x29,0x90,0x6c,0xf9,0x3f,0x03,0x0f,0x20,0xf0,0x22,0x2c,0xcf,0x2a,0x28,0x26,
+0xfb,0xf9,0x24,0xfe,0xcf,0xe1,0xbf,0xdf,0x95,0xfe,0xae,0xba,
+
+/* wctype: page pair U+00a00 [words 29-30] */
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+
+/* wctype: page pair U+00c00 [words 31-38] */
+0xae,0x3a,0xae,0xfa,0x2e,0xfd,0xfb,0xc7,0x30,0x32,0xcf,0x2e,0xfd,0xfb,0x34,0x30,
+0x36,0xcf,0x06,0x07,0x07,0x1f,0x1f,0x7f,0xc0,0x03,0x1f,0x46,0xa9,0xe7,0x65,0x0a,
+
+/* wctype: page pair U+00e00 [words 39-45] */
+0xa7,0x0a,0xb6,0x0a,0xfe,0x46,0x0f,0x48,0x07,0x4a,0xaf,0x77,0x4c,0xe7,0x07,0x5c,
+0x03,0xcf,0x0d,0xbf,0x0f,0x0f,0x0f,0xbf,0xde,0xed,0xd5,0x2a,
+
+/* wctype: page pair U+01000 [words 46-49] */
+0x95,0x56,0x59,0x96,0x50,0x07,0x7f,0x52,0xef,0x5a,0x0f,0x58,0x59,0x09,0x51,0x95,
+
+/* wctype: page pair U+01200 [words 50-54] */
+0x55,0x5a,0x96,0x5a,0x7b,0x54,0x7b,0x56,0x58,0x04,0x03,0xbf,0xe7,0x07,0x60,0xef,
+0xd5,0x0a,0x65,0xf5,
+
+/* wctype: page pair U+01400 [words 55-58] */
+0x57,0x55,0x55,0x55,0xfe,0x7c,0x7c,0x7c,0x7c,0x7f,0x7a,0x52,0x65,0x25,0xad,0x5a,
+
+/* wctype: page pair U+01600 [words 59-63] */
+0x55,0x65,0x5b,0xa5,0x1a,0xfe,0x0f,0x8f,0x03,0x46,0x07,0x7e,0xf7,0x3f,0x07,0x0f,
+0x65,0x9d,0xd5,0x9a,
+
+/* wctype: page pair U+01800 [words 64-68] */
+0x58,0x95,0x65,0xd5,0x07,0x03,0x0f,0x3f,0x3f,0xfd,0x4e,0xfe,0x3f,0xfd,0x07,0x01,
+0x23,0xe6,0x9a,0x00,
+
+/* wctype: page pair U+01a00 [words 69-74] */
+0x59,0xf9,0x2a,0x00,0x1f,0x46,0xfe,0x1f,0x07,0x07,0x5e,0x6a,0x3f,0x6c,0x6a,0x68,
+0x7f,0x66,0x5a,0x5a,0x59,0x9a,0x95,0xaa,
+
+/* wctype: page pair U+01c00 [words 75-78] */
+0xd5,0x96,0x96,0xa0,0x7f,0xc7,0x7f,0x03,0xcf,0x62,0x64,0x3f,0x55,0x55,0xd5,0x00,
+
+/* wctype: page pair U+02000 [words 79-81] */
+0x00,0x80,0x08,0x00,0x6e,0x3f,0x1f,0x80,0x55,0x55,0x55,0xf0,
+
+/* wctype: page pair U+02400 [words 82-85] */
+0x00,0x00,0xc0,0x25,0xc0,0x07,0x03,0x76,0xe7,0x74,0x72,0x70,0xaa,0x52,0x02,0x00,
+
+/* wctype: page pair U+02c00 [words 86-89] */
+0x65,0x59,0x55,0xe5,0x46,0x46,0x78,0x0c,0x0f,0x46,0xfe,0xe0,0xd7,0x55,0x92,0x40,
+
+/* wctype: page pair U+02e00 [words 90-93] */
+0x20,0x00,0x00,0x00,0x7e,0x07,0x0f,0x01,0xef,0x7c,0x7e,0x8e,0x6e,0x69,0x55,0xa5,
+
+/* wctype: page pair U+03000 [words 94-98] */
+0xa2,0x57,0x79,0x95,0x80,0x82,0x84,0xfe,0x86,0xfe,0xef,0x80,0x7c,0xf3,0xfc,0x80,
+0x7c,0x55,0x56,0xc3,
+
+/* wctype: page pair U+0a400 [words 99-102] */
+0x55,0x55,0x02,0x94,0x3f,0x7f,0x3e,0x9c,0x7f,0x0f,0x0f,0x1c,0x65,0x96,0x59,0x0e,
+
+/* wctype: page pair U+0a600 [words 103-105] */
+0x26,0xa5,0x55,0x15,0x3f,0x1f,0x46,0x88,0x55,0x55,0x55,0x15,
+
+/* wctype: page pair U+0a800 [words 106-109] */
+0x27,0xd5,0x55,0x8b,0xbf,0x01,0x0f,0x2f,0x07,0x8a,0x0f,0x01,0x65,0x35,0x00,0x00,
+
+/* wctype: page pair U+0aa00 [words 110-114] */
+0xd5,0x9a,0x95,0xdb,0x7f,0x7f,0x07,0x3e,0x46,0x05,0x8c,0x3c,0x1f,0x01,0xfc,0x7f,
+0x95,0x54,0x5d,0x82,
+
+/* wctype: page pair U+0fa00 [words 115-117] */
+0x55,0x65,0x55,0x09,0x7f,0x07,0x07,0x01,0xa5,0x00,0x00,0x00,
+
+/* wctype: page pair U+0fe00 [words 118-120] */
+0x00,0xc0,0x55,0x95,0xdf,0x3f,0xb8,0xb6,0x00,0x00,0x95,0x08,
+
+/* wctype: page pair U+10000 [words 121-124] */
+0xa6,0x0a,0x55,0x95,0xdf,0x04,0x9a,0x7f,0x7f,0x0f,0x07,0xde,0x55,0x0a,0x00,0x00,
+
+/* wctype: page pair U+10200 [words 125-129] */
+0x00,0x00,0x59,0x0d,0x3f,0x01,0x00,0x98,0x96,0x46,0xc0,0x0f,0xfe,0x0f,0xfe,0x07,
+0xb8,0x7b,0x95,0x0a,
+
+/* wctype: page pair U+10400 [words 130-133] */
+0x55,0x55,0x69,0x99,0x7f,0x07,0x9c,0x1f,0x03,0x07,0x3f,0x3f,0xd5,0xdd,0x0d,0x00,
+
+/* wctype: page pair U+10800 [words 134-137] */
+0x96,0xdd,0x09,0xd0,0x9e,0xa0,0x3f,0x7f,0x46,0x37,0x00,0x1f,0x00,0xd5,0x00,0x00,
+
+/* wctype: page pair U+10a00 [words 138-142] */
+0xda,0x90,0x09,0x36,0xa2,0xa4,0x3f,0x3f,0x3f,0xfd,0x1f,0x2f,0x1e,0x47,0x70,0xc7,
+0xd5,0xd7,0x55,0x0b,
+
+/* wctype: page pair U+10c00 [words 143-145] */
+0x55,0x02,0xd5,0xd5,0x03,0x07,0x07,0x07,0x55,0x55,0x09,0x00,
+
+/* wctype: page pair U+11000 [words 146-149] */
+0x55,0x33,0x97,0xa4,0x3f,0xc0,0xfc,0x03,0x03,0x07,0x00,0x0f,0x55,0x03,0x00,0x00,
+
+/* wctype: page pair U+11200 [words 150-153] */
+0x9d,0x00,0x6a,0xa5,0xfb,0xa6,0xa8,0x10,0x03,0x03,0x07,0x01,0x55,0x55,0x55,0x95,
+
+/* wctype: page pair U+11400 [words 154-157] */
+0x55,0x0a,0x55,0x0a,0xb0,0xb2,0xb4,0x07,0x7f,0x3f,0x5e,0x3f,0x69,0x03,0x00,0xd0,
+
+/* wctype: page pair U+11600 [words 158-161] */
+0x95,0x0b,0x95,0x02,0x46,0x11,0x07,0xba,0x07,0x1f,0x90,0x0f,0x55,0x55,0x75,0x96,
+
+/* wctype: page pair U+11800 [words 162-165] */
+0x95,0x00,0x50,0xa5,0x03,0x07,0x7e,0xfd,0xd4,0xd2,0xd0,0xce,0x9a,0x5e,0x55,0x55,
+
+/* wctype: page pair U+11a00 [words 166-170] */
+0x95,0x54,0x09,0x95,0xbc,0x41,0x03,0x00,0x07,0xc6,0x46,0xc4,0x07,0xc2,0xc0,0xbe,
+0x96,0x6a,0x2a,0x00,
+
+/* wctype: page pair U+11c00 [words 171-175] */
+0x96,0xcb,0xed,0x00,0xfb,0x7c,0x01,0x07,0xfc,0xfc,0xfd,0x7f,0xc1,0xf8,0x07,0x0f,
+0x15,0xbb,0x01,0x00,
+
+/* wctype: page pair U+11e00 [words 176-179] */
+0x00,0x00,0x00,0xd0,0x7f,0xf8,0x03,0xdb,0x94,0xfb,0x92,0x7f,0xab,0x57,0xd5,0x5c,
+
+/* wctype: page pair U+12400 [words 180-182] */
+0x55,0x25,0x55,0x55,0x46,0x01,0x3f,0x7f,0xd5,0x2d,0x00,0x00,
+
+/* wctype: page pair U+13400 [words 183-185] */
+0x25,0x00,0x00,0x00,0x46,0x81,0x07,0x3f,0x9d,0x00,0x95,0x00,
+
+/* wctype: page pair U+14600 [words 186-189] */
+0x55,0x03,0x00,0x00,0x7f,0x0f,0xae,0xac,0xaa,0xfb,0xf9,0x24,0xae,0x3a,0x00,0x00,
+
+/* wctype: page pair U+16a00 [words 190-193] */
+0x95,0x29,0x00,0x24,0x03,0x46,0x07,0x7f,0x00,0x07,0x0f,0xcf,0xa9,0x00,0x00,0x00,
+
+/* wctype: page pair U+16e00 [words 194-196] */
+0x00,0x55,0x00,0x00,0x00,0x1a,0xf9,0xf9,0x00,0x00,0x60,0x39,
+
+/* wctype: page pair U+18a00 [words 197-199] */
+0x55,0x55,0x55,0xd5,0x07,0x0b,0x7a,0x14,0x55,0x56,0x06,0x30,
+
+/* wctype: page pair U+1b200 [words 200-202] */
+0x55,0x55,0x55,0x95,0x1f,0xc8,0x07,0x46,0x09,0x6c,0x55,0x55,
+
+/* wctype: page pair U+1bc00 [words 203-207] */
+0x55,0xa5,0x0a,0x00,0x0f,0x3f,0x03,0xca,0x00,0xd8,0xfb,0xfb,0x46,0x46,0xdf,0xdf,
+0xdd,0x66,0x66,0x56,
+
+/* wctype: page pair U+1d400 [words 208-211] */
+0x55,0x5d,0xa9,0x57,0xdf,0xbf,0xcc,0xd7,0xef,0xca,0xdc,0x3f,0xa5,0x02,0x00,0x00,
+
+/* wctype: page pair U+1d600 [words 212-215] */
+0x55,0x55,0x65,0x9b,0xd6,0xfd,0xef,0xef,0x00,0x07,0x07,0x07,0x40,0x66,0x02,0x00,
+
+/* wctype: page pair U+1e000 [words 216-217] */
+0x2a,0x00,0x00,0x00,0x04,0xf3,0xda,0x00,
+
+/* wctype: page pair U+1e200 [words 218-219] */
+0x00,0x00,0x00,0xa5,0x1f,0x07,0x00,0x00,
+
+/* wctype: page pair U+1e800 [words 220-221] */
+0x55,0x55,0x55,0x03,0x1f,0x00,0x00,0x00,
+
+/* wctype: page pair U+1ee00 [words 222-226] */
+0xa7,0xaa,0xaa,0x00,0xef,0xe0,0xe2,0xe4,0xe6,0xe8,0xea,0xf7,0x1f,0xec,0x1f,0x00,
+0x00,0x00,0x00,0x00,
diff --git a/src/ctype/iswpunct.c b/src/ctype/iswpunct.c
index f0b9ea0..f534702 100644
--- a/src/ctype/iswpunct.c
+++ b/src/ctype/iswpunct.c
@@ -1,14 +1,50 @@
#include <wctype.h>
-static const unsigned char table[] = {
-#include "punct.h"
+#define PAGE_SH 8
+#define PAGE_MAX (1u << PAGE_SH)
+#define PAGES (0x20000 / PAGE_MAX)
+#define PAGEH (PAGES)
+
+static const unsigned char table[PAGEH + 213*sizeof(unsigned)] = {
+#include "iswpunct_table.h"
};
-int iswpunct(wint_t wc)
-{
- if (wc<0x20000U)
- return (table[table[wc>>8]*32+((wc&255)>>3)]>>(wc&7))&1;
- return 0;
+static const unsigned short dict[85] = {
+#include "iswpunct_dict.h"
+};
+
+int iswpunct(wint_t wc) {
+ const unsigned char *huffm = table + PAGEH;
+ unsigned page, shfr, lane, base, rev, target;
+ unsigned huff, type, popc, fast;
+ signed char ext;
+ if ((unsigned)wc >= 0x20000)
+ return 0;
+
+ /* Three level lookup, final level index = popc^rev_direction */
+ target = wc & (PAGE_MAX-1);
+ page = wc >> PAGE_SH;
+ shfr = target & 15;
+ lane = target >> 4;
+ base = table[page];
+ huff = huffm[4*base] | huffm[4*base+1]<<8 | huffm[4*base+2]<<16 | (unsigned)huffm[4*base+3]<<24;
+ base+= (rev = -(page & 1)) + 1;
+ type = (huff >> (2 * lane)) & 3;
+ popc = (huff << (31 - 2 * lane));
+ popc = (popc & 0x11111111) + ((popc & 0x44444444) >> 2);
+ popc = (popc * 0x11111111) >> 28;
+ ext = (table + PAGEH + base*4)[(int)(popc^rev)];
+
+ /* Fast (1st) path precalcs shfr before ext loaded from mem */
+ /* Dictionary lookup slow (2nd) path is only 1% of codepoints */
+ fast = (type != 2 | (ext & 1) == 0);
+ if (fast) {
+ shfr = (shfr + 6) & -(type >= 2);
+ shfr = (shfr - (8 & -(type == 2)) + (type^1));
+ return ((unsigned)ext << 8 | 0x01) >> shfr & 1;
+ } else {
+ return dict[ext >> 1 & 0x7f] >> shfr & 1;
+ }
}
int __iswpunct_l(wint_t c, locale_t l)
diff --git a/src/ctype/iswpunct_dict.h b/src/ctype/iswpunct_dict.h
new file mode 100644
index 0000000..bd1bc0f
--- /dev/null
+++ b/src/ctype/iswpunct_dict.h
@@ -0,0 +1,12 @@
+/* wctype: dictionary 85 entries */
+0x7800,0xfbff,0xfbdf,0x0080,0xafe0,0x4020,0x00b0,0x03fc,
+0x4000,0xe010,0x1e01,0x6000,0xbfff,0x07ff,0xe3cf,0x7fff,
+0x4e00,0xfc07,0x6ffc,0xe003,0x00fd,0xa000,0x7f00,0x03ff,
+0x8000,0xdfc0,0x00fc,0xdfff,0x0680,0x1fff,0x2f7f,0x9fe0,
+0x3f7f,0xf00c,0xf880,0x00ff,0x21ff,0x0390,0xfbe0,0xfcff,
+0x7ff1,0x037b,0xc1d0,0x42af,0xbc1f,0xff3f,0x87e0,0xfe03,
+0x8001,0x0fff,0xff1e,0xfc01,0xe0c1,0x700f,0x00f0,0xc010,
+0x1703,0x8008,0x3fff,0x0380,0x0f7f,0x7f7f,0x8fc0,0x8700,
+0x01ff,0xf860,0x3fc0,0x2003,0x3fe1,0x3f60,0x1fc0,0xf844,
+0x6800,0x00c0,0x8018,0x0180,0x8003,0xb000,0xfe7f,0x0770,
+0xefff,0xfc7b,0xc7e7,0xe7ff,0x070f,
diff --git a/src/ctype/iswpunct_table.h b/src/ctype/iswpunct_table.h
new file mode 100644
index 0000000..7b4c16a
--- /dev/null
+++ b/src/ctype/iswpunct_table.h
@@ -0,0 +1,217 @@
+/* wctype: table 512 x 256 codepoints */
+0x02,0x00,0x08,0x2a,0x0c,0x07,0x0f,0x31,0x14,0x47,0x19,0x0b,0x1e,0x13,0x22,0x18,
+0x27,0x00,0x00,0x0e,0x2b,0x00,0x2e,0x1d,0x32,0x2d,0x36,0x35,0x3a,0x21,0x00,0x42,
+0x3e,0x26,0x01,0x01,0x43,0x01,0x01,0x01,0x01,0x01,0x01,0x4a,0x48,0x4d,0x4b,0x51,
+0x4e,0x85,0x52,0x01,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x5b,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x56,0x00,0x59,0x39,0x5c,0x89,0x60,0x58,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x64,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x67,0x00,0x00,0x3d,0x00,0x5f,0x6b,0x8e,
+0x00,0x55,0x6f,0x66,0x00,0x95,0x00,0x00,0x72,0x6e,0x76,0xb4,0x7b,0x00,0x7f,0x75,
+0x82,0x6a,0x86,0x7a,0x8a,0x63,0x8f,0x99,0x93,0xb7,0x96,0x00,0x9a,0x9d,0x9e,0xba,
+0x00,0x00,0x00,0x00,0xa1,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0xa4,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xa7,0x71,0x00,0x00,0xaa,0xbd,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xad,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
+0xb1,0xa0,0xb5,0xa3,0x00,0x00,0xb8,0x7e,0x01,0x01,0xbb,0x00,0x00,0x00,0x00,0x00,
+0x00,0xa6,0xbe,0x00,0x00,0x00,0x00,0x00,0xc0,0xa9,0x00,0x00,0xc2,0xac,0xc4,0x00,
+0xc6,0xb0,0xc9,0x01,0x01,0x01,0xcc,0x81,0xce,0x92,0xd1,0x00,0x00,0x00,0x00,0x00,
+0x00,0x00,0x00,0x00,0x55,0x55,0x55,0x55,
+
+/* wctype: page pair U+00000 [words 2-7] */
+0xb0,0xbb,0xa0,0x88,0xfe,0xf8,0x01,0xf0,0x01,0x01,0x03,0x05,0x07,0x07,0x18,0x49,
+0x11,0xfe,0xcc,0xf8,0x00,0x08,0x9e,0xc3,
+
+/* wctype: page pair U+00200 [words 8-11] */
+0x00,0x00,0x00,0x6f,0x3c,0xfc,0x09,0x1b,0x40,0x29,0x40,0x20,0x80,0x82,0x00,0x82,
+
+/* wctype: page pair U+00400 [words 12-14] */
+0x00,0x00,0x02,0x00,0x0f,0x2f,0x3b,0xc0,0x00,0x98,0x08,0x00,
+
+/* wctype: page pair U+00600 [words 15-19] */
+0x09,0x28,0x00,0xa8,0xb0,0x02,0x78,0x13,0x15,0x17,0x10,0x08,0x2f,0x2d,0x2b,0x30,
+0x80,0x8a,0x00,0xc2,
+
+/* wctype: page pair U+00800 [words 20-24] */
+0xa8,0x08,0x00,0x2c,0x06,0x40,0x1f,0x21,0x08,0x23,0x1b,0x37,0x80,0x35,0xf8,0xfe,
+0x67,0x00,0x82,0x0a,
+
+/* wctype: page pair U+00a00 [words 25-29] */
+0x80,0xc2,0x80,0x82,0x20,0x40,0x40,0x20,0x40,0x27,0x2f,0x3d,0xfc,0x30,0x70,0x10,
+0xcc,0x00,0xc0,0x8a,
+
+/* wctype: page pair U+00c00 [words 30-33] */
+0x03,0xc2,0x83,0x02,0x10,0x40,0x80,0x10,0x20,0x40,0x4d,0x7f,0x00,0x00,0x00,0xb5,
+
+/* wctype: page pair U+00e00 [words 34-38] */
+0xa0,0x0a,0x80,0x02,0x31,0x31,0x33,0x18,0x08,0x3e,0x1c,0x59,0x18,0x57,0x55,0x53,
+0xaa,0x06,0x56,0x55,
+
+/* wctype: page pair U+01000 [words 39-42] */
+0x80,0x02,0x08,0x80,0x39,0xf8,0x80,0x10,0x40,0x0d,0x0b,0xdf,0x55,0x97,0x02,0xc0,
+
+/* wctype: page pair U+01400 [words 43-45] */
+0x03,0x00,0x00,0x00,0x01,0x88,0x31,0x1c,0x80,0x03,0x00,0x58,
+
+/* wctype: page pair U+01600 [words 46-49] */
+0x00,0x20,0x0b,0x20,0x17,0x01,0x30,0x70,0x1d,0xf0,0x1b,0x19,0x02,0x02,0x00,0xa0,
+
+/* wctype: page pair U+01800 [words 50-53] */
+0x02,0x00,0x00,0x00,0x1f,0x43,0x40,0x18,0x3b,0xf8,0x10,0x10,0xc0,0x9b,0x20,0xb0,
+
+/* wctype: page pair U+01a00 [words 54-57] */
+0x08,0xb0,0xa0,0x00,0x80,0x01,0x3f,0x41,0x1f,0x0c,0x03,0x7f,0x3d,0x00,0x02,0x00,
+
+/* wctype: page pair U+01c00 [words 58-61] */
+0x80,0x80,0x00,0xa6,0x45,0x80,0x47,0x49,0x4b,0x03,0xfc,0x04,0x20,0x00,0xc0,0x03,
+
+/* wctype: page pair U+02000 [words 62-66] */
+0x66,0xb9,0x52,0xd4,0x45,0x4f,0x1f,0xdf,0x51,0x1f,0x01,0x17,0xc0,0xc0,0x27,0x2b,
+0x00,0x00,0x80,0xaa,
+
+/* wctype: page pair U+02400 [words 67-71] */
+0x35,0x52,0xd5,0x60,0x7f,0x1b,0x3f,0xf8,0x25,0x40,0x20,0x01,0x30,0x1e,0x40,0x20,
+0x80,0xfe,0x80,0x82,
+
+/* wctype: page pair U+02c00 [words 72-74] */
+0x00,0x00,0x00,0xa0,0x5d,0x5f,0x5b,0xcf,0x55,0xd5,0x59,0x55,
+
+/* wctype: page pair U+02e00 [words 75-77] */
+0x65,0x01,0x59,0xd5,0x1f,0x03,0x0f,0x61,0x00,0x80,0x00,0x00,
+
+/* wctype: page pair U+03000 [words 78-81] */
+0xa6,0x00,0x38,0x80,0x65,0x67,0x69,0x3c,0x01,0x10,0x63,0x3f,0x55,0x55,0x55,0x8d,
+
+/* wctype: page pair U+03200 [words 82-85] */
+0x59,0x55,0x55,0x55,0x1f,0x75,0x01,0x63,0x1f,0xe0,0x8f,0x87,0xd7,0xc0,0x3a,0x94,
+
+/* wctype: page pair U+0a400 [words 86-88] */
+0x00,0x00,0x54,0x83,0x7f,0x80,0x70,0x10,0x00,0x08,0x00,0x20,
+
+/* wctype: page pair U+0a600 [words 89-91] */
+0x02,0xa0,0x00,0x80,0xc0,0x31,0x6b,0x47,0x00,0x00,0x00,0x55,
+
+/* wctype: page pair U+0a800 [words 92-95] */
+0xa3,0x80,0x00,0x92,0x40,0x1e,0x2f,0x6d,0x6f,0x71,0x60,0x80,0x80,0x00,0x00,0x80,
+
+/* wctype: page pair U+0aa00 [words 96-99] */
+0x00,0x88,0x80,0xcb,0xe0,0x77,0x31,0x02,0x80,0x43,0x47,0x31,0x00,0x00,0x80,0x09,
+
+/* wctype: page pair U+0e000 [words 100-102] */
+0x03,0x00,0x00,0x00,0x01,0x01,0x31,0x0f,0x30,0x00,0x08,0x0c,
+
+/* wctype: page pair U+0f800 [words 103-106] */
+0x00,0x00,0x00,0x80,0x31,0x1f,0xfe,0xd0,0x89,0x38,0x0f,0x18,0xc0,0xc3,0x00,0xfa,
+
+/* wctype: page pair U+0fe00 [words 107-110] */
+0x59,0x2d,0x00,0x80,0x2f,0xf7,0x79,0x31,0xfc,0x60,0x31,0x7d,0x88,0x00,0x80,0x5d,
+
+/* wctype: page pair U+10200 [words 111-113] */
+0x00,0x00,0x00,0x90,0x63,0x03,0xf0,0x30,0x40,0x3b,0x00,0x00,
+
+/* wctype: page pair U+10800 [words 114-117] */
+0x00,0xcc,0x30,0x80,0x80,0x80,0x80,0xf0,0x2f,0xc0,0x7f,0xc0,0x38,0x0b,0x00,0x00,
+
+/* wctype: page pair U+10a00 [words 118-122] */
+0x80,0x8a,0x08,0xe2,0x7f,0x81,0x81,0xc0,0xc0,0x02,0x83,0x7f,0x1f,0x8d,0x40,0x30,
+0x80,0xe2,0x00,0x00,
+
+/* wctype: page pair U+10c00 [words 123-126] */
+0x00,0x00,0x00,0x80,0xf8,0x08,0x04,0x04,0x31,0x31,0x20,0x20,0xcc,0x22,0x22,0x03,
+
+/* wctype: page pair U+10e00 [words 127-129] */
+0x00,0x90,0x00,0x00,0x1f,0x63,0x81,0x0f,0x55,0xd5,0x55,0x29,
+
+/* wctype: page pair U+11000 [words 130-133] */
+0x00,0xbe,0x83,0x02,0x85,0xfc,0x3f,0x31,0x03,0xfc,0x87,0x0f,0x00,0x00,0x04,0x35,
+
+/* wctype: page pair U+11200 [words 134-137] */
+0x80,0x00,0x20,0x20,0x8b,0x04,0x0c,0x80,0x75,0x08,0x73,0xf0,0x20,0x08,0xc0,0x0a,
+
+/* wctype: page pair U+11400 [words 138-142] */
+0x00,0x0a,0x00,0x03,0x8f,0x91,0x4c,0x60,0x7b,0x3f,0xf0,0x01,0xf0,0x01,0xf8,0xfe,
+0xbb,0x3b,0x00,0xa0,
+
+/* wctype: page pair U+11600 [words 143-146] */
+0x80,0x23,0x80,0x00,0x31,0x0e,0x3b,0x93,0xa7,0xa5,0xa3,0xa1,0x56,0x95,0x65,0x56,
+
+/* wctype: page pair U+11800 [words 147-149] */
+0x80,0x00,0x00,0xe0,0x1c,0xf8,0x07,0x31,0x00,0x20,0x00,0x00,
+
+/* wctype: page pair U+11a00 [words 150-153] */
+0x80,0x02,0x38,0x00,0x95,0x47,0xbe,0x07,0x00,0x00,0xf8,0x10,0xa0,0x00,0x00,0x00,
+
+/* wctype: page pair U+11c00 [words 154-157] */
+0x80,0xeb,0x00,0x00,0x31,0x3e,0xf8,0x3b,0x03,0x00,0x07,0x34,0x00,0x03,0x08,0x00,
+
+/* wctype: page pair U+11e00 [words 158-160] */
+0x00,0x00,0x00,0x80,0x97,0x00,0x81,0x9d,0x65,0x55,0x55,0x25,
+
+/* wctype: page pair U+12400 [words 161-163] */
+0x00,0xc0,0x00,0x00,0x1f,0x00,0x81,0x7f,0x55,0x9d,0x00,0x00,
+
+/* wctype: page pair U+13400 [words 164-166] */
+0x80,0x00,0x00,0x00,0x81,0x00,0x31,0x7f,0xc0,0x02,0x00,0x00,
+
+/* wctype: page pair U+16a00 [words 167-169] */
+0x00,0x20,0x00,0xc0,0x80,0x3f,0x80,0x9f,0x00,0x0a,0x00,0x00,
+
+/* wctype: page pair U+16e00 [words 170-172] */
+0x00,0x00,0x09,0x00,0x1b,0x00,0x75,0xfe,0x97,0x00,0x00,0x00,
+
+/* wctype: page pair U+1bc00 [words 173-176] */
+0x00,0x00,0x38,0x00,0x9b,0x0f,0xc0,0x3b,0xf8,0x38,0xf8,0x3b,0x16,0x22,0x26,0x70,
+
+/* wctype: page pair U+1d000 [words 177-180] */
+0x55,0x55,0x55,0xd5,0x3f,0x00,0x00,0xfc,0x3c,0xfe,0xfe,0xfc,0x80,0x88,0x28,0x00,
+
+/* wctype: page pair U+1d200 [words 181-183] */
+0x55,0x03,0x00,0xd0,0x3f,0x0f,0x00,0x05,0x00,0x00,0x00,0x30,
+
+/* wctype: page pair U+1d600 [words 184-186] */
+0x00,0x00,0x00,0x8b,0x02,0x10,0x10,0x99,0x00,0x00,0x00,0x95,
+
+/* wctype: page pair U+1da00 [words 187-189] */
+0x55,0x55,0x3a,0x00,0x63,0xf0,0xfe,0x04,0x00,0x00,0x00,0x30,
+
+/* wctype: page pair U+1e200 [words 190-191] */
+0x00,0x00,0x00,0xa0,0xe0,0x31,0x00,0x00,
+
+/* wctype: page pair U+1e800 [words 192-193] */
+0x00,0x00,0x00,0x0f,0x80,0x7f,0x00,0x00,
+
+/* wctype: page pair U+1ec00 [words 194-195] */
+0x00,0xc0,0xd5,0x00,0xfe,0x1f,0x00,0x00,
+
+/* wctype: page pair U+1ee00 [words 196-197] */
+0x00,0x00,0x00,0xc0,0x03,0x00,0x00,0x00,
+
+/* wctype: page pair U+1f000 [words 198-200] */
+0x65,0x55,0xed,0xdf,0x63,0x0f,0x1f,0xfe,0xfe,0xfe,0x3f,0x00,
+
+/* wctype: page pair U+1f200 [words 201-203] */
+0x97,0x3e,0x00,0x00,0x07,0x63,0x81,0x03,0x3f,0x00,0x00,0x00,
+
+/* wctype: page pair U+1f600 [words 204-205] */
+0x55,0x55,0x55,0xad,0x3f,0x3b,0x1b,0x00,
+
+/* wctype: page pair U+1f800 [words 206-208] */
+0x56,0x5a,0x26,0x00,0x63,0x47,0x2f,0x47,0x75,0x00,0x00,0x00,
+
+/* wctype: page pair U+1fa00 [words 209-212] */
+0x55,0xad,0x0f,0x00,0x0f,0x75,0xa9,0x07,0x3f,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
diff --git a/src/ctype/punct.h b/src/ctype/punct.h
deleted file mode 100644
index 6792947..0000000
--- a/src/ctype/punct.h
+++ /dev/null
@@ -1,141 +0,0 @@
-18,16,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,16,16,34,35,16,36,37,38,39,
-40,41,42,43,16,44,45,46,17,17,47,17,17,17,17,17,17,48,49,50,51,52,53,54,55,17,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,56,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,57,16,58,59,60,61,62,63,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,64,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,65,16,16,66,16,67,68,
-69,16,70,71,72,16,73,16,16,74,75,76,77,78,16,79,80,81,82,83,84,85,86,87,88,89,
-90,91,16,92,93,94,95,16,16,16,16,96,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,97,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,98,99,16,16,100,101,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,16,16,16,16,16,102,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,
-16,16,16,103,104,105,106,16,16,107,108,17,17,109,16,16,16,16,16,16,110,111,16,
-16,16,16,16,112,113,16,16,114,115,116,16,117,118,119,17,17,17,120,121,122,123,
-124,16,16,16,16,
-16,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,254,255,0,252,1,0,0,248,1,
-0,0,120,0,0,0,0,255,251,223,251,0,0,128,0,0,0,128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,60,0,252,255,224,175,255,255,255,255,255,255,255,255,
-255,255,223,255,255,255,255,255,32,64,176,0,0,0,0,0,0,0,0,0,0,0,0,0,64,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,252,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,252,0,0,0,0,0,230,254,255,255,255,0,64,73,0,0,0,0,0,24,0,255,255,0,216,
-0,0,0,0,0,0,0,1,0,60,0,0,0,0,0,0,0,0,0,0,0,0,16,224,1,30,0,
-96,255,191,0,0,0,0,0,0,255,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,248,207,
-227,0,0,0,3,0,32,255,127,0,0,0,78,0,0,0,0,0,0,0,0,0,0,0,0,0,0,8,0,7,252,0,0,0,
-0,0,0,0,0,0,16,0,32,30,0,48,0,1,0,0,0,0,0,0,0,0,16,0,32,0,0,0,0,252,111,0,0,0,
-0,0,0,0,16,0,32,0,0,0,0,64,0,0,0,0,0,0,0,0,16,0,32,0,0,0,0,3,224,0,0,0,0,0,0,
-0,16,0,32,0,0,0,0,253,0,0,0,0,0,0,0,0,0,0,32,0,0,0,0,255,7,16,0,0,0,0,0,0,0,0,
-32,0,0,0,0,128,255,16,0,0,0,0,0,0,16,0,32,0,0,0,0,0,0,0,0,0,0,0,0,0,24,0,160,
-0,127,0,0,255,3,0,0,0,0,0,0,0,0,0,4,0,0,0,0,16,0,0,0,0,0,0,128,0,128,192,223,
-0,12,0,0,0,0,0,0,0,0,0,0,0,4,0,31,0,0,0,0,0,
-0,254,255,255,255,0,252,255,255,0,0,0,0,0,0,0,0,252,0,0,0,0,0,0,192,255,223,
-255,7,0,0,0,0,0,0,0,0,0,0,128,6,0,252,0,0,0,0,0,0,0,0,0,192,0,0,0,0,0,0,0,0,0,
-0,0,8,0,0,0,0,0,0,0,0,0,0,0,224,255,255,255,31,0,0,255,3,0,0,0,0,0,0,0,0,0,0,
-0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,96,0,0,1,0,0,24,0,0,0,0,0,0,0,0,0,56,0,0,0,0,16,0,0,0,112,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,48,0,0,254,127,47,0,0,255,3,255,127,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14,49,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,196,255,255,255,
-255,0,0,0,192,0,0,0,0,0,0,0,0,1,0,224,159,0,0,0,0,127,63,255,127,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,16,0,16,0,0,252,255,255,255,31,0,0,0,0,0,12,0,0,0,0,0,0,64,0,
-12,240,0,0,0,0,0,0,128,248,0,0,0,0,0,0,0,192,0,0,0,0,0,0,0,0,255,0,255,255,
-255,33,144,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,
-127,0,224,251,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,160,3,224,0,224,0,
-224,0,96,128,248,255,255,255,252,255,255,255,255,255,127,223,255,241,127,255,
-127,0,0,255,255,255,255,0,0,255,255,255,255,1,0,123,3,208,193,175,66,0,12,31,
-188,255,255,0,0,0,0,0,14,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,127,0,0,0,255,7,0,0,255,255,255,255,255,255,255,255,255,
-255,63,0,0,0,0,0,0,252,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,207,255,255,255,
-63,255,255,255,255,255,255,255,255,255,255,255,255,255,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,224,135,3,254,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,
-128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,255,127,255,255,255,255,0,
-0,0,0,0,0,255,255,255,251,255,255,255,255,255,255,255,255,255,255,15,0,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,63,0,0,0,255,15,30,255,255,255,1,252,193,224,0,0,0,0,
-0,0,0,0,0,0,0,30,1,0,0,0,0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-255,255,0,0,0,0,255,255,255,255,15,0,0,0,255,255,255,127,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,
-255,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,255,255,
-255,255,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,255,255,127,0,0,0,
-0,0,0,192,0,224,0,0,0,0,0,0,0,0,0,0,0,128,15,112,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-255,0,255,255,127,0,3,0,0,0,0,0,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-64,0,0,0,0,15,255,3,0,0,0,0,0,0,240,0,0,0,0,0,0,0,0,0,16,192,0,0,255,255,3,23,
-0,0,0,0,0,248,0,0,0,0,8,128,0,0,0,0,0,0,0,0,0,0,8,0,255,63,0,192,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,240,0,0,128,3,0,0,0,0,0,0,0,128,2,0,0,192,0,0,67,0,0,0,0,0,
-0,0,0,0,0,0,0,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,56,0,
-0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,0,0,0,0,0,2,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,252,255,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,192,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,48,255,255,255,3,255,255,255,255,255,255,247,
-255,127,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,254,255,0,252,1,0,0,248,1,0,
-0,248,63,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,127,127,0,48,135,255,255,255,255,255,
-143,255,0,0,0,0,0,0,224,255,255,127,255,15,1,0,0,0,0,0,255,255,255,255,255,63,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,
-15,0,0,0,0,15,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-128,255,0,0,128,255,0,0,0,0,128,255,0,0,0,0,0,0,0,0,0,248,0,0,192,143,0,0,0,
-128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,48,255,255,252,255,255,255,255,255,0,0,0,0,
-0,0,0,135,255,1,255,1,0,0,0,224,0,0,0,224,0,0,0,0,0,1,0,0,96,248,127,0,0,0,0,
-0,0,0,0,254,0,0,0,255,0,0,0,255,0,0,0,30,0,254,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,252,0,0,0,0,0,0,0,0,0,0,0,
-0,255,255,255,127,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,224,127,0,0,0,192,255,255,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,192,63,252,255,63,0,0,128,3,0,0,0,0,0,0,254,3,32,0,0,0,0,0,0,0,
-0,0,0,0,0,24,0,15,0,0,0,0,0,56,0,0,0,0,0,0,0,0,0,225,63,0,232,254,255,31,0,0,
-0,0,0,0,0,96,63,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,6,0,0,0,0,0,0,0,0,0,
-24,0,32,0,0,192,31,31,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,68,
-248,0,104,0,0,0,0,0,0,0,0,0,0,0,0,76,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,128,255,255,255,0,0,0,0,0,0,0,0,0,0,0,0,128,14,0,0,0,255,
-31,0,0,0,0,0,0,0,0,192,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,8,0,252,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,14,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,252,7,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,24,128,255,0,0,0,0,0,
-0,0,0,0,0,223,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,62,0,0,252,255,31,3,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,52,0,0,0,0,0,0,0,0,0,128,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,128,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,255,255,255,
-255,3,
-128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,31,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,255,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,192,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,63,0,0,0,0,0,0,0,255,255,48,0,0,248,
-3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,
-255,255,7,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,176,15,0,0,0,0,0,0,
-0,0,0,0,0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,63,
-0,255,255,255,255,127,254,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,255,1,0,0,255,255,255,255,255,255,255,255,
-63,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,255,255,15,0,255,255,255,255,255,255,
-255,255,255,255,127,0,255,255,255,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,8,0,0,0,8,0,0,32,0,0,0,32,0,0,128,
-0,0,0,128,0,0,0,2,0,0,0,2,0,0,8,0,0,0,0,0,0,0,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,255,255,15,0,248,254,255,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,127,0,0,128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,240,0,
-128,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,128,255,127,0,0,0,0,0,0,0,
-0,0,0,0,0,0,112,7,0,192,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,254,255,255,255,255,255,255,255,31,0,0,0,0,0,0,0,0,0,254,255,
-255,255,255,255,255,63,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,0,255,255,255,255,255,
-15,255,255,255,255,255,255,255,255,255,255,255,255,15,0,255,127,254,255,254,
-255,254,255,255,255,63,0,255,31,255,255,255,255,0,0,0,252,0,0,0,28,0,0,0,252,
-255,255,255,31,0,0,0,0,0,0,192,255,255,255,7,0,255,255,255,255,255,15,255,1,3,
-0,63,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
-0,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,255,63,0,255,31,255,7,255,255,255,255,255,255,255,255,
-255,255,255,255,255,255,15,0,255,255,255,255,255,255,255,255,255,255,255,1,
-255,15,0,0,255,15,255,255,255,255,255,255,255,0,255,3,255,255,255,255,255,0,
-255,255,255,63,0,0,0,0,0,0,0,0,0,0,255,239,255,255,255,255,255,255,255,255,
-255,255,255,255,123,252,255,255,255,255,231,199,255,255,255,231,255,255,255,
-255,255,255,255,255,255,255,255,255,255,255,255,255,15,0,255,63,15,7,7,0,63,0,
-0,0,0,0,0,0,0,0,0,0,0,0,
--
2.51.2
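The popc computation in the decoder above can be checked on its own; the same three lines also appear in iswalpha. The sketch below is not part of the patch. It wraps the shift/mask/multiply sequence in a hypothetical helper, high_bits_below(), and compares it with a plain loop. Like the patch, it assumes the second level word holds sixteen 2-bit lane fields with lane 0 in the lowest bits. It uses uint32_t to make the 32-bit assumption explicit. The result is the number of lower lanes whose field has its high bit set, which, per the commit message, is what indexes the third level extension bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Number of lanes below `lane` (0..15) whose 2-bit field in `huff`
 * has its high bit set, using the same shift/mask/multiply steps as
 * the popc computation in the patch (hypothetical helper name). */
static unsigned high_bits_below(uint32_t huff, unsigned lane)
{
	uint32_t p;

	assert(lane < 16);
	/* Shift so the lane's low bit lands in bit 31: the lane's own high
	 * bit and all higher lanes fall off the top, and the high bits of
	 * the lower lanes end up at even bit positions. */
	p = huff << (31 - 2 * lane);
	/* Fold bit 4i+2 onto bit 4i, giving a count of 0..2 per nibble;
	 * bit 31 (the lane's low bit) is masked out here. */
	p = (p & 0x11111111u) + ((p & 0x44444444u) >> 2);
	/* Multiplying by 0x11111111 sums all eight nibbles into the top
	 * nibble; the total is at most 15, so no nibble ever carries. */
	return (uint32_t)(p * 0x11111111u) >> 28;
}

/* Reference: count the high bits of lanes 0..lane-1 one at a time. */
static unsigned high_bits_below_naive(uint32_t huff, unsigned lane)
{
	unsigned n = 0, k;

	for (k = 0; k < lane; k++)
		n += huff >> (2 * k + 1) & 1;
	return n;
}
```

A cheap way to gain confidence in the trick is to compare both versions for every lane of a handful of words (all zeros, 0x55555555, 0xaaaaaaaa, 0xffffffff and a few arbitrary values). The single multiply is only safe because at most 15 lanes can precede the current one, so the running nibble sums never overflow.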
* Re: [musl][PATCH v1] wctype: reduce text size of iswalpha & iswpunct by 53%
@ 2026-01-19 19:54 ` Szabolcs Nagy
From: Szabolcs Nagy @ 2026-01-19 19:54 UTC (permalink / raw)
To: Xan Phung; +Cc: musl
* Xan Phung <xan.phung@gmail.com> [2026-01-20 01:35:26 +1100]:
> Currently iswalpha and iswpunct have total text data size of over
> 8kb. A more efficient encoding has reduced the total size to 3.75kb
> (2kb and 1.8kb respectively), a 53% reduction.
>
> The new encoding remains a (mostly) branchless table lookup, but now
> requires 3 memory accesses instead of 2. It remains optimized for
> random access decoding. The top level remains much the same,
> providing 8 bit offsets into codepage units (256 codepoint
> granularity). The second level data uses fixed sizes, of one 32 bit
> word per codepage (where each 2 bit pair in word identifies a block
> of 16 codepoints as all 0, all 1, or mixed). The third level is a
> variable length series of extension bytes, indexed by the popcount
> of set high bits within the second level's 32 bit word. This
> popcount is calculated with nearly same latency as a 32 bit multiply
> (so it is comparable with the indexing speed of accessing a 2D array
> of non power of 2 size).
nice.
>
> Results have been tested against first 0x20000 codepoints, and match
> that returned by the pre-existing musl implementation.
did you run benchmarks too? given that the new code
uses more operations it would be useful to know how
it performs.
note that decimal '255' is shorter than '0xff' so decimal
means a bit less data in the source code.
> --- a/src/ctype/iswalpha.c
> +++ b/src/ctype/iswalpha.c
> @@ -1,16 +1,54 @@
> #include <wctype.h>
>
> -static const unsigned char table[] = {
> -#include "alpha.h"
> +#define PAGE_SH 8
> +#define PAGE_MAX (1u << PAGE_SH)
> +#define PAGES (0x20000 / PAGE_MAX)
> +#define PAGEH (0x200/8 + PAGES) /* 0x200 direct mapped codepts */
> +
> +const static unsigned char table[PAGEH + 227*sizeof(unsigned)] = {
> +#include "iswalpha_table.h"
> };
>
> -int iswalpha(wint_t wc)
> -{
> - if (wc<0x20000U)
> - return (table[table[wc>>8]*32+((wc&255)>>3)]>>(wc&7))&1;
> - if (wc<0x2fffeU)
> - return 1;
> - return 0;
> +const static unsigned short dict[119] = {
> +#include "iswalpha_dict.h"
> +};
> +
> +int iswalpha(wint_t wc) {
> + unsigned *huffm = (unsigned *)(table + PAGEH), target;
> + unsigned page, shfr, lane, base, rev;
> + unsigned huff, type, popc, fast;
> + signed char ext;
> +
> + /* Uncommon codepoints, skipped by branch predictors */
> + if ((unsigned)wc >= 0x20000)
> + return (unsigned) wc < 0x2fffe;
> + if ((unsigned)wc-0xa00 < 0x200)
> + return table[PAGES + ((wc-0xa00)>>3)] >> (wc & 7) & 1;
> +
> + /* Three level lookup, final level index = popc^rev_direction */
> + target = wc & (PAGE_MAX-1);
> + page = wc >> PAGE_SH;
> + shfr = target & 15;
> + lane = target >> 4;
> + base = table[page];
> + huff = huffm[base];
> + base+= (rev = -(page & 1)) + 1;
> + type = (huff >> (2 * lane)) & 3;
> + popc = (huff << (31 - 2 * lane));
> + popc = (popc & 0x11111111) + ((popc & 0x44444444) >> 2);
> + popc = (popc * 0x11111111) >> 28;
> + ext = (table + PAGEH + base*4)[(int)(popc^rev)];
> +
> + /* Fast (1st) path precalcs shfr before ext loaded from mem */
> + /* Dictionary lookup slow (2nd) path is only 1% of codepoints */
> + fast = (type != 2 | ext & 1);
> + if (fast) {
> + shfr = (shfr + 5) & -(type >= 2);
> + shfr = (shfr - (6 & -(type == 2)) + type);
> + return (ext << 8 | 0xfe) >> shfr & 1;
> + } else {
> + return dict[ext >> 1 & 0x7f] >> shfr & 1;
> + }
> }
* Re: [musl][PATCH v1] wctype: reduce text size of iswalpha & iswpunct by 53%
@ 2026-01-19 21:25 ` Xan Phung
From: Xan Phung @ 2026-01-19 21:25 UTC (permalink / raw)
To: Xan Phung, Openwall musl
On Tue, 20 Jan 2026, 3:55 am Szabolcs Nagy, <nsz@port70.net> wrote:
> * Xan Phung <xan.phung@gmail.com> [2026-01-20 01:35:26 +1100]:
> > Currently iswalpha and iswpunct have total text data size of over
> > 8kb. A more efficient encoding has reduced the total size to 3.75kb
> > (2kb and 1.8kb respectively), a 53% reduction.
> >
> > The new encoding remains a (mostly) branchless table lookup, but now
> > requires 3 memory accesses instead of 2. It remains optimized for
> > random access decoding. The top level remains much the same,
> > providing 8 bit offsets into codepage units (256 codepoint
> > granularity). The second level data uses fixed sizes, of one 32 bit
> > word per codepage (where each 2 bit pair in word identifies a block
> > of 16 codepoints as all 0, all 1, or mixed). The third level is a
> > variable length series of extension bytes, indexed by the popcount
> > of set high bits within the second level's 32 bit word. This
> > popcount is calculated with nearly same latency as a 32 bit multiply
> > (so it is comparable with the indexing speed of accessing a 2D array
> > of non power of 2 size).
>
> nice.
>
> >
> > Results have been tested against first 0x20000 codepoints, and match
> > that returned by the pre-existing musl implementation.
>
> did you run benchmarks too? given that the new code
> uses more operations it would be useful to know how
> it performs.
The new code uses more operations (around 10 extra CPU cycles compared to
the old code), but on the other hand it has a much better cache footprint.
For example, a cache line holds twice as many codepoints in the new encoding
as in the old one, which means up to 50% fewer cache misses (each L1 cache
miss costs 20-200 cycles).
I can add benchmarking output to the tool I use to generate and verify the
data file, but before I do that, I want to get input on:
(i) The preferred trade-off between performance and data size. For example,
extending the one-level direct bitmap (currently used in iswalpha only for
the hard-to-compress codepoints in the 0xa00 to 0xc00 range) to cover the
whole range 0x0 to 0x1000 adds around 300 bytes to iswalpha's data, but
saves up to 2 levels of memory access for these codepoints (which include
the most commonly used ones), while still giving a 40-45% size reduction
compared to the old code.
(ii) Whether a 64 bit word version would be accepted (currently I use a
maximum data word size of 32 bits).
Using 64 bit words at the 2nd indexing level would compress better because
it allows larger 512 codepoint pages, so the top level index would be only
256 bytes instead of 512 bytes; the net saving is up to 120-150 bytes after
some losses due to alignment wastage etc.
A 64b word size also enables a potential performance optimisation, as the
3rd level data can be loaded in parallel with the 2nd level data (effectively
collapsing the table hierarchy back to 2 levels), with the popcount then
used as the shift amount for a 64b barrel shift rather than as a byte index
for a 3rd level memory load.
Hashing (which could reduce the lookup to as little as one load) is also
viable at a 64b word size, but costs approx 10% more data.
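For point (i), a one-level direct bitmap costs one bit per codepoint, i.e. (hi - lo) / 8 bytes. The current 0xa00 to 0xc00 window is 0x200 / 8 = 64 bytes, and a 0x0 to 0x1000 window would be 0x1000 / 8 = 512 bytes. The 512 bytes are gross; the savings from pages the three-level table would no longer need to store are not subtracted. A lookup costs a single load, as in the `(unsigned)wc-0xa00 < 0x200` branch of the patch. The sketch below generalises that branch. direct_in_range(), direct_bit() and direct_bitmap_bytes() are hypothetical names. The layout (bit i & 7 of byte i >> 3 for codepoint lo + i) is assumed to match the patch's.

```c
#include <assert.h>

/* True if wc lies in [lo, lo + n); the unsigned subtraction wraps for
 * wc < lo, so one comparison covers both bounds. */
static int direct_in_range(unsigned wc, unsigned lo, unsigned n)
{
	return wc - lo < n;
}

/* One-load lookup: bit (i & 7) of byte (i >> 3), with i = wc - lo.
 * The caller is expected to have checked direct_in_range() first. */
static int direct_bit(const unsigned char *bitmap, unsigned lo,
                      unsigned n, unsigned wc)
{
	unsigned i = wc - lo;

	assert(i < n);
	return bitmap[i >> 3] >> (i & 7) & 1;
}

/* Bytes needed for a direct bitmap covering n codepoints. */
static unsigned direct_bitmap_bytes(unsigned n)
{
	return (n + 7) / 8;
}
```

The design choice is the usual one: the bitmap spends memory on every codepoint in the window, compressible or not, in exchange for removing the dependent loads of the three-level path.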
>
> note that decimal '255' is shorter than '0xff' so decimal
> means a bit less data in the source code.
>
Ha, ha, ha :) Not sure if you are joking, but by 'text size' I meant the
compiled code and constant data size reported by the 'size' tool, not the
source code size.
But if you are saying musl coding standards prefer decimal because of its
more compact source code, then yes, I can generate decimal source files
instead of hexadecimal.
>
>
> > --- a/src/ctype/iswalpha.c
> > +++ b/src/ctype/iswalpha.c
> > @@ -1,16 +1,54 @@
> > #include <wctype.h>
> >
> > -static const unsigned char table[] = {
> > -#include "alpha.h"
> > +#define PAGE_SH 8
> > +#define PAGE_MAX (1u << PAGE_SH)
> > +#define PAGES (0x20000 / PAGE_MAX)
> > +#define PAGEH (0x200/8 + PAGES) /* 0x200 direct mapped codepts */
> > +
> > +const static unsigned char table[PAGEH + 227*sizeof(unsigned)] = {
> > +#include "iswalpha_table.h"
> > };
> >
> > -int iswalpha(wint_t wc)
> > -{
> > - if (wc<0x20000U)
> > - return (table[table[wc>>8]*32+((wc&255)>>3)]>>(wc&7))&1;
> > - if (wc<0x2fffeU)
> > - return 1;
> > - return 0;
> > +const static unsigned short dict[119] = {
> > +#include "iswalpha_dict.h"
> > +};
> > +
> > +int iswalpha(wint_t wc) {
> > + unsigned *huffm = (unsigned *)(table + PAGEH), target;
> > + unsigned page, shfr, lane, base, rev;
> > + unsigned huff, type, popc, fast;
> > + signed char ext;
> > +
> > + /* Uncommon codepoints, skipped by branch predictors */
> > + if ((unsigned)wc >= 0x20000)
> > + return (unsigned) wc < 0x2fffe;
> > + if ((unsigned)wc-0xa00 < 0x200)
> > + return table[PAGES + ((wc-0xa00)>>3)] >> (wc & 7) & 1;
> > +
> > + /* Three level lookup, final level index = popc^rev_direction */
> > + target = wc & (PAGE_MAX-1);
> > + page = wc >> PAGE_SH;
> > + shfr = target & 15;
> > + lane = target >> 4;
> > + base = table[page];
> > + huff = huffm[base];
> > + base+= (rev = -(page & 1)) + 1;
> > + type = (huff >> (2 * lane)) & 3;
> > + popc = (huff << (31 - 2 * lane));
> > + popc = (popc & 0x11111111) + ((popc & 0x44444444) >> 2);
> > + popc = (popc * 0x11111111) >> 28;
> > + ext = (table + PAGEH + base*4)[(int)(popc^rev)];
> > +
> > + /* Fast (1st) path precalcs shfr before ext loaded from mem */
> > + /* Dictionary lookup slow (2nd) path is only 1% of codepoints */
> > + fast = (type != 2 | ext & 1);
> > + if (fast) {
> > + shfr = (shfr + 5) & -(type >= 2);
> > + shfr = (shfr - (6 & -(type == 2)) + type);
> > + return (ext << 8 | 0xfe) >> shfr & 1;
> > + } else {
> > + return dict[ext >> 1 & 0x7f] >> shfr & 1;
> > + }
> > }
>
* Re: [musl][PATCH v1] wctype: reduce text size of iswalpha & iswpunct by 53%
@ 2026-01-20 12:36 ` Szabolcs Nagy
From: Szabolcs Nagy @ 2026-01-20 12:36 UTC (permalink / raw)
To: Xan Phung; +Cc: Openwall musl
* Xan Phung <xan.phung@gmail.com> [2026-01-20 05:25:16 +0800]:
> On Tue, 20 Jan 2026, 3:55 am Szabolcs Nagy, <nsz@port70.net> wrote:
> > note that decimal '255' is shorter than '0xff' so decimal
> > means a bit less data in the source code.
> >
>
>
> Ha, ha, ha :) Not sure if you are joking, but when I was referring to the
> 'text size' I mean the compiled code and constant data size reported by the
> 'size' tool, not the source code size.
>
> But if you are saying musl coding standards prefer decimal due to it's more
> compact source code data size, then yes I can make decimal source files
> instead of hexadecimal.
this is a minor comment,
but the source code size matters as musl is distributed in
source form too. decimal data likely compresses better with deflate,
so this affects git history size too. obviously for actual code
readability is more important, but for data there is no
such constraint, so we can choose: why use hex?
more importantly this code looks endian dependent:
> > > +const static unsigned char table[PAGEH + 227*sizeof(unsigned)] = {
> > > +#include "iswalpha_table.h"
> > > };
byte order matters because the bytes within each 4 byte chunk of the table data are not all equal:
...
+0xec,0xc7,0x3d,0xd6,0x18,0xc7,0xff,0xc3,0xc7,0x1d,0x81,0x00,0xc0,0xff,0x00,0x00,
+0x00,0x00,0x00,0x00,0x55,0x55,0x55,0x55,
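To make the byte-order point concrete, here is a small sketch (not code from the patch) using the first four bytes of the quoted table line. Reading them as a native `unsigned` gives 0xd63dc7ec on a little-endian host but 0xecc73dd6 on a big-endian one, whereas assembling the value with shifts pins down one interpretation on every host. The helper names `word_native` and `word_le` are hypothetical, and a 32-bit `unsigned` is assumed.

```c
#include <string.h>

/* First four bytes of the quoted table data. */
static const unsigned char bytes[4] = { 0xec, 0xc7, 0x3d, 0xd6 };

/* Endian-dependent: the result follows the host's byte order. */
static unsigned word_native(const unsigned char *p)
{
	unsigned w;
	memcpy(&w, p, sizeof w);
	return w;
}

/* Endian-independent: the bytes are defined to be least-significant first. */
static unsigned word_le(const unsigned char *p)
{
	return p[0] | p[1] << 8 | p[2] << 16 | (unsigned)p[3] << 24;
}
```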
> > > +int iswalpha(wint_t wc) {
> > > + unsigned *huffm = (unsigned *)(table + PAGEH), target;
deref of huffm is technically an aliasing violation and thus ub.
> > > + huff = huffm[base];
> > > + base+= (rev = -(page & 1)) + 1;
> > > + type = (huff >> (2 * lane)) & 3;
> > > + popc = (huff << (31 - 2 * lane));
in such cases i think the cleanest solution is
static const struct {
unsigned char tab1[PAGEH];
unsigned tab2[227];
} data = {...};
the struct helps on targets where computing the
address of a global takes multiple instructions.
(otherwise you could just use two separate tabs)
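A minimal sketch of the struct layout suggested above, with toy sizes and values (the sizes, the `level2` helper and the data are illustrative only). Because `tab2` is declared as `unsigned`, a table generator would emit each level-2 entry as one 32-bit integer constant, so the compiler stores it in the target's own byte order and every read goes through an lvalue of the right type. This addresses both the endianness and the aliasing concerns, while both arrays stay reachable from the single address of `data`.

```c
#define PAGEH 4   /* toy size, hypothetical */
#define NWORDS 2  /* toy size, hypothetical */

static const struct {
	unsigned char tab1[PAGEH];  /* level-1 page offsets, one byte each */
	unsigned tab2[NWORDS];      /* level-2 words, stored as integers */
} data = {
	{ 0, 1, 2, 0 },
	{ 0x55555555, 0xd63dc7ec },
};

/* A plain unsigned load: no cast, no byte-order assumption. */
static unsigned level2(unsigned i)
{
	return data.tab2[i];
}
```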
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [musl][PATCH v1] wctype: reduce text size of iswalpha & iswpunct by 53%
2026-01-20 12:36 ` Szabolcs Nagy
@ 2026-01-20 13:40 ` Xan Phung
0 siblings, 0 replies; 9+ messages in thread
From: Xan Phung @ 2026-01-20 13:40 UTC (permalink / raw)
To: Xan Phung, Openwall musl
On Tue, 20 Jan 2026 at 23:36, Szabolcs Nagy <nsz@port70.net> wrote:
> * Xan Phung <xan.phung@gmail.com> [2026-01-20 05:25:16 +0800]:
> > On Tue, 20 Jan 2026, 3:55 am Szabolcs Nagy, <nsz@port70.net> wrote:
> > > note that decimal '255' is shorter than '0xff' so decimal
> > > means a bit less data in the source code.
> > >
> >
> >
> > Ha, ha, ha :) Not sure if you are joking, but when I was referring to the
> > 'text size' I mean the compiled code and constant data size reported by the
> > 'size' tool, not the source code size.
> >
> > But if you are saying musl coding standards prefer decimal due to its more
> > compact source code data size, then yes I can make decimal source files
> > instead of hexadecimal.
>
> this is a minor comment,
> but the source code size matters too as musl is distributed in
> source form too. decimal data likely deflate compresses better
> so this affects git history size too. obviously for actual code
> the readability is more important, but for data there is no
> such constraint thus we can choose, so why use hex?
>
>
Thanks for the clarification.
I am happy to do whichever number base best fits with musl's conventions.
> more importantly this code looks endian dependent:
>
> > > > +const static unsigned char table[PAGEH + 227*sizeof(unsigned)] = {
> > > > +#include "iswalpha_table.h"
> > > > };
>
> byte order matters as table data is not equal per 4 byte chunks:
>
>
Yes, good point.
For the bytes that need to be grouped as words, I can surround them with a
macro.
The header will look something like:
WORD(0x55,0x25,0x55,0x55),0x46,0x01,0x3f,0x7f,0xd5,0x2d,0x00,0x00,
The macro WORD will permute the bytes as needed for the machine's
endianness.
(Or does musl already have a macro that does this?)
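One way the proposed WORD macro could look, as a sketch only: it assumes the table generator emits each word's bytes least-significant first, and it relies on the GCC/Clang predefined `__BYTE_ORDER__` macros rather than any musl-internal header (whether musl already has an equivalent is not checked here). It fixes byte order only; the word must still be read without the `(unsigned *)` cast, here via `memcpy`. The names `table` and `read_word` are hypothetical.

```c
#include <string.h>

/* Hypothetical WORD macro: bytes are written least-significant first and
   reversed on big-endian targets, so the stored word has the same value
   everywhere. Assumes the GCC/Clang predefined __BYTE_ORDER__ macros. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define WORD(a, b, c, d) d, c, b, a
#else
#define WORD(a, b, c, d) a, b, c, d
#endif

static const unsigned char table[] = {
	0x46, 0x01,                    /* level-1 bytes (toy data) */
	WORD(0x55, 0x25, 0x55, 0x55),  /* one level-2 word */
};

static unsigned read_word(const unsigned char *p)
{
	unsigned w;
	memcpy(&w, p, sizeof w);  /* defined behaviour, unlike *(unsigned *)p */
	return w;
}
```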
...
>
> +0xec,0xc7,0x3d,0xd6,0x18,0xc7,0xff,0xc3,0xc7,0x1d,0x81,0x00,0xc0,0xff,0x00,0x00,
> +0x00,0x00,0x00,0x00,0x55,0x55,0x55,0x55,
>
> > > > +int iswalpha(wint_t wc) {
> > > > + unsigned *huffm = (unsigned *)(table + PAGEH), target;
>
> deref of huffm is technically an aliasing violation and thus ub.
>
> > > > + huff = huffm[base];
> > > > + base+= (rev = -(page & 1)) + 1;
> > > > + type = (huff >> (2 * lane)) & 3;
> > > > + popc = (huff << (31 - 2 * lane));
>
> in such cases i think the cleanest solution is
>
> static const struct {
> unsigned char tab1[PAGEH];
> unsigned tab2[227];
> } data = {...};
>
> the struct helps on targets where computing the
> address of a global takes multiple instructions.
> (otherwise you could just use two separate tabs)
>
Thanks. I will take up your suggestion.
If the WORD macro for word ordering of bytes above is acceptable to you, I
will go ahead and prepare a new patch.
This new patch will also expand the iswalpha 'direct' bitmap to cover the
entire first 0x1000 codepoints (increasing data size by ~200 bytes; my 1st
patch only used it for a small 0x200 codepoint range).
For 220 of the 256 BMP code pages, performance will be *better* than the
status quo musl code, as only a single level of memory access is needed.
The remaining 36 BMP code pages are 'niche' scripts (e.g. Unified Canadian
Aboriginal Syllabics, Vai, Khmer) or technical and symbol blocks. These
pages are very branch-predictor friendly: if a text string is in Khmer,
the branch predictor will likely retrain quickly to predict that the
'direct' path is not taken.
As a preview of the performance, the critical path involves:
(i) computing the address and index into table[] (3 cycles, including a
conditional select);
(ii) the memory load (3-4 cycles);
(iii) a shift by '(wc & -direct & 7)'. The shift amount depends only on
'wc', so it can be computed before the load completes, and the shift +
mask take only 2 cycles once the data arrives from memory:
/* Direct path used in all but 36 of the 256 BMP code pages */
direct = wc < 0x1000;
page = (wc >> PAGE_SH);
bmap = (wc >> 3) + PAGES;
base = table[direct ? bmap:page];
if (base <= 1 | direct)
return base >> (wc & -direct & 7) & 1;
/* 2nd and 3rd level array access are the same as before ... */
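The single-load fast path above can be tried out on a toy table. This sketch shrinks everything (8 pages, a direct bitmap for only the first 0x100 codepoints, made-up page bytes) and returns -1 where the real code would continue to the 2nd and 3rd levels; `toy_table` and `first_level` are hypothetical names. It shows why one expression serves both cases: for direct-mapped codepoints the shift picks a bit out of the bitmap byte, while for uniform pages `wc & -direct` forces the shift to 0, so the page byte itself (0 or 1) is the answer.

```c
#define PAGE_SH 8
#define PAGES 8             /* toy: codepoints 0..0x7ff only */
#define DIRECT_LIMIT 0x100  /* toy: first 0x100 codepoints are direct-mapped */

static const unsigned char toy_table[PAGES + DIRECT_LIMIT / 8] = {
	/* level-1 page bytes (page 0 is direct-mapped, so its byte is unused) */
	[1] = 0,  /* 0x100-0x1ff: all zero */
	[2] = 1,  /* 0x200-0x2ff: all one */
	[3] = 5,  /* 0x300-0x3ff: mixed, would index level 2 */
	/* direct bitmap for 0x00-0xff: ASCII letters only */
	[PAGES + 8]  = 0xfe, [PAGES + 9]  = 0xff,
	[PAGES + 10] = 0xff, [PAGES + 11] = 0x07,
	[PAGES + 12] = 0xfe, [PAGES + 13] = 0xff,
	[PAGES + 14] = 0xff, [PAGES + 15] = 0x07,
};

/* Returns 0 or 1 when one load suffices, -1 when levels 2/3 are needed. */
static int first_level(unsigned wc)
{
	if (wc >= (PAGES << PAGE_SH))
		return 0;  /* outside the toy range */
	int direct = wc < DIRECT_LIMIT;
	unsigned page = wc >> PAGE_SH;
	unsigned bmap = (wc >> 3) + PAGES;
	unsigned base = toy_table[direct ? bmap : page];
	if ((base <= 1) | direct)
		return base >> (wc & -direct & 7) & 1;
	return -1;
}
```

For example, `first_level('A')` yields 1, `first_level(0x250)` yields 1 from the all-one page byte, and `first_level(0x350)` yields -1 because that page is mixed.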
* Re: [musl][PATCH v1] wctype: reduce text size of iswalpha & iswpunct by 53%
2026-01-19 14:35 [musl][PATCH v1] wctype: reduce text size of iswalpha & iswpunct by 53% Xan Phung
2026-01-19 19:54 ` Szabolcs Nagy
@ 2026-01-20 15:09 ` Rich Felker
2026-01-21 0:56 ` Demi Marie Obenour
1 sibling, 1 reply; 9+ messages in thread
From: Rich Felker @ 2026-01-20 15:09 UTC (permalink / raw)
To: Xan Phung; +Cc: musl
On Tue, Jan 20, 2026 at 01:35:26AM +1100, Xan Phung wrote:
> Currently iswalpha and iswpunct have total text data size of over
> 8kb. A more efficient encoding has reduced the total size to 3.75kb
> (2kb and 1.8kb respectively), a 53% reduction.
>
> The new encoding remains a (mostly) branchless table lookup, but now
> requires 3 memory accesses instead of 2. It remains optimized for
> random access decoding. The top level remains much the same,
> providing 8 bit offsets into codepage units (256 codepoint
> granularity). The second level data uses fixed sizes, of one 32 bit
> word per codepage (where each 2 bit pair in word identifies a block
> of 16 codepoints as all 0, all 1, or mixed). The third level is a
> variable length series of extension bytes, indexed by the popcount
> of set high bits within the second level's 32 bit word. This
> popcount is calculated with nearly same latency as a 32 bit multiply
> (so it is comparable with the indexing speed of accessing a 2D array
> of non power of 2 size).
>
> Results have been tested against first 0x20000 codepoints, and match
> that returned by the pre-existing musl implementation.
How are the tables generated? Do you have a patch to or replacement
for the musl-chartable-tools code that generates the existing tables?
It needs to be possible to update them easily for new Unicode
additions, and we need to be confident that the format can efficiently
represent any likely future additions.
> diff --git a/src/ctype/iswalpha.c b/src/ctype/iswalpha.c
> index 1c5485d..e3b7037 100644
> --- a/src/ctype/iswalpha.c
> +++ b/src/ctype/iswalpha.c
> @@ -1,16 +1,54 @@
> #include <wctype.h>
>
> -static const unsigned char table[] = {
> -#include "alpha.h"
> +#define PAGE_SH 8
> +#define PAGE_MAX (1u << PAGE_SH)
> +#define PAGES (0x20000 / PAGE_MAX)
> +#define PAGEH (0x200/8 + PAGES) /* 0x200 direct mapped codepts */
> +
> +const static unsigned char table[PAGEH + 227*sizeof(unsigned)] = {
> +#include "iswalpha_table.h"
> };
>
> -int iswalpha(wint_t wc)
> -{
> - if (wc<0x20000U)
> - return (table[table[wc>>8]*32+((wc&255)>>3)]>>(wc&7))&1;
> - if (wc<0x2fffeU)
> - return 1;
> - return 0;
> +const static unsigned short dict[119] = {
> +#include "iswalpha_dict.h"
> +};
> +
> +int iswalpha(wint_t wc) {
> + unsigned *huffm = (unsigned *)(table + PAGEH), target;
The aliasing violation here isn't acceptable. Is there any good way to
avoid it? I've only skimmed this submission so far; I assume the
motivation is doing the popcount across 4 bytes, but I don't yet
understand why popcount was chosen as the way of storing an index.
Rich
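To illustrate why a popcount can serve as an index, here is a simplified, hypothetical version of a single page in the layout described in the patch summary. Each of the 16 two-bit lanes of the page word classifies a 16-codepoint block as all 0, all 1, or mixed (high bit set). Mixed blocks store their data consecutively, so the position of a block's data equals the number of mixed blocks below it, i.e. the popcount of the high bits below its lane. Unlike the patch, this toy stores a plain 16-bit bitmap per mixed block instead of variable-length extension bytes and a dictionary; `struct page`, `page_lookup` and `popcount32` are illustrative names.

```c
#include <stdint.h>

#define HIGH_BITS 0xaaaaaaaau  /* high bit of each 2-bit lane */

/* Portable SWAR popcount: shifts, masks and one multiply. */
static unsigned popcount32(uint32_t x)
{
	x = x - ((x >> 1) & 0x55555555);
	x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
	x = (x + (x >> 4)) & 0x0f0f0f0f;
	return (uint32_t)(x * 0x01010101u) >> 24;
}

/* Toy page: lane types in w (0 = all 0, 1 = all 1, 2 = mixed);
   ext[] holds one 16-bit bitmap per mixed lane, in lane order. */
struct page {
	uint32_t w;
	const uint16_t *ext;
};

static int page_lookup(const struct page *p, unsigned wc)
{
	unsigned lane = (wc & 255) >> 4;  /* which 16-codepoint block */
	unsigned type = p->w >> (2 * lane) & 3;
	if (type < 2)
		return type;  /* uniform block: no extension data at all */
	/* index = number of mixed lanes below this one */
	uint32_t below = p->w & HIGH_BITS & ((1u << (2 * lane)) - 1);
	return p->ext[popcount32(below)] >> (wc & 15) & 1;
}
```

The appeal of this scheme is that uniform blocks cost nothing beyond their 2 bits, yet locating a mixed block's data still takes a fixed number of arithmetic operations rather than a search.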
* Re: [musl][PATCH v1] wctype: reduce text size of iswalpha & iswpunct by 53%
2026-01-20 15:09 ` Rich Felker
@ 2026-01-21 0:56 ` Demi Marie Obenour
2026-01-21 1:11 ` Charles Banas
2026-01-21 2:31 ` Rich Felker
0 siblings, 2 replies; 9+ messages in thread
From: Demi Marie Obenour @ 2026-01-21 0:56 UTC (permalink / raw)
To: musl, Rich Felker, Xan Phung
On 1/20/26 10:09, Rich Felker wrote:
> On Tue, Jan 20, 2026 at 01:35:26AM +1100, Xan Phung wrote:
(snip)
>> +
>> +int iswalpha(wint_t wc) {
>> + unsigned *huffm = (unsigned *)(table + PAGEH), target;
>
> The aliasing violation here isn't acceptable. Is there any good way to
> avoid it? I've only skimmed this submission so far; I assume the
> motivation is doing the popcount across 4 bytes, but I don't yet
> understand why popcount was chosen as the way of storing an index.
Do compilers optimize using memcpy() for type-punning?
--
Sincerely,
Demi Marie Obenour (she/her/hers)
* Re: [musl][PATCH v1] wctype: reduce text size of iswalpha & iswpunct by 53%
2026-01-21 0:56 ` Demi Marie Obenour
@ 2026-01-21 1:11 ` Charles Banas
2026-01-21 2:31 ` Rich Felker
1 sibling, 0 replies; 9+ messages in thread
From: Charles Banas @ 2026-01-21 1:11 UTC (permalink / raw)
To: musl; +Cc: Rich Felker, Xan Phung
On Tue, Jan 20, 2026 at 5:03 PM Demi Marie Obenour
<demiobenour@gmail.com> wrote:
>
> On 1/20/26 10:09, Rich Felker wrote:
> > On Tue, Jan 20, 2026 at 01:35:26AM +1100, Xan Phung wrote:
>
> (snip)
>
> >> +
> >> +int iswalpha(wint_t wc) {
> >> + unsigned *huffm = (unsigned *)(table + PAGEH), target;
> >
> > The aliasing violation here isn't acceptable. Is there any good way to
> > avoid it? I've only skimmed this submission so far; I assume the
> > motivation is doing the popcount across 4 bytes, but I don't yet
> > understand why popcount was chosen as the way of storing an index.
>
> Do compilers optimize using memcpy() for type-punning?
Yes, they do. All of the ones I've tested elide the memcpy() call
entirely in type-punning cases.
> --
> Sincerely,
> Demi Marie Obenour (she/her/hers)
--
Charles Banas
* Re: [musl][PATCH v1] wctype: reduce text size of iswalpha & iswpunct by 53%
2026-01-21 0:56 ` Demi Marie Obenour
2026-01-21 1:11 ` Charles Banas
@ 2026-01-21 2:31 ` Rich Felker
1 sibling, 0 replies; 9+ messages in thread
From: Rich Felker @ 2026-01-21 2:31 UTC (permalink / raw)
To: Demi Marie Obenour; +Cc: musl, Xan Phung
On Tue, Jan 20, 2026 at 07:56:05PM -0500, Demi Marie Obenour wrote:
> On 1/20/26 10:09, Rich Felker wrote:
> > On Tue, Jan 20, 2026 at 01:35:26AM +1100, Xan Phung wrote:
>
> (snip)
>
> >> +
> >> +int iswalpha(wint_t wc) {
> >> + unsigned *huffm = (unsigned *)(table + PAGEH), target;
> >
> > The aliasing violation here isn't acceptable. Is there any good way to
> > avoid it? I've only skimmed this submission so far; I assume the
> > motivation is doing the popcount across 4 bytes, but I don't yet
> > understand why popcount was chosen as the way of storing an index.
>
> Do compilers optimize using memcpy() for type-punning?
Yes, except that we've suppressed that with -ffreestanding which
implies -fno-builtin. To get it back we'd need to make a musl-internal
way that memcpy calls get expanded to __builtin_memcpy. Which has been
kinda on the wishlist for a while, but it does require some care to
suppress it in certain TUs where it could result in circular
definitions.
Rich
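A sketch of the kind of musl-internal expansion described above, under stated assumptions: `FAST_MEMCPY` and `load_u32` are hypothetical names, not existing musl macros. GCC documents that `-fno-builtin` only affects functions without the `__builtin_` prefix, so calling `__builtin_memcpy` directly should still let the compiler turn a fixed 4-byte copy into a plain load while avoiding the aliasing violation. As noted above, such a macro would have to be kept out of translation units where the compiler might emit a call back into memcpy itself, such as the one defining memcpy.

```c
#include <string.h>

/* Hypothetical helper: under -ffreestanding (which implies -fno-builtin)
   a plain memcpy() call is not treated as a builtin, but __builtin_memcpy
   is still recognized by GCC and Clang, so a small fixed-size copy is
   typically expanded inline into a single load. */
#ifdef __GNUC__
#define FAST_MEMCPY __builtin_memcpy
#else
#define FAST_MEMCPY memcpy
#endif

/* Read a native-endian word from a byte table without type-punning. */
static unsigned load_u32(const unsigned char *p)
{
	unsigned w;
	FAST_MEMCPY(&w, p, sizeof w);
	return w;
}
```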