From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 24 Jun 2013 15:15:03 +0200
From: Steffen "Daode" Nurpmeso
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <20130624141503.pffQxijUoC6mzgT/cF2fnZTk@dietcurd.local>
User-Agent: s-nail s-nail-14.3.2-16-gbd96d22
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Subject: [9fans] Character case mappings
Topicbox-Message-UUID: 6800ad18-ead8-11e9-9d60-3106f5b1d025

'Thing is; i'm writing a Unicode aware library for ISO C99 aware
environments (*earliest* alpha state) and at the moment i use
binary searches (i only have display-widths and simple case
mappings right now).  For combined upper/lower case mappings i do
end up with

    static struct _casemap {
        uint32_t start;         /* First code point */
        uint32_t accu : 16;     /* Relative distance to mapping */
        _Bool isneg : 1;        /* Accu must be subtracted */
        _Bool isup : 1;         /* Code point is uppercase */
        _Bool islull : 1;       /* Is Lu/Ll range (.accu = range start & 1) */
        _Bool isemap : 1;       /* Has a one-to-many mapping */
        uint32_t count : 12;    /* Number of entries in this range */
    } const _casemaps[] = {
        {0x000041, 32, 0,1,0,0, 26},
        ...
        {0x010428, 40, 1,0,0,0, 40},
    }; /* 250 entries */

that can be accessed via

    static struct _casemap const *
    _find_casemap(uint32_t codep)
    {
        struct _casemap const *cme = _casemaps, *dp;
        uint32_t min = 0, max = ARRAYCOUNT(_casemaps) - 1;

        /* Only search if codep lies within the bounds of the table */
        if (codep >= cme[min].start &&
                codep < cme[max].start + cme[max].count)
            do {
                uint32_t mid = (min + max) >> 1,
                    s = (dp = cme + mid)->start;

                if (codep < s)
                    max = --mid;
                else if (codep >= s + dp->count)
                    min = ++mid;
                else {
                    /* codep falls into [s, s + count) */
                    cme += mid;
                    goto jleave;
                }
            } while (max >= min);
        cme = NULL;
    jleave:
        return cme;
    }

    uint32_t
    sud_simple_tolower(uint32_t codep)
    {
        struct _casemap const *cme = _find_casemap(codep);

        if (cme == NULL)
            ;
        else if (! cme->islull) {
            /* Plain range: only uppercase code points move */
            if (cme->isup)
                codep = cme->isneg ? codep - cme->accu : codep + cme->accu;
        } else if ((codep & 1) == cme->accu)
            /* Lu/Ll range: the uppercase half shares the parity of start */
            ++codep;
        return codep;
    }

    uint32_t
    sud_simple_toupper(uint32_t codep)
    {
        struct _casemap const *cme = _find_casemap(codep);

        if (cme == NULL)
            ;
        else if (! cme->islull) {
            /* Plain range: only lowercase code points move */
            if (! cme->isup)
                codep = cme->isneg ? codep - cme->accu : codep + cme->accu;
        } else if ((codep & 1) != cme->accu)
            /* Lu/Ll range: the lowercase half has the opposite parity */
            --codep;
        return codep;
    }

My S-CText (on ) tests all 0x10FFFF code points correctly with the
above.  Now when i look at the sys/src/libc/port/runetype.c (of
plan9front) then i think this one is generated, but i cannot find
the creating script or program, which would be of interest to me.
And maybe Plan9 would be interested to see the above patched into
that, at some later time.  ?
Thank you and ciao,

--steffen
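
To illustrate the range encoding described above, in particular the
islull bit, here is a minimal self-contained sketch.  It is not the
library code: the demo_* names, the three-entry table and the round-trip
check are assumptions made up for illustration, the isemap bit is left
out, and the search uses a half-open [lo, hi) interval instead of the
min/max form shown in the mail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified variant of the range record (no isemap). */
struct demo_casemap {
    uint32_t start;          /* First code point of the range */
    uint16_t accu;           /* Distance to mapping, or start & 1 if islull */
    unsigned isneg : 1;      /* accu must be subtracted */
    unsigned isup : 1;       /* Range holds uppercase code points */
    unsigned islull : 1;     /* Alternating Lu/Ll pairs */
    unsigned count : 12;     /* Number of code points in the range */
};

/* Three hand-picked ranges sorted by start; NOT the real 250-entry table. */
static const struct demo_casemap demo_map[] = {
    {0x0041, 32, 0, 1, 0, 26},   /* A-Z: lower = upper + 32 */
    {0x0061, 32, 1, 0, 0, 26},   /* a-z: upper = lower - 32 */
    {0x0100,  0, 0, 0, 1, 48},   /* U+0100..U+012F: even = Lu, odd = Ll */
};

/* Binary search for the range containing cp; NULL if there is none. */
static const struct demo_casemap *demo_find(uint32_t cp)
{
    size_t lo = 0, hi = sizeof demo_map / sizeof demo_map[0];

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct demo_casemap *e = &demo_map[mid];

        if (cp < e->start)
            hi = mid;
        else if (cp >= e->start + e->count)
            lo = mid + 1;
        else
            return e;
    }
    return NULL;
}

static uint32_t demo_tolower(uint32_t cp)
{
    const struct demo_casemap *e = demo_find(cp);

    if (e == NULL)
        return cp;
    if (e->islull)           /* Uppercase half shares the parity of start */
        return ((cp & 1) == e->accu) ? cp + 1 : cp;
    if (!e->isup)
        return cp;
    return e->isneg ? cp - e->accu : cp + e->accu;
}

static uint32_t demo_toupper(uint32_t cp)
{
    const struct demo_casemap *e = demo_find(cp);

    if (e == NULL)
        return cp;
    if (e->islull)           /* Lowercase half has the opposite parity */
        return ((cp & 1) != e->accu) ? cp - 1 : cp;
    if (e->isup)
        return cp;
    return e->isneg ? cp - e->accu : cp + e->accu;
}

/* Count code points for which the two mappings are not consistent. */
static unsigned demo_roundtrip_failures(void)
{
    unsigned bad = 0;

    for (uint32_t cp = 0; cp <= 0x10FFFF; ++cp)
        if (demo_toupper(demo_tolower(cp)) != demo_toupper(cp) ||
                demo_tolower(demo_toupper(cp)) != demo_tolower(cp))
            ++bad;
    return bad;
}

/* A few spot checks, including code points just outside the ranges. */
static void demo_selfcheck(void)
{
    assert(demo_tolower(0x0041) == 0x0061);
    assert(demo_toupper(0x0101) == 0x0100);
    assert(demo_tolower(0x005B) == 0x005B);
}
```

The third entry shows why islull saves table space: 24 upper/lower
pairs collapse into one record, because the case of a code point in
such a range follows from its parity alone.  Calling demo_selfcheck()
and demo_roundtrip_failures() from a main() of your own exercises the
sketch; the round-trip loop only checks internal consistency of this toy
table and says nothing about agreement with the Unicode data files.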