9fans - fans of the OS Plan 9 from Bell Labs
From: Steffen "Daode" Nurpmeso <sdaoden@gmail.com>
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Subject: [9fans] Character case mappings
Date: Mon, 24 Jun 2013 15:15:03 +0200
Message-ID: <20130624141503.pffQxijUoC6mzgT/cF2fnZTk@dietcurd.local>

The thing is: i'm writing a Unicode-aware library for ISO C99
environments (*earliest* alpha state), and at the moment i use
binary searches (i only have display widths and simple case
mappings right now).  For the combined upper/lower case mappings i
end up with

  static struct _casemap {
    uint32_t start;      /* First code point */
    uint32_t accu  : 16; /* Relative distance to mapping */
    _Bool isneg    : 1;  /* Accu must be subtracted */
    _Bool isup     : 1;  /* Code point is uppercase */
    _Bool islull   : 1;  /* Is Lu/Ll range (.accu = range start & 1) */
    _Bool isemap   : 1;  /* Has a one-to-many mapping */
    uint32_t count : 12; /* Number of entries in this range */
  } const _casemaps[] = {
    {0x000041,    32, 0,1,0,0, 26},
    ...
    {0x010428,    40, 1,0,0,0, 40},
  }; /* 250 entries */

that can be accessed via

  static struct _casemap const *
  _find_casemap(uint32_t codep)
  {
    struct _casemap const *cme = _casemaps, *dp;
    /* ARRAYCOUNT() is not shown; presumably sizeof(a) / sizeof((a)[0]) */
    uint32_t min = 0, max = ARRAYCOUNT(_casemaps) - 1;

    if (codep >= cme[min].start && codep < cme[max].start + cme[max].count)
      do {
        uint32_t mid = (min + max) >> 1,
          s = (dp = cme + mid)->start;
        if (codep < s)
          max = --mid;
        else if (codep >= s + dp->count)
          min = ++mid;
        else {
          cme += mid;
          goto jleave;
        }
      } while (max >= min);
    cme = NULL;
  jleave:
    return cme;
  }

  uint32_t
  sud_simple_tolower(uint32_t codep)
  {
    struct _casemap const *cme = _find_casemap(codep);

    if (cme == NULL)
      ;
    else if (! cme->islull) {
      if (cme->isup)
        codep = cme->isneg ? codep - cme->accu : codep + cme->accu;
    } else if ((codep & 1) == cme->accu)
      ++codep;
    return codep;
  }

  uint32_t
  sud_simple_toupper(uint32_t codep)
  {
    struct _casemap const *cme = _find_casemap(codep);

    if (cme == NULL)
      ;
    else if (! cme->islull) {
      if (! cme->isup)
        codep = cme->isneg ? codep - cme->accu : codep + cme->accu;
    } else if ((codep & 1) != cme->accu)
      --codep;
    return codep;
  }
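
To make the interplay of table, lookup and mapping concrete, here
is a minimal standalone sketch.  It is not the library's code: the
sk_ names, the plain (non-bit-field) struct and the three-entry
table are assumptions for illustration only, and the lookup uses a
half-open binary search instead of the inclusive bounds and goto
shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified variant of the range record from the post:
 * plain fields instead of bit-fields, one-to-many flag left out. */
struct sk_casemap {
  uint32_t start;  /* First code point of the range */
  uint16_t accu;   /* Distance to the mapping, or (range start & 1) */
  uint8_t isneg;   /* Distance must be subtracted */
  uint8_t isup;    /* Range holds uppercase code points */
  uint8_t islull;  /* Range alternates Lu/Ll code points */
  uint16_t count;  /* Number of code points in the range */
};

/* Three sample ranges, sorted by .start; NOT the full 250-entry table. */
static const struct sk_casemap sk_casemaps[] = {
  {0x0041, 32, 0, 1, 0, 26},  /* A..Z: lowercase is +32 */
  {0x0061, 32, 1, 0, 0, 26},  /* a..z: uppercase is -32 */
  {0x0100,  0, 0, 0, 1, 48},  /* U+0100..U+012F: even = Lu, odd = Ll */
};

/* Half-open binary search for the range that contains cp, or NULL. */
static const struct sk_casemap *sk_find(uint32_t cp)
{
  size_t lo = 0, hi = sizeof sk_casemaps / sizeof sk_casemaps[0];

  while (lo < hi) {
    size_t mid = lo + (hi - lo) / 2;
    const struct sk_casemap *e = &sk_casemaps[mid];

    if (cp < e->start)
      hi = mid;
    else if (cp >= e->start + e->count)
      lo = mid + 1;
    else
      return e;
  }
  return NULL;
}

uint32_t sk_tolower(uint32_t cp)
{
  const struct sk_casemap *e = sk_find(cp);

  if (e == NULL)
    return cp;
  if (e->islull)  /* the uppercase slot has the parity of the range start */
    return ((cp & 1) == e->accu) ? cp + 1 : cp;
  if (!e->isup)
    return cp;
  return e->isneg ? cp - e->accu : cp + e->accu;
}

uint32_t sk_toupper(uint32_t cp)
{
  const struct sk_casemap *e = sk_find(cp);

  if (e == NULL)
    return cp;
  if (e->islull)  /* the lowercase slot has the opposite parity */
    return ((cp & 1) != e->accu) ? cp - 1 : cp;
  if (e->isup)
    return cp;
  return e->isneg ? cp - e->accu : cp + e->accu;
}
```

With this sample table sk_tolower(0x0100) gives 0x0101 through the
parity test alone, so the 24 Lu/Ll pairs in that block cost one
table entry instead of 48.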

My S-CText (on <sourceforge DOT net SLASH p SLASH s-ctext SLASH
code SLASH>) verifies all code points up to 0x10FFFF as correct
with the above.  Now when i look at sys/src/libc/port/runetype.c
(of plan9front) i think it is generated, but i cannot find the
script or program that creates it, which would be of interest to
me.  And maybe Plan 9 would be interested in seeing the above
patched into that at some later time?
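
For reference, a range table of this kind can be derived from
UnicodeData.txt, where field 0 is the code point and field 13 the
simple lowercase mapping.  The following is only a sketch under
that assumption; it is not the tool behind runetype.c (which i
could not find), it handles the lowercase direction only, and all
sk_ names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical output record: a run of consecutive code points that
 * share one signed distance to their simple lowercase mapping. */
struct sk_range {
  uint32_t start;
  int32_t delta;
  uint32_t count;
};

/* Copy field number idx (0-based, ';'-separated) of line into buf.
 * Empty fields are preserved, which strtok() would not do. */
static void sk_field(const char *line, int idx, char *buf, size_t bufsz)
{
  size_t n = 0;

  while (idx > 0 && *line != '\0')
    if (*line++ == ';')
      --idx;
  while (*line != '\0' && *line != ';' && n + 1 < bufsz)
    buf[n++] = *line++;
  buf[n] = '\0';
}

/* Build ranges from UnicodeData.txt-style lines sorted by code point.
 * Returns the number of ranges written, at most max. */
size_t sk_build_lower_ranges(const char *const *lines, size_t nlines,
                             struct sk_range *out, size_t max)
{
  size_t n = 0, i;

  for (i = 0; i < nlines; ++i) {
    char cp_s[16], lo_s[16];
    uint32_t cp;
    int32_t delta;

    sk_field(lines[i], 0, cp_s, sizeof cp_s);   /* code point */
    sk_field(lines[i], 13, lo_s, sizeof lo_s);  /* simple lowercase */
    if (lo_s[0] == '\0')
      continue;                                 /* no lowercase mapping */
    cp = (uint32_t)strtoul(cp_s, NULL, 16);
    delta = (int32_t)strtoul(lo_s, NULL, 16) - (int32_t)cp;

    if (n > 0 && out[n - 1].delta == delta &&
        out[n - 1].start + out[n - 1].count == cp)
      ++out[n - 1].count;                       /* extend the current run */
    else if (n < max) {
      out[n].start = cp;
      out[n].delta = delta;
      out[n].count = 1;
      ++n;
    }
  }
  return n;
}

/* A few records in UnicodeData.txt format, embedded so that the
 * sketch does not need the data file. */
const char *const sk_sample[] = {
  "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
  "0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;",
  "0043;LATIN CAPITAL LETTER C;Lu;0;L;;;;;N;;;;0063;",
  "0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041",
  "00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;",
};
```

Feeding sk_sample to sk_build_lower_ranges() merges the three
consecutive capitals into one run and starts a new run at U+00C0,
because the code points in between carry no lowercase mapping.
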
Thank you and ciao,

--steffen


