9fans - fans of the OS Plan 9 from Bell Labs
From: erik quanstrom <quanstro@quanstro.net>
To: 9fans@9fans.net
Subject: Re: [9fans] Character case mappings
Date: Mon, 24 Jun 2013 11:11:42 -0400
Message-ID: <34723d59b4618c0a19b67299d8c27dc6@ladd.quanstro.net>
In-Reply-To: <20130624141503.pffQxijUoC6mzgT/cF2fnZTk@dietcurd.local>

> My S-CText (on <sourceforge DOT net SLASH p SLASH s-ctext SLASH
> code SLASH>) tests all 0x10FFFF code points correct with the
> above.  Now when i look at the sys/src/libc/port/runetype.c (of
> plan9front) then i think this one is generated, but i cannot find
> the creating script or program, which would be of interest to me.
> And maybe Plan9 would be interested to see the above patched into
> that, at some later time. ?
> Thank you and ciao,

that's close to the approach taken, except since one needs
a fresh table for each sorting if one hopes to do a binary search,
simple tables of (various width) integers were made.  it was also
noted that bursting the tables at the junction of the basic and
extended planes was possible in many cases.

for example, for decompositions: if r is a precombined form in the
basic plane and r = r' + c, then r' and c are both in the basic
plane.  thus we can burst this table, and put the basic plane
mappings (1000 of them) in a more compact table that doesn't use
vlongs.  the extended plane table is tiny (18 entries), so a
binary search there is only worth using for symmetry.
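
as a rough illustration of the packing described above (a sketch only:
the helper names pack16, pack32, unpack16 and unpack32 are hypothetical,
standard c types stand in for plan 9's uint and uvlong, and this is not
the program that generated the tables), two basic plane runes fit in one
32-bit word, while an extended plane pair needs 32 bits per half:

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical helpers, not part of libc */

/* pack a basic-plane pair: base in the high 16 bits, combining mark in the low 16 */
uint32_t
pack16(uint32_t base, uint32_t comb)
{
	assert(base <= 0xffff && comb <= 0xffff);
	return (base << 16) | comb;
}

/* pack an extended-plane pair: 32 bits per half, so a 64-bit word is needed */
uint64_t
pack32(uint32_t base, uint32_t comb)
{
	return ((uint64_t)base << 32) | comb;
}

/* split a 32-bit table word back into its two runes */
void
unpack16(uint32_t w, uint32_t d[2])
{
	d[0] = w >> 16;
	d[1] = w & 0xffff;
}

/* split a 64-bit table word back into its two runes */
void
unpack32(uint64_t w, uint32_t d[2])
{
	d[0] = (uint32_t)(w >> 32);
	d[1] = (uint32_t)(w & 0xffffffff);
}
```

with these, pack16(0x0041, 0x0300) gives 0x00410300, the value stored
for À.  a 64-bit word built this way has each half zero-padded to eight
hex digits, which is what a 32-bit shift on lookup expects.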

static
uint	__decompose2[] =
{
	0x00c0,	0x00410300,	 /* À -> A 0300 */
[... 998 entries skipped ... ]
	0xfb4e,	0x05e405bf,	 /* פֿ -> פ 05bf */
};

static
uvlong	__decompose264[] =
{
	0x1109a,	0x00011099000110baull,	 /* 𑂚 -> 𑂙 + 110ba */
[... 16 entries skipped ...]
	0x1d1c0,	0x0001d1bc0001d16full,	 /* 𝆺𝅥𝅯 -> 𝆺𝅥 + 1d16f */
};

static uint*
bsearch32(uint c, uint *t, int n, int ne)
{
	uint *p;
	int m;

	while(n > 1) {
		m = n/2;
		p = t + m*ne;
		if(c >= p[0]) {
			t = p;
			n = n-m;
		} else
			n = m;
	}
	if(n && c == t[0])
		return t;
	return 0;
}

[bsearch64 omitted]

int
runedecompose(Rune a, Rune *d)
{
	uint *p;
	uvlong *q;

	if(a <= 0xffff){
		p = bsearch32(a, __decompose2, nelem(__decompose2)/2, 2);
		if(p){
			d[0] = p[1] >> 16;
			d[1] = p[1] & 0xffff;
			return 0;
		}
	}else{
		q = bsearch64(a, __decompose264, nelem(__decompose264)/2, 2);
		if(q){
			d[0] = q[1] >> 32;
			d[1] = q[1] & 0xffffffff;
			return 0;
		}
	}
	return -1;
}
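
for anyone who wants to try the basic plane path outside plan 9, here
is a minimal sketch in standard c.  assumptions: the names decomp16,
search32 and decompose16 are made up for this example; the table is a
four-row excerpt (first and last rows from the table above, the two
middle rows are the ordinary decompositions of Á and é); the extended
plane is not handled at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* excerpt only, sorted by key; the real table has 1000 rows */
static const uint32_t decomp16[] = {
	0x00c0,	0x00410300,	/* À -> A + 0300 */
	0x00c1,	0x00410301,	/* Á -> A + 0301 */
	0x00e9,	0x00650301,	/* é -> e + 0301 */
	0xfb4e,	0x05e405bf,	/* פֿ -> פ + 05bf */
};

/* same loop as bsearch32: n rows of ne words each, keyed on word 0 */
static const uint32_t*
search32(uint32_t c, const uint32_t *t, int n, int ne)
{
	const uint32_t *p;
	int m;

	while(n > 1){
		m = n/2;
		p = t + m*ne;
		if(c >= p[0]){
			t = p;
			n = n-m;
		}else
			n = m;
	}
	if(n && c == t[0])
		return t;
	return NULL;
}

/* basic-plane half of the lookup: 0 on a hit, -1 otherwise */
int
decompose16(uint32_t r, uint32_t d[2])
{
	const uint32_t *p;

	assert(d != NULL);
	if(r > 0xffff)
		return -1;	/* extended plane: not covered by this sketch */
	p = search32(r, decomp16, (int)(sizeof decomp16/sizeof decomp16[0])/2, 2);
	if(p == NULL)
		return -1;
	d[0] = p[1] >> 16;
	d[1] = p[1] & 0xffff;
	return 0;
}
```

calling decompose16(0x00c0, d) returns 0 and leaves 0x0041, 0x0300 in
d; a rune with no row in the excerpt, such as 0x0041, returns -1.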

all the other rune tables work this way.  there is one
table per property.  having a structure doesn't fit the
current programming interface, nor usage.

- erik




Thread overview: 5+ messages
2013-06-24 13:15 Steffen Daode Nurpmeso
2013-06-24 15:11 ` erik quanstrom [this message]
2013-06-24 20:25   ` Steffen Daode Nurpmeso
2013-06-24 20:59     ` erik quanstrom
2013-06-25 12:11       ` Steffen Daode Nurpmeso
