From mboxrd@z Thu Jan  1 00:00:00 1970
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
To: 9fans@cse.psu.edu
From: erik quanstrom <quanstro@quanstro.net>
Message-Id: <20050902043503.A768BB3C4D@dexter-peak.quanstro.net>
Date: Thu,  1 Sep 2005 23:35:03 -0500
Subject: [9fans] runetype.c
Topicbox-Message-UUID: 82d46830-ead0-11e9-9d60-3106f5b1d025

unicode sure gets weird if you start looking too closely.

i'm curious about a few lines in __alpharune2[] and __toupper2[]. for example,

	0x3260,	0x327b,	/* ㉠ - ㉻ */

and (from __toupper2[])

	0x24d0,	0x24e9, 474,	/* ⓐ-ⓩ Ⓐ-Ⓩ */

and

	0x3371,	0x3376,	/* ㍱ - ㍶ */

however, from UnicodeData.txt (version 4.1), all of these are classified as symbols.
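
for anyone not staring at the tables: a rough sketch of how i read the three-column
__toupper2[] entry above. it assumes the third column is an offset biased by 500, so
474 means -26, which is the distance from ⓐ (0x24d0) back to Ⓐ (0x24b6). the names
toupper2_excerpt, Bias and sketch_toupperrune are made up for illustration, and the
linear scan is only a stand-in for whatever search the library really does.

```c
#include <stddef.h>

typedef unsigned int Rune;	/* stand-in for the library's Rune type */

/* hypothetical one-entry excerpt: first rune, last rune, biased offset */
static const Rune toupper2_excerpt[] = {
	0x24d0,	0x24e9, 474,	/* ⓐ-ⓩ Ⓐ-Ⓩ */
};

enum { Bias = 500 };	/* assumption: stored offset = real offset + 500 */

/* map c to upper case if it falls inside a listed range, else return c unchanged */
Rune
sketch_toupperrune(Rune c)
{
	size_t i, n;

	n = sizeof toupper2_excerpt / sizeof toupper2_excerpt[0];
	for(i = 0; i < n; i += 3)
		if(c >= toupper2_excerpt[i] && c <= toupper2_excerpt[i+1])
			return c + toupper2_excerpt[i+2] - Bias;
	return c;
}
```

with this excerpt, sketch_toupperrune(0x24d0) gives 0x24b6, and anything outside the
range comes back untouched.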

so, as far as i can tell, we either:

(a) declare anything with an upper/lower case mapping to be a letter, never mind the
unicode classification. however, this makes 1 == isalpharune(ⓐ) while 0 == isalpharune(㉻).

(b) go with the flow. anything that unicode says is a symbol is not a letter, even if some
symbols have an uppercase form.

(c) (this appears to be what the current table is doing) anything that either is a letter
or is composed as "<operator> L+", where L+ is a run of 1..N letters, is considered to be a letter.
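
to make the three choices concrete, here is a small sketch. the Props struct, the
sample[] rows and the alpha_a/alpha_b/alpha_c names are all hypothetical. the flags
are entered by hand from my reading of UnicodeData.txt (all three are symbols, only ⓐ
has a case mapping, all three decompose as "<operator> L+"), so treat them as
assumptions to check, not as generated data.

```c
typedef unsigned int Rune;	/* stand-in for the library's Rune type */

typedef struct Props Props;
struct Props {
	Rune	r;
	int	letter;		/* unicode general category is a letter */
	int	cased;		/* has an upper/lower case mapping */
	int	letterdecomp;	/* decomposes as "<operator> L+" */
};

/* hand-entered sample rows (assumed values, one per range in question) */
static const Props sample[] = {
	{ 0x24d0, 0, 1, 1 },	/* ⓐ */
	{ 0x327b, 0, 0, 1 },	/* ㉻ */
	{ 0x3371, 0, 0, 1 },	/* ㍱ */
};

/* (a) anything with a case mapping counts as a letter */
int
alpha_a(const Props *p)
{
	return p->letter || p->cased;
}

/* (b) trust the unicode classification only */
int
alpha_b(const Props *p)
{
	return p->letter;
}

/* (c) letters, plus anything composed as "<operator> L+" */
int
alpha_c(const Props *p)
{
	return p->letter || p->letterdecomp;
}
```

under (a) only ⓐ comes out alphabetic, under (b) none of the three do, and under (c)
all three do, which matches what the current table has.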


any thoughts?