From mboxrd@z Thu Jan 1 00:00:00 1970
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
To: 9fans@cse.psu.edu
From: erik quanstrom
Message-Id: <20050902043503.A768BB3C4D@dexter-peak.quanstro.net>
Date: Thu, 1 Sep 2005 23:35:03 -0500
Subject: [9fans] runetype.c
Topicbox-Message-UUID: 82d46830-ead0-11e9-9d60-3106f5b1d025

unicode sure gets weird if you start looking too closely.

i'm curious about a few lines in __alpharune2[] and __toupper2[].
for example

	0x3260, 0x327b,	/* ㉠ - ㉻ */

and (from __toupper2[])

	0x24d0, 0x24e9, 474,	/* ⓐ-ⓩ Ⓐ-Ⓩ */

and

	0x3371, 0x3376,	/* ㍱ - ㍶ */

however, UnicodeData.txt (version 4.1) classifies all of these as
symbols. so, as far as i can tell, we can either:

(a) declare anything that has an upper/lower case to be a letter,
never mind the unicode classification. this makes
1 == isalpharune(ⓐ) while 0 == isalpharune(㉻), however.

(b) go with the flow. anything that unicode says is a symbol is not
a letter, even if some symbols have an uppercase.

(c) (this appears to be what the current table is doing) treat as a
letter anything that either is a letter or is composed as " L+",
where L+ are 1..N letter symbols.

any thoughts?
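
the three options can be compared side by side with a small c sketch. everything in it is an assumption made for illustration: the Range struct, its flags, range_lookup(), isalpha_policy() and circled_toupper() are hypothetical names, not the actual runetype.c code. the flags are filled in by hand for the ranges quoted above (plus a-z and the circled digits ①-⑳ as controls), not generated from UnicodeData.txt. the sketch also assumes that the third column of __toupper2[] is a delta biased by 500, which is consistent with 474 = 500 - 26 and Ⓐ sitting 26 code points below ⓐ.

```c
#include <stddef.h>

typedef unsigned int Rune;	/* stand-in for the Plan 9 Rune type (assumption) */

/* hypothetical per-range summary; flags entered by hand for this example */
typedef struct Range Range;
struct Range {
	Rune	lo, hi;		/* inclusive bounds */
	int	isletter;	/* general category is a letter (L*) */
	int	hascase;	/* has an upper/lower case mapping */
	int	decompL;	/* compatibility decomposition is letters only */
};

/* must stay sorted by lo for the binary search below */
static const Range tab[] = {
	{ 0x0061, 0x007a, 1, 1, 0 },	/* a-z: ordinary letters */
	{ 0x2460, 0x2473, 0, 0, 0 },	/* circled digits: not letters in any sense */
	{ 0x24d0, 0x24e9, 0, 1, 1 },	/* circled latin small: symbol with uppercase */
	{ 0x3260, 0x327b, 0, 0, 1 },	/* circled hangul: symbol, no case */
	{ 0x3371, 0x3376, 0, 0, 1 },	/* squared unit abbreviations: symbol, no case */
};

enum { PolicyA, PolicyB, PolicyC };

/* binary search for the range containing c; NULL if none */
const Range*
range_lookup(Rune c)
{
	size_t lo, hi, mid;

	lo = 0;
	hi = sizeof tab / sizeof tab[0];
	while(lo < hi){
		mid = lo + (hi - lo)/2;
		if(c < tab[mid].lo)
			hi = mid;
		else if(c > tab[mid].hi)
			lo = mid + 1;
		else
			return &tab[mid];
	}
	return NULL;
}

/* 1 if c counts as a letter under the given policy, else 0 */
int
isalpha_policy(Rune c, int policy)
{
	const Range *r;

	r = range_lookup(c);
	if(r == NULL)
		return 0;
	switch(policy){
	case PolicyA:	/* (a) anything with a case mapping is a letter */
		return r->isletter || r->hascase;
	case PolicyB:	/* (b) follow the unicode category only */
		return r->isletter;
	case PolicyC:	/* (c) letters, plus symbols built only from letters */
		return r->isletter || r->decompL;
	}
	return 0;
}

/* the quoted __toupper2[] entry, read as lo, hi, delta+500 (assumed bias) */
static const Rune toupper2[] = { 0x24d0, 0x24e9, 474 };

Rune
circled_toupper(Rune c)
{
	if(c >= toupper2[0] && c <= toupper2[1])
		return c + toupper2[2] - 500;
	return c;
}
```

with these hand-entered flags, isalpha_policy(0x24d0, PolicyA) is 1 while isalpha_policy(0x327b, PolicyA) is 0, which is the inconsistency noted under (a). under PolicyB both are 0, and under PolicyC both are 1.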