From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID:
Date: Fri, 19 May 2006 11:16:10 -0500
From: quanstro@quanstro.net
To: 9fans@cse.psu.edu
Subject: Re: [9fans] combining characters
In-Reply-To: <12bfb211154a2ebb15ff7e46a375c74d@quanstro.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Topicbox-Message-UUID: 520fcf4a-ead1-11e9-9d60-3106f5b1d025

>> yet another is how one should deal with character-based indexing, for instance
>> indexing in sam expressions - does /é/-#0+#1 point to the character after
>> the unadorned e, or after the whole sequence?

>there be dragons here. the library of congress has a 100-page manual on alphabetization
>of languages with roman letters. different languages have different rules (sometimes for the
>same codepoint); a language sometimes has different rules for different codepoints.
>then there are ligatures. in german, ss and ß are sorted the same.

uff. that answer doesn't fit the question. i think base+combiner* should be treated as
an indivisible character. but again, if we use canonical compositions, this case can be
avoided except where the character can't be drawn anyway.

- erik
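A minimal sketch of the base+combiner* rule, written in Python for illustration. It assumes that a "combiner" means any code point in Unicode general category M (Mn, Mc, Me), which is a simplification of the full grapheme-cluster rules in Unicode UAX #29 and not how sam itself indexes text. The helper name `clusters` is hypothetical. The sketch also uses `unicodedata.normalize("NFC", s)` to show that canonical composition turns the decomposed é into a single code point, so code-point counts and unit counts agree.

```python
import unicodedata

def clusters(s):
    """Hypothetical helper: split s into base+combiner* units.

    Assumption: a combiner is any code point whose Unicode general
    category starts with "M" (Mn, Mc, Me).
    """
    out = []
    for ch in s:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch  # attach the combiner to the preceding base
        else:
            out.append(ch)  # a non-mark (or a stray leading mark) starts a new unit
    return out

decomposed = "e\u0301x"  # e + COMBINING ACUTE ACCENT, then x
composed = unicodedata.normalize("NFC", decomposed)  # precomposed U+00E9, then x

print(len(decomposed), len(clusters(decomposed)))  # 3 2
print(len(composed), len(clusters(composed)))      # 2 2
```

Under this rule, advancing one unit from the start of the decomposed é lands after the accent, not between the e and its combining mark; after NFC composition the question does not arise, because é is already a single code point.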