From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <12bfb211154a2ebb15ff7e46a375c74d@quanstro.net>
Date: Fri, 19 May 2006 10:11:26 -0500
From: quanstro@quanstro.net
To: 9fans@cse.psu.edu
Subject: Re: [9fans] combining characters
In-Reply-To:
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
Topicbox-Message-UUID: 52096c7c-ead1-11e9-9d60-3106f5b1d025

On Fri May 19 09:38:23 CDT 2006, rog@vitanuova.com wrote:
> perhaps there are actually two problems here:
> 1) how to get libdraw to map back from a sequence of combining characters
> to a character in the font that represents that sequence.

this is pretty easy. the unicode standard provides canonical compositions.
i think it would be easier for libdraw to insist that string() be given
strings that have already been canonically composed. perhaps a job for tcs.

> 2) how to draw sequences of combining characters that don't exist in precombined
> form within unicode. it's quite possible that one might wish to provide
> pre-rendered glyphs for some of these sequences - the current font format
> can't deal with that.

the general case doesn't seem like it would yield a solution with a bitmap
font. sure, you could put a circumflex on an "a". but what about dashed
letters like ł? drawing a dash through an arbitrary character gets to be a
real pain.

the good news is that solving #1 would take care of most problems.
unfortunately, some romanized versions of russian and vietnamese (i believe)
would still not work. but we would get 80% of what we would like without the
pain of trying to treat bitmaps as if they were vector character
descriptions a la metafont.

>
> another issue is dealing with code (e.g. libframe) that assumes that
> characters do not overstrike - i.e. that there's a 1-1 correspondence
> between Runes and glyphs.

charofpt would be a problem. there would be some problems with picking a
proper endpoint for highlighting.
a break that falls between the base and its combiners would be a problem.
i think the largest problem here would be dealing with the character height.
currently in libdraw a character's height is the font's height. this isn't
true for many fonts we already have -- ÄÖÜ☺ tend to get clipped with pelm
because they are taller than the font file claims. just expanding the height
of the font would look pretty funny in the absence of taller characters.

> yet another is how one should deal with character-based indexing, for instance
> indexing in sam expressions - does /é/-#0+#1 point to the character after
> the unadorned e, or after the whole sequence?

there be dragons here. the library of congress has a 100-page manual on
alphabetization of languages with roman letters. different languages have
different rules (sometimes for the same codepoint); a language sometimes has
different rules for different codepoints. then there are ligatures. in
german ss and ß are sorted the same.

there are probably only two sensible ways to deal with this. either decide
once whether to strip all combiners and then do a naive sort, or define some
sort of locale.

> it'd be nice to sort this issue out properly; surely it shouldn't be
> too hard?

i believe this is another entry for the "famous lies list," ranking somewhat
below "check's in the mail" and above "i have this friend who...."

- erik
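
a minimal sketch of two of the ideas in this message: canonical composition
(problem 1) and the strip-all-combiners naive sort. it assumes python's
unicodedata module as a stand-in for whatever tcs or libdraw would actually
do; the names compose and strip_combiners are made up for illustration and
are not part of any plan 9 library.

```python
import unicodedata


def compose(s: str) -> str:
    """Canonically compose s (Unicode normalization form NFC)."""
    return unicodedata.normalize("NFC", s)


def strip_combiners(s: str) -> str:
    """Decompose s (NFD), then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# "a" + combining circumflex (U+0302) has a precomposed form (U+00E2),
# so composition folds the pair into a single codepoint.
print(len(compose("a\u0302")))

# "q" + combining circumflex has no precomposed form, so the sequence
# is still two codepoints after composition. this is the problem 2 case.
print(len(compose("q\u0302")))

# naive sort on keys with the combiners stripped: "école" is compared
# as "ecole", so it lands between "eagle" and "zebra".
words = ["zebra", "\u00e9cole", "eagle"]
print(sorted(words, key=strip_combiners))
```

note that this does nothing for ss versus ß: ß has no canonical
decomposition, so stripping combiners leaves it alone. that case needs the
locale route.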