From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <12bfb211154a2ebb15ff7e46a375c74d@quanstro.net>
Date: Fri, 19 May 2006 10:11:26 -0500
From: quanstro@quanstro.net
To: 9fans@cse.psu.edu
Subject: Re: [9fans] combining characters
In-Reply-To: <c349ca3fdc72f917b195e8fd2113eb78@vitanuova.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Topicbox-Message-UUID: 52096c7c-ead1-11e9-9d60-3106f5b1d025

On Fri May 19 09:38:23 CDT 2006, rog@vitanuova.com wrote:
> perhaps there are actually two problems here:
> 1) how to get libdraw to map back from a sequence of combining characters
> to a character in the font that represents that sequence.

this is pretty easy.  the unicode standard provides canonical compositions.
i think it would be easier for libdraw to insist that string be given strings that
have been canonically composed.  perhaps a job for tcs.
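
a minimal sketch of the composition step, in python rather than anything
libdraw or tcs does today.  the helper name compose is made up for
illustration; unicodedata.normalize is the standard library call.  it shows
a base letter plus combiner collapsing to one codepoint, and a sequence with
no precomposed form passing through unchanged.

```python
import unicodedata

def compose(s):
    """Return s in Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", s)

# 'e' followed by U+0301 COMBINING ACUTE ACCENT composes to U+00E9.
decomposed = "e\u0301"
composed = compose(decomposed)
print(len(decomposed), len(composed))

# 'q' plus the same accent has no precomposed codepoint, so the
# combiner survives and would still have to be handled when drawing.
leftover = compose("q\u0301")
print([hex(ord(c)) for c in leftover])
```

anything that still contains a combiner after this step is the hard case
from #2.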

> 2) how to draw sequences of combining characters that don't exist in precombined
> form within unicode. it's quite possible that one might wish to provide
> pre-rendered glyphs for some of these sequences - the current font format
> can't deal with that.

the general case doesn't seem like it would yield a solution with a bitmap font.
sure you could put a circumflex on an "a".  but what about dashed letters like
ł?  drawing a dash through an arbitrary character gets to be a real pain.
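
one way to see the difference between the two cases, sketched in python.
canonical_parts is a hypothetical helper; unicodedata.decomposition is the
standard library call.  the circumflexed a has a canonical decomposition into
base plus combiner, while ł has none, so it can only ever be a glyph of its own.

```python
import unicodedata

def canonical_parts(ch):
    """Return the canonical decomposition of ch as a list of characters,
    or an empty list if ch has none."""
    fields = unicodedata.decomposition(ch).split()
    # compatibility decompositions start with a tag such as '<compat>';
    # those are not canonical, so treat them as "no decomposition".
    if not fields or fields[0].startswith("<"):
        return []
    return [chr(int(f, 16)) for f in fields]

print(canonical_parts("\u00e2"))  # a with circumflex
print(canonical_parts("\u0142"))  # l with stroke
```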

the good news is that solving #1 would take care of most problems.  unfortunately,
some romanized versions of russian and vietnamese (i believe) would still not work.
but we would get 80% of what we would like without the pain of trying to treat
bitmaps as if they were vector character descriptions a la metafont.

>
> another issue is dealing with code (e.g. libframe) that assumes that
> characters do not overstrike - i.e. that there's a 1-1 correspondence
> between Runes and glyphs.

charofpt would be a problem.  there would be some problems with picking a proper
endpoint for highlighting.  a break between the base and the combiners would
be a problem.  i think the largest problem here would be dealing with the character
height.  currently in libdraw a character's height is the font's height. this isn't true
for many fonts we already have -- ÄÖÜ☺ tend to get clipped with pelm because they are
taller than the font file claims. just expanding the height of the font would look pretty
funny in the absence of taller characters.
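
a rough sketch of how a highlight endpoint could be kept from landing between
a base and its combiners.  this is python, not libframe code, and the names
clusters and snap are invented here.  it assumes a combiner is any rune with a
nonzero canonical combining class, which is only an approximation of the
grapheme cluster rules in the unicode annexes.

```python
import unicodedata

def clusters(s):
    """Split s into (start, end) rune index ranges, each one a base
    character followed by any combiners attached to it."""
    spans = []
    for i, ch in enumerate(s):
        if spans and unicodedata.combining(ch) != 0:
            # extend the previous cluster to swallow this combiner
            start, _ = spans[-1]
            spans[-1] = (start, i + 1)
        else:
            spans.append((i, i + 1))
    return spans

def snap(s, i):
    """Move rune index i back to the start of the cluster containing it.
    An index at or past the end of s maps to len(s)."""
    for start, end in clusters(s):
        if start <= i < end:
            return start
    return len(s)

s = "e\u0301a"          # e + combining acute, then a
print(clusters(s))
print(snap(s, 1))
```

something like charofpt would then return snap applied to its result, so a
selection never starts on a bare combiner.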

> yet another is how one should deal with character-based indexing, for instance
> indexing in sam expressions - does /é/-#0+#1 point to the character after
> the unadorned e, or after the whole sequence?

there be dragons here.  the library of congress has a 100-page manual on alphabetization
of languages with roman letters.  different languages have different rules (sometimes for the
same codepoint); a language sometimes has different rules for different codepoints.
then there are ligatures.  in german ss and ß are sorted the same.

there are probably only two sensible ways to deal with this.  either strip (or do not
strip) all combiners and do a naive sort, or define some sort of locale.
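
the strip-and-sort option can be sketched in a few lines of python.  the names
strip_combiners and naive_sort are made up for illustration, and "combiner"
here again means a rune with a nonzero canonical combining class.  the key is
computed by decomposing, dropping the combiners, and comparing codepoints.

```python
import unicodedata

def strip_combiners(s):
    """Decompose s canonically and drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if unicodedata.combining(c) == 0)

def naive_sort(words):
    """Sort by codepoint after stripping combiners; no locale rules."""
    return sorted(words, key=strip_combiners)

words = ["zebra", "\u00e9clair", "eclipse", "apple"]
print(naive_sort(words))

# letters with no decomposition pass through untouched, so this does
# nothing to make the german sharp s sort like "ss".
print(strip_combiners("\u00df") == "\u00df")
```

a plain codepoint sort of the same list puts éclair after zebra.  the sharp s
case is the sort of thing that needs a locale.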

> it'd be nice to sort this issue out properly; surely it shouldn't be
> too hard?

i believe this is another entry for the "famous lies list," ranking somewhat
below "check's in the mail" and above "i have this friend who...."

- erik