From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Thu, 16 Jun 2011 14:17:00 +0200
From: tlaronde@polynum.com
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <20110616121700.GA9131@polynum.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.3i
Subject: [9fans] [RFC] fonts and unicode/utf [TeX]
Topicbox-Message-UUID: effb380c-ead6-11e9-9d60-3106f5b1d025

Hello,

For kerTeX, I'm currently exploring the area I have known least about
until now: fonts.

It seems that the TeX community has spent a huge amount of time, and
produced a huge number of tricks, trying to use fonts that have glyphs
the Computer Modern fonts lack, especially accented letters.

In 1990, D.E. Knuth wrote an article explaining the correct solution
(both simpler and more versatile): virtual fonts.

But since people had already spent so much time on the old approach,
they seem to have been reluctant to throw everything away, and we are
still dragging tons of data around and struggling with puzzling tricks,
just because of human nature...

So, to provide the easiest solution for now, while thinking about how to
adopt the simplest approach, the one established by Plan 9 (UTF), I'm
following these tracks.

Adobe has published the AFM (Adobe Font Metrics) files for the 35
standard base PostScript fonts (the fonts resident in a PS printer).
Starting from these AFM files, kerTeX will produce the corresponding
TFM, plus a virtual font.

A virtual font can combine several distinct fonts, and can furthermore
remap glyphs. Since TeX (for now) treats its input as a stream of
octets, the point is to map this input encoding to the correct glyphs.
One of the great features of the AFM is that each glyph is identified
by a literal ASCII name. The position of the glyph in the font, its
index, is not of great concern: the virtual font can take care of the
mapping (whereas using the TFM directly takes the input code as the
index).
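To make the name-based lookup concrete, here is a minimal Python sketch
(not kerTeX code) that reads the CharMetrics lines of an AFM file into a
table keyed by glyph name. The sample lines follow the AFM syntax
"C code ; WX width ; N name ; B llx lly urx ury ;", but the numbers are
illustrative, not copied from Adobe's files, and the parser only looks
at the C, WX and N keys. A code of -1 marks a glyph that the font's
built-in encoding leaves unencoded: it is still reachable by its name.

```python
# Illustrative AFM excerpt: the syntax is AFM's, the numbers are made up.
AFM_SAMPLE = """\
StartCharMetrics 3
C 97 ; WX 444 ; N a ; B 37 -10 442 460 ;
C 101 ; WX 444 ; N e ; B 25 -10 424 460 ;
C -1 ; WX 444 ; N eacute ; B 25 -10 424 678 ;
EndCharMetrics
"""


def parse_char_metrics(afm_text):
    """Return {glyph_name: (code, width)} from an AFM CharMetrics section.

    Only the C, WX and N keys are used; everything else is ignored.
    """
    metrics = {}
    inside = False
    for raw in afm_text.splitlines():
        line = raw.strip()
        if line.startswith("StartCharMetrics"):
            inside = True
            continue
        if line.startswith("EndCharMetrics"):
            break
        if not inside or not line:
            continue
        fields = {}
        for item in line.split(";"):      # "key value..." items separated by ';'
            parts = item.split()
            if parts:
                fields[parts[0]] = parts[1:]
        code = int(fields["C"][0])        # -1: not in the font's built-in encoding
        width = float(fields["WX"][0])
        metrics[fields["N"][0]] = (code, width)
    return metrics


metrics = parse_char_metrics(AFM_SAMPLE)
print(metrics["eacute"])  # (-1, 444.0)
```

The virtual font can then place eacute at whatever input slot the
encoding asks for; its original index plays no role.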

I have therefore extended the encoding used to generate the virtual
fonts so that in the ASCII range it matches the Computer Modern
expectations (hence it is fully compatible with plain TeX), and so that
latin1 input gives the correct glyphs. And the cryptic names will be
gone, because a (virtual) font will be loaded simply by calling
latin1/the_font.

Why latin1? Not only because, being French, I use it, but because it is
compatible with Unicode: the latin1 codes are exactly the first 256
Unicode code points.
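Concretely, since every latin1 octet value equals its Unicode code
point, converting latin1 input to UTF-8 needs no lookup table. A minimal
Python sketch (the function name is ours, for illustration only):

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode latin1 octets as UTF-8 without any table.

    Each latin1 octet value is its Unicode code point (U+0000..U+00FF),
    so ASCII stays one byte and 0x80..0xFF become two bytes.
    """
    out = bytearray()
    for cp in data:
        if cp < 0x80:
            out.append(cp)                  # 0xxxxxxx
        else:
            out.append(0xC0 | (cp >> 6))    # 110xxxxx: top 2 bits of cp
            out.append(0x80 | (cp & 0x3F))  # 10xxxxxx: low 6 bits of cp
    return bytes(out)


print(latin1_to_utf8(b"caf\xe9").hex())  # 636166c3a9
```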

First question: any feelings about this?

Second question: for Western languages, I'm trying to determine whether
it would be good to include ligatures for ae and oe, since they are
generally needed (one can always prevent a ligature by inserting "{}"
between the letters), or whether it is wrong to enable them by default
in the fonts that have these glyphs, since some Western languages
commonly use the ae or oe letter combinations without knowing about or
expecting the substitution.
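To show what is at stake, here is a toy Python model of such a rule. It
is a simplification of a TFM ligature program, not its actual
semantics; the table and the BOUNDARY marker standing for "{}" are our
assumptions. Note also that the oe glyph has no slot in latin1 at all,
so with latin1 input it can only be reached through a ligature or a
control sequence such as plain TeX's \oe.

```python
# Toy ligature table: (left glyph, right glyph) -> ligature glyph.
LIGATURES = {("a", "e"): "ae", ("o", "e"): "oe"}
BOUNDARY = None  # stands for an empty group "{}" between two letters


def apply_ligatures(glyphs):
    """Left-to-right pair substitution; a BOUNDARY blocks the pair."""
    out = []
    for g in glyphs:
        if out and (out[-1], g) in LIGATURES:
            out[-1] = LIGATURES[(out[-1], g)]  # replace the left glyph
        else:
            out.append(g)
    return [g for g in out if g is not BOUNDARY]  # boundaries print nothing


print(apply_ligatures(list("oeuvre")))   # ['oe', 'u', 'v', 'r', 'e']
print(apply_ligatures(list("Michael")))  # ['M', 'i', 'c', 'h', 'ae', 'l']
print(apply_ligatures(["M", "i", "c", "h", "a", BOUNDARY, "e", "l"]))
# ['M', 'i', 'c', 'h', 'a', 'e', 'l']
```

The "Michael" case is precisely the worry: an English name would
silently get the ae glyph unless the author types "a{}e".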

A future step could be taken in the following direction:

TeX is not limited to octet characters: in math mode it already uses
15-bit codes (non-negative wydes below 2^15). The position within a font
is still mapped to [0..255], but the whole number is used to switch
between fonts (to simplify: see math mode, \fam, \mathcode and so on).
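As described in The TeXbook, a math code combines a class (3 bits), a
family (4 bits) and a position (8 bits): 4096 * class + 256 * family +
position. A minimal Python sketch that unpacks one (the function name is
ours):

```python
def split_mathcode(code):
    """Unpack a TeX math code into (class, family, position)."""
    # The special value "8000 (2^15, "active") is outside this range.
    if not 0 <= code <= 0x7FFF:
        raise ValueError("a math code is a 15-bit value, 0 to 0x7FFF")
    math_class = code >> 12        # 3 bits: Ord, Op, Bin, Rel, ..., 7 = variable
    family = (code >> 8) & 0xF     # 4 bits: which of the 16 \fam fonts
    position = code & 0xFF         # 8 bits: slot within that font
    return math_class, family, position


print(split_mathcode(0x7161))  # (7, 1, 97): the letter a, family 1, slot 0x61
print(split_mathcode(0x202B))  # (2, 0, 43): '+', a binary operator in family 0
```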

Something similar could be done in the future to use a UTF-encoded TeX
file directly, using the rune (the Unicode code point) to select fonts
or subfonts.
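One possible, purely hypothetical scheme, mirroring the family/position
split of math codes: decode the UTF-8 input into runes, then use the
high bits of each rune to pick a subfont and the low 8 bits as the
position inside it, so that every subfont still fits in the 256 slots of
a TFM. The 256-rune block size and the function names are assumptions
for illustration, not a kerTeX or Plan 9 convention.

```python
def runes(data: bytes):
    """Decode UTF-8 input into runes (Unicode code points)."""
    return [ord(ch) for ch in data.decode("utf-8")]


def font_slot(rune):
    """Hypothetical split: (subfont index, position within subfont)."""
    return rune >> 8, rune & 0xFF


text = "\u00e9\u0153\u03a9"  # e-acute, oe ligature, capital Omega
for r in runes(text.encode("utf-8")):
    sub, slot = font_slot(r)
    print(f"U+{r:04X} -> subfont {sub}, slot 0x{slot:02X}")
# U+00E9 -> subfont 0, slot 0xE9
# U+0153 -> subfont 1, slot 0x53
# U+03A9 -> subfont 3, slot 0xA9
```

With this split, subfont 0 is exactly the latin1 range, so the latin1
virtual fonts would carry over unchanged.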

Cheers,
--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C