Date: Thu, 16 Jun 2011 14:17:00 +0200
From: tlaronde@polynum.com
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <20110616121700.GA9131@polynum.com>
Subject: [9fans] [RFC] fonts and unicode/utf [TeX]

Hello,

I'm currently exploring, for kerTeX, the area I know least about so far: fonts.

It seems that the TeX community has spent a huge amount of time, and produced a huge number of tricks, trying to use fonts with glyphs that Computer Modern lacks, especially accented letters.

In 1990, D.E. Knuth wrote an article explaining the correct solution, which is both simpler and more versatile: virtual fonts. But since people had already invested so much time, they seem to have been reluctant to throw everything away, and we are still dragging tons of data around and struggling with puzzling tricks, simply because of human nature...

So, trying to provide the easiest solution for now, and thinking about how to use the simplest approach Plan 9 has established, UTF, here is the track I'm following.

Adobe has published the AFM (Adobe Font Metrics) files for the 35 standard base PostScript fonts (the fonts resident in a PostScript printer). Starting from these AFM files, kerTeX will produce the corresponding TFM plus a virtual font. A virtual font can combine several distinct fonts and, furthermore, can remap glyphs. Since TeX (for now) reads its input as a stream of octets, the task is to map this input encoding to the correct glyphs.

One of the great features of the AFM format is that each glyph is identified by a literal ASCII name. The position of a glyph in the font (its index) is therefore of little concern: the virtual font can take care of the mapping, whereas using a TFM directly takes the input code itself as the glyph index.
I have therefore extended the encoding used to generate the virtual fonts so that, in the ASCII range, it matches what Computer Modern expects (hence it is fully compatible with plain TeX), and so that Latin-1 input yields the correct glyphs. The cryptic font names will be gone too, because a (virtual) font will be loaded by calling latin1/the_font.

Why Latin-1? Not only because, being French, I use it, but because it is compatible with Unicode: its 256 codes are exactly the first 256 Unicode code points (U+0000 to U+00FF).

First question: any feelings about this?

Second question: for Western languages, would it be good to include the ae and oe ligatures by default, since they are generally needed (one can prevent a ligature by inserting "{}" between the letters)? Or is it wrong to enable them by default for fonts that have these glyphs, since some Western languages commonly use the letter pairs ae or oe where no substitution is known or expected?

A future step could go in the following direction: TeX is not limited to octet characters, since for math it actually uses positive wydes (values up to 2^15). The low octet is always a position in [0..255] within a font, but the whole number is used to switch between fonts (to simplify: see math mode, \fam and so on). Something similar could be done in the future to use a TeX file encoded directly in UTF, using the rune to select fonts or subfonts.

Cheers,
-- 
Thierry Laronde
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C