From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 30 Jun 2011 19:00:48 +0200
From: tlaronde@polynum.com
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <20110630170048.GA999@polynum.com>
References: <20110627114856.GA7099@polynum.com>
	<9308c52f360f6274e0730399741278ce@ladd.quanstro.net>
	<20110627172006.GA497@polynum.com>
	<4E08DDDE.94AB.00CC.0@wlu.ca>
	<20110628111915.GA498@polynum.com>
	<4E0B804C.94AB.00CC.0@wlu.ca>
	<20110630130254.GA7276@polynum.com>
	<4E0C5549.94AB.00CC.0@wlu.ca>
	<20110630162524.GA442@polynum.com>
	<59e4d419ba69189bc467a330651c7044@ladd.quanstro.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <59e4d419ba69189bc467a330651c7044@ladd.quanstro.net>
User-Agent: Mutt/1.4.2.3i
Subject: Re: [9fans] [RFC] fonts and unicode/utf [TeX]
Topicbox-Message-UUID: f8adaeda-ead6-11e9-9d60-3106f5b1d025

On Thu, Jun 30, 2011 at 12:31:17PM -0400, erik quanstrom wrote:
> > I don't despise XeTeX. Nor Unicode. And I will take Unicode as is. But I
> > will take TeX conventions as is too, since I'm working on TeX, and not
> > another formatting system; since these conventions are confined to the
> > ASCII subrange and only diverging from ASCII for the not glyph
> > positions. I still fail to see what's the big deal?
>
> you can't have it both ways. you can't at the same time say tex is
> only defined for ascii, so utf-8 is a non sequitor, and at the same time
> put out a version of tex that takes latin1 input.

No, this is an error you and others are making. There is a distinction
between the input encoding (for the moment TeX expects only 8 bits) and
some conventions in the font organization.

The Computer Modern fonts provide the ASCII "visible" characters
(glyphs) in the ASCII positions. But there are other positions in the
0-127 range that are free. These positions are used "internally" by the
plain TeX conventions (TeX is the compiler/interpreter; tex(1) is the
interpreter having loaded a special set of conventions, the ones of
plain TeX; one can do almost totally without them, or do things totally
differently). These free (as far as a font is concerned) positions are
filled with non-ASCII characters/glyphs.

For example, in the text font layout, the 0x1a position holds the glyph
for \ae. If a user, using plain TeX, specifies \ae, the TFM constructed
will give the correct metrics for the glyph, and the dvi driver will
place the correct glyph.

This does not preclude the user from directly entering the Unicode
code point: in the TFM, if you want, the glyph information is
duplicated, in the conventional plain TeX position and as a literal in
the Unicode position. In this case, the glyph can be reached by the \ae
char definition, by the 0x1a code (ASCII control "sub"), or by the
0x00e6 Unicode code point.

This is not the input encoding; this is a font mapping.

--
Thierry Laronde
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
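
P.S. A minimal sketch of the font mapping described above, in Python for
brevity. Everything in it is hypothetical and for illustration only: the
Glyph type, the FONT and CHARDEF tables, the lookup helper, and the width
value are made up, and this is neither real TFM data nor any TeX API. It
only shows one glyph entry stored at two positions, and a char
definition that names one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    name: str
    width: float  # made-up metric, not taken from any real TFM


# A single glyph record for the "ae" ligature (width is invented).
AE = Glyph("ae", 0.722)

# Hypothetical font table indexed by position. The same record is
# duplicated: once at the plain TeX conventional position, once at the
# Unicode position.
FONT = {
    0x1A: AE,    # conventional position in the text font layout
    0x00E6: AE,  # Unicode code point for the same glyph
}

# Hypothetical char definitions: a control sequence names a position.
CHARDEF = {"\\ae": 0x1A}


def lookup(key):
    """Resolve a control sequence name or a numeric position to a glyph."""
    position = CHARDEF[key] if isinstance(key, str) else key
    return FONT[position]


if __name__ == "__main__":
    for key in ("\\ae", 0x1A, 0x00E6):
        print(key, "->", lookup(key))
```

All three keys resolve to the same record, whatever byte sequence the
input file used to get there; the input encoding is a separate layer
that this table does not describe.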