From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sat, 25 Jun 2011 08:50:17 +0200
From: tlaronde@polynum.com
To: Fans of the OS Plan 9 from Bell Labs <9fans@9fans.net>
Message-ID: <20110625065017.GA638@polynum.com>
References: <20110616121700.GA9131@polynum.com>
 <9556bc097d90b774c37c16af5a7c20eb@brasstown.quanstro.net>
 <20110619163458.GA424@polynum.com>
 <3c7e401c771bdd0d9bd8950ceb60eb9e@ladd.quanstro.net>
 <20110620111845.GA540@polynum.com>
 <76aac2169637c7af09dcd0b368aa0c7a@ladd.quanstro.net>
 <20110621105626.GA536@polynum.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.4.2.3i
Subject: Re: [9fans] [RFC] fonts and unicode/utf [TeX]
Topicbox-Message-UUID: f5a5c8c6-ead6-11e9-9d60-3106f5b1d025

On Fri, Jun 24, 2011 at 11:05:23PM +0000, Mauricio CA wrote:
>
> I found this text in TeX by Topic[1] that seems to support Quanstrom's
> idea. It describes how TeX reads input, and says it's done one line at
> a time (where it follows what the system defines as lines) and then for
> each line it first removes trailing spaces; then (possibly) adds a return
> to the end of the line; and then, since "computers may also differ in
> the character encoding (the most common schemes are ASCII and EBCDIC),
> so TeX converts the characters that are read from the file to its own
> character codes. These codes are then used exclusively [...]"

This is simply an extract of what is explained, partly in The TeXbook
and partly in TeX: The Program, two of the five volumes of D.E. Knuth's
Computers & Typesetting series.

The initial exchange of characters (external code to internal code) is,
shall we say, on the "system" level. But in the code it is limited to
the ASCII (7 bits) range (and even if virtex(1) is almost the bare
metal, it can only be bootstrapped by ASCII macro commands).
Furthermore, TeX is "8 bits clean", that is, it uses only 8 bits for
"text" input... and 8 bits as CID for fonts.
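The input steps quoted above (strip trailing spaces, append a return,
convert to internal codes) can be sketched roughly as follows. This is
only an illustration in Python, not the code of tex.web: the names
make_xord and read_line are hypothetical, and the table is assumed to
be the identity on printable ASCII, with every other byte sent to a
stand-in INVALID_CODE and a return (13) appended to each line.

```python
INVALID_CODE = 0x7F   # assumption: stand-in code for unmapped bytes
END_LINE_CHAR = 0x0D  # assumption: the "return" appended to each line


def make_xord():
    """Build a byte -> internal code table (hypothetical helper).

    Assumed mapping: identity on printable ASCII (32..126),
    INVALID_CODE for every other byte.
    """
    xord = [INVALID_CODE] * 256
    for c in range(0x20, 0x7F):
        xord[c] = c
    return xord


def read_line(raw, xord, end_line_char=END_LINE_CHAR):
    """Turn one raw input line into a list of internal codes."""
    line = raw.rstrip(b"\r\n")      # the system's notion of a line end
    line = line.rstrip(b" ")        # remove trailing spaces
    codes = [xord[b] for b in line]  # external byte -> internal code
    if end_line_char is not None:   # (possibly) add a return
        codes.append(end_line_char)
    return codes


if __name__ == "__main__":
    table = make_xord()
    print(read_line(b"\\relax   \n", table))
```

With such a table, a byte above 127 (a latin1 accented letter, say)
does not survive the exchange unless the table is extended, which is
the limitation to the ASCII range mentioned above.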
The exchange is defined at compilation time, but can also be remapped
via macro commands. So casting utf into 8 bits:

	- is useless for ASCII (by definition);
	- will work only for latin1 input.

Extending TeX to wydes (runes) would be relatively easy, superficially,
for input and output (because D.E.K. has organized the code so that
these parts can be easily changed), but it will not work with TeX
fonts: all the font machinery has to be changed. Furthermore, this will
not work, as is, with the whole Unicode range, since TeX is
"left-to-right" (but what is fundamental is that, all in all, with the
exception perhaps of Frege's ideography, all languages seem to be
linear; so a switch in TeX for the width and height of the computed
boxes, and hints for dvi drivers to flip/mirror, can achieve the task).
So this too has to be adapted (hence the suggestion for XeTeX).

So for now, TeX is kept 8 bits. I make no assumption about the
encoding: the user has to feed an "8 bits encoding" to TeX. ASCII users
have nothing to change; others, if they want to use another 8 bits
encoding directly (e.g. the latin1 codes for accented letters), have to
tcs(1) the file first.

What I will change concerns only the fonts available. For historical
reasons, the fonts derived from the standard PostScript ones were in
"EC" encoding, aka Cork, which maps mainly latin1 characters into the
128-255 range, but not at their latin1 positions (because it was
defined in 1990).

A macro set shall install its own expected fonts. KerTeX shall be
usable to its full extent (relative to its present state) with the data
kerTeX provides, here fonts. And to avoid providing non-D.E.K. fonts
under the same (cryptic) names as the ones commonly found in other TeX
distributions, the kerTeX ones will use a Unix feature, the directory
hierarchy, to express the dependencies: not an initial letter for the
font foundry, but a subdirectory: adobe/ etc.
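To illustrate the earlier point about casting utf into 8 bits, here is
a minimal Python sketch. It only stands in for the tcs(1) step; it is
not how tcs(1) or TeX is implemented, and the names to_8bit and
fits_8bit are hypothetical.

```python
def to_8bit(text):
    """Re-encode text as latin1, one byte per character.

    Raises UnicodeEncodeError when a character has no latin1 code,
    that is, when it lies above U+00FF.
    """
    return text.encode("latin-1")


def fits_8bit(text):
    """Report whether the text can be cast into 8 bits at all."""
    try:
        to_8bit(text)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    ascii_src = "\\hbox{plain}"
    # ASCII: the UTF-8 bytes and the 8 bits bytes are the same.
    print(ascii_src.encode("utf-8") == to_8bit(ascii_src))

    accented = "\u00e9t\u00e9"          # e-acute, t, e-acute
    print(len(accented.encode("utf-8")), len(to_8bit(accented)))

    print(fits_8bit("\u03b1"))          # Greek alpha: not latin1
```

ASCII text comes out unchanged, latin1 text shrinks to one byte per
character, and anything beyond latin1 cannot be cast into 8 bits.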
This does not prevent anyone from generating other flavours, especially
since everything is shown and explained in the directory layout and in
the conf/KERTEX.post-install Bourne shell script.
-- 
Thierry Laronde
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C