* [9fans] [RFC] fonts and unicode/utf [TeX] @ 2011-06-16 12:17 tlaronde 2011-06-16 16:49 ` Russ Cox ` (3 more replies) 0 siblings, 4 replies; 52+ messages in thread From: tlaronde @ 2011-06-16 12:17 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Hello, I'm currently exploring, for kerTeX, the area where I have had the least knowledge until now: fonts. It seems that the TeX community has spent a huge amount of time, and produced a huge amount of tricks, to try to use fonts that have glyphs that the Computer Modern fonts lack, especially accented letters. In 1990, D.E. Knuth wrote an article to explain the correct (both simplest and most versatile) solution: virtual fonts. But since people had spent a huge amount of time, it seems furthermore that they were reluctant to throw everything away, and we are still dragging tons of data and struggling with puzzling tricks just because of human nature... So, trying to give the easiest solution for now, and trying to think about what can be done to use Plan9's established, simplest way: utf, I'm following these tracks. Adobe has published the AFM for the 35 standard base fonts for PostScript (the fonts that are resident in a PS printer). Starting from these AFM, kerTeX will produce the corresponding TFM, plus a virtual font. A virtual font can combine several distinct fonts, and furthermore can map glyphs. Since TeX uses (for now) its input as a stream of octets, the deal is to map this input encoding to the correct glyphs. One of the great features of the AFM is that a glyph is described by a literal ASCII name. The position of the glyph, its index, in the font is not of great concern: the virtual font can take care of the mapping (whereas using the TFM directly takes the input code as the index). 
I have thus extended the encoding used to generate the virtual fonts so that for the ASCII range it matches the Computer Modern expectations (hence it is totally compatible with plain TeX), and so that the latin1 encoding used as input will give the correct glyphs. And the cryptic names will be gone, because loading the (virtual) font will be defined by calling latin1/the_font. Why latin1? Not only because, as a Frenchman, I use it, but because it is compatible with unicode. First question: any feelings about this? Second question: I'm trying to find out whether, in western languages, including ligatures for ae and oe would be good since they are generally needed (one can forbid ligatures by inserting "{}" between the letters), or whether it is not correct to set this by default for fonts (having the glyphs) since some western languages generally use the ae or oe combinations without knowing or expecting the substitution. A future step can be made in the following direction: TeX is not limited to octet characters, since for math it does use positive wydes (2^15). The code is always mapped to [0..255], but the whole number is used to switch between fonts (to simplify: see math mode, \fam and so on). Something like that could be done in the future, to use a TeX file directly, encoded in utf, using the rune to select fonts or subfonts. Cheers, -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
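The mapping idea in this message (input code to glyph name, glyph name to wherever the glyph sits in the real font) can be sketched as follows. This is only an illustration in Python, not kerTeX code: the slot numbers in RAW_FONT_SLOTS are made up, the function name build_virtual_map is hypothetical, and only three entries of the encoding are shown. The glyph names (A, eacute, ccedilla) are the usual Adobe ones.

```python
# Hypothetical excerpt of a font: glyph name -> slot in the raw font.
# The slot numbers are invented for the example.
RAW_FONT_SLOTS = {"A": 34, "eacute": 142, "ccedilla": 201}

# Input encoding: Latin-1 byte value -> Adobe glyph name (three entries only).
LATIN1_TO_GLYPH = {0x41: "A", 0xE9: "eacute", 0xE7: "ccedilla", 0xF0: "eth"}


def build_virtual_map(encoding, raw_slots):
    """Return input byte -> raw font slot, skipping glyphs the font lacks."""
    return {
        code: raw_slots[name]
        for code, name in encoding.items()
        if name in raw_slots
    }


virtual_map = build_virtual_map(LATIN1_TO_GLYPH, RAW_FONT_SLOTS)
for code in sorted(virtual_map):
    print(f"input 0x{code:02X} -> raw slot {virtual_map[code]}")
```

The point of the sketch is that the raw font is never re-encoded: only the table changes when the input encoding changes, and a glyph missing from the font ("eth" here) simply gets no entry.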
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-16 12:17 [9fans] [RFC] fonts and unicode/utf [TeX] tlaronde @ 2011-06-16 16:49 ` Russ Cox 2011-06-16 17:37 ` tlaronde 2011-06-16 17:43 ` tlaronde ` (2 subsequent siblings) 3 siblings, 1 reply; 52+ messages in thread From: Russ Cox @ 2011-06-16 16:49 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs Virtual fonts tricks can't be the correct solution. The correct solution is to use a font format that can handle >256 glyphs, such as OTF. This is what heirloom troff does. Failing that, it is not clear how much you want to hack up tex versus just going along to get along. For Latin alphabets, the Plan 9 tex iso has an extra style file called 'unicode.sty' that does some serious latex heroics to trick latex into interpreting UTF-8 byte sequences as their corresponding Latex equivalents. Russ ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-16 16:49 ` Russ Cox @ 2011-06-16 17:37 ` tlaronde 2011-06-16 18:43 ` Bakul Shah 0 siblings, 1 reply; 52+ messages in thread From: tlaronde @ 2011-06-16 17:37 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Thu, Jun 16, 2011 at 12:49:12PM -0400, Russ Cox wrote: > Virtual fonts tricks can't be the correct solution. Virtual fonts are not the whole solution. To accept, naturally, utf as input, TeX will have to be adapted (and it is perhaps not as deep as one could think). But virtual fonts can use fonts where "glyphes" are not organised conforming to unicode, leaving the fonts untouched. That's where the present situation seems not optimal, since afm2tfm(1) is used to even reencode the PostScript fonts. > The correct solution is to use a font format that > can handle >256 glyphs, such as OTF. > This is what heirloom troff does. > > Failing that, it is not clear how much you want to > hack up tex versus just going along to get along. > For Latin alphabets, the Plan 9 tex iso has an extra > style file called 'unicode.sty' that does some serious > latex heroics to trick latex into interpreting UTF-8 > byte sequences as their corresponding Latex > equivalents. See above. But as always, the first step is to simplify things so that the bottlenecks are clear. That's what I'm presently doing, and that's why the Bourne shell conf/KERTEX_T.post-install generates everything (while compiled fonts are portable for example, and I could simply provide the result for download): so that a user can see "how it is done"---even if nobody cares. Tracking the current acrobatics done between PostScript fonts (or others), encoding, tex macro. and so on is puzzling to say things charitably. And trying to understand how it is supposed to work by scrutinizing the current state is definitively not the best path (and I suspect that this is really the "wizardry": deceiving by complexity to hide a simple reality). 
I seem to recall reading (by a cursory look) about subfonts in Plan9, precisely for fonts not describing the whole unicode range. Modifying TeX to accept utf as input (I mean the compiler/interpreter by itself; not macros), converting to rune and then using 16 bits à la math mode to switch inside a font family to the "correct" 256 vector is something that, for a first step, seems to me both reasonable and simple. I think I have even a solution to handle right to left, top to bottom, bottom to top (??), and mixing these inside a page... but this is not on the top of the stack (and TeX by itself would be lightly touched; the core will be put in the font format and the dvi drivers). One of the best "feature" of the TeX package is... METAFONT. For a mathematician; for a philologist etc. the ability to create signs is, to my never humble opinion, a must. And for example, D.E. Knuth math fonts have both +/- and -/+ glyphes. This is where you can see the mathematician touch. I have old (19th century) main math textbooks where these are used to explain vertical alternatives in a "linear" equation, and the order matters. troff(1) combined with eqn(1) etc. gives already a superb formatting medium. But it does not provide the designing of fonts... So the TeX system has to be adapted. -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
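The "16 bits à la math mode" idea above, where the high part of the rune selects a 256-glyph vector and the low part indexes into it, can be sketched like this. It is an illustration in Python under the assumption of a plain high-byte/low-byte split; the name split_rune is hypothetical, and neither TeX's \fam mechanism nor Plan 9 subfont files necessarily cut the range at exactly these boundaries.

```python
def split_rune(rune):
    """Split a 16-bit code point into (vector, slot).

    vector selects one 256-glyph font (or subfont) inside a family;
    slot is the 8-bit index that an unmodified 8-bit font can use.
    """
    if not 0 <= rune <= 0xFFFF:
        raise ValueError(f"rune {rune:#x} does not fit in 16 bits")
    return rune >> 8, rune & 0xFF


for ch in ("A", "\u00e9", "\u0153", "\u20ac"):
    vector, slot = split_rune(ord(ch))
    print(f"U+{ord(ch):04X} -> vector {vector:#04x}, slot {slot:#04x}")
```

With this split, everything in the ASCII and Latin-1 ranges lands in vector 0, so existing 8-bit fonts would keep working unchanged.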
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-16 17:37 ` tlaronde @ 2011-06-16 18:43 ` Bakul Shah 2011-06-16 19:20 ` tlaronde 0 siblings, 1 reply; 52+ messages in thread From: Bakul Shah @ 2011-06-16 18:43 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs > Modifying TeX to accept utf as input (I mean the compiler/interpreter by > itself; not macros), converting to rune and then using 16 bits à la math > mode to switch inside a font family to the "correct" 256 vector is > something that, for a first step, seems to me both reasonable and > simple. What about XeTeX? It is a merge of TeX with Unicode and modern font tech. Works with OpenType Fonts. Included in TeX Live among others. I can use XeTeX with TeXShop & TeXWorks. I am just a user so don't know how hard it would be to port but seems like it is widely used now. See http://scripts.sil.org/xetex Some more examples @ http://nitens.org/taraborelli/latex ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-16 18:43 ` Bakul Shah @ 2011-06-16 19:20 ` tlaronde 0 siblings, 0 replies; 52+ messages in thread From: tlaronde @ 2011-06-16 19:20 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Thu, Jun 16, 2011 at 11:43:28AM -0700, Bakul Shah wrote: > > Modifying TeX to accept utf as input (I mean the compiler/interpreter by > > itself; not macros), converting to rune and then using 16 bits à la math > > mode to switch inside a font family to the "correct" 256 vector is > > something that, for a first step, seems to me both reasonable and > > simple. > > What about XeTeX? It is a merge of TeX with Unicode and > modern font tech. Works with OpenType Fonts. I will give it a look. The decision will depend on: 1) The licence: if it is GPL, I will not touch it even with a long spoon... 2) If the core modifications are separated enough from the kpathsea and so on dance. 3) The nature of the solution. There is another program I have to give it a look: John Hobby has given me the information about an evolution of his MetaPost. It is original AT&T and LGPL, so for the licence it's OK. For the modifications, I will have to look. So just to say that I'm not discarding existing solutions by principle. If XeTeX does answer correctly---to my taste---to the problem, why not? But since there is now almost only the lost needle in kerTeX, I will not add back hay. And just for the record once more: LaTeX can work with kerTeX; so even the unicode.sty hack Russ Cox wrote about can work with kerTeX. User has all the rope he can dream of... -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-16 12:17 [9fans] [RFC] fonts and unicode/utf [TeX] tlaronde 2011-06-16 16:49 ` Russ Cox @ 2011-06-16 17:43 ` tlaronde 2011-06-17 14:18 ` Joel C. Salomon 2011-06-19 14:07 ` erik quanstrom 3 siblings, 0 replies; 52+ messages in thread From: tlaronde @ 2011-06-16 17:43 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Thu, Jun 16, 2011 at 02:17:00PM +0200, tlaronde@polynum.com wrote: >[...] > Second question: I'm trying to find if, in western languages, including > ligatures for ae and oe would be good since it is generally needed (one > can forbid ligatures by inserting "{}" between the letters), or if it's > not correct to set this by default for fonts (having the glyphes) since > some western languages use generally the ae or oe combinations without > knowing or expecting the substitution. Answering myself: the "co" prefix---coexist etc.---implies that an automatic "oe" ligature would be a mistake. And even if "ae" in accented French seems to be correct, it is not the case in English, for example: "aerial" would definitely not benefit from the ligature. So neither of the two shall be an automatic ligature. -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-16 12:17 [9fans] [RFC] fonts and unicode/utf [TeX] tlaronde 2011-06-16 16:49 ` Russ Cox 2011-06-16 17:43 ` tlaronde @ 2011-06-17 14:18 ` Joel C. Salomon 2011-06-17 15:37 ` tlaronde 2011-06-19 14:07 ` erik quanstrom 3 siblings, 1 reply; 52+ messages in thread From: Joel C. Salomon @ 2011-06-17 14:18 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Thu, Jun 16, 2011 at 8:17 AM, <tlaronde@polynum.com> wrote: > Second question: I'm trying to find if, in western languages, including > ligatures for ae and oe would be good since it is generally needed (one > can forbid ligatures by inserting "{}" between the letters), or if it's > not correct to set this by default for fonts (having the glyphes) since > some western languages use generally the ae or oe combinations without > knowing or expecting the substitution. Unicode has part of the answer: Some ligatures (e.g., U+FB03 “ffi”) are “Presentation Forms”, i.e., not “real” characters but alternate visual presentations of the the comprising characters. Those are (usually) OK to generate automatically. But “ae”≠“æ” and “oe”≠“œ”, &c.—please don’t make these substitutions Unicode doesn’t have many of these Presentation Forms, and only includes them for round-trip compatibility with other code sets that have them. New ligatures of this sort are *very* unlikely to be added. Better solution, as Russ suggested, is OpenType. In that format, the font designer can include common (e.g., “ff”), historical (e.g., “st” & “ſs”), and even ad-hoc ligatures. (There are “fun” fonts with “LOL” ligatures, for example.) Different sets of ligatures can be enabled/disabled by selecting combinations of OTF features. At which point you’ve reinvented XɘTeX. —Joel ^ permalink raw reply [flat|nested] 52+ messages in thread
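The distinction drawn in this message, between presentation-form ligatures and letters such as æ and œ, is visible in the Unicode data itself. A small Python sketch using the standard unicodedata module: compatibility normalization (NFKD) takes U+FB03 apart into its component letters, but leaves æ and œ alone because they have no decomposition. The helper name expand is an assumption made for the example.

```python
import unicodedata


def expand(ch):
    """Return the compatibility decomposition (NFKD) of one character."""
    return unicodedata.normalize("NFKD", ch)


for ch in ("\ufb03", "\u00e6", "\u0153"):
    name = unicodedata.name(ch)
    result = expand(ch)
    status = "decomposes to " + result if result != ch else "is a letter of its own"
    print(f"U+{ord(ch):04X} {name}: {status}")
```

This only shows which way the Unicode tables go; it says nothing about which ligatures a given font provides, which is the OpenType side of the argument.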
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-17 14:18 ` Joel C. Salomon @ 2011-06-17 15:37 ` tlaronde 2011-06-17 18:07 ` Joel C. Salomon 2011-06-19 14:21 ` erik quanstrom 0 siblings, 2 replies; 52+ messages in thread From: tlaronde @ 2011-06-17 15:37 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Fri, Jun 17, 2011 at 10:18:20AM -0400, Joel C. Salomon wrote: >[...] > OK to generate automatically. But “ae”≠“æ” and “oe”≠“œ”, &c.—please > don’t make these substitutions I have already found (and answered) that "oe" can not be a ligature since, even in french, the "oe" sequence appears in words that do not want the substitution ("coefficient"), and "ae" is rare enough and not even a regular rule (there are greek words [in french] that do not want it; and even from latin, it is not regular). >[...] > At which point you’ve reinvented XɘTeX. I've given a look at it. I don't want to start a discussion about Unicode, since, supplementary to the "characters" (alphabetical, syllabics, ideographics; but no hieroglyphes or Linear B, so it's not complete ;) there are formatting commands or rendering (the ligature fi is not a character; but in the XeTeX FAQ it is said user has to insert directly the Unicode for this codepoint since there is no ligature), that I don't think should be there (only the historical ASCII controls should be there; others should be undefined). But for XeTeX and Plan9 there is a special point: XeTeX uses some C++. As I have answered privately to someone, it is not an absolute obstacle---the files are not very numerous so a C flavour could be achieved. But if people start throwing me XeTeX in the legs, I will start crying for a C++ compiler on Plan9... No, no: I don't make threats! -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-17 15:37 ` tlaronde @ 2011-06-17 18:07 ` Joel C. Salomon 2011-06-17 18:37 ` tlaronde 2011-06-19 14:21 ` erik quanstrom 1 sibling, 1 reply; 52+ messages in thread From: Joel C. Salomon @ 2011-06-17 18:07 UTC (permalink / raw) To: 9fans On 06/17/2011 11:37 AM, tlaronde@polynum.com wrote: > On Fri, Jun 17, 2011 at 10:18:20AM -0400, Joel C. Salomon wrote: >> At which point you've reinvented XeTeX. > > I've given a look at it. I don't want to start a discussion about > Unicode, since, supplementary to the "characters" <snip> > there are formatting commands or rendering <snip> > that I don't think should be there (only the historical ASCII controls > should be there; others should be undefined). Ignore 'em. Or map them to TeX control sequences. > but no hieroglyphes or Linear B, so it's not complete ;) The fonts may be lacking, but Hieroglyphs & Linear B *are* in Unicode; see <alanwood.net/unicode/egyptian-hieroglyphs.html> and <alanwood.net/unicode/linear_b_syllabary.html>. > (the ligature fi > is not a character; but in the XeTeX FAQ it is said user has to insert > directly the Unicode for this codepoint since there is no ligature), That's true for TeX's "--" and "---" pseudo-ligatures; the XeTeX way is to insert the Unicode en- & em-dashes, or to use the "tex-text" font mapping. But for "fi" &c., or the more exotic ones, XeTeX will use whatever ligatures the font's designer has put into the OTF file. (Also be aware that the XeTeX FAQ on the SIL site is *seriously* out-of-date.) > But for XeTeX and Plan9 there is a special point: XeTeX uses some C++. > As I have answered privately to someone, it is not an absolute > obstacle---the files are not very numerous so a C flavour could be > achieved. > > But if people start throwing me XeTeX in the legs, I will start crying > for a C++ compiler on Plan9... A C version of the PDF library XeTeX uses to translate its "extended" XDVI format to PDF would be interesting. 
C++, though.... No, I'll not reopen that can of worms today. --Joel ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-17 18:07 ` Joel C. Salomon @ 2011-06-17 18:37 ` tlaronde 0 siblings, 0 replies; 52+ messages in thread From: tlaronde @ 2011-06-17 18:37 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Fri, Jun 17, 2011 at 02:07:42PM -0400, Joel C. Salomon wrote: > On 06/17/2011 11:37 AM, tlaronde@polynum.com wrote: >[...] > > but no hieroglyphes or Linear B, so it's not complete ;) > > The fonts may be lacking, but Hieroglyphs & Linear B *are* in Unicode; > see <alanwood.net/unicode/egyptian-hieroglyphs.html> and > <alanwood.net/unicode/linear_b_syllabary.html>. I stand corrected (and this is why I think METAFONT is great: the ability to create "easily" what would be too expensive due to small audience). I do think that if Hilbert had had METAFONT to give to one of his students at Göttingen, he would not have plagued mathematics with gothic... >[...] > > A C version of the PDF library XeTeX uses to translate its "extended" > XDVI format to PDF would be interesting. C++, though.... No, I'll not > reopen that can of worms today. I'm definitively not a C++ fan, so it's a pure threat. For now (I mean kerTeX 1.0) I will go the farthest I can go with 8bit TeX and simplicity, and try to gather enough knowledge around fonts so that I can decide after where to invest my limited amount of time. Not in the thread about mouse vs keyboard, I guess. -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-17 15:37 ` tlaronde 2011-06-17 18:07 ` Joel C. Salomon @ 2011-06-19 14:21 ` erik quanstrom 1 sibling, 0 replies; 52+ messages in thread From: erik quanstrom @ 2011-06-19 14:21 UTC (permalink / raw) To: 9fans > I've given a look at it. I don't want to start a discussion about > Unicode, since, supplementary to the "characters" (alphabetical, > syllabics, ideographics; but no hieroglyphes or Linear B, so it's not > complete ;) not central to my point, but this is not correct ; grep -i 'linear b syllable b008' /lib/unicode 010000 linear b syllable b008 a ; grep -i 'egyptian hieroglyph a001' /lib/unicode 013000 egyptian hieroglyph a001 > there are formatting commands or rendering (the ligature fi > is not a character; but in the XeTeX FAQ it is said user has to insert > directly the Unicode for this codepoint since there is no ligature), > that I don't think should be there (only the historical ASCII controls > should be there; others should be undefined). the general idea behind unicode is that it is a sequenced collection of codepoints, not characters. this implies that formatting differences such as ligatures that have not sematic component (typesetting artifacts, if you will) shouldn't be encoded in the character set. i realize there are some exceptions to this, but imho, the unicode committee are not perfect. it's easy enough to escape non-codepoints or encode them in one of the private unicode ranges. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
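For readers without a Plan 9 /lib/unicode at hand, the same lookup as the two grep commands above can be approximated with Python's standard unicodedata module. This is a sketch, not a replacement for the Plan 9 file; the function name describe is hypothetical and the output format merely imitates the /lib/unicode lines quoted in the message.

```python
import unicodedata


def describe(codepoint):
    """Format one code point roughly the way /lib/unicode lists it."""
    return f"{codepoint:06x} {unicodedata.name(chr(codepoint)).lower()}"


# The two code points quoted in the message above.
for cp in (0x10000, 0x13000):
    print(describe(cp))
```

Both code points lie above U+FFFF, so they also show why a 16-bit wide character is not enough to hold every rune.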
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-16 12:17 [9fans] [RFC] fonts and unicode/utf [TeX] tlaronde ` (2 preceding siblings ...) 2011-06-17 14:18 ` Joel C. Salomon @ 2011-06-19 14:07 ` erik quanstrom 2011-06-19 16:34 ` tlaronde 3 siblings, 1 reply; 52+ messages in thread From: erik quanstrom @ 2011-06-19 14:07 UTC (permalink / raw) To: 9fans > I have so extended the encoding used to generate the virtual fonts so > that for the ASCII range it matches the Computer Modern expectations > (hence it is totally compatible with plain TeX), and so that the latin1 > encoding used as input will give the correct glyphes. And the cryptic > names will be gone, because loading the (virtual) font will be defined > by calling latin1/the_font. > > Why latin1? Not only because, as a French, I use it, but because it is > compatible with unicode. perhaps you mean the subset of unicode corresponding to the codepoints encoded by latin1 encoded in utf-8. the system character set is utf-8, and latin1 is not a compatible encoding. utf-8 is assumed everywhere except when the data is inbound, and explicitly tagged as having a different character set. programs like upas/fs and webfs do the conversion at the border. there's really no reason for latin1 in 2011. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-19 14:07 ` erik quanstrom @ 2011-06-19 16:34 ` tlaronde 2011-06-19 18:01 ` tlaronde 2011-06-19 22:38 ` erik quanstrom 0 siblings, 2 replies; 52+ messages in thread From: tlaronde @ 2011-06-19 16:34 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sun, Jun 19, 2011 at 10:07:19AM -0400, erik quanstrom wrote: > > > > Why latin1? Not only because, as a French, I use it, but because it is > > compatible with unicode. > > perhaps you mean the subset of unicode corresponding to the codepoints > encoded by latin1 encoded in utf-8. the system character set is utf-8, > and latin1 is not a compatable encoding. utf-8 is assumed everwhere except > when the data is inbound, and explicitly tagged as having a different > caracter set. programs like upas/fs and webfs do the conversion at the > border. > > there's really no reason for latin1 in 2011. There is a reason here: for now, TeX is 8 bits and that's all. So, if allowing to use, at least, all of the 8 bits means something, it shall be latin1. This does not prevent somebody to use whatever character set one wants; but as a default, and _for now_, it's better than nothing; and significantly better than some random character set that no tcs(1) will know how to deal with. To accept directly utf-8 as input will not be addressed for the 1.0 release of kerTeX. And if people think that I'm too slow: be my guest. I claim it is easier to tackle the task with kerTeX, than with TeXlive. -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-19 16:34 ` tlaronde @ 2011-06-19 18:01 ` tlaronde 2011-06-19 22:38 ` erik quanstrom 1 sibling, 0 replies; 52+ messages in thread From: tlaronde @ 2011-06-19 18:01 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sun, Jun 19, 2011 at 06:34:58PM +0200, tlaronde wrote: > > There is a reason here: for now, TeX is 8 bits and that's all. So, if > allowing to use, at least, all of the 8 bits means something, it shall > be latin1. To be more accurate: TeX is 8 bits, and wants ASCII for the first semi-range. The Computer Modern are ASCII (plus, in control positions, ligatures and so on). The PostScript standard fonts have all latin1. Hence, by default, the fonts built from the PostScript core fonts shall be with a latin1 encoding, since this is the best that can be done, with the glyphes in the font on one side, and the 8 bits capabilities of TeX on the other. Other encoding for TeX is possible too (in a 256 glyphes limit), but by default I will provide latin1 for the fonts built from Adobe afm. -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-19 16:34 ` tlaronde 2011-06-19 18:01 ` tlaronde @ 2011-06-19 22:38 ` erik quanstrom 2011-06-20 11:18 ` tlaronde 1 sibling, 1 reply; 52+ messages in thread From: erik quanstrom @ 2011-06-19 22:38 UTC (permalink / raw) To: 9fans > > perhaps you mean the subset of unicode corresponding to the codepoints > > encoded by latin1 encoded in utf-8. the system character set is utf-8, > > and latin1 is not a compatable encoding. utf-8 is assumed everwhere except > > when the data is inbound, and explicitly tagged as having a different > > caracter set. programs like upas/fs and webfs do the conversion at the > > border. > > > > there's really no reason for latin1 in 2011. > > There is a reason here: for now, TeX is 8 bits and that's all. So, if > allowing to use, at least, all of the 8 bits means something, it shall > be latin1. This does not prevent somebody to use whatever character set > one wants; but as a default, and _for now_, it's better than nothing; > and significantly better than some random character set that no tcs(1) > will know how to deal with. > > To accept directly utf-8 as input will not be addressed for the 1.0 > release of kerTeX. i think you've missed my point. latin1 is an encoding, utf-8 is an encoding. if tex is so backwards that it can't accept a character wider than 8 bits, then it would be reasonable to not be different than the rest of the plan 9 system to read utf 8 runes (i.e. not latin1) in and then reject runes with a codepoint above 255. then, if tex is fixed to accept larger codepoints, one can remove this limit. if latin1 is used, then it can not be retrofitted in a way that is compatable with older tex input. nobody cares what font encoding tex uses internally. the real issue is the input to tex. i sure would be very reluctant to load anything on my system that will mangle utf-8, especially for codepoints <256. that's the path to wchar_t. 
- erik ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-19 22:38 ` erik quanstrom @ 2011-06-20 11:18 ` tlaronde 2011-06-20 21:53 ` erik quanstrom 0 siblings, 1 reply; 52+ messages in thread From: tlaronde @ 2011-06-20 11:18 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Sun, Jun 19, 2011 at 06:38:59PM -0400, erik quanstrom wrote: > > nobody cares what font encoding tex uses internally. the > real issue is the input to tex. i sure would be very reluctant > to load anything on my system that will mangle utf-8, especially > for codepoints <256. that's the path to wchar_t. That TeX on Plan9 should accept utf-8 is not a question. But TeX has a present state, and kerTeX has a present state. For now, TeX only chews bytes (octets); there is apparently some acrobatics with a LaTeX macro set trying to accomodate with utf in input (according to Russ Cox if I understood correctly what he wrote). One can use TeX with utf as long as one uses only ASCII (by design/definition of utf). That is one can use TeX in interactive mode on Plan9 conforming to the TeXbook, since the TeXbook uses ASCII, even to create non ASCII glyphes (accented with escape sequences). TeX will do non desired things if it chews non ASCII encoded in utf (and this starts even with the Unicode-latin1 range). BUT, since the "codepoints" described in the latin1 subrange are present (except for /dcroat and /Dcroat) in the 229 glyphes PostScript Core fonts, and I can create fonts (tfm) for TeX covering "ASCII/latin1" characters, this allows people using this more wide (even if limited) range, to enter the text on Plan9; to use tcs(1) to convert this range to latin1 i.e. 8 bits encoding, and to feed (not interactive) this file to TeX. This adds, for now (and for others than Plan9 that still use chars == octets) some supplementary ability, without removing something. I have to make a choice. 
YES, "latin1" is no less of a special case than non-ASCII in utf; but the glyphs are there (in the PS core fonts); its codes have the same values as the corresponding Unicode codepoints; so it seems more natural to choose this than any other _for now_. Paris was not built in a day. Neither was kerTeX. -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
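What "TeX will do non desired things" means for non-ASCII utf input, and what the conversion step before TeX buys, can be shown at the byte level. The sketch below is Python, not TeX or tcs; it assumes an 8-bit program that treats each input byte as one Latin-1-encoded glyph, and both function names are hypothetical.

```python
def as_seen_by_8bit_tex(text):
    """Encode text as UTF-8, then read every byte back as one Latin-1 glyph."""
    return text.encode("utf-8").decode("latin-1")


def after_conversion(text):
    """Convert to Latin-1 first (the conversion step), then read byte by byte."""
    return text.encode("latin-1").decode("latin-1")


word = "\u00e9t\u00e9"  # the French word for "summer"
print(len(word.encode("utf-8")), "bytes in UTF-8,",
      len(word.encode("latin-1")), "bytes in Latin-1")
print("unconverted:", as_seen_by_8bit_tex(word))
print("converted:  ", after_conversion(word))
```

Pure ASCII text comes out the same either way, which is why files written according to the TeXbook are unaffected.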
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-20 11:18 ` tlaronde @ 2011-06-20 21:53 ` erik quanstrom 2011-06-21 10:56 ` tlaronde 0 siblings, 1 reply; 52+ messages in thread From: erik quanstrom @ 2011-06-20 21:53 UTC (permalink / raw) To: 9fans On Mon Jun 20 07:17:16 EDT 2011, tlaronde@polynum.com wrote: > On Sun, Jun 19, 2011 at 06:38:59PM -0400, erik quanstrom wrote: > > > > nobody cares what font encoding tex uses internally. the > > real issue is the input to tex. i sure would be very reluctant > > to load anything on my system that will mangle utf-8, especially > > for codepoints <256. that's the path to wchar_t. > > That TeX on Plan9 should accept utf-8 is not a question. But TeX has a > present state, and kerTeX has a present state. i'm not sure what the hard part is. just front the normal input function with one that calls chartorune and rejects anything above codepoint 255. that can't be more than 10 lines of code. that way there is no possibility of latin1 nonsense breaking previously- functional .tex files, and you don't have to change any assumptions in the code. (it might be better later on to operate directly on utf-8 rather than some sort of wide character format like a rune, but that can't break existing .tex files.) - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
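The front end proposed in this message is indeed small. The following is a Python sketch of the idea, not Plan 9 C and not kerTeX code: it stands in for a chartorune loop by hand-decoding UTF-8, using the fact that codepoints up to 255 never need more than two bytes (lead byte 0xC2 or 0xC3). The function name utf8_to_tex_bytes and the choice to raise an error on rejection are assumptions.

```python
def utf8_to_tex_bytes(data):
    """Decode UTF-8 input and return one byte per codepoint.

    Codepoints above 255, and malformed sequences, are rejected.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # ASCII: a single byte, passed through unchanged.
            out.append(b)
            i += 1
        elif (b in (0xC2, 0xC3) and i + 1 < len(data)
              and 0x80 <= data[i + 1] <= 0xBF):
            # Two-byte sequence covering U+0080..U+00FF.
            out.append(((b & 0x1F) << 6) | (data[i + 1] & 0x3F))
            i += 2
        else:
            raise ValueError(f"byte {b:#04x} at offset {i}: "
                             "codepoint above 255 or malformed UTF-8")
    return bytes(out)


print(utf8_to_tex_bytes("caf\u00e9".encode("utf-8")))
```

Existing ASCII-only .tex files pass through byte for byte, and lifting the 255 limit later would not change how any accepted file is read, which is the compatibility argument made above.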
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-20 21:53 ` erik quanstrom @ 2011-06-21 10:56 ` tlaronde 2011-06-24 23:05 ` Mauricio CA 0 siblings, 1 reply; 52+ messages in thread From: tlaronde @ 2011-06-21 10:56 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Mon, Jun 20, 2011 at 05:53:25PM -0400, erik quanstrom wrote: > > i'm not sure what the hard part is. just front the normal input > function with one that calls chartorune and rejects anything above > codepoint 255. that can't be more than 10 lines of code. > > that way there is no possibility of latin1 nonsense breaking previously- > functional .tex files, and you don't have to change any assumptions > in the code. (it might be better later on to operate directly on utf-8 > rather than some sort of wide character format like a rune, but that > can't break existing .tex files.) Yes, "casting" to byte can do and this is almost trivial since the input is buffered and handled via libweb (in kerTeX). But this will disallow use of TeX for non ASCII, non latin1... It seems to me better to document, and let user convert his files via tcs(1) to feed TeX. Alternative solution would be to introduce some TEX_ENCODING env variable to let input/output in TeX doing the conversion. But on Plan9 this seems to me simply ugly... to reintroduce by the window what was thrown out by the door... To be noted that at the moment I do not change _anything_ in the TeX code. The "latin1" is just the "encoding" of the fontes derived from the PS core ones (the same can be made with Computer Modern via virtual fonts to allow to the use directly of accented letters). -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: Mauricio CA @ 2011-06-24 23:05 UTC
To: 9fans

>> i'm not sure what the hard part is. just front the normal input
>> function with one that calls chartorune and rejects anything above
>> codepoint 255. that can't be more than 10 lines of code. [...]

> Yes, "casting" to byte can do and this is almost trivial since the
> input is buffered and handled via libweb (in kerTeX). But this will
> disallow use of TeX for non ASCII, non latin1... It seems to me better
> to document, and let user convert his files via tcs(1) to feed TeX.
> [...]

I found this text in TeX by Topic[1] that seems to support Quanstrom's
idea. It describes how TeX reads input: one line at a time (following
whatever the system defines as a line). For each line it first removes
trailing spaces; then (possibly) adds a return to the end of the line;
and then, since "computers may also differ in the character encoding
(the most common schemes are ASCII and EBCDIC), so TeX converts the
characters that are read from the file to its own character codes.
These codes are then used exclusively [...]"

So it seems to be expected that an encoding-specific transformation is
applied to TeX input. Removing trailing spaces, at least, can't be done
without understanding utf-8. (I warn, though, that I have no expertise
in this subject.)

Best,
Maurício

[1] http://eijkhout.net/texbytopic/texbytopic.html. I got a ready to
use PDF at http://tex.loria.fr/general/texbytopic.pdf. What I describe
is found at section 2.2.
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-25 6:50 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Fri, Jun 24, 2011 at 11:05:23PM +0000, Mauricio CA wrote:
>
> I found this text in TeX by Topic[1] that seems to support Quanstrom's
> idea. It describes how TeX reads input, and says it's done one line at
> a time (where it follows what the system defines as lines) and then for
> each line it first removes trailing spaces; then (possibly) adds a
> return to the end of the line; and then, since "computers may also
> differ in the character encoding (the most common schemes are ASCII and
> EBCDIC), so TeX converts the characters that are read from the file to
> its own character codes. These codes are then used exclusively [...]"

This is simply an extract of what is explained partly in The TeXbook
and in TeX: The Program, two of the five volumes of D.E. Knuth's series
on computer typesetting.

The initial exchange between character codes happens, shall we say, at
the "system" level. But in the code it is limited to the ASCII (7-bit)
range (and even if virtex(1) is almost the bare metal, it can only be
bootstrapped by ASCII macro commands); furthermore, TeX is "8-bit
clean", that is, for "text" it uses only 8 bits for input... and as the
character index (CID) for fonts. The exchange is defined at compile
time, but can also be remapped via macro commands.

So casting utf to 8 bits:
- is useless for ASCII (by definition);
- will work only for latin1 input.

Extending TeX to wydes (runes) will be relatively easy, superficially,
for input and output (because D.E.K. has organized the code so that
these parts can be changed easily), but will not work with TeX fonts:
all the font machinery has to be changed. Furthermore, this will not
work, as is, with the whole Unicode range, since TeX is "left-to-right"
(but what is fundamental is that, all in all, with the exception
perhaps of Frege's ideography, all languages seem to be linear; so a
switch in TeX for the width and height of the computed boxes, and hints
for dvi drivers to flip/mirror, can achieve the task). So this also has
to be adapted (hence the suggestion of XeTeX).

So for now, TeX is kept 8 bits. I make no assumption about the
encoding: the user has to feed an "8-bit encoding" to TeX. ASCII users
have nothing to change; others, if they want to use another 8-bit
encoding directly (e.g. accented letters typed directly as latin1
codes), have to tcs(1) the file first.

What I will change is only the set of fonts available. For historical
reasons, the fonts derived from the standard PostScript ones were in
the "EC" encoding, aka Cork, which places mainly latin1 characters in
the 128-255 range, but not at their latin1 positions (because it was
defined in 1990).

A macro set shall install the fonts it expects. KerTeX shall be usable
to its full extent (relative to its present state) with the data
KerTeX provides, here the fonts. And to avoid providing fonts that are
not D.E.K.'s under the same (cryptic) names as the ones commonly found
in other TeX distributions, the kerTeX ones will use a Unix feature,
the directory hierarchy, to express the dependencies: not an initial
letter for the font foundry, but a subdirectory: adobe/ etc.

This does not prevent anyone from generating other flavours, especially
because everything is shown and explained by the directory layout and
by the conf/KERTEX.post-install Bourne shell script.
-- 
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: erik quanstrom @ 2011-06-25 12:19 UTC
To: 9fans

> So for now, TeX is kept 8 bits. I make no assumption for the encoding
> (and user has to feed "8 bits encoding" to TeX; ASCII users have nothing
> to change; others, if they want to use directly another 8 bits encoding
> (ex.: directly accented letters latin1 code) have to tcs(1) the file
> first.

i am not clear on what "the file" means in this context. do you mean
the .tex input file or font files?

- erik
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-25 15:03 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Sat, Jun 25, 2011 at 08:19:40AM -0400, erik quanstrom wrote:
> > So for now, TeX is kept 8 bits. I make no assumption for the encoding
> > (and user has to feed "8 bits encoding" to TeX; ASCII users have nothing
> > to change; others, if they want to use directly another 8 bits encoding
> > (ex.: directly accented letters latin1 code) have to tcs(1) the file
> > first.
>
> i am not clear on what "the file" means in this context. do you mean
> the .tex input file or font files?

I mean the .tex file. The font files as seen by TeX are only the TFM
metrics, and they are binary.

Since TeX is "8 bits", the .tex file must have its characters encoded
in 8 bits. The non-control positions of the first half must conform to
ASCII, possibly after a mapping defined at compile time (it can be
remapped at user level, but with apparently "strange" macro commands);
these characters are used as literals but also for the primitives.
-- 
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: erik quanstrom @ 2011-06-25 15:11 UTC
To: 9fans

On Sat Jun 25 11:01:38 EDT 2011, tlaronde@polynum.com wrote:
> On Sat, Jun 25, 2011 at 08:19:40AM -0400, erik quanstrom wrote:
> > > So for now, TeX is kept 8 bits. I make no assumption for the encoding
> > > (and user has to feed "8 bits encoding" to TeX; ASCII users have nothing
> > > to change; others, if they want to use directly another 8 bits encoding
> > > (ex.: directly accented letters latin1 code) have to tcs(1) the file
> > > first.
> >
> > i am not clear on what "the file" means in this context. do you mean
> > the .tex input file or font files?
>
> I mean the .tex file. The font files as seen by TeX are only the metrics
> tfm, and they are binaries.

so are you planning on hiding this conversion within the tex
executable or some shell script fronting the executable?
that would work. but letting ancient and deprecated
latin1 escape into editors, &c. would be a mistake.

- erik
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-25 16:33 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Sat, Jun 25, 2011 at 11:11:50AM -0400, erik quanstrom wrote:
> On Sat Jun 25 11:01:38 EDT 2011, tlaronde@polynum.com wrote:
> >
> > I mean the .tex file. The font files as seen by TeX are only the metrics
> > tfm, and they are binaries.
>
> so are you planning on hiding this conversion within the tex
> executable or some shell script fronting the executable?
> that would work. but letting ancient and deprecated
> latin1 escape into editors, &c. would be a mistake.

For the moment I will "hide" strictly nothing in the compiled program.
The only modification is an external choice: the fonts built from
Adobe PostScript Times-Roman etc., which have only 256 positions, will
have in the 128-255 range an "encoding" corresponding to latin1. By
contrast the Cork, aka EC, encoding, still shipped for historical
reasons with distributions of TeX et al., places the latin1 characters
at non-latin1 positions (there is also an "8r" encoding that is latin1
compatible, IIRC...).

The simplest for a Plan9 user is indeed to access his JIT compiled
macro set through a script that converts the utf .tex file to some
8-bit character set before passing it to TeX (latex(1) is just TeX
under another name, using argv[0] to know which predigested version to
load).

The correct solution, later, will be to let TeX handle utf directly as
input... but this means not only extending from bytes to wydes, but
also heavy lifting for font support and, if possible, support for more
than the left-to-right direction. This is why XeTeX comes into the
discussion, but with C++ floating around it, it is not an immediate
candidate for Plan9.

(And I'm personally keener on having an extended DVI and a dvipdf
driver, because there can be a dvi-to-whatever and there can be a
"standalone" dvi viewer, than on plaguing TeX directly with pdf, whose
"interactive" features are growing exponentially, with all the open
source solutions lagging far behind Adobe's; and I'm afraid that I will
not devote time to supporting interactive animations on a "page", like
a paper clip rolling big eyes, even if it is supposed to sing "la
Marseillaise"...)
-- 
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: Mauricio CA @ 2011-06-25 16:34 UTC
To: 9fans

> Since TeX is "8 bits", the tex file must have characters encoded in
> 8 bits, with the not control positions of the first half being, after
> perhaps mapping defined at compile time (can be remapped at user level
> but with apparently "strange" macro commands), conforming to ASCII---
> used as literals but also for the primitives.

Is it possible to change the representation of a character from an
8-bit char to, say, a 32-bit integer? If those integers are still
mapped to the existing 8-bit font metrics, wouldn't the basic engine be
kept the same? This probably means extending the syntax of a few
control sequences denoting characters, though.

Best,
Maurício
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-25 17:11 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Sat, Jun 25, 2011 at 04:34:17PM +0000, Mauricio CA wrote:
> > Since TeX is "8 bits", the tex file must have characters encoded in
> > 8 bits, with the not control positions of the first half being, after
> > perhaps mapping defined at compile time (can be remapped at user level
> > but with apparently "strange" macro commands), conforming to ASCII---
> > used as literals but also for the primitives.
>
> Is it possible to change the representation of a character from an 8 bits
> char to, say, a 32 bits integer? If those integers are still mapped to
> the existing 8 bits font metrics, wouldn't the basic engine be kept the
> same? This probably means extending the syntax of a few control sequences
> denoting characters, though.

No, if there is to be a promotion, it is from byte to wyde. There is
"prior art" even in TeX (the present program): in math mode, the
characters are (almost) wydes, interpreted as the combination of a font
family and an index in the font. This exists in PostScript too (see the
Red Book).

This extension would allow accepting utf as input (and as output for
messages) without touching the font format.

But for the 1.0 release of kerTeX, I will attempt strictly no
acrobatics (using tcs(1) gives a solution, even if not an ideal one).
And I will first spend time thinking about the next step before
starting to implement it: take time to decide; once decided and sure of
the solution, implement quickly. Not the common reverse: start
"something" without thinking, with the "final" release an asymptotic
aim, every year added to the implementation bringing an ever smaller
gain without ever crossing the line...
-- 
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
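As an illustration of the "font family plus index" reading of math characters mentioned above: in TeX a \mathcode below "8000 is a 15-bit value made of a class (3 bits), a font family (4 bits) and a position in the font (8 bits). The sketch below only shows that arithmetic; the names struct mathchar and split_mathcode are hypothetical, and nothing here is taken from the TeX or kerTeX sources.

```c
#include <assert.h>

/* The three fields packed into a 15-bit TeX math code. */
struct mathchar {
    unsigned cls;   /* class, 0..7 */
    unsigned fam;   /* font family, 0..15 */
    unsigned pos;   /* position in the font, 0..255 */
};

/*
 * Hypothetical helper: split a math code into its fields,
 * code = cls * 4096 + fam * 256 + pos.
 * The special value "8000 (active math character) is not handled.
 */
struct mathchar split_mathcode(unsigned code)
{
    struct mathchar m;

    assert(code < 0x8000);
    m.cls = code >> 12;
    m.fam = (code >> 8) & 0xF;
    m.pos = code & 0xFF;
    return m;
}
```

For instance the math code "7161, which plain TeX assigns to the letter a, splits into class 7, family 1 and position "61. The font position itself never leaves the 0..255 range, which is why the existing font format is untouched.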
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: Michael Kerpan @ 2011-06-25 18:43 UTC
To: Fans of the OS Plan 9 from Bell Labs

Modern TeX implementations like XeTeX and LuaTeX handle UTF-8 natively
and also bring all sorts of benefits like OpenType support (automagic
ligatures, real small caps, selectable lining or old-style figures and
more) and the ability to define fonts from the system font pool rather
than using archaic incantations and magic scrolls from the early 90s.
The problem is that these modern implementations are HUGE. On the
average Linux system, TeX, LaTeX and other paraphernalia seem to take
up well over 1 GB these days. I've given up on TeX because it's just
so darn big.

There is, however, hope. Heirloom troff manages to include many of the
same whizz-bang typographic features as XeTeX and friends (including
Unicode support, smartfont support, easy loading of fonts in modern
formats) while taking up about 1/100th the resource footprint. Clearly
what we REALLY need is a filter that takes LaTeX sources and processes
them into TROFF commands to feed to a port of Heirloom troff ;)

Mike
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-26 7:57 UTC
To: mjkerpan, Fans of the OS Plan 9 from Bell Labs

On Sat, Jun 25, 2011 at 02:43:32PM -0400, Michael Kerpan wrote:
> Modern TeX implementations like XeTeX and LuaTeX handle UTF-8 natively
> and also bring all sorts of benefits like OpenType support (automagic
> ligatures, real small caps, selectable lining or old-style figures and
> more) and the ability to define fonts from the system font pool rather
> than using archaic incantations and magic scrolls from the early 90s.

I don't know what "automagic" ligatures are, but ligatures are there in
the kerTeX fonts, and the user has nothing special to do to get them.
Small caps are there. Using the system fonts is there too, at least for
T1 fonts: afm2tfm(1) makes them available. For other font formats,
writing a whatever2tfm(1) will do the job.

And "archaic" is definitely a marketing term, not a scientific
judgement: "Euclid? Well... it was perhaps good for its time..."

> The problem is that these modern implementations are HUGE. On the
> average Linux system, TeX, LaTeX and other paraphernalia seem to take
> up well over 1 GB these days. I've given up on TeX because it's just
> so darn big.

So have I.

>
> There is, however, hope. Heirloom troff manages to include many of the
> same whizz-bang typographic features as XeTeX and friends (including
> Unicode support, smartfont support, easy loading of fonts in modern
> formats) while taking up about 1/100th the resource footprint. Clearly
> what we REALLY need is a filter that takes LaTeX sources and processes
> them into TROFF commands to feed to a port of Heirloom troff ;)

kerTeX is 1/100th the size of the current TeX distributions and is
C89, that is, the most portable. It lacks some Heirloom troff features,
but it is made for text and mathematics, and includes a font designer
(METAFONT), a figure designer (MetaPost), a bunch of debugging
utilities, coding utilities (WEB), fonts and state-of-the-art
documentation.

So I stick to kerTeX. And I have recorded what _you_ propose to do ;)
Since you seem to claim that the way _you are engaged in_ is easier
than the road I have taken, you should have finished before I have
finished kerTeX, rendering it /* sigh */ obsolete...

Not to mention that I can work on kerTeX only during limited slots of
time, since my main development time goes to a huge beast: KerGIS. And
it should be noted that I manage forks of G.R.A.S.S. and of TeX et al.
alone, while "millions of users! thousands of programmers! hundreds of
developers!" seem to be unable to evolve the "community driven"
equivalents correctly... So imagine what one can achieve if one can
concentrate on a far more limited scale. But beware of the tortoise...

This is a lesson "GPL fanatics" have learned: say, on principle, that
"free software" is perfect, and closed source a disaster. Why? Simply
because if someone criticizes open source code the answer is immediate:
"code is here, be my guest". While, with closed source, one can spend
gallons of electronic ink saying: "This sucks! If only I had the
code...".
-- 
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: Michael Kerpan @ 2011-06-27 1:01 UTC
To: tlaronde
Cc: Fans of the OS Plan 9 from Bell Labs

On Sun, Jun 26, 2011 at 3:57 AM, <tlaronde@polynum.com> wrote:

> I don't know what "automagic" ligatures are; but ligatures are here in
> the kerTeX fonts, user having nothing special to do to have them. Small
> caps are here. Using the system fonts is here too, at least for T1
> fonts: afm2tfm(1) makes them available. For other fonts format,
> writing a whatever2tfm(1) will do the job.

In general using a simple Type 1 font isn't going to get you things
like true small caps, ligatures (beyond maybe the basic "fi" and "fl")
or the ability to choose between old-style and lining figures. The 256
glyph limit means that you had to split things up into multiple fonts.
This works well enough for simply creating a PostScript file that will
be fed straight to a laser printer, but for creating searchable PDF
files, it's far from ideal. In TeX, it also requires a lot of manual
work above and beyond what would be needed to get those features using
Computer Modern. With OpenType support (and using OpenType fonts, of
course), typographic features become as easy to use with third-party
fonts as they are with Computer Modern.

> And "archaic" is definitively a marketing sentence, not a scientific
> judgement: "Euclid? Well... it was perhaps good for the epoch..."

True enough. It's more my opinion than anything else. Still, it must be
an opinion shared by someone else, given the widespread use of
"fontspec" wherever available compared to the older methods.

>> The problem is that these modern implementations are HUGE. On the
>> average Linux system, TeX, LaTeX and other paraphernalia seem to take
>> up well over 1 GB these days. I've given up on TeX because it's just
>> so darn big.
>
> So have I.

> kerTeX is 1/100th of the current TeX distributions and is C89, that is
> the most portable. It lacks some Heirloom troff features, but it is for
> text and mathematics, includes a font designer: METAFONT, a figure
> designer: MetaPost and a bunch of debugging utilities, coding utilities
> (WEB), fonts and a state of the art documentation.

I'm not disparaging your work. In fact I think it's pretty good. I was
mainly trying to point out the problems that have arisen in some
"modern" TeX distros in the past.

> So I stick to kerTeX. And I have recorded what _you_ propose to do ;)
> Since you seem to claim that the way _you are engaged in_ is easier than
> the road I have taken, you should have finished before I have finished
> kerTeX, rendering it /* sigh */ obsolete...

I doubt that, as tongue-in-cheek suggestions seldom seem to turn into
working ideas (at least when they come from me).
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-27 11:48 UTC
To: Michael Kerpan
Cc: Fans of the OS Plan 9 from Bell Labs

On Sun, Jun 26, 2011 at 09:01:13PM -0400, Michael Kerpan wrote:
> On Sun, Jun 26, 2011 at 3:57 AM, <tlaronde@polynum.com> wrote:
>
> > I don't know what "automagic" ligatures are; but ligatures are here in
> > the kerTeX fonts, user having nothing special to do to have them. Small
> > caps are here. Using the system fonts is here too, at least for T1
> > fonts: afm2tfm(1) makes them available. For other fonts format,
> > writing a whatever2tfm(1) will do the job.
>
> In general using a simple Type 1 font isn't going to get you things
> like true small caps, ligatures (beyond maybe the basic "fi" and "fl")
> or the ability to choose between old-style and lining figures.

These are not limitations of the software itself but limitations due
to the obscurity of the whole process. Ligatures can be added via the
encoding passed to afm2tfm(1). As an example, in the next-to-come
publication, I add the classical standard TeX ones (``, '', fi, fl,
en-dash, em-dash, inverted punctuation for Spanish) plus << and >> for
French guillemets, and ,, for the base double quote. Once you know how
it is done (and since an entry is simply discarded if the corresponding
glyphs do not exist), it is just a matter of calling the utility with
the correct encoding. And once this is documented, no more "wizards"
are needed...

> The 256
> glyph limit means that you had to split things up into multiple fonts.
> This works well enough for simply creating a PostScript file that will
> be fed straight to a laser printer, but for creating searchable PDF
> files, it's far from ideal. In TeX, it also requires a lot of manual
> work above and beyond what would be needed to get those features using
> Computer Modern. With OpenType support (and using OpenType fonts, of
> course), typographic features become as easy to use with third-party
> fonts as they are with Computer Modern.
>

Same answer. TeX does not need the design of the glyphs. It needs only
the metrics (Adobe has published the AFM for the core PostScript fonts;
the definition of the fonts is not public, which is why the "urw" ones
are used). These are not limitations of TeX itself, but of the
surrounding environment and of the "freedom wizardry by obscurity".

That's also why I want to preserve dvi: one can write a dvi2whatever,
while making pdf the layout language directly ties TeX to external
support.

The huge mess "TeX distributions" have become will sooner or later kill
TeX.

One of the major gaps in kerTeX now is a dvi display renderer (for X
and Rio), so that the system is standalone and sheltered from external
moods. What Donald E. Knuth wanted was the ability to write his books
without depending on someone else anymore ("we can't print this way,
since this is deprecated, unavailable etc."). KerTeX will definitely
miss the goal if it depends on something else.

The other intellectual context (on my side) is the following. How did
Michael Ventris find the clues to decipher Linear B? The signs were too
numerous to be alphabetic, and not numerous enough to be ideographic.
So he guessed they were syllabic, with some standalone ideographic
ones.

I suspect that if some civilizations have not evolved rapidly, this is
due in part to the way knowledge is transmitted. An alphabet is easy to
learn and, furthermore, it disconnects the signs partly from the sound
and totally from the sight of the object (for real ones). An alphabet
has rules. Ideographic writing, by contrast, requires erudition, and
since it seems unnatural to have an ideographic base (a few signs that,
combined, can describe higher-level notions), it makes new ideas more
difficult to express and transmit.

Unicode is a good idea to avoid "guessing" the language and plaguing
code with knowledge of the language. With this, the utf encoding is
the best idea, keeping ASCII and keeping the "smallest addressable"
unit, i.e. bytes.

But I don't want to have the obligation to "know" 65536 signs to
express what I want to express. I'm sorry, but I think that the vast
majority (remember that for latin1/latin2, accented letters are just
variants, so they need less "user memory" than plainly different
characters) can do with blocks of (fewer than) 256 signs, and switch
fonts when "speaking" about special things (the switch can be
automatic, by the way). As far as TeX is concerned, all the control
codepoints (positions) are useless in the fonts. There is still room
available even in the latin1 encoded tfm built for the (next) kerTeX
from the PostScript core fonts.

Does a whole Unicode "Times-Roman" font make sense? Ideograms in
"Times-Roman"?

So Unicode is not a panacea. It is a means, not an end. ("Un moyen,
pas une fin.")
-- 
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: erik quanstrom @ 2011-06-27 12:36 UTC
To: 9fans

> But I don't want to have the obligation to "know" 65536 signs to
> express what I want to express. I'm sorry, but I think that the
> main majority (remember that for latin1/latin2 accented letters
> are just variants so need less "user memory" than plain different
> characters) can do with (less than) 256 signs blocks, and switch
> fonts when "speaking" about special things (the switch can be
> automatic by the way). As far as TeX is concerned, all the control
> codepoints (positions) are useless in the fonts. There is still
> available room even if for the latin1 encoded tfm built for (next)
> kerTeX from PostScript core.

there are currently 0x10ffff+1 codepoints (1114112), not 65536, but
only 23669 + the large chinese blocks are currently defined.

but anyway, i think you are missing the point. every one of those
codepoints is used, or was used in human written communication.
the fact that you or i probably don't know them all is beside the
point entirely.

there are 600000 words in the oxford english dictionary. i don't know
them all. let's suppose i had the power to eliminate all the ones
that i don't know. wouldn't that be a horrible idea? then i would
not be able to learn any new words. odious.

so with unicode. if you strip out all the languages you don't know
by restricting yourself to the latin1 codepoints [0, 256), then you
can't easily add, say, greek or sumerian codepoints should you or
anyone else need them.

since, as you can see, there is a 1:1 identity mapping between latin1
and unicode codepoints [0, 256), i don't see why one wouldn't give
oneself the option to increase this subset to cover more ground.

i use alphas, arrows, math symbols, etc. quite often in code. and
even more often when i used to use tex. it's really quite a drag
to read \alpha instead of “α.”

> Does a whole Unicode "Times-Roman" font makes sense? Ideograms in
> "Times-Roman"?

i get confused on terms. i think the right term is typeface.
extended font collections of a given typeface covering very wide
sections of unicode do exist and are sold by the major font vendors.
i don't think that it's too hard to imagine that one can make most
symbols look compatible enough. in fact, i'm using a font with ~32000
glyphs on my plan 9 terminal right now.

and there's no penalty for having that many glyphs. it just
means that my font file has a couple hundred subfonts. these
are only open if needed. typically only 3 subfonts are open
at any one time.

- erik
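The identity mapping between latin1 and the first 256 Unicode codepoints mentioned above is what makes the conversion to UTF-8 purely mechanical. A minimal sketch in C, assuming a hypothetical helper named latin1_to_utf8 (on Plan 9 the real tool for this job is tcs(1)): bytes below 0x80 are copied as they are, and each byte from 0x80 to 0xFF becomes a two-byte sequence carrying the same codepoint.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert n latin1 bytes in src to UTF-8 in dst.
 * Each latin1 byte value is taken as the Unicode codepoint of the
 * same value. dst must have room for 2*n bytes.
 * Returns the number of bytes written.
 */
size_t latin1_to_utf8(const unsigned char *src, size_t n, unsigned char *dst)
{
    size_t i, out = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        unsigned char c = src[i];

        if (c < 0x80) {
            /* ASCII: one byte, unchanged */
            dst[out++] = c;
        } else {
            /* U+0080..U+00FF: 110000xx 10xxxxxx */
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Nothing in the two-byte form is specific to latin1: the same scheme extends to longer sequences for higher codepoints, which is the sense in which the [0, 256) subset can later be widened.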
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-27 12:36 ` erik quanstrom @ 2011-06-27 14:38 ` Karljurgen Feuerherm 2011-06-27 17:20 ` tlaronde 1 sibling, 0 replies; 52+ messages in thread From: Karljurgen Feuerherm @ 2011-06-27 14:38 UTC (permalink / raw) To: 9fans [-- Attachment #1: Type: text/plain, Size: 2966 bytes --] Thanks for bringing up Sumerian (better: Sumero-Akkadian Cuneiform). I was thinking along exactly those lines. For me at least, solutions that satisfy 'the majority' are no solutions at all. And obviously, I'm not alone. (Though it could well be that I missed the intent of Thierry's comment and am barking up the wrong tree.) K >>> erik quanstrom <quanstro@quanstro.net> 06/27/11 8:36 AM >>> > But I don't want to have the obligation to "know" 65536 signs to > express what I want to express. I'm sorry, but I think that the > main majority (remember that for latin1/latin2 accented letters > are just variants so need less "user memory" than plain different > characters) can do with (less than) 256 signs blocks, and switch > fonts when "speaking" about special things (the switch can be > automatic by the way). As far as TeX is concerned, all the control > codepoints (positions) are useless in the fonts. There is still > availbale room even if for the latin1 encoded tfm built for (next) > kerTeX from PostScript core. there are currently 0x10ffff+1 codepoints (1114112), not 65536, but only 23669 + the large chinese blocks are currently defined. but anyway, i think you are missing the point. every one of those codepoints is used, or was used in human written communication. the fact that you or i probablly don't know them all is beside the point entirely. there are 600000 words in the oxford english dictionary. i don't know them all. let's suppose i had the power to eliminate all the ones that i don't know. wouldn't that be a horrible idea? then i would not be able to learn any new words. odious. so with unicode. 
if you strip out all the languages you don't know by restricting yourself to the latin1 codepoints [0, 256), then you can't easily add, say, greek or sumerian codepoints should you or anyone else need them. since, as you can see, there is a 1:1 identity mapping between latin1 and unicode codepoints [0, 256), i don't see why one wouldn't give oneself the option to increase this subset to cover more ground. i use alphas, arrows, math symbols, etc. quite often in code. and even more often when i used to use tex. it's really quite a drag to read \alpha instead of “α.” > Does a whole Unicode "Times-Roman" font makes sense? Ideograms in > "Times-Roman"? i get confused on terms. i think the right term is typeface. extended fonts collections of a given typeface covering very wide sections of unicode do exist and are sold by the major font vendors. i don't think that it's too hard to imagine that one can make most symbols look compatable enough. in fact, i'm using a font with ~32000 glyphs on my plan 9 terminal right now. and there's no penalty for having that many glyphs. it just means that my font file as a couple hundred subfonts. these are only open if needed. typically only 3 subfonts are open at any one time. - erik [-- Attachment #2: HTML --] [-- Type: text/html, Size: 4344 bytes --] ^ permalink raw reply [flat|nested] 52+ messages in thread
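The 1:1 identity between Latin-1 and Unicode code points [0, 256) mentioned in the message above means that re-encoding Latin-1 input as UTF-8 needs no table at all. A minimal sketch in portable C follows; the function name latin1toutf8 is hypothetical and is not taken from kerTeX, tcs(1) or the Plan 9 library.

```c
#include <stddef.h>

/*
 * Hypothetical helper: encode one Latin-1 octet as UTF-8.
 * The Latin-1 value is used directly as the Unicode code point.
 * Writes 1 or 2 bytes into out and returns how many were written.
 */
size_t
latin1toutf8(unsigned char c, unsigned char out[2])
{
	if(c < 0x80){
		out[0] = c;			/* ASCII range: same octet */
		return 1;
	}
	out[0] = 0xC0 | (c >> 6);		/* leading byte, 110000xx */
	out[1] = 0x80 | (c & 0x3F);		/* continuation byte, 10xxxxxx */
	return 2;
}
```

Going the other way loses information for any code point of 256 or above, which is the forward-compatibility concern raised in the thread.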
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-27 12:36 ` erik quanstrom 2011-06-27 14:38 ` Karljurgen Feuerherm @ 2011-06-27 17:20 ` tlaronde 2011-06-27 17:34 ` erik quanstrom 2011-06-27 23:45 ` Karljurgen Feuerherm 1 sibling, 2 replies; 52+ messages in thread From: tlaronde @ 2011-06-27 17:20 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Mon, Jun 27, 2011 at 08:36:35AM -0400, erik quanstrom wrote: > > and there's no penalty for having that many glyphs. it just > means that my font file as a couple hundred subfonts. these > are only open if needed. typically only 3 subfonts are open > at any one time. As can be clear from the even more desastrous level of my english than usual, I only had a minute or two to write the message. I DON'T SAY THAT I WILL RESTRICT TEX TO THE FIRST 256 CODEPOINTS. This is precisely why I have rejected your proposal. KerTeX will provide, because this is what is in the fonts, "latin1" font. But if there are other fonts for cyrillic, greek etc. I don't want to render TeX unusable. There are fonts on the one side; TeX on another. And TFM to link them. I only say that: 1) Forcing, as this was written in the XeTeX FAQ, user to enter the special codepoint for the fi ligature since, white eyes, scornful wave of the hand: "this is the way this is done with Unicode" is sheer stupidity. I don't want to be forced to specify a printing sugar instead of the composition of the alphabet. I want to be able to use ~ as a visible sign saying: don't break here, and not the "unbreakable" space plaguing messages nowdays. Etc. I hate languages supposed to be human oriented taking whites as semantically significant... 2) I say that one can add utf as input for TeX, and use whatever one wants/needs---if I speak about Linear B that's perhaps because I have some interest even in defunct scripting, no?---without dramatically changing everything in the core TeX engine. TeX, for maths, already switches fonts by using almost 16 bits. 
The same can be made for text, and there is no need to extend the conception of a font metric for TeX (except marginally for the flipping/mirroring of boxes for direction of writing), and one can have everything with TeX using 256 glyphes SUBFONTS, and more precisely, 256 entries TFM. (I add that all in all, if languages are not mixed, the present TeX can be used for whatever direction of writing: let the PS interpreter mirror the page, rotate, flip etc.; more involved when languages are mixed in the same page.) Subfonts are precisely what you are talking about. TeX does not use fonts. TeX uses TeX Font Metric. It needs only the metrics, and one can use whatever fonts, as long as it is described according to the expectations of TeX. One can imaging extending a little TeX to switch to TFM "subfonts" to let it mastered a layout that the _drivers_ will have to translate according to the native format of the fonts (the drivers handling really the direction of writing: depending on the hint, the box rendered is mirrored, flipped etc., TeX needing only to know what is the height and width [the correct corner] of the result). "Simplicity is the shortest path to the truth." I suspect that the current state is not the truth, considering the path taken and the size of the change files. (In an interview, D.E.K. spoke about omega, whose change file [against TeX source] was several times the size of the TeX source...) -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
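One way to picture the "256-entry TFM subfonts" idea from the message above is to split a code point into a subfont number and an 8-bit index inside that subfont. The sketch below is only an assumption about how such a split could look; the message does not specify the scheme, and the names Slot and runetoslot are hypothetical.

```c
/*
 * Hypothetical split of a code point into a (subfont, index) pair.
 * Assumption: subfonts are numbered by the high bits of the code
 * point and each one holds exactly 256 entries, like a TFM file.
 */
typedef struct Slot Slot;
struct Slot {
	unsigned long sub;	/* which 256-entry subfont */
	unsigned char idx;	/* character index inside that subfont */
};

Slot
runetoslot(unsigned long r)
{
	Slot s;

	s.sub = r >> 8;		/* high bits select the subfont */
	s.idx = r & 0xFF;	/* low 8 bits index into it */
	return s;
}
```

Under this assumption all of Latin-1 stays in subfont 0 with unchanged indices, while Greek alpha (U+03B1) lands in subfont 3 at index 0xB1.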
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-27 17:20 ` tlaronde @ 2011-06-27 17:34 ` erik quanstrom 2011-06-27 18:01 ` tlaronde 2011-06-27 23:45 ` Karljurgen Feuerherm 1 sibling, 1 reply; 52+ messages in thread From: erik quanstrom @ 2011-06-27 17:34 UTC (permalink / raw) To: 9fans > As can be clear from the even more desastrous level of my english > than usual, I only had a minute or two to write the message. > > I DON'T SAY THAT I WILL RESTRICT TEX TO THE FIRST 256 CODEPOINTS. > > This is precisely why I have rejected your proposal. KerTeX will > provide, because this is what is in the fonts, "latin1" font. But if > there are other fonts for cyrillic, greek etc. I don't want to render > TeX unusable. There are fonts on the one side; TeX on another. And TFM > to link them. no need to yell. i must be confused. i thought you said that you were using latin1 for .tex files. i don't see a forward-compatable way to get from latin1 input to utf-8 input. > I only say that: > > 1) Forcing, as this was written in the XeTeX FAQ, user to enter the > special codepoint for the fi ligature since, white eyes, scornful wave > of the hand: "this is the way this is done with Unicode" is sheer > stupidity. I don't want to be forced to specify a printing sugar > instead of the composition of the alphabet. I want to be able to use ~ > as a visible sign saying: don't break here, and not the "unbreakable" > space plaguing messages nowdays. Etc. I hate languages supposed to be > human oriented taking whites as semantically significant... i don't even have an opinion on this. i don't understand the conflation of the input character set and tex's internal representations. could you explain why you are taking about them as the same? to be brutally honest, tex could internally use an array of monkeys flinging poo to represent characters /internally/ and i would be much happer than with a reasonable internal representation and a difficult and incompatable external representation. 
at least that way the monkeys flinging poo are hermetically sealed within the program and not flinging poo all over my system. :-) > 2) I say that one can add utf as input for TeX, and use whatever one > wants/needs---if I speak about Linear B that's perhaps because I have > some interest even in defunct scripting, no?---without dramatically > changing everything in the core TeX engine. TeX, for maths, already > switches fonts by using almost 16 bits. The same can be made for text, > and there is no need to extend the conception of a font metric for TeX > (except marginally for the flipping/mirroring of boxes for direction of > writing), and one can have everything with TeX using 256 glyphes > SUBFONTS, and more precisely, 256 entries TFM. (I add that all in all, > if languages are not mixed, the present TeX can be used for whatever > direction of writing: let the PS interpreter mirror the page, rotate, > flip etc.; more involved when languages are mixed in the same page.) again, i don't think anyone cares if this is how things work internally. - erik ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-27 17:34 ` erik quanstrom @ 2011-06-27 18:01 ` tlaronde 2011-06-27 21:17 ` Michael Kerpan 0 siblings, 1 reply; 52+ messages in thread From: tlaronde @ 2011-06-27 18:01 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Mon, Jun 27, 2011 at 01:34:07PM -0400, erik quanstrom wrote: > > i don't even have an opinion on this. i don't understand the conflation > of the input character set and tex's internal representations. could > you explain why you are taking about them as the same? > > to be brutally honest, tex could internally use an array of monkeys > flinging poo to represent characters /internally/ and i would be much > happer than with a reasonable internal representation and a difficult > and incompatable external representation. at least that way the monkeys > flinging poo are hermetically sealed within the program and not flinging > poo all over my system. :-) In TeX there is, initially, a defined subset: ASCII. Because TeX is a compiler/interpreter and one needs to be able to send some "bootstrapping" commands. This can be rapidly overwritten (but starting with some ASCII like characters). This can be almost arbitrary. What people were precisely arguing is precisely that external business, and "state of the art" (that is soon to be "out of fashion") fonts and whatever mood "du jour" should lead to the rewrite of TeX internals. I precisely claim to let TeX internals alone. The majority of the work is external (the main being in the dvi drivers). If I want to use ligatures, I shall be able to do. If others want to put directly the code for the ligatured glyph, they can, but this is their problem and not a holy rule. >[...] > again, i don't think anyone cares if this is how things work internally. > Unfortunately wrong. Read back the thread (if you really have nothing more interesting to do). 
I have explained this "256 subfonts" business in the first message, and immediately got answers that the "correct way" was teaching TeX "modern" fonts. -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-27 18:01 ` tlaronde @ 2011-06-27 21:17 ` Michael Kerpan 2011-06-28 11:25 ` tlaronde 0 siblings, 1 reply; 52+ messages in thread From: Michael Kerpan @ 2011-06-27 21:17 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Mon, Jun 27, 2011 at 2:01 PM, <tlaronde@polynum.com> wrote: > On Mon, Jun 27, 2011 at 01:34:07PM -0400, erik quanstrom wrote: >> >> i don't even have an opinion on this. i don't understand the conflation >> of the input character set and tex's internal representations. could >> you explain why you are taking about them as the same? >> >> to be brutally honest, tex could internally use an array of monkeys >> flinging poo to represent characters /internally/ and i would be much >> happer than with a reasonable internal representation and a difficult >> and incompatable external representation. at least that way the monkeys >> flinging poo are hermetically sealed within the program and not flinging >> poo all over my system. :-) > > In TeX there is, initially, a defined subset: ASCII. Because TeX is a > compiler/interpreter and one needs to be able to send some > "bootstrapping" commands. This can be rapidly overwritten (but starting > with some ASCII like characters). This can be almost arbitrary. > > What people were precisely arguing is precisely that external business, > and "state of the art" (that is soon to be "out of fashion") fonts and > whatever mood "du jour" should lead to the rewrite of TeX internals. > > I precisely claim to let TeX internals alone. The majority of the work > is external (the main being in the dvi drivers). If I want to use > ligatures, I shall be able to do. If others want to put directly the > code for the ligatured glyph, they can, but this is their problem and > not a holy rule. That's not how OpenType works, actually. 
It actually works more like TeX, in that it allows files to store text
in a basic ASCII/Latin-1/8-bit UTF-8 subset format, which an
OpenType-enabled renderer (such as XeTeX, InDesign or even Office 2010)
then presents (on screen or on page) as the correct ligature. Thus the
big advantage of OpenType over, say, Type 1 is that it offers a feature
set much closer to Computer Modern's full set of ligatures, accents and
alternatives than Type 1 ever could (at least without serious scripting
to combine multiple Type 1 fonts containing all the needed glyphs into a
single "virtual font" as described in your first post).

> Unfortunately wrong. Read back the thread (if you really have
> nothing more interesting to do). I have explained this "256 subfonts"
> business in the first message, and immediately got answers that
> the "correct way" was teaching TeX "modern" fonts.

The subfont system works fine if you both have a complete Type 1 font
set, including all the "expert fonts" with the extra glyphs and the
like, AND are willing to put together a mapping for it. The problem is
that fonts haven't shipped (to consumers, at least) in that form for
about 10 years. Unless I fundamentally misunderstand the subfont system
(which I admit that I might), for any font made within the last 10 years
or so, using the subfont/virtual font system would entail the following
steps:

1. Break the complete OpenType font down into a combination of PFBs and
   AFMs containing the complete set of characters between them,
   carefully remapping each glyph outside of 8-bit range into it so that
   they remain accessible. This may break the license agreement for many
   fonts and would almost certainly cause the loss of many kerning
   pairs, hints and other metadata (I'm not sure how much of that TeX
   uses, so that may not be as big a problem as it sounds).
2. Build the virtual font mappings as with a "real" Type 1 set.
3. Hope for the best.
Given the complexity of the process involved, I would hope you can understand why, as a USER, teaching TeX to play nice with modern fonts looks like a good way to go ;) Again, none of this is meant as a put-down of your quite impressive work, but rather as a reminder of some areas where others might run into problems with making USE of said work. Mike ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-28 11:25 UTC
To: mjkerpan, Fans of the OS Plan 9 from Bell Labs

On Mon, Jun 27, 2011 at 05:17:16PM -0400, Michael Kerpan wrote:
>
> The subfont system works fine if you both have a complete Type 1 font
> set including all the "expert fonts" including the extra glyphs and
> the like AND are willing to put together a mapping for it. The problem
> is that fonts haven't shipped (to consumers, at least) in that form
> for about 10 years. Unless I fundamentally misunderstand the subfont
> system (which I admit that I might), for any font made within the
> last 10 years or so, using the subfont/virtual font system would
> entail the following steps:
> 1. Break the complete OpenType font down into a combination of PFBs
> and AFMs containing the complete set of characters between them,
>[...]

You miss my point: the "subfont" scheme just leaves the internals of TeX
alone by splitting, for TeX, the TFM into subfonts. The fonts themselves
are left alone. The dvi drivers deal with the fonts, not TeX. TeX needs
only the metrics. After some substitutions (by virtual fonts), TeX only
says "take this glyph of this font and put it here". And the font,
eventually, has a foreign link to the TFM used by TeX.

The main particularity with afm2tfm(1) is that the PostScript standard
encoding is not Unicode, not even latin1, so one needs to specify an
encoding. As long as the encoding of the fonts is known, a program can
"import" a font (that is, just create the TFM for it) so that TeX will
know the metrics. The main support is in the dvi drivers (mainly
dvips(1)).
--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
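The "specify an encoding" step in the message above amounts to a table from glyph names, as found in an AFM file, to the input codes the user will type. A rough sketch, assuming a Latin-1 target and showing only three entries; the table latin1enc and the function slotofglyph are hypothetical and do not reflect how afm2tfm(1) or kerTeX store encodings.

```c
#include <string.h>

/*
 * Hypothetical encoding vector: AFM glyph name -> Latin-1 code.
 * A real vector would list every glyph the font provides.
 */
typedef struct Enc Enc;
struct Enc {
	const char *name;	/* glyph name as it appears in the AFM */
	int slot;		/* code the glyph should answer to */
};

static const Enc latin1enc[] = {
	{"agrave",   0xE0},
	{"ccedilla", 0xE7},
	{"eacute",   0xE9},
};

/* Return the code assigned to a glyph name, or -1 if it is not mapped. */
int
slotofglyph(const char *name)
{
	size_t i;

	for(i = 0; i < sizeof latin1enc / sizeof latin1enc[0]; i++)
		if(strcmp(latin1enc[i].name, name) == 0)
			return latin1enc[i].slot;
	return -1;
}
```

Because the lookup goes by name, the position a glyph happens to occupy in the font file does not matter, which is the property of AFM files the first message of the thread relies on.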
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: Karljurgen Feuerherm @ 2011-06-27 23:45 UTC
To: Fans of the OS Plan 9 from Bell Labs

Thierry,

> I only say that:
> 1) Forcing, as this was written in the XeTeX FAQ, user to enter the
> special codepoint for the fi ligature since, white eyes, scornful wave
> of the hand: "this is the way this is done with Unicode" is sheer
> stupidity.

I don't know who told you that... Just because there is a codepoint for
something does not mean that one has to access that codepoint directly
in all cases. Software at various levels can render a ligature on the
basis of various actual character sequences (e.g. f + i, or f, i when
ligatures are forced, etc.).

It's simply a question of what level of support one wishes to offer....

KF
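Rendering a ligature from the plain character sequence, as described in the message above, can be sketched as a table-driven pass over the input. This is a simplified illustration and not TeX's actual lig/kern program: it handles only two-character pairs and does not chain (so no ffi), and the slot numbers 12 and 13 are used purely as example glyph positions. The names Lig, ligtab and applyligs are hypothetical.

```c
#include <stddef.h>

/* One ligature rule: the pair (left, right) is replaced by glyph. */
typedef struct Lig Lig;
struct Lig {
	int left;
	int right;
	int glyph;	/* illustrative slot of the ligature glyph */
};

static const Lig ligtab[] = {
	{'f', 'i', 12},
	{'f', 'l', 13},
};

/*
 * Copy in[0..n) to out, replacing every pair found in ligtab by its
 * ligature glyph. out must have room for n entries.
 * Returns the number of entries written.
 */
size_t
applyligs(const int *in, size_t n, int *out)
{
	size_t i, j, k;
	size_t nlig = sizeof ligtab / sizeof ligtab[0];

	for(i = 0, j = 0; i < n; ){
		for(k = 0; k < nlig; k++)
			if(i + 1 < n && in[i] == ligtab[k].left
			&& in[i+1] == ligtab[k].right)
				break;
		if(k < nlig){
			out[j++] = ligtab[k].glyph;	/* pair matched */
			i += 2;
		}else
			out[j++] = in[i++];		/* copy unchanged */
	}
	return j;
}
```

The source text keeps the ordinary letters f and i; only the rendered glyph stream changes, so nobody has to type the ligature's own code point.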
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: erik quanstrom @ 2011-06-27 23:48 UTC
To: 9fans

> I don't know who told you that... just because there is a codepoint
> for something does not mean that one has to access that codepoint
> directly in all cases. Software at various levels can render a
> ligature on the basis of various actual character sequences (e.g. f +
> i, or f, i when ligatures are forced, etc.
>
> It's simply a level of what support one wishes to offer....

+1.

- erik
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-27 23:45 ` Karljurgen Feuerherm 2011-06-27 23:48 ` erik quanstrom @ 2011-06-28 11:19 ` tlaronde 2011-06-28 11:32 ` tlaronde ` (2 more replies) 1 sibling, 3 replies; 52+ messages in thread From: tlaronde @ 2011-06-28 11:19 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Mon, Jun 27, 2011 at 07:45:34PM -0400, Karljurgen Feuerherm wrote: > Thierry, > > > I only say that: > > > 1) Forcing, as this was written in the XeTeX FAQ, user to enter the > special codepoint for the fi ligature since, white eyes, scornful wave > of the hand: "this is the way this is done with Unicode" is sheer > stupidity. > > I don't know who told you that... just because there is a codepoint for something does not mean that one has to access that codepoint directly in all cases. Software at various levels can render a ligature on the basis of various actual character sequences (e.g. f + i, or f, i when ligatures are forced, etc. > > It's simply a level of what support one wishes to offer.... This is exactly what I'm trying to say. If one enters \'e, \' is just the "charname" or macro command to access the acute accent in the font. One can enter directly the code for the acute accent. Or one can enter directly the é (if the CID entered is classified as "other" [literal], and the fonts have something at the corresponding index). BUT the documentation found told that with "modern" fonts, one has the absolute obligation threatened by Thy Unicode GOD to enter the codepoint and that ligatures were deprecated. TeX is absolutely agnostic. It is an engine, a compiler/interpreter. Even tex(1) is just the name of an instance of TeX with a special convention: D.E. Knuth's plain TeX. some \'e let CID > > KF -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-28 11:32 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Tue, Jun 28, 2011 at 01:19:15PM +0200, tlaronde@polynum.com wrote:
>[...]
> some \'e let
> CID

Please ignore this trailing garbage.
--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: erik quanstrom @ 2011-06-28 12:16 UTC
To: 9fans

> BUT the documentation found told that with "modern" fonts, one has the
> absolute obligation threatened by Thy Unicode GOD to enter the codepoint
> and that ligatures were deprecated.

well of course, just use tcs. ;-|.

- erik
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-28 11:19 ` tlaronde 2011-06-28 11:32 ` tlaronde 2011-06-28 12:16 ` erik quanstrom @ 2011-06-29 23:43 ` Karljurgen Feuerherm 2011-06-30 13:02 ` tlaronde 2 siblings, 1 reply; 52+ messages in thread From: Karljurgen Feuerherm @ 2011-06-29 23:43 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs [-- Attachment #1: Type: text/plain, Size: 5565 bytes --] I'd like to make a few comments concerning what you say below. 1. I've been involved with Unicode, both UTC and as a representative to WG2, and I can confidently affirm that there is no Unicode God. No one has ever said There is no Code but Unicode, and UTC/WG2 is its prophet, or anything like that. If you have a reference to the Unicode Standard where I can read in black and white what you are referring to, I will happily look at it. (This is not intended as a smart remark. I'm quite seriously interested in understanding the facts of this issue.) 2. Anyone involved in Unicode, including inner core members of UTC etc, recognize that it's far from perfect. There is acknowledgement that a number of things could have been handled differently, but weren't. Stability Policy may seem like a problematic restriction to some in cases like this, but it guarantees backward compatibility, so has wisdom to it. 3. Whatever views one may have on Unicode, for better or worse, it is what it is. As you said yourself, c'est un moyen et non pas une fin.... One is free to use it, or not, and or devise alternatives. (But more on alternatives below.) 4. You suggested in an earlier email that you'd like to think the whole thing through carefully in advance, rather than implement things in stages, as others do, who then never get to the advanced stages. To me this begs the question of whether such is always universally the case. 
In particular, if anyone or any group tried/had tried to implement all
of what Unicode proposes to be/become (UCS--Universal Character Set),
then given the sheer magnitude of the task (which of course grows over
time since scripts either in themselves or as a set are not static),
he/she/they would never get the thing off the ground. This is in part
why there are (arguably) flaws in Unicode. In any case, I seriously
doubt that even if one attempted to "redo" it "the right way this time"
one would manage. This is just not within the grasp of human endeavour.
The mistakes would simply be different or in different areas. Likewise,
there are plenty of things one could bring against the process of
Unicode endorsing proposals, i.e. the inherent politics of interested
groups, but that again is always a reality.

5. All that being said--Plan 9, as far as I can see, intentionally
supports Unicode (see http://plan9.bell-labs.com/plan9/about.html). So
to me, it's a non-starter to want to port *TeX to Plan 9 but rail
against Unicode, whether justifiably or through misunderstanding.

6. Unicode isn't Eternal, any more than any other encoding standard.
(I'm sure there were--and perhaps still are--those who think that BCD,
no wait! EBCD, no wait! ASCII, no wait...!--were/are the be all and end
all.) In time, something else will develop in response to developing
needs.

7. But at present, the recognized standard out there for most practical
intents and purposes (in particular, to service the needs of something
other than just North American anglophone techie society) is Unicode,
with whatever blemishes it may have. So it seems to me that in keeping
with your principle alluded to above, and given that we're talking about
a Plan 9 environment here, you ought to be talking UTF-8 right off the
bat.

As I said--"seems to me". Could be I'm seriously misunderstanding the
discussion...
but then again, the diminishing dialogue in terms of number of participants suggests to me that there may be at least *some* truth in what I'm thinking.... Please don't think this is intended as a rant, either due to the way I've formatted this or on account of the content. I'm interested in following what you're doing; I'm just a bit puzzled, and I sincerely wish you the best in your efforts with this project. K >>> <tlaronde@polynum.com> 06/28/11 7:19 AM >>> On Mon, Jun 27, 2011 at 07:45:34PM -0400, Karljurgen Feuerherm wrote: > Thierry, > > > I only say that: > > > 1) Forcing, as this was written in the XeTeX FAQ, user to> special codepoint for the fi ligature since, white eyes, scornful wave > of the hand: "this is the way this is done with Unicode" is sheer > stupidity. > > I don't know who told you that... just because there is a codepoint for something does not mean that one has to access that codepoint directly in all cases. Software at various levels can render a ligature on the basis of various actual character sequences (e.g. f + i, or f, i when ligatures are forced, etc. > > It's simply a level of what support one wishes to offer.... This is exactly what I'm trying to say. If one enters \'e, \' is just the "charname" or macro command to access the acute accent in the font. One can enter directly the code for the acute accent. Or one can enter directly the é (if the CID entered is classified as "other" [literal], and the fonts have something at the corresponding index). BUT the documentation found told that with "modern" fonts, one has the absolute obligation threatened by Thy Unicode GOD to enter the codepoint and that ligatures were deprecated. TeX is absolutely agnostic. It is an engine, a compiler/interpreter. Even tex(1) is just the name of an instance of TeX with a special convention: D.E. Knuth's plain TeX. 
some \'e let CID > > KF -- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C [-- Attachment #2: HTML --] [-- Type: text/html, Size: 8356 bytes --] ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX] 2011-06-29 23:43 ` Karljurgen Feuerherm @ 2011-06-30 13:02 ` tlaronde 2011-06-30 13:14 ` erik quanstrom 2011-06-30 14:51 ` Karljurgen Feuerherm 0 siblings, 2 replies; 52+ messages in thread From: tlaronde @ 2011-06-30 13:02 UTC (permalink / raw) To: Fans of the OS Plan 9 from Bell Labs On Wed, Jun 29, 2011 at 07:43:08PM -0400, Karljurgen Feuerherm wrote: >[...] First to make clear what I was refering to (and making a false generalization) : the XeTeX FAQ: "However, standard Unicode-compliant fonts do not include ligatures for these sequences, as the normal expectation is that the actual Unicode characters will be used in the source text." Re-reading it, it's not "all ligatures" that are gone with "Unicode-compliant fonts", but it spoke about the em- and en-dashes and double quotes. So on these ones, I plead guilty. But starting with "modern fonts", "modern system", "archaic" and the like, it's like starting with: "only Adolf Hitler would still use not Unicode fonts". > 4. You suggested in an earlier email that you'd like to think the whole > thing through carefully in advance, rather than implement things in > stages, as others do, who then never get to the advanced stages. To me > this begs the question of whether such is always universally the case. There are 2 essential things in a human mental process: 1) there is a string of thoughts; even ideas that seem for others foreign had a path in the discoverer mind; he started in the vicinity of his knowledge. One that gets dropped in the middle of nowhere will never make a discovery. The former is clearing the virgin forest; the latter is beating around the bush. The first is the tortoise; the second the hare. 2) There is no actual infinite: resources, specially in time, are limited. And generally, when a resource is of the highest quality, it is scarce. For example, I have a stupendous patience; but not a lot of it. 
A first step for TeX is obvious: put aside the direction of writing, and do things so that at least Unicode (in utf encoding) can be mastered as input (and at least interactive output), and also, since it is a formatting system, that adequate fonts can be accessed. But, as the present state allows the use for every character set that fits in eight bits, by using (for Plan9 users) tcs(1) to feed TeX with what it expects, I will not delay forever the release of 1.0 waiting for this next solution. And this extension to utf will be done in the spirit of utf: it will be an extension, but compatible with the existing. One would take the TeXbook and obtains exactly what described here. In particular, selecting a Computer Modern font will work as described in the TeXbook. >[...] > 5. All that being said--Plan 9, as far as I can see, intentionally > supports Unicode (see http://plan9.bell-labs.com/plan9/about.html). ( > http://plan9.bell-labs.com/plan9/about.html). ) So to me, it's a > non-starter to want to port *TeX to Plan 9 but rail against Unicode, > whether justifiably or through misunderstanding. I have written that TeX shall accept utf as input and output (for text) and utf is an encoding of a special all encompassing character set: Unicode. So where did I wrote that I don't plan to support Unicode (because of utf)? What I did say, and say again, is that, whether people continue throwing "archaic vs modern" and various other Godwin points arguments or not, if to my taste I still want to have ligatures for em-, en-dashes, various quoting and whatever; and if I want to put in ASCII control places _in font_, characters expected for tex-text compatibility, I will do. This doesn't prevent anybody from doing whatever one likes; but symetrically, I'm libre to do whatever I want. Specially if I do the work. 
-- Thierry Laronde <tlaronde +AT+ polynum +dot+ com> http://www.kergis.com/ Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C ^ permalink raw reply [flat|nested] 52+ messages in thread
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: erik quanstrom @ 2011-06-30 13:14 UTC
To: 9fans

> But, as the present state allows the use for every character set that
> fits in eight bits, by using (for Plan9 users) tcs(1) to feed TeX with
> what it expects, I will not delay forever the release of 1.0 waiting for
> this next solution.

good grief. how hard is it to write this code!? this bit depends on just a
few simple functions from the plan 9 c library that can be easily
appropriated, namely chartorune and fullrune, and a user-defined getc.
(not compiled, just dashed off. just an example of how easy this is.)

	/*
	 * read utf-8 through the user-defined getc() and return the next
	 * character as one octet; anything outside [0, 256) is rejected.
	 */
	char
	texgetutfchar(void)
	{
		char ibuf[UTFmax + 1];
		int utfi;
		Rune r;

		utfi = 0;
		for(;;){
			if(utfi == sizeof ibuf - 1){
				/* no complete rune within UTFmax bytes */
				utfi = 0;
				print("garbage input rejected\n");
			}
			ibuf[utfi++] = getc();
			ibuf[utfi] = 0;
			if(fullrune(ibuf, utfi)){
				/* chartorune returns the byte count; the rune lands in r */
				chartorune(&r, ibuf);
				utfi = 0;
				if(r >= 256){
					print("codepoint %#.6ux rejected\n", r);
					continue;
				}
				return (char)r;
			}
		}
	}

- erik
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-30 13:47 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 30, 2011 at 09:14:10AM -0400, erik quanstrom wrote:
> > But, as the present state allows the use for every character set that
> > fits in eight bits, by using (for Plan9 users) tcs(1) to feed TeX with
> > what it expects, I will not delay forever the release of 1.0 waiting for
> > this next solution.
>
> good grief. how hard is it to write this code!? this bit depends on just a
> few simple functions from the plan 9 c library and that can be easily
> appropriated, namely chartorune and fullrune, and a user-defined getc.
> (not compiled, just dashed off. just an example of how easy this is.)

This is easy just for input. But as I said, constraining the input to only the first 256 code points will render TeX unusable for other 8-bit sets (latin2, etc.). Would starting with an ASCII character set and guessing the character set from the first non-ASCII code (and remaining in this state) work? And what will be the interaction with, for example, the LaTeX macro definitions that handle re-encoding, which are out of my reach?

The place where to put the conversion is (thanks to D.E.K.) well identified. The problem is that (I speak now about going from byte to wyde) this has an impact on macro definitions, and it is useless if there is not the adequate font support.

So limiting the input for now to a subset of Unicode will only be a (superficial) convenience for the users of this subset and will forbid the use of TeX to others. Trying to guess the 8-bit character set could lead to some surprises (and I'm reluctant, even temporarily, to introduce some KERTEX_CS to specify the character set).
And extending support at least to everything written left-to-right cannot be confined to the input/output conventions alone; some surgery (even if I want it limited) must be done in the guts of TeX.

So for now, I will complete the fonts; fix what I wanted to fix (essentially so that if D.E. Knuth needed to reinstall his software, he would have the 5-minute solution at hand); and take a look at the newer version of MetaPost. And after 1.0 will be another day.

--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: Karljurgen Feuerherm @ 2011-06-30 14:51 UTC
To: Fans of the OS Plan 9 from Bell Labs

Thanks for this. Two notes:

>Re-reading it, it's not "all ligatures" that are gone with
>"Unicode-compliant fonts", but it spoke about the em- and en-dashes and
>double quotes. So on these ones, I plead guilty.

Alright. Not a big deal, it seems to me.

>But starting with "modern fonts", "modern system", "archaic" and the
>like, it's like starting with: "only Adolf Hitler would still use not
>Unicode fonts".

Looking here: http://scripts.sil.org/cms/scripts/page.php?item_id=xetex_faq I cannot find this; you'll have to help me out.

But still: it's not about being Adolf Hitler by any means. XeTeX aims to be a Unicode compliant system, and that means adhering to the Unicode standard. I personally don't see a need to defend the purpose and use of standards. If you don't want to, don't; you are free, as you said, to do as you please, and no one is disputing that. But then you and everyone else have to accept the obvious consequences.

For all its flaws (and even the encoding I helped to author has them), to *me* (and to the authors of the FAQ you're looking at) the benefits far far outweigh the downsides.

Best

KF
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: Michael Kerpan @ 2011-06-30 15:22 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 30, 2011 at 10:51 AM, Karljurgen Feuerherm <kfeuerherm@wlu.ca> wrote:
> Thanks for this. Two notes:
>
>>Re-reading it, it's not "all ligatures" that are gone with
> "Unicode-compliant fonts", but it spoke about the em- and en-dashes and
> double quotes. So on these ones, I plead guilty.
>
> Alright. Not a big deal, it seems to me.

Unless XeTeX has changed since I last used it, traditional TeX ligatures ARE still there if you load the fonts properly. I always used the traditional TeX punctuation pseudo-ligatures when I used XeTeX because, unlike accents, which are easy to type and not used very frequently, dashes and quotation marks are frequently used and hard to type. Thus shortcuts are welcome.

Mike
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-30 16:25 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 30, 2011 at 10:51:53AM -0400, Karljurgen Feuerherm wrote:
> [...]
>
> >But starting with "modern fonts", "modern system", "archaic" and the
> like, it's like starting with: "only Adolf Hitler would still use not
> Unicode fonts".
>
> Looking here: http://scripts.sil.org/cms/scripts/page.php?item_id=xetex_faq
> I cannot find this; you'll have to help me out.
> [...]

It was not about the XeTeX FAQ this time, it was about this thread.

When I first said: OK, I take the bull by the horns, I will redo a TeX distribution from scratch, I heard: "current TeX on Plan9, even if obsolete, is enough..."

When I announced that the job was done for the core of TeX, the answer was: "Nobody uses TeX: everybody uses LaTeX; so it is almost useless." [This is my special favorite!]

When I saw that the TFM files provided with recent TeX distributions provide latin1 glyphs but not at the latin1 (i.e. Unicode) positions, I decided it was a historical artefact and was inconsistent. Then came my first message and the avalanche about "teaching TeX _modern_ fonts" etc.

For me _these_ arguments about modern, archaic etc. are Godwin points. A vast majority of contemporary mathematicians could read the "archaic" Euclid to learn, for example, that Euclid never wrote that "a line is composed of points", even less "a line is composed of an infinity of points". And they should compare this with the fifth book. Because if the Greeks did not say that, there is probably a reason why...
So back to the technical side: as far as TeX is concerned, there is input (provided by a user, normally) leading to a layout rendered by a dvi driver. A font interacts with the user input by providing some facilities (ligatures); since these facilities can be added to TeX's view of the font (the TFM), without even changing the fonts as viewed by the drivers, I don't see why they should be discarded. Furthermore, I don't see why some special glyphs put, in plain TeX conventions, in ASCII control positions should not be added to TeX's view of a font (the TFM).

I read in a hurry the directory layout of XeTeX, the WEB change file and the FAQ, just in order to have an idea about what was going on and a rough idea of the work needed to import it in kerTeX. Hence my mistake in believing that "modern fonts" have thrown away _every ligature_. I'm relieved to see that I was wrong on this one. But I would probably have read the whole more coolly if people had not used some arguments.

I don't despise XeTeX. Nor Unicode. And I will take Unicode as is. But I will take TeX conventions as is too, since I'm working on TeX, and not another formatting system; since these conventions are confined to the ASCII subrange and only diverge from ASCII for the non-glyph positions. I still fail to see what the big deal is.

--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
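The kind of facility meant here can be sketched as a small ligature table of the sort a TFM file carries: a pair (current character, next character) is replaced by a single font slot. The slots below follow the Computer Modern text layout (en-dash at 0x7B, em-dash at 0x7C, opening double quote at 0x5C, closing double quote at 0x22); the structure and the function `apply_ligs` are hypothetical and greatly simplified compared with TeX's real ligature/kern programs.

```c
#include <assert.h>
#include <stddef.h>

/* One rule: when `left` is followed by `right`, both are replaced by the
 * single font slot `result`. */
struct lig {
	unsigned char left, right, result;
};

/* Punctuation pseudo-ligatures, with slots as in the Computer Modern
 * text fonts. */
static const struct lig cm_ligs[] = {
	{ '-',  '-',  0x7B },	/* --  gives an en-dash */
	{ 0x7B, '-',  0x7C },	/* en-dash followed by - gives an em-dash */
	{ '`',  '`',  0x5C },	/* `` gives an opening double quote */
	{ '\'', '\'', 0x22 },	/* '' gives a closing double quote */
};

/* Apply the rules to n input codes; out must have room for n bytes.
 * Returns the number of font slots written. */
static size_t
apply_ligs(const unsigned char *in, size_t n, unsigned char *out)
{
	const size_t nligs = sizeof cm_ligs / sizeof cm_ligs[0];
	size_t i, k, o = 0;

	assert(in != NULL && out != NULL);
	for (i = 0; i < n; i++) {
		unsigned char c = in[i];
		int matched = 1;

		/* keep combining while the current slot and the next input
		 * code form a ligature, so that --- collapses to one slot */
		while (matched && i + 1 < n) {
			matched = 0;
			for (k = 0; k < nligs; k++) {
				if (cm_ligs[k].left == c && cm_ligs[k].right == in[i + 1]) {
					c = cm_ligs[k].result;
					i++;
					matched = 1;
					break;
				}
			}
		}
		out[o++] = c;
	}
	return o;
}
```

Since the table lives in TeX's view of the font only, the dvi driver just sees the resulting slot numbers; the font as seen by the driver needs no change.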
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: erik quanstrom @ 2011-06-30 16:31 UTC
To: 9fans

> I don't despise XeTeX. Nor Unicode. And I will take Unicode as is. But I
> will take TeX conventions as is too, since I'm working on TeX, and not
> another formatting system; since these conventions are confined to the
> ASCII subrange and only diverging from ASCII for the not glyph
> positions. I still fail to see what's the big deal?

you can't have it both ways. you can't at the same time say tex is only defined for ascii, so utf-8 is a non sequitur, and at the same time put out a version of tex that takes latin1 input.

the question is, should you use latin1 or utf-8. and i think the answer to this for plan 9 is pretty clear. use utf-8. a subset would be much better than latin1. the fact that there is a latin2 is proof that latin1 is misguided in ways that utf-8 does fix.

- erik
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-30 17:00 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 30, 2011 at 12:31:17PM -0400, erik quanstrom wrote:
> > I don't despise XeTeX. Nor Unicode. And I will take Unicode as is. But I
> > will take TeX conventions as is too, since I'm working on TeX, and not
> > another formatting system; since these conventions are confined to the
> > ASCII subrange and only diverging from ASCII for the not glyph
> > positions. I still fail to see what's the big deal?
>
> you can't have it both ways. you can't at the same time say tex is
> only defined for ascii, so utf-8 is a non sequitor, and at the same time
> put out a version of tex that takes latin1 input.

No, this is an error you and others are making. There is a distinction between the input encoding (for the moment TeX expects only 8 bits) and some conventions in the font organization.

The Computer Modern fonts provide the ASCII "visible" characters (glyphs) in the ASCII positions. But there are other positions in the 0-127 range that are free. These positions are used "internally" by the plain TeX conventions (TeX is the compiler/interpreter; tex(1) is the interpreter having loaded a special set of conventions, the ones of plain TeX; one can do almost totally without them, or totally differently). These positions, free as far as a font is concerned, are filled with non-ASCII characters/glyphs.

For example, in the text font layout, the 0x1a position holds the glyph for \ae. If a user, using plain TeX, specifies \ae, the TFM constructed will give the correct metrics for the glyph, and the dvi driver will put the correct glyph.
This does not preclude the user from directly entering the unicode codepoint: in the TFM, if you want, the glyph information is duplicated, in the conventional plain TeX position and, literally, in the unicode position. In this case, the glyph is accessed either by the \ae char definition, by the 0x1a code (ASCII control "sub"), or by the 0x00e6 unicode codepoint. This is not the input encoding; this is a font mapping.

--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C
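The duplication described in this message can be sketched as a table from font slot to glyph name, in which the plain TeX position and the latin1 (Unicode) position both name the same glyph. The glyph names are the usual AFM ones and the low slots are those of the Computer Modern text layout; the table itself is only a sample, not kerTeX's actual encoding, and `glyph_for_code` and `same_glyph` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry of a virtual-font style mapping: font slot to AFM glyph name. */
struct slot {
	unsigned code;
	const char *glyph;
};

/* Sample entries only: each glyph appears twice, once at its plain TeX
 * position (an ASCII control code) and once at its latin1 position. */
static const struct slot sample_map[] = {
	{ 0x1A, "ae" },     { 0xE6, "ae" },
	{ 0x1D, "AE" },     { 0xC6, "AE" },
	{ 0x1C, "oslash" }, { 0xF8, "oslash" },
	{ 0x1F, "Oslash" }, { 0xD8, "Oslash" },
};

/* Return the glyph name for an 8-bit slot, or NULL if the sample table
 * has no entry for it. */
static const char *
glyph_for_code(unsigned code)
{
	size_t i;

	assert(code < 256);
	for (i = 0; i < sizeof sample_map / sizeof sample_map[0]; i++)
		if (sample_map[i].code == code)
			return sample_map[i].glyph;
	return NULL;
}

/* Return 1 if both slots are mapped and name the same glyph, else 0. */
static int
same_glyph(unsigned a, unsigned b)
{
	const char *ga = glyph_for_code(a);
	const char *gb = glyph_for_code(b);

	return ga != NULL && gb != NULL && strcmp(ga, gb) == 0;
}
```

With such a table, \ae (which plain TeX resolves to slot 0x1a) and a directly typed 0xe6 reach the same glyph, independently of how the input itself was encoded.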
* Re: [9fans] [RFC] fonts and unicode/utf [TeX]
From: tlaronde @ 2011-06-30 17:12 UTC
To: Fans of the OS Plan 9 from Bell Labs

On Thu, Jun 30, 2011 at 07:00:48PM +0200, tlaronde wrote:
>
> This does not preclude the user from directly entering the unicode
> codepoint: in the TFM, if you want, the glyph information is duplicated,
> in the conventional plain TeX position, and as a literal in the unicode
> position.

More precisely (I hope): since the "latin minuscule ae" is also duplicated in the conventional TeX position (overwriting only ASCII-range control positions that are useless), the plain TeX conventions work, while the user can also directly enter the unicode codepoint for it.

--
Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89 250D 52B1 AE95 6006 F40C