From mboxrd@z Thu Jan 1 00:00:00 1970
Message-Id: <4E0B804C.94AB.00CC.0@wlu.ca>
Date: Wed, 29 Jun 2011 19:43:08 -0400
From: "Karljurgen Feuerherm"
To: "Fans of the OS Plan 9 from Bell Labs" <9fans@9fans.net>
References: <20110625150327.GA425@polynum.com>
 <20110625171134.GA3661@polynum.com>
 <20110626075745.GA395@polynum.com>
 <20110627114856.GA7099@polynum.com>
 <9308c52f360f6274e0730399741278ce@ladd.quanstro.net>
 <20110627172006.GA497@polynum.com>
 <4E08DDDE.94AB.00CC.0@wlu.ca>
 <20110628111915.GA498@polynum.com>
In-Reply-To: <20110628111915.GA498@polynum.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="=__Part0926419C.0__="
Subject: Re: [9fans] [RFC] fonts and unicode/utf [TeX]
Topicbox-Message-UUID: f7cf48f2-ead6-11e9-9d60-3106f5b1d025

--=__Part0926419C.0__=
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I'd like to make a few comments concerning what you say below.

1. I've been involved with Unicode, both on the UTC and as a
representative to WG2, and I can confidently affirm that there is no
Unicode God. No one has ever said "There is no Code but Unicode, and
UTC/WG2 is its prophet", or anything like that. If you have a reference
to the Unicode Standard where I can read in black and white what you
are referring to, I will happily look at it. (This is not intended as a
smart remark. I'm quite seriously interested in understanding the facts
of this issue.)

2. Anyone involved in Unicode, including inner core members of the UTC
etc., recognizes that it's far from perfect. There is acknowledgement
that a number of things could have been handled differently, but
weren't. The Stability Policy may seem like a problematic restriction
to some in cases like this, but it guarantees backward compatibility,
so there is wisdom to it.

3. Whatever views one may have on Unicode, for better or worse, it is
what it is. As you said yourself, c'est un moyen et non pas une fin (it
is a means, not an end).... One is free to use it, or not, and/or
devise alternatives. (But more on alternatives below.)

4. You suggested in an earlier email that you'd like to think the whole
thing through carefully in advance, rather than implement things in
stages, as others do, who then never get to the advanced stages. To me
this raises the question of whether that is always the case. In
particular, if anyone or any group had tried to implement all of what
Unicode proposes to be or become (UCS--Universal Character Set) in one
go, the sheer magnitude of the task (which of course grows over time,
since scripts, either in themselves or as a set, are not static) would
have kept them from ever getting the thing off the ground. This is in
part why there are (arguably) flaws in Unicode. In any case, I
seriously doubt that even if one attempted to "redo" it "the right way
this time" one would manage. This is just not within the grasp of human
endeavour. The mistakes would simply be different or in different
areas. Likewise, there are plenty of things one could bring against the
process of Unicode endorsing proposals, i.e. the inherent politics of
interested groups, but that again is always a reality.

5. All that being said--Plan 9, as far as I can see, intentionally
supports Unicode (see http://plan9.bell-labs.com/plan9/about.html). So
to me, it's a non-starter to want to port *TeX to Plan 9 but rail
against Unicode, whether justifiably or through misunderstanding.

6. Unicode isn't Eternal, any more than any other encoding standard.
(I'm sure there were--and perhaps still are--those who think that BCD,
no wait! EBCDIC, no wait! ASCII, no wait...!--were/are the be-all and
end-all.) In time, something else will develop in response to
developing needs.

7. But at present, the recognized standard out there for most practical
intents and purposes (in particular, to service the needs of something
other than just North American anglophone techie society) is Unicode,
with whatever blemishes it may have.

So it seems to me that, in keeping with your principle alluded to
above, and given that we're talking about a Plan 9 environment here,
you ought to be talking UTF-8 right off the bat.

As I said--"seems to me". Could be I'm seriously misunderstanding the
discussion... but then again, the dwindling number of participants in
the dialogue suggests to me that there may be at least *some* truth in
what I'm thinking....

Please don't think this is intended as a rant, either due to the way
I've formatted this or on account of the content. I'm interested in
following what you're doing; I'm just a bit puzzled, and I sincerely
wish you the best in your efforts with this project.

K

>>> <tlaronde@polynum.com> 06/28/11 7:19 AM >>>
On Mon, Jun 27, 2011 at 07:45:34PM -0400, Karljurgen Feuerherm wrote:
> Thierry,
>
> > I only say that:
> >
> > 1) Forcing, as this was written in the XeTeX FAQ, user to enter the
> > special codepoint for the fi ligature since, white eyes, scornful wave
> > of the hand: "this is the way this is done with Unicode" is sheer
> > stupidity.
>
> I don't know who told you that... just because there is a codepoint
> for something does not mean that one has to access that codepoint
> directly in all cases. Software at various levels can render a
> ligature on the basis of various actual character sequences (e.g.
> f + i, or f, i when ligatures are forced, etc.).
>
> It's simply a level of what support one wishes to offer....

This is exactly what I'm trying to say. If one enters \'e, \' is just
the "charname" or macro command to access the acute accent in the font.
One can enter directly the code for the acute accent. Or one can enter
directly the é (if the CID entered is classified as "other" [literal],
and the fonts have something at the corresponding index).

BUT the documentation found told that with "modern" fonts, one has the
absolute obligation threatened by Thy Unicode GOD to enter the codepoint
and that ligatures were deprecated.

TeX is absolutely agnostic. It is an engine, a compiler/interpreter.
Even tex(1) is just the name of an instance of TeX with a special
convention: D.E. Knuth's plain TeX.
>
> KF

--
        Thierry Laronde <tlaronde +AT+ polynum +dot+ com>
                      http://www.kergis.com/
Key fingerprint = 0FF7 E906 FBAF FE95 FD89  250D 52B1 AE95 6006 F40C

--=__Part0926419C.0__=--
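
A small illustration of the ligature and accent points in the quoted
exchange, offered as a sketch and not as anything either poster wrote.
Python's standard unicodedata module can show that the fi ligature
codepoint (U+FB01) folds to the plain sequence f + i under compatibility
normalization (NFKC), and that e followed by a combining acute accent
(U+0301) composes to é (U+00E9) under NFC. The helper names
to_plain_sequence and compose_accents are made up for this example. How
TeX, XeTeX, or a font actually selects a ligature glyph is a separate
matter that this does not model.

```python
import unicodedata

# U+FB01 is the compatibility codepoint for the "fi" ligature.
FI_LIGATURE = "\ufb01"
# U+0301 is the combining acute accent.
COMBINING_ACUTE = "\u0301"


def to_plain_sequence(text: str) -> str:
    """Fold compatibility characters such as U+FB01 into plain letters (NFKC)."""
    return unicodedata.normalize("NFKC", text)


def compose_accents(text: str) -> str:
    """Combine base letters with following combining marks where possible (NFC)."""
    return unicodedata.normalize("NFC", text)


if __name__ == "__main__":
    # The ligature codepoint and the two-letter sequence carry the same text.
    print(to_plain_sequence(FI_LIGATURE + "nal") == "final")
    # Base letter plus combining accent composes to the single codepoint U+00E9.
    print(compose_accents("e" + COMBINING_ACUTE) == "\u00e9")
```

In other words, the text can stay as ordinary letters, and whether a
ligature glyph is shown is left to whatever renders it.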