From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@quanstro.net>
Date: Sun, 19 Jun 2011 18:38:59 -0400
To: 9fans@9fans.net
Message-ID: <3c7e401c771bdd0d9bd8950ceb60eb9e@ladd.quanstro.net>
In-Reply-To: <20110619163458.GA424@polynum.com>
References: <20110616121700.GA9131@polynum.com>
	<9556bc097d90b774c37c16af5a7c20eb@brasstown.quanstro.net>
	<20110619163458.GA424@polynum.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] [RFC] fonts and unicode/utf [TeX]
Topicbox-Message-UUID: f2d183ba-ead6-11e9-9d60-3106f5b1d025

> > perhaps you mean the subset of unicode corresponding to the codepoints
> > encoded by latin1 encoded in utf-8.  the system character set is utf-8,
> > and latin1 is not a compatible encoding.  utf-8 is assumed everywhere except
> > when the data is inbound, and explicitly tagged as having a different
> > character set.  programs like upas/fs and webfs do the conversion at the
> > border.
> >
> > there's really no reason for latin1 in 2011.
>
> There is a reason here: for now, TeX is 8 bits and that's all. So, if
> being allowed to use, at least, all of the 8 bits means something, it shall
> be latin1. This does not prevent anyone from using whatever character set
> they want; but as a default, and _for now_, it's better than nothing;
> and significantly better than some random character set that no tcs(1)
> will know how to deal with.
>
> Accepting utf-8 directly as input will not be addressed for the 1.0
> release of kerTeX.

i think you've missed my point.  latin1 is an encoding, and
utf-8 is an encoding.  if tex is so backwards that it can't
accept a character wider than 8 bits, then it would be reasonable,
and consistent with the rest of the plan 9 system, to
read utf-8 runes (i.e. not latin1) in and then reject runes
with a codepoint above 255.
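
a minimal sketch of that read-then-reject idea, in python rather than
anything tex actually ships.  the names utf8_to_8bit and MAXRUNE are
made up for illustration and are not part of tex or kerTeX.  since the
first 256 unicode codepoints have the same values as latin1, the 8-bit
codes handed to tex come out the same; only the input encoding differs.

```python
# hypothetical limit: the largest codepoint an 8-bit tex could hold
MAXRUNE = 0xFF


def utf8_to_8bit(data: bytes) -> bytes:
    """Decode utf-8 input and map each rune to one 8-bit character code.

    Raises ValueError on malformed utf-8 and on any rune above MAXRUNE.
    """
    try:
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError as e:
        raise ValueError(f"malformed utf-8 at byte offset {e.start}") from e

    out = bytearray()
    for i, ch in enumerate(text):
        rune = ord(ch)
        if rune > MAXRUNE:
            # reject rather than mangle: the input stays valid utf-8
            raise ValueError(f"rune U+{rune:04X} at index {i} is above U+{MAXRUNE:04X}")
        out.append(rune)
    return bytes(out)
```

with this, utf-8 encoded "é" (bytes c3 a9) becomes the single code 0xe9,
while a raw latin1 0xe9 byte is refused as malformed utf-8, and so is
anything above U+00FF such as "€".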

then, if tex is fixed to accept larger codepoints, one can
remove this limit.  if latin1 is used instead, then utf-8 can not be
retrofitted later in a way that is compatible with older tex input.

nobody cares what font encoding tex uses internally.  the
real issue is the input to tex.  i sure would be very reluctant
to load anything on my system that will mangle utf-8, especially
for codepoints <256.  that's the path to wchar_t.

- erik