From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Sun, 19 Jun 2011 18:38:59 -0400
To: 9fans@9fans.net
Message-ID: <3c7e401c771bdd0d9bd8950ceb60eb9e@ladd.quanstro.net>
In-Reply-To: <20110619163458.GA424@polynum.com>
References: <20110616121700.GA9131@polynum.com> <9556bc097d90b774c37c16af5a7c20eb@brasstown.quanstro.net> <20110619163458.GA424@polynum.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] [RFC] fonts and unicode/utf [TeX]
Topicbox-Message-UUID: f2d183ba-ead6-11e9-9d60-3106f5b1d025

> > perhaps you mean the subset of unicode corresponding to the codepoints
> > encoded by latin1 encoded in utf-8. the system character set is utf-8,
> > and latin1 is not a compatible encoding. utf-8 is assumed everywhere except
> > when the data is inbound, and explicitly tagged as having a different
> > character set. programs like upas/fs and webfs do the conversion at the
> > border.
> >
> > there's really no reason for latin1 in 2011.
>
> There is a reason here: for now, TeX is 8 bits and that's all. So, if
> allowing to use, at least, all of the 8 bits means something, it shall
> be latin1. This does not prevent somebody to use whatever character set
> one wants; but as a default, and _for now_, it's better than nothing;
> and significantly better than some random character set that no tcs(1)
> will know how to deal with.
>
> To accept directly utf-8 as input will not be addressed for the 1.0
> release of kerTeX.

i think you've missed my point. latin1 is an encoding, utf-8 is an
encoding.

if tex is so backwards that it can't accept a character wider than
8 bits, then it would be reasonable, and no different from the rest of
the plan 9 system, to read utf-8 runes (i.e. not latin1) in and then
reject runes with a codepoint above 255. then, if tex is fixed to
accept larger codepoints, one can remove this limit.
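a minimal sketch of the read-then-reject idea above, in python rather than anything taken from tex or kerTeX. the function name `read_runes` and the `limit` parameter are hypothetical and exist only for illustration; the assumption is that the engine wants one integer code per rune.

```python
# hypothetical helper; not part of tex, kerTeX, or plan 9.
MAX_CODEPOINT = 255  # assumed engine limit; raise it once wider runes are accepted


def read_runes(data: bytes, limit: int = MAX_CODEPOINT) -> list[int]:
    """Decode utf-8 input and return one codepoint per rune.

    Malformed utf-8 (for example raw latin1 bytes above 0x7f) fails in
    decode(); well-formed runes above `limit` are rejected explicitly.
    """
    text = data.decode("utf-8")  # raises UnicodeDecodeError on bad input
    codes = []
    for position, ch in enumerate(text):
        codepoint = ord(ch)
        if codepoint > limit:
            raise ValueError(
                f"rune U+{codepoint:04X} at position {position} "
                f"is above the limit of {limit}"
            )
        codes.append(codepoint)
    return codes
```

with this shape, the input stays utf-8 like the rest of the system, and lifting the restriction later is a change to `limit` rather than a change of input encoding.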
if latin1 is used, then utf-8 can not be retrofitted later in a way
that is compatible with older tex input.

nobody cares what font encoding tex uses internally. the real issue
is the input to tex. i sure would be very reluctant to load anything
on my system that will mangle utf-8, especially for codepoints <256.
that's the path to wchar_t.

- erik