From mboxrd@z Thu Jan 1 00:00:00 1970
From: Lucio De Re
To: 9fans@cse.psu.edu
Subject: Re: [9fans] A simple question
Message-ID: <20030711081111.H7106@cackle.proxima.alt.za>
References: <20030710160854.E7106@cackle.proxima.alt.za> <4b98d4a6bc053f2a6d06aed8997d50ff@plan9.bell-labs.com> <20030710162509.F7106@cackle.proxima.alt.za> <3F0D8198.6000009@nas.com> <20030711064145.G7106@cackle.proxima.alt.za> <00f601c34772$215ea340$b9844051@insultant.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
In-Reply-To: <00f601c34772$215ea340$b9844051@insultant.net>; from boyd, rounin on Fri, Jul 11, 2003 at 08:03:20AM +0200
Date: Fri, 11 Jul 2003 08:11:13 +0200
Content-Transfer-Encoding: quoted-printable
Topicbox-Message-UUID: f58b0f6e-eacb-11e9-9e20-41e7f4b1d025

On Fri, Jul 11, 2003 at 08:03:20AM +0200, boyd, rounin wrote:
>
> well if unicode was organised so that each language had its own
> code space it would be a trivial problem.
>
Well, I'm assuming a total rewrite of the alphabets anyway, although
ideograms really don't fit in the same representation space.

> look at latin 1 (for want of a better term): it covers a whole bunch
> of languages, with different collation sequences and many of the
> glyphs are not actually _real letters_.
>
Yes, that's where composition came into the picture.  But that needs
to be clever, with character scaling to make room for accents becoming
more than a trivial nuisance.

Technically, it doesn't matter what a symbol stands for, as much as it
needs to be _presented_ in an unambiguous, clear fashion.  Whether it's
a pronunciation issue or a distinct character (is the final letter in
"papa" and "papà" a pronunciation aid or a different character in the
sense of differentiating words with different meanings?) is not
important to its internal or external representation.

But I do get your point that overlaps of alphabets for different
languages do add complexity.
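
The composition point above can be made concrete with a small Python
sketch.  It is only an illustration under stated assumptions, not
something taken from the thread: it relies on Python 3's standard
unicodedata module, and the helper name same_text is made up for the
example.  Unicode allows "à" to be stored either as the single code
point U+00E0 or as "a" followed by the combining grave accent U+0300,
and normalisation maps both internal forms to one, which amounts to a
many-to-one mapping from internal forms to a single presented character.

```python
import unicodedata

# Two internal representations of the same presented word.
precomposed = "pap\u00e0"   # 'à' as one code point, U+00E0
combining = "papa\u0300"    # 'a' followed by U+0300 COMBINING GRAVE ACCENT

# They differ as code point sequences and as UTF-8 byte strings.
print(precomposed.encode("utf-8"))  # b'pap\xc3\xa0'
print(combining.encode("utf-8"))    # b'papa\xcc\x80'


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalisation."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Raw comparison sees two different strings; normalised comparison
# treats both spellings as the same text.
print(precomposed == combining)           # False
print(same_text(precomposed, combining))  # True

# The unaccented word remains a different string either way.
print(same_text("papa", precomposed))     # False
```

Plain "papa" stays distinct from "papà" under either form, so the
accent is carried as data whether one reads it as a pronunciation aid
or as a separate letter.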
Maybe there is enough scope in UTF-8 or Unicode to allow many-to-one
internal-to-external mappings.

The existence of a phonetic alphabet is a different issue, too vast to
address here (especially without composition capabilities).  Suffice it
to say that even in Plan 9 there are fonts that do not have all the
useful glyphs in them, so whereas UTF-8 is a great abstraction for
internal purposes, there should be a more definite standard about
externalising it.

++L