From mboxrd@z Thu Jan  1 00:00:00 1970
From: erik quanstrom <quanstro@quanstro.net>
Date: Sun, 19 Jun 2011 10:21:15 -0400
To: 9fans@9fans.net
Message-ID: <e1b02cbabd0b8e61564e905f0fe2ee7a@brasstown.quanstro.net>
In-Reply-To: <20110617153716.GA440@polynum.com>
References: <20110616121700.GA9131@polynum.com>
	<BANLkTin+jjx1ppKsS2HHUZJNqW8gpwQpzA@mail.gmail.com>
	<20110617153716.GA440@polynum.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] [RFC] fonts and unicode/utf [TeX]
Topicbox-Message-UUID: f2c079d0-ead6-11e9-9d60-3106f5b1d025

> I've given a look at it. I don't want to start a discussion about
> Unicode, since, supplementary to the "characters" (alphabetical,
> syllabics, ideographics; but no hieroglyphes or Linear B, so it's not
> complete ;)

not central to my point, but this is not correct

; grep -i 'linear b syllable b008' /lib/unicode
010000	linear b syllable b008 a
; grep -i 'egyptian hieroglyph a001' /lib/unicode
013000	egyptian hieroglyph a001

> there are formatting commands or rendering (the ligature fi
> is not a character; but in the XeTeX FAQ it is said user has to insert
> directly the Unicode for this codepoint since there is no ligature),
> that I don't think should be there (only the historical ASCII controls
> should be there; others should be undefined).

the general idea behind unicode is that it is a sequenced collection
of codepoints, not characters.  this implies that formatting differences
such as ligatures that have no semantic component (typesetting artifacts,
if you will) shouldn't be encoded in the character set.  i realize there are
some exceptions to this, but imho, the unicode committee are not perfect.
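
as a rough illustration of one such exception (a sketch using python's
standard unicodedata module, which is my choice here and not anything
from the thread): the fi ligature does have its own codepoint, U+FB01,
but it is tagged as a compatibility character and folds back to the two
plain letters under NFKC normalization.

```python
import unicodedata

FI = "\ufb01"  # the fi ligature, one of the exceptions

# the ligature has its own name and a compatibility decomposition
name = unicodedata.name(FI)
decomp = unicodedata.decomposition(FI)

# nfkc normalization folds it back to the two plain letters
folded = unicodedata.normalize("NFKC", FI)

print(name, "|", decomp, "|", folded)
```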

it's easy enough to escape non-codepoints or encode them in one of the
private unicode ranges.
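
a minimal sketch of the private-range approach, in python.  the glyph
names ("f_f_j", "c_t") and the assign_private helper are hypothetical,
made up for illustration; only the range U+E000 to U+F8FF (the private
use area of the basic multilingual plane) comes from the standard.

```python
import unicodedata

# private use area of the basic multilingual plane
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF

def assign_private(names, first=PUA_FIRST):
    """map each (hypothetical) glyph name to its own private-use codepoint"""
    table = {}
    for i, name in enumerate(names):
        cp = first + i
        if cp > PUA_LAST:
            raise ValueError("out of private-use codepoints")
        table[name] = chr(cp)
    return table

# two made-up typesetting artifacts with no standard codepoint
table = assign_private(["f_f_j", "c_t"])

# the private codepoints travel through utf-8 like any other rune
encoded = ("a" + table["c_t"] + "b").encode("utf-8")
print(encoded)
```

the mapping is only meaningful to programs and fonts that agree on the
same table, which is the price of using a private range.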

- erik