From mboxrd@z Thu Jan 1 00:00:00 1970
From: erik quanstrom
Date: Sun, 19 Jun 2011 10:21:15 -0400
To: 9fans@9fans.net
Message-ID:
In-Reply-To: <20110617153716.GA440@polynum.com>
References: <20110616121700.GA9131@polynum.com> <20110617153716.GA440@polynum.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="US-ASCII"
Content-Transfer-Encoding: 7bit
Subject: Re: [9fans] [RFC] fonts and unicode/utf [TeX]
Topicbox-Message-UUID: f2c079d0-ead6-11e9-9d60-3106f5b1d025

> I've given a look at it. I don't want to start a discussion about
> Unicode, since, supplementary to the "characters" (alphabetical,
> syllabics, ideographics; but no hieroglyphes or Linear B, so it's not
> complete ;)

not central to my point, but this is not correct

; grep -i 'linear b syllable b008' /lib/unicode
010000	linear b syllable b008 a
; grep -i 'egyptian hieroglyph a001' /lib/unicode
013000	egyptian hieroglyph a001

> there are formatting commands or rendering (the ligature fi
> is not a character; but in the XeTeX FAQ it is said user has to insert
> directly the Unicode for this codepoint since there is no ligature),
> that I don't think should be there (only the historical ASCII controls
> should be there; others should be undefined).

the general idea behind unicode is that it is a sequenced collection
of codepoints, not characters. this implies that formatting differences
such as ligatures that have no semantic component (typesetting
artifacts, if you will) shouldn't be encoded in the character set.

i realize there are some exceptions to this, but imho, the unicode
committee are not perfect. it's easy enough to escape non-codepoints or
encode them in one of the private unicode ranges.

- erik
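
for readers without a plan 9 /lib/unicode to grep, the lookups above can be cross-checked elsewhere. what follows is only a minimal sketch in python and not part of the original message. it assumes the standard unicodedata module and a python whose unicode tables include the egyptian hieroglyphs block. the helper name describe is made up for illustration. it looks up the two codepoints from the grep output, shows that the fi ligature (U+FB01) has its own codepoint but folds back to plain "fi" under compatibility normalization, and shows that U+E000 falls in a private use range.

```python
import unicodedata


def describe(codepoint: int) -> str:
    """Return the lowercase Unicode name of a codepoint (hypothetical helper)."""
    return unicodedata.name(chr(codepoint)).lower()


# The two codepoints from the /lib/unicode grep output.
linear_b = describe(0x10000)
hieroglyph = describe(0x13000)

# U+FB01 is the fi ligature; NFKC normalization folds it back to "f" + "i".
ligature_name = describe(0xFB01)
folded = unicodedata.normalize("NFKC", "\ufb01")

# U+E000 is the first codepoint of the BMP private use area (category "Co").
private_category = unicodedata.category("\ue000")

print(f"{0x10000:06x} {linear_b}")
print(f"{0x13000:06x} {hieroglyph}")
print(f"{0xFB01:06x} {ligature_name} -> {folded!r}")
print(f"{0xE000:06x} category {private_category}")
```

the NFKC step is one way to treat the ligature as a typesetting artifact: the text keeps its two letters and the font or typesetter decides whether to draw them joined.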