From: Hans Hagen
Newsgroups: gmane.comp.tex.context
Subject: Re: utf 8 / test file
Date: Mon, 09 Dec 2002 00:26:16 +0100
Message-ID: <5.1.0.14.1.20021209001503.02b22708@remote-1>
In-Reply-To: <20021208203834.GA642@scaprea>
References: <5.1.0.14.1.20021207123223.0254e040@server-1>
Reply-To: ntg-context@ntg.nl

At 09:38 PM 12/8/2002 +0100, you wrote:
>I have looked
>at how emacs and Unicode browser deal with unicode and
>fonts. Unicode browser is an application on the CD-ROM that comes with
>the Unicode 3.0 book. They both use font sets, i.e., collections of

so i have to buy that book -) what is the best place to get it?

>For Context this might be worked out as follows: Each font family must
>be in a known encoding. When a font family is loaded, the encoding and
>the associated font family are added to a table of loaded
>encodings. When a unicode character is sought, the loaded encodings
>are scanned in the order in which they appear in the table, until an
>encoding is found that provides a glyph for that character.

hm, must think this over, esp. since tex has no way (except measuring)
to determine whether a slot is really taken.

>It is possible that two font families are loaded that overlap in the
>range covered. Then the glyphs in the overlap area are taken from the
>font loaded first. This behaviour can be changed by configuring a font
>to contribute only a certain range of characters, or to exclude a
>certain range of characters from a font. This is a refinement that
>might be added later on.
>
>The NFSS in LaTeX provides a default encoding for a character (not to
>be confused with Context's default encoding, which is a different
>thing). When the character is not found in the current encoding, it is
>taken from this default encoding. Such a strategy may be more
>efficient than going through the list of loaded encodings.

eh ... context does have fallbacks (nearly always something default,
often very plain); if something does not show up, it's probably not
defined (yet); so maybe i misunderstand you

>The above strategy may be efficient for a text that mainly consists of
>ascii characters. For a text that mainly consists of non-ascii
>characters, e.g. a chinese text, it requires much processing.
>Such a situation may be dealt with like encodings: When you are writing
>in a West European language, it is more efficient to use Latin-1 than
>utf-8. Similarly, when one is writing in chinese, a more efficient
>setup with a more limited coverage of characters may be used.

chinese is even more complicated: there can be mixed utf-like encodings,
and chars need some kind of postprocessing (adding breakpoints and so on,
or rotation in vertical typesetting, and/or special numbering things;
this is already handled)

>I prefer to use font families rather than fonts. This makes it easy to
>switch from one font family to another, while keeping constant the
>other font parameters such as shape and weight. I like the way this is
>done in LaTeX's NFSS. I do not (yet) know much about the way Context
>organizes its fonts.

the organization is roughly the same as in any tex (a few axes); for
scripts like chinese, names like SomeNiceFont automatically expand into
SomeNiceFontBold at a certain size; this is a byproduct of using
symbolic filenames; it also gives a pretty nice way of mixing latin,
ideographic, and math scripts.

>One should be aware of the difference between character and
>glyph. Unicode is about characters, typesetters like TeX are about
>glyphs. It is quite possible that one font provides several
>variant glyphs for one and the same Unicode character. The user must
>have some way to express preference for one or the other.

i read somewhere that unicode is about scripts -) you're right; somehow
we need to deal with the OpenType language-dependent glyphs; pretty nasty

>I think the user should load the appropriate input regime, as he only
>knows the encoding of the input file. For XML files it is different;
>in DocbookInContext I will try to load the appropriate input regime
>automatically from the encoding mentioned in the xml declaration.
>
>Configuring an appropriate font set is difficult.
>Perhaps font sets should be preconfigured, and fonts should be loaded
>as available. Good error messages when no font provides a glyph for a
>character in the text document should alert the user to missing fonts.

Indeed i think that we should have some reasonable defaults, and it
seems that there are no free complete unicode fonts, so we probably end
up with something => defaultfont but maybe even with => defaultfont

this needs some research.

Thanks for your input.

Hans

-------------------------------------------------------------------------
Hans Hagen | PRAGMA ADE | pragma@wxs.nl
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: +31 (0)38 477 53 69 | fax: +31 (0)38 477 53 74 | www.pragma-ade.com
-------------------------------------------------------------------------
information: http://www.pragma-ade.com/roadmap.pdf
documentation: http://www.pragma-ade.com/showcase.pdf
-------------------------------------------------------------------------
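The font-set lookup proposed in the quoted text (scan loaded encodings in load order, first-loaded font wins on overlaps, optional per-font include/exclude ranges, an NFSS-style per-character default, and an error when nothing covers a character) can be sketched roughly as below. This is a hypothetical Python illustration only, not how ConTeXt or NFSS actually work: the names FontSet and LoadedEncoding are made up, and each encoding is assumed to declare its coverage up front as a code point to slot table, since, as noted above, tex itself cannot easily tell whether a slot is really taken.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Range = Tuple[int, int]  # inclusive code point range


@dataclass
class LoadedEncoding:
    # Hypothetical model of one loaded font family in one encoding.
    family: str
    encoding: str
    slots: Dict[int, int]                # Unicode code point -> glyph slot (assumed declared, not measured)
    only: Optional[List[Range]] = None   # if set, contribute only these ranges
    exclude: List[Range] = field(default_factory=list)  # never contribute these ranges

    def provides(self, cp: int) -> bool:
        if cp not in self.slots:
            return False
        if self.only is not None and not any(lo <= cp <= hi for lo, hi in self.only):
            return False
        return not any(lo <= cp <= hi for lo, hi in self.exclude)


class FontSet:
    def __init__(self) -> None:
        self.table: List[LoadedEncoding] = []          # kept in load order
        self.defaults: Dict[int, LoadedEncoding] = {}  # NFSS-style per-character default

    def load(self, enc: LoadedEncoding) -> None:
        self.table.append(enc)

    def lookup(self, ch: str) -> Tuple[str, str, int]:
        cp = ord(ch)
        # Shortcut: try a configured default encoding for this character first.
        enc = self.defaults.get(cp)
        if enc is not None and enc.provides(cp):
            return enc.family, enc.encoding, enc.slots[cp]
        # Otherwise scan in load order; on overlaps the first-loaded font wins.
        for enc in self.table:
            if enc.provides(cp):
                return enc.family, enc.encoding, enc.slots[cp]
        # No loaded font covers the character: report it clearly.
        raise LookupError(f"no loaded font provides a glyph for U+{cp:04X} ({ch!r})")
```

Loading a Latin-1 family before a (hypothetical) Greek family that also covers "A" makes "A" come from the Latin-1 family, while Greek letters fall through to the second entry; giving the Greek family exclude=[(0x00, 0x7F)] or an entry in defaults changes that precedence without reordering the table.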