From: Hans Hagen <pragma@wxs.nl>
Subject: Re: utf 8 / test file
Date: Mon, 09 Dec 2002 00:26:16 +0100
Message-ID: <5.1.0.14.1.20021209001503.02b22708@remote-1>
In-Reply-To: <20021208203834.GA642@scaprea>
At 09:38 PM 12/8/2002 +0100, you wrote:
>I have looked at how emacs and Unicode browser deal with unicode and
>fonts. Unicode browser is an application on the CD-ROM that comes with
>the Unicode 3.0 book. They both use font sets, i.e., collections of
so i have to buy that book -) what is the best place to get it?
>For Context this might be worked out as follows: Each font family must
>be in a known encoding. When a font family is loaded, the encoding and
>the associated font family are added to a table of loaded
>encodings. When a unicode character is sought, the loaded encodings
>are scanned in the order in which they appear in the table, until an
>encoding is found that provides a glyph for that character.
hm, must think this over, esp since tex has no way (except measuring) to
determine if a slot is really taken
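
The lookup described in the quoted paragraph can be sketched as follows. This is only an illustration, not Context code: the family names, the load_family and find_family helpers, and the coverage sets are all hypothetical. Because TeX cannot tell whether a slot is really taken, the sketch assumes the coverage of each family is supplied from outside as a set of code points.

```python
# Hypothetical table of loaded font families, kept in load order.
# Each entry is (family name, set of code points the family has glyphs for).
loaded = []

def load_family(name, codepoints):
    """Register a family and the code points it is assumed to cover."""
    loaded.append((name, frozenset(codepoints)))

def find_family(char):
    """Scan the table in load order; the first family covering char wins."""
    cp = ord(char)
    for name, covered in loaded:
        if cp in covered:
            return name
    return None  # no loaded family provides a glyph

# Illustrative coverage only, not the coverage of any real font.
load_family("latin-family", range(0x20, 0x250))
load_family("greek-family", range(0x370, 0x400))
load_family("wide-family", range(0x20, 0x400))
```

With this table, a character covered by two families is taken from the one loaded first, and a character that no family covers yields None, which is where an error message would be issued.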
>It is possible that two font families are loaded that overlap in the
>range covered. Then the glyphs in the overlap area are taken from the
>font loaded first. This behaviour can be changed by configuring a font
>to contribute only a certain range of characters, or to exclude a
>certain range of characters from a font. This is a refinement that
>might be added later on.
>
>The NFSS in LaTeX provides a default encoding for a character (not to
>be confused with Context's default encoding, which is a different
>thing). When the character is not found in the current encoding, it is
>taken from this default encoding. Such a strategy may be more
>efficient than going through the list of loaded encodings.
eh ... context does have fallbacks (nearly always something default, often
very plain); if something does not show up, it's probably not defined
(yet); so, maybe i misunderstand you
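
To make the strategy in the quoted paragraph concrete, here is a minimal sketch of a per-character default encoding: the current encoding is tried first, and only then the default declared for that character. The tables and the encoding_for helper are hypothetical and hold a handful of characters for illustration; they do not reflect how NFSS or Context store this information.

```python
# Hypothetical glyph tables per encoding (tiny, for illustration only).
ENCODINGS = {
    "T1": {"a", "é"},
    "LGR": {"α", "β"},
}

# Hypothetical default encoding declared per character.
DEFAULT_ENCODING = {"α": "LGR", "β": "LGR", "é": "T1"}

def encoding_for(char, current):
    """Return the encoding to take char from, or None if it is undefined."""
    if char in ENCODINGS.get(current, ()):
        return current
    fallback = DEFAULT_ENCODING.get(char)
    if fallback is not None and char in ENCODINGS.get(fallback, ()):
        return fallback
    return None  # not defined (yet)
```

The lookup costs at most two table checks per character, which is the efficiency argument made above against scanning every loaded encoding.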
>The above strategy may be efficient for a text that mainly consists of
>ascii characters. For a text that mainly consists of non-ascii
>characters, e.g. a chinese text, it requires much processing. Such a
>situation may be dealt with like encodings: When you are writing in a
>West European language, it is more efficient to use Latin-1 than
>utf-8. Similarly, when one is writing in chinese, a more efficient
>setup with a more limited coverage of characters may be used.
chinese is even more complicated: there can be mixed utf-like encodings,
and chars need some kind of postprocessing (adding breakpoints and so on, or
rotation in vertical typesetting, and/or special numbering things; this is
already handled)
>I prefer to use font families rather than fonts. This makes it easy to
>switch from one font family to another, while keeping constant the
>other font parameters such as shape and weight. I like the way this is
>done in LaTeX's NFSS. I do not (yet) know much about the way Context
>organizes its fonts.
the organization is roughly the same as in any tex (a few axes); for
scripts like chinese, names like SomeNiceFont automatically expand into
SomeNiceFontBold at a certain size; this is a byproduct of using symbolic
filenames; it also means a pretty nice way of mixing latin, ideographic,
and math scripts.
>One should be aware of the difference between character and
>glyph. Unicode is about characters, typesetters like TeX are about
>glyphs. It is very well possible that one font provides several
>variant glyphs for one and the same Unicode character. The user must
>have some way to express preference for one or the other.
i read somewhere that unicode is about scripts -)
you're right; somehow we need to deal with the OpenType language-dependent
glyphs; pretty nasty
>I think the user should load the appropriate input regime, as he only
>knows the encoding of the input file. For XML files it is different;
>in DocbookInContext I will try to load the appropriate input regime
>automatically from the encoding mentioned in the xml declaration.
>
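
Picking the regime from the XML declaration, as proposed in the paragraph above, might look roughly like this. It is a sketch under assumptions: the REGIMES mapping and the regime names in it are hypothetical, the regime_for helper is made up for illustration, and byte order marks and UTF-16 input are not handled.

```python
import re

# Hypothetical mapping from XML encoding names to input regime names.
REGIMES = {"utf-8": "utf", "iso-8859-1": "latin1", "iso-8859-2": "latin2"}

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def regime_for(xml_bytes, default="utf"):
    """Return the regime for the declared encoding, or None if unknown."""
    m = _DECL.match(xml_bytes)
    if m is None:
        # XML without a declared encoding is treated as UTF-8 here.
        return default
    name = m.group(1).decode("ascii").lower()
    return REGIMES.get(name)
```

A None result means the declared encoding has no known regime, so the caller can report that instead of guessing.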
>Configuring an appropriate font set is difficult. Perhaps font sets
>should be preconfigured, and fonts should be loaded as available. Good
>error messages when no font provides a glyph for a character in the
>text document should alert the user to missing fonts.
Indeed i think that we should have some reasonable defaults, and it seems
that there are no free complete unicode fonts, so we probably end up with
something
<range> => defaultfont
but maybe even with
<subrange> => defaultfont
this needs some research.
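
As a rough illustration of the range and subrange idea, the defaults could be a table of code-point ranges in which the narrowest matching range wins, so that a subrange overrides the range containing it. The font names, the ranges, and the default_font helper below are placeholders, not a proposal for actual defaults.

```python
# Hypothetical defaults: (first, last, font), code-point ranges inclusive.
DEFAULTS = [
    (0x0000, 0xFFFF, "fallback-font"),  # <range> => defaultfont
    (0x0370, 0x03FF, "greek-font"),     # <subrange> => defaultfont
    (0x4E00, 0x9FFF, "cjk-font"),       # <subrange> => defaultfont
]

def default_font(char):
    """Return the font of the narrowest range containing char, or None."""
    cp = ord(char)
    best = None
    for first, last, font in DEFAULTS:
        if first <= cp <= last:
            if best is None or (last - first) < (best[1] - best[0]):
                best = (first, last, font)
    return None if best is None else best[2]
```

Choosing the narrowest range keeps the table order irrelevant; an ordered first-match table would work as well, at the cost of requiring subranges to be listed before the ranges that contain them.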
Thanks for your input.
Hans
-------------------------------------------------------------------------
Hans Hagen | PRAGMA ADE | pragma@wxs.nl
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: +31 (0)38 477 53 69 | fax: +31 (0)38 477 53 74 | www.pragma-ade.com
-------------------------------------------------------------------------
information: http://www.pragma-ade.com/roadmap.pdf
documentation: http://www.pragma-ade.com/showcase.pdf
-------------------------------------------------------------------------