Re: utf 8 / test file

ntg-context - mailing list for ConTeXt users

From: Hans Hagen <pragma@wxs.nl>
Subject: Re: utf 8 / test file
Date: Mon, 09 Dec 2002 00:26:16 +0100
Message-ID: <5.1.0.14.1.20021209001503.02b22708@remote-1>
In-Reply-To: <20021208203834.GA642@scaprea>

At 09:38 PM 12/8/2002 +0100, you wrote:

>I have looked at how emacs and Unicode browser deal with unicode and
>fonts. Unicode browser is an application on the CD-ROM that comes with
>the Unicode 3.0 book. They both use font sets, i.e., collections of

so i have to buy that book -) what is the best place to get it?

>For Context this might be worked out as follows: Each font family must
>be in a known encoding. When a font family is loaded, the encoding and
>the associated font family are added to a table of loaded
>encodings. When a unicode character is sought, the loaded encodings
>are scanned in the order in which they appear in the table, until an
>encoding is found that provides a glyph for that character.

hm, must think this over, esp since tex has no way (except measuring) to 
determine if a slot is really taken
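
For illustration only, the scan over the table of loaded encodings described 
in the quote could be sketched in Python as below. The names LoadedEncoding 
and find_glyph, the sample families and their coverage are all made up for 
this sketch and are not ConTeXt interfaces. The sketch also assumes that the 
coverage of each encoding is given explicitly, since tex itself cannot tell 
whether a slot is taken.

```python
from typing import List, NamedTuple, Optional


class LoadedEncoding(NamedTuple):
    encoding: str         # name of the encoding (hypothetical)
    family: str           # font family loaded in that encoding (hypothetical)
    coverage: frozenset   # code points with a glyph; assumed to be known up front


def find_glyph(table: List[LoadedEncoding], char: str) -> Optional[LoadedEncoding]:
    """Scan the loaded encodings in load order; the first one that covers char wins."""
    cp = ord(char)
    for entry in table:
        if cp in entry.coverage:
            return entry
    return None  # no loaded encoding provides this character


# Hypothetical table, in the order in which the families were loaded.
# Both entries cover ASCII, so ASCII glyphs come from the first one.
table = [
    LoadedEncoding("latin", "SomeLatinFamily", frozenset(range(0x20, 0x180))),
    LoadedEncoding("greek", "SomeGreekFamily",
                   frozenset(range(0x20, 0x7F)) | frozenset(range(0x370, 0x400))),
]
```

The cost of a lookup grows with the number of loaded encodings, which is the 
efficiency concern raised further down in the quoted text.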

>It is possible that two font families are loaded that overlap in the
>range covered. Then the glyphs in the overlap area are taken from the
>font loaded first. This behaviour can be changed by configuring a font
>to contribute only a certain range of characters, or to exclude a
>certain range of characters from a font. This is a refinement that
>might be added later on.
>
>The NFSS in LaTeX provides a default encoding for a character (not to
>be confused with Context's default encoding, which is a different
>thing). When the character is not found in the current encoding, it is
>taken from this default encoding. Such a strategy may be more
>efficient than going through the list of loaded encodings.

eh ... context does have fall backs (nearly always something default, often 
very plain); if something does not show up, it's probably not defined 
(yet); so, maybe i misunderstand you
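
As a rough sketch of the two-step strategy mentioned in the quote (current 
encoding first, then one default encoding), something like the following 
Python would do. The function name lookup and both glyph tables are invented 
for the example; this is not how ConTeXt or LaTeX implement their fall backs.

```python
from typing import Dict, Optional


def lookup(char: str, current: Dict[int, str], default: Dict[int, str]) -> Optional[str]:
    """Try the current encoding first, then fall back to the single default encoding."""
    cp = ord(char)
    glyph = current.get(cp)
    if glyph is None:
        glyph = default.get(cp)  # still None if the default has no glyph either
    return glyph


# Hypothetical glyph tables: code point -> glyph name.
greek = {0x3B1: "alpha", 0x3B2: "beta"}
plain = {0x61: "a", 0x62: "b", 0x3B1: "alpha.plain"}
```

A lookup here takes at most two probes, however many encodings are loaded, 
which is the efficiency argument made in the quoted paragraph.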

>The above strategy may be efficient for a text that mainly consists of
>ascii characters. For a text that mainly consists of non-ascii
>characters, e.g. a chinese text, it requires much processing. Such a
>situation may be dealt with like encodings: When you are writing in a
>West European language, it is more efficient to use Latin-1 than
>utf-8. Similarly, when one is writing in chinese, a more efficient
>setup with a more limited coverage of characters may be used.

chinese is even more complicated: there can be mixed utf-like encodings, 
and chars need some kind of postprocessing (adding breakpoints and so on, 
rotation in vertical typesetting, and/or special numbering things); this is 
already handled

>I prefer to use font families rather than fonts. This makes it easy to
>switch from one font family to another, while keeping constant the
>other font parameters such as shape and weight. I like the way this is
>done in LaTeX's NFSS. I do not (yet) know much about the way Context
>organizes its fonts.

the organization is roughly the same as in any tex (a few axes); for 
scripts like chinese, names like SomeNiceFont automatically expand into 
SomeNiceFontBold at a certain size; this is a byproduct of using symbolic 
filenames; it also means a pretty nice way of mixing latin, ideographic, 
and math scripts.

>One should be aware of the difference between character and
>glyph. Unicode is about characters, typesetters like TeX are about
>glyphs. It is very well possible that one font provides several
>variant glyphs for one and the same Unicode character. The user must
>have some way to express preference for one or the other.

i read somewhere that unicode is about scripts -)

you're right; somehow we need to deal with the OpenType language-dependent 
glyphs; pretty nasty

>I think the user should load the appropriate input regime, as he only
>knows the encoding of the input file. For XML files it is different;
>in DocbookInContext I will try to load the appropriate input regime
>automatically from the encoding mentioned in the xml declaration.
>
>Configuring an appropriate font set is difficult. Perhaps font sets
>should be preconfigured, and fonts should be loaded as available. Good
>error messages when no font provides a glyph for a character in the
>text document should alert the user to missing fonts.

Indeed i think that we should have some reasonable defaults, and it seems 
that there are no free complete unicode fonts, so we probably end up with 
something like

<range> => defaultfont

but maybe even with

<subrange> => defaultfont

this needs some research.
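
A minimal sketch of such a range => defaultfont table, in Python and only to 
make the idea concrete: the font names, the chosen ranges and the function 
default_font are assumptions for the example, not an existing ConTeXt 
mechanism.

```python
from bisect import bisect_right
from typing import Optional

# (first, last, font) rows, sorted by first code point and non-overlapping.
# The font names are hypothetical placeholders.
RANGES = [
    (0x0000, 0x024F, "DefaultLatin"),
    (0x0370, 0x03FF, "DefaultGreek"),
    (0x0400, 0x04FF, "DefaultCyrillic"),
    (0x4E00, 0x9FFF, "DefaultCJK"),
]
STARTS = [first for first, _, _ in RANGES]


def default_font(char: str) -> Optional[str]:
    """Return the default font for the range containing char, or None if uncovered."""
    cp = ord(char)
    i = bisect_right(STARTS, cp) - 1  # last row whose first code point is <= cp
    if i >= 0:
        first, last, font = RANGES[i]
        if cp <= last:
            return font
    return None
```

A <subrange> => defaultfont entry would simply be a finer-grained row in the 
same table, obtained by splitting an existing range.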

Thanks for your input.

Hans
-------------------------------------------------------------------------
                                   Hans Hagen | PRAGMA ADE | pragma@wxs.nl
                       Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
  tel: +31 (0)38 477 53 69 | fax: +31 (0)38 477 53 74 | www.pragma-ade.com
-------------------------------------------------------------------------
                        information: http://www.pragma-ade.com/roadmap.pdf
                     documentation: http://www.pragma-ade.com/showcase.pdf
-------------------------------------------------------------------------
