ntg-context - mailing list for ConTeXt users
From: Simon Pepping <spepping@scaprea.hobby.nl>
Subject: Re: utf 8 / test file
Date: Sun, 8 Dec 2002 21:38:34 +0100
Message-ID: <20021208203834.GA642@scaprea>
In-Reply-To: <5.1.0.14.1.20021207123223.0254e040@server-1>

Hans,

I have looked at how Emacs and the Unicode browser deal with Unicode and
fonts. The Unicode browser is an application on the CD-ROM that comes with
the Unicode 3.0 book. Both use font sets, i.e., collections of
fonts that are put together so as to cover a large part of the Unicode
range. The Unicode browser scans the fonts in the order listed in its
configuration file. When it finds a font that provides the sought
character, it uses the glyph from that font. The configuration can be
refined: one can indicate that a font contributes only a certain
range, or one can exclude a range from a font. I believe this is
a strategy that could be used by other applications.
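
A minimal sketch of such a font-set scan, in Python. It is not how Emacs
or the Unicode browser are implemented; the names FontEntry, only, exclude
and find_font, and the font names and ranges, are hypothetical and chosen
only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontEntry:
    """One font in a font set (hypothetical structure)."""
    name: str
    coverage: frozenset          # code points the font has glyphs for
    only: tuple = ()             # if non-empty, contribute only these ranges
    exclude: tuple = ()          # ranges never taken from this font

    def provides(self, cp):
        if cp not in self.coverage:
            return False
        if self.only and not any(cp in r for r in self.only):
            return False
        return not any(cp in r for r in self.exclude)


def find_font(font_set, cp):
    """Scan the fonts in configuration order; the first provider wins."""
    for entry in font_set:
        if entry.provides(cp):
            return entry.name
    return None


GREEK = range(0x370, 0x400)
font_set = [
    # Covers Latin and Greek, but Greek is excluded by configuration.
    FontEntry("serif", frozenset(range(0x20, 0x400)), exclude=(GREEK,)),
    # Also has ASCII glyphs, but is configured to contribute Greek only.
    FontEntry("greek", frozenset(range(0x20, 0x7F)) | frozenset(GREEK),
              only=(GREEK,)),
]
```

With this configuration find_font(font_set, ord("A")) selects "serif" and
find_font(font_set, 0x3B1) selects "greek"; a character that no font
covers yields None.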

For ConTeXt this might be worked out as follows: Each font family must
be in a known encoding. When a font family is loaded, the encoding and
the associated font family are added to a table of loaded
encodings. When a Unicode character is sought, the loaded encodings
are scanned in the order in which they appear in the table, until an
encoding is found that provides a glyph for that character.
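
The table of loaded encodings could be sketched like this, again in
Python and only as an illustration of the idea, not as ConTeXt code. The
class EncodingTable, the family and encoding names, and the slot numbers
are all made up.

```python
class EncodingTable:
    """Ordered table of (font family, encoding) pairs (hypothetical)."""

    def __init__(self):
        self._entries = []

    def load(self, family, encoding, char_to_slot):
        # char_to_slot maps a Unicode code point to a glyph slot
        # in the given encoding.
        self._entries.append((family, encoding, dict(char_to_slot)))

    def lookup(self, cp):
        # Scan in loading order; the first encoding with a glyph wins.
        for family, encoding, char_to_slot in self._entries:
            if cp in char_to_slot:
                return family, encoding, char_to_slot[cp]
        return None


table = EncodingTable()
# Illustrative slot assignments, not taken from any real encoding file.
table.load("roman-family", "enc-latin", {0x41: 0x41, 0xE9: 0xE9})
table.load("greek-family", "enc-greek", {0x41: 0x41, 0x3B1: 0x61})
```

Here table.lookup(0x3B1) falls through to the second entry, while
table.lookup(0x41) is answered by the family that was loaded first.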

Two loaded font families may overlap in the range they cover. In that
case the glyphs in the overlap area are taken from the font loaded
first. This behaviour can be changed by configuring a font
to contribute only a certain range of characters, or by excluding a
certain range of characters from a font. This is a refinement that
might be added later on.

The NFSS in LaTeX provides a default encoding for a character (not to
be confused with ConTeXt's default encoding, which is a different
thing). When the character is not found in the current encoding, it is
taken from this default encoding. Such a strategy may be more
efficient than going through the list of loaded encodings.
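
A rough sketch of this default-encoding lookup, following the description
above rather than LaTeX's actual implementation. The function resolve and
the encoding names and contents are hypothetical.

```python
def resolve(cp, current, encodings, default_encoding_of):
    """Return the name of the encoding to take code point cp from.

    encodings maps an encoding name to the set of code points it covers;
    default_encoding_of maps a code point to its declared default encoding.
    """
    if cp in encodings[current]:
        return current
    # One direct lookup instead of a scan over all loaded encodings.
    fallback = default_encoding_of.get(cp)
    if fallback is not None and cp in encodings[fallback]:
        return fallback
    return None


encodings = {
    "latin": {0x41, 0xE9},
    "greek": {0x3B1},
    "symbols": {0x20AC},
}
default_encoding_of = {0x20AC: "symbols"}
```

The cost is that a default must be declared for every character that can
fall outside the current encoding; in this example U+03B1 has no declared
default and is therefore not found from "latin".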

The above strategy may be efficient for a text that mainly consists of
ASCII characters. For a text that mainly consists of non-ASCII
characters, e.g. a Chinese text, it requires much processing. Such a
situation may be dealt with in the same way as encodings: When you are
writing in a West European language, it is more efficient to use
Latin-1 than UTF-8. Similarly, when one is writing in Chinese, a more
efficient setup with a more limited coverage of characters may be used.

I prefer to use font families rather than fonts. This makes it easy to
switch from one font family to another, while keeping constant the
other font parameters such as shape and weight. I like the way this is
done in LaTeX's NFSS. I do not (yet) know much about the way ConTeXt
organizes its fonts.

One should be aware of the difference between character and
glyph. Unicode is about characters, typesetters like TeX are about
glyphs. It is quite possible that one font provides several
variant glyphs for one and the same Unicode character. The user must
have some way to express a preference for one or the other.

I think the user should load the appropriate input regime, since only
the user knows the encoding of the input file. For XML files it is
different; in DocbookInContext I will try to load the appropriate input
regime automatically from the encoding mentioned in the XML declaration.
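
Reading the encoding from the XML declaration could look roughly like
this in Python. This is a sketch, not the DocbookInContext code: it only
handles ASCII-compatible encodings, and the table REGIME_FOR with its
regime names is an assumption that would have to be checked against the
regimes ConTeXt actually defines.

```python
import re

# Matches encoding="..." inside an XML declaration at the start of a file.
XML_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)

# Hypothetical mapping from declared encoding to input regime name.
REGIME_FOR = {"utf-8": "utf", "iso-8859-1": "latin1"}


def declared_encoding(head):
    """Return the lower-cased encoding named in the XML declaration.

    head is the first bytes of the file. Without an encoding declaration
    UTF-8 is assumed, which is the XML default when no byte order mark
    indicates UTF-16.
    """
    if head.startswith(b"\xef\xbb\xbf"):   # skip a UTF-8 byte order mark
        head = head[3:]
    match = XML_DECL.match(head)
    if match is None:
        return "utf-8"
    return match.group(2).decode("ascii").lower()


def regime_for(head):
    """Return the regime name for the file, or None if it is unknown."""
    return REGIME_FOR.get(declared_encoding(head))
```

An encoding that is not in the table gives None, so that the user can be
asked to load a regime by hand.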

Configuring an appropriate font set is difficult. Perhaps font sets
should be preconfigured, and fonts should be loaded as available. Good
error messages when no font provides a glyph for a character in the
text document should alert the user to missing fonts.
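
As an illustration of the kind of error message meant here, a small
Python sketch. The function missing_glyph_report and the wording of the
message are made up; covered stands for the set of code points that the
loaded fonts provide.

```python
import unicodedata


def missing_glyph_report(text, covered):
    """Return one message per distinct character without a glyph."""
    missing = sorted(
        {ch for ch in text if not ch.isspace() and ord(ch) not in covered}
    )
    return [
        "no loaded font provides U+%04X %s"
        % (ord(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in missing
    ]
```

Naming the character, and not only its number, makes it easier for the
user to see which script, and hence which font, is missing.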

These are my thoughts.

Simon

On Sat, Dec 07, 2002 at 12:38:46PM +0100, Hans Hagen wrote:
> Hi,
> 
> I posted
> 
>   http://www.pragma-ade.com/temp/titus.pdf
> 
> now, one thing with unicode (utf) is that support needs to have an 
> associated font / language switch.
> 
> Traditionally, tex font mechanisms have been complicated by the fact that 
> there are many shapes per font and math has to be dealt with.
> 
> If we're dealing with say sanskrit, is it then safe to assume that
> 
> (1) we can switch to the language (if not yet done) when we encounter a 
> unicode from the associated char/glyph range
> 
> (2) can we assume that a relatively simple font mechanism is used 
> (normal,bold,slanted)
> 
> (3) can we assume that only a few (possibly derived from unicode) fonts are 
> used, or at least one main type of font per language
> 
> (4) can we standardize on utf-8 [and assume some preprocessor if not]
> 
> [let's try to deal with the practical, so what's the practical usage]
> 
> Hans
> -------------------------------------------------------------------------
>                                   Hans Hagen | PRAGMA ADE | pragma@wxs.nl
>                       Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
>  tel: +31 (0)38 477 53 69 | fax: +31 (0)38 477 53 74 | www.pragma-ade.com
> -------------------------------------------------------------------------
>                        information: http://www.pragma-ade.com/roadmap.pdf
>                     documentation: http://www.pragma-ade.com/showcase.pdf
> -------------------------------------------------------------------------
> 
> _______________________________________________
> ntg-context mailing list
> ntg-context@ntg.nl
> http://www.ntg.nl/mailman/listinfo/ntg-context

-- 
Simon Pepping
email: spepping@scaprea.hobby.nl
