ntg-context - mailing list for ConTeXt users
From: "Adam Lindsay" <atl@comp.lancs.ac.uk>
Subject: Re: unicode and out-of-box usability
Date: Mon, 5 Jan 2004 13:43:49 +0000
Message-ID: <20040105134349.11800@smtp.btinternet.com>
In-Reply-To: <6.0.1.1.2.20040103232206.01e51ec0@localhost>

Hi Hans. Thanks for the reply.

Hans Hagen said this at Sat, 3 Jan 2004 23:38:02 +0100:

>>In unic-ini:
>>\chardef\utfunihashmode=0 % 1 = enabled
>>
>>Actually, if I understand things correctly, '1' means "disabled", which
>>is what I preferred, having not yet created any unicode vectors. So the
>>internal documentation there seems wrong, and I would argue the default
>>case (0) makes it harder for beginners.
>
>hm, did you look at the unic-001 etc files? the trick is in fast and
>efficient expansion without the need to define lots of named glyphs

I looked at them, but perhaps I didn't "get" it. What I saw was precisely
lots and lots of named glyphs in these hashes. I was setting some greek
and cyrillic text, so was dealing with glyphs that would be described in
a non-existent unic-003, unic-004 and other vectors.

Basically, with utfunihashmode set to zero, I saw lots and lots of black
rectangles, even though I had correctly defined and installed the unicode
fonts. When I disabled it by setting it to one, it worked. That seems
confusing to a beginner for two reasons: the internal documentation says
the opposite, and with the default setting the fonts are not consulted
when a character is missing from the hash.
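
For anyone following along, here is a minimal sketch of the workaround
described above. It assumes a 2004-era ConTeXt with UTF-8 input switched on
via \enableregime[utf] and unicode fonts already defined and installed; the
regime line and the sample text are my assumptions, and only the \chardef
line comes from unic-ini itself.

```tex
% Sketch only: assumes a 2004-era ConTeXt with unicode fonts already set up.
% \enableregime[utf] and the sample text are assumptions for illustration.
\enableregime[utf]

% 0 is the shipped default. 1 is the value that worked for me while no
% unic-003 / unic-004 vectors exist (the comment in unic-ini says otherwise).
\chardef\utfunihashmode=1

\starttext
Ελληνικά, кириллица
\stoptext
```

With the value left at 0, the same test file is where I got the black
rectangles.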

>>More confusingly, in font-uni:
>
>forget about that one, although it's called unicode, it's actually a
>mechanism for the many vectors derived from unicode / related to unicode
>but not entirely i.e. cjk fonts

As you know, I'm somewhat unhealthily obsessed with fonts, and the
\defineunicodefont mechanism seemed to be the only one that 1) allowed
multiple unicode fonts to be defined, and 2) made an effort to
synchronise with document styles. This becomes especially sensitive with
typesetting things like Vietnamese, where you're sliding between named
glyphs that fall into a "normal" encoding vector and glyphs that are in
unicode space. 

I might be wrong on those points, but it seemed as though the UTF-8
mechanism assumed only one font. Japanese can bold things for emphasis,
Greek can be in italic, and there are both Sans and Serif fonts for very
many of the world's languages. On a first pass, the defineunicodefont
mechanism (aside from the \useregime[unicode]) worked fairly well for
those purposes.

>>\def\unicodeasciicharacter{\uchar{0}}
>>
>>(I'm not certain the above is release-quality code, but I've been testing
>>it with a stripped down \utfunifontglyph that should be functionally
>>equivalent.)
>
>play with it and we'll see

I've been playing with it. It works for me. Wouldn't publicly suggest it
otherwise. :)

>>Working with the unicode code makes me appreciate that it's a really
>>powerful part of ConTeXt. Thanks, Hans!
>
>how about the following:
>
>there are many font encodings around but none is really complete enough to 
>deal with basic unicode (0/1/2 range)
>
>why not define a new font encoding with characters only so that we can have 
>as many chars as needed in a 0-255 vector, all those
>special characters (registered, and so) are (1) used seldom, 

Okay, I think I get what you're saying, a hyper-dense letter-only
encoding. One of the reasons I got into your support for Unicode was that
"seldom"-used is a very relative term that requires a prioritisation of
languages, and a vector of 256 glyphs just wasn't going to cut it. I
got tired of hand-rolling encodings, and wanted to use someone else's
hard work for a while. :)

Another reason is that Eddie Kohler's OpenType tools use /.notdefs in an
encoding as a place for extended glyphs like swashes and ligatures. I'm
rather fond of those features, and so such a dense encoding is not that
useful to me.

>(2) not related to hyphenation and kerning; it is also a way to get
>rid of some 'ligatures' like --- becoming an emdash (in context and xml we
>can comfortably directly call symbols, and these may
>come from a different instance of the font)

So you would make all punctuation active, essentially? Interesting...

Here's another question for folks working in non-latin languages: how
does hyphenation work?
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Adam T. Lindsay                      atl@comp.lancs.ac.uk
 Computing Dept, Lancaster University   +44(0)1524/594.537
 Lancaster, LA1 4YR, UK             Fax:+44(0)1524/593.608
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
