ntg-context - mailing list for ConTeXt users
From: "Mojca Miklavec" <mojca.miklavec.lists@gmail.com>
Cc: Jonathan Kew <jonathan_kew@sil.org>,
	Philipp Reichmuth <reichmuth@web.de>
Subject: Re: Unicode stuff (was: Re: Specifying BibTeX engine)
Date: Thu, 9 Nov 2006 17:47:31 +0100
Message-ID: <6faad9f00611090847l67df5b49w9a3ad313bad7cf2b@mail.gmail.com>
In-Reply-To: <eiieio$b7s$1@sea.gmane.org>

On 11/4/06, Philipp Reichmuth wrote:
> I've been starting to reuse some of this work in a script to do active
> character assignment for XeTeX depending on what glyphs are present in
> an OpenType font, so that those characters for which the font doesn't
> have a glyph are generated by ConTeXt.  Basically I want to produce
> something like this:
>
> \ifnum\XeTeXcharglyph"010D=0
>      \catcode`č=\active \def č{\ccaron}
> \else
>      \catcode`č=\letter
> \fi % ConTeXt knows this letter -> better hyphenation
>
> \ifnum\XeTeXcharglyph"1E0D=0
>      \catcode`ḍ=\active \def ḍ{\b{d}}
> \else
>      \catcode`ḍ=\letter
> \fi % ConTeXt doesn't know this letter

There is no reason not to add it.

> (with \other, respectively, for non-letters).  Being somewhat of a
> novice to TeX programming, I'm not sure if this will work, though, and
> I'm also not sure if it's better to generate static scripts that do this
> for every font (so the resulting TeX file is a font-specific big list of
> \catcode`$CHARACTERs) or to do this dynamically on every font change,
> maybe limited to selectable Unicode ranges (which is more general but
> also a lot slower).

Generating this for every single font would be stupid. This should be
part of low-level XeTeX (Jonathan has promised to look into it some
time). In my opinion the best way to deal with it would be the ability
to define a fallback definition for every missing letter in a font.
For example, if "ddotbelow" is missing from your font, XeTeX would ask
ConTeXt whether a fallback definition has been provided for that
glyph. If so, it would fall back to it ("\b{d}"); if the glyph is
present in the font, XeTeX would simply use it.
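
Until something like that exists at the engine level, the per-character
test from Philipp's snippet can be wrapped in a macro. The following is
only an untested sketch: \fallbackchar is a hypothetical name (it is not
part of ConTeXt or XeTeX), it assumes a native OpenType font is current
when it is called, and it checks only that one font. Numeric catcodes
are used instead of names such as \letter, and the \lccode/\lowercase
trick is there because the character cannot be made active once it has
already been read as a macro argument.

```tex
% Hypothetical helper, not part of ConTeXt or XeTeX:
%   \fallbackchar{<code point>}{<fallback definition>}
% It tests only the font that is current when it is called.
\def\fallbackchar#1#2{%
  \ifnum\XeTeXcharglyph#1=0 % glyph index 0: the font has no glyph
    \catcode#1=13 % 13 = active
    \begingroup
      \lccode`\~=#1 % map the active ~ onto the target character
      \lowercase{\endgroup\def~}{#2}%
  \else
    \catcode#1=11 % 11 = letter
  \fi}

\fallbackchar{"010D}{\ccaron}
\fallbackchar{"1E0D}{\b{d}}
```

This would have to be repeated after every font switch, which is
exactly why a fallback mechanism inside XeTeX would be preferable.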

> > I'd prefer to see a context encoding added to GNU recode for the
> > benefit of future archeologists trying to decipher ancient documents.
>
> That would be better I guess, but isn't ConTeXt encoding a moving target
> in that characters can still get added?  Or is the list fixed to AGL
> glyph names and nothing else?

No, it's certainly not fixed to AGL. But I wouldn't object to adding it
to GNU recode (on top of "(La)TeX", which also recognizes \v, \b, ...)
if someone decided to make a good revision of it, if more people
thought that it would be useful, and if the developers were open to
that idea. I try to use Unicode when writing sources whenever
possible.

Mojca


PS for Philipp: I didn't try out your definitions, but here is an
excerpt from an older conversation as an example of what certainly
doesn't work under XeTeX ;)
(The answer was written by Jonathan Kew.) I was trying to write a few
macros to support the old tfm-based fonts, but figured out that that
was the wrong starting point (and for a different reason than yours).

> \catcode`ð=\active \defð{^^f0}
> \starttext
> Testing ... ð
> \stoptext
>
> and it seems to enter some infinite loop when ð is encountered (I can
> define any other letter as well, but only ^^f0 is causing problems).

No, this seems to me like it's the wrong way to define the character!
And I think you would have the same problem with other letters if
trying to define them as their own codes; the ones that work for you
must be getting defined as *different* codes from the original input.

The ^^xx notation is converted to a literal character by TeX's input
scanning routine, so it behaves exactly as if it were that character
itself. And ^^f0 in Latin-1 (or Unicode) is the ð character. So this
definition works exactly the same as if you were to say

  \catcode`ð=\active \defð{ð}

which is clearly recursive.

Given that you don't need to remap ð in the input to some other
Unicode character for printing, there should be no need for this at
all. The only reason to use a definition like this would be if the
input text used a *different* character where you want to print eth;
or you want to print something *other* than character F0 for the
input ð.

In general, a "safe" form of the definition would be to use \chardef:

  \catcode`ð=\active \chardefð="F0

This makes ð into a \chardef token that stands for the character "F0;
there is an important difference between this and ^^f0, which actually
"becomes" the character ð itself as the input is read (and therefore
inherits its catcode, definition, etc).
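
Applied to the failing test file quoted above, the \chardef form would
look roughly like this. It is an untested sketch that only combines the
two snippets already shown and assumes the same XeTeX setup as the
original report:

```tex
\catcode`ð=\active
\chardefð="F0 % ð now stands for character "F0 and no longer calls itself
\starttext
Testing ... ð
\stoptext
```

As noted above, when ð does not need to be remapped at all, the
simplest fix is to leave its catcode alone and drop both lines.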

