From: "Mojca Miklavec" <mojca.miklavec.lists@gmail.com>
Cc: Jonathan Kew <jonathan_kew@sil.org>,
Philipp Reichmuth <reichmuth@web.de>
Subject: Re: Unicode stuff (was: Re: Specifying BibTeX engine)
Date: Thu, 9 Nov 2006 17:47:31 +0100
Message-ID: <6faad9f00611090847l67df5b49w9a3ad313bad7cf2b@mail.gmail.com>
In-Reply-To: <eiieio$b7s$1@sea.gmane.org>
On 11/4/06, Philipp Reichmuth wrote:
> I've been starting to reuse some of this work in a script to do active
> character assignment for XeTeX depending on what glyphs are present in
> an OpenType font, so that those characters for which the font doesn't
> have a glyph are generated by ConTeXt. Basically I want to produce
> something like this:
>
> \ifnum\XeTeXcharglyph"010D=0
> \catcode`č=\active \def č{\ccaron}
> \else
> \catcode`č=\letter
> \fi % ConTeXt knows this letter -> better hyphenation
>
> \ifnum\XeTeXcharglyph"1E0D=0
> \catcode`ḍ=\active \def ḍ{\b{d}}
> \else
> \catcode`ḍ=\letter
> \fi % ConTeXt doesn't know this letter
There is no reason not to add it.
> (with \other, respectively, for non-letters). Being somewhat of a
> novice to TeX programming, I'm not sure if this will work, though, and
> I'm also not sure if it's better to generate static scripts that do this
> for every font (so the resulting TeX file is a font-specific big list of
> \catcode`$CHARACTERs) or to do this dynamically on every font change,
> maybe limited to selectable Unicode ranges (which is more general but
> also a lot slower).
Generating this for every single font would be stupid. This should be
part of low-level XeTeX (Jonathan has promised to look into it some
time). In my opinion the best way to deal with it would be the ability
to define a fallback definition for every letter missing from a font.
Then, if "ddotbelow" is missing from your font, XeTeX would ask ConTeXt
whether a fallback definition has been provided for that glyph. If so,
it would fall back to it ("\b{d}"); if the glyph is present in the
font, XeTeX would simply use it.
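Until something like that exists in the engine, the per-character test can
be approximated at the macro level. The following is only a minimal sketch
for plain XeTeX, not a proposal for the ConTeXt core: \glyphorfallback is a
hypothetical name, and it relies on the e-TeX primitive \iffontchar, which
asks whether the current font has a glyph for a character code. Plain TeX's
dot-under accent is \d, so the sketch uses \d{d} for ḍ.

```tex
% Plain XeTeX sketch (run with `xetex`); \glyphorfallback is a hypothetical name.
% #1 = the character as typed, #2 = what to typeset if the font lacks it.
\def\glyphorfallback#1#2{%
  \iffontchar\font`#1 % true if the current font has a glyph for #1
    #1%
  \else
    #2%
  \fi}

Testing ... \glyphorfallback{ḍ}{\d{d}}
\bye
```

With the default cmr10 the test fails and the accent macro is used; with an
OpenType font that contains U+1E0D selected, the glyph itself would be
typeset. The check is made each time the macro is used, so it follows font
switches without any per-font tables.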
> > I'd prefer to see a context encoding added to GNU recode for the
> > benefit of future archeologists trying to decipher ancient documents.
>
> That would be better I guess, but isn't ConTeXt encoding a moving target
> in that characters can still get added? Or is the list fixed to AGL
> glyph names and nothing else?
No, it's certainly not fixed to AGL. But I wouldn't object to adding it
to GNU recode (on top of "(La)TeX", which also recognizes \v, \b, ...)
if someone decided to make a good revision of it, if more people
thought it would be useful, and if the developers were open to the
idea. I try to use Unicode when writing sources whenever
possible.
Mojca
PS for Philipp: I didn't try out your definitions, but here is an
excerpt from an older conversation as an example of what certainly
doesn't work under XeTeX ;)
(The answer was written by Jonathan Kew.) I was trying to write a few
macros to support the old tfm-based fonts, but figured out that that
was the wrong starting point (and for a different reason than yours).
> \catcode`ð=\active \defð{^^f0}
> \starttext
> Testing ... ð
> \stoptext
>
> and it seems to enter some infinite loop when ð is encountered (I can
> define any other letter as well, but only ^^f0 is causing problems).
No, this seems to me like it's the wrong way to define the character!
And I think you would have the same problem with other letters if
trying to define them as their own codes; the ones that work for you
must be getting defined as *different* codes from the original input.
The ^^xx notation is converted to a literal character by TeX's input
scanning routine, so it behaves exactly as if it were that character
itself. And ^^f0 in Latin-1 (or Unicode) is the ð character. So this
definition works exactly the same as if you were to say
\catcode`ð=\active \defð{ð}
which is clearly recursive.
Given that you don't need to remap ð in the input to some other
Unicode character for printing, there should be no need for this at
all. The only reason to use a definition like this would be if the
input text used a *different* character where you want to print eth;
or you want to print something *other* than character F0 for the
input ð.
In general, a "safe" form of the definition would be to use \chardef:
\catcode`ð=\active \chardefð="F0
This makes ð into a macro that expands to the character "F0; there is
an important difference between this and ^^f0, which actually
"becomes" the character ð itself as the input is read (and therefore
inherits its catcode, definition, etc).
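To see the two definitions side by side, here is a small sketch for plain
XeTeX. It assumes UTF-8 input and only illustrates the point made above;
the recursive variant is left commented out because it would loop.

```tex
% Plain XeTeX sketch (run with `xetex`), assuming a UTF-8 encoded source file.
\catcode`ð=\active

% Recursive: ^^f0 is turned into the character ð while the line is read,
% so the active ð would be defined in terms of itself.
% \defð{^^f0}

% Safe: ð becomes a name for character "F0 and is never re-read as input.
\chardefð="F0

Testing ... ð
\bye
```

Whether a glyph actually appears depends on the current font having one in
slot "F0; the default cmr10 does not, so a font containing eth would have
to be selected first.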
_______________________________________________
ntg-context mailing list
ntg-context@ntg.nl
http://www.ntg.nl/mailman/listinfo/ntg-context