From: Hans Hagen <pragma@wxs.nl>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: Greek in luatex
Date: Thu, 13 Sep 2007 19:51:43 +0200
Message-ID: <46E978AF.7030606@wxs.nl>
In-Reply-To: <0F55D1DC-0E34-437A-8383-1D40A28E8375@uni-bonn.de>
Thomas A. Schmitz wrote:
>> For your general problem you need to define a new regime that will
>> map each relevant character sequence to the corresponding Unicode
>> character. That is, you inform ConTeXt that the character stream
>> it sees is actually a way of coding another set of characters and
>> that it can forget the original stream. This treatment should be
>> done before any sort of font property intervenes, because it does
>> not depend on the appearance of the typeset text. That's what
>> regimes are for.
regimes are a solution, but what solution is best depends on the input
stream ... whole document? partial document? also written to external
files? eventually everything can become unicode (private areas) and
as such travel through the system; or we can misuse virtual fonts ...
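
A minimal sketch of the regime idea discussed above, in Python rather than in ConTeXt's own code: the input stream is rewritten to Unicode by greedy longest-match lookup before anything font-related happens. The table `BABEL_TO_UNICODE` and the function `apply_regime` are hypothetical names, and the table holds only a few entries, assuming the usual Babel convention in which `>` marks psili and `<` marks dasia.

```python
# Hypothetical Babel-style table: ASCII sequences -> Unicode characters.
# Only a few entries are listed; a real table would be much larger.
BABEL_TO_UNICODE = {
    "a": "\u03B1",   # GREEK SMALL LETTER ALPHA
    "w": "\u03C9",   # GREEK SMALL LETTER OMEGA
    ">a": "\u1F00",  # GREEK SMALL LETTER ALPHA WITH PSILI
    "<a": "\u1F01",  # GREEK SMALL LETTER ALPHA WITH DASIA
}


def apply_regime(text, table=BABEL_TO_UNICODE):
    """Translate `text` by greedy longest-match lookup in `table`."""
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so ">a" wins over ">" followed by "a".
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            out.append(text[i])  # unmapped characters pass through unchanged
            i += 1
    return "".join(out)
```

After this step the rest of the system sees only Unicode characters, so the original ASCII sequence no longer matters. Whether this runs on a whole document, part of one, or on text written to external files is the open question raised above.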
>> we could plug into the input stream reading routine (just like other
>> regimes work).
there are mechanisms for that (because that's what i played a lot with
last year); there was (maybe even is) a mechanism for chained processing
of input etc
>> actually tell ConTeXt that you are handling Latin characters with a
>> special appearance (that the font takes care of), so for example, the
>> underlying text in a PDF would be a stream of Latin characters, and
>> copying-and-pasting would yield Latin characters, not Greek.
not entirely true ... we can (and do) intercept the node stream ... ok,
at that point we're dealing with a font/char pair, but we can change the
char (or node) to whatever we like ... depends on the problem
> The question of copy-and-paste is one of the big mysteries, and I
> have no clue why it works in some cases, but not in others. Right
> now, on my system (OS X 10.4), only Adobe Reader 8.0 does copy-paste
> correctly, and it does it correctly no matter if I use babel or
> Unicode input. Never touch a running system: I just take this as
> some sort of divine favor and leave it at that...
that's a matter of associating tounicode points; of course, no unicode
means no copy/paste :-)
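
A toy model of why the tounicode association matters for copy/paste, assuming a hypothetical legacy font whose slot 0x61 draws an alpha and slot 0x77 an omega. The names `TOUNICODE` and `extract_text` are invented for this illustration; real PDF viewers differ in what they fall back on when no mapping is present.

```python
# Hypothetical legacy font: slot 0x61 draws an alpha, slot 0x77 an omega.
TOUNICODE = {0x61: "\u03B1", 0x77: "\u03C9"}


def extract_text(slots, tounicode=None):
    """Mimic copy/paste: turn a sequence of glyph slots into text.

    Without a table, this toy viewer falls back on the raw slot number,
    which yields Latin letters for a font laid out like the one above.
    """
    tounicode = tounicode or {}
    return "".join(tounicode.get(slot, chr(slot)) for slot in slots)
```

With the table, `extract_text([0x61, 0x77], TOUNICODE)` yields Greek letters; without it, the same slots come back as the Latin "aw".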
>> That is not what you want here: you want your "a" to be understood
>> as "alpha" and your "less-than acute-sign w vertical-bar" to be
>> considered an "omega with dasia, oxia and subscribed iota". Nor
>> should you think of these transformations as a collection of
>> ligatures (which act at the font level), but rather as a text
>> encoding, just like UTF-8 is an encoding of the Unicode characters:
>> in UTF-8 the byte sequence "hexadecimal byte E1, hexadecimal byte BC,
>> hexadecimal byte 80" is the coding for the Unicode character U+1F00
>> GREEK SMALL LETTER ALPHA WITH PSILI, and in the Babel input scheme
>> for Ancient Greek the same character is encoded with the byte
>> sequence "hexadecimal byte 3E [ASCII '>'], hexadecimal byte 61
>> [ASCII 'a']".
>
> Yes, that's crystal clear. It would also take care of another
> problem: in the input stream, you know exactly which character
> sequence translates to what. On the font level, legacy fonts
> sometimes have their own ideas about where to put certain glyphs.
depends ... the input char becomes a node; now, if (probably controlled
by attributes) a certain char is seen (say 'a') and you want it to be an
alpha, well, we can then change that char in the node.
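
The node-level alternative can be sketched as follows. This is a Python stand-in, not LuaTeX's node API: the `Glyph` class, its `greek` flag (standing in for a node attribute) and the `REMAP` table are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Glyph:
    """Stand-in for a glyph node: a font/char pair plus one attribute."""
    font: int
    char: int
    greek: bool = False  # stands in for a node attribute


# Hypothetical remapping, applied only where the attribute is set.
REMAP = {ord("a"): 0x03B1, ord("w"): 0x03C9}


def remap_nodes(nodes, remap=REMAP):
    """Return a new node list with chars replaced on flagged nodes only."""
    return [
        replace(node, char=remap.get(node.char, node.char)) if node.greek else node
        for node in nodes
    ]
```

An 'a' flagged as Greek becomes U+03B1 while an unflagged 'a' in the same list is left alone, which is the attribute-controlled behaviour described above.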
>> Of course in the past, these transformations were handled at the
>> font level and sequences like "< a" were actually ligatures, because
>> that was all we had (and copypasting from a PDF was, mostly, doomed
>> to fail); but we should not persist in that use now we can treat
>> them as real Unicode characters.
those hard coded mechanisms were indeed not sufficient
Hans
-----------------------------------------------------------------
Hans Hagen | PRAGMA ADE
Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
| www.pragma-pod.nl
-----------------------------------------------------------------