ntg-context - mailing list for ConTeXt users
From: Hans Hagen <pragma@wxs.nl>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: Greek in luatex
Date: Thu, 13 Sep 2007 19:51:43 +0200
Message-ID: <46E978AF.7030606@wxs.nl>
In-Reply-To: <0F55D1DC-0E34-437A-8383-1D40A28E8375@uni-bonn.de>

Thomas A. Schmitz wrote:

>>   For your general problem you need to define a new regime that will
>> map each relevant character sequence to the corresponding Unicode
>> character.  That is, you inform ConTeXt that the character stream it
>> sees is actually a way of coding another set of characters and that it
>> can forget the original stream.  This treatment should be done before
>> any sort of font property intervenes, because it does not depend on the
>> appearance of the typeset text.  That's what regimes are for.

regimes are a solution, but which solution is best depends on the input 
stream ... whole document? partial document? also written to external 
files? eventually everything can become unicode (private areas) and 
as such travel through the system; or we can misuse virtual fonts ...
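
to make the regime idea concrete, here is a minimal sketch in python (for 
illustration only; the table is a small assumed subset of a babel-style 
transliteration and this is not how context implements regimes). it maps 
input sequences to unicode characters, longest match first, before any 
font gets involved:

```python
# Illustrative subset of a Babel-style transliteration table.
# Assumption: these entries are for demonstration, not ConTeXt's regime data.
TABLE = {
    "a": "\u03b1",   # GREEK SMALL LETTER ALPHA
    "w": "\u03c9",   # GREEK SMALL LETTER OMEGA
    ">a": "\u1f00",  # GREEK SMALL LETTER ALPHA WITH PSILI
    "<a": "\u1f01",  # GREEK SMALL LETTER ALPHA WITH DASIA
}
LONGEST = max(len(key) for key in TABLE)


def apply_regime(text):
    """Replace each known input sequence by its Unicode character."""
    out, i = [], 0
    while i < len(text):
        # Try the longest candidate first so ">a" wins over plain "a".
        for size in range(LONGEST, 0, -1):
            chunk = text[i:i + size]
            if chunk in TABLE:
                out.append(TABLE[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # unknown input passes through unchanged
            i += 1
    return "".join(out)
```

after this step the rest of the system only sees unicode, so the original 
ascii stream can be forgotten.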

>> we could plug into the input stream reading routine (just like other
>> regimes work).

there are mechanisms for that (because that's what i played a lot with 
last year); there was (maybe even is) a mechanism for chained processing 
of input etc
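
as a rough picture of what chained processing of input means, a sketch in 
python (the filter names are hypothetical, and this is not the actual 
context mechanism): each input line passes through a list of filters, in 
order:

```python
from functools import reduce


def chain(*filters):
    """Return one filter that runs the given filters from left to right."""
    return lambda line: reduce(lambda text, f: f(text), filters, line)


# Two hypothetical filters, for illustration only.
def strip_bom(line):
    """Drop a leading byte order mark, if any."""
    return line.lstrip("\ufeff")


def transliterate(line):
    """Map ">a" before plain "a", so the longer sequence is not broken up."""
    return line.replace(">a", "\u1f00").replace("a", "\u03b1")


process = chain(strip_bom, transliterate)
```

the order in the chain matters: a filter only sees what the previous one 
left behind.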

>> actually tell ConTeXt that you are handling Latin characters with a
>> special appearance (that the font takes care of), so for example, the
>> underlying text in a PDF would be a stream of Latin characters, and
>> copying-and-pasting would yield Latin characters, not Greek.

not entirely true ... we can (and do) intercept the node stream ... ok, 
at that point we're dealing with a font/char pair, but we can change the 
char (or node) to whatever we like ... depends on the problem

> The question of copy-and-paste is one of the big mysteries, and I
> have no clue why it works in some cases, but not in others. Right
> now, on my system (OS X 10.4), only Adobe Reader 8.0 does copy-paste
> correctly, and it does it correctly no matter if I use babel or
> Unicode input. Never touch a running system: I just take this as
> some sort of divine favor and leave it at that...

that's a matter of associating tounicode points; of course, no unicode 
means no copy/paste :-)
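
for illustration, a small python sketch of such an association: it formats 
font slot -> code point pairs as bfchar entries, the form used inside a pdf 
ToUnicode CMap (assumptions: single-byte font slots, and the slot 
assignment in the example is made up):

```python
def bfchar_lines(tounicode):
    """Format slot -> code point pairs as bfchar entries of a ToUnicode CMap.

    Assumes single-byte font slots; targets are written as UTF-16BE hex.
    A real CMap also limits each bfchar section to 100 entries, which this
    sketch does not handle.
    """
    lines = [f"{len(tounicode)} beginbfchar"]
    for slot, codepoint in sorted(tounicode.items()):
        target = chr(codepoint).encode("utf-16-be").hex().upper()
        lines.append(f"<{slot:02X}> <{target}>")
    lines.append("endbfchar")
    return lines


# Hypothetical legacy font that shows an alpha in the slot of ASCII 'a'.
example = bfchar_lines({0x61: 0x03B1})
```

a slot without such an entry gives the viewer nothing to copy.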

>> That is not what you want here: you want your "a" to be understood as
>> "alpha" and your "less-than acute-sign w vertical-bar" to be considered
>> an "omega with dasia, varia and subscribed iota".  Nor should you think
>> of these transformations as a collection of ligatures (which act at the
>> font level), but rather as a text encoding, just like UTF-8 is an
>> encoding of the Unicode characters: in UTF-8 the byte sequence
>> "hexadecimal byte E1, hexadecimal byte BC, hexadecimal byte 80" is the
>> coding for the Unicode character U+1F00 GREEK SMALL LETTER ALPHA WITH PSILI,
>> and in the Babel input scheme for Ancient Greek the same character is
>> encoded with the byte sequence "hexadecimal byte 3E [ASCII '>'],
>> hexadecimal byte 61 [ASCII 'a']".
> 
> Yes, that's crystal clear. It would also take care of another  
> problem: in the input stream, you know exactly which character  
> sequence translates to what. On the font level, legacy fonts  
> sometimes have their own ideas about where to put certain glyphs.

depends ... the input char becomes a node; now, if (probably controlled 
by attributes) a certain char is seen (say 'a') and you want it to be an 
alpha, well, we can change that char then in the node
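
a sketch of that idea in python (a stand-in for illustration only: the 
Glyph class and the greek flag are hypothetical and merely mimic a glyph 
node carrying an attribute; this is not the luatex node api):

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    font: int
    char: int
    greek: bool = False  # stands in for an attribute set on the node


# Illustrative remapping: Latin 'a' becomes GREEK SMALL LETTER ALPHA.
REMAP = {ord("a"): 0x03B1}


def remap_nodes(nodes):
    """Change the char of flagged glyphs in place; other nodes stay as is."""
    for node in nodes:
        if node.greek and node.char in REMAP:
            node.char = REMAP[node.char]
    return nodes


nodes = remap_nodes([Glyph(1, ord("a"), greek=True), Glyph(1, ord("a"))])
```

only the flagged glyph changes; the same 'a' outside the greek stretch is 
left alone.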

>>   Of course in the past, these transformations were handled at the font
>> level and sequences like "< a" were actually ligatures, because that was
>> all we had (and copypasting from a PDF was, mostly, doomed to fail); but
>> we should not persist in that use now we can treat them as real Unicode
>> characters.

those hard coded mechanisms were indeed not sufficient

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
      tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________


