ntg-context - mailing list for ConTeXt users
From: Arthur Reutenauer <arthur.reutenauer@normalesup.org>
To: Mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: active strings in luatex?
Date: Sun, 13 Jan 2008 04:31:42 +0100
Message-ID: <20080113033142.GB12550@phare.normalesup.org>
In-Reply-To: <op.t315m8m9nx1yh1@your-b27fb1c401>

[-- Attachment #1: Type: text/plain, Size: 2942 bytes --]

	Hello Idris,

  I didn't see any reply to this e-mail you sent two weeks ago, so I
wanted to give it a try:

>          In luatex can I make a definition such that the string
> 
> U004C U0303 (l ̃)
> 
> is always treated as l with tilde above, taking into account italics and  
> without using \~l (which does not work in, eg, footnote)?

  What you want here is to support the Unicode combining characters,
which isn't straightforward in TeX because according to the Standard,
they come after the base letter they modify, while TeX's accent commands
are, of course, typed before.  So you can't simply make the combining
characters active and equivalent to the appropriate accent macros.
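
  To make the ordering concrete, here is a minimal sketch in Python (used
only for illustration; it is not part of the attached file).  It decomposes
the precomposed letter U+00F1 with the standard unicodedata module and
lists the resulting code points in storage order: the base letter comes
first and the combining tilde second, the reverse of what \~n expects.

```python
import unicodedata

# NFD splits a precomposed letter into base letter + combining mark(s).
decomposed = unicodedata.normalize("NFD", "\u00f1")  # LATIN SMALL LETTER N WITH TILDE

# List the code points in storage order with their Unicode names.
names = [unicodedata.name(ch) for ch in decomposed]
for ch, name in zip(decomposed, names):
    print(f"U+{ord(ch):04X} {name}")

# A non-zero canonical combining class marks U+0303 as a combining character.
tilde_class = unicodedata.combining("\u0303")
print(tilde_class)
```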

  In traditional TeX, it would have been tempting to make the base letter
active instead, but this has a lot of drawbacks, and LuaTeX offers many
other possibilities.  Here I've used a set of macros that Taco had
written a couple of months ago in response to a question by Thomas
Schmitz (see http://www.ntg.nl/pipermail/ntg-context/2007/027095.html).
The attached file implements the transformation of the sequence <LATIN
SMALL LETTER L, COMBINING TILDE> into "\buildtextaccent\texttilde l",
which I hope gives the expected result in every circumstance.  I've done
it only for the small letter, but of course it's easy to adapt to add
the capital letter as well.
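
  The scanning logic can be sketched outside TeX as well.  The Python
function below is only an illustration under stated assumptions: it works
on a plain string rather than on LuaTeX tokens, and the strings
"\buildtextaccent" and "\texttilde" are stand-ins for the control sequence
tokens, not real TeX tokens.  It holds each 'l' back until the next
character is known, and also flushes an 'l' that ends the input.

```python
TILDE = "\u0303"  # COMBINING TILDE

def convert_combining(text):
    """Replace each 'l' + COMBINING TILDE pair with three stand-in items."""
    out = []
    held_l = False  # True while an 'l' is waiting to see what follows it
    for ch in text:
        if held_l and ch == TILDE:
            # Stand-ins for the tokens \buildtextaccent \texttilde l
            out += ["\\buildtextaccent", "\\texttilde", "l"]
            held_l = False
            continue
        if held_l:
            out.append("l")  # the held 'l' was an ordinary letter
        held_l = (ch == "l")
        if not held_l:
            out.append(ch)
    if held_l:
        out.append("l")  # flush an 'l' at the very end of the input
    return out

print(convert_combining("Hell\u0303o"))
```

  Adding the capital letter would mean holding back 'L' in the same way and
emitting it in place of the final "l" item.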

  Finally, I wish to clarify a small misunderstanding: you quoted the
two lines below:

	LATIN CAPITAL LETTER L WITH TILDE;004C 0303
	LATIN SMALL LETTER L WITH TILDE;006C 0303

with the comment "The proposal is still under consideration for
Lithuanian and not yet in Unicode".  Actually it is already encoded in
Unicode; that is, all the characters you need are present with the
appropriate semantics, and you can accurately represent a small l with
tilde in Unicode; only, you have to use two characters (U+006C followed
by U+0303).  The only thing that will be added to Unicode in that
respect is the *name* of those strings (I guess you took those two lines
from the data files for Unicode version 5.1.0, in beta stage).  The
corresponding characters, though, will not be added to Unicode,
according to a decision which was made several years ago (I could
trace it back to a discussion at the Unicode Technical Committee in
October 1999, but I don't know the details).  The idea is that it can
already be represented as a sequence of characters, and the Unicode
Consortium does not wish to make the set of alphabetic characters
explode with diacritics.
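
  One way to observe this is through normalization.  The following Python
sketch (an illustration only, using the standard unicodedata module)
applies NFC to two sequences: 'e' plus combining tilde composes into the
single character U+1EBD, whereas 'l' plus combining tilde stays as two
code points because no precomposed character exists for it.

```python
import unicodedata

# NFC composes a sequence into a precomposed character when one exists.
l_tilde = unicodedata.normalize("NFC", "l\u0303")
e_tilde = unicodedata.normalize("NFC", "e\u0303")

print(len(l_tilde))  # 2: still U+006C followed by U+0303
print(len(e_tilde))  # 1: composed into U+1EBD
```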

  In spite of this, Unicode still wishes to acknowledge that some
unencoded accented letters are important in some languages, and provides
names for the character sequences representing them, as it does for
all the encoded characters.  The relevant document that explains this is
Unicode Standard Annex #34 (http://www.unicode.org/reports/tr34/).
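
  Some software exposes these names directly.  As a hedged illustration:
Python's unicodedata.lookup accepts named sequences since Python 3.3, so
the sketch below asks it for the name quoted above.  Whether that name is
known depends on the Unicode data shipped with the interpreter (an
assumption here, since the lines quoted above come from a beta data file),
so the helper lookup_sequence, which is a name made up for this example,
returns None instead of failing.

```python
import unicodedata

def lookup_sequence(name):
    """Return the string for a Unicode name or named sequence, or None if unknown."""
    try:
        return unicodedata.lookup(name)
    except KeyError:
        return None

seq = lookup_sequence("LATIN SMALL LETTER L WITH TILDE")
if seq is None:
    print("name not known to this Unicode database")
else:
    # Print the code points the name stands for.
    print(" ".join(f"U+{ord(ch):04X}" for ch in seq))
```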

	Arthur

[-- Attachment #2: combining_tilde.tex --]
[-- Type: text/x-tex, Size: 2715 bytes --]

% engine=luatex

% Macros to handle a particular combining sequence of Unicode characters
% in ConTeXt Mark IV by modifying the token list.
% © A. Reutenauer, January 2008.
% This file is distributed under the terms of the WTF Public License
% (http://sam.zoy.org/wtfpl/)

\usetypescript[iwona]
\setupbodyfont[iwona, 14pt]

% Convert the sequence <U+006C LATIN SMALL LETTER L, U+0303 COMBINING TILDE>
% to an appropriate ConTeXt representation (\buildtextaccent\texttilde l).
% Strongly influenced by macros by Taco.
% See http://www.ntg.nl/pipermail/ntg-context/2007/027095.html
\def\handletokens[#1][#2]{\ctxlua{collectors.handle("#1", #2)}}

\def\startcombining{\ctxlua{collectors.install("combining", "stopcombining")}}

\startluacode
  -- The actual conversion function: we loop over the characters in 'str'.
  function convert_combining(str)
    -- l is true if we have just read an 'l'.
    -- t is the list of tokens read thus far.
    local l, t = false, { }
    -- The following should check if we read ‘l’ and ‘combining tilde’
    -- consecutively.  A lot of overhead; it would be much prettier to
    -- implement a finite automaton :-)
    for _, v in ipairs(str) do
      if not l then
        if v[2] == 0x6c -- v is LATIN SMALL LETTER L: set l to true, and hold
        then l = true
        else t[#t+1] = v -- Otherwise, append v to the token list
        end
      else -- l is true
        if v[2] == 0x0303 then -- v is COMBINING TILDE
          -- Found!  Append the ConTeXt sequence for “l with tilde” to t!
          t[#t+1] = token.create('buildtextaccent')
          t[#t+1] = token.create('texttilde')
          t[#t+1] = token.create(0x6c, 11)
          l = false -- Don't forget to set l back to false
        else -- This is annoying: we need to check if v is ‘l’ again.
          t[#t+1] = token.create(0x6c, 11) -- First append the previous ‘l’
          if v[2] == 0x6c -- v is LATIN SMALL LETTER L: start all over again
          then l = true
          else
            t[#t+1] = v
            l = false
          end
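        end -- of "if v[2] == 0x0303"
      end -- of "if not l"
    end -- of for loop
    -- If the input ended on a held ‘l’, flush it so that it isn't dropped.
    if l then t[#t+1] = token.create(0x6c, 11) end
    return t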
  end -- of function
\stopluacode

\def\stopcombining
  {\handletokens[combining][convert_combining]
   \flushtokens[combining]}

% Now we can use \startcombining ... \stopcombining below.

\starttext

% There are two “l with tilde”: one on the second ‘l’ of “Hell̃o”, and the
% other one on “kal̃bame”.  (No, Idris, the Lithuanian radical kalb-
% doesn't mean “dog” ;-)
\startcombining
Hell̃o, world! Mẽs visì kal̃bame lietùviškai.
\stopcombining

\stoptext

[-- Attachment #3: combining_tilde.pdf --]
[-- Type: application/pdf, Size: 3872 bytes --]
