ntg-context - mailing list for ConTeXt users
From: Hans Hagen <pragma@wxs.nl>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: Problem with ConTeXt (MkIV), Hebrew and ligatures
Date: Mon, 01 Oct 2012 18:43:13 +0200
Message-ID: <5069C821.3020004@wxs.nl>
In-Reply-To: <20121001162315.GA5059@phlegethon.router_intern>

On 1-10-2012 18:23, Philipp Gesang wrote:
> ···<date: 2012-10-01, Monday>···<from: Simo Ojala>···
>
>> On 09/29/2012 02:35 PM, Hans Hagen wrote:
>>> On 29-9-2012 01:41, Simo Ojala wrote:
>>>> Hans Hagen <pragma@wxs.nl>
>>>>
>>>> On 09/28/2012 11:46 AM, Hans Hagen wrote:
>>>>> On 27-9-2012 21:27, Simo Ojala wrote:
>>>>>> This is a problem originally posted in TeX/StackExchange. However,
>>>>>> since I have not had any luck in finding a solution I post it here
>>>>>> too. I am confident that somebody here should know the answer.
>>>>>>
>>>>>> http://tex.stackexchange.com/questions/73970/problem-with-context-mkiv-hebrew-and-ligatures
>>>>>>
>>>>>> "Since I last played with the latest ConTeXt MkIV, a new feature has
>>>>>> been introduced. It now seems to combine Hebrew characters
>>>>>> automatically into ligatures when possible. For example, if I have a
>>>>>> word with the following two characters:
>>>>>>
>>>>>> U+05D5 (HEBREW LETTER VAV)
>>>>>> U+05BC (HEBREW POINT DAGESH OR MAPIQ)
>>>>>>
>>>>>> ConTeXt will combine these to:
>>>>>>
>>>>>> U+FB35 (HEBREW LETTER VAV WITH DAGESH)
>>>>>>
>>>>>> However, I would need to disable this feature for a number of reasons.
>>>>>> For example, this breaks my little database query, because the query
>>>>>> key is changed before(?) the macro gets it.
>>>>>>
>>>>>> So I would like to know how to turn this off, and maybe also what
>>>>>> has changed."
>>>>>
>>>>> It depends on the font ... normally you can disable this by *not* using
>>>>> the mark and mkmk features
>>>>>
>>>>> Hans
>>>>>
>>>>
>>>> Ok, I have now tried turning off all kinds of features without luck. So
>>>> I tried putting together a minimal test case. I suspect that something
>>>> more needs to be done than just turning off some font features. However,
>>>> my ConTeXt skills are very limited, so I may be wrong.
>>>>
>>>> The goal is that the word passed from the ConTeXt file remains as it is
>>>> written and gives the Unicode characters U+5e1, U+5d5, U+5bc and U+5e1.
>>>> This is what already happens when the word is in the Lua file.
>>>>
>>>> Simo
>>>>
>>>> PS: In case this matters, my ConTeXt MkIV version is "2012.09.23 12:40".
>>>> It should be the latest for Ubuntu 12.04 LTS Precise Pangolin available
>>>> in Adam Reviczky's PPA.
>>>>
>>>>
>>>> %% testcase.tex
>>>>
>>>> \definefontfeature[hebrew][arabic][script=hebr]
>>>> \definefont[dejavusans][name:dejavusans*hebrew at 26pt]
>>>> \setupdirections[bidi=global]
>>>>
>>>> \starttext
>>>> \dejavusans
>>>>
>>>> \def\Macro#1{\directlua{
>>>> dofile(resolvers.findfile("testcase.lua"))
>>>> userdata.testfunction("#1")
>>>> }}
>>>>
>>>> \Macro{סוּס}
>>>>
>>>> \blank[1cm]however, we can still color these independently\blank[0.5cm]
>>>>
>>>> \color[red]{ס}\color[green]{ו}\color[blue]{ּ}\color[yellow]{ס}
>>>>
>>>> \stoptext
>>>>
>>>>
>>>> -- testcase.lua
>>>>
>>>> userdata = userdata or {}
>>>>
>>>> function userdata.testfunction(word)
>>>>
>>>>      tex.sprint("\\blank[1cm]word passed by macro\\blank[0.5cm]")
>>>>
>>>>      for i = 1, unicode.utf8.len(word) do
>>>>          tex.sprint("U+" .. string.format("%x",unicode.utf8.byte(word,i))
>>>>              .. ": " .. unicode.utf8.sub(word,i,i) .. "\\par" )
>>>>      end
>>>>
>>>>      tex.sprint("\\blank[1cm]word written in lua file\\blank[0.5cm]")
>>>>
>>>>      word = "סוּס"
>>>>
>>>>      for i = 1, unicode.utf8.len(word) do
>>>>          tex.sprint("U+" .. string.format("%x",unicode.utf8.byte(word,i))
>>>>              .. ": " .. unicode.utf8.sub(word,i,i) .. "\\par" )
>>>>      end
>>>> end
>>>
>>> I see three characters next to each other so what exactly is the problem?
>>>
>>> (BTW, take a look at goodies-002.tex in the test suite ... you can
>>> define colored glyphs as a feature)
>>>
>>> Hans
>>>
>>
>> Sorry for being unclear, I will try to clarify. The problem is:
>>
>> 1. I have a TeX file which calls a macro with an argument that contains
>> the characters U+5d5 and U+5bc.
>> 2. The macro passes the argument on to Lua code. By the time it gets
>> there, the characters have turned into U+fb35.
>
> Hi,
>
> I don’t have a clue about Hebrew, but isn’t this a correct
> normalization[0], not a ligature? If so, the behavior of LuaTeX
> is perfectly fine. Lua, on the other hand, treats the string as a
> sequence of bytes, which is just how it treats strings everywhere.
>
> [0] http://www.unicode.org/charts/normalization/chart_Hebrew.html
>
> Regards
> Philipp
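
To illustrate the byte-level point: the plain Lua sketch below compares the two spellings discussed in this thread, U+05D5 followed by U+05BC versus the single code point U+FB35. It needs no ConTeXt libraries. The helper `hexbytes` and the table `lexicon` are made up for this illustration and do not come from the original test case. Since Lua compares strings byte by byte, a lookup key that was collapsed on the way in no longer matches a key stored in the two-character form.

```lua
-- VAV followed by DAGESH: two code points (U+05D5, U+05BC) in UTF-8.
local decomposed = string.char(0xD7, 0x95, 0xD6, 0xBC)
-- VAV WITH DAGESH: one code point (U+FB35) in UTF-8.
local composed = string.char(0xEF, 0xAC, 0xB5)

-- Hypothetical helper: hex dump of the bytes in a string.
local function hexbytes(s)
    local t = {}
    for i = 1, #s do
        t[#t + 1] = string.format("%02X", s:byte(i))
    end
    return table.concat(t, " ")
end

-- Hypothetical lookup table, keyed by the two-character spelling.
local lexicon = { [decomposed] = "entry found" }

print(hexbytes(decomposed))        -- bytes of the two-character form
print(hexbytes(composed))          -- bytes of the single-character form
print(decomposed == composed)      -- Lua compares the raw bytes
print(lexicon[decomposed])         -- key as stored
print(lexicon[composed])           -- key after collapsing
```

The two strings render the same way but have different lengths and bytes, so the last lookup returns nil.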

In that case you can try

utilities.sequencers.disableaction(resolvers.openers.helpers.textfileactions,"characters.filters.utf.collapse")

If this is needed, I can provide a directive for it.
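
A minimal sketch of where such a call could go, assuming it has to run before the file containing the Hebrew text is opened (the action is attached to the text file openers). The file name `hebrew-body.tex` is hypothetical. Whether text typed directly in the main file is affected depends on when that file is read, which this sketch does not settle, so the Hebrew input is moved into a separate file here.

```tex
\startluacode
    -- The call quoted above, unchanged. Assumption: placing it here,
    -- before any \input, is early enough for files opened afterwards.
    utilities.sequencers.disableaction(
        resolvers.openers.helpers.textfileactions,
        "characters.filters.utf.collapse"
    )
\stopluacode

\starttext
    % hebrew-body.tex is a hypothetical file holding \Macro{...} and its argument
    \input hebrew-body.tex
\stoptext
```

If the test case still reports U+fb35 with this in place, the collapsing is happening somewhere other than the file opener.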

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------
