···<date: 2012-10-01, Monday>···<from: Hans Hagen>···

> On 1-10-2012 18:23, Philipp Gesang wrote:
> >···<date: 2012-10-01, Monday>···<from: Simo Ojala>···
> >
> >>On 09/29/2012 02:35 PM, Hans Hagen wrote:
> >>>On 29-9-2012 01:41, Simo Ojala wrote:
> >>>>On 09/28/2012 11:46 AM, Hans Hagen wrote:
> >>>>>On 27-9-2012 21:27, Simo Ojala wrote:
> >>>>>>This problem was originally posted on TeX/StackExchange. However,
> >>>>>>since I have not had any luck finding a solution, I am posting it
> >>>>>>here too. I am confident that somebody here knows the answer.
> >>>>>>
> >>>>>>http://tex.stackexchange.com/questions/73970/problem-with-context-mkiv-hebrew-and-ligatures
> >>>>>>
> >>>>>>"Since I last played with ConTeXt MkIV, a new feature has been
> >>>>>>introduced. It now seems to combine Hebrew characters automatically
> >>>>>>into ligatures where possible. For example, if I have a word with
> >>>>>>the following two characters:
> >>>>>>
> >>>>>>U+05D5 (HEBREW LETTER VAV)
> >>>>>>U+05BC (HEBREW POINT DAGESH OR MAPIQ)
> >>>>>>
> >>>>>>ConTeXt will combine these to:
> >>>>>>
> >>>>>>U+FB35 (HEBREW LETTER VAV WITH DAGESH)
> >>>>>>
> >>>>>>However, I need to disable this feature for a number of reasons.
> >>>>>>For example, it breaks my little database query, because the query
> >>>>>>key is changed before(?) the macro gets it.
> >>>>>>
> >>>>>>So I would be glad if somebody could tell me how to turn this off,
> >>>>>>and maybe also what has changed."
> >>>>>
> >>>>>It depends on the font ... normally you can disable this by *not* using
> >>>>>the mark and mkmk features
> >>>>>
> >>>>>Hans
> >>>>>
> >>>>
> >>>>Ok, I have now tried turning off all kinds of features without luck. So
> >>>>I tried putting together a minimal test case. I suspect that something
> >>>>more is needed than just turning off some font features. However,
> >>>>my ConTeXt skills are very limited, so I may be wrong.
> >>>>
> >>>>The goal is that the word passed from the ConTeXt file remains as it is
> >>>>written and gives the Unicode characters U+5e1, U+5d5, U+5bc and U+5e1.
> >>>>This is what already happens when the word is in the Lua file.
> >>>>
> >>>>Simo
> >>>>
> >>>>PS: In case this matters, my ConTeXt MkIV version is "2012.09.23 12:40".
> >>>>It should be the latest one for Ubuntu 12.04 LTS Precise Pangolin in
> >>>>Adam Reviczky's PPA.
> >>>>
> >>>>
> >>>>%% testcase.tex
> >>>>
> >>>>\definefontfeature[hebrew][arabic][script=hebr]
> >>>>\definefont[dejavusans][name:dejavusans*hebrew at 26pt]
> >>>>\setupdirections[bidi=global]
> >>>>
> >>>>\starttext
> >>>>\dejavusans
> >>>>
> >>>>\def\Macro#1{\directlua{
> >>>>dofile(resolvers.findfile("testcase.lua"))
> >>>>userdata.testfunction("#1")
> >>>>}}
> >>>>
> >>>>\Macro{סוּס}
> >>>>
> >>>>\blank[1cm]however, we can still color these independently\blank[0.5cm]
> >>>>
> >>>>\color[red]{ס}\color[green]{ו}\color[blue]{ּ}\color[yellow]{ס}
> >>>>
> >>>>\stoptext
> >>>>
> >>>>
> >>>>-- testcase.lua
> >>>>
> >>>>userdata = userdata or {}
> >>>>
> >>>>function userdata.testfunction(word)
> >>>>
> >>>>     tex.sprint("\\blank[1cm]word passed by macro\\blank[0.5cm]")
> >>>>
> >>>>     for i = 1, unicode.utf8.len(word) do
> >>>>         tex.sprint("U+" ..
> >>>>             string.format("%x",unicode.utf8.byte(word,i)) .. ": " ..
> >>>>             unicode.utf8.sub(word,i,i) .. "\\par" )
> >>>>     end
> >>>>
> >>>>     tex.sprint("\\blank[1cm]word written in lua file\\blank[0.5cm]")
> >>>>
> >>>>     word = "סוּס"
> >>>>
> >>>>     for i = 1, unicode.utf8.len(word) do
> >>>>         tex.sprint("U+" ..
> >>>>             string.format("%x",unicode.utf8.byte(word,i)) .. ": " ..
> >>>>             unicode.utf8.sub(word,i,i) .. "\\par" )
> >>>>     end
> >>>>end
> >>>
> >>>I see three characters next to each other so what exactly is the problem?
> >>>
> >>>(BTW, take a look at goodies-002.tex in the test suite ... you can
> >>>define colored glyphs as a feature)
> >>>
> >>>Hans
> >>>
> >>
> >>Sorry for being unclear; let me try to clarify. The problem is:
> >>
> >>1. I have a tex file which calls a macro with an argument that
> >>contains the characters U+5d5 and U+5bc.
> >>2. The macro passes the argument on to Lua code. By the time it gets
> >>there, the characters have turned into U+fb35.
> >
> >Hi,
> >
> >I don’t have a clue about Hebrew, but isn’t this a correct
> >normalization[0], not a ligature? If so, the behavior of LuaTeX
> >is perfectly fine. Lua, otoh, treats the string as a sequence of
> >bytes, which is just how it treats strings everywhere.
> >
> >[0] http://www.unicode.org/charts/normalization/chart_Hebrew.html
> >
> >Regards
> >Philipp
> 
> In that case you can try
> 
> utilities.sequencers.disableaction(resolvers.openers.helpers.textfileactions,"characters.filters.utf.collapse")

Doesn’t work. What helps is to comment out the “appendaction” in
char-utf.lua, or the corresponding table entry for U+FB35 in
char-def.lua. My guess is that this is because the .tex file is
already processed *before* you can disable the action.
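
As a side check on the normalization question raised earlier in the
thread, here is a small sketch in Python (not ConTeXt; it only assumes
the standard unicodedata module, and the helper names are made up for
illustration). As far as I can tell, U+FB35 is a composition exclusion,
so plain NFC never produces it from U+05D5 U+05BC, while NFD expands it
back into the two original characters. That would suggest the
collapsing is done by ConTeXt's own input filter rather than by
standard Unicode normalization.

```python
import unicodedata

# The two spellings discussed in this thread.
DECOMPOSED = "\u05d5\u05bc"   # HEBREW LETTER VAV + HEBREW POINT DAGESH OR MAPIQ
PRECOMPOSED = "\ufb35"        # HEBREW LETTER VAV WITH DAGESH


def codepoints(text):
    """Return the code points of text as a list of 'U+XXXX' strings."""
    return ["U+%04X" % ord(ch) for ch in text]


def expand(text):
    """Hypothetical helper: undo the collapsing by applying NFD."""
    return unicodedata.normalize("NFD", text)


# NFC leaves the two-character sequence alone (composition exclusion).
nfc_of_pair = unicodedata.normalize("NFC", DECOMPOSED)
print(codepoints(nfc_of_pair))   # ['U+05D5', 'U+05BC']

# NFD turns the precomposed character back into base letter plus point.
restored = expand(PRECOMPOSED)
print(codepoints(restored))      # ['U+05D5', 'U+05BC']
```

The same idea (expanding U+FB35 again before using the string as a
query key) could presumably be applied on the Lua side as a workaround,
but I have not tried that.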

Philipp


> 
> if this is needed, I can provide a directive for it
> 
> Hans
> 
