> On 1-10-2012 18:23, Philipp Gesang wrote:
> >
> >>On 09/29/2012 02:35 PM, Hans Hagen wrote:
> >>>On 29-9-2012 01:41, Simo Ojala wrote:
> >>>>Hans Hagen
> >>>>
> >>>>On 09/28/2012 11:46 AM, Hans Hagen wrote:
> >>>>>On 27-9-2012 21:27, Simo Ojala wrote:
> >>>>>>This is a problem originally posted in TeX/StackExchange. However,
> >>>>>>since I have not had any luck in finding a solution I post it here
> >>>>>>too. I am confident that somebody here should know the answer.
> >>>>>>
> >>>>>>http://tex.stackexchange.com/questions/73970/problem-with-context-mkiv-hebrew-and-ligatures
> >>>>>>
> >>>>>>"Since I last played with the latest ConTeXt MkIV, there has been
> >>>>>>introduced this new feature. It now seems to combine Hebrew characters
> >>>>>>automatically when possible to ligatures. So for example. If I have a
> >>>>>>word with following two characters:
> >>>>>>
> >>>>>>U+05D5 (HEBREW LETTER VAV)
> >>>>>>U+05BC (HEBREW POINT DAGESH OR MAPIQ)
> >>>>>>
> >>>>>>ConTeXt will combine these to:
> >>>>>>
> >>>>>>U+FB35 (HEBREW LETTER VAV WITH DAGESH)
> >>>>>>
> >>>>>>However, I would need to disable this feature for a number of reasons.
> >>>>>>For example, this breaks my little database query, because the query
> >>>>>>key is changed before(?) macro gets it.
> >>>>>>
> >>>>>>So if somebody would know how to turn this off and maybe also that what
> >>>>>>has changed."
> >>>>>
> >>>>>It depends on the font ... normally you can disable this by *not* using
> >>>>>the mark and mkmk features
> >>>>>
> >>>>>Hans
> >>>>>
> >>>>
> >>>>Ok, I have now tried turning off all kinds of features without luck. So,
> >>>>I tried putting together minimal test case. I suspect that there should
> >>>>be done something more than just turn off some font features. However,
> >>>>my ConTeXt skills are very limited so I can be wrong.
> >>>>
> >>>>The goal is that the word passed from ConTeXt file remains as it is
> >>>>written and gives unicode characters U+5e1, U+5d5, U+5bc and U+5e1. This
> >>>>is what already happens when the word is in the lua file.
> >>>>
> >>>>Simo
> >>>>
> >>>>PS: In case this matters. My ConTeXt MkIV version is "2012.09.23 12:40".
> >>>>It should be the latest for Ubuntu 12.04 LTS Precise Pangolin that is in
> >>>>the Adam Reviczky's PPA.
> >>>>
> >>>>%% testcase.tex
> >>>>
> >>>>\definefontfeature[hebrew][arabic][script=hebr]
> >>>>\definefont[dejavusans][name:dejavusans*hebrew at 26pt]
> >>>>\setupdirections[bidi=global]
> >>>>
> >>>>\starttext
> >>>>\dejavusans
> >>>>
> >>>>\def\Macro#1{\directlua{
> >>>>dofile(resolvers.findfile("testcase.lua"))
> >>>>userdata.testfunction("#1")
> >>>>}}
> >>>>
> >>>>\Macro{סוּס}
> >>>>
> >>>>\blank[1cm]however, we can still color these independently\blank[0.5cm]
> >>>>
> >>>>\color[red]{ס}\color[green]{ו}\color[blue]{ּ}\color[yellow]{ס}
> >>>>
> >>>>\stoptext
> >>>>
> >>>>-- testcase.lua
> >>>>
> >>>>userdata = userdata or {}
> >>>>
> >>>>function userdata.testfunction(word)
> >>>>
> >>>>    tex.sprint("\\blank[1cm]word passed by macro\\blank[0.5cm]")
> >>>>
> >>>>    for i = 1, unicode.utf8.len(word) do
> >>>>        tex.sprint("U+" ..
> >>>>            string.format("%x",unicode.utf8.byte(word,i)) .. ": " ..
> >>>>            unicode.utf8.sub(word,i,i) .. "\\par" )
> >>>>    end
> >>>>
> >>>>    tex.sprint("\\blank[1cm]word written in lua file\\blank[0.5cm]")
> >>>>
> >>>>    word = "סוּס"
> >>>>
> >>>>    for i = 1, unicode.utf8.len(word) do
> >>>>        tex.sprint("U+" ..
> >>>>            string.format("%x",unicode.utf8.byte(word,i)) .. ": " ..
> >>>>            unicode.utf8.sub(word,i,i) .. "\\par" )
> >>>>    end
> >>>>end
> >>>
> >>>I see three characters next to each other so what exactly is the problem?
> >>>
> >>>(BTW, take a look at goodies-002.tex in the test suite ... you can
> >>>define colored glyphs as a feature)
> >>>
> >>>Hans
> >>>
> >>
> >>Sorry for being unclear, I will try to clarify. The problem is:
> >>
> >>1. I have a tex file which calls a macro with an argument that has
> >>characters U+5d5 and U+5bc.
> >>2. The macro passes the argument on to lua code. When it gets there
> >>the characters have turned into U+fb35.
> >
> >Hi,
> >
> >I don’t have a clue about Hebrew but isn’t this a correct
> >normalization[0], not a ligature? If so, the behavior of LuaTeX
> >is perfectly fine. Lua otoh treats the string as a sequence of
> >bytes, which is just how it treats strings everywhere.
> >
> >[0] http://www.unicode.org/charts/normalization/chart_Hebrew.html
> >
> >Regards
> >Philipp
>
> In that case you can try
>
> utilities.sequencers.disableaction(resolvers.openers.helpers.textfileactions,"characters.filters.utf.collapse")

That doesn’t work. What does help is commenting out the “appendaction”
call in char-utf.lua, or the corresponding table for U+FB35 (0xfb35) in
char-def.lua. My guess is that the disableaction call has no effect
because the .tex file is already processed *before* you can disable
the filter.

Philipp

>
> if this is needed, I can provide a directive for it
>
> Hans
>
> -----------------------------------------------------------------
> Hans Hagen | PRAGMA ADE
> Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
> tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
> | www.pragma-pod.nl
> -----------------------------------------------------------------
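For the database lookup itself, one possible workaround that leaves
the ConTeXt files alone is to undo the composition on the Lua side
before the word is used as a query key. The following is only a
minimal sketch in plain Lua, under the assumption that U+FB35 is the
only composed form that matters here. The name decompose_vav_dagesh is
a hypothetical helper, not a ConTeXt function, and the byte values are
simply the UTF-8 encodings of the three code points mentioned in this
thread.

```lua
-- Hypothetical helper (not part of ConTeXt): map the composed form
-- back to the two characters that were typed in the .tex file.
-- UTF-8 bytes: U+FB35 = EF AC B5, U+05D5 = D7 95, U+05BC = D6 BC.
local composed   = "\239\172\181"            -- U+FB35 VAV WITH DAGESH
local decomposed = "\215\149" .. "\214\188"  -- U+05D5 followed by U+05BC

local function decompose_vav_dagesh(word)
    -- None of these bytes is a Lua pattern or replacement magic
    -- character, so a plain gsub is safe. The outer parentheses
    -- drop the match count that gsub returns as a second value.
    return (string.gsub(word, composed, decomposed))
end
```

In testcase.lua this could be called at the top of
userdata.testfunction, for example word = decompose_vav_dagesh(word),
so that the loop sees U+5d5 and U+5bc again. Other composed Hebrew
presentation forms would each need their own mapping, so this does not
replace a proper switch for the collapse filter.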