From: Philipp Gesang
Newsgroups: gmane.comp.tex.context
Subject: Re: Problem with ConTeXt (MkIV), Hebrew and ligatures
Date: Mon, 1 Oct 2012 19:25:31 +0200
Message-ID: <20121001172531.GB5059@phlegethon.router_intern>
In-Reply-To: <5069C821.3020004@wxs.nl>
To: mailing list for ConTeXt users
X-License: "CC-BY-SA 3.0"
List-Id: mailing list for ConTeXt users

·········
> On 1-10-2012 18:23, Philipp Gesang wrote:
> >·········
> >
> >>On 09/29/2012 02:35 PM, Hans Hagen wrote:
> >>>On 29-9-2012 01:41, Simo Ojala wrote:
> >>>>Hans Hagen
> >>>>
> >>>>On 09/28/2012 11:46 AM, Hans Hagen wrote:
> >>>>>On 27-9-2012 21:27, Simo Ojala wrote:
> >>>>>>This is a problem originally posted in TeX/StackExchange. However,
> >>>>>>since I have not had any luck in finding a solution I post it here
> >>>>>>too. I am confident that somebody here should know the answer.
> >>>>>>
> >>>>>>http://tex.stackexchange.com/questions/73970/problem-with-context-mkiv-hebrew-and-ligatures
> >>>>>>
> >>>>>>"Since I last played with the latest ConTeXt MkIV, there has been
> >>>>>>introduced this new feature. It now seems to combine Hebrew characters
> >>>>>>automatically when possible to ligatures. So for example.
> >>>>>>If I have a
> >>>>>>word with following two characters:
> >>>>>>
> >>>>>>U+05D5 (HEBREW LETTER VAV)
> >>>>>>U+05BC (HEBREW POINT DAGESH OR MAPIQ)
> >>>>>>
> >>>>>>ConTeXt will combine these to:
> >>>>>>
> >>>>>>U+FB35 (HEBREW LETTER VAV WITH DAGESH)
> >>>>>>
> >>>>>>However, I would need to disable this feature for a number of reasons.
> >>>>>>For example, this breaks my little database query, because the query
> >>>>>>key is changed before(?) macro gets it.
> >>>>>>
> >>>>>>So if somebody would know how to turn this off and maybe also that what
> >>>>>>has changed."
> >>>>>
> >>>>>It depends on the font ... normally you can disable this by *not* using
> >>>>>the mark and mkmk features
> >>>>>
> >>>>>Hans
> >>>>>
> >>>>
> >>>>Ok, I have now tried turning off all kinds of features without luck. So,
> >>>>I tried putting together minimal test case. I suspect that there should
> >>>>be done something more than just turn off some font features. However,
> >>>>my ConTeXt skills are very limited so I can be wrong.
> >>>>
> >>>>The goal is that the word passed from ConTeXt file remains as it is
> >>>>written and gives unicode characters U+5e1, U+5d5, U+5bc and U+5e1. This
> >>>>is what already happens when the word is in the lua file.
> >>>>
> >>>>Simo
> >>>>
> >>>>PS: In case this matters. My ConTeXt MkIV version is "2012.09.23 12:40".
> >>>>It should be the latest for Ubuntu 12.04 LTS Precise Pangolin that is in
> >>>>the Adam Reviczky's PPA.
> >>>>
> >>>>
> >>>>%% testcase.tex
> >>>>
> >>>>\definefontfeature[hebrew][arabic][script=hebr]
> >>>>\definefont[dejavusans][name:dejavusans*hebrew at 26pt]
> >>>>\setupdirections[bidi=global]
> >>>>
> >>>>\starttext
> >>>>\dejavusans
> >>>>
> >>>>\def\Macro#1{\directlua{
> >>>>dofile(resolvers.findfile("testcase.lua"))
> >>>>userdata.testfunction("#1")
> >>>>}}
> >>>>
> >>>>\Macro{סוּס}
> >>>>
> >>>>\blank[1cm]however, we can still color these independently\blank[0.5cm]
> >>>>
> >>>>\color[red]{ס}\color[green]{ו}\color[blue]{ּ}\color[yellow]{ס}
> >>>>
> >>>>\stoptext
> >>>>
> >>>>
> >>>>-- testcase.lua
> >>>>
> >>>>userdata = userdata or {}
> >>>>
> >>>>function userdata.testfunction(word)
> >>>>
> >>>>    tex.sprint("\\blank[1cm]word passed by macro\\blank[0.5cm]")
> >>>>
> >>>>    for i = 1, unicode.utf8.len(word) do
> >>>>        tex.sprint("U+" ..
> >>>>            string.format("%x",unicode.utf8.byte(word,i)) .. ": " ..
> >>>>            unicode.utf8.sub(word,i,i) .. "\\par" )
> >>>>    end
> >>>>
> >>>>    tex.sprint("\\blank[1cm]word written in lua file\\blank[0.5cm]")
> >>>>
> >>>>    word = "סוּס"
> >>>>
> >>>>    for i = 1, unicode.utf8.len(word) do
> >>>>        tex.sprint("U+" ..
> >>>>            string.format("%x",unicode.utf8.byte(word,i)) .. ": " ..
> >>>>            unicode.utf8.sub(word,i,i) .. "\\par" )
> >>>>    end
> >>>>end
> >>>
> >>>I see three characters next to each other so what exactly is the problem?
> >>>
> >>>(BTW, take a look at goodies-002.tex in the test suite ... you can
> >>>define colored glyphs as a feature)
> >>>
> >>>Hans
> >>>
> >>
> >>Sorry for being unclear, I try to clarify. The problem is:
> >>
> >>1. I have tex file with which calls a macro with argument that has
> >>characters U+5d5 and U+5bc.
> >>2. Macro passes argument further to lua code. When it gets there
> >>characters have turned to U+fb35.
> >
> >Hi,
> >
> >I don’t have clue about hebrew but isn’t this a correct
> >normalization[0], not a ligature? If so, the behavior of Luatex
> >is perfectly fine. Lua otoh treats the string as a sequence of
> >bytes, which is just how it treats strings everywhere.
> >
> >[0] http://www.unicode.org/charts/normalization/chart_Hebrew.html
> >
> >Regards
> >Philipp
>
> In that case you can try
>
> utilities.sequencers.disableaction(resolvers.openers.helpers.textfileactions,"characters.filters.utf.collapse")

That doesn’t work. What does help is commenting out the
“appendaction” line in char-utf.lua, or the corresponding table
for U+FB35 in char-def.lua. My guess is that disabling the action
comes too late: the .tex file is processed *before* you can
disable it.

Philipp

>
> if this is needed, I can provide a directive for it
>
> Hans
>
> -----------------------------------------------------------------
> Hans Hagen | PRAGMA ADE
> Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
> tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
> | www.pragma-pod.nl
> -----------------------------------------------------------------
> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the Wiki!
>
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive  : http://foundry.supelec.fr/projects/contextrev/
> wiki     : http://contextgarden.net
> ___________________________________________________________________________________

-- 
() ascii ribbon campaign - against html e-mail
/\ www.asciiribbon.org   - against proprietary attachments
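
If patching char-utf.lua or char-def.lua is not an option, one possible
workaround is to undo the composition on the Lua side, before the string
is used as a query key. The following is only a minimal sketch under
stated assumptions, not something proposed in the thread:
userdata.decompose is a hypothetical helper name, it handles U+FB35 only,
and it works on raw UTF-8 bytes so that it does not rely on any ConTeXt
API. In the Unicode data U+FB35 has the canonical decomposition
U+05D5 U+05BC, so the replacement should not lose information.

```lua
-- Hypothetical helper, not part of ConTeXt: expand U+FB35 back into
-- VAV + DAGESH before the string is used as a lookup key.
userdata = userdata or {}

-- Raw UTF-8 byte sequences, written as decimal escapes.
local composed   = "\239\172\181"      -- U+FB35        (EF AC B5)
local decomposed = "\215\149\214\188"  -- U+05D5 U+05BC (D7 95 D6 BC)

function userdata.decompose(word)
    -- None of these bytes is a Lua pattern magic character, so gsub
    -- does a literal replacement here. The outer parentheses drop the
    -- substitution count that gsub returns as a second value.
    return (string.gsub(word, composed, decomposed))
end
```

Calling userdata.decompose(word) at the top of userdata.testfunction
would be one place to try this, assuming U+FB35 is the only composed
character the input filter produces for the words in question.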