From: Philipp Gesang
Newsgroups: gmane.comp.tex.context
Subject: Re: Problem with ConTeXt (MkIV), Hebrew and ligatures
Date: Mon, 1 Oct 2012 18:23:15 +0200
To: mailing list for ConTeXt users

·········
> On 09/29/2012 02:35 PM, Hans Hagen wrote:
> >On 29-9-2012 01:41, Simo Ojala wrote:
> >>Hans Hagen
> >>
> >>On 09/28/2012 11:46 AM, Hans Hagen wrote:
> >>>On 27-9-2012 21:27, Simo Ojala wrote:
> >>>>This is a problem originally posted on TeX/StackExchange. However,
> >>>>since I have not had any luck finding a solution, I am posting it
> >>>>here too. I am confident that somebody here will know the answer.
> >>>>
> >>>>http://tex.stackexchange.com/questions/73970/problem-with-context-mkiv-hebrew-and-ligatures
> >>>>
> >>>>"Since I last played with the latest ConTeXt MkIV, a new feature has
> >>>>been introduced: it now seems to combine Hebrew characters into
> >>>>ligatures automatically whenever possible. For example, if I have a
> >>>>word with the following two characters:
> >>>>
> >>>>U+05D5 (HEBREW LETTER VAV)
> >>>>U+05BC (HEBREW POINT DAGESH OR MAPIQ)
> >>>>
> >>>>ConTeXt will combine them into:
> >>>>
> >>>>U+FB35 (HEBREW LETTER VAV WITH DAGESH)
> >>>>
> >>>>However, I need to disable this feature for a number of reasons.
> >>>>For example, it breaks my little database query, because the query
> >>>>key is changed before(?) the macro gets it.
> >>>>
> >>>>So, does somebody know how to turn this off, and maybe also what
> >>>>has changed?"
> >>>
> >>>It depends on the font ... normally you can disable this by *not* using
> >>>the mark and mkmk features
> >>>
> >>>Hans
> >>>
> >>
> >>OK, I have now tried turning off all kinds of features, without luck. So
> >>I tried putting together a minimal test case. I suspect that something
> >>more needs to be done than just turning off some font features. However,
> >>my ConTeXt skills are very limited, so I may be wrong.
> >>
> >>The goal is that the word passed from the ConTeXt file remains as it is
> >>written and yields the Unicode characters U+5e1, U+5d5, U+5bc and U+5e1.
> >>This is what already happens when the word is in the Lua file.
> >>
> >>Simo
> >>
> >>PS: In case it matters, my ConTeXt MkIV version is "2012.09.23 12:40".
> >>It should be the latest for Ubuntu 12.04 LTS Precise Pangolin in
> >>Adam Reviczky's PPA.
> >>
> >>
> >>%% testcase.tex
> >>
> >>\definefontfeature[hebrew][arabic][script=hebr]
> >>\definefont[dejavusans][name:dejavusans*hebrew at 26pt]
> >>\setupdirections[bidi=global]
> >>
> >>\starttext
> >>\dejavusans
> >>
> >>\def\Macro#1{\directlua{
> >>dofile(resolvers.findfile("testcase.lua"))
> >>userdata.testfunction("#1")
> >>}}
> >>
> >>\Macro{סוּס}
> >>
> >>\blank[1cm]however, we can still color these independently\blank[0.5cm]
> >>
> >>\color[red]{ס}\color[green]{ו}\color[blue]{ּ}\color[yellow]{ס}
> >>
> >>\stoptext
> >>
> >>
> >>-- testcase.lua
> >>
> >>userdata = userdata or {}
> >>
> >>function userdata.testfunction(word)
> >>
> >>  tex.sprint("\\blank[1cm]word passed by macro\\blank[0.5cm]")
> >>
> >>  for i = 1, unicode.utf8.len(word) do
> >>    tex.sprint("U+" .. string.format("%x", unicode.utf8.byte(word, i))
> >>      .. ": " .. unicode.utf8.sub(word, i, i) .. "\\par")
> >>  end
> >>
> >>  tex.sprint("\\blank[1cm]word written in lua file\\blank[0.5cm]")
> >>
> >>  word = "סוּס"
> >>
> >>  for i = 1, unicode.utf8.len(word) do
> >>    tex.sprint("U+" .. string.format("%x", unicode.utf8.byte(word, i))
> >>      .. ": " .. unicode.utf8.sub(word, i, i) .. "\\par")
> >>  end
> >>end
> >
> >I see three characters next to each other, so what exactly is the problem?
> >
> >(BTW, take a look at goodies-002.tex in the test suite ... you can
> >define colored glyphs as a feature)
> >
> >Hans
> >
>
> Sorry for being unclear; let me try to clarify. The problem is:
>
> 1. I have a TeX file which calls a macro with an argument that contains
>    the characters U+5d5 and U+5bc.
> 2. The macro passes the argument on to Lua code. By the time it gets
>    there, the characters have turned into U+fb35.

Hi,

I don’t have a clue about Hebrew, but isn’t this a correct
normalization[0] rather than a ligature? If so, LuaTeX’s behavior
is perfectly fine. Lua, on the other hand, treats the string as a
sequence of bytes, which is just how it treats strings everywhere.

[0] http://www.unicode.org/charts/normalization/chart_Hebrew.html

Regards
Philipp

> 3. When the Lua code then compares U+fb35 with the XML file, which has
>    the original forms U+5d5 and U+5bc, the comparison of course fails.
>
> So the problem is that step 2 has not always happened. If possible,
> I would like to turn it off somehow. Of course I could try to write
> some workaround code to counteract this substitution, or whatever it
> should be called, but that could be complicated and lead to more
> problems.
>
>
> Simo
>
>
> PS: I attached my result of the test case in case this is a problem
> with my setup. Compiled with ConTeXt MkIV 2012.09.25 21:44.
> ___________________________________________________________________________________
> If your question is of interest to others as well, please add an entry to the Wiki!
>
> maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
> webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
> archive : http://foundry.supelec.fr/projects/contextrev/
> wiki : http://contextgarden.net
> ___________________________________________________________________________________
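A note on the normalization question and Simo's step 3. In the Unicode Character Database, U+FB35 has the canonical decomposition U+05D5 U+05BC and is listed in the composition exclusions, so both NFD and NFC map it back to the two-character sequence. Assuming the XML keys are stored decomposed (as Simo describes), one workaround is to decompose the key before comparing. The following is a minimal sketch in plain Lua 5.3 or later, using its standard `utf8` library. The table `decompositions` and the functions `decompose_hebrew` and `same_key` are hypothetical helpers, the table covers only a handful of Hebrew presentation forms, and the iteration would need adapting to the `unicode.utf8` library used in the test case above to run inside the 2012 LuaTeX.

```lua
-- Minimal sketch for plain Lua 5.3+ (standard utf8 library).
-- HYPOTHETICAL helper table: a small, incomplete subset of Hebrew
-- presentation forms mapped to their canonical decompositions.
local decompositions = {
  [0xFB2A] = { 0x05E9, 0x05C1 }, -- HEBREW LETTER SHIN WITH SHIN DOT
  [0xFB2B] = { 0x05E9, 0x05C2 }, -- HEBREW LETTER SHIN WITH SIN DOT
  [0xFB31] = { 0x05D1, 0x05BC }, -- HEBREW LETTER BET WITH DAGESH
  [0xFB35] = { 0x05D5, 0x05BC }, -- HEBREW LETTER VAV WITH DAGESH
  [0xFB4B] = { 0x05D5, 0x05B9 }, -- HEBREW LETTER VAV WITH HOLAM
}

-- Hypothetical helper: replace each known presentation form in s by its
-- decomposition and copy every other code point unchanged.
-- (Unlike full NFD, this does not reorder combining marks.)
local function decompose_hebrew(s)
  local out = {}
  for _, cp in utf8.codes(s) do
    local parts = decompositions[cp]
    if parts then
      out[#out + 1] = utf8.char(table.unpack(parts))
    else
      out[#out + 1] = utf8.char(cp)
    end
  end
  return table.concat(out)
end

-- Hypothetical helper: compare two keys after decomposing both sides,
-- so U+FB35 and the pair U+05D5 U+05BC count as equal.
local function same_key(a, b)
  return decompose_hebrew(a) == decompose_hebrew(b)
end

-- Usage: the key as it reaches Lua from the macro vs. the key in the XML.
local from_macro = utf8.char(0x05E1, 0xFB35, 0x05E1)
local from_xml = utf8.char(0x05E1, 0x05D5, 0x05BC, 0x05E1)
print(same_key(from_macro, from_xml)) -- true
```

Decomposing rather than composing is the safer direction here, because the decomposed sequence is what both normalization forms produce for these characters. A complete table generated from the decomposition field of UnicodeData.txt would make the comparison independent of whether a given ConTeXt version collapses the pair.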