From: Philipp Gesang
Newsgroups: gmane.comp.tex.context
Subject: Re: copy&paste from pdf bug (smallcaps, text figures)
Date: Fri, 2 Aug 2013 13:37:41 +0200
Message-ID: <20130802113741.GA14425@tartaros>
References: <20130801173329.GA5907@phlegethon> <13E355F9-2F54-42AC-8CA6-EB8C05A0B3CD@gmail.com> <20130801214616.GA29570@phlegethon> <20130801220108.GD8003@homerow> <20130801221237.GB29570@phlegethon> <51FB9357.6080000@wxs.nl>
In-Reply-To: <51FB9357.6080000@wxs.nl>
To: mailing list for ConTeXt users

·········

> On 8/2/2013 12:12 AM, Philipp Gesang wrote:
> >·········
> >
> >>On 2013–08–01 Philipp Gesang wrote:
> >>
> >>>·········
> >>>
> >>>>I tested your example: no problem here on Mac OS X 10.8.4, with
> >>>>either TeXShop, Adobe Reader or Preview, with the latest beta
> >>>>(ConTeXt ver: 2013.08.01 01:31 MKIV beta fmt: 2013.8.1 int:
> >>>>english/english).
> >>>
> >>>x64 linux here, but it’s the same with the windows version in
> >>>wine32. I get the bad output with okular (poppler), acroread, and
> >>>mupdf, but strangely not with zathura (mupdf-based).
> >>
> >>Just to add to the list:
> >>
> >>x64 linux here, and it works with the following poppler based
> >>viewers (zathura-poppler, xpdf, evince)

I’m on a different machine now: the problem affects linux x86 and
pdftotext as well. Also, in xpdf I get smallcaps copied as uppercase
instead of lowercase.

> i'm a bit puzzled
>
> >For those who want to test the git version, the commits are:
> >
> >    last good: a61813ccdd4b7bcc81932317e1360fda6c79962d
> >    first bad: 6b2f7c5fd7a3e465f4e2662b1e5bd2c9d5cce8f8
> >
> >Don’t forget to delete the cache.
> >
> >I suspect I found the troublesome changes. The problem vanishes
> >if I revert this modification to font-map.lua:
> >
> >    -local separator = S("_.")
> >    -local other = C((1 - separator)^1)
> >    -local ligsplitter = Ct(other * (separator * other)^0)
> >    +local ligseparator = P("_")
> >    +local varseparator = P(".")
> >    +local namesplitter = Ct(C((1 - ligseparator - varseparator)^1) * (ligseparator * C((1 - ligseparator - varseparator)^1))^0)
> >
> >and then further down:
> >
> >    -                local split = lpegmatch(ligsplitter,name)
> >    <...>
> >    +                local split = lpegmatch(namesplitter,name)
> >
> >For convenience I repeat the link to the changeset:
>
> what do you revert from ... the + things are already in the file

I’m quoting from the changeset, so the “-” lines indicate the good
version.

> >    http://repo.or.cz/w/context.git/commitdiff/6b2f7c5fd7a3e465f4e2662b1e5bd2c9d5cce8f8
>
> btw, this bit of code is evolving (was recently adapted to some border
> case fonts that use their own rules)
>
> anyhow, on my win8 system the beta works with sumatra, okular and
> acrobat (indeed one might need to wipe the cache, but i can
> increment the version number)

Weird. Here’s a PDF of the code I posted, compiled with version
“2013.08.01 01:31”, and how pdftotext renders it:

    https://phi-gamma.net/pdf/copypasta.pdf
    https://phi-gamma.net/files/copypasta.txt

I definitely get  from this one.
The characters are mapped from the private use area (U+F730–U+F739
and U+F761–U+F77A, the code points the Adobe Glyph List assigns to
zerooldstyle through nineoldstyle and Asmall through Zsmall):

    <...>
    30 beginbfchar
    <0409>
    <0416>
    <0418>
    <0423>
    <042A>
    <0435>
    <...>

Can someone reproduce it at all?
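For anyone who wants to compare the two splitters outside ConTeXt, here is a minimal standalone sketch. It copies the "-" (good) and "+" (bad) patterns from the font-map.lua changeset quoted above and prints how each one splits a few glyph names. It assumes plain Lua with the LPeg library available (texlua already provides it), and the sample names (f_f_i, a.sc, one.onum, T_h.alt) are hypothetical examples of the usual ligature and variant naming, not names taken from the font in question.

```lua
-- Sketch: compare the old ("-") and new ("+") glyph-name splitters.
-- Assumes LPeg: texlua has it as a global, plain Lua needs the lpeg rock.
local lpeg = lpeg or require("lpeg")
local P, S, C, Ct = lpeg.P, lpeg.S, lpeg.C, lpeg.Ct
local lpegmatch = lpeg.match

-- old version: "_" and "." are both treated as separators
local separator   = S("_.")
local other       = C((1 - separator)^1)
local ligsplitter = Ct(other * (separator * other)^0)

-- new version: only "_" separates; matching simply stops at the first "."
local ligseparator = P("_")
local varseparator = P(".")
local namesplitter = Ct(C((1 - ligseparator - varseparator)^1)
  * (ligseparator * C((1 - ligseparator - varseparator)^1))^0)

-- join the captured components for display
local function show(t)
  return table.concat(t, " | ")
end

-- hypothetical sample glyph names (ligature, small cap, oldstyle, mixed)
for _, name in ipairs { "f_f_i", "a.sc", "one.onum", "T_h.alt" } do
  print(string.format("%-10s old: %-16s new: %s", name,
    show(lpegmatch(ligsplitter, name)),
    show(lpegmatch(namesplitter, name))))
end
```

The old pattern treats "." like "_" and hands back the variant suffix (sc, onum, alt) as an extra component, while the new one stops at the first "." and returns only the base name plus any ligature parts. Whether that difference is what breaks the ToUnicode entries depends on how font-map.lua uses split further down, which the excerpt elides.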
Philipp

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________