ntg-context - mailing list for ConTeXt users
From: Philipp Gesang <Philipp.Gesang@alumni.uni-heidelberg.de>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Subject: Re: copy&paste from pdf bug (smallcaps, text figures)
Date: Fri, 2 Aug 2013 13:37:41 +0200
Message-ID: <20130802113741.GA14425@tartaros>
In-Reply-To: <51FB9357.6080000@wxs.nl>


···<date: 2013-08-02, Friday>···<from: Hans Hagen>···

> On 8/2/2013 12:12 AM, Philipp Gesang wrote:
> >···<date: 2013-08-02, Friday>···<from: Marco Patzer>···
> >
> >>On 2013–08–01 Philipp Gesang wrote:
> >>
> >>>···<date: 2013-08-01, Thursday>···<from: Otared Kavian>···
> >>>
> >>>>I tested your example: no problem here on Mac OS X 10.8.4, with
> >>>>either TeXShop, Adobe Reader or Preview, with the latest beta
> >>>>(ConTeXt  ver: 2013.08.01 01:31 MKIV beta  fmt: 2013.8.1  int:
> >>>>english/english).
> >>>
> >>>x64 linux here, but it’s the same with the windows version in
> >>>wine32. I get the bad output with okular (poppler), acroread, and
> >>>mupdf, but strangely not with zathura (mupdf-based).
> >>
> >>Just to add to the list:
> >>
> >>x64 linux here, and it works with the following poppler based
> >>viewers (zathura-poppler, xpdf, evince)

I’m on a different machine now: the problem affects linux x86 and
pdftotext as well. Also, in xpdf I get smallcaps copied as
uppercase instead of lowercase.

> i'm a bit puzzled
> 
> >For those who want to test the git version, the commits are:
> >
> >     last good: a61813ccdd4b7bcc81932317e1360fda6c79962d
> >     first bad: 6b2f7c5fd7a3e465f4e2662b1e5bd2c9d5cce8f8
> >
> >Don’t forget to delete the cache.
> >
> >I suspect I found the troublesome changes. The problem vanishes
> >if I revert this modification to font-map.lua:
> >
> >     -local separator   = S("_.")
> >     -local other       = C((1 - separator)^1)
> >     -local ligsplitter = Ct(other * (separator * other)^0)
> >     +local ligseparator = P("_")
> >     +local varseparator = P(".")
> >     +local namesplitter = Ct(C((1 - ligseparator - varseparator)^1) * (ligseparator * C((1 - ligseparator - varseparator)^1))^0)
> >
> >and then further down:
> >
> >     -                local split = lpegmatch(ligsplitter,name)
> >     <...>
> >     +                local split = lpegmatch(namesplitter,name)
> >
> >For convenience I repeat the link to the changeset:
> 
> what do you revert from ... the + things are already in the file

I’m quoting from the changeset, so the “-” lines indicate the
good version.
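A few glyph names make the difference between the two patterns easier to see. What follows is a rough emulation in plain Lua. It uses string functions instead of LPeg, so it runs outside LuaTeX. The names f_f_i, a.sc, one.onum and f_i.alt are generic examples and are not taken from the font in question. The helpers split, ligsplitter and namesplitter are illustrative stand-ins that only mirror the variables in the diff.

```lua
-- Rough emulation (no LPeg) of the two splitters quoted from font-map.lua.
-- Both have the shape  Ct(C(word) * (sep * C(word))^0),  where `word` is
-- one or more characters outside the `stop` set. As with lpegmatch and no
-- end anchor, trailing text that does not fit the pattern is ignored.
local function split(name, stop, sep)
  local word = "[^" .. stop .. "]+"
  local s, e = string.find(name, "^" .. word)
  if not s then
    return nil -- no leading component: the LPeg match would fail as well
  end
  local parts = { string.sub(name, s, e) }
  local pos = e + 1
  while true do
    -- one separator followed by another component, anchored at pos
    local s2, e2 = string.find(name, "^[" .. sep .. "]" .. word, pos)
    if not s2 then break end
    parts[#parts + 1] = string.sub(name, s2 + 1, e2)
    pos = e2 + 1
  end
  return parts
end

-- old version: separator = S("_."), so both "_" and "." split
local function ligsplitter(name) return split(name, "_.", "_.") end

-- new version: only "_" joins components, a "." ends the match
local function namesplitter(name) return split(name, "_.", "_") end

for _, name in ipairs({ "f_f_i", "a.sc", "one.onum", "f_i.alt" }) do
  print(name,
        table.concat(ligsplitter(name), " "),
        table.concat(namesplitter(name), " "))
end
```

With these inputs, the old splitter breaks a.sc into “a” and “sc” and breaks f_i.alt into three parts. The new one stops at the first “.”, so it returns only “a” for a.sc and “f”, “i” for f_i.alt. Ligature names that use only underscores, like f_f_i, come out the same either way.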

> >     http://repo.or.cz/w/context.git/commitdiff/6b2f7c5fd7a3e465f4e2662b1e5bd2c9d5cce8f8
> 
> btw, this bit of code is evolving (was recently adapted to some
> border-case fonts that use their own rules)
> 
> anyhow, on my win8 system the beta works with sumatra, okular and
> acrobat (indeed one might need to wipe the cache, but i can
> increment the version number)

Weird. Here’s a PDF of the code I posted compiled with version
“2013.08.01 01:31” and how pdftotext renders it:

    https://phi-gamma.net/pdf/copypasta.pdf
    https://phi-gamma.net/files/copypasta.txt

I definitely get the wrong characters from this one. They are mapped
to code points in the private use area:

    <...>
    30 beginbfchar
    <0409> <F761>
    <0416> <F762>
    <0418> <F763>
    <0423> <F764>
    <042A> <F765>
    <0435> <F738>
    <...>
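Every target in the excerpt above lies in the Unicode Private Use Area (U+E000 to U+F8FF). In the Adobe Glyph List, U+F761 is Asmall and U+F738 is eightoldstyle, which fits the smallcaps and text figures in the subject. A viewer or pdftotext that trusts this ToUnicode map will most likely hand out those code points when copying. A small sketch for spotting such entries in a CMap follows, in plain Lua. It only looks at single-code-unit bfchar lines and is not meant as a general CMap parser. The function name pua_entries is made up for illustration.

```lua
-- Report bfchar entries of a ToUnicode CMap whose target lies in the
-- BMP Private Use Area (U+E000..U+F8FF). Assumes the input holds only
-- "<src> <dst>" bfchar lines: bfrange lines and multi-unit UTF-16
-- targets (ligatures, surrogate pairs) are not handled.
local function pua_entries(cmap)
  local hits = {}
  for src, dst in string.gmatch(cmap, "<(%x+)>%s*<(%x+)>") do
    local cp = tonumber(dst, 16)
    -- a 4-digit target is a single UTF-16 code unit
    if #dst == 4 and cp >= 0xE000 and cp <= 0xF8FF then
      hits[#hits + 1] = { glyph = src, unicode = dst }
    end
  end
  return hits
end

-- the entries quoted in this mail
local excerpt = [[
30 beginbfchar
<0409> <F761>
<0416> <F762>
<0418> <F763>
<0423> <F764>
<042A> <F765>
<0435> <F738>
]]

for _, hit in ipairs(pua_entries(excerpt)) do
  print(string.format("glyph %s -> U+%s (private use)", hit.glyph, hit.unicode))
end
```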

Can someone reproduce it at all?

Philipp


___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________

Thread overview: 15+ messages
2013-08-01 17:33 Philipp Gesang
2013-08-01 20:23 ` Otared Kavian
2013-08-01 21:46   ` Philipp Gesang
2013-08-01 22:01     ` Marco Patzer
2013-08-01 22:12       ` Philipp Gesang
2013-08-02 11:09         ` Hans Hagen
2013-08-02 11:37           ` Philipp Gesang [this message]
2013-08-02 12:02             ` Marco Patzer
2013-08-02 12:28               ` Philipp Gesang
2013-08-02 14:59                 ` Hans Hagen
2013-08-02 15:56                   ` Philipp Gesang
2013-08-02 16:04                     ` Arthur Reutenauer
2013-08-02 18:21                   ` Philipp Gesang
2013-08-01 21:38 ` Marco Patzer
2013-08-01 22:08 ` Jannik Voges
