ntg-context - mailing list for ConTeXt users
From: "Li Yanrui (李延瑞)" <liyanrui.m2@gmail.com>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Cc: Hans Hagen <pragma@wxs.nl>
Subject: Re: cjk texts in the pdf file which MkIV output sometimes can not be copy rightly
Date: Wed, 17 Nov 2010 08:25:38 +0800
Message-ID: <AANLkTimHBn8rNFcJxC83d6sdjezSH3umhhd2437ZBoQR@mail.gmail.com>
In-Reply-To: <AANLkTim_+ZBnsVzRKCQ5pZUPw0adxqAkKfJdjhqgB-A3@mail.gmail.com>

2010/9/20 Li Yanrui (李延瑞) <liyanrui.m2@gmail.com>:
> 2010/9/10 Taco Hoekwater <taco@elvenkind.com>:
>> On 09/10/2010 02:32 PM, Li Yanrui (李延瑞) wrote:
>>>
>>> 2010/9/10 Li Yanrui (李延瑞)<liyanrui.m2@gmail.com>:
>>>>
>>>> Hi all,
>>>>
>>>> For the PDF file generated from the following example, two
>>>> Chinese characters cannot be copied correctly when I use the simsun.ttc
>>>> font and view the file in *Adobe Reader*. The copied text is displayed as
>>>> the wrong unicode text such as "􁨀".
>>
>> It looks like the reader is ignoring the ToUnicode entry in the
>> bad case. No idea why, though.
>>
>
> My friend sent a mail to Ken Lunde, who works for Adobe Systems
> Incorporated. He replied:
>
> [quote]
> I forwarded your email to the Acrobat team for investigation.
> When I copied the body text from the first file, and all of the text
> from the second file, the code points are PUA, specifically Unicode
> Plane 16. The heading text of the first file appears to be encoded
> correctly. I am guessing that this is a PDF producer issue.
> I will keep you posted about what the Acrobat team discovers.
> [/quote]
>
> In the above, the first file is the TeX source of the PDF file
> attached to my previous mail; the second one is that attached PDF
> file.
>

Recently Ken Lunde replied again:

> Gu Hua forwarded your email to the Acrobat team, which investigated this issue.
>
> The evidence points toward a malformed or poorly-made ToUnicode table.
> It would be appropriate for Adobe Reader to ignore such a ToUnicode
> table.
>
> It would be useful to know how the ToUnicode table is made by this PDF
> producer. I know that when making a static ToUnicode mapping resource,
> which uses CMap resource syntax, it is very easy to make an invalid
> one, specifically because CID ranges, when expressed as two-byte
> values, cannot cross first-byte boundaries. Also, UTF-32 (and UTF-8)
> cannot be used directly, and such values must be converted to UTF-16.
> One or both of these issues, if not handled correctly, can result in
> an invalid ToUnicode table.
>
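
To make the two constraints Ken mentions concrete, here is a minimal
Python sketch. It is not the code MkIV uses, and the function names
utf16be_hex and bfrange_lines are hypothetical. It assumes two-byte CIDs
that map to consecutive code points, converts each target to UTF-16BE
(so characters outside the BMP become surrogate pairs), and splits a
range wherever the first byte of the CID would change. It also splits
where the last byte of the target would overflow, because only that byte
is incremented inside a bfrange entry.

```python
def utf16be_hex(codepoint: int) -> str:
    """Return the UTF-16BE encoding of one code point as uppercase hex."""
    return chr(codepoint).encode("utf-16-be").hex().upper()


def bfrange_lines(first_cid: int, last_cid: int, first_unicode: int):
    """Format bfrange entries mapping consecutive CIDs to consecutive code points.

    Assumption: CIDs are written as two-byte values (<XXXX>).
    """
    lines = []
    cid = first_cid
    while cid <= last_cid:
        target = first_unicode + (cid - first_cid)
        # A range must not cross a first-byte boundary of the two-byte CID.
        end = min(last_cid, cid | 0xFF)
        # Only the last byte of the target string is incremented within a
        # range, so stop before that byte would pass 0xFF.
        end = min(end, cid + (0xFF - (target & 0xFF)))
        lines.append(f"<{cid:04X}> <{end:04X}> <{utf16be_hex(target)}>")
        cid = end + 1
    return lines


if __name__ == "__main__":
    # CIDs 0x01FE..0x0201 cross the 0x01/0x02 first-byte boundary.
    for line in bfrange_lines(0x01FE, 0x0201, 0x4E00):
        print(line)
    # A code point outside the BMP must be written as a surrogate pair.
    print(utf16be_hex(0x20000))
```

Run as a script, this prints two range entries, <01FE> <01FF> <4E00> and
<0200> <0201> <4E02>, followed by D840DC00 for U+20000. A producer that
wrote the single range <01FE> <0201>, or the UTF-32 value 00020000, would
produce exactly the kind of invalid table described above.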

In fact, text set in fonts such as simsun.ttc cannot be copied
correctly, while text set in fonts such as AdobeSongStd-Light.otf can.
So the problem may be related to the font loading script of MkIV.

-- 
Best regards,

Li Yanrui (李延瑞)
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________


Thread overview: 7+ messages
2010-09-10 12:30 Li Yanrui (李延瑞)
2010-09-10 12:32 ` Li Yanrui (李延瑞)
2010-09-10 12:44   ` Taco Hoekwater
2010-09-20 11:20     ` Li Yanrui (李延瑞)
2010-11-17  0:25       ` Li Yanrui (李延瑞) [this message]
2010-11-17  9:32         ` Taco Hoekwater
2010-11-17  9:49           ` Hans Hagen
