From: "Li Yanrui (李延瑞)" <liyanrui.m2@gmail.com>
To: mailing list for ConTeXt users <ntg-context@ntg.nl>
Cc: Hans Hagen <pragma@wxs.nl>
Subject: Re: cjk texts in the pdf file which MkIV output sometimes can not be copy rightly
Date: Wed, 17 Nov 2010 08:25:38 +0800
Message-ID: <AANLkTimHBn8rNFcJxC83d6sdjezSH3umhhd2437ZBoQR@mail.gmail.com>
In-Reply-To: <AANLkTim_+ZBnsVzRKCQ5pZUPw0adxqAkKfJdjhqgB-A3@mail.gmail.com>
2010/9/20 Li Yanrui (李延瑞) <liyanrui.m2@gmail.com>:
> 2010/9/10 Taco Hoekwater <taco@elvenkind.com>:
>> On 09/10/2010 02:32 PM, Li Yanrui (李延瑞) wrote:
>>>
>>> 2010/9/10 Li Yanrui (李延瑞)<liyanrui.m2@gmail.com>:
>>>>
>>>> Hi all,
>>>>
>>>> For the PDF file generated from the following example, two
>>>> Chinese characters cannot be copied correctly when I use the
>>>> simsun.ttc font and view the file in *Adobe Reader*. The copied text
>>>> is displayed as the wrong Unicode text, such as "".
>>
>> It looks like the reader is ignoring the ToUnicode entry in the
>> bad case. No idea why, though.
>>
>
> My friend sent a mail to Ken Lunde, who works for Adobe Systems
> Incorporated. He replied:
>
> [quote]
> I forwarded your email to the Acrobat team for investigation.
> When I copied the body text from the first file, and all of the text
> from the second file, the code points are PUA, specifically Unicode
> Plane 16. The heading text of the first file appears to be encoded
> correctly. I am guessing that this is a PDF producer issue.
> I will keep you posted about what the Acrobat team discovers.
> [/quote]
>
> In the above, the first file is the TeX source of the PDF file that was
> attached to my previous mail; the second one is that attached PDF
> file itself.
>
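Ken Lunde's finding is that the copied code points land in the Private Use Area (PUA), specifically Unicode Plane 16. A minimal Python sketch for checking text pasted out of Adobe Reader follows; the helper names `pua_plane` and `report` are hypothetical and belong to neither ConTeXt nor Acrobat. It relies only on the PUA ranges defined by the Unicode Standard (U+E000..U+F8FF, U+F0000..U+FFFFD, U+100000..U+10FFFD).

```python
from typing import List, Optional, Tuple


def pua_plane(cp: int) -> Optional[str]:
    """Return the Private Use Area a code point falls in, or None."""
    if 0xE000 <= cp <= 0xF8FF:
        return "BMP PUA"
    if 0xF0000 <= cp <= 0xFFFFD:
        return "Plane 15 (Supplementary PUA-A)"
    if 0x100000 <= cp <= 0x10FFFD:
        return "Plane 16 (Supplementary PUA-B)"
    return None


def report(text: str) -> List[Tuple[str, str]]:
    """List (U+XXXX, area) for every PUA character in text, e.g. text pasted from a PDF."""
    found = []
    for ch in text:
        area = pua_plane(ord(ch))
        if area is not None:
            found.append((f"U+{ord(ch):04X}", area))
    return found


if __name__ == "__main__":
    print(report("中\U00100041文"))
```

For example, `report("中\U00100041文")` flags only the middle character, as Plane 16 PUA, which matches the symptom described in the reply.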
Recently Ken Lunde replied again:
> Gu Hua forwarded your email to the Acrobat team, which investigated this issue.
>
> The evidence points toward a malformed or poorly-made ToUnicode table.
> It would be appropriate for Adobe Reader to ignore such a ToUnicode
> table.
>
> It would be useful to know how the ToUnicode table is made by this PDF
> producer. I know that when making a static ToUnicode mapping resource,
> which uses CMap resource syntax, it is very easy to make an invalid
> one, specifically because CID ranges, when expressed as two-byte
> values, cannot cross first-byte boundaries. Also, UTF-32 (and UTF-8)
> cannot be used directly, and such values must be converted to UTF-16.
> One or both of these issues, if not handled correctly, can result in
> an invalid ToUnicode table.
>
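Ken Lunde's two rules can be illustrated with a small sketch of the `bfrange` part of a ToUnicode CMap. This is an illustration under stated assumptions, not how MkIV actually writes its ToUnicode tables: it takes a hypothetical `{cid: code point}` mapping, starts a new range whenever the next two-byte CID would cross a first-byte boundary, and writes every destination as UTF-16BE hex, so that Plane 16 code points become surrogate pairs instead of raw UTF-32 values. It also refuses to extend a range when the destination's last byte would carry, and it splits the output into blocks of at most 100 entries, the limit CMap syntax places on a single begin/end block. The names `utf16be_hex`, `build_ranges` and `bfrange_section` are hypothetical.

```python
from typing import Dict, List, Tuple

# A range is (first CID, last CID, code point of the first CID).
Range = Tuple[int, int, int]


def utf16be_hex(cp: int) -> str:
    """Encode one code point as UTF-16BE hex; above U+FFFF this is a surrogate pair."""
    return chr(cp).encode("utf-16-be").hex().upper()


def build_ranges(mapping: Dict[int, int]) -> List[Range]:
    """Group {two-byte CID: code point} pairs into bfrange-safe runs.

    A run is extended only if the next CID is consecutive and keeps the same
    first byte, and the next code point is consecutive and its UTF-16 form
    differs from the run's first destination only in the last byte.
    """
    ranges: List[Range] = []
    for cid in sorted(mapping):
        cp = mapping[cid]
        if ranges:
            lo, hi, start_cp = ranges[-1]
            extends = (
                cid == hi + 1
                and (cid >> 8) == (lo >> 8)          # no first-byte crossing
                and cp == start_cp + (cid - lo)      # consecutive destinations
                and utf16be_hex(cp)[:-2] == utf16be_hex(start_cp)[:-2]  # no carry
            )
            if extends:
                ranges[-1] = (lo, cid, start_cp)
                continue
        ranges.append((cid, cid, cp))
    return ranges


def bfrange_section(mapping: Dict[int, int]) -> str:
    """Emit only the bfrange blocks of a ToUnicode CMap (header and codespacerange omitted)."""
    ranges = build_ranges(mapping)
    lines: List[str] = []
    for i in range(0, len(ranges), 100):  # at most 100 entries per block
        chunk = ranges[i:i + 100]
        lines.append(f"{len(chunk)} beginbfrange")
        for lo, hi, cp in chunk:
            lines.append(f"<{lo:04X}> <{hi:04X}> <{utf16be_hex(cp)}>")
        lines.append("endbfrange")
    return "\n".join(lines)


if __name__ == "__main__":
    print(bfrange_section({0x00FE: 0x4E2D, 0x00FF: 0x4E2E, 0x0100: 0x4E2F}))
```

With the mapping `{0x00FE: 0x4E2D, 0x00FF: 0x4E2E, 0x0100: 0x4E2F}`, CIDs 0x00FE and 0x00FF share one range, while 0x0100 starts a new one because its first byte differs, even though both the CIDs and the code points are consecutive.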
In fact, text set in fonts such as simsun.ttc cannot be copied
correctly, but text set in fonts such as AdobeSongStd-Light.otf can be.
It seems that this problem may be related to the font loading script
of MkIV.
--
Best regards,
Li Yanrui (李延瑞)
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!
maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage : http://www.pragma-ade.nl / http://tex.aanhet.net
archive : http://foundry.supelec.fr/projects/contextrev/
wiki : http://contextgarden.net
___________________________________________________________________________________
Thread overview: 7+ messages
2010-09-10 12:30 Li Yanrui (李延瑞)
2010-09-10 12:32 Li Yanrui (李延瑞)
2010-09-10 12:44 Taco Hoekwater
2010-09-20 11:20 Li Yanrui (李延瑞)
2010-11-17 0:25 Li Yanrui (李延瑞) [this message]
2010-11-17 9:32 Taco Hoekwater
2010-11-17 9:49 Hans Hagen