ntg-context - mailing list for ConTeXt users
* cjk texts in the pdf file which MkIV output sometimes can not be copy rightly
@ 2010-09-10 12:30 Li Yanrui (李延瑞)
  2010-09-10 12:32 ` Li Yanrui (李延瑞)
  0 siblings, 1 reply; 7+ messages in thread
From: Li Yanrui (李延瑞) @ 2010-09-10 12:30 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Hi all,

For the PDF file generated from the following example, two Chinese
characters cannot be copied correctly when I use the simsun.ttc font
and view the file in *Adobe Reader*. The copied text comes out as the
wrong Unicode character, such as "􁨀". However, these glyphs can be
copied correctly if I use the AdobeSongStd.otf font, or if I use
another PDF reader such as Evince.

\definefont[song][file:simsun]
\starttext
\song 描 $\overline{q}$ 提
\stoptext

The attached PDF file, made with the simsun.ttc font, can also be
downloaded from
http://code.google.com/p/way2ctx/downloads/list

This does not happen in every case. For example, if I change one of
the Chinese characters, the text can be copied normally, even with the
simsun.ttc font.

-- 
Best regards,

Li Yanrui (李延瑞)

[-- Attachment #2: a.pdf --]
[-- Type: application/pdf, Size: 11549 bytes --]

___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________

* Re: cjk texts in the pdf file which MkIV output sometimes can not be copy rightly
  2010-09-10 12:30 cjk texts in the pdf file which MkIV output sometimes can not be copy rightly Li Yanrui (李延瑞)
@ 2010-09-10 12:32 ` Li Yanrui (李延瑞)
  2010-09-10 12:44   ` Taco Hoekwater
  0 siblings, 1 reply; 7+ messages in thread
From: Li Yanrui (李延瑞) @ 2010-09-10 12:32 UTC (permalink / raw)
  To: mailing list for ConTeXt users

2010/9/10 Li Yanrui (李延瑞) <liyanrui.m2@gmail.com>:
> Hi all,
>
> For the PDF file generated from the following example, two Chinese
> characters cannot be copied correctly when I use the simsun.ttc font
> and view the file in *Adobe Reader*. The copied text comes out as the
> wrong Unicode character, such as "􁨀". However, these glyphs can be
> copied correctly if I use the AdobeSongStd.otf font, or if I use
> another PDF reader such as Evince.
>
> \definefont[song][file:simsun]
> \starttext
> \song 描 $\overline{q}$ 提
> \stoptext
>
> The attached PDF file, made with the simsun.ttc font, can also be
> downloaded from
> http://code.google.com/p/way2ctx/downloads/list
>
> This does not happen in every case. For example, if I change one of
> the Chinese characters, the text can be copied normally, even with the
> simsun.ttc font.
>
> --
> Best regards,
>
> Li Yanrui (李延瑞)
>

My test environment is the beta 2010.09.09 23:45 on Linux x86.

-- 
Best regards,

Li Yanrui (李延瑞)

* Re: cjk texts in the pdf file which MkIV output sometimes can not be copy rightly
  2010-09-10 12:32 ` Li Yanrui (李延瑞)
@ 2010-09-10 12:44   ` Taco Hoekwater
  2010-09-20 11:20     ` Li Yanrui (李延瑞)
  0 siblings, 1 reply; 7+ messages in thread
From: Taco Hoekwater @ 2010-09-10 12:44 UTC (permalink / raw)
  To: mailing list for ConTeXt users

On 09/10/2010 02:32 PM, Li Yanrui (李延瑞) wrote:
> 2010/9/10 Li Yanrui (李延瑞)<liyanrui.m2@gmail.com>:
>> Hi all,
>>
>> For the PDF file generated from the following example, two Chinese
>> characters cannot be copied correctly when I use the simsun.ttc font
>> and view the file in *Adobe Reader*. The copied text comes out as the
>> wrong Unicode character, such as "􁨀".

It looks like the reader is ignoring the ToUnicode entry in the
bad case. No idea why, though.

Best wishes,
Taco

* Re: cjk texts in the pdf file which MkIV output sometimes can not be copy rightly
  2010-09-10 12:44   ` Taco Hoekwater
@ 2010-09-20 11:20     ` Li Yanrui (李延瑞)
  2010-11-17  0:25       ` Li Yanrui (李延瑞)
  0 siblings, 1 reply; 7+ messages in thread
From: Li Yanrui (李延瑞) @ 2010-09-20 11:20 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Taco Hoekwater

2010/9/10 Taco Hoekwater <taco@elvenkind.com>:
> On 09/10/2010 02:32 PM, Li Yanrui (李延瑞) wrote:
>>
>> 2010/9/10 Li Yanrui (李延瑞)<liyanrui.m2@gmail.com>:
>>>
>>> Hi all,
>>>
>>> For the PDF file generated from the following example, two Chinese
>>> characters cannot be copied correctly when I use the simsun.ttc font
>>> and view the file in *Adobe Reader*. The copied text comes out as the
>>> wrong Unicode character, such as "􁨀".
>
> It looks like the reader is ignoring the ToUnicode entry in the
> bad case. No idea why, though.
>

My friend sent an email to Ken Lunde, who works for Adobe Systems
Incorporated. He replied:

[quote]
I forwarded your email to the Acrobat team for investigation.
When I copied the body text from the first file, and all of the text
from the second file, the code points are PUA, specifically Unicode
Plane 16. The heading text of the first file appears to be encoded
correctly. I am guessing that this is a PDF producer issue.
I will keep you posted about what the Acrobat team discovers.
[/quote]

In the above, the first file is the TeX file for the PDF attached to
my previous mail; the second one is that attached PDF.


-- 
Best regards,

Li Yanrui (李延瑞)

* Re: cjk texts in the pdf file which MkIV output sometimes can not be copy rightly
  2010-09-20 11:20     ` Li Yanrui (李延瑞)
@ 2010-11-17  0:25       ` Li Yanrui (李延瑞)
  2010-11-17  9:32         ` Taco Hoekwater
  0 siblings, 1 reply; 7+ messages in thread
From: Li Yanrui (李延瑞) @ 2010-11-17  0:25 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Hans Hagen

2010/9/20 Li Yanrui (李延瑞) <liyanrui.m2@gmail.com>:
> 2010/9/10 Taco Hoekwater <taco@elvenkind.com>:
>> On 09/10/2010 02:32 PM, Li Yanrui (李延瑞) wrote:
>>>
>>> 2010/9/10 Li Yanrui (李延瑞)<liyanrui.m2@gmail.com>:
>>>>
>>>> Hi all,
>>>>
>>>> For the PDF file generated from the following example, two Chinese
>>>> characters cannot be copied correctly when I use the simsun.ttc font
>>>> and view the file in *Adobe Reader*. The copied text comes out as the
>>>> wrong Unicode character, such as "􁨀".
>>
>> It looks like the reader is ignoring the ToUnicode entry in the
>> bad case. No idea why, though.
>>
>
> My friend sent an email to Ken Lunde, who works for Adobe Systems
> Incorporated. He replied:
>
> [quote]
> I forwarded your email to the Acrobat team for investigation.
> When I copied the body text from the first file, and all of the text
> from the second file, the code points are PUA, specifically Unicode
> Plane 16. The heading text of the first file appears to be encoded
> correctly. I am guessing that this is a PDF producer issue.
> I will keep you posted about what the Acrobat team discovers.
> [/quote]
>
> In the above, the first file is the TeX file for the PDF attached to
> my previous mail; the second one is that attached PDF.
>

Recently Ken Lunde replied again:

> Gu Hua forwarded your email to the Acrobat team, which investigated this issue.
>
> The evidence points toward a malformed or poorly-made ToUnicode table.
> It would be appropriate for Adobe Reader to ignore such a ToUnicode
> table.
>
> It would be useful to know how the ToUnicode table is made by this PDF
> producer. I know that when making a static ToUnicode mapping resource,
> which uses CMap resource syntax, it is very easy to make an invalid
> one, specifically because CID ranges, when expressed as two-byte
> values, cannot cross first-byte boundaries. Also, UTF-32 (and UTF-8)
> cannot be used directly, and such values must be converted to UTF-16.
> One or both of these issues, if not handled correctly, can result in
> an invalid ToUnicode table.
>
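Ken's two conditions can be written out concretely. This is a sketch based on the general CMap and UTF-16 rules, not on an analysis of the PDF in this thread; the code values below (including U+100000, the first Plane 16 code point) are illustrative examples only. A two-byte bfrange may only vary in its last byte, and any code point above U+FFFF must be written as a UTF-16 surrogate pair:

$$
\begin{aligned}
&\text{bfrange } \langle c_1 \rangle\ \langle c_2 \rangle \text{ is valid only if } \lfloor c_1/256 \rfloor = \lfloor c_2/256 \rfloor,\\
&\quad\text{so } \langle\mathtt{00FE}\rangle\,\langle\mathtt{0101}\rangle \text{ must be split into } \langle\mathtt{00FE}\rangle\,\langle\mathtt{00FF}\rangle \text{ and } \langle\mathtt{0100}\rangle\,\langle\mathtt{0101}\rangle;\\[4pt]
&U \ge \mathtt{0x10000}:\quad U' = U - \mathtt{0x10000},\quad S_H = \mathtt{0xD800} + \lfloor U'/2^{10} \rfloor,\quad S_L = \mathtt{0xDC00} + (U' \bmod 2^{10}),\\
&\quad\text{so } U = \mathtt{0x100000} \;\Rightarrow\; U' = \mathtt{0xF0000},\ S_H = \mathtt{0xDBC0},\ S_L = \mathtt{0xDC00}\ \ (\text{written } \langle\mathtt{DBC0DC00}\rangle).
\end{aligned}
$$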

In fact, text set in fonts such as simsun.ttc cannot be copied
correctly, while text set in fonts such as AdobeSongStd-Light.otf can.
It seems that this problem may be related to the font loading script
of MkIV.

-- 
Best regards,

Li Yanrui (李延瑞)

* Re: cjk texts in the pdf file which MkIV output sometimes can not be copy rightly
  2010-11-17  0:25       ` Li Yanrui (李延瑞)
@ 2010-11-17  9:32         ` Taco Hoekwater
  2010-11-17  9:49           ` Hans Hagen
  0 siblings, 1 reply; 7+ messages in thread
From: Taco Hoekwater @ 2010-11-17  9:32 UTC (permalink / raw)
  To: mailing list for ConTeXt users; +Cc: Hans Hagen

On 11/17/2010 01:25 AM, Li Yanrui (李延瑞) wrote:
>>
>
> In fact, text set in fonts such as simsun.ttc cannot be copied
> correctly, while text set in fonts such as AdobeSongStd-Light.otf can.
> It seems that this problem may be related to the font loading script
> of MkIV.

That is useful to know. So we should create a PDF file without
compression (\nopdfcompression) that contains a small bit of text
in both AdobeSongStd and simsun. Can you create an example for
that? Then I can compare the two generated CMaps.

Best wishes,
Taco
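A minimal test file of the kind Taco asks for might look like the following sketch. It reuses the original example; the font names simsun and AdobeSongStd-Light come from the thread, and it assumes both fonts are installed where MkIV can find them.

```tex
% Sketch: leave the PDF uncompressed so the ToUnicode CMaps are readable
\nopdfcompression

% Assumption: both fonts are installed and visible to MkIV
\definefont[song][file:simsun]
\definefont[adobesong][file:AdobeSongStd-Light]

\starttext

% Same text in both fonts; per the thread, only simsun copies wrongly
\song 描 $\overline{q}$ 提

\adobesong 描 $\overline{q}$ 提

\stoptext
```

With compression off, each font's ToUnicode CMap is plain text in the PDF (search for `beginbfchar` or `beginbfrange`), so the two mappings can be compared side by side in a text editor.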



* Re: cjk texts in the pdf file which MkIV output sometimes can not be copy rightly
  2010-11-17  9:32         ` Taco Hoekwater
@ 2010-11-17  9:49           ` Hans Hagen
  0 siblings, 0 replies; 7+ messages in thread
From: Hans Hagen @ 2010-11-17  9:49 UTC (permalink / raw)
  To: Taco Hoekwater; +Cc: mailing list for ConTeXt users

On 17-11-2010 10:32, Taco Hoekwater wrote:
> On 11/17/2010 01:25 AM, Li Yanrui (李延瑞) wrote:
>>>
>>
>> In fact, text set in fonts such as simsun.ttc cannot be copied
>> correctly, while text set in fonts such as AdobeSongStd-Light.otf can.
>> It seems that this problem may be related to the font loading script
>> of MkIV.
>
> That is useful to know. So we should create a PDF file without
> compression (\nopdfcompression) that contains a small bit of text
> in both AdobeSongStd and simsun. Can you create an example for
> that? Then I can compare the two generated CMaps.

Ha, I was keying in the same thing... indeed, we need an example.

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
     tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

end of thread

Thread overview: 7+ messages
2010-09-10 12:30 cjk texts in the pdf file which MkIV output sometimes can not be copy rightly Li Yanrui (李延瑞)
2010-09-10 12:32 ` Li Yanrui (李延瑞)
2010-09-10 12:44   ` Taco Hoekwater
2010-09-20 11:20     ` Li Yanrui (李延瑞)
2010-11-17  0:25       ` Li Yanrui (李延瑞)
2010-11-17  9:32         ` Taco Hoekwater
2010-11-17  9:49           ` Hans Hagen
