ntg-context - mailing list for ConTeXt users
* searchable cyrillic in PDF files
@ 2008-06-27 23:34 Oleg Kolosov
  2008-06-28 12:10 ` Hans Hagen
  0 siblings, 1 reply; 3+ messages in thread
From: Oleg Kolosov @ 2008-06-27 23:34 UTC (permalink / raw)
  To: mailing list for ConTeXt users



Hans Hagen wrote:
> Oleg Kolosov wrote:
>   
>> Hans Hagen wrote:
>>     
>>> Oleg Kolosov wrote:
>>>  
>>>       
>>>> Hello!
>>>>
>>>> I'm trying to generate a searchable PDF with Cyrillic glyphs using the
>>>> following:
>>>>
>>>> \enableregime[utf]
>>>> \mainlanguage[ru]
>>>> \setupencoding[default=t2a]
>>>> \useencoding[pfr]
>>>> \usepdffontresource t2a
>>>> \usetypescript[pscyr][\defaultencoding] % type-pscyr is my own typescript file
>>>> \setupbodyfont[pscyr,14pt]
>>>>
>>>> also tried with:
>>>>
>>>> \startencoding[t2a]
>>>> \usepdffontresource t2a
>>>> \stopencoding
>>>>
>>>> It seems that \usepdffontresource does nothing. I see pdfr-def loaded
>>>> in the log, but not pdfr-t2a. \input pdfr-t2a (or pdfr-ec) reports that
>>>> \startpdffontresource is an undefined command. I created pdfr-t2a.tex
>>>> by replacing the definitions in pdfr-ec with the ones from the LaTeX
>>>> cmap package (found in the file t2a.cmap). I'm using ConTeXt MkII since
>>>> MkIV is in active development. I also tried ec as the default encoding,
>>>> with the same result (pdfr-ec.tex is not loaded).
>>>>
>>>> Please help me create the preamble for a minimal file that will
>>>> generate a searchable PDF.
>>>>     
>>>>         
>>> pdftex does it itself (i.e. creates the vectors) using pdfr-def.tex
>>> (unless i did something wrong)
>>>
>>> Hans
>>>
>>>   
>>>       
>> It's unlikely. I've tested it with a minimal file: English text is
>> indeed searchable, but Cyrillic is not, and with copy-paste I get some
>> strange symbols. I'm using Type 1 fonts from the PSCyr package with my own
>> typescript; does this matter? I've attached the typescript file just in
>> case (it's still incomplete but works fine for me). Maybe I'm missing some
>> definition or option? BTW, Cyrillic in the PDF TOC works fine (with the
>> inclusion of spec-tst.tex).
>>     
>
> can you check if the file has the right entries for your font?
>
>     pdfr-def.tex
>
> the old mechanism is obsolete so pdfr-t2a will not do anything
>
> Hans
>
>   
The codes seem to be in place, but they don't match the actual font in T2A
encoding. For example, my font has Cyrillic capital A at 00C1, where pdfr-def
has Aacute. According to enco-utf, 00C1 in my font should map to position
0410 (I hope this description is understandable). Maybe there is some switch
to enable such a mapping? I don't understand these encoding/mapping issues
well enough to create the necessary table myself. Maybe you can provide an
example, so that I will be able to help?

P. S. Sorry for late response.

-- 
Best Regards,
Oleg Kolosov



___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________


* Re: searchable cyrillic in PDF files
  2008-06-27 23:34 searchable cyrillic in PDF files Oleg Kolosov
@ 2008-06-28 12:10 ` Hans Hagen
  2008-06-28 21:59   ` Oleg Kolosov
  0 siblings, 1 reply; 3+ messages in thread
From: Hans Hagen @ 2008-06-28 12:10 UTC (permalink / raw)
  To: mailing list for ConTeXt users

Oleg Kolosov wrote:

> The codes seem to be in place, but they don't match the actual font in T2A
> encoding. For example, my font has Cyrillic capital A at 00C1, where pdfr-def
> has Aacute. According to enco-utf, 00C1 in my font should map to position
> 0410 (I hope this description is understandable). Maybe there is some switch
> to enable such a mapping? I don't understand these encoding/mapping issues
> well enough to create the necessary table myself. Maybe you can provide an
> example, so that I will be able to help?

it is unrelated to the font encoding. so, in your font the glyph at 00C1 has
a name (say CyrillicWhatever) and pdftex needs to associate that name with
the unicode number

it looks like in pdfr-def cyrillic is just missing

i uploaded a beta with more mappings
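
A minimal sketch of what such a name-to-Unicode association looks like at the pdfTeX level. This is an illustration, not the actual contents of pdfr-def.tex: it uses the pdfTeX primitives \pdfgentounicode and \pdfglyphtounicode, and it assumes the font names its Cyrillic glyphs with the Adobe Glyph List afii names, which may not hold for every font.

```tex
% Sketch only: plain pdfTeX primitives, not a copy of pdfr-def.tex.
% Assumption: the Type 1 font uses afii-style glyph names for Cyrillic.
\pdfgentounicode=1                    % ask pdfTeX to write ToUnicode CMaps
\pdfglyphtounicode{afii10017}{0410}   % CYRILLIC CAPITAL LETTER A
\pdfglyphtounicode{afii10018}{0411}   % CYRILLIC CAPITAL LETTER BE
\pdfglyphtounicode{afii10065}{0430}   % CYRILLIC SMALL LETTER A
```

The lookup is keyed on the glyph name, not on the slot in the encoding vector, which is why the T2A slot number does not matter here. If a font uses different glyph names, the first argument of each line has to be changed to match.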

Hans

-----------------------------------------------------------------
                                           Hans Hagen | PRAGMA ADE
               Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
      tel: 038 477 53 69 | fax: 038 477 53 74 | www.pragma-ade.com
                                              | www.pragma-pod.nl
-----------------------------------------------------------------

* Re: searchable cyrillic in PDF files
  2008-06-28 12:10 ` Hans Hagen
@ 2008-06-28 21:59   ` Oleg Kolosov
  0 siblings, 0 replies; 3+ messages in thread
From: Oleg Kolosov @ 2008-06-28 21:59 UTC (permalink / raw)
  To: mailing list for ConTeXt users



Hans Hagen wrote:
> Oleg Kolosov wrote:
>
>   
>> The codes seem to be in place, but they don't match the actual font in T2A
>> encoding. For example, my font has Cyrillic capital A at 00C1, where pdfr-def
>> has Aacute. According to enco-utf, 00C1 in my font should map to position
>> 0410 (I hope this description is understandable). Maybe there is some switch
>> to enable such a mapping? I don't understand these encoding/mapping issues
>> well enough to create the necessary table myself. Maybe you can provide an
>> example, so that I will be able to help?
>>     
>
> it is unrelated to the font encoding. so, in your font the glyph at 00C1 has
> a name (say CyrillicWhatever) and pdftex needs to associate that name with
> the unicode number
>
> it looks like in pdfr-def cyrillic is just missing
>
> i uploaded a beta with more mappings
>
> Hans
>   
I've tested with the new beta as well as the latest stable version
(2008.05.21) and found that the font map file was not being loaded properly,
so this was an error on my side. Sorry. Now copy/paste works in both versions.
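
For reference, a sketch of a minimal MkII test file along these lines, built from the commands quoted earlier in the thread with the obsolete \usepdffontresource lines left out. The map file name pscyr.map is a hypothetical placeholder, and the pscyr typescript is my own file, so both have to be adapted to the local setup.

```tex
% Sketch only. Assumptions: the map file is called pscyr.map (hypothetical
% name) and a typescript named pscyr is available on the TeX search path.
\loadmapfile[pscyr.map]                 % load the font map file explicitly
\enableregime[utf]
\mainlanguage[ru]
\setupencoding[default=t2a]
\usetypescript[pscyr][\defaultencoding]
\setupbodyfont[pscyr,14pt]

\starttext
Привет, мир!
\stoptext
```

Copying the text out of the resulting PDF, or searching for it in a viewer, is a quick way to check whether the Unicode mapping ended up in the file.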

-- 
Best Regards,
Oleg Kolosov


