From: Oleg Kolosov
Newsgroups: gmane.comp.tex.context
Subject: searchable cyrillic in PDF files
Date: Sat, 28 Jun 2008 03:34:10 +0400
Message-ID: <486578F2.2050809@mail.ru>
To: mailing list for ConTeXt users

Hans Hagen wrote:
> Oleg Kolosov wrote:
>> Hans Hagen wrote:
>>> Oleg Kolosov wrote:
>>>> Hello!
>>>>
>>>> I'm trying to generate a searchable PDF with Cyrillic glyphs using the
>>>> following:
>>>>
>>>> \enableregime[utf]
>>>> \mainlanguage[ru]
>>>> \setupencoding[default=t2a]
>>>> \useencoding[pfr]
>>>> \usepdffontresource t2a
>>>> \usetypescript[pscyr][\defaultencoding] % type-pscyr is my own typescript file
>>>> \setupbodyfont[pscyr,14pt]
>>>>
>>>> I also tried:
>>>>
>>>> \startencoding[t2a]
>>>> \usepdffontresource t2a
>>>> \stopencoding
>>>>
>>>> It seems that \usepdffontresource does nothing. I see pdfr-def loaded
>>>> in the log, but not pdfr-t2a. \input pdfr-t2a (or pdfr-ec) reports that
>>>> \startpdffontresource is an undefined command. I created pdfr-t2a.tex
>>>> by replacing the definitions in pdfr-ec with those from the cmap LaTeX
>>>> package (found in the file t2a.cmap). I'm using ConTeXt mkII since mkIV
>>>> is in active development. I also tried ec as the default encoding, with
>>>> the same result (pdfr-ec.tex is not loaded).
>>>>
>>>> Please help me create the header for a minimal file that will generate
>>>> a searchable PDF.
>>>
>>> pdfTeX does it itself (i.e. it creates the vectors) using pdfr-def.tex
>>> (unless I did something wrong).
>>>
>>> Hans
>>
>> That seems unlikely. I've tested it with a minimal file: English text is
>> indeed searchable, but Cyrillic is not; copy-paste gives me some strange
>> symbols. I'm using Type 1 fonts from the PSCyr package with my own
>> typescript; does this matter? I've attached the typescript file just in
>> case (it's still incomplete but works fine for me). Maybe I'm missing
>> some definition or option? BTW, Cyrillic in the PDF TOC works fine (with
>> spec-tst.tex included).
>
> Can you check whether this file has the right entries for your font?
>
>     pdfr-def.tex
>
> The old mechanism is obsolete, so pdfr-t2a will not do anything.
>
> Hans

The codes seem to be in place, but they don't match the actual font in T2A encoding. For example, my font has Cyrillic capital A in slot 00C1, whereas pdfr-def treats that slot as Aacute. According to enco-utf, slot 00C1 in my font should map to U+0410 (CYRILLIC CAPITAL LETTER A); I hope that description is understandable. Is there perhaps a switch to enable such a mapping? I don't understand these encoding/mapping issues well enough to create the necessary table myself. Could you provide an example, so that I can help?
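The table asked for here is a per-font map from glyph slot to Unicode code point. In a PDF, such a map is carried as a ToUnicode CMap, and the t2a.cmap file from the LaTeX cmap package mentioned above is a file of this kind. The Python sketch below shows the shape of such a table. It is an illustration under stated assumptions, not part of ConTeXt or pdfr-def.tex: `build_tounicode_cmap` is a hypothetical helper, the CMap name and the `/Registry (TeX) /Ordering (T2A)` values are illustrative choices, it handles only one-byte codes and BMP code points, and it is seeded with just the one pair from the message (slot 0xC1 to U+0410). A real table would need an entry for every slot the font actually uses, and the sketch does not show how ConTeXt mkII would attach the CMap to a font.

```python
"""Sketch: emit a minimal PDF ToUnicode CMap from a font-slot -> Unicode table."""

from typing import Dict


def build_tounicode_cmap(mapping: Dict[int, int], name: str = "TeX-T2A-0") -> str:
    """Return ToUnicode CMap text mapping one-byte font codes to Unicode.

    `mapping` maps a font slot (0x00-0xFF) to a BMP code point.
    `build_tounicode_cmap` and the default `name` are hypothetical,
    chosen only for this example.
    """
    for slot, cp in mapping.items():
        if not 0 <= slot <= 0xFF:
            raise ValueError(f"slot {slot:#x} is not a one-byte code")
        if not 0 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"code point {cp:#x} is outside this sketch's BMP-only range")

    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        # Illustrative CIDSystemInfo values (an assumption, not from the thread).
        "/CIDSystemInfo << /Registry (TeX) /Ordering (T2A) /Supplement 0 >> def",
        f"/CMapName /{name} def",
        "/CMapType 2 def",
        # Every one-byte code is a valid input code for a simple Type 1 font.
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # The CMap format allows at most 100 entries per bfchar block.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for slot, cp in chunk:
            # <font code> <UTF-16BE code unit>, both written in hex.
            lines.append(f"<{slot:02X}> <{cp:04X}>")
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


# Only the pair quoted in the message: font slot 0xC1 should extract as U+0410.
print(build_tounicode_cmap({0xC1: 0x0410}))
```

Keeping the slot table as plain data makes it easy to compare against what pdfr-def.tex currently assumes for each slot, which is where the mismatch described above shows up.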

P.S. Sorry for the late response.

-- 
Best Regards,
Oleg Kolosov
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : https://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________