From: Vit Zyka
Newsgroups: gmane.comp.tex.context
Subject: Re: Ligature handling for PDF searching.
Date: Wed, 27 Jul 2005 10:04:17 +0200
Message-ID: <42E74001.8010501@seznam.cz>
References: <4.3.1.2.20050726204553.01d32a58@cits1.stanford.edu>
Reply-To: mailing list for ConTeXt users

Brooks Moses wrote:
> (This came up on comp.text.tex in a question about LaTeX, but it also
> applies to ConTeXt, and the proposed solution for LaTeX doesn't apply.)
>
> Consider the following document:
>
> \starttext
> Some ligature tests: ff, fi, ffi, fl, ffl.
> \stoptext
>
> If I process that with texexec -pdf, load it into Acrobat 5, and then
> copy-and-paste the text from the PDF into a text editor, the fi and fl
> ligatures are correctly treated as two letters, but the ff, ffi, and ffl
> ligatures are treated as single (unknown) characters. Similarly,
> searching for "f" within the document only finds the fi and fl
> ligatures; it doesn't find the others. Searching for "ff" finds nothing.
>
> This is a fairly significant problem in the on-screen usability of
> ConTeXt-created documents.
>
> In LaTeX, there is apparently a solution in the cmap.sty package (though
> it currently only works for T1 encoding):
> http://www.ctan.org/tex-archive/macros/latex/contrib/cmap/
>
> Is there a similar solution for ConTeXt? (Has this perhaps been solved
> with a later version of ConTeXt than I have on my computer?)

Yes, but AFAIK only for one or two encodings (CMAP files). From memory,
the keyword is \usepdffontresource. See the source file enco-pfr.tex for
more info.

vit
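
To illustrate the symptom described above (this is not how ConTeXt or the CMAP files work internally, only a rough sketch of the idea): a search for "ff" fails when the text layer holds a single ligature code instead of the component letters, and succeeds once each ligature is mapped back to its letters. The sketch assumes the ligatures show up as the Unicode presentation forms U+FB00 to U+FB04; the names LIGATURES and expand_ligatures are made up for this example.

```python
import unicodedata

# Unicode presentation forms for the five Latin f-ligatures (assumed here
# to be what ends up in the extracted text; a real PDF may use other codes).
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}


def expand_ligatures(text: str) -> str:
    """Replace each ligature code point with its component letters."""
    return "".join(LIGATURES.get(ch, ch) for ch in text)


# The test sentence as it might be extracted when no letter mapping exists.
extracted = "Some ligature tests: \ufb00, \ufb01, \ufb03, \ufb02, \ufb04."

print("ff" in extracted)                    # search on the raw text layer
print("ff" in expand_ligatures(extracted))  # search after mapping back

# Unicode compatibility normalization performs the same expansion.
print(unicodedata.normalize("NFKC", extracted) == expand_ligatures(extracted))
```

A CMAP resource does the equivalent job inside the PDF: it tells the viewer which letters each glyph code stands for, so copy-and-paste and search see "ff" rather than one unknown character.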