From: Brooks Moses
Newsgroups: gmane.comp.tex.context
Subject: Ligature handling for PDF searching.
Date: Tue, 26 Jul 2005 20:52:13 -0700
Message-ID: <4.3.1.2.20050726204553.01d32a58@cits1.stanford.edu>
Reply-To: mailing list for ConTeXt users

(This came up on comp.text.tex in a question about LaTeX, but it also
applies to ConTeXt, and the proposed solution for LaTeX doesn't apply.)

Consider the following document:

   \starttext
   Some ligature tests: ff, fi, ffi, fl, ffl.
   \stoptext

If I process that with texexec -pdf, load it into Acrobat 5, and then
copy-and-paste the text from the PDF into a text editor, the fi and fl
ligatures are correctly treated as two letters, but the ff, ffi, and ffl
ligatures are treated as single (unknown) characters.

Similarly, searching for "f" within the document only finds the fi and
fl ligatures; it doesn't find the others. Searching for "ff" finds
nothing.

This is a fairly significant problem for the on-screen usability of
ConTeXt-created documents.

In LaTeX, there is apparently a solution in the cmap.sty package (though
it currently only works for T1 encoding):

   http://www.ctan.org/tex-archive/macros/latex/contrib/cmap/

Is there a similar solution for ConTeXt? (Has this perhaps been solved
in a later version of ConTeXt than the one I have on my computer?)

Thanks,
- Brooks