From: Patrick Gundlach
Newsgroups: gmane.comp.tex.context
To: ntg-context@ntg.nl
Subject: Re: Post-processing ConTeXt's output for text search
Date: Wed, 04 Jul 2007 12:05:06 +0200
In-Reply-To: <20070704082830.GA3281@axent> (Piotr Kopszak's message of "Wed, 4 Jul 2007 10:28:31 +0200")

Hi,

you could split your PDF into separate pages, use a full-text search engine such as swish-e to index each of the pages, and store the results together with the page number (taken from the split pages). IMO this would be the easiest approach.

Patrick
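
To make the idea concrete, here is a minimal sketch in Python. It is not swish-e and does not use its API; a small in-memory inverted index stands in for the search engine. It assumes the text has already been extracted from the PDF by a tool that ends each page with a form feed character, as pdftotext does by default. The function names (split_pages, build_index, search) are made up for illustration.

```python
import re
from collections import defaultdict


def split_pages(text):
    """Split extracted text into pages on form feed characters.

    Assumption: the extractor ends every page with "\f", so the
    final split element is an empty tail that can be dropped.
    """
    pages = text.split("\f")
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def build_index(pages):
    """Map each lowercased word to the set of page numbers it occurs on."""
    index = defaultdict(set)
    # Pages are numbered from 1, matching the numbering of the split PDF.
    for page_number, page_text in enumerate(pages, start=1):
        for word in re.findall(r"\w+", page_text.lower()):
            index[word].add(page_number)
    return index


def search(index, word):
    """Return the sorted page numbers on which the word occurs."""
    return sorted(index.get(word.lower(), ()))


# Hypothetical three-page document, one form feed after each page.
sample = "TeX is a typesetting system.\fConTeXt builds on TeX.\fSearch this page.\f"
pages = split_pages(sample)
index = build_index(pages)
print(search(index, "TeX"))      # pages containing the word "tex"
print(search(index, "ConTeXt"))  # pages containing the word "context"
```

Empty pages in the middle of the document are kept on purpose, so that the page numbers in the index still match those of the split PDF. A real search engine would add stemming, phrase search and ranking on top of this.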