From: Jan Kybic
To: Radu Grigore
Cc: caml-list
Subject: Re: [Caml-list] ANN: cfind 0.0.0
Date: 15 Apr 2005 12:35:43 +0200

> Description: cfind is a UNIX tool that provides functionality similar
> to that of Google Desktop from the command line. It is written
> entirely in OCaml.
>
> Homepage: http://cfind.sourceforge.net/
>
> I'll appreciate any input from the OCaml community.

It definitely looks very useful. Proposed extensions and changes:

- Configurable choice of lexer. For example, there could be a table
  (read from a configuration file) of regular expressions matching
  path and file names, associating them with parsers.

- If I understand your code correctly, only command names are indexed
  in TeX files; is that correct? If so, I might prefer a different
  lexer, one that ignores comments and command names and indexes the
  words in the text.

- It should also be possible to apply other configurable filters to
  the files before indexing. An example would be to decompress all
  "*.gz" or "*.bz2" files before indexing.

- More complicated logical expressions defining a match, in the
  spirit of: "functional" AND "lazy" AND NOT "Haskell"

- It would be nice to be able to break files into smaller units and to
  find the units that match, not the whole file. A typical example
  would be email in mbox format, or perhaps functions in a program.

Good luck,

Jan

-- 
-------------------------------------------------------------------------
Jan Kybic                                     tel. +420 2 2435 5721
http://cmp.felk.cvut.cz/~kybic                ICQ 200569450
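
The boolean match expressions suggested above ("functional" AND "lazy" AND NOT "Haskell") could be sketched roughly as follows. This is only an illustration and not taken from cfind: the type `query`, the functions `matches` and `words_of_list`, and the assumption that each indexed file is represented by a set of lowercased words are all hypothetical. It assumes a reasonably recent OCaml standard library (for `String.lowercase_ascii` and `Set.of_list`).

```ocaml
(* Hypothetical sketch: none of these names come from cfind. *)
module Words = Set.Make (String)

(* A query is a single word or a boolean combination of sub-queries. *)
type query =
  | Word of string
  | And of query * query
  | Or of query * query
  | Not of query

(* [matches words q] is true when the word set of one file satisfies [q].
   Query words are lowercased so that matching is case-insensitive. *)
let rec matches words = function
  | Word w -> Words.mem (String.lowercase_ascii w) words
  | And (a, b) -> matches words a && matches words b
  | Or (a, b) -> matches words a || matches words b
  | Not a -> not (matches words a)

(* Build a lowercased word set from a list of tokens. *)
let words_of_list tokens =
  Words.of_list (List.map String.lowercase_ascii tokens)

(* "functional" AND "lazy" AND NOT "Haskell" *)
let example =
  And (Word "functional", And (Word "lazy", Not (Word "Haskell")))

let () =
  let ocaml_doc = words_of_list [ "functional"; "lazy"; "OCaml" ] in
  let haskell_doc = words_of_list [ "functional"; "lazy"; "Haskell" ] in
  assert (matches ocaml_doc example);
  assert (not (matches haskell_doc example))
```

Evaluating the query per file against a word set keeps the sketch simple; an index-based tool would more plausibly combine the posting lists of each word with set intersection, union, and difference, which is an equivalent formulation of the same operators.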