From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 18CC4BB84 for ; Mon, 8 May 2006 17:36:12 +0200 (CEST) Received: from efil.de (efil.de [81.3.25.37]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id k48FaBcR013013 for ; Mon, 8 May 2006 17:36:11 +0200 Received: by efil.de (Postfix, from userid 1000) id DEC98EFA574; Mon, 8 May 2006 17:34:00 +0200 (CEST) Date: Mon, 8 May 2006 17:35:55 +0200 From: Ingo Bormuth To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] ocamlagrep anybody ? Message-ID: <20060508153555.GA27052@kruemel> Reply-To: Ingo Bormuth References: <20060508075730.GA19558@kruemel> <445F354F.1060301@inria.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <445F354F.1060301@inria.fr> X-Attribution: Ingo Bormuth X-Url: http://ibormuth.efil.de/contact User-Agent: Mutt/1.5.11 X-Miltered: at concorde with ID 445F656B.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; ocamlagrep:01 afaik:01 agrep:01 val:01 agrep:01 binary:01 tokens:01 ingo:98 efil:98 thank's:98 2006:98 ingo:98 efil:98 wrote:01 wrote:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 Thank's for the prompt reply! Thanks for the language anyway. On 2006-05-08 14:10, Xavier Leroy wrote: > It's been a long time since I wrote this library, but AFAIK > Agrep stops at the first (approximate) match found. Okay, that seems reasonable. Just the manual is a bit misleading: val errors_substring_match ... Same as Agrep.substring_match, but return the smallest number of errors such that the substring matches the pattern. > If that's what you want, you can obtain that number by repeated calls > to errors_substring_match using binary search on the value of numerrs. That's what I did. Just performance on millions of strings sucks :) Probably I will tokenize the strings and just match tokens. - Ingo -- Ingo Bormuth, voicebox & telefax: +49-12125-10226517 '(~o-o~)' public key 86326EC9, http://ibormuth.efil.de/contact --ooO--(.)--Ooo--