From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id E6881BC69 for ; Fri, 31 Aug 2007 10:25:04 +0200 (CEST) Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l7V8P4EP011120 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=OK) for ; Fri, 31 Aug 2007 10:25:04 +0200 X-IronPort-AV: E=Sophos;i="4.19,329,1183327200"; d="scan'208";a="15941369" Received: from eltanin.irisa.fr (HELO [131.254.14.123]) ([131.254.14.123]) by mail4-relais-sop.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 31 Aug 2007 10:25:04 +0200 Message-ID: <46D7D05F.6040907@irisa.fr> Date: Fri, 31 Aug 2007 10:25:03 +0200 From: Sebastien Ferre Organization: IRISA User-Agent: Thunderbird 1.5.0.12 (X11/20070530) MIME-Version: 1.0 To: Jon Harrop , caml-list@inria.fr Subject: Re: [Caml-list] Suffix trees References: <200708131017.47871.jon@ffconsultancy.com> In-Reply-To: <200708131017.47871.jon@ffconsultancy.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-Miltered: at concorde with ID 46D7D060.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; irisa:01 ocaml:01 irisa:01 substrings:01 substrings:01 cheers:01 wrote:01 caml-list:01 computation:01 computation:01 motivation:02 tree:02 tree:02 suffix:02 suffix:02 Hi, Jon Harrop wrote: > Anyone got any high-performance suffix tree implementations in OCaml? (I've > seen Jean-Christophe's) you can find the implementation I made for my own needs at http://www.irisa.fr/LIS/ferre/software.en.html . I hope it will fit your needs. I thrived to make it efficient. It is possible to make it more efficient by removing the code for the computation of maximal substrings that makes the computation of the suffix tree in O(n.ln(n)) instead of O(n). The motivation for the computation of these maximal substrings can be found in the following paper: http://www.springerlink.com/content/a2p06g831821054g/ Do not hesitate to ask me explanations about the interface as the comments are rather limited :-). Cheers, Sébastien Ferré