Date: Fri, 25 Jun 2004 19:05:13 +0200 (DST)
From: Diego Olivier Fernandez Pons
To: Shawn Wagner
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Substring search on an array of strings
In-Reply-To: <20040625165512.GG595@speakeasy.org>

Hello,

> If you need to find every substring in every element of the array
> then, yes, you'll have to look at every element of the array to see
> if that substring appears or not in it.

No, or at least only in the worst case.

Just as the Knuth-Morris-Pratt automaton lets you avoid testing every position in a string when that is unnecessary, a procedure designed for searching an array of strings could spare you from searching every string.

Here are two simple examples to build some intuition:

- Suppose many of the strings contain only a few different letters. Then you could first build a map from subsets of the alphabet to string indexes, and search only the strings whose letter set includes every letter of the pattern.

- Suppose the strings follow a particular pattern (say "processortype.memory.operatingsystem"). Then you could build indexes on each processor type, memory, and operating system, and search only the corresponding strings.

        Diego Olivier
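
To illustrate the first example (a map from subsets of the alphabet to string indexes), here is a minimal sketch in OCaml. It is not code from this thread: the names letter_mask, contains, build_index and search are hypothetical, and it assumes the alphabet is the lowercase ASCII letters, so that a letter set fits in an int bitmask. Other characters are simply ignored by the filter. The substring test itself is a naive scan, used only to keep the sketch short.

```ocaml
(* Bitmask of the lowercase ASCII letters occurring in [s].
   Assumption of this sketch: all other characters are ignored. *)
let letter_mask (s : string) : int =
  let mask = ref 0 in
  String.iter
    (fun c ->
      if c >= 'a' && c <= 'z' then
        mask := !mask lor (1 lsl (Char.code c - Char.code 'a')))
    s;
  !mask

(* Naive substring test: does [pat] occur somewhere in [s]? *)
let contains (s : string) (pat : string) : bool =
  let n = String.length s and m = String.length pat in
  let rec at i = i + m <= n && (String.sub s i m = pat || at (i + 1)) in
  at 0

(* Index: letter set -> indexes of the strings having exactly that set. *)
let build_index (strings : string array) : (int, int list) Hashtbl.t =
  let tbl = Hashtbl.create 16 in
  Array.iteri
    (fun i s ->
      let k = letter_mask s in
      let old = try Hashtbl.find tbl k with Not_found -> [] in
      Hashtbl.replace tbl k (i :: old))
    strings;
  tbl

(* Indexes (in increasing order) of the strings containing [pat].
   Only the groups whose letter set includes every letter of [pat]
   are scanned; all other groups are skipped without being read. *)
let search (strings : string array) (index : (int, int list) Hashtbl.t)
    (pat : string) : int list =
  let need = letter_mask pat in
  Hashtbl.fold
    (fun mask idxs acc ->
      if mask land need = need then
        List.filter (fun i -> contains strings.(i) pat) idxs @ acc
      else acc)
    index []
  |> List.sort compare
```

The filter costs one bitwise test per distinct letter set rather than one scan per string, so it only pays off under the assumption stated in the message: many strings sharing a small number of letter sets. In the worst case every group passes the filter and every string is still scanned.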