From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id RAA17997; Mon, 15 Jul 2002 17:52:46 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id RAA17996 for ; Mon, 15 Jul 2002 17:52:45 +0200 (MET DST) Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id g6FFqij01629 for ; Mon, 15 Jul 2002 17:52:44 +0200 (MET DST) Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id RAA17988 for caml-list@inria.fr; Mon, 15 Jul 2002 17:52:44 +0200 (MET DST) Date: Mon, 15 Jul 2002 17:52:44 +0200 From: Xavier Leroy To: caml-list@inria.fr Subject: [Caml-list] Re: Regular expression library: a poll on features Message-ID: <20020715175244.B15691@pauillac.inria.fr> References: <20020705161302.C20462@pauillac.inria.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Mailer: Mutt 1.0i In-Reply-To: <20020705161302.C20462@pauillac.inria.fr>; from xleroy@pauillac.inria.fr on Fri, Jul 05, 2002 at 04:13:02PM +0200 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk As promised, here is a summary of the replies I got concerning the "feature poll" for OCaml's regexp library. Feature 1: back-references in regexps (e.g. "([a-z]+)\1", meaning any sequence of lowercase letters followed by another occurrence of the same sequence) use often 4 use occasionally 3 no use 5 Feature 2: partial string matching as per Str.string_partial_match, i.e. the ability to recognize that a string is a prefix of a string that match a regexp. has already used 0 could use in some cases 6 no use 8 A few more remarks on these two features, and why I asked about them in particular. Feature 1 is standard in many regexp packages because it's trivial to implement in a backtracking regexp matcher. However, it's essentially impossible to implement if the regexp is compiled down to a deterministic automata. Thus, supporting feature 1 precludes DFA-based implementations such as Jérôme Vouillon's RE library. Feature 2 is unusual and I haven't heard from anyone that uses it :-) I got two replies suggesting one plausible scenario where partial matching could come handy: find delimiters in a piece of text that is being read block by block. However, I'm not sure Str.string_partial_match is adequate here, it looks like a "search forward for a partial match" operation is needed, which Str doesn't provide... It was also suggested to me that the effect of partial matching against a regexp R can be achieved by exact matching against a regexp R' derived from R. This is true for "textbook regexps", e.g. if R is "ab*c", then R' would be "epsilon|a(epsilon|b*(epsilon|c))", but doesn't work for more complex regexps languages, especially if back-references are supported. (Consider R = "(a+)\1".) Thanks for your input, - Xavier Leroy ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners