From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 649CDBB81 for ; Wed, 22 Dec 2004 16:57:34 +0100 (CET) Received: from smtp1.adl2.internode.on.net (smtp1.adl2.internode.on.net [203.16.214.181]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id iBMFvWGP015000 for ; Wed, 22 Dec 2004 16:57:33 +0100 Received: from [192.168.1.200] (ppp194-89.lns1.syd2.internode.on.net [203.122.194.89]) by smtp1.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id iBMFvPRF096603; Thu, 23 Dec 2004 02:27:26 +1030 (CST) Subject: Re: [Caml-list] Str.string_match incorrect From: skaller Reply-To: skaller@users.sourceforge.net To: William Lovas Cc: caml-list@yquem.inria.fr In-Reply-To: <20041222080009.GA4501@force.stwing.upenn.edu> References: <1103687369.6979.50.camel@pelican.wigram> <20041222074455.GA81342@trout> <20041222080009.GA4501@force.stwing.upenn.edu> Content-Type: text/plain Organization: Message-Id: <1103731044.6979.109.camel@pelican.wigram> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) Date: 23 Dec 2004 02:57:25 +1100 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 41C9996C.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 sourceforge:01 lovas:01 wrote:01 wrote:01 semantics:01 dfa:01 regexp:01 glebe:01 061:98 lexical:01 expression:01 expression:01 nsw:01 partial:01 X-Spam-Checker-Version: SpamAssassin 3.0.0 (2004-09-13) on yquem.inria.fr X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.0 X-Spam-Level: On Wed, 2004-12-22 at 19:00, William Lovas wrote: > On Tue, Dec 21, 2004 at 11:44:55PM -0800, Evan Martin wrote: > > This is consistent with the docs, which say: > > [string_match r s start] tests whether the characters in s starting at > > position start match the regular expression r. > > and in general with how regular expression systems work. Then they're simply wrong. The fundamental operation is to check if a string is in a regular set of strings. Plainly 'aa' is not in the set { 'a' }. string_match is actually testing if some prefix of the argument is in the regular set -- this is core operation of a lexical analyser. I'm not against having that operation -- Felix does -- but it leaves us without the most fundamental and important operation -- validation. > I concur with your assessment, but i think you're characterization of the > semantics of string_partial_match is inaccurate: Looks like partial_match runs thru the whole string, and if it doesn't encounter an error returns true. This is the same automaton in which all non-error states are considered accepting states. An error state is actually a state where an accepting state is not reachable. There is a transformation: construct the DFA, mark non-error states accepting, then generate the corresponding regexp. Whether that is a 'simple' transformation is another issue :)) -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net