Subject: Re: [Caml-list] Need NFA/DFA conversion help
From: skaller
Reply-To: skaller@users.sourceforge.net
To: caml-list
Date: 16 Sep 2004 09:35:41 +1000

On Thu, 2004-09-16 at 01:05, skaller wrote:
> I need some help understanding how to convert
> an NFA with *labelled* arcs into an equivalent DFA.

Thx to those who responded.
Just for motivation:

> Ocamllex functionality is already available,

This already works:

(* test matcher *)

(* tokens *)
type codes_t = [
  | `Word
  | `Ident
  | `Num
  | `White
  | `Dus
]

let string_of_code = function
  | `Word -> "Word"
  | `Ident -> "Ident"
  | `Num -> "Num"
  | `White -> "White"
  | `Dus -> "Dus"
  | `Reserved -> "Reserved"

let string_of_result f = function
  | `Coded xs -> String.concat ", " (List.map f xs)
  | `Error _ -> "Match failure"

(* regexps *)
let digit = regexp_of_charset "0123456789"
let lower = regexp_of_inclusive_char_range 'a' 'z'
let upper = regexp_of_inclusive_char_range 'A' 'Z'
let letter = alt upper lower
let underscore = regexp_of_char '_'
let word = many letter
let idstart = alt letter underscore
let idrest = alt idstart digit
let space = regexp_of_char ' '
let white = many space
let ident = seq idstart (aster idrest)
let num = many digit
let us2 = seq underscore underscore

(* associate regexps with tokens *)
let tk_word = seq word (code `Word)
let tk_ident = seq ident (code `Ident)
let tk_num = seq num (code `Num)
let tk_white = seq white (code `White)
let tk_us2 = seq us2 (code `Dus)

(* gather the alternatives *)
let re = alts [tk_word; tk_ident; tk_num; tk_white; tk_us2]

let alphabet, states, codes, matrix = process_regexp re;;

(* print all the tokens associated with the command line argument *)
let data = Sys.argv.(1) ;;
let matcher = make_matcher matrix codes data;;
let result = recognize matcher;;
print_endline ("Result: " ^ string_of_result string_of_code result);;

---------------------------

With some minor fiddling this is a tokeniser like Ocamllex,
except that you don't need any external tool or library: you can
define everything 'in-caml' and dynamically, as shown.
(The core of the matcher engine is also purely functional :)

Then you can either process some data, or you can save
the compiled automaton (possibly as Ocaml or C source code
which can be compiled directly to executable code,
like (Ocaml)lex output).
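The listing above uses combinators such as seq, alt, many, aster and code
without showing their definitions. Purely as an illustration, here is one
hypothetical way to get a similar surface in a few lines. It is not the
engine described in this post: instead of compiling to an automaton matrix
it walks the input with Brzozowski derivatives. The type name regexp, the
constructor names, the helpers mk_alt, mk_seq, codes_at_end and deriv, the
reading of many as "one or more", and the integer payload of `Error are
all assumptions.

```ocaml
(* Hypothetical AST; constructor names are illustrative, not the post's. *)
type 'c regexp =
  [ `Empty                        (* matches no string at all *)
  | `Eps                          (* matches only the empty string *)
  | `Char of char
  | `Seq of 'c regexp * 'c regexp
  | `Alt of 'c regexp * 'c regexp
  | `Star of 'c regexp
  | `Code of 'c ]                 (* matches "" and reports a token code *)

(* Smart constructors that prune dead branches so derivatives stay small. *)
let mk_alt (a : 'c regexp) (b : 'c regexp) : 'c regexp =
  match a, b with
  | `Empty, x | x, `Empty -> x
  | _ -> `Alt (a, b)

let mk_seq (a : 'c regexp) (b : 'c regexp) : 'c regexp =
  match a, b with
  | `Empty, _ | _, `Empty -> `Empty
  | `Eps, x -> x
  | _ -> `Seq (a, b)

(* Combinators named after those used in the post; bodies are guesses. *)
let regexp_of_char ch : 'c regexp = `Char ch
let alt (a : 'c regexp) (b : 'c regexp) : 'c regexp = `Alt (a, b)
let seq (a : 'c regexp) (b : 'c regexp) : 'c regexp = `Seq (a, b)
let aster (a : 'c regexp) : 'c regexp = `Star a
let many (a : 'c regexp) : 'c regexp = `Seq (a, `Star a) (* assumed: 1 or more *)
let code (c : 'c) : 'c regexp = `Code c
let alts : 'c regexp list -> 'c regexp = function
  | [] -> `Empty
  | r :: rs -> List.fold_left alt r rs
let regexp_of_inclusive_char_range lo hi : 'c regexp =
  alts (List.init (Char.code hi - Char.code lo + 1)
          (fun i -> `Char (Char.chr (Char.code lo + i))))
let regexp_of_charset s : 'c regexp =
  alts (List.init (String.length s) (fun i -> `Char s.[i]))

(* Codes reported if the input ends here; None if r cannot match "". *)
let rec codes_at_end (r : 'c regexp) : 'c list option =
  match r with
  | `Empty | `Char _ -> None
  | `Eps | `Star _ -> Some []
  | `Code c -> Some [c]
  | `Alt (a, b) ->
    (match codes_at_end a, codes_at_end b with
     | None, x | x, None -> x
     | Some xs, Some ys -> Some (xs @ ys))
  | `Seq (a, b) ->
    (match codes_at_end a, codes_at_end b with
     | Some xs, Some ys -> Some (xs @ ys)
     | _ -> None)

(* Derivative of r by ch: what is left to match after consuming ch.
   Codes in the middle of a sequence are dropped when stepping past them. *)
let rec deriv (ch : char) (r : 'c regexp) : 'c regexp =
  match r with
  | `Empty | `Eps | `Code _ -> `Empty
  | `Char c -> if c = ch then `Eps else `Empty
  | `Alt (a, b) -> mk_alt (deriv ch a) (deriv ch b)
  | `Star a -> mk_seq (deriv ch a) r
  | `Seq (a, b) ->
    let left = mk_seq (deriv ch a) b in
    (match codes_at_end a with
     | None -> left
     | Some _ -> mk_alt left (deriv ch b))

(* Match the whole string; `Error carries the offset where matching gave up. *)
let recognize (r : 'c regexp) (s : string) : [ `Coded of 'c list | `Error of int ] =
  let rec go r i =
    match r with
    | `Empty -> `Error i
    | _ when i = String.length s ->
      (match codes_at_end r with
       | Some cs -> `Coded cs
       | None -> `Error i)
    | _ -> go (deriv s.[i] r) (i + 1)
  in
  go r 0

(* A reduced version of the post's token set. *)
type token = [ `Word | `Ident | `Num | `Dus ]

let letter = alt (regexp_of_inclusive_char_range 'a' 'z')
                 (regexp_of_inclusive_char_range 'A' 'Z')
let digit = regexp_of_charset "0123456789"
let underscore = regexp_of_char '_'
let idstart = alt letter underscore
let ident = seq idstart (aster (alt idstart digit))
let re : token regexp =
  alts [ seq (many letter) (code `Word);
         seq ident (code `Ident);
         seq (many digit) (code `Num);
         seq (seq underscore underscore) (code `Dus) ]
```

With these definitions, recognize re "abc" yields `Coded [`Word; `Ident]
and recognize re "__" yields `Coded [`Ident; `Dus], since both alternatives
accept the whole input. The sketch only honours codes at the end of a
regexp, and it re-derives the expression on every character instead of
using a precomputed table, so it is a way to experiment with the combinator
style rather than a substitute for a compiled automaton.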

What doesn't work is sub-group matching, lookahead, etc. -- things
requiring actions in the middle of a regexp (rather than at the end).

If that could be made to work, we'd have a nice pure-Ocaml
library that could replace Ocamllex, Str, and PCRE all at once --
and also allow a lot of other things to be built on top,
such as an RTN parser.

[It turns out polymorphic variants are quite useful for
extending the regular expression ASTs, as well as for
representing tokens.]

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850,
snail: PO BOX 401 Glebe NSW 2037 Australia
Check out the Felix programming language http://felix.sf.net