From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id VAA12526; Wed, 15 Sep 2004 21:15:38 +0200 (MET DST) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id VAA12475 for ; Wed, 15 Sep 2004 21:15:37 +0200 (MET DST) Received: from nef.ens.fr (nef.ens.fr [129.199.96.32]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id i8FJFbR1004404 for ; Wed, 15 Sep 2004 21:15:37 +0200 Received: from clipper.ens.fr (clipper-gw.ens.fr [129.199.1.22]) by nef.ens.fr (8.12.11/1.01.28121999) with ESMTP id i8FJFZf1072035 ; Wed, 15 Sep 2004 21:15:36 +0200 (CEST) Received: from localhost (frisch@localhost) by clipper.ens.fr (8.13.1/jb-1.1) id i8FJFSQF017371 ; Wed, 15 Sep 2004 21:15:32 +0200 (MEST) X-Authentication-Warning: clipper.ens.fr: frisch owned process doing -bs Date: Wed, 15 Sep 2004 21:15:28 +0200 (MEST) From: Alain Frisch X-X-Sender: frisch@clipper.ens.fr Reply-To: Alain Frisch To: skaller cc: caml-list Subject: Re: [Caml-list] Need NFA/DFA conversion help In-Reply-To: <20040915154137.GB24717@yquem.inria.fr> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Virus-Scanned: by amavisd-milter (http://amavis.org/) X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-1.3.3 (nef.ens.fr [129.199.96.32]); Wed, 15 Sep 2004 21:15:36 +0200 (CEST) X-Miltered: at concorde with ID 414894D9.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; alain:01 frisch:01 alain:01 frisch:01 caml-list:01 ville:99 kleene:01 stars:99 citing:01 cardelli:01 2004:99 cduce:01 cduce:01 regexp:01 regexp:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk > > I need some help understanding how to convert > > an NFA with *labelled* arcs into an equivalent DFA. > > As regards ocamllex, I worked from various recent publications by > Ville Larikari. Lexer generators (and regexp packages) are usually pretty weak for what concerns extraction; you can only extract very unstructured information (a finite number of groups), and the semantics of capture groups below Kleene stars is not always satisfactory. The algorithm presented in this paper (sorry for citing myself): Greedy regular expression matching. A. Frisch, and L. Cardelli. In ICALP 2004 http://www.cduce.org/papers/icalp04.pdf (or the older and extended version: http://www.cduce.org/papers/greedy.pdf) makes it possible to write a lexer generator (and a regexp package) with a more structured semantics, where, for instance, a capture inside a star returns a list of bindings, a capture below a ? returns a(n?) 'a option. The (quite straightforward) algorithm in the paper returns a parse tree with a specified disambiguation policy, using two linear traversals of the input word. The first traversal is just running a very standard automaton (without marks), keeping the intermediate states for each position; the second traversal is driven by the syntax of the regexp and builds the parse tree. Let me know (off-list) if you are interested in implementing this kind of structured capture semantics. -- Alain ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners