From: james woodyatt
Subject: Re: [Caml-list] ocamllex problem
Date: Thu, 4 Aug 2005 23:15:48 -0700
To: Ocaml Trade
In-Reply-To: <1123203791.6720.63.camel@localhost.localdomain>

On 04 Aug 2005, at 18:03, skaller wrote:
>
> Alain Frisch pointed me at some nasty papers on this, one with a
> regexp -> NFA conversion and the other with a NFA-> DFA conversion,
> but I couldn't figure out how to do the direct regexp->DFA
> conversion, I'd sure like to find an algorithm for that..

In my OCaml NAE Core Foundation, there is something you may find
interesting. See the [Cf_lex] module and its subordinate [Cf_dfa].
Since it isn't trying to be a multi-stage programming tool like
[ocamllex], it produces a parser monad that executes a Lazy-DFA
instead of a fully space-time optimized DFA. At some point, I may
implement a [study] function that fully evaluates the Lazy-DFA and
optimizes it, but I don't yet see a compelling need for that.

One thing: the pattern [':'((letter|' ')* as s)] is interesting.
You're definitely right that something non-trivial is happening
inside the DFA. My [Cf_dfa] module does not keep a stack of
backtracking sequences, because I did something else to resolve the
problem. Look at the ( $@ ) operators, which allow you to use a
parser monad on the recognized input sequence to obtain the result
of a lexical rule. Using this, you can implement something like the
feature you're interested in by defining a nested hierarchy of
parsers.

I know. This is probably not what you're looking for. To get what
you're looking for, I'd have to extend [Cf_dfa] to handle marker
nodes in the NFA. I thought that would be more appropriate for
[ocamllex] and similar tools, so I didn't do it. Nice to see
[ocamllex] did.

--
j h woodyatt
that's my village calling... no doubt, they want their idiot back.