From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id NAA23248; Thu, 16 Sep 2004 13:53:19 +0200 (MET DST) Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id NAA22538 for ; Thu, 16 Sep 2004 13:53:18 +0200 (MET DST) Received: from shiva.jussieu.fr (shiva.jussieu.fr [134.157.0.129]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id i8GBrIhU005751 for ; Thu, 16 Sep 2004 13:53:18 +0200 Received: from ibm3.cicrp.jussieu.fr (ibm3.cicrp.jussieu.fr [134.157.15.3]) by shiva.jussieu.fr (8.12.11/jtpda-5.4) with ESMTP id i8GBpjFV079181 ; Thu, 16 Sep 2004 13:51:45 +0200 (CEST) X-Ids: 168 Received: from ibm1.cicrp.jussieu.fr (ibm1.cicrp.jussieu.fr [134.157.15.1]) by ibm3.cicrp.jussieu.fr (8.8.8/jtpda/mob-V8) with ESMTP id NAA36622 ; Thu, 16 Sep 2004 13:51:02 +0200 Received: from localhost (fernande@localhost) by ibm1.cicrp.jussieu.fr (8.8.8/jtpda/mob-v8) with ESMTP id NAA688184 ; Thu, 16 Sep 2004 13:50:48 +0200 X-Authentication-Warning: ibm1.cicrp.jussieu.fr: fernande owned process doing -bs Date: Thu, 16 Sep 2004 13:50:48 +0200 (DST) From: Diego Olivier Fernandez Pons X-X-Sender: fernande@ibm1 To: skaller cc: caml-list Subject: Re: [Caml-list] Need NFA/DFA conversion help In-Reply-To: <1095291341.27775.1164.camel@pelican.wigram> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Miltered: at concorde with ID 41497EAE.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Miltered: at shiva.jussieu.fr with ID 41497E51.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Antivirus: scanned by sophie at shiva.jussieu.fr X-Loop: caml-list@inria.fr X-Spam: no; 0.00; pons:01 pons:01 etu:99 caml-list:01 dynamically:01 matcher:01 interprets:01 'does':99 generic:01 intepreter:01 source-code:01 pcre:01 context-free:99 context-free:99 bison:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Bonjour, It seems there are two different problems addressed in your request : the first one is purely technical and the second one is more theoretical. > like Ocamllex, except you don't need any external tool or library, > you can define everything 'in-caml' and dynamically, as shown. > (The core of the matcher engine is also purely functional) Tools like Lex or Yacc build an 'interpreter' (NFA, FDA, push-down automata or transducers) which interprets your input language and 'does' something. What you are asking is this set of tools to be done in a dynamic and pure-caml way instead of a 2-stages mixed C/Caml. This is a technical issue : just take the same principles that were used to build this tools and replace the code generation by data structures construction (sounds simple, doesn'it ?) > Then you can either process some data, or you can save the compiled > automaton (possibly as Ocaml or C sourcecode which can be compiled > directly to executable code like (Ocaml)lex output). Yes, one can imagine a generic framework that would allow you to choose how your intepreter will be generated (source-code/data structure, C/Caml, etc.) > What doesn't work is sub-group matching, look ahead, etc -- things > requiring actions in the middle of a regexp (rather than at the > end). Here is the second problem. You did exactly what was expected (taking a textbook on formal languages and implement what is said). The problem is that tools like PCRE, Lex, Yacc, etc. do not work as textbook say. They do not accept rational or context-free languages, they do not use DFA, NFA or LR(k) automata as claimed. Yacc input language is not a context-free language but some kind of restricted attribute grammar that lies somewhere between algebraic transducers and contextual grammars. Hence tools like Bison will contain a lot of strange hacks to handle all these extensions while keeping the push-down automata machinery. In the same way a grammar that admits 'actions in the middle of a regexp [] rather than at the end' could hardly qualify as a 'rational language' which is what regexp theoretically describe. As far as I know, there is no simple way to handle all these useful 'extensions' and theoretically satisfactory ways tend to be very hard. - general textbook Parsing techniques (Dick Grune, Ceriel J.H. Jacobs) (electronic version on free download) http://www.cs.vu.nl/~dick/PTAPG.html Dick Grune 'compilers' book has a chapter on attribute grammars - attribute grammars FNC2 (strongly non circular attribute grammars) http://www-rocq.inria.fr/oscar/www/fnc2/ An attribute grammar system in Haskell http://www.cs.uu.nl/groups/ST/twiki/bin/view/Center/AttributeGrammarSystem - Yacc like tools Jacke, Yacc-like parser generator with SML-consistent sytax http://www.ps.uni-sb.de/~jan/jacke/jacke.html Diego Olivier ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners