From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id C954CBB9C for ; Tue, 6 Dec 2005 01:41:39 +0100 (CET) Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id jB60fbsA014843 for ; Tue, 6 Dec 2005 01:41:38 +0100 Received: from rosella (ppp33-4.lns1.syd6.internode.on.net [59.167.33.4]) by smtp3.adl2.internode.on.net (8.12.9/8.12.6) with ESMTP id jB60fZCV061132 for ; Tue, 6 Dec 2005 11:11:35 +1030 (CST) (envelope-from skaller@users.sourceforge.net) Subject: Re: [Caml-list] Another nasty with ocamlyacc: magic needed? From: skaller To: caml-list@yquem.inria.fr In-Reply-To: <1133760933.8957.125.camel@rosella> References: <1133760933.8957.125.camel@rosella> Content-Type: text/plain Date: Tue, 06 Dec 2005 11:41:35 +1100 Message-Id: <1133829695.9871.97.camel@rosella> Mime-Version: 1.0 X-Mailer: Evolution 2.4.1 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 4394DE41.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; caml-list:01 ocamlyacc:01 ocamlyacc:01 expr:01 expr:01 forall:01 forall:01 lalr:01 parser:01 recursive:01 parser:01 non-terminal:01 grammar:01 token:01 terminating:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 Thank you to all who helped on this and my other ocamlyacc problem. A proof-of-principle implementation is now working!! ------USER DEFINED STATEMENTS DEMO------- #statement while expr do statements done ; #keyword whenever #statement repeat statements whenever expr ; #statement forall ident in expr do statements done ; repeat f x; g k; whenever 1>0; while 1>0 do f x; g x; done; forall i in (1,2,3) do f i; done; -------------------------------------- The code integrates Ocamlyacc LALR1 parser for a complex language (Felix) including expressions and statements, with a hand written recursive descent parser engine driven by a table of extensions to the statement non-terminal of the Ocamlyacc grammar. With respect to Felix these extensions can be added on the fly (dynamically) as illustrated in the example. To make this all work I had to cast two spells, however the code is free of the evil Obj.magic and is reentrant: no global variables! Spell 1: to parse expressions, I had to add a special entry point: %type exprx exprx: expr expr_terminating_token { $1, $2 } expr_terminating_token: | TOK1 { TOK1 $1 } | TOK2 { TOK2 $1 } ... and the function parsing an expression has to 'put back' the gratuitous lookahead token into the token source. Having to list, by hand, all the tokens which terminate an expression is a maintenance problem .. but it does work. I formed the list by adding ALL tokens, and then using ocamlyacc -v output to remove tokens that caused a conflict. Spell 2: In order to parse anything, the token source must be available, however it is defined *after* the ocamlyacc parser and so cannot be embedded into the token, because there is no way then to specify the type of the token. In addition, it cannot be placed in the token at the point of defining the grammar extensions either, because that is handled by the pre-processor, whose job is to *construct* the token source -- it doesn't exist until after the preprocessor has run. So I used a two phase approach. A new USER_STATEMENT_KEYWORD token is used to capture the grammar production in the preprocessor -- the production is just a list of tokens, and so it can be defined in the ocamlyacc parser, it has type: %token USER_STATEMENT_KEYWORD Then, I modified the token source function itself, so that when this token is seen it is replaced by: %token Flx_ast.statement_t)> USER_STATEMENT_DRIVER The function here is a closure, here is the actual code: method token_src (dummy :Lexing.lexbuf) = let tmp = List.hd tokens in tokens <- List.tl tokens; current_token_index <- current_token_index + 1; match tmp with | Flx_parse.USER_STATEMENT_KEYWORD (sr,s,tks) -> let f = fun () -> self#parse_user_statement s tks in Flx_parse.USER_STATEMENT_DRIVER (sr,s,f) | _ -> tmp and so, by delaying the binding until the token source object is known, and then currying it into the new driver token, I was able to hide all the gritty details defined after the ocamlyacc parser from it. The Ocamlyacc parser then calls back the function stored in the token, which calls the RD parser, which in turn can recursively call ocamlyacc parser entry points: the actual ocamlyacc production used is just: user_statement: | USER_STATEMENT_DRIVER { let srt, kw, f = $1 in f () } In effect, this is 'covariant open recursion' closed by the dynamically extensible driver tables. The LALR1/RD combination is pleasing because it is expected to be faster and more reliable than a pure RD parser, yet many of the advantages, including the ease of extension remain available: in particular the compiler writer can extend the Ocamlyacc grammar and the user can extend the RD grammar: the coupling is fairly well isolated. Again thanks to those who encouraged my to continue seeking a solution! -- John Skaller Felix, successor to C++: http://felix.sf.net