From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 74BFABB9A for ; Sat, 29 Oct 2005 10:34:35 +0200 (CEST) Received: from ash25e.internode.on.net (ash25e.internode.on.net [203.16.214.182]) by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j9T8YXjO013210 for ; Sat, 29 Oct 2005 10:34:34 +0200 Received: from rosella (ppp7-104.lns1.syd7.internode.on.net [59.167.7.104]) by ash25e.internode.on.net (8.12.9/8.12.6) with ESMTP id j9T8YRnW012211 for ; Sat, 29 Oct 2005 18:04:27 +0930 (CST) (envelope-from skaller@users.sourceforge.net) Subject: Ocamlyacc question From: skaller To: caml-list@yquem.inria.fr Content-Type: text/plain Date: Sat, 29 Oct 2005 18:34:26 +1000 Message-Id: <1130574866.18013.53.camel@rosella> Mime-Version: 1.0 X-Mailer: Evolution 2.2.1.1 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 43633419.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; ocamlyacc:01 grammar:01 non-terminal:01 parses:01 token:01 token:01 recursive:01 parser:01 parser:01 ocamlyacc:01 parsing:01 parsers:01 grammar:01 elif:01 compilation:01 X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.0.3 Suppose I have a typical grammar with a non-terminal 'expr' which parses an expression. I would like to parse the longest prefix of an input token stream leaving the first 'bad' token in the stream, so I could also use recursive descent. Specifically, my problem is to take my existing 'whole language' parser and split it up so I can do somthing like: 1. if next is one of a set of (dynamically determined) keywords, execute the parser for the corresponding statement, otherwise apply the ocamlyacc default parsing rule for one statement. 2. Statement parsers can call a function to parse expressions. The expression parser also has to be extensible, using some hybrid of the ocamlyacc grammar and a table driven precedence parser. Some expression constructs cannot be done with a simple precedence parser (if/then/elif/else, let/in, match .. etc). The problem, as usual, is to make the Ocamlyacc grammar obey the open/closed principle (open for extension, closed for compilation) which I guess has to be done with recursion, and I just can't see how to do this so that 'expr' will process a + b ) and return AST for a+b, leaving ) in the input. Can an 'error' production be used to achieve this? An LALR pushdown machine seems quite powerful: already a FSA based pushdown machine (called a Recursive Transition Network or RTN) can parse any context free language, and be driven by a meta grammar (a grammar enhanced by Kleene closure operator *, such as EBNF). I believe Python actually uses such a system. It is appealing because it can be very efficient (compared to an RD parser), but is still extensible (though not as much as pure RD). It also allows opening up 'more parts of the language' to extension, such as user defined operators and keyword introduced constructions, whilst still enforcing some structure. My pragmatic problem is to modify my existing parser without too much impact (I prefer not to rewrite the whole thing for another tool). -- John Skaller Felix, successor to C++: http://felix.sf.net