From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <skaller@users.sourceforge.net>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39])
	by yquem.inria.fr (Postfix) with ESMTP id 74BFABB9A
	for <caml-list@yquem.inria.fr>; Sat, 29 Oct 2005 10:34:35 +0200 (CEST)
Received: from ash25e.internode.on.net (ash25e.internode.on.net [203.16.214.182])
	by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id j9T8YXjO013210
	for <caml-list@yquem.inria.fr>; Sat, 29 Oct 2005 10:34:34 +0200
Received: from rosella (ppp7-104.lns1.syd7.internode.on.net [59.167.7.104])
	by ash25e.internode.on.net (8.12.9/8.12.6) with ESMTP id j9T8YRnW012211
	for <caml-list@yquem.inria.fr>; Sat, 29 Oct 2005 18:04:27 +0930 (CST)
	(envelope-from skaller@users.sourceforge.net)
Subject: Ocamlyacc question
From: skaller <skaller@users.sourceforge.net>
To: caml-list@yquem.inria.fr
Content-Type: text/plain
Date: Sat, 29 Oct 2005 18:34:26 +1000
Message-Id: <1130574866.18013.53.camel@rosella>
Mime-Version: 1.0
X-Mailer: Evolution 2.2.1.1 
Content-Transfer-Encoding: 7bit
X-Miltered: at concorde with ID 43633419.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Spam: no; 0.00; ocamlyacc:01 grammar:01 non-terminal:01 parses:01 token:01 token:01 recursive:01 parser:01 parser:01 ocamlyacc:01 parsing:01 parsers:01 grammar:01 elif:01 compilation:01 
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.0.3

Suppose I have a typical grammar with a non-terminal 'expr' which
parses an expression. I would like to parse the longest prefix
of an input token stream leaving the first 'bad' token in the
stream, so I could also use recursive descent.

Specifically, my problem is to take my existing 'whole
language' parser and split it up so I can do somthing like:

1. if next is one of a set of (dynamically determined)
keywords, execute the parser for the corresponding statement,
otherwise apply the ocamlyacc default parsing rule for one
statement.

2. Statement parsers can call a function to parse expressions.
The expression parser also has to be extensible, using some
hybrid of the ocamlyacc grammar and a table driven precedence
parser. Some expression constructs cannot be done with a simple
precedence parser (if/then/elif/else, let/in, match  .. etc).

The problem, as usual, is to make the Ocamlyacc grammar
obey the open/closed principle (open for extension, closed
for compilation) which I guess has to be done with recursion,
and I just can't see how to do this so that 'expr' will process

	a + b )

and return AST for a+b, leaving ) in the input. Can an
'error' production be used to achieve this?

An LALR pushdown machine seems quite powerful: already
a FSA based pushdown machine (called a Recursive Transition
Network or RTN) can parse any context free language,
and be driven by a meta grammar (a grammar enhanced by
Kleene closure operator *, such as EBNF). I believe 
Python actually uses such a system. It is appealing because
it can be very efficient (compared to an RD parser), but is
still extensible (though not as much as pure RD). It also
allows opening up 'more parts of the language' to extension,
such as user defined operators and keyword introduced
constructions, whilst still enforcing some structure.

My pragmatic problem is to modify my existing parser
without too much impact (I prefer not to rewrite the whole
thing for another tool).

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net