From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id GAA27965; Fri, 13 Aug 2004 06:45:28 +0200 (MET DST)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id GAA28522 for <caml-list@pauillac.inria.fr>; Fri, 13 Aug 2004 06:45:27 +0200 (MET DST)
Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203])
	by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i7D4jNRM000597
	for <caml-list@inria.fr>; Fri, 13 Aug 2004 06:45:25 +0200
Received: from [192.168.1.200] (ppp197-3.lns1.syd2.internode.on.net [203.122.197.3])
	by smtp3.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id i7D4jCHY005958;
	Fri, 13 Aug 2004 14:15:18 +0930 (CST)
Subject: Re: [Caml-list] Context Free Grammars?
From: skaller <skaller@users.sourceforge.net>
Reply-To: skaller@users.sourceforge.net
To: David McClain <David.McClain@Avisere.com>
Cc: caml-list <caml-list@inria.fr>
In-Reply-To: <93BB4D7C-EC83-11D8-9939-000A95C19BAA@Avisere.com>
References: <93BB4D7C-EC83-11D8-9939-000A95C19BAA@Avisere.com>
Content-Type: text/plain
Message-Id: <1092372311.29139.49.camel@pelican.wigram>
Mime-Version: 1.0
X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) 
Date: 13 Aug 2004 14:45:11 +1000
Content-Transfer-Encoding: 7bit
X-Miltered: at concorde with ID 411C4763.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Loop: caml-list@inria.fr
X-Spam: no; 0.00; caml-list:01 sourceforge:01 2004:99 mcclain:01 intrinsic:01 synthetic:99 lalr:01 lalr:01 unambiguous:01 annotations:01 expr:01 expr:01 vastly:01 python:01 trickery:01 
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

On Fri, 2004-08-13 at 03:18, David McClain wrote:

> Anyone have any hints about syntax transformations so that CFG's can 
> really be used here? 

Yeah, I've been through this pain and still bump into it
a lot.

There are two tricks. The first is to change your thinking
to *bottom up*. The grammar names 'syntactic fragments'
of increasing size and complexity as it sees them,
rather than going searching for some wholistic shape.
So name your fragments uniquely; think in terms:

"the terms I'm parsing have intrinsic (synthetic) structure" 
that is passed upwards, rather than "I'm looking down
the tree for something I expect, if I don't find it
I will try something else"

The second trick is: make your grammar coarse grained
and far too general. Don't use the LALR(1) parser to
enforce constraints. Do that in the { executable code }
part or in post processing. Type checking is the obvious
example of that.

In the Felix grammar (which is LALR(1) and entirely
unambiguous), I use exactly the same coarse syntax for
executable expressions and for type annotations. 

In both cases I allow x + y * z (yup, Felix has anonymous
sum types). So when I parse

	(x + y * z) : ( t + u * v)
	// executable  : type annotation

I use (the moral equivalent of):

	expr COLON expr {
		let x = $1 in
		let t = expr_as_type $3 in
		`Coercion (x,y)
	}


Since not all executable expressions are type expressions,
I trap that in the function 'expr_as_type' and throw
an exception -- which produces a vastly superior error message
to 'Syntax Error' that is the best the parser can produce
automatically.

Finally: if you are parsing a *nasty* language, such as Python,
that isn't even remotely LALR(1), you can still use a LALR(1)
grammar with some care are trickery, to do a lot of the work.
To parse Python, I wrote  multi-stage 'token filter' to preprocess
the token stream, generating 'INDENT' and 'UNDENT' tokens,
for example (Python uses indentation to specify block structure).
Another nastiness in Python is (a,) for a unary tuple: that trailing
comma is allowed in pairs too: (a,b,) and it really screws up
LALR(1) parsing.

It took around 10 separate passes to generate a list of tokens
that I could more easily parse with Ocamlyacc.


-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners