From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id GAA27965; Fri, 13 Aug 2004 06:45:28 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id GAA28522 for ; Fri, 13 Aug 2004 06:45:27 +0200 (MET DST) Received: from smtp3.adl2.internode.on.net (smtp3.adl2.internode.on.net [203.16.214.203]) by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i7D4jNRM000597 for ; Fri, 13 Aug 2004 06:45:25 +0200 Received: from [192.168.1.200] (ppp197-3.lns1.syd2.internode.on.net [203.122.197.3]) by smtp3.adl2.internode.on.net (8.12.9/8.12.9) with ESMTP id i7D4jCHY005958; Fri, 13 Aug 2004 14:15:18 +0930 (CST) Subject: Re: [Caml-list] Context Free Grammars? From: skaller Reply-To: skaller@users.sourceforge.net To: David McClain Cc: caml-list In-Reply-To: <93BB4D7C-EC83-11D8-9939-000A95C19BAA@Avisere.com> References: <93BB4D7C-EC83-11D8-9939-000A95C19BAA@Avisere.com> Content-Type: text/plain Message-Id: <1092372311.29139.49.camel@pelican.wigram> Mime-Version: 1.0 X-Mailer: Ximian Evolution 1.2.2 (1.2.2-4) Date: 13 Aug 2004 14:45:11 +1000 Content-Transfer-Encoding: 7bit X-Miltered: at concorde with ID 411C4763.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; caml-list:01 sourceforge:01 2004:99 mcclain:01 intrinsic:01 synthetic:99 lalr:01 lalr:01 unambiguous:01 annotations:01 expr:01 expr:01 vastly:01 python:01 trickery:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk On Fri, 2004-08-13 at 03:18, David McClain wrote: > Anyone have any hints about syntax transformations so that CFG's can > really be used here? Yeah, I've been through this pain and still bump into it a lot. There are two tricks. The first is to change your thinking to *bottom up*. The grammar names 'syntactic fragments' of increasing size and complexity as it sees them, rather than going searching for some wholistic shape. So name your fragments uniquely; think in terms: "the terms I'm parsing have intrinsic (synthetic) structure" that is passed upwards, rather than "I'm looking down the tree for something I expect, if I don't find it I will try something else" The second trick is: make your grammar coarse grained and far too general. Don't use the LALR(1) parser to enforce constraints. Do that in the { executable code } part or in post processing. Type checking is the obvious example of that. In the Felix grammar (which is LALR(1) and entirely unambiguous), I use exactly the same coarse syntax for executable expressions and for type annotations. In both cases I allow x + y * z (yup, Felix has anonymous sum types). So when I parse (x + y * z) : ( t + u * v) // executable : type annotation I use (the moral equivalent of): expr COLON expr { let x = $1 in let t = expr_as_type $3 in `Coercion (x,y) } Since not all executable expressions are type expressions, I trap that in the function 'expr_as_type' and throw an exception -- which produces a vastly superior error message to 'Syntax Error' that is the best the parser can produce automatically. Finally: if you are parsing a *nasty* language, such as Python, that isn't even remotely LALR(1), you can still use a LALR(1) grammar with some care are trickery, to do a lot of the work. To parse Python, I wrote multi-stage 'token filter' to preprocess the token stream, generating 'INDENT' and 'UNDENT' tokens, for example (Python uses indentation to specify block structure). Another nastiness in Python is (a,) for a unary tuple: that trailing comma is allowed in pairs too: (a,b,) and it really screws up LALR(1) parsing. It took around 10 separate passes to generate a list of tokens that I could more easily parse with Ocamlyacc. -- John Skaller, mailto:skaller@users.sf.net voice: 061-2-9660-0850, snail: PO BOX 401 Glebe NSW 2037 Australia Checkout the Felix programming language http://felix.sf.net ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners