caml-list - the Caml user's mailing list
From: skaller <skaller@users.sourceforge.net>
To: David McClain <David.McClain@Avisere.com>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Context Free Grammars?
Date: 13 Aug 2004 14:45:11 +1000
Message-ID: <1092372311.29139.49.camel@pelican.wigram>
In-Reply-To: <93BB4D7C-EC83-11D8-9939-000A95C19BAA@Avisere.com>

On Fri, 2004-08-13 at 03:18, David McClain wrote:

> Anyone have any hints about syntax transformations so that CFG's can 
> really be used here? 

Yeah, I've been through this pain and still bump into it
a lot.

There are two tricks. The first is to change your thinking
to *bottom up*. The grammar names 'syntactic fragments'
of increasing size and complexity as it sees them,
rather than going searching for some holistic shape.
So name your fragments uniquely; think in terms of

"the terms I'm parsing have intrinsic (synthetic) structure
that is passed upwards", rather than "I'm looking down
the tree for something I expect, and if I don't find it
I will try something else".

The second trick is: make your grammar coarse grained
and far too general. Don't use the LALR(1) parser to
enforce constraints. Do that in the { executable code }
part or in post processing. Type checking is the obvious
example of that.

In the Felix grammar (which is LALR(1) and entirely
unambiguous), I use exactly the same coarse syntax for
executable expressions and for type annotations. 

In both cases I allow x + y * z (yup, Felix has anonymous
sum types). So when I parse

	(x + y * z) : ( t + u * v)
	// executable  : type annotation

I use (the moral equivalent of):

	expr COLON expr {
		let x = $1 in
		let t = expr_as_type $3 in
		`Coercion (x,t)
	}


Since not all executable expressions are type expressions,
I trap that in the function 'expr_as_type' and throw
an exception, which produces a vastly superior error message
to the bare 'Syntax Error' that is the best the parser can
produce automatically.
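
To make that concrete, here is a minimal sketch of what such an
'expr_as_type' might look like. This is not the Felix code: the AST
constructors, the 'typ' type, the 'show' helper and the 'Not_a_type'
exception are all assumptions invented for illustration.

```ocaml
(* Coarse AST shared by executable expressions and type annotations.
   Every name below is hypothetical, not taken from Felix. *)
type expr =
  | Name of string
  | Int of int
  | Sum of expr * expr      (* x + y *)
  | Prod of expr * expr     (* x * y *)
  | Apply of expr * expr    (* f x: has no type-level meaning in this sketch *)

type typ =
  | TName of string
  | TSum of typ * typ
  | TProd of typ * typ

exception Not_a_type of string

(* Render an expression so the error message can quote it. *)
let rec show = function
  | Name s -> s
  | Int n -> string_of_int n
  | Sum (a, b) -> "(" ^ show a ^ " + " ^ show b ^ ")"
  | Prod (a, b) -> "(" ^ show a ^ " * " ^ show b ^ ")"
  | Apply (f, x) -> show f ^ " " ^ show x

(* Reinterpret a parsed expression as a type; reject forms that cannot
   be types with a message more specific than a generic syntax error. *)
let rec expr_as_type (e : expr) : typ =
  match e with
  | Name s -> TName s
  | Sum (a, b) -> TSum (expr_as_type a, expr_as_type b)
  | Prod (a, b) -> TProd (expr_as_type a, expr_as_type b)
  | Int _ | Apply _ ->
    raise (Not_a_type ("expected a type after ':', but found: " ^ show e))
```

The grammar stays coarse (one 'expr' nonterminal on both sides of the
colon), and the constraint lives in ordinary code where you control
the wording of the error.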

Finally: if you are parsing a *nasty* language, such as Python,
that isn't even remotely LALR(1), you can still use a LALR(1)
grammar, with some care and trickery, to do a lot of the work.
To parse Python, I wrote a multi-stage 'token filter' to preprocess
the token stream, generating 'INDENT' and 'UNDENT' tokens,
for example (Python uses indentation to specify block structure).
Another nastiness in Python is (a,) for a unary tuple: that trailing
comma is allowed in pairs too: (a,b,) and it really screws up
LALR(1) parsing.
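
As a rough illustration of the INDENT/UNDENT idea, here is a sketch of
one such filter pass. It is not the code I used: the token type, the
per-line input shape, and the 'Bad_indent' exception are assumptions
made up for the example, and real Python needs more care (tabs,
continuation lines, brackets).

```ocaml
(* Hypothetical token type for the sketch. *)
type token =
  | WORD of string
  | NEWLINE
  | INDENT
  | UNDENT

(* Raised when a line's column matches no enclosing block. *)
exception Bad_indent of int

(* Input: one (indentation column, tokens) pair per logical line.
   Output: a flat token stream with block structure made explicit. *)
let add_block_tokens (lines : (int * token list) list) : token list =
  (* 'stack' holds the columns of the enclosing blocks, innermost first;
     'acc' is the output so far, in reverse order. *)
  let rec close stack col acc =
    match stack with
    | top :: rest when col < top -> close rest col (UNDENT :: acc)
    | top :: _ when col = top -> (stack, acc)
    | _ -> raise (Bad_indent col)
  in
  let step (stack, acc) (col, toks) =
    let stack, acc =
      match stack with
      | top :: _ when col > top -> (col :: stack, INDENT :: acc)
      | _ -> close stack col acc
    in
    (stack, NEWLINE :: List.rev_append toks acc)
  in
  let stack, acc = List.fold_left step ([0], []) lines in
  (* Close any blocks still open at end of input. *)
  let _, acc = close stack 0 acc in
  List.rev acc
```

After this pass the grammar can treat INDENT and UNDENT exactly like
'begin' and 'end' keywords, which LALR(1) handles without trouble.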

It took around 10 separate passes to generate a list of tokens
that I could more easily parse with Ocamlyacc.
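
The multi-pass structure can be sketched as a list of token-list
rewrites applied in order. The pass shown, which marks a comma sitting
directly before a closing parenthesis with its own token, is only one
conceivable way to take the trailing-comma decision away from the
parser; it is an assumption for illustration, not a description of my
actual passes, and all the names are hypothetical.

```ocaml
(* Hypothetical token type for the sketch. *)
type token =
  | NAME of string
  | COMMA
  | TRAILING_COMMA
  | LPAR
  | RPAR

(* A pass rewrites the whole token list. *)
type pass = token list -> token list

(* Replace a comma that directly precedes ')' with a distinct token, so
   the grammar never has to look past a comma to see what follows. *)
let mark_trailing_commas : pass =
  let rec go = function
    | COMMA :: RPAR :: rest -> TRAILING_COMMA :: RPAR :: go rest
    | t :: rest -> t :: go rest
    | [] -> []
  in
  go

(* Apply the passes left to right, each seeing the previous one's output. *)
let run_passes (passes : pass list) (tokens : token list) : token list =
  List.fold_left (fun toks p -> p toks) tokens passes
```

Each pass stays small and testable on its own, and the final list is
what gets fed to the Ocamlyacc-generated parser.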


-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net




