From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id VAA11432; Thu, 12 Aug 2004 21:25:59 +0200 (MET DST)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id VAA10851 for <caml-list@pauillac.inria.fr>; Thu, 12 Aug 2004 21:25:58 +0200 (MET DST)
Received: from smtpout.mac.com (smtpout.mac.com [17.250.248.46])
	by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i7CJPuRM007587
	for <caml-list@inria.fr>; Thu, 12 Aug 2004 21:25:57 +0200
Received: from mac.com (smtpin08-en2 [10.13.10.153])
	by smtpout.mac.com (8.12.6/MantshX 2.0) with ESMTP id i7CJPstR022882;
	Thu, 12 Aug 2004 12:25:54 -0700 (PDT)
Received: from [192.168.1.100] (dsl081-080-123.lax1.dsl.speakeasy.net [64.81.80.123])
	(authenticated bits=0)
	by mac.com (Xserve/smtpin08/MantshX 4.0) with ESMTP id i7CJPjth005880;
	Thu, 12 Aug 2004 12:25:46 -0700 (PDT)
In-Reply-To: <7D827B04-EC8B-11D8-9939-000A95C19BAA@Avisere.com>
References: <93BB4D7C-EC83-11D8-9939-000A95C19BAA@Avisere.com> <411BB09A.7010307@trdlnk.com> <7D827B04-EC8B-11D8-9939-000A95C19BAA@Avisere.com>
Mime-Version: 1.0 (Apple Message framework v619)
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
Message-Id: <55D5BF02-EC95-11D8-915C-000A27DEEC20@mac.com>
Content-Transfer-Encoding: 7bit
Cc: Joshua Smith <josh@trdlnk.com>, caml-list@inria.fr
From: Paul Snively <psnively@mac.com>
Subject: Re: [Caml-list] Context Free Grammars?
Date: Thu, 12 Aug 2004 12:25:14 -0700
To: David McClain <David.McClain@Avisere.com>
X-Pgp-Agent: GPGMail 1.0.2
X-Mailer: Apple Mail (2.619)
X-Miltered: at concorde with ID 411BC444.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Loop: caml-list@inria.fr
X-Spam: no; 0.00; caml-list:01 hash:01 mcclain:01 lalr:01 reductions:01 descent:01 hand-written:01 reductions:01 recursion:01 recursion:01 lalr:01 camlp:01 camlp:01 o'caml's:01 expressive:01 
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi David!

On Aug 12, 2004, at 11:14 AM, David McClain wrote:

> I am not a YACC expert (no kidding!?) but I always had the notion that  
> nonterminal reduction trees would somehow guide the reduce decisions.  
> Now, however, it appears that YACC is simply looking blindly for  
> syntactic token chains and reducing on them with absolutely no notion  
> of who might be interested in the result. Kind of takes the wind out  
> of my ancient understandings of YACC.
>
Quite right. YACC, as you no doubt know, is LALR(1), which expands to  
"Look Ahead Left-Right 1, which is a fancy way of saying basically two  
things:

1) Only one token at a time is taken into account in reductions.
2) Right-recursion will screw you over.

More in a moment.

> [So I guess I was writing my grammars as though there were some kind  
> of recursive descent engine in control. Not apparently so...]
>
Right! And hand-written recursive-descent parsers are almost always  
LL(1):

1) Only one token at a time is taken into account in reductions.
2) Left-recursion will screw you over.

If you're accustomed to writing LL(1) grammars, you'll need to factor  
right recursion to left recursion in order to go LALR(1). But you're  
still left with that pesky (1), which really seems to be what your post  
is about. So forget about LALR vs. LL and let's look at some solutions  
to the (1) problem.

O'Caml as a suite already has some quite nice parsing and lexing tools,  
and third parties have made others. Let me first recommend that you  
have a look at the Camlp4 tutorial at  
<http://caml.inria.fr/camlp4/tutorial>. Camlp4 is O'Caml's  
"preprocessor," but unfortunately that term has become one of rather  
ill repute thanks to the C/C++ communities. Camlp4 is much closer in  
expressive power and safety to Scheme's hygenic macros or to a good  
LL-based parser-generator system, as the tutorial makes abundantly  
clear.

However, as nice as Camlp4 undeniably is, it still leaves you with your  
(1) problem barring lexer hacks as described at  
<http://caml.inria.fr/camlp4/tutorial/tutorial003.html>: "The input 'A  
C' raises Stream.Error. There is no simple solution to this problem,  
but there is a solution (not very clean, actually): create a entry from  
a parser (it is possible via the function Grammar.Entry.of_parser).  
This parser must scan the stream using Stream.npeek until the number of  
necessary tokens allows to make the difference; return unit or raise  
Stream.Failure according to the cases." This is the traditional "smart  
lexer" approach to handling ambiguous (i.e. real-world) grammars in  
either an LALR(1) (ocamlyacc) or LL(1) (Camlp4) framework.

A lot of work has been done on relaxing the (1) restriction while still  
allowing efficient parsing. I won't go into the history and  
alternatives, but rather will just refer to  
<http://www.cs.berkeley.edu/~smcpeak/elkhound>. Elkhound is a GLR  
parser generator that will generate either C++ or O'Caml parsers. Being  
GLR, it deals nicely with ambiguities, and will even let you defer  
implementing resolution of them until after you've prototyped your  
entire grammar if you wish. There's a tutorial at  
<http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elkhound/ 
tutorial.html> that may be helpful, although it's in C++ terms rather  
than O'Caml. Simply putting:

option lang_OCaml;

at the beginning of your grammar will fix that. :-)

> David McClain
> Senior Corporate Scientist
> Avisere, Inc.
>
> david.mcclain@avisere.com
> +1.520.390.7738 (USA)
>
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives:  
> http://caml.inria.fr
> Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ:  
> http://caml.inria.fr/FAQ/
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
>
I hope this is helpful,
Paul Snively

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (Darwin)

iEYEARECAAYFAkEbxDUACgkQbot1wzHBQBXfXgCdHCoAXU3CrE/VhmHozY5+z0sx
GfwAn2pgYKqTnksbdHjK6bqM0V/Dq1gN
=Pddt
-----END PGP SIGNATURE-----

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners