From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id VAA11432; Thu, 12 Aug 2004 21:25:59 +0200 (MET DST) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id VAA10851 for ; Thu, 12 Aug 2004 21:25:58 +0200 (MET DST) Received: from smtpout.mac.com (smtpout.mac.com [17.250.248.46]) by concorde.inria.fr (8.12.10/8.12.10) with ESMTP id i7CJPuRM007587 for ; Thu, 12 Aug 2004 21:25:57 +0200 Received: from mac.com (smtpin08-en2 [10.13.10.153]) by smtpout.mac.com (8.12.6/MantshX 2.0) with ESMTP id i7CJPstR022882; Thu, 12 Aug 2004 12:25:54 -0700 (PDT) Received: from [192.168.1.100] (dsl081-080-123.lax1.dsl.speakeasy.net [64.81.80.123]) (authenticated bits=0) by mac.com (Xserve/smtpin08/MantshX 4.0) with ESMTP id i7CJPjth005880; Thu, 12 Aug 2004 12:25:46 -0700 (PDT) In-Reply-To: <7D827B04-EC8B-11D8-9939-000A95C19BAA@Avisere.com> References: <93BB4D7C-EC83-11D8-9939-000A95C19BAA@Avisere.com> <411BB09A.7010307@trdlnk.com> <7D827B04-EC8B-11D8-9939-000A95C19BAA@Avisere.com> Mime-Version: 1.0 (Apple Message framework v619) Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed Message-Id: <55D5BF02-EC95-11D8-915C-000A27DEEC20@mac.com> Content-Transfer-Encoding: 7bit Cc: Joshua Smith , caml-list@inria.fr From: Paul Snively Subject: Re: [Caml-list] Context Free Grammars? Date: Thu, 12 Aug 2004 12:25:14 -0700 To: David McClain X-Pgp-Agent: GPGMail 1.0.2 X-Mailer: Apple Mail (2.619) X-Miltered: at concorde with ID 411BC444.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Loop: caml-list@inria.fr X-Spam: no; 0.00; caml-list:01 hash:01 mcclain:01 lalr:01 reductions:01 descent:01 hand-written:01 reductions:01 recursion:01 recursion:01 lalr:01 camlp:01 camlp:01 o'caml's:01 expressive:01 Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi David! On Aug 12, 2004, at 11:14 AM, David McClain wrote: > I am not a YACC expert (no kidding!?) but I always had the notion that > nonterminal reduction trees would somehow guide the reduce decisions. > Now, however, it appears that YACC is simply looking blindly for > syntactic token chains and reducing on them with absolutely no notion > of who might be interested in the result. Kind of takes the wind out > of my ancient understandings of YACC. > Quite right. YACC, as you no doubt know, is LALR(1), which expands to "Look Ahead Left-Right 1, which is a fancy way of saying basically two things: 1) Only one token at a time is taken into account in reductions. 2) Right-recursion will screw you over. More in a moment. > [So I guess I was writing my grammars as though there were some kind > of recursive descent engine in control. Not apparently so...] > Right! And hand-written recursive-descent parsers are almost always LL(1): 1) Only one token at a time is taken into account in reductions. 2) Left-recursion will screw you over. If you're accustomed to writing LL(1) grammars, you'll need to factor right recursion to left recursion in order to go LALR(1). But you're still left with that pesky (1), which really seems to be what your post is about. So forget about LALR vs. LL and let's look at some solutions to the (1) problem. O'Caml as a suite already has some quite nice parsing and lexing tools, and third parties have made others. Let me first recommend that you have a look at the Camlp4 tutorial at . Camlp4 is O'Caml's "preprocessor," but unfortunately that term has become one of rather ill repute thanks to the C/C++ communities. Camlp4 is much closer in expressive power and safety to Scheme's hygenic macros or to a good LL-based parser-generator system, as the tutorial makes abundantly clear. However, as nice as Camlp4 undeniably is, it still leaves you with your (1) problem barring lexer hacks as described at : "The input 'A C' raises Stream.Error. There is no simple solution to this problem, but there is a solution (not very clean, actually): create a entry from a parser (it is possible via the function Grammar.Entry.of_parser). This parser must scan the stream using Stream.npeek until the number of necessary tokens allows to make the difference; return unit or raise Stream.Failure according to the cases." This is the traditional "smart lexer" approach to handling ambiguous (i.e. real-world) grammars in either an LALR(1) (ocamlyacc) or LL(1) (Camlp4) framework. A lot of work has been done on relaxing the (1) restriction while still allowing efficient parsing. I won't go into the history and alternatives, but rather will just refer to . Elkhound is a GLR parser generator that will generate either C++ or O'Caml parsers. Being GLR, it deals nicely with ambiguities, and will even let you defer implementing resolution of them until after you've prototyped your entire grammar if you wish. There's a tutorial at that may be helpful, although it's in C++ terms rather than O'Caml. Simply putting: option lang_OCaml; at the beginning of your grammar will fix that. :-) > David McClain > Senior Corporate Scientist > Avisere, Inc. > > david.mcclain@avisere.com > +1.520.390.7738 (USA) > > ------------------- > To unsubscribe, mail caml-list-request@inria.fr Archives: > http://caml.inria.fr > Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: > http://caml.inria.fr/FAQ/ > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > I hope this is helpful, Paul Snively -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.4 (Darwin) iEYEARECAAYFAkEbxDUACgkQbot1wzHBQBXfXgCdHCoAXU3CrE/VhmHozY5+z0sx GfwAn2pgYKqTnksbdHjK6bqM0V/Dq1gN =Pddt -----END PGP SIGNATURE----- ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners