Subject: Re: [Caml-list] [ANNOUNCE] Alpha release of Menhir, an LR(1) parser generator for ocaml
From: skaller
To: Nathaniel Gray
Cc: Francois.Pottier@inria.fr, Caml Mailing List
References: <20051212175838.GA8502@yquem.inria.fr>
Date: Wed, 14 Dec 2005 17:08:15 +1100
Message-Id: <1134540495.8980.63.camel@rosella>

On Tue, 2005-12-13 at 13:07 -0800, Nathaniel Gray wrote:
> This is pretty nice! Every time I use ocamlyacc I think "somebody
> should write something better." Now it looks like somebody has! I
> can't tell you how many times I've wanted parameterized rules and
> simple "library" rules for parsing delimiter-separated lists and
> such...

Yes, it is pretty nice!
However, it still appears to have some problems. Any comments
appreciated.

0. The licence. Q Public Licence for the generator???? Please NO NO
NO!! Not unless it is distributed as part of the official distro. Is
there any chance of that? If not, even GPL would be better ;(

1. Generating a functor is cute, but it doesn't seem to allow
arguments to parser functions. Perhaps I missed something? Is there a
way to use the functorisation with closures to add an argument? In
particular, can the parser be generated *inside* an environment such
as a function or let binding? [Felix allows that, which means an extra
argument is not required: a variable in the environment can be used
instead.]

2. The signature of parsers is still wrong? Ocamlyacc uses the typing

    val parser: (lexbuf -> token) -> lexbuf -> 'a

which is just bad. A better signature is

    val parser: (unit -> token) -> 'a

There is no need to provide location information: the correct
solution is to throw an exception, which is caught in a context which
can determine the location.

It would be nice to be able to generate this signature with a command
line switch, pragma, or some other mechanism, even if the default is
chosen for ocamlyacc compatibility.

3. I have doubts about the claim that parsers can 'share' token types.
I do not see how this is possible. It is contradicted by the
compilation model description, which explains how it is necessary to
join the separate files making up a grammar specification. In this
case, the joined system is going to generate a single token type, and
any other joining is certain to generate a distinct type because

    (a) the type is defined in a distinct ocaml module (mli file)
    (b) the typing of normal variants is nominal

This problem would go away if polymorphic variants were used instead,
because the type names are then simply abbreviations, since
pm-variants are structurally, not nominally, typed.
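
To make the nominal versus structural point concrete, here is a minimal sketch. The module names (Parser_a, Parser_b, Poly_a, Poly_b) and the toy lexer are hypothetical stand-ins for separately generated parsers; none of this is actual Menhir or ocamlyacc output. The lexer also happens to use the `unit -> token` shape suggested in point 2.

```ocaml
(* Hypothetical modules standing in for two separately generated parsers.
   Neither is real Menhir or ocamlyacc output. *)

(* Ordinary variants: each definition creates a new nominal type. *)
module Parser_a = struct
  type token = INT of int | PLUS | EOF
end

module Parser_b = struct
  type token = INT of int | PLUS | EOF
end

(* Parser_a.token and Parser_b.token are distinct types, so this line
   would be rejected by the type checker if uncommented:
   let _ : Parser_b.token = (Parser_a.PLUS : Parser_a.token) *)

(* Polymorphic variants: both type names abbreviate the same
   structural type. *)
module Poly_a = struct
  type token = [ `INT of int | `PLUS | `EOF ]
end

module Poly_b = struct
  type token = [ `INT of int | `PLUS | `EOF ]
end

(* A toy lexer written against Poly_a.token, as a closure over its input. *)
let next_token : unit -> Poly_a.token =
  let pending = ref [ `INT 1; `PLUS; `INT 2 ] in
  fun () ->
    match !pending with
    | [] -> `EOF
    | t :: rest -> pending := rest; t

(* A consumer written against Poly_b.token accepts that lexer with no
   conversion, because the two token types are the same type. *)
let rec sum_ints (lexer : unit -> Poly_b.token) (acc : int) : int =
  match lexer () with
  | `INT n -> sum_ints lexer (acc + n)
  | `PLUS -> sum_ints lexer acc
  | `EOF -> acc

let total = sum_ints next_token 0
```

With ordinary variants the same hand-off would need an explicit conversion function between the two token types, or a single shared module defining the type.
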
Perhaps a command line switch, pragma, or whatever, to use polymorphic
variants instead of ordinary ones?

Actually, I personally find the 'yacc' technique of generating tokens
to be rather lame. Felix does this much better -- the parser simply
expects a token type which is a variant, and the type can be defined
wherever you like. In particular, the lexer and parser can share that
definition.

As far as I can see Menhir COULD do this, except of course one would
use %token as a special way of generating the variant. All that would
be required, I think, is the syntax

    %import_tokens "filename"

which refers to the token definition file -- as an alternative to
inlining these token definitions. (If pm-variants are used you could
probably support both, though I'm not sure.)

A token definition file then generates two files: an ordinary mli file
with the token variant type, and a special information file for the
parser generator (with the same information, but in a more useful
form).

In Felix none of this is necessary because parsing is built in, so the
compiler can find the information required for the parser generator
directly from the token variant type.

4. Just curious, but how practical is LR(1) in terms of generated code
sizes? Felix is using Elkhound as its parser, which is a GLR parser
with an LALR(1) core. In theory there is an option for choosing the
core automaton, which also allows LR(1); however, I recall Scott
McPeak commenting that it wasn't worth supporting because it generated
tables which were far too big.

I'm curious how one would be able to predict the size of the generated
code, since I don't really understand the additional constraints
LALR(1) introduces.

-- 
John Skaller
Felix, successor to C++: http://felix.sf.net