From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <skaller@users.sourceforge.net>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39])
	by yquem.inria.fr (Postfix) with ESMTP id 69A08BB81;
	Wed, 14 Dec 2005 07:08:30 +0100 (CET)
Received: from smtp1.adl2.internode.on.net (smtp1.adl2.internode.on.net [203.16.214.181])
	by concorde.inria.fr (8.13.0/8.13.0) with ESMTP id jBE68S3c031753;
	Wed, 14 Dec 2005 07:08:29 +0100
Received: from rosella (ppp33-4.lns1.syd2.internode.on.net [59.167.33.4] (may be forged))
	by smtp1.adl2.internode.on.net (8.12.9/8.12.6) with ESMTP id jBE68FCY038151;
	Wed, 14 Dec 2005 16:38:15 +1030 (CST)
	(envelope-from skaller@users.sourceforge.net)
Subject: Re: [Caml-list] [ANNOUNCE] Alpha release of Menhir, an LR(1)
	parser generator for ocaml
From: skaller <skaller@users.sourceforge.net>
To: Nathaniel Gray <n8gray@gmail.com>
Cc: Francois.Pottier@inria.fr,
	Caml Mailing List <caml-list@yquem.inria.fr>
In-Reply-To: <aee06c9e0512131307k3fc494a5k3591d549d552f1b@mail.gmail.com>
References: <20051212175838.GA8502@yquem.inria.fr>
	 <aee06c9e0512131307k3fc494a5k3591d549d552f1b@mail.gmail.com>
Content-Type: text/plain
Date: Wed, 14 Dec 2005 17:08:15 +1100
Message-Id: <1134540495.8980.63.camel@rosella>
Mime-Version: 1.0
X-Mailer: Evolution 2.4.1 
Content-Transfer-Encoding: 7bit

On Tue, 2005-12-13 at 13:07 -0800, Nathaniel Gray wrote:
> This is pretty nice!  Every time I use ocamlyacc I think "somebody
> should write something better."  Now it looks like somebody has!  I
> can't tell you how many times I've wanted parameterized rules and
> simple "library" rules for parsing delimiter-separated lists and
> such... 

Yes, it is pretty nice! However, it still appears to have some
problems. Any comments would be appreciated.

0. The licence: the Q Public Licence for the generator????
Please NO NO NO!! Not unless it is distributed
as part of the official distro. Is there any chance of that?
If not, even the GPL would be better ;(


1. Generating a functor is cute, but it doesn't seem to
allow arguments to parser functions. Perhaps I missed something?
Is there a way to use the functorisation with closures to
add an argument?

In particular, can the parser be generated *inside*
an environment such as a function or let binding?
[Felix allows that, which means an extra argument is
not required: a variable in the environment can be used
instead.]
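Here is a minimal sketch of what I mean by using functorisation with closures. MakeParser is a hypothetical functor standing in for a generated parser (it is not Menhir's actual output), and its "extra argument" arrives as a module parameter. Applying the functor inside a function with `let module` lets a runtime value flow into the parser, much like a closure variable:

```ocaml
(* Hypothetical stand-in for a generated parser functor: the parser's
   "extra argument" is supplied as a module parameter instead. *)
module type ENV = sig
  val max_items : int
end

module MakeParser (E : ENV) = struct
  (* Toy "parser": pull integer tokens from [next] until it returns
     None or E.max_items tokens have been read, then sum them. *)
  let parse (next : unit -> int option) : int =
    let rec loop count acc =
      if count >= E.max_items then acc
      else
        match next () with
        | None -> acc
        | Some n -> loop (count + 1) (acc + n)
    in
    loop 0 0
end

(* Apply the functor inside a function, so the runtime value [limit]
   is captured by the local module, much like a closure. *)
let parse_with ~limit next =
  let module P = MakeParser (struct let max_items = limit end) in
  P.parse next

(* Turn a list into a [unit -> int option] token source. *)
let source_of_list l =
  let rest = ref l in
  fun () ->
    match !rest with
    | [] -> None
    | x :: xs -> rest := xs; Some x
```

Each call to parse_with applies the functor afresh; that is cheap here, but it might rebuild any tables a real generated functor creates at application time.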

2. Is the signature of parsers still wrong?
Ocamlyacc uses the typing

	val parser: (lexbuf->token) -> lexbuf -> 'a

which is just bad: it forces the parser to depend on the lexbuf.
A better signature is

	val parser: ( unit -> token ) -> 'a

There is no need to pass location information to the parser:
the correct solution is for the parser to raise an exception,
which is caught in a context that can determine the location.
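A minimal sketch of that arrangement, with a hypothetical token type and a hand-written parser for comma-separated integers standing in for generated code. The parser raises a bare exception; only the caller, which owns the token source, knows where it is and attaches the position:

```ocaml
type token = INT of int | COMMA | EOF

(* Raised by the parser; it carries no location, by design. *)
exception Syntax_error

(* Parser with the proposed signature: it only sees a token supplier.
   Grammar: INT (COMMA INT)* EOF *)
let parse_list (next : unit -> token) : int list =
  let int_ () = match next () with INT n -> n | _ -> raise Syntax_error in
  let rec more acc =
    match next () with
    | COMMA -> more (int_ () :: acc)
    | EOF -> List.rev acc
    | INT _ -> raise Syntax_error
  in
  more [int_ ()]

(* The caller owns the token source, so it knows the position and can
   attach it when it catches the exception. Here the "position" is
   simply the number of tokens consumed so far. *)
let parse_tokens (toks : token list) : (int list, int) result =
  let consumed = ref 0 in
  let rest = ref toks in
  let next () =
    match !rest with
    | [] -> EOF
    | t :: ts -> rest := ts; incr consumed; t
  in
  try Ok (parse_list next) with Syntax_error -> Error !consumed
```

With a real Lexing.lexbuf, the handler would report Lexing.lexeme_start_p lexbuf instead of a token count.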

It would be nice to be able to generate this signature 
with a command line switch, pragma, or some other mechanism,
even if the default is chosen for ocamlyacc compatibility.
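Until such a switch exists, one possible workaround is a wrapper that turns an ocamlyacc-shaped entry point into the proposed shape by ignoring the buffer argument and passing a dummy lexbuf. This is only a sketch: sum_entry is a hypothetical toy entry point, not generated code, and with a real generated parser any locations it reads from the lexbuf would be meaningless.

```ocaml
type token = INT of int | PLUS | EOF

(* Hypothetical toy entry point with the ocamlyacc shape:
   it takes a lexer function and a lexbuf, and sums INT tokens. *)
let sum_entry (lexer : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : int =
  let rec loop acc =
    match lexer lexbuf with
    | INT n -> loop (acc + n)
    | PLUS -> loop acc
    | EOF -> acc
  in
  loop 0

(* Adapt any entry point with the ocamlyacc shape to the
   (unit -> token) -> 'a shape: ignore the buffer the parser hands
   back to the lexer, and supply a fresh dummy lexbuf. *)
let adapt (entry : (Lexing.lexbuf -> 'tok) -> Lexing.lexbuf -> 'a)
    : (unit -> 'tok) -> 'a =
  fun next -> entry (fun _lexbuf -> next ()) (Lexing.from_string "")

let sum_parser next = adapt sum_entry next

(* Token supplier built from a list; returns EOF once exhausted. *)
let next_of_list toks =
  let rest = ref toks in
  fun () ->
    match !rest with
    | [] -> EOF
    | t :: ts -> rest := ts; t
```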

3. I have doubts about the claim that parsers can 'share'
token types. I do not see how this is possible. It is
contradicted by the compilation model description, which
explains that the separate files making up a grammar
specification must be joined. In that case, the joined system
is going to generate a single token type, and any other
joining is certain to generate a distinct type, because

(a) the type is defined in a distinct ocaml module (mli file)
(b) the typing of normal variants is nominal

This problem would go away if polymorphic variants
were used instead: since polymorphic variants are typed
structurally, not nominally, the type names would then
be mere abbreviations.
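A small illustration of the structural typing argument, assuming two independently generated token modules that use polymorphic variants (Tokens_a and Tokens_b are hypothetical names). A function annotated with one module's token type accepts values of the other's, because both names abbreviate the same type:

```ocaml
(* Two independently "generated" token modules using polymorphic variants. *)
module Tokens_a = struct
  type token = [ `INT of int | `PLUS | `EOF ]
end

module Tokens_b = struct
  type token = [ `INT of int | `PLUS | `EOF ]
end

(* A function written against Tokens_a.token ... *)
let describe (t : Tokens_a.token) : string =
  match t with
  | `INT n -> string_of_int n
  | `PLUS -> "+"
  | `EOF -> "<eof>"

(* ... happily accepts a Tokens_b.token: both are the same structural type. *)
let from_b : Tokens_b.token = `INT 42
let shown = describe from_b
```

Had both modules declared an ordinary variant such as `type token = INT of int | PLUS | EOF`, passing a Tokens_b token to describe would be rejected by the type checker.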

Perhaps there could be a command line switch, pragma, or whatever,
to use polymorphic variants instead of ordinary ones?

Actually, I personally find the 'yacc' technique of
generating tokens rather lame. Felix does this
much better: the parser simply expects a token type
which is a variant, and that type can be defined wherever
you like. In particular, the lexer and parser can
share that definition.

As far as I can see, Menhir COULD do this, with %token
kept as just one special way of generating the variant.
All that would be required, I think, is the syntax

%import_tokens "filename"

which refers to a token definition file, as an
alternative to inlining these token definitions.
(If polymorphic variants are used, you could probably support
both, though I'm not sure.)

A token definition file would then generate two files:
an ordinary mli file with the token variant type,
and a special information file for the parser generator
(with the same information, but in a more useful form).

In Felix none of this is necessary because parsing is
built in, so the compiler can find the information required
for the parser generator directly from the token variant type.
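Either way, the goal is a single token type that the lexer and parser both refer to. A hand-written sketch of that, assuming a hypothetical Tokens module playing the role of the shared mli, a toy lexer, and a parser with the unit -> token signature from point 2:

```ocaml
(* Hypothetical token module, defined once and shared by lexer and parser. *)
module Tokens = struct
  type token = INT of int | PLUS | EOF
end

(* A tiny hand-written lexer producing Tokens.token from a string. *)
module Lexer = struct
  open Tokens

  let make (s : string) : unit -> token =
    let pos = ref 0 in
    let len = String.length s in
    let rec next () =
      if !pos >= len then EOF
      else
        match s.[!pos] with
        | ' ' -> incr pos; next ()
        | '+' -> incr pos; PLUS
        | '0' .. '9' ->
          let start = !pos in
          while !pos < len && s.[!pos] >= '0' && s.[!pos] <= '9' do
            incr pos
          done;
          INT (int_of_string (String.sub s start (!pos - start)))
        | c -> failwith (Printf.sprintf "unexpected character %C" c)
    in
    next
end

(* A parser for INT (PLUS INT)* EOF consuming the same Tokens.token type. *)
module Parser = struct
  open Tokens

  exception Error

  let sum (next : unit -> token) : int =
    let int_ () = match next () with INT n -> n | _ -> raise Error in
    let rec rest acc =
      match next () with
      | PLUS -> rest (acc + int_ ())
      | EOF -> acc
      | INT _ -> raise Error
    in
    rest (int_ ())
end
```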

4. Just curious, but how practical is LR(1) in terms of
generated code size? Felix uses Elkhound as its parser
generator, which is a GLR parser generator with an LALR(1)
core. In theory there is an option for choosing the core
automaton, which would also allow LR(1); however, I recall
Scott McPeak commenting that it wasn't worth supporting
because it generated tables that were far too big.

I'm curious how one could predict the size of the
generated code, since I don't really understand the
additional constraints LALR(1) introduces.
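For what it's worth, the extra constraint is that LALR(1) merges canonical LR(1) states whose cores (the items without their lookaheads) are identical. The LALR(1) automaton therefore has exactly as many states as the LR(0) automaton, which is easy to predict, while canonical LR(1) can have many more. The merge can introduce reduce/reduce conflicts, though never new shift/reduce ones. The standard textbook grammar showing this:

```latex
\begin{aligned}
S &\to a\,A\,d \mid b\,B\,d \mid a\,B\,e \mid b\,A\,e \\
A &\to c \qquad B \to c \\[4pt]
I_{ac} &= \{\, [A \to c\,\cdot,\ d],\ [B \to c\,\cdot,\ e] \,\} \\
I_{bc} &= \{\, [A \to c\,\cdot,\ e],\ [B \to c\,\cdot,\ d] \,\} \\
I_{ac} \cup I_{bc} &= \{\, [A \to c\,\cdot,\ d/e],\ [B \to c\,\cdot,\ d/e] \,\}
\end{aligned}
```

In canonical LR(1) the states reached after "a c" and after "b c" are kept apart, and each is conflict-free. Merged, both reductions apply on both d and e, so the grammar is LR(1) but not LALR(1).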

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net