From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <skaller@users.sourceforge.net>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39])
	by yquem.inria.fr (Postfix) with ESMTP id 1C42EBC6B
	for <caml-list@yquem.inria.fr>; Sat, 28 Apr 2007 12:32:26 +0200 (CEST)
Received: from ipmail01.adl2.internode.on.net (ipmail01.adl2.internode.on.net [203.16.214.140])
	by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l3SAWNhE004419
	for <caml-list@inria.fr>; Sat, 28 Apr 2007 12:32:25 +0200
X-IronPort-AV: E=Sophos;i="4.14,464,1170595800"; 
   d="scan'208";a="120573123"
Received: from ppp8-148.lns1.syd7.internode.on.net (HELO [192.168.1.201]) ([59.167.8.148])
  by ipmail01.adl2.internode.on.net with ESMTP; 28 Apr 2007 20:02:20 +0930
Subject: menhir
From: skaller <skaller@users.sourceforge.net>
To: caml-list@inria.fr
Content-Type: text/plain
Date: Sat, 28 Apr 2007 20:32:16 +1000
Message-Id: <1177756336.11923.18.camel@rosella.wigram>
Mime-Version: 1.0
X-Mailer: Evolution 2.10.1 
Content-Transfer-Encoding: 7bit
X-Miltered: at concorde with ID 463322B7.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)!
X-Spam: no; 0.00; parser:01 ocamlyacc:01 ocamlopt:01 bug:01 ocamlopt:01 compilation:01 ocamlyacc:01 parser:01 syntax:01 hacked:01 expr:01 expr:01 tokens:01 semantics:01 lalr:01 

Just a note: I just built the Felix parser with Menhir.
First, it detected some duplicate definitions Ocamlyacc didn't! Good!

Second, it reported that "rather a lot" of states have end-of-stream conflicts.
What's that about?

Third, the generated ml file was 4.5 Meg.
Ocamlopt on amd64 hung for so long I almost posted a bug report
for Ocamlopt, but it finally finished. This was a 95% CPU, 25% memory
job, so no paging. I'd guess it took 100 times longer than
compiling the ocamlyacc output (which is just a bunch of numbers :)
I didn't measure it .. no biggie for me now I know, but my
box is a LOT faster than some of the boxes my product gets built on.

After that, Felix built ok, and the parser worked for
'pure' code. However it failed when Felix preprocessor
syntax extensions were used (which is 90% of all programs).

Now, most of the system data transport for this is properly
built so it can't cause any problems. The one thing which 
is hacked is the pushback detection.

Basically: when Ocamlyacc reduces a production, it has sometimes
read only as far as the production's last token, and sometimes it
has overshot by 1 and already read the token after it.

My grammar uses a system like:

exprx:
  expr expr_terminator { $1,$2 }

statementsx:
  | statement_aster statements_terminator { $1, $2 }

This is saying: a 'special expression' exprx is an expression
PLUS one of the tokens which will solidly terminate an 
expression AND NOT ITSELF BE OVERSHOT by the parser.

In other words when exprx is parsed the reduction must
leave the next token unread.

The semantics used are: when the exprx is processed the
action arranges to push the terminator token back
into the token stream.
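
To make the pushback idea concrete, here is a minimal sketch of a token
source with a pushback stack. It is not the Felix code: the token type,
the record and the function names are all made up for illustration, and
a plain list stands in for the real lexer.

```ocaml
(* Hypothetical token type; the real Felix tokens are not shown here. *)
type token = INT of int | PLUS | SEMI | EOF

(* A token source: a pushback stack sitting in front of the lexer. *)
type source = {
  mutable pushed : token list;  (* pushed-back tokens, most recent first *)
  mutable rest : token list;    (* stands in for the underlying lexer *)
}

let make tokens = { pushed = []; rest = tokens }

(* Next token: drain pushed-back tokens before asking the lexer. *)
let next src =
  match src.pushed with
  | t :: ts -> src.pushed <- ts; t
  | [] ->
    (match src.rest with
     | t :: ts -> src.rest <- ts; t
     | [] -> EOF)

(* What an exprx-style action would call with its terminator ($2). *)
let push_back src t = src.pushed <- t :: src.pushed
```

An ocamlyacc entry point takes a function of type Lexing.lexbuf -> token,
so a source like this could be passed in as (fun _ -> next src). The
scheme is only sound if the parser has read nothing past the terminator
at the moment push_back runs.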

Perhaps because Menhir is LR(1) not LALR(1), this technique
is failing. Or Menhir may simply be looking ahead further
than required in the token stream.
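
A toy model of why one extra token of lookahead would break the
pushback. This simulates the token stream only. It makes no claim about
what Menhir or Ocamlyacc really do internally, and the tokens A, T, X, Y
and the function run are invented for the example.

```ocaml
(* A is the "expression", T the terminator, X and Y what follows. *)
type token = A | T | X | Y | EOF

let run ~lookahead_before_reduce =
  let pushed = ref [] and rest = ref [A; T; X; Y] in
  let next () =
    match !pushed with
    | t :: ts -> pushed := ts; t
    | [] -> (match !rest with t :: ts -> rest := ts; t | [] -> EOF)
  in
  let _expr = next () in  (* reads A *)
  let term = next () in   (* reads T *)
  (* A parser that wants lookahead before reducing reads one more token. *)
  let held = if lookahead_before_reduce then Some (next ()) else None in
  (* The exprx action runs now and pushes the terminator back. *)
  pushed := term :: !pushed;
  (* The next two tokens anyone reading the stream will see. *)
  let a = next () in
  let b = next () in
  (held, [a; b])
```

With lookahead_before_reduce set to false the stream continues with T
then X, as intended. With it set to true, X is stranded in the parser's
lookahead and the stream continues with T then Y, so the tokens arrive
out of order.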

Whichever way, I am depending on this particular implementation
detail of Ocamlyacc, and Menhir is using a different implementation.

Any suggestions how to 'fix' this?

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net