Subject: menhir
From: skaller
To: caml-list@inria.fr
Date: Sat, 28 Apr 2007 20:32:16 +1000
Message-Id: <1177756336.11923.18.camel@rosella.wigram>

Just a note: I just built the Felix parser with Menhir.

First, it detected some duplicate definitions Ocamlyacc didn't! Good!

Second, it reported that "rather a lot" of states have end-of-stream conflicts. What's that about?

Third, the generated ml file was 4.5 Meg. Ocamlopt on amd64 hung for so long I almost posted a bug report for Ocamlopt, but finally it finished. This was a 95% CPU, 25% memory job, so no paging. I'd guess it took 100 times longer than compilation of the ocamlyacc file (which is just a bunch of numbers :) I didn't measure it .. no biggie for me now I know, but my box is a LOT faster than some of the boxes my product gets built on.

After that, Felix built ok, and the parser worked for 'pure' code. However it failed when Felix preprocessor syntax extensions were used (which is 90% of all programs).

Now, most of the system data transport for this is properly built, so it can't cause any problems. The one thing which is hacked is the pushback detection. Basically: when Ocamlyacc reduces a production, it sometimes ends on the last token, and sometimes it overshoots by 1. My grammar uses a system like:

  exprx:
    expr expr_terminator { $1,$2 }

  statementsx:
    | statement_aster statements_terminator { $1, $2 }

This is saying: a 'special expression' exprx is an expression PLUS one of the tokens which will solidly terminate an expression AND NOT ITSELF BE OVERSHOT by the parser. In other words, when exprx is parsed the reduction must leave the next token unread.

The semantics used are: when the exprx is processed, the action arranges to push the terminator token back into the token stream.

Perhaps because Menhir is LR(1) not LALR(1), this technique is failing. Or Menhir may simply be looking ahead further than required in the token stream. Whichever way, I am depending on this particular implementation detail of Ocamlyacc, and Menhir is using a different implementation.

Any suggestions how to 'fix' this?

-- 
John Skaller
Felix, successor to C++: http://felix.sf.net
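
For readers unfamiliar with the pushback scheme described in the message, here is a minimal sketch of a token source with a pushback stack. It is only an illustration under assumptions: the token type and the names source, next, push_back and of_list are hypothetical and are not taken from the Felix sources. It shows the mechanism only. It does not settle whether a given parser generator has already read past the terminator when the action runs, which is the question the message raises.

```ocaml
(* Hypothetical token type; the real Felix tokens are not shown in the post. *)
type token = IDENT of string | SEMI | EOF

(* A token source: an underlying lexer plus a stack of pushed-back tokens. *)
type 'a source = {
  next_raw : unit -> 'a;      (* underlying lexer function *)
  mutable pushed : 'a list;   (* most recently pushed token first *)
}

let make next_raw = { next_raw; pushed = [] }

(* Deliver a pushed-back token if there is one, otherwise ask the lexer. *)
let next src =
  match src.pushed with
  | t :: rest -> src.pushed <- rest; t
  | [] -> src.next_raw ()

(* What a semantic action would call to return the terminator to the stream. *)
let push_back src t = src.pushed <- t :: src.pushed

(* Build a source from a fixed token list, yielding [eof] once exhausted. *)
let of_list eof tokens =
  let rest = ref tokens in
  make (fun () ->
    match !rest with
    | [] -> eof
    | t :: ts -> rest := ts; t)
```

A parser would be driven from `next src` rather than from the raw lexer, and the action for a rule such as exprx would call `push_back src terminator`. This delivers tokens in the right order only if the parser has not requested a lookahead token beyond the terminator before the action runs.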