Subject: Re: [Caml-list] ocamlyacc -- can i tell it to be quiet?
From: skaller
To: Sebastian Egner
Cc: caml-list@yquem.inria.fr, caml-list-bounces@yquem.inria.fr
Date: Sat, 19 Nov 2005 02:59:16 +1100
Message-Id: <1132329556.10869.51.camel@rosella>

On Fri, 2005-11-18 at 15:16 +0100, Sebastian Egner wrote:
>
> > The following leads to shift reduce conflict:
> >
> > ctype_name:
> >   | LONG LONG
> >   | LONG
> >
> > Yacc is very weird -- I can parse a list of LONG without
> > a conflict .. but not two of them??
> >
> > Is there any way to tell it to shut up?
>
> Rather than trying to solve this in the LALR parser ..
> the easiest way is to adapt the _lexer_ to produce two
> different tokens for "long" and for "long long"

Argg. I feel dumb! VERY dumb!! You are right!

My lexer produces a list of tokens, which are then preprocessed
to make them easier to parse. Felix only has one filter, to strip
out whitespace and comments:

(* 1: remove comments and whitespace *)

let filter_comments x =
  let rec filter x' result =
    match x' with
    | COMMENT_NEWLINE _ :: t
    | COMMENT _ :: t
    | NEWLINE :: t
    | WHITE _ :: t -> filter t result
    | h :: t -> filter t (h::result)
    | [] -> List.rev result
  in filter x []

let translate ts =
  let filters = [
    (* 1 *) filter_comments
    ]
  and reverse_apply dat fn = fn dat
  in List.fold_left reverse_apply ts filters

but it is trivial to add another one to compress multi-word
C type names (such as long long).

Originally, this code was used in Vyper to preprocess tokens:
Vyper was an OCaml-based Python interpreter, and Python is
a bit nasty to parse with an LALR(1) parser -- it took 13 or so
prepasses on the token stream to prepare it (indent/dedent
processing, and the weird Pythonism allowing a trailing comma
in tuples like (1,2,) being the hardest to manage).

So actually .. I don't even have to modify the ocamllex lexer
at all, not even to make these names keywords: all the technology
is in place already -- thanks for reminding me why it's there!!

--
John Skaller
Felix, successor to C++: http://felix.sf.net
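
For the record, here is a rough sketch of what such an extra filter
might look like. It is not the actual Felix code: the token type
below is made up for illustration, and the LONGLONG constructor and
the name compress_ctype_names are assumptions. The point is only that
the second filter runs after whitespace has been stripped, so the two
LONG tokens are already adjacent when it sees them.

```ocaml
(* Hypothetical token type, for illustration only: the real Felix
   tokens differ. LONGLONG is an assumed token for "long long". *)
type token =
  | LONG
  | LONGLONG
  | INT
  | NAME of string
  | WHITE of string
  | COMMENT of string
  | COMMENT_NEWLINE of string
  | NEWLINE

(* 1: remove comments and whitespace *)
let filter_comments x =
  let rec filter x' result =
    match x' with
    | COMMENT_NEWLINE _ :: t
    | COMMENT _ :: t
    | NEWLINE :: t
    | WHITE _ :: t -> filter t result
    | h :: t -> filter t (h :: result)
    | [] -> List.rev result
  in filter x []

(* 2: compress two adjacent LONG tokens into one LONGLONG token *)
let compress_ctype_names x =
  let rec filter x' result =
    match x' with
    | LONG :: LONG :: t -> filter t (LONGLONG :: result)
    | h :: t -> filter t (h :: result)
    | [] -> List.rev result
  in filter x []

(* Filters are applied left to right, so whitespace is gone
   before multi-word names are compressed. *)
let translate ts =
  let filters = [
    (* 1 *) filter_comments;
    (* 2 *) compress_ctype_names
  ]
  and reverse_apply dat fn = fn dat
  in List.fold_left reverse_apply ts filters
```

With something like this in place, the grammar rule would only need
the alternatives LONGLONG and LONG, each a single token, so the
parser never has to decide whether a second LONG follows.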