From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <skaller@users.sourceforge.net>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] ocamlyacc -- can i tell it to be quiet?
From: skaller <skaller@users.sourceforge.net>
To: Sebastian Egner <sebastian.egner@philips.com>
Cc: caml-list@yquem.inria.fr, caml-list-bounces@yquem.inria.fr
In-Reply-To: <OFC3EA1194.47FD1C2C-ONC12570BD.004E0D15-C12570BD.004E88BD@philips.com>
References: 
	 <OFC3EA1194.47FD1C2C-ONC12570BD.004E0D15-C12570BD.004E88BD@philips.com>
Content-Type: text/plain
Date: Sat, 19 Nov 2005 02:59:16 +1100
Message-Id: <1132329556.10869.51.camel@rosella>
Mime-Version: 1.0
X-Mailer: Evolution 2.4.1 
Content-Transfer-Encoding: 7bit

On Fri, 2005-11-18 at 15:16 +0100, Sebastian Egner wrote:
> 
> > The following leads to shift reduce conflict:
> > 
> > ctype_name:
> >   | LONG LONG 
> >   | LONG 
> > 
> > Yacc is very weird -- I can parse a list of LONG without
> > a conflict .. but not two of them?? 
> > 
> > Is there any way to tell it to shut up?
> 
> Rather than trying to solve this in the LALR parser
..
> the easiest way is to adapt the _lexer_ to produce two 
> different tokens for "long" and for "long long"

Argg. I feel dumb! VERY dumb!! You are right!

My lexer produces a list of tokens, which are then 
preprocessed to make them easier to parse: Felix only 
has one filter, to strip out whitespace and comments:

(* 1: remove comments, newlines and whitespace *)

let filter_comments x =
  let rec filter x' result  =
    match x' with 
    | COMMENT_NEWLINE _ :: t
    | COMMENT _ :: t 
    | NEWLINE :: t
    | WHITE _ :: t -> filter t result
    | h :: t -> filter t (h::result)
    | [] -> List.rev result
  in filter x []

let translate ts = 
  let filters = [
    (* 1 *) filter_comments
    ] 
  and reverse_apply dat fn = fn dat 
  in List.fold_left reverse_apply ts filters

but it is trivial to add another one to compress
multi-word C type names (such as long long).
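
A minimal sketch of what that second filter might look like. The
token type below is a hypothetical stand-in for the real Felix
tokens (the payload types are guesses), and LONGLONG and
compress_ctypes are names invented here for illustration. The new
filter is listed after filter_comments so that any whitespace
between the two words is already gone when it runs:

```ocaml
(* Hypothetical token type; the real Felix token type differs. *)
type token =
  | COMMENT of string
  | COMMENT_NEWLINE of string
  | NEWLINE
  | WHITE of int
  | LONG
  | LONGLONG              (* assumed new token for "long long" *)
  | NAME of string

(* 1: remove comments, newlines and whitespace *)
let filter_comments x =
  let rec filter x' result =
    match x' with
    | COMMENT_NEWLINE _ :: t
    | COMMENT _ :: t
    | NEWLINE :: t
    | WHITE _ :: t -> filter t result
    | h :: t -> filter t (h :: result)
    | [] -> List.rev result
  in filter x []

(* 2: compress multi-word C type names into a single token *)
let compress_ctypes x =
  let rec filter x' result =
    match x' with
    | LONG :: LONG :: t -> filter t (LONGLONG :: result)
    | h :: t -> filter t (h :: result)
    | [] -> List.rev result
  in filter x []

(* Apply the filters left to right over the token list. *)
let translate ts =
  let filters = [
    (* 1 *) filter_comments;
    (* 2 *) compress_ctypes
    ]
  and reverse_apply dat fn = fn dat
  in List.fold_left reverse_apply ts filters
```

With something like this in place the grammar rule would presumably
reduce to two single-token alternatives (LONGLONG and LONG), which
should leave nothing for yacc to complain about.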

Originally, this code was used in Vyper to preprocess
tokens: Vyper was an OCaml-based Python interpreter, and Python
is a bit nasty to parse with an LALR(1) parser -- it took
13 or so prepasses on the token stream to prepare it
(indent/dedent processing, and the weird Pythonism
allowing a trailing comma in tuples like (1,2,) being the
hardest to manage).

So actually .. I don't even have to modify the ocamllex
lexer at all, not even to make these names keywords:
all the technology is in place already -- thanks
for reminding me why it's there!!

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net