From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id 1049ABBAF for ; Wed, 11 Mar 2009 01:46:40 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtkBAGumtklCbwQZk2dsb2JhbACMc4g7AQEBAQkJCgkRBLxKhA0G X-IronPort-AV: E=Sophos;i="4.38,338,1233529200"; d="scan'208";a="25382747" Received: from out1.smtp.messagingengine.com ([66.111.4.25]) by mail1-smtp-roc.national.inria.fr with ESMTP; 11 Mar 2009 01:46:39 +0100 Received: from compute1.internal (compute1.internal [10.202.2.41]) by out1.messagingengine.com (Postfix) with ESMTP id C6C5F2EE879; Tue, 10 Mar 2009 20:46:38 -0400 (EDT) Received: from heartbeat1.messagingengine.com ([10.202.2.160]) by compute1.internal (MEProxy); Tue, 10 Mar 2009 20:46:38 -0400 X-Sasl-enc: akmUBVjI6/vn7PpSgxpHZGB9YQRyNgR6Uj0BM/sYrmRZ 1236732398 Received: from [192.168.1.10] (ALyon-157-1-71-91.w81-251.abo.wanadoo.fr [81.251.158.91]) by mail.messagingengine.com (Postfix) with ESMTPSA id 267C328B31; Tue, 10 Mar 2009 20:46:38 -0400 (EDT) Message-ID: <49B7094C.7040403@ens-lyon.org> Date: Wed, 11 Mar 2009 01:43:56 +0100 From: Martin Jambon User-Agent: Thunderbird 2.0.0.17 (X11/20081008) MIME-Version: 1.0 To: Robert Muller Cc: O'Caml Mailing List Subject: Re: [Caml-list] ocamllex question References: In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Spam: no; 0.00; ens-lyon:01 ocamllex:01 ocamllex:01 ocamlyacc:01 subset:01 lexer:01 tokens:01 pointer:01 lexer:01 tokens:01 def:01 ocamlyacc:01 pointer:01 beginner's:01 ocaml:01 Robert Muller wrote: > I am attempting to use ocamllex together with ocamlyacc to parse a > subset of python. Python uses indentation to denote > statement blocks so a lexer is sometimes required to return a sequence > of tokens without advancing the input pointer. In > particular, a lexer for python should return a sequence of so-called > DEDENT tokens when indented fragments > end. E.g., > > def f(x): > statement1; > statement2; > statement3; > statement4; > A > > the lexer should return two consecutive DEDENT tokens between the '\n' > at the end of statement4 and the token for A. What I would do is: 1. pass an argument to each "rule" function, containing the stack of indentation information (current block and parent blocks, with first line number and indentation). 2. let each rule produce as many tokens as necessary and return lists of tokens 3. create a token stream for ocamlyacc/menhir that would call the ocamllex-generated functions as needed; these would put the tokens into a queue. Refill when the queue is empty. 4. Figure how make good error reports :-) Martin > Looking at the documentation and examples, it isn't clear how to > convince the generated lexer to not advance the input pointer > so that two consecutive DEDENT tokens can be returned before the token > for A is returned. > > Any ocamllex perts out there? > > Thanks, > Bob Muller > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- http://mjambon.com/