From: Martin Jambon
Date: Wed, 11 Nov 2009 12:03:54 +0100
To: Dario Teixeira
Cc: Francois.Pottier@inria.fr, Jeff Shaw, caml-list@inria.fr
Subject: Re: [Caml-list] Re: The lexer hack
Message-ID: <4AFA9A1A.1040808@ens-lyon.org>
In-Reply-To: <415678.66831.qm@web111512.mail.gq1.yahoo.com>

Dario Teixeira wrote:
> Hi,
>
>> Interesting. Have you confirmed that this works? I am slightly
>> worried by the fact that an LR parser reads one token ahead,
>> i.e. one token past BEGIN_VERB might already have been read
>> before the enter_verb semantic action is executed. If that is
>> so, then this token would be read while the lexer is still in
>> the wrong mode.
>
> Yes, I was just thinking about that as well... :-)
> I think I can pile another hack on top of the dummy action:
> dummy tokens to take care of the readahead issue. Though
> this has the potential to get comically silly pretty quickly!
>
> I'll report later...

If a single token (BEGIN_VERB) is enough to determine which lexer to use,
I think you can change the state inside the lexer itself, like this:

rule token state = parse
    "" { match !state with
             `Normal -> normal_token state lexbuf
           | `Verbatim -> verbatim_token state lexbuf }

and normal_token state = parse
    ...
  | "\\begin{verbatim}" { state := `Verbatim; BEGIN_VERB }

and verbatim_token state = parse
    ... { RAW (...) }
  | "\\end{verbatim}" { state := `Normal; END_VERB }

Because the state is updated in the same lexer action that returns
BEGIN_VERB, any token the parser reads ahead is already lexed in the
right mode.

An even simpler option, if possible in your case, is to use a single token
for the whole verbatim section:

rule token = parse
    ...
  | "\\begin{verbatim}" { finish_verbatim lexbuf }

and finish_verbatim = shortest
    _* as s "\\end{verbatim}" { RAW s }

Martin

-- 
http://mjambon.com/
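
P.S. For anyone who wants to try the first idea without ocamllex, here is a minimal sketch in plain OCaml. It is only an illustration under assumptions: the token type, the WORD token, and the helpers starts_with, find, make_lexer and tokens are hypothetical names made up for this example, and whitespace-separated words stand in for whatever the real normal-mode rules are. The point it tries to show is that the mode changes at the moment BEGIN_VERB is produced, so a caller that pulls one token ahead still gets tokens lexed in the right mode.

```ocaml
(* Hypothetical token type; BEGIN_VERB, END_VERB and RAW mirror the names
   used in the message, WORD is an assumption standing in for normal tokens. *)
type token = BEGIN_VERB | END_VERB | RAW of string | WORD of string | EOF

type mode = Normal | Verbatim

let begin_tag = "\\begin{verbatim}"
let end_tag = "\\end{verbatim}"

(* [starts_with s pos tag] is true when [tag] occurs in [s] at offset [pos]. *)
let starts_with s pos tag =
  let n = String.length tag in
  pos + n <= String.length s && String.sub s pos n = tag

(* Offset of the first occurrence of [tag] at or after [pos],
   or the length of [s] if there is none. *)
let rec find s pos tag =
  if pos >= String.length s then String.length s
  else if starts_with s pos tag then pos
  else find s (pos + 1) tag

(* Returns a function that yields one token per call. The mode lives in the
   lexer and is switched in the same step that returns BEGIN_VERB/END_VERB. *)
let make_lexer s =
  let pos = ref 0 and mode = ref Normal in
  let len = String.length s in
  let rec next () =
    if !pos >= len then EOF
    else match !mode with
      | Verbatim ->
        if starts_with s !pos end_tag then begin
          pos := !pos + String.length end_tag;
          mode := Normal;
          END_VERB
        end else begin
          (* Everything up to the closing tag is raw text. *)
          let stop = find s !pos end_tag in
          let raw = String.sub s !pos (stop - !pos) in
          pos := stop;
          RAW raw
        end
      | Normal ->
        if starts_with s !pos begin_tag then begin
          pos := !pos + String.length begin_tag;
          mode := Verbatim;
          BEGIN_VERB
        end else if s.[!pos] = ' ' || s.[!pos] = '\n' then begin
          incr pos;
          next ()
        end else begin
          (* A word runs until whitespace or the opening tag. *)
          let start = !pos in
          while !pos < len && s.[!pos] <> ' ' && s.[!pos] <> '\n'
                && not (starts_with s !pos begin_tag) do
            incr pos
          done;
          WORD (String.sub s start (!pos - start))
        end
  in
  next

(* Collect every token up to and including EOF. *)
let tokens s =
  let next = make_lexer s in
  let rec loop acc =
    match next () with
    | EOF -> List.rev (EOF :: acc)
    | t -> loop (t :: acc)
  in
  loop []
```

Calling tokens on a string with a verbatim section should give the surrounding words as WORD tokens and the section body as a single RAW token between BEGIN_VERB and END_VERB, with the spaces and backslashes inside it left untouched.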