From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <martin.jambon@ens-lyon.org>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by yquem.inria.fr (Postfix) with ESMTP id 5FB3EBC37;
	Wed, 11 Nov 2009 12:04:24 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Aq8AAO8o+kpCbwQckWdsb2JhbACEc5cgAQEBAQkLCgcTBK44kQoGgTCCOFQ
X-IronPort-AV: E=Sophos;i="4.44,722,1249250400"; 
   d="scan'208";a="36552387"
Received: from out4.smtp.messagingengine.com ([66.111.4.28])
  by mail2-smtp-roc.national.inria.fr with ESMTP; 11 Nov 2009 12:04:23 +0100
Received: from compute1.internal (compute1.internal [10.202.2.41])
	by gateway1.messagingengine.com (Postfix) with ESMTP id 72140BEF8A;
	Wed, 11 Nov 2009 06:04:22 -0500 (EST)
Received: from heartbeat1.messagingengine.com ([10.202.2.160])
  by compute1.internal (MEProxy); Wed, 11 Nov 2009 06:04:22 -0500
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed/relaxed; d=messagingengine.com; h=message-id:date:from:mime-version:to:cc:subject:references:in-reply-to:content-type:content-transfer-encoding; s=smtpout; bh=9Ccni/ClvkEKxuQJtUKCD0HNCiU=; b=uoQKT1mdg2JD0lJK1imOwZB5+Y1RjDtK7uK36fosUVrJbC5aNtQ0lf/SpgLWrt1BWEVWfZWcmAsRGasUPZcymhXYuD2XPCbGZbI3AD2d46g3PpsNJldfYiaz+UYn/TS9upMGvcGrQjxkweAe0HgpS+UJbxmcO6wy3UZq0Vqm9Qs=
X-Sasl-enc: ff1lynSqlUfKGnUONlHzP8Zvi762NPLe1QaHA5kcqFnm 1257937462
Received: from [192.168.1.14] (ALyon-157-1-135-142.w90-42.abo.wanadoo.fr [90.42.30.142])
	by mail.messagingengine.com (Postfix) with ESMTPSA id 702CB4ABE45;
	Wed, 11 Nov 2009 06:04:21 -0500 (EST)
Message-ID: <4AFA9A1A.1040808@ens-lyon.org>
Date: Wed, 11 Nov 2009 12:03:54 +0100
From: Martin Jambon <martin.jambon@ens-lyon.org>
User-Agent: Thunderbird 2.0.0.22 (X11/20090908)
MIME-Version: 1.0
To: Dario Teixeira <darioteixeira@yahoo.com>
Cc: Francois.Pottier@inria.fr, Jeff Shaw <shawjef3@msu.edu>,
	caml-list@inria.fr
Subject: Re: [Caml-list] Re: The lexer hack
References: <415678.66831.qm@web111512.mail.gq1.yahoo.com>
In-Reply-To: <415678.66831.qm@web111512.mail.gq1.yahoo.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Spam: no; 0.00; ens-lyon:01 lexer:01 parser:01 lexer:01 tokens:01 lexbuf:01 lexbuf:01 token:01 token:01 wrote:01 verb:01 verb:01 caml-list:01 jambon:01 jambon:01 

Dario Teixeira wrote:
> Hi,
> 
>> Interesting. Have you confirmed that this works? I am slightly
>> worried by the fact that an LR parser reads one token ahead,
>> i.e. one token past BEGIN_VERB might already have been read
>> before the enter_verb semantic action is executed. If that is
>> so, then this token would be read while the lexer is still in
>> the wrong mode.
> 
> Yes, I was just thinking about that as well... :-)
> I think I can pile another hack on top of the dummy action:
> dummy tokens to take care of the readahead issue.  Though
> this has the potential to get comically silly pretty quickly!
> 
> I'll report later...

If the lexer to use can be determined by only one token (BEGIN_VERB), I think
you can change the state in the lexer like this:

rule token state = parse
 ""   { match !state with
             `Normal -> normal_token state lexbuf
           | `Verbatim -> verbatim_token state lexbuf
      }

and normal_token state = parse
  ...
| "\\begin{verbatim}"   { state := `Verbatim; BEGIN_VERB }

and verbatim_token state = parse
  ...                  { RAW (...) }
| "\\end{verbatim}"    { state := `Normal; END_VERB }


An even simpler option, if possible in your case, is to use a single token for
the whole verbatim section:

rule token = parse
  ...
| "\\begin{verbatim}"   { finish_verbatim lexbuf }

and finish_verbatim = shortest
  _* as s "\\end{verbatim}"   { RAW s }


Martin

-- 
http://mjambon.com/