From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dra-news@metastack.com>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38])
	by yquem.inria.fr (Postfix) with ESMTP id 03A9EBC0A
	for <caml-list@yquem.inria.fr>; Fri,  6 Apr 2007 09:40:54 +0200 (CEST)
Received: from orion.metastack.com (no-dns-yet.demon.co.uk [80.177.38.218] (may be forged))
	by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l367eqR1025291
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL)
	for <caml-list@yquem.inria.fr>; Fri, 6 Apr 2007 09:40:53 +0200
Received: from treble (cpc2-cmbg6-0-0-cust535.cmbg.cable.ntl.com [81.107.34.24])
	(authenticated bits=0)
	by orion.metastack.com (8.13.4/8.13.3) with ESMTP id l366lso4002853
	(version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NO)
	for <caml-list@yquem.inria.fr>; Fri, 6 Apr 2007 07:47:55 +0100
From: "David Allsopp" <dra-news@metastack.com>
To: <caml-list@yquem.inria.fr>
References: <20070405205804.90509BC76@yquem.inria.fr>
Subject: Re: [Caml-list] Matching start of input in lexer created with ocamllex
Date: Fri, 6 Apr 2007 08:40:51 +0100
Organization: MetaStack Solutions Ltd.
Message-ID: <011c01c7781e$e74a80c0$6a7ba8c0@treble>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: Microsoft Office Outlook 11
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3028
Thread-Index: Acd3vfgzUDTu/k4nRsuLUUQPZQW4ygAXr+CQ
In-Reply-To: <20070405205804.90509BC76@yquem.inria.fr>
X-Miltered: at discorde with ID 4615F985.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)!
X-Spam: no; 0.00; lexer:01 ocamllex:01 lexer:01 ocamllex:01 simulate:01 bool:01 lexers:01 reuse:01 lexers:01 token:01 caml-list:01 matched:01 matched:01 lex:01 ident:02 

> Hi,
>
> I'd like to match the beginning of input (or beginning of line) in my
> lexer.  Is there an easy way to do that?
>
> I have a lexer that looks something like this (simplified):
>
> rule initial = parse
>   | '!' [' ' '\t']* "for" { FOR (current_loc ()) }
>   | ident as id { IDENT (id, current_loc ()) }
>   | '!' { BANG (current_loc ()) }
>
> The !for token should only be matched at the beginning of a
> line/input.  However, in the above lexer, there's nothing that
> prevents !for from being matched in the middle of an input string.
> This causes a problem: An input string containing !forbidXyz will be
> lexed FOR, IDENT "bidXyz".  I'd like to lex it as BANG, IDENT
> "forbidXyz".

Ocamllex doesn't have a notion for beginning of line. Three possible
solutions:

1. You can simulate it with a bool ref parameter to your lexer that gets set
to true by each rule to indicate that you're no longer at the beginning of a
line - the "!for" rule than raises Failure if it matches when this ref is
true. Slightly tedious for code maintenance... 
2. You use two lexers - one with the "!for" rule and one without and call
one lexer from the other (not very nice, because ocamllex doesn't support
code reuse between lexers so you'll be duplicating a lot of code).
3. Pre-process the input to ocamllex to include a special character that
cannot appear in your text and place that at the beginning of the line (e.g.
one of the control characters in 0..31).

HTH,


David