From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <skaller@users.sourceforge.net>
X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.1.3
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39])
	by yquem.inria.fr (Postfix) with ESMTP id 8FEF1BC6E
	for <caml-list@yquem.inria.fr>; Thu,  5 Apr 2007 21:55:07 +0200 (CEST)
Received: from ipmail02.adl2.internode.on.net (ipmail02.adl2.internode.on.net [203.16.214.141])
	by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l35Jt4MC012609
	for <caml-list@inria.fr>; Thu, 5 Apr 2007 21:55:06 +0200
X-IronPort-AV: E=Sophos;i="4.14,379,1170595800"; 
   d="scan'208";a="106862734"
Received: from ppp36-111.lns2.syd6.internode.on.net (HELO [192.168.1.201]) ([59.167.36.111])
  by ipmail02.adl2.internode.on.net with ESMTP; 06 Apr 2007 05:25:03 +0930
Subject: Re: [Caml-list] Matching start of input in lexer created with
	ocamllex
From: skaller <skaller@users.sourceforge.net>
To: Janne Hellsten <jjhellst@gmail.com>
Cc: caml-list@inria.fr
In-Reply-To: <700d600f0704050737r3ea45a16gb318ac7acf8e3178@mail.gmail.com>
References: <700d600f0704050737r3ea45a16gb318ac7acf8e3178@mail.gmail.com>
Content-Type: text/plain
Date: Fri, 06 Apr 2007 05:55:01 +1000
Message-Id: <1175802901.5274.14.camel@rosella.wigram>
Mime-Version: 1.0
X-Mailer: Evolution 2.8.1 
Content-Transfer-Encoding: 7bit
X-Miltered: at concorde with ID 46155418.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)!
X-Spam: no; 0.00; lexer:01 ocamllex:01 lexer:01 tokens:01 recursive:01 lexbuf:01 sourceforge:01 wrote:01 token:01 caml-list:01 matched:01 matched:01 tail:01 lex:01 ident:02 

On Thu, 2007-04-05 at 17:37 +0300, Janne Hellsten wrote:
> Hi,
> 
> I'd like to match the beginning of input (or beginning of line) in my
> lexer.  Is there an easy way to do that?
> 
> I have a lexer that looks something like this (simplified):
> 
> rule initial = parse
>   | '!' [' ' '\t']* "for" { FOR (current_loc ()) }
>   | ident as id { IDENT (id, current_loc ()) }
>   | '!' { BANG (current_loc ()) }
> 
> The !for token should only be matched at the beginning of a
> line/input.  However, in the above lexer, there's nothing that
> prevents !for from being matched in the middle of an input string.
> This causes a problem: An input string containing !forbidXyz will be
> lexed FOR, IDENT "bidXyz".  I'd like to lex it as BANG, IDENT
> "forbidXyz".

I do something like this:

let table = ["for", FOR; "while", WHILE]
..
| space-not-newline + { WHITE }
| newline { NEWLINE }
| ident as id { try assoc id table with Not_found -> IDENT id }

An alternative to the WHITE and NEWLINE tokens is a tail
recursive call to the lexer:

| space + { initial lexbuf }

which just skips over the spaces.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net