Subject: Re: [Caml-list] Matching start of input in lexer created with ocamllex
From: skaller
To: Janne Hellsten
Cc: caml-list@inria.fr
In-Reply-To: <700d600f0704050737r3ea45a16gb318ac7acf8e3178@mail.gmail.com>
Date: Fri, 06 Apr 2007 05:55:01 +1000
Message-Id: <1175802901.5274.14.camel@rosella.wigram>

On Thu, 2007-04-05 at 17:37 +0300, Janne Hellsten wrote:
> Hi,
>
> I'd like to match the beginning of input (or beginning of line) in my
> lexer.  Is there an easy way to do that?
>
> I have a lexer that looks something like this (simplified):
>
> rule initial = parse
>   | '!' [' ' '\t']* "for"  { FOR (current_loc ()) }
>   | ident as id            { IDENT (id, current_loc ()) }
>   | '!'                    { BANG (current_loc ()) }
>
> The !for token should only be matched at the beginning of a
> line/input.  However, in the above lexer, there's nothing that
> prevents !for from being matched in the middle of an input string.
> This causes a problem: An input string containing !forbidXyz will be
> lexed FOR, IDENT "bidXyz".  I'd like to lex it as BANG, IDENT
> "forbidXyz".

I do something like this:

let table = ["for", FOR; "while", WHILE]
..
  | space-not-newline + { WHITE }
  | newline { NEWLINE }
  | ident as id { try assoc id table with Not_found -> IDENT id }

An alternative to the WHITE and NEWLINE tokens is a tail
recursive call to the lexer:

  | space + { initial lexbuf }

which just skips over the spaces.

-- 
John Skaller
Felix, successor to C++: http://felix.sf.net
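
The keyword-table lookup described above can be tried outside ocamllex as plain OCaml. This is only a minimal sketch: the token type below is an assumption (the tokens in the original lexer also carry a location), the helper name token_of_ident is hypothetical, and assoc is spelled List.assoc on the guess that the List module was opened in the original. Because ocamllex takes the longest match, the ident rule sees "forbidXyz" as a single lexeme, so the table lookup fails and it stays an IDENT instead of being split into FOR and IDENT "bidXyz".

```ocaml
(* Assumed token type for illustration; the real lexer's tokens may differ. *)
type token =
  | FOR
  | WHILE
  | IDENT of string
  | WHITE
  | NEWLINE

(* Keyword table, as in the message above. *)
let table = ["for", FOR; "while", WHILE]

(* Hypothetical helper: the body of the action for the `ident as id` rule.
   Keywords come from the table; anything else is an identifier. *)
let token_of_ident (id : string) : token =
  try List.assoc id table with Not_found -> IDENT id

let () =
  (* An exact keyword maps to its keyword token. *)
  assert (token_of_ident "for" = FOR);
  (* A longer identifier that merely starts with "for" is left whole. *)
  assert (token_of_ident "forbidXyz" = IDENT "forbidXyz")
```

With this arrangement the lexer itself needs no "!for" rule: '!' is always lexed as BANG, and whether a following FOR counts as a directive can be decided later from the surrounding NEWLINE and WHITE tokens.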