From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <weis@pauillac.inria.fr>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78])
	by yquem.inria.fr (Postfix) with ESMTP id 670A0BB81
	for <caml-list@yquem.inria.fr>; Sun, 23 Oct 2005 20:02:38 +0200 (CEST)
Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35])
	by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j9NI2blU032522
	for <caml-list@yquem.inria.fr>; Sun, 23 Oct 2005 20:02:37 +0200
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id UAA10337 for <caml-list@pauillac.inria.fr>; Sun, 23 Oct 2005 20:02:35 +0200 (MET DST)
Received: from mz2.forethought.net (mzpi4.forethought.net [216.241.36.13])
	by nez-perce.inria.fr (8.13.0/8.13.0) with ESMTP id j9NI2Xo2032519
	(version=TLSv1/SSLv3 cipher=EDH-RSA-DES-CBC3-SHA bits=168 verify=NO)
	for <caml-list@inria.fr>; Sun, 23 Oct 2005 20:02:35 +0200
Received: from [216.241.35.41] (helo=[10.0.0.2])
	by mz2.forethought.net with esmtp (Exim 4.51)
	id 1ETkAx-0005ts-DC
	for caml-list@inria.fr; Sun, 23 Oct 2005 12:02:31 -0600
Message-ID: <435BD045.1000305@gushee.net>
Date: Sun, 23 Oct 2005 12:02:45 -0600
From: Matt Gushee <matt@gushee.net>
User-Agent: Mozilla Thunderbird 1.0.7 (X11/20051002)
X-Accept-Language: en-us, en
MIME-Version: 1.0
To: caml-list@inria.fr
Subject: Ocamllex question
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Miltered: at nez-perce with ID 435BD03D.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Miltered: at nez-perce with ID 435BD039.001 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)!
X-Spam: no; 0.00; ocamllex:01 lexer:01 lexeme:01 lexbuf:01 lexbuf:01 lexing:01 lexeme:01 syntax:01 separators:01 buffer:01 parser:01 emit:01 dict:01 dict:01 declaration:02 
X-Spam-Checker-Version: SpamAssassin 3.0.3 (2005-04-27) on yquem.inria.fr
X-Spam-Level: 
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled 
	version=3.0.3

Hello, people--

In a lexer definition with two or more entry points, is there a way to
emit a lexeme and pass control to another entrypoint in one action?

The specific problem I am trying to deal with is a configuration file
format that includes comments denoted with an initial '#' character. I
would like to support the typical usage of '#', where a comment may
begin either at the beginning of the line, or after a declaration that I
want to capture, and in either case it extends to the end of the line.

So in general, anything after '#' up to the end of a line should be
ignored, which I think requires a separate 'comment' entrypoint. At the
end of the line, control returns to the main entry point. So my first
cut looks like this:

  rule dict = parse
      [' ']                             { dict lexbuf }
    | '#'                               { comment lexbuf }
    | word                              { WORD (Lexing.lexeme lexbuf) }
    | ':'                               { COLON }
    | '{'                               { DS }
    | '}'                               { DE }
    | ',' | '\n'                        { SEP }
    | eof                               { EOF }
  and comment = parse
      [ ^ '\n' ]                        { comment lexbuf }
    | '\n'                              { dict lexbuf }

So far so good. BUT, for the sake of simplicity (for users, not for me
;-)), my syntax has line endings as separators, and in order to support
comments following non-comments on the same line, a line ending after a
comment should be interpreted as a separator. So what I want to do is
something like:

  and comment = parse
      [ ^ '\n' ]                        { comment lexbuf }
    | '\n'                              { SEP; dict lexbuf }

But that doesn't work, of course. Maybe the solution is to push SEP back
onto the head of the buffer, but I don't see a way to do that.

Or would it be better to simply tag the comment text with, say, a
COMMENT symbol and pass it through to the parser?

--
Matt Gushee
Englewood, CO, USA