From: Matt Gushee
Date: Sun, 23 Oct 2005 12:02:45 -0600
To: caml-list@inria.fr
Subject: Ocamllex question
Message-ID: <435BD045.1000305@gushee.net>

Hello, people--

In a lexer definition with two or more entry points, is there a way to
emit a lexeme and pass control to another entry point in one action?

The specific problem I am trying to deal with is a configuration file
format that includes comments denoted with an initial '#' character. I
would like to support the typical usage of '#', where a comment may
begin either at the beginning of the line or after a declaration that I
want to capture, and in either case extends to the end of the line.

So in general, anything after '#' up to the end of a line should be
ignored, which I think requires a separate 'comment' entry point. At the
end of the line, control returns to the main entry point. So my first
cut looks like this:

    rule dict = parse
        [' ']         { dict lexbuf }
      | '#'           { comment lexbuf }
      | word          { WORD (Lexing.lexeme lexbuf) }
      | ':'           { COLON }
      | '{'           { DS }
      | '}'           { DE }
      | ',' | '\n'    { SEP }
      | eof           { EOF }

    and comment = parse
        [ ^ '\n' ]    { comment lexbuf }
      | '\n'          { dict lexbuf }

So far so good. BUT, for the sake of simplicity (for users, not for me
;-)), my syntax has line endings as separators, and in order to support
comments following non-comments on the same line, a line ending after a
comment should be interpreted as a separator. So what I want to do is
something like:

    and comment = parse
        [ ^ '\n' ]    { comment lexbuf }
      | '\n'          { SEP; dict lexbuf }

But that doesn't work, of course: the action is an ordinary OCaml
sequence, so SEP is evaluated and discarded, and only the result of
'dict lexbuf' is returned. Maybe the solution is to push SEP back onto
the head of the buffer, but I don't see a way to do that. Or would it be
better to simply tag the comment text with, say, a COMMENT symbol and
pass it through to the parser?

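One direction that might be worth trying, offered as a rough sketch rather than a tested answer: since the parser asks the main entry point for every token, the comment rule may not need to hand control back at all. It could simply return SEP when it reaches the newline, and the parser's next request would go to 'dict' again. In the .mll file that would presumably amount to writing '\n' { SEP } in the comment rule. The plain OCaml below is a hand-written stand-in for the two entry points, only to illustrate that control flow. The token type, the is_word_char helper (the definition of 'word' is not shown above) and the 'tokens' driver are all assumptions made for the example.

```ocaml
(* Assumed token type, mirroring the token names used in the rules above. *)
type token = WORD of string | COLON | DS | DE | SEP | EOF

(* Hypothetical definition of a word character; the real 'word' regexp
   is not given in the message. *)
let is_word_char c =
  match c with
  | 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' -> true
  | _ -> false

(* Stand-in for the 'comment' entry point: skip to the end of the line
   and return SEP for the newline itself. It never calls dict. *)
let rec comment s pos =
  if !pos >= String.length s then EOF
  else begin
    let c = s.[!pos] in
    incr pos;
    if c = '\n' then SEP else comment s pos
  end

(* Stand-in for the 'dict' entry point; pos plays the role of lexbuf. *)
let rec dict s pos =
  if !pos >= String.length s then EOF
  else begin
    let c = s.[!pos] in
    match c with
    | ' ' -> incr pos; dict s pos
    | '#' -> incr pos; comment s pos
    | ':' -> incr pos; COLON
    | '{' -> incr pos; DS
    | '}' -> incr pos; DE
    | ',' | '\n' -> incr pos; SEP
    | _ when is_word_char c ->
      let start = !pos in
      while !pos < String.length s && is_word_char s.[!pos] do
        incr pos
      done;
      WORD (String.sub s start (!pos - start))
    | _ -> failwith (Printf.sprintf "unexpected character %C" c)
  end

(* Stand-in for the parser: it always asks dict for the next token. *)
let tokens s =
  let pos = ref 0 in
  let rec loop acc =
    match dict s pos with
    | EOF -> List.rev (EOF :: acc)
    | t -> loop (t :: acc)
  in
  loop []
```

With this arrangement a trailing comment yields exactly one SEP for its line. The sketch also returns EOF if the input ends inside a comment, a case the comment rule above does not cover.
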
-- 
Matt Gushee
Englewood, CO, USA