From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id QAA20340 for caml-red; Fri, 2 Feb 2001 16:23:30 +0100 (MET)
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id PAA00060 for ; Thu, 1 Feb 2001 15:22:16 +0100 (MET)
Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35]) by concorde.inria.fr (8.11.1/8.10.0) with ESMTP id f11EM7n08847; Thu, 1 Feb 2001 15:22:07 +0100 (MET)
Received: (from xleroy@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id PAA20033; Thu, 1 Feb 2001 15:22:06 +0100 (MET)
Date: Thu, 1 Feb 2001 15:22:06 +0100
From: Xavier Leroy
To: Laurent Reveillere
Cc: caml-list@inria.fr
Subject: Re: ocamlyacc/ocamllex problems
Message-ID: <20010201152206.A30653@pauillac.inria.fr>
References: <3A784863.78765C0@labri.u-bordeaux.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <3A784863.78765C0@labri.u-bordeaux.fr>; from Laurent.Reveillere@labri.u-bordeaux.fr on Wed, Jan 31, 2001 at 06:16:19PM +0100
Sender: weis@pauillac.inria.fr

> I am writing a parser that uses Parsing.rhs_start and Parsing.rhs_end
> in a rule. The problem is the following:
>
> 1) If I use a simple rule in the lexer that matches the token, all is
>    fine. Ex:
>
>      | "'" ['0' '1' '*' '.']+ "'"   { ... }
>
> 2) If I use an automaton in the lexer to match the same token, the
>    results of Parsing.rhs_start and Parsing.rhs_end are wrong. Ex:
>
>      | "'"                  { ... bits lexbuf ... }
>    and bits = parse
>      | '\''                 { ... }
>      | ['0' '1' '.' '*' ]   { ... }
>      | eof                  { ... }
>      | _                    { ... }
>
> I am not sure I understand the reason for this problem.

For terminal symbols (tokens), the locations returned by
Parsing.rhs_start and Parsing.rhs_end are those returned by
Lexing.lexeme_start and Lexing.lexeme_end.  However, these two
functions track the location of the *last* regular expression matched
by the ocamllex-generated automaton.  (This location is stored and
updated in place in the "lexbuf" argument.)

So, if your lexing rule recursively calls other lexing rules (as in
case 2 above), the locations reported correspond to the part of the
token that was last matched by a regular expression (i.e. the last
"bit" of the token in your example 2).

To get correct locations in example 2, a bit of "lexbuf" hacking is
required to restore the start location to what it was when the first
regexp was matched:

  | "'" { let start = Lexing.lexeme_start lexbuf in
          let res = ... bits lexbuf ... in
          lexbuf.Lexing.lex_start_pos <- start - lexbuf.Lexing.lex_abs_pos;
          res }
and bits = parse ...

Hope this helps,

- Xavier Leroy
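
[Editorial note, not part of the original message: below is a minimal,
self-contained sketch of how the workaround above might look in a
complete ocamllex rule.  The BITSTRING token, the Buffer-based
accumulation, and the rule names are illustrative assumptions; only the
lex_start_pos / lex_abs_pos manipulation comes from the message itself.]

  (* Hypothetical bitstring.mll: tokens like '0110.*' scanned by a
     recursive rule, with the start location restored at the end so
     Lexing.lexeme_start points at the opening quote. *)
  {
  type token = BITSTRING of string | EOF
  }

  rule token = parse
    | [' ' '\t' '\n']+
        { token lexbuf }                      (* skip whitespace *)
    | "'"
        { (* remember where the whole token started *)
          let start = Lexing.lexeme_start lexbuf in
          let buf = Buffer.create 16 in
          bits buf lexbuf;                    (* consume up to the closing quote *)
          (* restore the start location: lexeme_start now reports the
             opening quote again, while lexeme_end is the closing quote *)
          lexbuf.Lexing.lex_start_pos <- start - lexbuf.Lexing.lex_abs_pos;
          BITSTRING (Buffer.contents buf) }
    | eof
        { EOF }

  and bits buf = parse
    | '\''
        { () }                                (* closing quote: stop *)
    | ['0' '1' '.' '*']
        { Buffer.add_string buf (Lexing.lexeme lexbuf); bits buf lexbuf }
    | eof
        { failwith "unterminated bit string" }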