caml-list - the Caml user's mailing list
* [Caml-list] Lexing.lexeme_start_p broken?
@ 2004-09-17 22:07 Scott Duckworth
From: Scott Duckworth @ 2004-09-17 22:07 UTC
  To: caml-list

I can't seem to get the function Lexing.lexeme_start_p to return a 
position with correct information in it.  Here is my code (test.mll):

{ open Lexing }
rule scan = parse
    eof { raise End_of_file }
    | _ as x
        {
            let pos = lexeme_start_p lexbuf in
            Printf.printf "ASCII %d at line %d col %d\n" (int_of_char x)
              pos.pos_lnum pos.pos_bol;
            scan lexbuf
        }
{ try scan (from_channel stdin) with End_of_file -> () }

I do the following, but I always get incorrect output:

[duckwos@chef]$ ocamllex test.mll
3 states, 257 transitions, table size 1046 bytes
[duckwos@chef]$ ocamlc -o test test.ml
[duckwos@chef]$ ./test << EOF
 > position
 > does not
 > change
 > EOF
ASCII 112 at line 1 col 0
ASCII 111 at line 1 col 0
ASCII 115 at line 1 col 0
ASCII 105 at line 1 col 0
ASCII 116 at line 1 col 0
ASCII 105 at line 1 col 0
ASCII 111 at line 1 col 0
ASCII 110 at line 1 col 0
ASCII 10 at line 1 col 0
ASCII 100 at line 1 col 0
ASCII 111 at line 1 col 0
ASCII 101 at line 1 col 0
ASCII 115 at line 1 col 0
ASCII 32 at line 1 col 0
ASCII 110 at line 1 col 0
ASCII 111 at line 1 col 0
ASCII 116 at line 1 col 0
ASCII 10 at line 1 col 0
ASCII 99 at line 1 col 0
ASCII 104 at line 1 col 0
ASCII 97 at line 1 col 0
ASCII 110 at line 1 col 0
ASCII 103 at line 1 col 0
ASCII 101 at line 1 col 0
ASCII 10 at line 1 col 0
[duckwos@chef]$

Any ideas why this is happening?  I get the same results even if I am 
reading from an actual file opened with Lexing.from_channel (open_in 
"filename").

One more thing.  How does the pos_fname field in the position type get 
its value if there is no way for the Lexing module to know the file 
name?  Am I missing something here?

Thanks in advance!

-- 
Scott Duckworth

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* Re: [Caml-list] Lexing.lexeme_start_p broken?
@ 2004-09-20  9:23 ` Jean-Christophe Filliatre
From: Jean-Christophe Filliatre @ 2004-09-20  9:23 UTC
  To: Scott Duckworth; +Cc: caml-list


Scott Duckworth writes:
 > I can't seem to get the function Lexing.lexeme_start_p to return a 
 > position with correct information in it.  Here is my code (test.mll):

In the Ocaml manual, in the documentation of the Lexing module, you
can read:

 "Note that the lexing engine will only manage the pos_cnum field of
  lex_curr_p by updating it with the number of characters read since
  the start of the lexbuf. For the other fields to be accurate, they
  must be initialised before the first use of the lexbuf, and updated
  by the lexer actions."

(below the "type lexbuf = ..."). To update these fields, the best way
is to look into the OCaml sources to see how this is done in OCaml's own
lexer. In file parsing/lexer.mll the function update_loc does the
job, being called each time a newline character is read. It is quite
complicated, because it handles many different things at the same
time, but to update the fields pos_lnum and pos_bol, it can be
simplified to:

  let update_loc lexbuf =
    let pos = lexbuf.lex_curr_p in
    lexbuf.lex_curr_p <- 
      { pos with pos_lnum = pos.pos_lnum + 1; pos_bol = pos.pos_cnum }

then you call this function for each newline in your lexer actions, e.g.

  | '\n' 
      { update_loc lexbuf; token lexbuf }
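A self-contained sketch of the above in plain OCaml, using only the standard Lexing module. `update_loc` is the simplified function quoted above; `set_fname` and `column` are hypothetical helpers added for illustration. They cover two points from the original question: a fresh lexbuf starts with pos_fname = "" and pos_lnum = 1 (which is why every character reports line 1), so the file name has to be stored by hand, and pos_bol is the offset of the beginning of the line rather than a column.

```ocaml
open Lexing

(* Simplified update_loc: call it from the action that matches '\n'.
   By then the engine has moved lex_curr_p.pos_cnum past the newline,
   so pos_cnum is the offset of the first character of the next line. *)
let update_loc (lexbuf : lexbuf) : unit =
  let pos = lexbuf.lex_curr_p in
  lexbuf.lex_curr_p <-
    { pos with pos_lnum = pos.pos_lnum + 1; pos_bol = pos.pos_cnum }

(* Hypothetical helper: the engine never fills in pos_fname, so store
   the name once, before the first token is read. Later positions keep
   it because the engine copies lex_curr_p into lex_start_p on each match. *)
let set_fname (lexbuf : lexbuf) (fname : string) : unit =
  lexbuf.lex_curr_p <- { lexbuf.lex_curr_p with pos_fname = fname }

(* Hypothetical helper: 0-based column of a position. pos_bol on its own
   is the offset of the line start, not a column. *)
let column (pos : position) : int = pos.pos_cnum - pos.pos_bol
```

In test.mll this would mean adding a `'\n'` case ahead of `_ as x` whose action calls `update_loc lexbuf` before `scan lexbuf`, calling `set_fname` right after `from_channel`, and printing `column pos` instead of `pos.pos_bol`. Later OCaml releases (3.11 onwards) also provide `Lexing.new_line`, which performs the same pos_lnum/pos_bol update.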

Hope this helps,
-- 
Jean-Christophe Filliâtre (http://www.lri.fr/~filliatr)


* Re: [Caml-list] Lexing.lexeme_start_p broken?
@ 2004-09-20 14:44   ` skaller
From: skaller @ 2004-09-20 14:44 UTC
  To: Jean-Christophe Filliatre; +Cc: caml-list

On Mon, 2004-09-20 at 19:23, Jean-Christophe Filliatre wrote:

> simplified to 
> 
>   let update_loc lexbuf =
>     let pos = lexbuf.lex_curr_p in
>     lexbuf.lex_curr_p <- 
>       { pos with pos_lnum = pos.pos_lnum + 1; pos_bol = pos.pos_cnum }
> 
> then you call this function for each newline in your lexer actions, e.g.
> 
>   | '\n' 
>       { update_loc lexbuf; token lexbuf }
> 
> Hope this helps,

How does that help, if the tokeniser isn't using the lexbuf?
Here's my parser:

let parse_tokens (parser:'a parser_t) (tokens: Flx_parse.token list) = 
  let toker = (new tokeniser tokens) in
  try 
    parser (toker#token_src) (Lexing.from_string "dummy" )
  with _ ->
    toker#report_syntax_error;
    raise (Flx_exceptions.ParseError "Parsing Tokens")

The token supplying function never looks at the lexbuf.
The parser does, to report errors, so I have to trash
the parser exceptions, since the locations are wrong.

-- 
John Skaller, mailto:skaller@users.sf.net
voice: 061-2-9660-0850, 
snail: PO BOX 401 Glebe NSW 2037 Australia
Checkout the Felix programming language http://felix.sf.net




* Re: [Caml-list] Lexing.lexeme_start_p broken?
@ 2004-09-21  8:26     ` Damien Doligez
From: Damien Doligez @ 2004-09-21  8:26 UTC
  To: skaller; +Cc: caml-list

On Sep 20, 2004, at 16:44, skaller wrote:

> How does that help, if the tokeniser isn't using the lexbuf?
> Here's my parser:
>
> let parse_tokens (parser:'a parser_t) (tokens: Flx_parse.token list) =
>   let toker = (new tokeniser tokens) in
>   try
>     parser (toker#token_src) (Lexing.from_string "dummy" )
>   with _ ->
>     toker#report_syntax_error;
>     raise (Flx_exceptions.ParseError "Parsing Tokens")
>
> The token supplying function never looks at the lexbuf.
> The parser does, to report errors, so I have to trash
> the parser exceptions, since the locations are wrong.

The token supplying function is supposed to _update_ the lexbuf, if
you want the parser to report the correct locations.  ocamllex does
some of the work by updating the char count, the rest is up to the
lexer itself.
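Applied to a pre-collected token list like skaller's, that could look like the sketch below. It assumes each token was stored together with its start and end positions; the `located` record and `supplier_of_list` are hypothetical names. It relies on the lexbuf's mutable lex_start_p and lex_curr_p fields, which, as far as I know, are what ocamlyacc-generated parsers read after each token to compute locations.

```ocaml
open Lexing

(* Hypothetical: a token recorded together with the span it covered. *)
type 'tok located = { tok : 'tok; start_p : position; end_p : position }

(* Build a token-supplying function from a pre-collected list. Before
   returning each token it writes that token's span into the lexbuf the
   parser was given. [eof] is returned once the list is exhausted; the
   lexbuf then keeps the span of the last real token. *)
let supplier_of_list (eof : 'tok) (toks : 'tok located list) : lexbuf -> 'tok =
  let remaining = ref toks in
  fun lexbuf ->
    match !remaining with
    | [] -> eof
    | t :: rest ->
        remaining := rest;
        lexbuf.lex_start_p <- t.start_p;
        lexbuf.lex_curr_p <- t.end_p;
        t.tok
```

The dummy lexbuf in skaller's code could stay as it is; only the supplier changes, e.g. `parser (supplier_of_list eof_token toks) (Lexing.from_string "")`, where `eof_token` stands for whatever end-of-input token the grammar uses.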

-- Damien


* Re: [Caml-list] Lexing.lexeme_start_p broken?
@ 2004-09-21  9:25       ` skaller
From: skaller @ 2004-09-21  9:25 UTC
  To: Damien Doligez; +Cc: caml-list

On Tue, 2004-09-21 at 18:26, Damien Doligez wrote:
> On Sep 20, 2004, at 16:44, skaller wrote:

> The token supplying function is supposed to _update_ the lexbuf, if
> you want the parser to report the correct locations.  ocamllex does
> some of the work by updating the char count, 

Not in my case it doesn't. 
The lexing function isn't an ocamllex lexer.
So I'd have to update the char count too.

I actually am using ocamllex, but I drive it manually
to collect a token list, then feed the list to the parser.
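One way to avoid recomputing the char count later is to capture positions during that manual ocamllex pass. A minimal sketch, assuming `lexer` is an ocamllex-generated rule and `is_eof` recognises the end-of-input token (both hypothetical parameters): lexeme_start_p and lexeme_end_p are read right after each call, before the next call overwrites them.

```ocaml
open Lexing

(* Hypothetical collector: run [lexer] repeatedly and pair each token with
   the positions of its lexeme. Stops after the token for which [is_eof]
   holds; that token is kept so the parser still sees end of input.
   pos_cnum is maintained by the engine; pos_lnum/pos_bol are only right
   if the rule's newline action updates them (update_loc style). *)
let collect (lexer : lexbuf -> 'tok) (is_eof : 'tok -> bool) (lexbuf : lexbuf)
    : ('tok * position * position) list =
  let rec loop acc =
    let t = lexer lexbuf in
    let item = (t, lexeme_start_p lexbuf, lexeme_end_p lexbuf) in
    if is_eof t then List.rev (item :: acc) else loop (item :: acc)
  in
  loop []
```

The resulting list carries everything a token supplier needs to restore the lexbuf positions when the tokens are later fed to the parser.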

Hmmm.. OK, how is this for an idea:

Suppose we add to the lexbuf a mutable field of type:

	lexbuf -> loc

which returns the location information the parser needs
given a lexbuf. The parser then fetches the location
information by calling this function on the lexbuf
from which it was obtained.

I can then provide a function which accepts my own
state object and curry it. This way, I don't have
to keep updating the lexbuf, and the parser cannot
see the lexbuf details. My routine might be
expensive -- but it only gets called once when
there is a parse error, not every token.

Whilst I don't think this is a perfect solution,
it does seem to partially decouple the parser from
the lexbuf by at least abstracting it using a function.

Would this interfere with the Ocaml bootstrap?

*** a better solution might be to pass this function
directly to the parser, thereby decoupling it
entirely from the lexer. However that changes the
type of parser functions. That can easily be fixed
though -- just make a compatibility wrapper which 
calls the full parser function, passing a default
function value.
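To make the proposal concrete, here is a minimal sketch under stated assumptions: everything in it is hypothetical (a toy grammar of integers joined by PLUS, not ocamlyacc output) and it only illustrates the shape of the idea. The full entry point receives its token source and its location function as plain closures and calls `loc` only when it hits a syntax error; the compatibility wrapper rebuilds the lexbuf-based interface on top of it.

```ocaml
(* Hypothetical toy grammar: INT (PLUS INT)* EOF. *)
type token = INT of int | PLUS | EOF

type loc = Lexing.position * Lexing.position

exception Parse_error of loc

(* "Full" parser: tokens and locations come from two closures, no lexbuf.
   [loc] is called only on a syntax error, so it may be expensive. *)
let parse_sum ~(next : unit -> token) ~(loc : unit -> loc) : int =
  let expect_int () =
    match next () with
    | INT n -> n
    | PLUS | EOF -> raise (Parse_error (loc ()))
  in
  let rec rest acc =
    match next () with
    | PLUS -> rest (acc + expect_int ())
    | EOF -> acc
    | INT _ -> raise (Parse_error (loc ()))
  in
  rest (expect_int ())

(* Compatibility wrapper with the usual (lexbuf -> token) -> lexbuf -> result
   shape: the default location function reads the lexbuf's current lexeme. *)
let parse_sum_lexbuf (lexer : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : int =
  parse_sum
    ~next:(fun () -> lexer lexbuf)
    ~loc:(fun () -> (Lexing.lexeme_start_p lexbuf, Lexing.lexeme_end_p lexbuf))
```

A token-list driver would instead pass `~next` from its own tokeniser and a `~loc` that consults its own state, without touching any lexbuf.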

If there was any interest, I could probably provide
a design which provided proper decoupling, whilst
retaining compatibility using wrappers and defaults.
[But I imagine it could also be done easily by someone
on the Ocaml team -- and throw in a user state object
at the same time please, as has been done for the lexer :]

The parser does need to get tokens, and it may need
location information, but it should not
depend on an object whose principal purpose is 
to support lexing.

In theory this is also true for generated lexers:
they shouldn't depend on any lexbufs. However for
performance reasons, abstracting a character source
probably isn't tolerable.

