caml-list - the Caml user's mailing list
* [Caml-list] updating position in an ocamllex lexer
@ 2015-07-28 15:56 Sébastien Hinderer
  2015-07-30  8:21 ` Leo White
  0 siblings, 1 reply; 3+ messages in thread
From: Sébastien Hinderer @ 2015-07-28 15:56 UTC (permalink / raw)
  To: caml-list

Dear all,

I have to maintain a lexer which did not, so far, compute token
positions during the lexing step. The positions were computed
later, which was not very efficient. So I am now trying to compute
positions during lexing and am having trouble figuring out what
to do for one token:

  | ['\n'] [' ' '\t' '\r' '\011' '\012' ]* { ... }

When this token is met and contains just "\n", calling new_line on the
lexbuf is enough to keep further position information accurate.

However, if the recognized token is, say, "\n\t", then it seems that the
column for further tokens will be incorrect.

I assumed that I had to manually update the current position and added
code like so:

  let s = Lexing.lexeme lexbuf in
  let l = String.length s in
  let t = TCommentNewline (tokinfo lexbuf) in
  (* Adjust the column manually *)
  Lexing.new_line lexbuf;
  let lcp = lexbuf.lex_curr_p in
  lexbuf.lex_curr_p <- { lcp with
    pos_cnum = lcp.pos_bol + l - 1;
  };
  t

But that does not seem to work.

Does somebody know how such tokens should be handled, please?

Many thanks in advance for any help,

Sébastien.


* Re: [Caml-list] updating position in an ocamllex lexer
  2015-07-28 15:56 [Caml-list] updating position in an ocamllex lexer Sébastien Hinderer
@ 2015-07-30  8:21 ` Leo White
  2015-07-30  8:56   ` Sébastien Hinderer
  0 siblings, 1 reply; 3+ messages in thread
From: Leo White @ 2015-07-30  8:21 UTC (permalink / raw)
  To: caml-list

I think the problem with your attempt to manually update the current
position is that you are adjusting pos_cnum, which is the number of
characters since the beginning of the whole file, not the number of
characters since the beginning of the line. Instead you should be
adjusting pos_bol. For example, if you look at the source for
Lexing.new_line:

  let new_line lexbuf =
    let lcp = lexbuf.lex_curr_p in
    lexbuf.lex_curr_p <- { lcp with
      pos_lnum = lcp.pos_lnum + 1;
      pos_bol = lcp.pos_cnum;
    }

you can see that it adds one to the line number and sets pos_bol
to be the current position.
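
A minimal sketch of the distinction, not taken from the lexer under
discussion: the column of a position is pos_cnum - pos_bol, so after a
token such as "\n\t" a plain new_line leaves pos_bol one character too
far to the right. The helper names (column, new_line_pos) and the
offsets are assumptions made up for illustration.

```ocaml
(* Column of a position: offset from the beginning of its line. *)
let column (p : Lexing.position) = p.Lexing.pos_cnum - p.Lexing.pos_bol

(* The same update as the Lexing.new_line source quoted above,
   applied to a bare position record instead of a lexbuf. *)
let new_line_pos (p : Lexing.position) =
  { p with
    Lexing.pos_lnum = p.Lexing.pos_lnum + 1;
    pos_bol = p.Lexing.pos_cnum }

(* Hypothetical state right after matching "\n\t", with '\n' at absolute
   offset 10 and '\t' at offset 11: pos_cnum already points at offset 12. *)
let after_token =
  { Lexing.pos_fname = "example"; pos_lnum = 1; pos_bol = 0; pos_cnum = 12 }

(* new_line alone puts the line start after the tab (offset 12),
   so the next token would be reported at column 0. *)
let naive = new_line_pos after_token

(* Moving pos_bol back by the one character that follows '\n' puts the
   line start at offset 11; pos_cnum is left untouched. *)
let fixed = { naive with Lexing.pos_bol = naive.Lexing.pos_cnum - 1 }
```

Here column naive is 0 while column fixed is 1, which matches a token
that sits right after a single tab on the new line.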

Regards,

Leo


* Re: [Caml-list] updating position in an ocamllex lexer
  2015-07-30  8:21 ` Leo White
@ 2015-07-30  8:56   ` Sébastien Hinderer
  0 siblings, 0 replies; 3+ messages in thread
From: Sébastien Hinderer @ 2015-07-30  8:56 UTC (permalink / raw)
  To: caml-list

Dear Leo,

Many thanks for your response.

You are absolutely right. pos_cnum was precisely _the_ field not
to modify. By examining the positions more closely I realised what I
had to do (adjusting pos_bol and pos_lnum) and now things work perfectly
well.
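
The final code was not posted to the list. As a hedged sketch of what
adjusting pos_bol and pos_lnum could look like for the rule
['\n'] [' ' '\t' '\r' '\011' '\012']*, assuming the lexeme always starts
with exactly one '\n' and that every following character counts as one
column (the names advance_line and newline_with_trailing are hypothetical):

```ocaml
(* Pure part: given the length of the matched lexeme, bump the line number
   and place the line start just after the leading '\n'.
   pos_cnum is deliberately left alone. *)
let advance_line (lexeme_len : int) (p : Lexing.position) =
  let trailing = lexeme_len - 1 in  (* characters matched after '\n' *)
  { p with
    Lexing.pos_lnum = p.Lexing.pos_lnum + 1;
    pos_bol = p.Lexing.pos_cnum - trailing }

(* Intended to be called from the semantic action of the rule,
   once the token has been matched. *)
let newline_with_trailing (lexbuf : Lexing.lexbuf) =
  let len = String.length (Lexing.lexeme lexbuf) in
  lexbuf.Lexing.lex_curr_p <- advance_line len lexbuf.Lexing.lex_curr_p
```

When the token is just "\n", trailing is 0 and the update is the same
as the one made by Lexing.new_line.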

Thanks again,

Sébastien.
