caml-list - the Caml user's mailing list
* Ocamllex question
@ 2005-10-23 18:02 Matt Gushee
  2005-10-23 20:58 ` [Caml-list] " Michael Wohlwend
  2005-10-23 21:12 ` Another problem (was Re: [Caml-list] Ocamllex question) Matt Gushee
From: Matt Gushee @ 2005-10-23 18:02 UTC
  To: caml-list

Hello, people--

In a lexer definition with two or more entry points, is there a way to
emit a lexeme and pass control to another entrypoint in one action?

The specific problem I am trying to deal with is a configuration file
format that includes comments denoted with an initial '#' character. I
would like to support the typical usage of '#', where a comment may
begin either at the beginning of the line, or after a declaration that I
want to capture, and in either case it extends to the end of the line.

So in general, anything after '#' up to the end of a line should be
ignored, which I think requires a separate 'comment' entrypoint. At the
end of the line, control returns to the main entry point. So my first
cut looks like this:

  rule dict = parse
      [' ']                             { dict lexbuf }
    | '#'                               { comment lexbuf }
    | word                              { WORD (Lexing.lexeme lexbuf) }
    | ':'                               { COLON }
    | '{'                               { DS }
    | '}'                               { DE }
    | ',' | '\n'                        { SEP }
    | eof                               { EOF }
  and comment = parse
      [ ^ '\n' ]                        { comment lexbuf }
    | '\n'                              { dict lexbuf }

So far so good. BUT, for the sake of simplicity (for users, not for me
;-)), my syntax has line endings as separators, and in order to support
comments following non-comments on the same line, a line ending after a
comment should be interpreted as a separator. So what I want to do is
something like:

  and comment = parse
      [ ^ '\n' ]                        { comment lexbuf }
    | '\n'                              { SEP; dict lexbuf }

But that doesn't work, of course. Maybe the solution is to push SEP back
onto the head of the buffer, but I don't see a way to do that.

Or would it be better to simply tag the comment text with, say, a
COMMENT symbol and pass it through to the parser?
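
Since ocamllex turns each entry point into an ordinary OCaml function, `comment lexbuf` in tail position of a `dict` action simply hands back whatever `comment` returns. So one option, assuming the parser always asks `dict` for the next token, is for `comment` to return SEP itself on the newline (`| '\n' { SEP }`), probably with an `| eof { EOF }` case as well so that a comment on the last line does not fail to match. The sketch below models this in plain OCaml rather than ocamllex so it can be run directly; the `lexbuf` record, `peek`, `advance`, `is_word_char` and `tokens` are hypothetical stand-ins, and the word characters are a guess because the `word` regexp is not shown.

```ocaml
(* Plain-OCaml model of the dict/comment entry points. The token names come
   from the mail; everything else is hypothetical scaffolding. *)
type token = WORD of string | COLON | DS | DE | SEP | EOF

(* Stand-in for Lexing.lexbuf: the input text and a read position. *)
type lexbuf = { text : string; mutable pos : int }

let peek lb =
  if lb.pos < String.length lb.text then Some lb.text.[lb.pos] else None

let advance lb = lb.pos <- lb.pos + 1

(* Assumed word characters; the mail's [word] regexp is not shown. *)
let is_word_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '-' | '.' -> true
  | _ -> false

let rec dict lb =
  match peek lb with
  | None -> EOF
  | Some ' ' -> advance lb; dict lb
  | Some '#' -> advance lb; comment lb        (* hand over to [comment] *)
  | Some ':' -> advance lb; COLON
  | Some '{' -> advance lb; DS
  | Some '}' -> advance lb; DE
  | Some (',' | '\n') -> advance lb; SEP
  | Some c when is_word_char c ->
      let start = lb.pos in
      while (match peek lb with Some c -> is_word_char c | None -> false) do
        advance lb
      done;
      WORD (String.sub lb.text start (lb.pos - start))
  | Some c -> failwith (Printf.sprintf "unexpected character %C" c)

(* Skip to the end of the line, then *return* SEP: that value becomes the
   result of the [dict] call that invoked us, and the next token request
   starts in [dict] again, so nothing has to be pushed back. *)
and comment lb =
  match peek lb with
  | None -> EOF                                 (* comment on the last line *)
  | Some '\n' -> advance lb; SEP
  | Some _ -> advance lb; comment lb

(* Drive the lexer the way a parser would: one [dict] call per token. *)
let tokens s =
  let lb = { text = s; pos = 0 } in
  let rec go acc =
    match dict lb with
    | EOF -> List.rev (EOF :: acc)
    | t -> go (t :: acc)
  in
  go []
```

With this scheme a full-line comment yields a bare SEP, so the grammar has to tolerate empty entries; a single `'#' [^ '\n']* '\n'` pattern in `dict` behaves the same way.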

--
Matt Gushee
Englewood, CO, USA



* Re: [Caml-list] Ocamllex question
  2005-10-23 18:02 Ocamllex question Matt Gushee
@ 2005-10-23 20:58 ` Michael Wohlwend
  2005-10-23 21:12 ` Another problem (was Re: [Caml-list] Ocamllex question) Matt Gushee
From: Michael Wohlwend @ 2005-10-23 20:58 UTC
  To: caml-list

Hi,

On Sunday 23 October 2005 20:02, Matt Gushee wrote:
>
>   and comment = parse
>       [ ^ '\n' ]                        { comment lexbuf }
>     | '\n'                              { SEP; dict lexbuf }

I'm not quite sure whether I understood your problem, but... doesn't it help to
just return a SEP after a comment:

rule dict = parse
 ...
 | ',' | '\n'   { SEP }
 | '#' [^ '\n']* '\n'  { SEP }
 ...


 Michael





* Another problem (was Re: [Caml-list] Ocamllex question)
  2005-10-23 18:02 Ocamllex question Matt Gushee
  2005-10-23 20:58 ` [Caml-list] " Michael Wohlwend
@ 2005-10-23 21:12 ` Matt Gushee
  2005-10-23 21:37   ` Michael Wohlwend
From: Matt Gushee @ 2005-10-23 21:12 UTC
  To: caml-list

While we're on the subject of Ocamllex, there's another issue I'm
wondering about. My lexer needs to handle quoted strings, something like:

  | '"' ([^ '"']* as word) '"'     { WORD word }

Not essential, but it would be nice to allow escaped quotes within such
strings:

  "The quick brown fox jumped over the \"lazy\" dog."

I haven't actually tried to implement this yet, but thinking about it
it seems like it would make the lexer hugely more complex. Can anyone
suggest a reasonably simple way to deal with escape sequences?

--
Matt Gushee
Englewood, CO, USA



* Re: Another problem (was Re: [Caml-list] Ocamllex question)
  2005-10-23 21:12 ` Another problem (was Re: [Caml-list] Ocamllex question) Matt Gushee
@ 2005-10-23 21:37   ` Michael Wohlwend
  2005-10-24 19:50     ` Matt Gushee
From: Michael Wohlwend @ 2005-10-23 21:37 UTC
  To: caml-list

On Sunday 23 October 2005 23:12, Matt Gushee wrote:
> While we're on the subject of Ocamllex, there's another issue I'm
> wondering about. My lexer needs to handle quoted strings, something like:
>
>   | '"' [^ '"'] * as word '"'     { WORD word }
>
> Not essential, but it would be nice to allow escaped quotes within such
> strings:
>
>   "The quick brown fox jumped over the \"lazy\" dog."
>
> I haven't actually tried to implement this yet, but thinking about it
> it seems like it would make the lexer hugely more complex. Can anyone
> suggest a reasonably simple way to deal with escape sequences?

You write an extra lexer rule which gets called when a " is seen; in this rule
you read single characters and append them to a string buffer. If you see a
backslash, you handle the next character specially. Example:

{
  (* OCaml helper code must sit in the header braces of the .mll file *)
  exception Lexical_error of string  (* assumed; not defined in the mail *)

  let char_for_backslash = function
    | 'a' -> '\007'
    | 'v' -> '\011'
    | 'f' -> '\012'
    | 'n' -> '\n'
    | 't' -> '\t'
    | 'b' -> '\b'
    | 'r' -> '\r'
    | c   -> c

  let string_buff = Buffer.create 256
  let reset_string_buffer () = Buffer.clear string_buff
  let store_string_char c = Buffer.add_char string_buff c
  let store_string s = Buffer.add_string string_buff s
  let get_stored_string () = Buffer.contents string_buff
}

(* any character from space upwards may follow a backslash *)
let bs_escapes = [ '\032' - '\255' ]
(* assumed definition; the mail uses it without defining it *)
let allowed_string_char = [^ '"' '\\']

rule dict = parse
  (* ... the other dict cases ... *)
  | '"'
      { reset_string_buffer ();
        scan_str lexbuf;
        STRING (get_stored_string ()) }

and scan_str = parse
  | '"'  { () }
  | '\\' (bs_escapes as c)
      { store_string_char (char_for_backslash c); scan_str lexbuf }
  | allowed_string_char as c
      { store_string_char c; scan_str lexbuf }   (* keep scanning *)
  | eof  { raise (Lexical_error "unterminated string") }


works well,

 Michael
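
The buffer-and-loop idea above can be tried without generating a lexer. The following is a plain-OCaml model, not ocamllex output: the name `scan_str` is reused for readability, but this version is hypothetical and takes the whole input string plus the index just after the opening quote, returning the decoded text and the index after the closing quote. Unlike `bs_escapes` in the message, it accepts any character after a backslash.

```ocaml
exception Lexical_error of string

(* Same decimal escapes as in the mail: '\007' is BEL, '\011' VT, '\012' FF;
   any other escaped character (e.g. '"' or '\\') stands for itself. *)
let char_for_backslash = function
  | 'a' -> '\007'
  | 'v' -> '\011'
  | 'f' -> '\012'
  | 'n' -> '\n'
  | 't' -> '\t'
  | 'b' -> '\b'
  | 'r' -> '\r'
  | c -> c

(* [scan_str s i] decodes the body of a string literal whose opening quote
   is just before index [i]. Returns the contents and the index just after
   the closing quote. *)
let scan_str (s : string) (i : int) : string * int =
  let buf = Buffer.create 64 in
  let n = String.length s in
  let rec loop i =
    if i >= n then raise (Lexical_error "unterminated string")
    else
      match s.[i] with
      | '"' -> i + 1                              (* closing quote *)
      | '\\' when i + 1 < n ->
          Buffer.add_char buf (char_for_backslash s.[i + 1]);
          loop (i + 2)
      | '\\' -> raise (Lexical_error "unterminated string")
      | c -> Buffer.add_char buf c; loop (i + 1)
  in
  let stop = loop i in
  (Buffer.contents buf, stop)
```

Returning the end index lets a caller resume after the closing quote, which a real `lexbuf` tracks implicitly.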



* Re: Another problem (was Re: [Caml-list] Ocamllex question)
  2005-10-23 21:37   ` Michael Wohlwend
@ 2005-10-24 19:50     ` Matt Gushee
  2005-10-24 20:18       ` Michael Wohlwend
From: Matt Gushee @ 2005-10-24 19:50 UTC
  To: caml-list

Michael Wohlwend wrote:

>>Not essential, but it would be nice to allow escaped quotes within such
>>strings:

> You write an extra lexer rule which gets called when a " is seen; in this rule
> you read single characters and append them to a string buffer. If you see a
> backslash, you handle the next character specially. Example:
> ....

Thank you! That was just the sort of example I needed ... and I
understand ocamllex a little better now.

Maybe you should post something like this on the Web. There's not much
in the way of practical examples for ocamllex.

--
Matt Gushee
Englewood, CO, USA



* Re: Another problem (was Re: [Caml-list] Ocamllex question)
  2005-10-24 19:50     ` Matt Gushee
@ 2005-10-24 20:18       ` Michael Wohlwend
From: Michael Wohlwend @ 2005-10-24 20:18 UTC
  To: caml-list

On Monday 24 October 2005 21:50, Matt Gushee wrote:
> Michael Wohlwend wrote:
> Maybe you should post something like this on the Web. There's not much
> in the way of practical examples for ocamllex.

In a way this already exists :-) I didn't invent all of it myself. I
looked at how the OCaml lexer (lexer.mll in the lex directory of the source
tree) does this, and it was quite easy to understand the relevant parts.


cheers
 Michael



* Re: [Caml-list] ocamllex question
  2009-03-10 22:44 ocamllex question Robert Muller
@ 2009-03-11  0:43 ` Martin Jambon
From: Martin Jambon @ 2009-03-11  0:43 UTC
  To: Robert Muller; +Cc: O'Caml Mailing List

Robert Muller wrote:
> I am attempting to use ocamllex together with ocamlyacc to parse a
> subset of python. Python uses indentation to denote
> statement blocks so a lexer is sometimes required to return a sequence
> of tokens without advancing the input pointer. In
> particular, a lexer for python should return a sequence of so-called
> DEDENT tokens when indented fragments
> end. E.g.,
> 
> def f(x):
>     statement1;
>     statement2;
>         statement3;
>         statement4;
> A
> 
> the lexer should return two consecutive DEDENT tokens between the '\n'
> at the end of statement4 and the token for A.

What I would do is:

1. pass an argument to each "rule" function, containing the stack of
indentation information (current block and parent blocks, with first line
number and indentation).

2. let each rule produce as many tokens as necessary and return lists of tokens

3. create a token stream for ocamlyacc/menhir that would call the
ocamllex-generated functions as needed; these would put the tokens into a
queue. Refill when the queue is empty.

4. Figure out how to make good error reports :-)
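
Steps 1 to 3 might look roughly like this in plain OCaml. This is a sketch under assumptions, not Martin's code: the token type, `indent_tokens`, `make_token_fn` and `toy_rule` are hypothetical, the stack records only indentation widths (no line numbers), and `toy_rule` consumes pre-split (width, text) lines instead of reading a `Lexing.lexbuf`. Error reporting (step 4) is left out.

```ocaml
(* Hypothetical token set for a Python-like language. *)
type token = NAME of string | NEWLINE | INDENT | DEDENT | EOF

(* Step 1: the indentation stack, innermost level first. Given the width of
   a new logical line, push or pop levels and emit the matching tokens. *)
let indent_tokens (stack : int list ref) (width : int) : token list =
  match !stack with
  | [] -> invalid_arg "indent_tokens: empty stack"
  | top :: _ when width > top -> stack := width :: !stack; [INDENT]
  | _ ->
      let rec pop acc =
        match !stack with
        | top :: rest when width < top -> stack := rest; pop (DEDENT :: acc)
        | top :: _ when width = top -> acc
        | _ -> failwith "dedent does not match any outer level"
      in
      pop []

(* Step 3: turn a rule that returns token lists (step 2) into the
   [lexbuf -> token] function an ocamlyacc or Menhir parser calls. *)
let make_token_fn (rule : 'lexbuf -> token list) : 'lexbuf -> token =
  let pending = Queue.create () in
  fun lexbuf ->
    (* Refill only when empty; loop in case a call yields no tokens. *)
    while Queue.is_empty pending do
      List.iter (fun t -> Queue.add t pending) (rule lexbuf)
    done;
    Queue.pop pending

(* Step 2, toy version: each "line" is already (indent width, text). *)
let toy_rule (stack : int list ref) (lines : (int * string) list ref) () =
  match !lines with
  | [] -> indent_tokens stack 0 @ [EOF]        (* close all open blocks *)
  | (width, text) :: rest ->
      lines := rest;
      indent_tokens stack width @ [NAME text; NEWLINE]

let tokens_of lines =
  let next = make_token_fn (toy_rule (ref [0]) (ref lines)) in
  let rec go acc =
    match next () with
    | EOF -> List.rev (EOF :: acc)
    | t -> go (t :: acc)
  in
  go []
```

The queue is what removes the need to "not advance the input pointer": one rule call may return several DEDENT tokens, and the wrapper hands them to the parser one at a time before calling the rule again. With ocamlyacc, whose entry points take a `Lexing.lexbuf -> token` function, something like `make_token_fn (Lexer.line stack)` could be passed in, where `Lexer.line` is a hypothetical ocamllex rule taking the stack as an extra argument; the wrapper should be created once per input so the queue is not shared.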



Martin


> Looking at the documentation and examples, it isn't clear how to
> convince the generated lexer to not advance the input pointer
> so that two consecutive DEDENT tokens can be returned before the token
> for A is returned.
> 
> Any ocamllex perts out there?
> 
> Thanks,
> Bob Muller


-- 
http://mjambon.com/



* Re: [Caml-list] ocamllex question
  2005-09-21 18:34 skaller
@ 2005-09-22  6:47 ` Alex Baretta
From: Alex Baretta @ 2005-09-22  6:47 UTC
  To: skaller; +Cc: ocaml

skaller wrote:
> Can eof be read from a lexbuf more than once by an ocamllex lexer?
> In particular, if a recursive lexer matches an eof and
> returns to its caller, can the parent caller still read
> another eof?
> 
> In other words, is the character stream followed by one eof,
> or by an infinite stream of them?

"eof" in ocamllex is a "condition", not a token. It's like "\b" in Emacs
regexps, which matches the empty string, but only at the beginning or end
of a word. In ocamllex, "eof" matches the empty string at the end of a
lexbuf, so matching eof consumes nothing from the lexbuf and can be
repeated any number of times.
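
A tiny plain-OCaml model of this behaviour (not ocamllex itself; `lexbuf`, `at_eof`, `comment` and `main` are hypothetical names): the end-of-input test only looks at the position and consumes nothing, so after the nested `comment` function stops at end of input, its caller sees end of input too, and so does every later call.

```ocaml
type token = WORD of string | EOF

(* Stand-in for a lexbuf: input text plus a read position. *)
type lexbuf = { text : string; mutable pos : int }

(* Like ocamllex's eof, this is a test on the position: it consumes nothing. *)
let at_eof lb = lb.pos >= String.length lb.text

(* Nested rule: skip a comment up to a newline or the end of input. *)
let rec comment lb =
  if at_eof lb then ()                       (* sees eof, returns to caller *)
  else begin
    let c = lb.text.[lb.pos] in
    lb.pos <- lb.pos + 1;
    if c <> '\n' then comment lb
  end

let rec main lb =
  if at_eof lb then EOF                      (* the caller sees eof as well *)
  else
    match lb.text.[lb.pos] with
    | '#' -> lb.pos <- lb.pos + 1; comment lb; main lb
    | ' ' | '\n' -> lb.pos <- lb.pos + 1; main lb
    | _ ->
        let start = lb.pos in
        while not (at_eof lb) && not (List.mem lb.text.[lb.pos] [' '; '\n'; '#']) do
          lb.pos <- lb.pos + 1
        done;
        WORD (String.sub lb.text start (lb.pos - start))
```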

Alex

-- 
http://www.barettadeit.com/
Baretta DE&IT
A division of Baretta SRL



Thread overview: 8+ messages
2005-10-23 18:02 Ocamllex question Matt Gushee
2005-10-23 20:58 ` [Caml-list] " Michael Wohlwend
2005-10-23 21:12 ` Another problem (was Re: [Caml-list] Ocamllex question) Matt Gushee
2005-10-23 21:37   ` Michael Wohlwend
2005-10-24 19:50     ` Matt Gushee
2005-10-24 20:18       ` Michael Wohlwend
  -- strict thread matches above, loose matches on Subject: below --
2009-03-10 22:44 ocamllex question Robert Muller
2009-03-11  0:43 ` [Caml-list] " Martin Jambon
2005-09-21 18:34 skaller
2005-09-22  6:47 ` [Caml-list] " Alex Baretta
