caml-list - the Caml user's mailing list
* ocamllex question
@ 2009-03-10 22:44 Robert Muller
  2009-03-11  0:43 ` [Caml-list] " Martin Jambon
  0 siblings, 1 reply; 4+ messages in thread
From: Robert Muller @ 2009-03-10 22:44 UTC (permalink / raw)
  To: O'Caml Mailing List

I am attempting to use ocamllex together with ocamlyacc to parse a  
subset of python. Python uses indentation to denote
statement blocks so a lexer is sometimes required to return a sequence  
of tokens without advancing the input pointer. In
particular, a lexer for python should return a sequence of so-called  
DEDENT tokens when indented fragments
end. E.g.,

def f(x):
	statement1;
	statement2;
		statement3;
		statement4;
A

the lexer should return two consecutive DEDENT tokens between the '\n'  
at the end of statement4 and the token for A.

Looking at the documentation and examples, it isn't clear how to  
convince the generated lexer to not advance the input pointer
so that two consecutive DEDENT tokens can be returned before the token  
for A is returned.

Any ocamllex experts out there?

Thanks,
Bob Muller



* Re: [Caml-list] ocamllex question
  2009-03-10 22:44 ocamllex question Robert Muller
@ 2009-03-11  0:43 ` Martin Jambon
  0 siblings, 0 replies; 4+ messages in thread
From: Martin Jambon @ 2009-03-11  0:43 UTC (permalink / raw)
  To: Robert Muller; +Cc: O'Caml Mailing List

Robert Muller wrote:
> I am attempting to use ocamllex together with ocamlyacc to parse a
> subset of python. Python uses indentation to denote
> statement blocks so a lexer is sometimes required to return a sequence
> of tokens without advancing the input pointer. In
> particular, a lexer for python should return a sequence of so-called
> DEDENT tokens when indented fragments
> end. E.g.,
> 
> def f(x):
>     statement1;
>     statement2;
>         statement3;
>         statement4;
> A
> 
> the lexer should return two consecutive DEDENT tokens between the '\n'
> at the end of statement4 and the token for A.

What I would do is:

1. pass an argument to each "rule" function, containing the stack of
indentation information (current block and parent blocks, with first line
number and indentation).

2. let each rule produce as many tokens as necessary and return lists of tokens

3. create a token stream for ocamlyacc/menhir that would call the
ocamllex-generated functions as needed; these would put the tokens into a
queue. Refill when the queue is empty.

4. Figure out how to make good error reports :-)
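
[Editor's sketch, not part of the original message.] Steps 1 to 3 above could look roughly like the following in plain OCaml. Everything named here is an assumption made for illustration: the `token` type would normally come from the ocamlyacc/menhir-generated parser, `fake_rule` is a stand-in for an ocamllex-generated rule so that the example runs without ocamllex, and `Queue.take_opt` requires OCaml 4.08 or later.

```ocaml
(* Hypothetical token type; a real project would use the one generated
   by ocamlyacc/menhir. *)
type token = INDENT | DEDENT | NEWLINE | NAME of string | EOF

(* Steps 1 and 2: the indentation stack (innermost block first) is passed
   as an argument, and one call may return several tokens. The bottom
   entry 0 is never popped. *)
let indentation_tokens (stack : int list ref) (width : int) : token list =
  match !stack with
  | top :: _ when width > top ->
      stack := width :: !stack;
      [INDENT]
  | _ ->
      (* Pop one level per DEDENT until the widths match. *)
      let rec pop acc =
        match !stack with
        | top :: rest when width < top ->
            stack := rest;
            pop (DEDENT :: acc)
        | top :: _ when width = top -> acc
        | _ -> failwith "inconsistent dedent"
      in
      pop []

(* Step 3: wrap a list-returning rule into the one-token-per-call
   function that the parser expects, refilling the queue when empty. *)
let make_token_source (raw : Lexing.lexbuf -> token list)
  : Lexing.lexbuf -> token =
  let pending : token Queue.t = Queue.create () in
  let rec next lexbuf =
    match Queue.take_opt pending with
    | Some tok -> tok
    | None ->
        List.iter (fun t -> Queue.add t pending) (raw lexbuf);
        next lexbuf
  in
  next

(* Stand-in for an ocamllex rule: it ignores the lexbuf and consumes
   pre-split (indentation width, word) lines instead. *)
let fake_rule (stack : int list ref) (lines : (int * string) list)
  : Lexing.lexbuf -> token list =
  let remaining = ref lines in
  fun _lexbuf ->
    match !remaining with
    | [] -> indentation_tokens stack 0 @ [EOF]  (* close open blocks *)
    | (width, word) :: rest ->
        remaining := rest;
        indentation_tokens stack width @ [NAME word; NEWLINE]

(* Collect every token up to and including the first EOF. *)
let demo () : token list =
  let lines = [(0, "def"); (4, "statement1"); (8, "statement3"); (0, "A")] in
  let next = make_token_source (fake_rule (ref [0]) lines) in
  let lexbuf = Lexing.from_string "" in
  let rec drain acc =
    match next lexbuf with
    | EOF -> List.rev (EOF :: acc)
    | tok -> drain (tok :: acc)
  in
  drain []
```

With a real lexer, the rule would presumably be declared with an extra parameter (something like `rule token stack = parse ...`), and the function returned by `make_token_source` would be handed to the generated parser entry point in place of the raw rule. The input pointer is never rewound: the two DEDENT tokens simply wait in the queue until the parser asks for them.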



Martin


> Looking at the documentation and examples, it isn't clear how to
> convince the generated lexer to not advance the input pointer
> so that two consecutive DEDENT tokens can be returned before the token
> for A is returned.
> 
> Any ocamllex experts out there?
> 
> Thanks,
> Bob Muller
> 


-- 
http://mjambon.com/



* Re: [Caml-list] Ocamllex question
  2005-10-23 18:02 Ocamllex question Matt Gushee
@ 2005-10-23 20:58 ` Michael Wohlwend
  0 siblings, 0 replies; 4+ messages in thread
From: Michael Wohlwend @ 2005-10-23 20:58 UTC (permalink / raw)
  To: caml-list

Hi,

On Sunday 23 October 2005 20:02, Matt Gushee wrote:
>
>   and comment = parse
>       [ ^ '\n' ]                        { comment lexbuf }
>
>     | '\n'                              { SEP; dict lexbuf }

I'm not quite sure whether I understood your problem, but... doesn't it help to
just return a SEP after a comment:

rule dict = parse
 ...
 | ',' | '\n'   { SEP }
 | '#' [^ '\n']* '\n'  { SEP }
 ...


 Michael





* Re: [Caml-list] ocamllex question
  2005-09-21 18:34 ocamllex question skaller
@ 2005-09-22  6:47 ` Alex Baretta
  0 siblings, 0 replies; 4+ messages in thread
From: Alex Baretta @ 2005-09-22  6:47 UTC (permalink / raw)
  To: skaller; +Cc: ocaml

skaller wrote:
> Can eof be read from a lexbuf more than once by an ocamllex lexer?
> In particular, if a recursive lexer matches an eof and
> returns to its caller, can the parent caller still read
> another eof?
> 
> In other words, is the character stream postpended by one eof
> or an infinite stream of them?

"eof" in ocamllex is a "condition" not a token. It's like "\b" in emacs,
which matches the empty string but only at the beginning or end of a
word. In ocamllex "eof" matches the empty string at the end of a lexbuf,
thus matching eof is non-destructive lexbuf-wise and can be repeated any
number of times.

Alex

-- 
*********************************************************************
http://www.barettadeit.com/
Baretta DE&IT
A division of Baretta SRL

tel. +39 02 370 111 55
fax. +39 02 370 111 54

Our technology:

The Application System/Xcaml (AS/Xcaml)
<http://www.asxcaml.org/>

The FreerP Project
<http://www.freerp.org/>


