caml-list - the Caml user's mailing list
* [Caml-list] Calling ocamlyacc from ocamlyacc
@ 2003-09-01 17:28 katre
  2003-09-02 15:19 ` Michal Moskal
  0 siblings, 1 reply; 6+ messages in thread
From: katre @ 2003-09-01 17:28 UTC (permalink / raw)
  To: caml-list

Hello,

I am currently involved in a project to rebuild a compiler for an old
system from the mid-1980s. We have the original compiler docs and the
original source files written in the language, but the actual
compiler itself is lost.  This is an interesting pursuit, and I am
making use of it to learn OCaml.

However, due to the nature of this language, which is not very regular
at all, I am having trouble expressing a parser in ocamlyacc that isn't
a large hack.  What would be ideal would be to have one main parser, and
then one sub-parser that I could call only for a specified domain.
However, all the source code is in one place.

Is there a way to specify a separate parser and lexer (using ocamllex
and ocamlyacc), and then to jump into them from an ocamlyacc action?

Thanks for the help,
John

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* Re: [Caml-list] Calling ocamlyacc from ocamlyacc
  2003-09-01 17:28 [Caml-list] Calling ocamlyacc from ocamlyacc katre
@ 2003-09-02 15:19 ` Michal Moskal
  2003-09-02 15:23   ` katre
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Moskal @ 2003-09-02 15:19 UTC (permalink / raw)
  To: katre; +Cc: caml-list

On Mon, Sep 01, 2003 at 01:28:28PM -0400, katre wrote:
> Hello,
> 
> I am currently involved in a project to rebuild a compiler for an old
> system from the mid-1980s. We have the original compiler docs and the
> original source files written in the language, but the actual
> compiler itself is lost.  This is an interesting pursuit, and I am
> making use of it to learn OCaml.
> 
> However, due to the nature of this language, which is not very regular
> at all, I am having trouble expressing a parser in ocamlyacc that isn't
> a large hack.  What would be ideal would be to have one main parser, and
> then one sub-parser that I could call only for a specified domain.
> However, all the source code is in one place.
> 
> Is there a way to specify a separate parser and lexer (using ocamllex
> and ocamlyacc), and then to jump into them from an ocamlyacc action?

You can define several start symbols in your grammar; a parsing function
is generated for each. You can also define several "rule ..." entry points
in your lexer (a lexing function is generated for each). Hope that helps.
I can't help more, since I don't quite understand the nature of your problem.

-- 
: Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv
: When in doubt, use brute force. -- Ken Thompson : {E-,w}-- {b++,e}>+++ h


* Re: [Caml-list] Calling ocamlyacc from ocamlyacc
  2003-09-02 15:19 ` Michal Moskal
@ 2003-09-02 15:23   ` katre
  2003-09-02 15:40     ` Michal Moskal
                       ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: katre @ 2003-09-02 15:23 UTC (permalink / raw)
  To: caml-list

Michal Moskal wrote:
> 
> You can define several start symbols in your grammar; a parsing function
> is generated for each. You can also define several "rule ..." entry points
> in your lexer (a lexing function is generated for each). Hope that helps.
> I can't help more, since I don't quite understand the nature of your problem.
> 

Right, but is there a way, in an ocamlyacc action, to switch which lexer
rule you're using?  That seems to be the main part I am missing.  Or, if
I could access the lexbuf directly, I could use that instead.

John


* Re: [Caml-list] Calling ocamlyacc from ocamlyacc
  2003-09-02 15:23   ` katre
@ 2003-09-02 15:40     ` Michal Moskal
  2003-09-03 10:28     ` Hendrik Tews
  2003-09-06  3:36     ` skaller
  2 siblings, 0 replies; 6+ messages in thread
From: Michal Moskal @ 2003-09-02 15:40 UTC (permalink / raw)
  To: katre; +Cc: caml-list

On Tue, Sep 02, 2003 at 11:23:40AM -0400, katre wrote:
> Michal Moskal wrote:
> > 
> > You can define several start symbols in your grammar; a parsing function
> > is generated for each. You can also define several "rule ..." entry points
> > in your lexer (a lexing function is generated for each). Hope that helps.
> > I can't help more, since I don't quite understand the nature of your problem.
> > 
> 
> Right, but is there a way, in an ocamlyacc action, to switch which lexer
> rule you're using?  That seems to be the main part I am missing.  Or if
> I could access the lexbuf directly, I could also use that.

I believe you can set a flag in the lexer (from a parser action) to make
it switch to another rule. But you have to take the lookahead token into
account.

-- 
: Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv
: When in doubt, use brute force. -- Ken Thompson : {E-,w}-- {b++,e}>+++ h


* Re: [Caml-list] Calling ocamlyacc from ocamlyacc
  2003-09-02 15:23   ` katre
  2003-09-02 15:40     ` Michal Moskal
@ 2003-09-03 10:28     ` Hendrik Tews
  2003-09-06  3:36     ` skaller
  2 siblings, 0 replies; 6+ messages in thread
From: Hendrik Tews @ 2003-09-03 10:28 UTC (permalink / raw)
  To: caml-list

katre writes:
   
   Right, but is there a way, in an ocamlyacc action, to switch which lexer
   rule you're using?  That seems to be the main part I am missing.  Or if
   I could access the lexbuf directly, I could also use that.
   
In the lexer you can do 

rule lexer =
   parse
     | ""          { match !global_var with
                     | Xlex -> xlex lexbuf
                     | Ylex -> ylex lexbuf
                   }

and xlex =
   parse 
      ....


and ylex =
   parse
      ....


You can set the global_var from the actions in the grammar. 

However, there is probably a better solution: First note that
ocamlyacc generated functions expect a (Lexing.lexbuf -> token)
function. So you can write your own master lexer:

let lexer lexbuf = match !global_var with
  | Xlex -> Lexer.xlex lexbuf
  | Ylex -> ....

With a bit of hacking you can also combine ocamllex lexers with
other lexers.
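
A minimal, self-contained sketch of such a master lexer follows. It is
only an illustration under stated assumptions: the `buf` record stands in
for Lexing.lexbuf, `xlex` and `ylex` are hand-written toys rather than
ocamllex-generated rules, and the `token` and `mode` types are
hypothetical.

```ocaml
(* Hypothetical token type; with ocamlyacc it would come from the
   generated parser module. *)
type token = WORD of string | NUM of int | EOF

(* Toy stand-in for Lexing.lexbuf: the input plus a current position. *)
type buf = { src : string; mutable pos : int }

type mode = Xlex | Ylex

(* The flag that a grammar action would set. *)
let global_var = ref Xlex

let is_space c = c = ' ' || c = '\n' || c = '\t'

(* Skip blanks, then return the next maximal run of non-blank characters
   ("" at end of input). *)
let next_chunk b =
  let n = String.length b.src in
  while b.pos < n && is_space b.src.[b.pos] do b.pos <- b.pos + 1 done;
  let start = b.pos in
  while b.pos < n && not (is_space b.src.[b.pos]) do b.pos <- b.pos + 1 done;
  String.sub b.src start (b.pos - start)

(* Toy sub-lexer for the main language: every chunk is a word. *)
let xlex b = match next_chunk b with "" -> EOF | s -> WORD s

(* Toy sub-lexer for the sub-language: chunks that look like integers
   become NUM tokens. *)
let ylex b =
  match next_chunk b with
  | "" -> EOF
  | s -> (match int_of_string_opt s with Some n -> NUM n | None -> WORD s)

(* The master lexer: dispatch on the flag each time a token is requested. *)
let lexer b =
  match !global_var with
  | Xlex -> xlex b
  | Ylex -> ylex b
```

With real ocamllex output, the master lexer would have type
Lexing.lexbuf -> token and could be passed directly to an
ocamlyacc-generated parsing function.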


In both approaches the problem is the lookahead token. In some
cases yacc fetches the next token and decides on that token
whether to shift or reduce. If the action run on that reduction
changes the lexer, the wrong lexer has already been used for the
next token.

You can examine the grammar.output file and the OCAMLRUNPARAM=p
trace to find out if ocamlyacc needs the lookahead token for a
given rule. (I can give examples on that if you are interested.)
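
The hazard can be reproduced without yacc at all. The sketch below is a
simulation, not ocamlyacc itself: `make_lexer`, `run`, and the two modes
are hypothetical names, and the "parser" is just a function that either
fetches the second token before running the mode-switching action (as a
parser needing lookahead would) or after it.

```ocaml
type token = WORD of string | NUM of int | EOF

type mode = Words | Numbers

(* The flag that a grammar action would set. *)
let mode = ref Words

(* A toy lexer over pre-split chunks. How a chunk is classified depends
   on the mode at the moment the token is fetched. *)
let make_lexer chunks =
  let rest = ref chunks in
  fun () ->
    match !rest with
    | [] -> EOF
    | s :: tl ->
      rest := tl;
      (match !mode, int_of_string_opt s with
       | Numbers, Some n -> NUM n
       | _ -> WORD s)

(* Simulate reading "begin 42", where the action attached to "begin"
   switches the lexer to Numbers mode. *)
let run ~lookahead =
  mode := Words;
  let next = make_lexer [ "begin"; "42" ] in
  let first = next () in
  let second =
    if lookahead then begin
      (* The next token is fetched before the action runs, so it is
         lexed in the old mode. *)
      let t = next () in
      mode := Numbers;
      t
    end else begin
      (* No lookahead needed: the action runs first. *)
      mode := Numbers;
      next ()
    end
  in
  (first, second)
```

In this simulation `run ~lookahead:true` classifies "42" as a WORD, while
`run ~lookahead:false` classifies it as a NUM, which is the difference the
grammar.output file helps you predict for a real grammar.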


Bye,

Hendrik


* Re: [Caml-list] Calling ocamlyacc from ocamlyacc
  2003-09-02 15:23   ` katre
  2003-09-02 15:40     ` Michal Moskal
  2003-09-03 10:28     ` Hendrik Tews
@ 2003-09-06  3:36     ` skaller
  2 siblings, 0 replies; 6+ messages in thread
From: skaller @ 2003-09-06  3:36 UTC (permalink / raw)
  To: katre; +Cc: caml-list

On Wed, 2003-09-03 at 01:23, katre wrote:
> Michal Moskal wrote:
> > 
> > You can define several start symbols in your grammar; a parsing function
> > is generated for each. You can also define several "rule ..." entry points
> > in your lexer (a lexing function is generated for each). Hope that helps.
> > I can't help more, since I don't quite understand the nature of your problem.
> > 
> 
> Right, but is there a way, in an ocamlyacc action, to switch which lexer
> rule you're using?  That seems to be the main part I am missing.  Or if
> I could access the lexbuf directly, I could also use that.

I think the answer is no, you can't do that.
The reason is that yacc et al. are LALR(1) parsers, meaning
one token of lookahead may be needed before a reduction:
in general, when a reduction occurs there is no
guarantee that the next token hasn't already been fetched.

Now yacc/lex are normally driven by the parser
fetching tokens. So what you _can_ do is pretend
that the 'deviant sublanguage' you need to use
a different grammar for is a single *huge* token.

The lexer, unlike the parser, can be invoked
recursively. In particular, when a regex
is matched, the action that computes the result can
run arbitrary code.

In particular, you can switch to another lexing rule.
I do this all the time for handling comments
and strings: the lexeme for open quote is
recognised and it calls a string gathering
rule which uses the same lexbuf.

Lexbufs have a current position, which is the
end of the lexeme .. even if the finite state
automaton actually looked ahead further.

So with lexbuf, you know *exactly* where you are,
whenever the code associated with a regexp is matched.

Here is the mainline lexer:
....
(* Python strings *)
| quote  { fun state -> state#inbody; parse_qstring lexbuf state }
| qqq    { fun state -> state#inbody; parse_qqqstring lexbuf state }


which invokes the sublexer:

rule parse_qstring = parse
| qstring { 
    fun state -> 
      state#inbody;
      [STRING (
        state#get_srcref lexbuf, 
        state#decode decode_qstring (lexeme lexbuf)
      )] 
  }
| _ { 
  fun state -> 
    [ERRORTOKEN (
      state#get_srcref lexbuf, 
      "' string"
    )] 
  }

and parse_qqqstring = parse
| qqqstring { 
    fun state -> 
      state#inbody;
      [STRING (
        state#get_srcref lexbuf, 
        state#decode decode_qqqstring (lexeme lexbuf)
      )] 
  }
| _ { 
  fun state -> 
    state#inbody;
    [ERRORTOKEN (
      state#get_srcref lexbuf, 
      "''' string"
    )] 
  }

-------
Now, I said you can do anything, and in the example I just
call another lexer rule, but .. there is no reason you
can't call a parser function, passing the same lexbuf.

Note that you have to do this from the LEXER code,
to ensure that the sub-parser is invoked on exactly
the correct starting character.
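
A minimal sketch of this "one huge token" idea, in plain OCaml so that it
runs without ocamllex or ocamlyacc. Everything here is an assumption made
for illustration: the `buf` record stands in for the lexbuf, `sub_parse`
stands in for a generated sub-parser, and the bracketed integer list is an
invented sublanguage.

```ocaml
(* SUB carries the result of parsing a whole embedded fragment. *)
type token = WORD of string | SUB of int list | EOF

(* Toy stand-in for a lexbuf: the input plus a current position. *)
type buf = { src : string; mutable pos : int }

let peek b = if b.pos < String.length b.src then Some b.src.[b.pos] else None

(* Consume characters while [p] holds and return them as a string. *)
let take_while p b =
  let start = b.pos in
  while (match peek b with Some c -> p c | None -> false) do
    b.pos <- b.pos + 1
  done;
  String.sub b.src start (b.pos - start)

let is_digit c = c >= '0' && c <= '9'
let is_blank c = c = ' ' || c = '\n'

(* Sub-parser for the embedded language: integers separated by blanks,
   terminated by ']'. It starts exactly where the main lexer stopped. *)
let rec sub_parse b acc =
  ignore (take_while is_blank b);
  match peek b with
  | Some ']' -> b.pos <- b.pos + 1; List.rev acc
  | Some c when is_digit c ->
    let n = int_of_string (take_while is_digit b) in
    sub_parse b (n :: acc)
  | _ -> failwith "sub_parse: expected a digit or ']'"

(* Main lexer: on '[' it hands the same buffer to the sub-parser and
   returns the whole embedded fragment as a single token. *)
let token b =
  ignore (take_while is_blank b);
  match peek b with
  | None -> EOF
  | Some '[' -> b.pos <- b.pos + 1; SUB (sub_parse b [])
  | Some _ -> WORD (take_while (fun c -> not (is_blank c) && c <> '[') b)
```

Because the switch happens inside the lexer, the main parser only ever
sees WORD, SUB and EOF, so its lookahead cannot land in the middle of the
embedded fragment.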


You may also note that, in the example, my lexer actions
all have the form

	fun state -> ...

(for every lexeme, which is boring to write).
This state is a mutable object which is passed
to the lexer as an extra argument: just add it
after the call to the rule, as in the nested example:

| quote  { fun state -> state#inbody; parse_qstring lexbuf state }
--------------------------------------*************--------*****
-                                     rule                extra arg


Note: I am returning lists of tokens, not single tokens. My lexer code
is NOT called by the parser. I call it myself and build a list
of tokens, pre-process them, and pass the output of that to
the parser via a dummy lexbuf.
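
A sketch of how a pre-built token list can be fed to a parser through a
dummy lexbuf. This is an illustration, not the code described above:
the `token` type, `lexer_of_list`, and `parse_tokens` are hypothetical,
and `sum_parser` is a hand-written stand-in with the same shape as an
ocamlyacc-generated parsing function.

```ocaml
(* Hypothetical token type; with ocamlyacc it would come from the
   generated parser module. *)
type token = INT of int | PLUS | EOF

(* Turn a token list into a function of the type that ocamlyacc parsing
   functions expect. The lexbuf argument is ignored; EOF is returned
   once the list is exhausted. *)
let lexer_of_list (tokens : token list) : Lexing.lexbuf -> token =
  let rest = ref tokens in
  fun _lexbuf ->
    match !rest with
    | [] -> EOF
    | t :: tl -> rest := tl; t

(* Stand-in for a generated parsing function: same shape,
   (Lexing.lexbuf -> token) -> Lexing.lexbuf -> result,
   but it simply sums the INT tokens and skips PLUS. *)
let sum_parser (next : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : int =
  let rec loop acc =
    match next lexbuf with
    | INT n -> loop (acc + n)
    | PLUS -> loop acc
    | EOF -> acc
  in
  loop 0

(* Drive the parser from a token list; the lexbuf is only a placeholder. *)
let parse_tokens tokens =
  let dummy = Lexing.from_string "" in
  sum_parser (lexer_of_list tokens) dummy
```

Any filtering passes over the token list would run before `parse_tokens`,
so the parser only sees the cleaned-up stream.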

I have in fact constructed a PYTHON parser using Ocamlyacc,
even though Python grammar is 'strongly not LR(1)' :-))
I do this with something like 13 filtering passes over the token
stream (to find the indentation etc.) before it
is in an LR(1) form.

