Another nasty with ocamlyacc: magic needed?

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* Another nasty with ocamlyacc: magic needed?
@ 2005-12-05  5:35 skaller
  2005-12-05  6:25 ` [Caml-list] " Geoff Wozniak
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: skaller @ 2005-12-05  5:35 UTC (permalink / raw)
  To: caml-list

I have what looks like a showstopper (meaning black magic
seems to be the only solution).

I need two *distinct* parsers which use the same set
of tokens. This is currently impossible AFAICS without
magic (distinct -- in a different module entirely)

Roughly, the problem is that Ocamlyacc generates the
tokens, and they go in the parser.mli file. So any
second parser has to depend on it. However the only
way to *invoke* the parser is to pass it inside a token.
the interface for that parser must be defined AFTER
the parser.mli since it needs to know the token types.
But it has to be known INSIDE the actual parser,
to invoke it. There is no way to extend the interface
Ocamlyacc generates for a parser (you can add
extra helper functions but they don't go in
the interface).

Thus -- it is impossible. IMHO the traditional yacc
style design is poor, but it works fine for C because
there is no problem with recursive dependencies in
interfaces (header files).

In Ocaml, there is such a problem, and so the yacc design
basically fails -- it isn't suitable for Ocaml.

But i need to work around this somehow, does anyone
have any smart ideas how to fix the problem, without
abandoning Ocamlyacc?

If I could abstract the type of the secondary parser
so it didn't need to expose the knowledge of tokens,
it would fix the problem I think -- then the type could
be defined BEFORE the ocamlyacc parser, but the implementation
defined AFTER it (using Ocaml classes and class types).

Just to be clear the situation is:

user_statement:
  | USER_KEYWORD 
    { 
      let srt, kw = $1 in
      let sr = slift srt in
      print_endline ("USER KEYWORD " ^ kw);
      `AST_nop (sr,kw)
    }

Instead of `AST_nop I need to call a function
which parses more of the input, probably
passing it the current lexbuf as well.
The lexer knows this information and could
put it into the USER_KEYWORD token .. if it
were possible to define the type of the attribute
of the USER_KEYWORD token.. but that can only
be done in one place .. prior to the parser mli
file.

So I think I may have to use Obj.magic .. or completely
replace Ocamlyacc.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Another nasty with ocamlyacc: magic needed?
  2005-12-05  5:35 Another nasty with ocamlyacc: magic needed? skaller
@ 2005-12-05  6:25 ` Geoff Wozniak
  2005-12-05  7:08 ` Alessandro Baretta
  2005-12-06  0:41 ` skaller
  2 siblings, 0 replies; 6+ messages in thread
From: Geoff Wozniak @ 2005-12-05  6:25 UTC (permalink / raw)
  To: caml-list

On 5-Dec-05, at 12:35 AM, skaller wrote:

> I need two *distinct* parsers which use the same set
> of tokens. This is currently impossible AFAICS without
> magic (distinct -- in a different module entirely)
>

For the record, I ran into this problem myself.  The easiest solution  
to my particular problem was to switch to using Lisp, although I am  
not advocating that you do the same.  (I'm not interested in any kind  
of language evangelizing flamewar.)  I couldn't think of an appealing  
way to deal with the problem.

-- 
Geoff Wozniak
PhD Candidate
Department of Computer Science
University of Western Ontario

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Another nasty with ocamlyacc: magic needed?
  2005-12-05  5:35 Another nasty with ocamlyacc: magic needed? skaller
  2005-12-05  6:25 ` [Caml-list] " Geoff Wozniak
@ 2005-12-05  7:08 ` Alessandro Baretta
  2005-12-05  8:52   ` skaller
  2005-12-06  0:41 ` skaller
  2 siblings, 1 reply; 6+ messages in thread
From: Alessandro Baretta @ 2005-12-05  7:08 UTC (permalink / raw)
  To: skaller; +Cc: caml-list

skaller wrote:
> I have what looks like a showstopper (meaning black magic
> seems to be the only solution).
> 
> I need two *distinct* parsers which use the same set
> of tokens. This is currently impossible AFAICS without
> magic (distinct -- in a different module entirely)

Let's not overstate the problem. Yacc is an imperfect tool. Everybody who used 
it knows this, but it is usually flexible enough that you can--eventually--get 
it to do what you have in mind. I would suggest that you consider a couple of 
tools which might help get out of this /impasse/. Firstly, remember that can 
define as many parser entry points as your heat desires. Each entry point is 
virtually a distinct parser, except that the token type is shared and the 
non-terminal namespace is also shared, which might be a bane or a boon depending 
on whether the two grammars you wish to impolement share some productions or 
not. Secondly, after compiling the automaton with ocamlyacc, you can always 
delete and regenerate the interface with ocamlc -i. My Makefile generator for 
ocaml, builder, actually handles this pretty nicely, allowing the parser to 
export anything you put in the header.

Alex

-- 
*********************************************************************
http://www.barettadeit.com/
Baretta DE&IT
A division of Baretta SRL

tel. +39 02 370 111 55
fax. +39 02 370 111 54

Our technology:

The Application System/Xcaml (AS/Xcaml)
<http://www.asxcaml.org/>

The FreerP Project
<http://www.freerp.org/>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Another nasty with ocamlyacc: magic needed?
  2005-12-05  7:08 ` Alessandro Baretta
@ 2005-12-05  8:52   ` skaller
  2005-12-05  9:05     ` Alessandro Baretta
  0 siblings, 1 reply; 6+ messages in thread
From: skaller @ 2005-12-05  8:52 UTC (permalink / raw)
  To: Alessandro Baretta; +Cc: caml-list

On Mon, 2005-12-05 at 08:08 +0100, Alessandro Baretta wrote:
> skaller wrote:
> > I have what looks like a showstopper (meaning black magic
> > seems to be the only solution).
> > 
> > I need two *distinct* parsers which use the same set
> > of tokens. This is currently impossible AFAICS without
> > magic (distinct -- in a different module entirely)

> Let's not overstate the problem. Yacc is an imperfect tool. 

Yup :)

>  Firstly, remember that can 
> define as many parser entry points as your heat desires. 

I do. It is useful. But the secondary parser is an RD parser
interpreter. The idea isn't to allow arbitrary grammar
extensions .. only to make particular yacc productions
open-recursive closed by the dynamically built table
of extensions.

Thus the RD parser calls back into yacc entry points,
and, the yacc productions call into the RD parser entry
points. This is organised via the lexer.

> Secondly, after compiling the automaton with ocamlyacc, you can always 
> delete and regenerate the interface with ocamlc -i. 

Yes, that is possible, but the result is contains undesired
entry points (tables and whatnot). It also isn't clear
how good a solution this is if I have to paste many other
modules into it -- I can always build the whole program
as a single module :)

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Another nasty with ocamlyacc: magic needed?
  2005-12-05  8:52   ` skaller
@ 2005-12-05  9:05     ` Alessandro Baretta
  0 siblings, 0 replies; 6+ messages in thread
From: Alessandro Baretta @ 2005-12-05  9:05 UTC (permalink / raw)
  To: skaller; +Cc: caml-list

skaller wrote:

> I do. It is useful. But the secondary parser is an RD parser
> interpreter. The idea isn't to allow arbitrary grammar
> extensions .. only to make particular yacc productions
> open-recursive closed by the dynamically built table
> of extensions.
> 
> Thus the RD parser calls back into yacc entry points,
> and, the yacc productions call into the RD parser entry
> points. This is organised via the lexer.

By applying the 'ocamlyacc parser.mly; rm parser.mli; ocamlc -i parser.ml' 
approach, you can enclose the code generated by ocamlyacc into a structure. I 
used this approach when I was young and inexperienced to generate a functor from 
a parser definition, whereby the parser actions called a bunch of functions 
passed as parameters to the functor. I can also imagine that, possibly with some 
help from pa_ocamllex, you can define a pair of mutually recursive lexer and 
parser modules, which is about as far as you think about going with the yacc 
architecture.

All this does require some "black magic", but no Obj.magic, which is rather neat.

Alex

-- 
*********************************************************************
http://www.barettadeit.com/
Baretta DE&IT
A division of Baretta SRL

tel. +39 02 370 111 55
fax. +39 02 370 111 54

Our technology:

The Application System/Xcaml (AS/Xcaml)
<http://www.asxcaml.org/>

The FreerP Project
<http://www.freerp.org/>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Another nasty with ocamlyacc: magic needed?
  2005-12-05  5:35 Another nasty with ocamlyacc: magic needed? skaller
  2005-12-05  6:25 ` [Caml-list] " Geoff Wozniak
  2005-12-05  7:08 ` Alessandro Baretta
@ 2005-12-06  0:41 ` skaller
  2 siblings, 0 replies; 6+ messages in thread
From: skaller @ 2005-12-06  0:41 UTC (permalink / raw)
  To: caml-list

Thank you to all who helped on this and my other ocamlyacc problem.
A proof-of-principle implementation is now working!!

------USER DEFINED STATEMENTS DEMO-------
#statement while expr do statements done ;
#keyword whenever
#statement repeat statements whenever expr ;
#statement forall ident in expr do statements done ;

repeat 
  f x; g k; 
whenever 1>0;

while 1>0 do 
  f x;
  g x;
done;

forall i in (1,2,3) do 
  f i; 
done;
--------------------------------------

The code integrates Ocamlyacc LALR1
parser for a complex language (Felix) including expressions and
statements, with a hand written recursive descent parser engine
driven by a table of extensions to the statement non-terminal 
of the Ocamlyacc grammar. 

With respect to Felix these extensions can be added
on the fly (dynamically) as illustrated in the example.

To make this all work I had to cast two spells,
however the code is free of the evil Obj.magic
and is reentrant: no global variables!

Spell 1: 

to parse expressions, I had to add a special
entry point:

%type<expr_t * token> exprx

exprx: expr expr_terminating_token { $1, $2 }
expr_terminating_token: 
  | TOK1 { TOK1 $1 }
  | TOK2 { TOK2 $1 }
  ...

and the function parsing an expression has to 'put back'
the gratuitous lookahead token into the token source.

Having to list, by hand, all the tokens which terminate
an expression is a maintenance problem .. but it does work.
I formed the list by adding ALL tokens, and then using
ocamlyacc -v output to remove tokens that caused a conflict.

Spell 2:

In order to parse anything, the token source must
be available, however it is defined *after* the
ocamlyacc parser and so cannot be embedded into
the token, because there is no way then to specify
the type of the token. In addition, it cannot be placed
in the token at the point of defining the grammar
extensions either, because that is handled by the
pre-processor, whose job is to *construct* the token
source -- it doesn't exist until after the preprocessor
has run.

So I used a two phase approach. A new USER_STATEMENT_KEYWORD
token is used to capture the grammar production in the
preprocessor -- the production is just a list of tokens,
and so it can be defined in the ocamlyacc parser,
it has type:

%token<Flx_ast.srcref * string * token list> USER_STATEMENT_KEYWORD

Then, I modified the token source function itself,
so that when this token is seen it is replaced by:

%token<Flx_ast.srcref * string * (unit -> Flx_ast.statement_t)>
USER_STATEMENT_DRIVER

The function here is a closure, here is the actual code:

  method token_src (dummy :Lexing.lexbuf) =
    let tmp = List.hd tokens in
    tokens <- List.tl tokens;
    current_token_index <- current_token_index + 1;
    match tmp with
    | Flx_parse.USER_STATEMENT_KEYWORD (sr,s,tks) ->
      let f = fun () -> self#parse_user_statement s tks in
      Flx_parse.USER_STATEMENT_DRIVER (sr,s,f)
    | _ -> tmp

and so, by delaying the binding until the token source
object is known, and then currying it into the new
driver token, I was able to hide all the gritty details
defined after the ocamlyacc parser from it. The Ocamlyacc
parser then calls back the function stored in the token,
which calls the RD parser, which in turn can recursively
call ocamlyacc parser entry points: the actual 
ocamlyacc production used is just:

user_statement:
  | USER_STATEMENT_DRIVER  {  let srt, kw, f = $1 in f ()  }

In effect, this is 'covariant open recursion' closed by the dynamically
extensible driver tables. The LALR1/RD combination is pleasing
because it is expected to be faster and more reliable
than a pure RD parser, yet many of the advantages, including
the ease of extension remain available: in particular the
compiler writer can extend the Ocamlyacc grammar and the user
can extend the RD grammar: the coupling is fairly well isolated.

Again thanks to those who encouraged my to continue seeking
a solution!

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2005-12-06  0:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-12-05  5:35 Another nasty with ocamlyacc: magic needed? skaller
2005-12-05  6:25 ` [Caml-list] " Geoff Wozniak
2005-12-05  7:08 ` Alessandro Baretta
2005-12-05  8:52   ` skaller
2005-12-05  9:05     ` Alessandro Baretta
2005-12-06  0:41 ` skaller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).