caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: skaller <skaller@users.sourceforge.net>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Another nasty with ocamlyacc: magic needed?
Date: Tue, 06 Dec 2005 11:41:35 +1100	[thread overview]
Message-ID: <1133829695.9871.97.camel@rosella> (raw)
In-Reply-To: <1133760933.8957.125.camel@rosella>

Thank you to all who helped on this and my other ocamlyacc problem.
A proof-of-principle implementation is now working!!

------USER DEFINED STATEMENTS DEMO-------
#statement while expr do statements done ;
#keyword whenever
#statement repeat statements whenever expr ;
#statement forall ident in expr do statements done ;

repeat 
  f x; g k; 
whenever 1>0;

while 1>0 do 
  f x;
  g x;
done;

forall i in (1,2,3) do 
  f i; 
done;
--------------------------------------

The code integrates Ocamlyacc LALR1
parser for a complex language (Felix) including expressions and
statements, with a hand written recursive descent parser engine
driven by a table of extensions to the statement non-terminal 
of the Ocamlyacc grammar. 

With respect to Felix these extensions can be added
on the fly (dynamically) as illustrated in the example.

To make this all work I had to cast two spells,
however the code is free of the evil Obj.magic
and is reentrant: no global variables!

Spell 1: 

to parse expressions, I had to add a special
entry point:

%type<expr_t * token> exprx

exprx: expr expr_terminating_token { $1, $2 }
expr_terminating_token: 
  | TOK1 { TOK1 $1 }
  | TOK2 { TOK2 $1 }
  ...

and the function parsing an expression has to 'put back'
the gratuitous lookahead token into the token source.

Having to list, by hand, all the tokens which terminate
an expression is a maintenance problem .. but it does work.
I formed the list by adding ALL tokens, and then using
ocamlyacc -v output to remove tokens that caused a conflict.

Spell 2:

In order to parse anything, the token source must
be available, however it is defined *after* the
ocamlyacc parser and so cannot be embedded into
the token, because there is no way then to specify
the type of the token. In addition, it cannot be placed
in the token at the point of defining the grammar
extensions either, because that is handled by the
pre-processor, whose job is to *construct* the token
source -- it doesn't exist until after the preprocessor
has run.

So I used a two phase approach. A new USER_STATEMENT_KEYWORD
token is used to capture the grammar production in the
preprocessor -- the production is just a list of tokens,
and so it can be defined in the ocamlyacc parser,
it has type:

%token<Flx_ast.srcref * string * token list> USER_STATEMENT_KEYWORD

Then, I modified the token source function itself,
so that when this token is seen it is replaced by:

%token<Flx_ast.srcref * string * (unit -> Flx_ast.statement_t)>
USER_STATEMENT_DRIVER

The function here is a closure, here is the actual code:

  method token_src (dummy :Lexing.lexbuf) =
    let tmp = List.hd tokens in
    tokens <- List.tl tokens;
    current_token_index <- current_token_index + 1;
    match tmp with
    | Flx_parse.USER_STATEMENT_KEYWORD (sr,s,tks) ->
      let f = fun () -> self#parse_user_statement s tks in
      Flx_parse.USER_STATEMENT_DRIVER (sr,s,f)
    | _ -> tmp

and so, by delaying the binding until the token source
object is known, and then currying it into the new
driver token, I was able to hide all the gritty details
defined after the ocamlyacc parser from it. The Ocamlyacc
parser then calls back the function stored in the token,
which calls the RD parser, which in turn can recursively
call ocamlyacc parser entry points: the actual 
ocamlyacc production used is just:

user_statement:
  | USER_STATEMENT_DRIVER  {  let srt, kw, f = $1 in f ()  }

In effect, this is 'covariant open recursion' closed by the dynamically
extensible driver tables. The LALR1/RD combination is pleasing
because it is expected to be faster and more reliable
than a pure RD parser, yet many of the advantages, including
the ease of extension remain available: in particular the
compiler writer can extend the Ocamlyacc grammar and the user
can extend the RD grammar: the coupling is fairly well isolated.

Again thanks to those who encouraged my to continue seeking
a solution!

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


      parent reply	other threads:[~2005-12-06  0:41 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-12-05  5:35 skaller
2005-12-05  6:25 ` [Caml-list] " Geoff Wozniak
2005-12-05  7:08 ` Alessandro Baretta
2005-12-05  8:52   ` skaller
2005-12-05  9:05     ` Alessandro Baretta
2005-12-06  0:41 ` skaller [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1133829695.9871.97.camel@rosella \
    --to=skaller@users.sourceforge.net \
    --cc=caml-list@yquem.inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).