Parser state variables

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: skaller <skaller@users.sourceforge.net>
To: Caml Mailing List <caml-list@inria.fr>
Subject: Parser state variables
Date: Thu, 27 Oct 2005 17:35:47 +1000	[thread overview]
Message-ID: <1130398547.7544.66.camel@rosella> (raw)

Are there any plans to extend Ocamlyacc to allow state variables,
in a manner similar to the recent extensions to Ocamllex to
allow them?

At present, there are two workarounds:

(a) global variables

I hope I do not need to say how bad this is for a language
that is supposed to provide good support for functional programming.

(b) Token hack trick

This trick avoids the problems of global variables by using
a proper state object. The idea is that EVERY token should
contain the state object, which is put into it by the
lexer: thus the state variables passed to the lexer can
be transmitted to the parser.

The cost of course is an extra pointer in every token,
plus the messiness of actually constructing the tokens,
which has to be repeated in every single lexing rule,
and of course, you have to declare the tokens:

%token<MyLexerState.lexer_state> ATOKEN
....

An obvious use of such a facility is a C parser,
which needs to modify a list of typedef names,
so the that lexer can generate the right token
for a given identifier.

Most C parsers use a global variable for communication,
which of course is very bad.

In other cases, such as the Felix parser, tokenisation
is entirely divorced from parsing. Nevertheless, whilst
the parse is not influence by any data, the *user actions*
could be. One such use is a 'fresh variable' counter.

Whilst LALR(1) parsers are not directly
amenable to extension, one can certainly use a pushdown
list of LALR(1) automata to integrate 'recursive descent'
parsing techniques with LALR(1).

For example, a grammar for statements might use a single
token for 'expressions', and we have a separate expression
grammar. We then create a pair of lexers: one lexes statements,
recognizing an expression as a single token, attaching the whole
expression as an attribute. The statement parser, on seeing
such a token, extracts the string from it, and lexes and
parses it using an expression lexer/parser, takes the expression
AST which results, and glues it into the statement AST.

Without arguing about the feasibility or quality of such a 
design, the point is that, for example, the expression lexer
parser might be made extensible and depend on tables.
And the question is: where do the tables come from?

The only way to do this properly at the moment is the
token hack trick -- the tables have to be stored IN
the tokens, simply because there is no way to pass them
to the parser, so that the parser can pass them to user
actions.

One possible solution is to put the state data in a Lexbuf.t,
unfortunately, this would require it to be polymorphic,
and thus the lexer and parsers would also have to be polymorphic.
Since the parser has access to the lexbuf .. it can get the user
data out and pass it to the user actions. 

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

next             reply	other threads:[~2005-10-27  7:36 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-10-27  7:35 skaller [this message]
2005-10-28  6:22 ` [Caml-list] " Alain Frisch
2005-10-28  7:11   ` skaller
2005-10-31 22:32   ` Erik de Castro Lopo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1130398547.7544.66.camel@rosella \
    --to=skaller@users.sourceforge.net \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).