caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Parser state variables
@ 2005-10-27  7:35 skaller
  2005-10-28  6:22 ` [Caml-list] " Alain Frisch
  0 siblings, 1 reply; 4+ messages in thread
From: skaller @ 2005-10-27  7:35 UTC (permalink / raw)
  To: Caml Mailing List

Are there any plans to extend Ocamlyacc to allow state variables,
in a manner similar to the recent extensions to Ocamllex to
allow them?

At present, there are two workarounds:

(a) global variables

I hope I do not need to say how bad this is for a language
that is supposed to provide good support for functional programming.

(b) Token hack trick

This trick avoids the problems of global variables by using
a proper state object. The idea is that EVERY token should
contain the state object, which is put into it by the
lexer: thus the state variables passed to the lexer can
be transmitted to the parser.

The cost of course is an extra pointer in every token,
plus the messiness of actually constructing the tokens,
which has to be repeated in every single lexing rule,
and of course, you have to declare the tokens:

%token<MyLexerState.lexer_state> ATOKEN
....

An obvious use of such a facility is a C parser,
which needs to modify a list of typedef names,
so the that lexer can generate the right token
for a given identifier.

Most C parsers use a global variable for communication,
which of course is very bad.

In other cases, such as the Felix parser, tokenisation
is entirely divorced from parsing. Nevertheless, whilst
the parse is not influence by any data, the *user actions*
could be. One such use is a 'fresh variable' counter.

Whilst LALR(1) parsers are not directly
amenable to extension, one can certainly use a pushdown
list of LALR(1) automata to integrate 'recursive descent'
parsing techniques with LALR(1).

For example, a grammar for statements might use a single
token for 'expressions', and we have a separate expression
grammar. We then create a pair of lexers: one lexes statements,
recognizing an expression as a single token, attaching the whole
expression as an attribute. The statement parser, on seeing
such a token, extracts the string from it, and lexes and
parses it using an expression lexer/parser, takes the expression
AST which results, and glues it into the statement AST.

Without arguing about the feasibility or quality of such a 
design, the point is that, for example, the expression lexer
parser might be made extensible and depend on tables.
And the question is: where do the tables come from?

The only way to do this properly at the moment is the
token hack trick -- the tables have to be stored IN
the tokens, simply because there is no way to pass them
to the parser, so that the parser can pass them to user
actions.

One possible solution is to put the state data in a Lexbuf.t,
unfortunately, this would require it to be polymorphic,
and thus the lexer and parsers would also have to be polymorphic.
Since the parser has access to the lexbuf .. it can get the user
data out and pass it to the user actions. 


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Parser state variables
  2005-10-27  7:35 Parser state variables skaller
@ 2005-10-28  6:22 ` Alain Frisch
  2005-10-28  7:11   ` skaller
  2005-10-31 22:32   ` Erik de Castro Lopo
  0 siblings, 2 replies; 4+ messages in thread
From: Alain Frisch @ 2005-10-28  6:22 UTC (permalink / raw)
  To: skaller; +Cc: Caml Mailing List

skaller wrote:
> Are there any plans to extend Ocamlyacc to allow state variables,
> in a manner similar to the recent extensions to Ocamllex to
> allow them?
> 
> At present, there are two workarounds:

Another one is to make semantic actions return functions which take the
state as an argument. This is similar to introducing a new data type to
represent pure concrete parse trees and then post-process these trees to
produce abstract syntax.

-- Alain


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Parser state variables
  2005-10-28  6:22 ` [Caml-list] " Alain Frisch
@ 2005-10-28  7:11   ` skaller
  2005-10-31 22:32   ` Erik de Castro Lopo
  1 sibling, 0 replies; 4+ messages in thread
From: skaller @ 2005-10-28  7:11 UTC (permalink / raw)
  To: Alain Frisch; +Cc: Caml Mailing List

On Fri, 2005-10-28 at 08:22 +0200, Alain Frisch wrote:
> skaller wrote:
> > Are there any plans to extend Ocamlyacc to allow state variables,
> > in a manner similar to the recent extensions to Ocamllex to
> > allow them?
> > 
> > At present, there are two workarounds:
> 
> Another one is to make semantic actions return functions which take the
> state as an argument. This is similar to introducing a new data type to
> represent pure concrete parse trees and then post-process these trees to
> produce abstract syntax.

Aha! Another win for higher order functions! Thanks for this
insight. Dang -- so obvious when you see it :) 

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Caml-list] Parser state variables
  2005-10-28  6:22 ` [Caml-list] " Alain Frisch
  2005-10-28  7:11   ` skaller
@ 2005-10-31 22:32   ` Erik de Castro Lopo
  1 sibling, 0 replies; 4+ messages in thread
From: Erik de Castro Lopo @ 2005-10-31 22:32 UTC (permalink / raw)
  To: caml-list

Alain Frisch wrote:

> Another one is to make semantic actions return functions which take the
> state as an argument.

I've also used a variation on this which places a filter function
with signature 

    filter : state -> token -> token

between the lexer and the parser and modifies the stream of tokens 
as required by the current state. 

Erik
-- 
+-----------------------------------------------------------+
  Erik de Castro Lopo
+-----------------------------------------------------------+
"Death is perhaps too easy" -- Iqbal Sacranie in 1989 about
Salman Rushdie, author of "The Satanic Verses". Sacranie 
received a knighthood in 2005 as the face of 'moderate'
British Islam. He has never disowned his earlier statement.


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2005-10-31 22:32 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-10-27  7:35 Parser state variables skaller
2005-10-28  6:22 ` [Caml-list] " Alain Frisch
2005-10-28  7:11   ` skaller
2005-10-31 22:32   ` Erik de Castro Lopo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).