caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* There's an elephant in the room: Solution to sharing a symbol table
@ 2007-05-01 21:39 Joel Reymont
  2007-05-02  5:44 ` [Menhir-list] " Francois Pottier
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Reymont @ 2007-05-01 21:39 UTC (permalink / raw)
  To: Caml List; +Cc: menhir-list, skaller


%type <Symtab.t -> Easy.program>

program: statement EOF { fun t -> List.rev ($1 t) }

etc

Would this work with Menhir and would it be detrimental to performance?

I think there's an elephant in the room. Functorization of a Menhir- 
generated parser doesn't solve this issue since the specialization  
happens at compile time whereas a new symbol table is a run-time value.

	Thanks, Joel
	
--
http://wagerlabs.com/






^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Menhir-list] There's an elephant in the room: Solution to sharing a symbol table
  2007-05-01 21:39 There's an elephant in the room: Solution to sharing a symbol table Joel Reymont
@ 2007-05-02  5:44 ` Francois Pottier
  2007-05-02  5:58   ` Joel Reymont
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Pottier @ 2007-05-02  5:44 UTC (permalink / raw)
  To: Joel Reymont; +Cc: Caml List, skaller, menhir-list


Hello,

On Tue, May 01, 2007 at 10:39:19PM +0100, Joel Reymont wrote:
> 
> %type <Symtab.t -> Easy.program>
> 
> program: statement EOF { fun t -> List.rev ($1 t) }
> 
> Would this work with Menhir and would it be detrimental to performance?

Yes, it would work (with ocamlyacc or Menhir). It is in fact a classic idiom.
However, you must be aware that having your semantic actions build closures
means that all of the actual work (e.g., the invocation of List.rev above)
is delayed until *after* the entire parsing process is over. In other words,
performance-wise, your code is equivalent to building an abstract syntax
tree, without using your symbol table, and then transforming that tree.

> I think there's an elephant in the room. Functorization of a Menhir- 
> generated parser doesn't solve this issue since the specialization  
> happens at compile time whereas a new symbol table is a run-time value.

Functors can be applied at run-time (via "let module"), so a functor parameter
*can* provide a parser with runtime information. I would still recommend the
%parameter approach, since it is syntactically lighter and more efficient (the
semantic actions won't be delayed).

-- 
François Pottier
Francois.Pottier@inria.fr
http://cristal.inria.fr/~fpottier/


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Menhir-list] There's an elephant in the room: Solution to sharing a symbol table
  2007-05-02  5:44 ` [Menhir-list] " Francois Pottier
@ 2007-05-02  5:58   ` Joel Reymont
       [not found]     ` <20070502070345.GA5242@yquem.inria.fr>
  2007-05-02  9:04     ` [Caml-list] " skaller
  0 siblings, 2 replies; 10+ messages in thread
From: Joel Reymont @ 2007-05-02  5:58 UTC (permalink / raw)
  To: Francois.Pottier; +Cc: Caml List, skaller, menhir-list


On May 2, 2007, at 6:44 AM, Francois Pottier wrote:

> Functors can be applied at run-time (via "let module"), so a  
> functor parameter
> *can* provide a parser with runtime information. I would still  
> recommend the
> %parameter approach, since it is syntactically lighter and more  
> efficient (the
> semantic actions won't be delayed).

I have been butting my head against this for a few days now. Please,  
please provide an example!!! I'm not talking about a general example  
of a %parameter approach but an example of using that approach with a  
lexer.

Consider this function that should parser a string and return an AST:

let parse str =
   let lexbuf = Lexing.from_string str in
   program (token tab) lexbuf

The token lexer above takes a tab argument before lexbuf but program  
always expects (Lexing.lexbuf -> token) so we give program the  
curried function.

How do you write parse above with the %parameter approach to ensure  
that parse ALWAYS uses a new symbol table that is shared between the  
lexer and the parser in this parsing run?

	Thanks, Joel

--
http://wagerlabs.com/






^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Menhir-list] There's an elephant in the room: Solution to sharing a symbol table
       [not found]     ` <20070502070345.GA5242@yquem.inria.fr>
@ 2007-05-02  7:35       ` Joel Reymont
  0 siblings, 0 replies; 10+ messages in thread
From: Joel Reymont @ 2007-05-02  7:35 UTC (permalink / raw)
  To: Francois.Pottier; +Cc: menhir-list, Caml List

I'm hurt! It cannot be this easy. Thank you Francois!

On May 2, 2007, at 8:03 AM, Francois Pottier wrote:

> Well, if the parser is parameterized over "T : sig val tab: ... end",
> then you could write something like
>
>   (* assuming "tab" is defined here *)
>
>   let parser str =
>     let lexbuf = Lexing.from_string str in
>     let module P = EasyParser.Make (struct let tab = tab end) in
>     P.program (token tab) lexbuf

--
http://wagerlabs.com/






^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Re: [Menhir-list] There's an elephant in the room: Solution to sharing a symbol table
  2007-05-02  5:58   ` Joel Reymont
       [not found]     ` <20070502070345.GA5242@yquem.inria.fr>
@ 2007-05-02  9:04     ` skaller
  2007-05-02 11:56       ` Francois Pottier
  1 sibling, 1 reply; 10+ messages in thread
From: skaller @ 2007-05-02  9:04 UTC (permalink / raw)
  To: Joel Reymont; +Cc: Francois.Pottier, menhir-list, Caml List

On Wed, 2007-05-02 at 06:58 +0100, Joel Reymont wrote:
> On May 2, 2007, at 6:44 AM, Francois Pottier wrote:

> How do you write parse above with the %parameter approach to ensure  
> that parse ALWAYS uses a new symbol table that is shared between the  
> lexer and the parser in this parsing run?

You can't. Don't be confused by functor interface in Menhir. 
It is useless for maintaining state.

What you need is a feature I asked for .. the possibility of
passing a state argument to the parser the same way you can
now do for ocamllex: you just write

	rule fred state = parse ...


and now 'state' is passed to all actions. That's ocamllex,
we need the same for parsing.

[I thought this was implemented but can't see it anywhere
in the Menhir manual]

This IS a way to hack this .. by using the ocamllex feature
and cheating by wrapping stuff inside tokens so that
the parser user actions can get at the state object.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Re: [Menhir-list] There's an elephant in the room: Solution to sharing a symbol table
  2007-05-02  9:04     ` [Caml-list] " skaller
@ 2007-05-02 11:56       ` Francois Pottier
  2007-05-02 16:07         ` skaller
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Pottier @ 2007-05-02 11:56 UTC (permalink / raw)
  To: skaller; +Cc: Joel Reymont, menhir-list, Caml List


Hello,

On Wed, May 02, 2007 at 07:04:50PM +1000, skaller wrote:
> You can't. Don't be confused by functor interface in Menhir. 
> It is useless for maintaining state.

I don't think so...

> What you need is a feature I asked for .. the possibility of
> passing a state argument to the parser the same way you can
> now do for ocamllex: you just write
> 
> 	rule fred state = parse ...

I remember our discussion. What you need primarily, if I recall
correctly, is the ability for a semantic action to recursively
invoke the parser itself. This happens to work if the parser is
defined as a recursive function, and does not (currently) work
when the parser is defined as a functor, because the functor is
not declared as recursive. We could, in principle, make it a
recursive functor, which would allow recursive invocations.

-- 
François Pottier
Francois.Pottier@inria.fr
http://cristal.inria.fr/~fpottier/


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Re: [Menhir-list] There's an elephant in the room: Solution to sharing a symbol table
  2007-05-02 11:56       ` Francois Pottier
@ 2007-05-02 16:07         ` skaller
  2007-05-02 18:30           ` Francois Pottier
  0 siblings, 1 reply; 10+ messages in thread
From: skaller @ 2007-05-02 16:07 UTC (permalink / raw)
  To: Francois.Pottier; +Cc: Caml List, menhir-list

On Wed, 2007-05-02 at 13:56 +0200, Francois Pottier wrote:
> Hello,
> 
> On Wed, May 02, 2007 at 07:04:50PM +1000, skaller wrote:
> > You can't. Don't be confused by functor interface in Menhir. 
> > It is useless for maintaining state.
> 
> I don't think so...

State must be maintained in function arguments.

Functors are entirely irrelevant. This is about associating
data with a thread of control.

Joel asked:

"How do you write parse above with the %parameter approach to ensure  
that parse ALWAYS uses a new symbol table that is shared between the  
lexer and the parser in this parsing run?"

and the answer is you have to pass an argument to the parser function.

> > What you need is a feature I asked for .. the possibility of
> > passing a state argument to the parser the same way you can
> > now do for ocamllex: you just write
> > 
> > 	rule fred state = parse ...
> 
> I remember our discussion. What you need primarily, if I recall
> correctly, is the ability for a semantic action to recursively
> invoke the parser itself. This happens to work if the parser is
> defined as a recursive function, and does not (currently) work
> when the parser is defined as a functor, because the functor is
> not declared as recursive. We could, in principle, make it a
> recursive functor, which would allow recursive invocations.

Let me give you a real example: I have a parser that
reads Felix code that looks like:

fun f .. // felix stuff
include "filename";
fun g .. //


When the parser sees the include statement it parses the
file 'filename'. The parse of the included file, in Felix,
is independent of the surrounding code. The parser is
called recursively and the resulting AST is returned
from the user action.

There is no way to do this without passing a state variable
on the stack. In fact Felix uses a variable passed to the
lexer to work around the deficiency in Ocamlyacc .. it
passes the variable to the parser in a token .. :)

[That works with Menhir too by the way]

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Re: [Menhir-list] There's an elephant in the room: Solution to sharing a symbol table
  2007-05-02 16:07         ` skaller
@ 2007-05-02 18:30           ` Francois Pottier
  2007-05-03  1:17             ` skaller
  0 siblings, 1 reply; 10+ messages in thread
From: Francois Pottier @ 2007-05-02 18:30 UTC (permalink / raw)
  To: skaller; +Cc: Caml List, menhir-list


Hi,

On Thu, May 03, 2007 at 02:07:26AM +1000, skaller wrote:
> State must be maintained in function arguments.

This is a trivial example, but:

  let x = ref 0
  let f () : bool = incr x; x / 2 = 0

Here, f exploits an internal state, even though it is not explicitly
parameterized over a piece of state. Similarly, a parser can have internal
state, even if the parsing functions are not explicitly parameterized over a
piece of state. It is sufficient for the entire parser either to refer to a
global variable or to abstracted over a piece of state (which functors allow).

> When the parser sees the include statement it parses the
> file 'filename'. The parse of the included file, in Felix,
> is independent of the surrounding code. The parser is
> called recursively and the resulting AST is returned
> from the user action.
> 
> There is no way to do this without passing a state variable
> on the stack.

I don't see why this is so. In fact, nothing in your example seems to require
any kind of state -- you only need the parser to be able to recursively invoke
itself. Or am I missing something?

-- 
François Pottier
Francois.Pottier@inria.fr
http://cristal.inria.fr/~fpottier/


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Re: [Menhir-list] There's an elephant in the room: Solution to sharing a symbol table
  2007-05-02 18:30           ` Francois Pottier
@ 2007-05-03  1:17             ` skaller
  2007-05-03  8:48               ` Joel Reymont
  0 siblings, 1 reply; 10+ messages in thread
From: skaller @ 2007-05-03  1:17 UTC (permalink / raw)
  To: Francois.Pottier; +Cc: Caml List, menhir-list

On Wed, 2007-05-02 at 20:30 +0200, Francois Pottier wrote:
> Hi,
> 
> On Thu, May 03, 2007 at 02:07:26AM +1000, skaller wrote:
> > State must be maintained in function arguments.
> 
> This is a trivial example, but:
> 
>   let x = ref 0
>   let f () : bool = incr x; x / 2 = 0

Yes, but I rule out global mutable state as unacceptable
by requiring reentrancy.

> Here, f exploits an internal state, even though it is not explicitly
> parameterized over a piece of state. Similarly, a parser can have internal
> state, even if the parsing functions are not explicitly parameterized over a
> piece of state. It is sufficient for the entire parser either to refer to a
> global variable or to abstracted over a piece of state (which functors allow).

As above, if that state is global the result isn't re-entrant
and isn't acceptable.

> I don't see why this is so. In fact, nothing in your example seems to require
> any kind of state -- you only need the parser to be able to recursively invoke
> itself. Or am I missing something?

I didn't show the full parser of course. In fact, Felix grammar
is user extensible, so there has to be somewhere to hold
the extensions. Similarly, a C parser requires a symbol table
for typedefs, and Felix allows embedded C.

In fact, Felix can call the FrontC/CIL parser on fragments 
of C code -- and CIL uses global variables so isn't re-entrant.

In fact Joel explained the 'elephant' technique:

   let parser str =
     let tab = .. in
     let lexbuf = Lexing.from_string str in
     let module P = EasyParser.Make (struct let tab = tab end) in
     P.program (token tab) lexbuf

which uses a local module to turn a global variable into
a function local one. That's pretty clever, the function
'parser' here is reentrant (even though P.program is not).

To support recursion you'd need:

     let module P = EasyParser.Make (struct 
       let tab = tab  
       let parser = parser (* pass fun for recursion *)
     end) in

Quite a clever use of a local modules and functors therefore,
allows %parameter to suffice, without further changes to
Menhir .. though it would still be much easier if you
could just do it like Ocamllex, I have to agree this does
work.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] Re: [Menhir-list] There's an elephant in the room: Solution to sharing a symbol table
  2007-05-03  1:17             ` skaller
@ 2007-05-03  8:48               ` Joel Reymont
  0 siblings, 0 replies; 10+ messages in thread
From: Joel Reymont @ 2007-05-03  8:48 UTC (permalink / raw)
  To: skaller; +Cc: Francois.Pottier, Caml List, menhir-list


On May 3, 2007, at 2:17 AM, skaller wrote:

> In fact Joel explained the 'elephant' technique:

The elephant in the room was something that, seemingly, no one talked  
about, although it was large and glaringly obvious.

>    let parser str =
>      let tab = .. in
>      let lexbuf = Lexing.from_string str in
>      let module P = EasyParser.Make (struct let tab = tab end) in
>      P.program (token tab) lexbuf

Kudos to Francois to offering the above example.

	Thanks, Joel

--
http://wagerlabs.com/






^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-05-03  8:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-05-01 21:39 There's an elephant in the room: Solution to sharing a symbol table Joel Reymont
2007-05-02  5:44 ` [Menhir-list] " Francois Pottier
2007-05-02  5:58   ` Joel Reymont
     [not found]     ` <20070502070345.GA5242@yquem.inria.fr>
2007-05-02  7:35       ` Joel Reymont
2007-05-02  9:04     ` [Caml-list] " skaller
2007-05-02 11:56       ` Francois Pottier
2007-05-02 16:07         ` skaller
2007-05-02 18:30           ` Francois Pottier
2007-05-03  1:17             ` skaller
2007-05-03  8:48               ` Joel Reymont

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).