[Caml-list] yacc & lex: Does the contructor name matter?

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* [Caml-list] yacc & lex: Does the contructor name matter?
@ 2002-12-02  8:55 Andrzej M. Ostruszka
  2002-12-03 10:01 ` Xavier Leroy
  0 siblings, 1 reply; 3+ messages in thread
From: Andrzej M. Ostruszka @ 2002-12-02  8:55 UTC (permalink / raw)
  To: Caml List

I've got two questions (I hope they qualify to this list -- otherwise
my apologies)

1. Does the constructor name matter in yacc and lex?

I've suddenly wanted to have an interpreter (I'm rather numeric guy so
this comes to me as an surprise :)) so as you probably guess I started
with a calculator :).  The problem is that I don't understand behaviour
of the following implementation -- if I change the constructor name EoL
into EOF then everything works OK but otherwise I have a "parse error".
Could someone enlighten me?  If the grammar (line rule especially) is
incorrect then I should always get "parse error", if it is correct then
it should work no matter the name of the constructor --- at least this
is what I think.

--->8--parser.mly-----
%token <float> Num
%token Plus Minus Times Div Pow
%token LParen RParen
%token EoL  <-- change this into EOF and it will work

%left Plus Minus
%left Times Div
%right Pow
%nonassoc UMinus

%start line
%type <float option> line

%%
line:
  | expr { Some $1 }
  | /* nothing */ { None }
;

expr:
  | Num                 { $1 }
  | LParen expr RParen  { $2 }
  | expr Plus expr      { $1 +. $3 }
  | expr Minus expr     { $1 -. $3 }
  | expr Times expr     { $1 *. $3 }
  | expr Div expr       { $1 /. $3 }
  | expr Pow expr       { $1 ** $3 }
  | Minus expr %prec UMinus { -. $2 }
;
--->8--lexer.mll-----
{
open Parser
}

let white = [' ' '\t']
let digit = ['0'-'9']

rule token = parse
  | white+  { token lexbuf }    (* skip white *)
  | digit* '.'? digit* (['e' 'E'] ['+' '-']? digit+)?
         { Num (float_of_string (Lexing.lexeme lexbuf)) }
  | '('  { LParen }
  | ')'  { RParen }
  | '+'  { Plus }
  | '-'  { Minus }
  | '*'  { Times }
  | '/'  { Div }
  | '^'  { Pow }
  | eof  { EoL } <--- change this to EOF
--->8----------------
And if that matters here's calc.ml

let main_loop () =
  let go_on = ref true in
  while !go_on do
    begin
      try
	let lexbuf = Lexing.from_string (read_line ()) in
	begin match Parser.line Lexer.token lexbuf with
	| Some res -> print_float res; print_newline ()
	| None -> print_endline "Give some meaningful expression"
	end
      with
      | End_of_file -> go_on := false
      | Failure s -> print_endline ("failure: " ^ s)
      | Parsing.Parse_error -> print_endline "parse error"
    end;
    flush stdout
  done

let _ = main_loop ()

2. My second question is what is the most efficient representation of
the compiled expression.  I don't care how long (with some reasonable
limits :)) it will take to compile but I want the result be the fastest
afterwords.  The only representations I can come up with are:
- keep AST and recursively pattern match on it
- compile AST to stack like evaluator: there's stack of the results and
  a list of functions that perform operations on it: push_var,
  push_const, plus, minus etc... and the result is left as the only
  value on the stack after applying all function in the list
- compile AST to list of functions which will take in arguments the
  operands and the "pointer" of the place where to store the result.
  I don't have now a clear idea how to do it (I still think in C :)) but
  the motivation is to remove pushing functions and stack management of
  the previous solution

I guess that most of you are quite experienced in this stuff and can shed
a bit of light on this subject.

						Best regards
PS. Sorry for a length post
-- 
    ____   _  ___
   /  | \_/ |/ _ \		Andrzej Marek Ostruszka
  / _ |     | (_) | Instytut Fizyki, Uniwersytet Jagiellonski (Cracow)
 /_/ L|_|V|_|\___/	(PGP <-- finger ostruszk@order.if.uj.edu.pl)
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Caml-list] yacc & lex: Does the contructor name matter?
  2002-12-02  8:55 [Caml-list] yacc & lex: Does the contructor name matter? Andrzej M. Ostruszka
@ 2002-12-03 10:01 ` Xavier Leroy
  2002-12-03 10:48   ` [Caml-list] " Andrzej M. Ostruszka
  0 siblings, 1 reply; 3+ messages in thread
From: Xavier Leroy @ 2002-12-03 10:01 UTC (permalink / raw)
  To: Caml List, ostruszk

> 1. Does the constructor name matter in yacc and lex?
> 
> I've suddenly wanted to have an interpreter (I'm rather numeric guy so
> this comes to me as an surprise :)) so as you probably guess I started
> with a calculator :).  The problem is that I don't understand behaviour
> of the following implementation -- if I change the constructor name EoL
> into EOF then everything works OK but otherwise I have a "parse error".
> Could someone enlighten me?  If the grammar (line rule especially) is
> incorrect then I should always get "parse error", if it is correct then
> it should work no matter the name of the constructor --- at least this
> is what I think.

The EOF token is indeed special, in the sense that it is the only
valid token for the lookahead of a grammar entry point.
To understand why in full details, you need to look at the .output
file generated by ocamlyacc -v.  But here is a simplified account.
The actual grammar that ocamlyacc compiles down to a PDA has some
additional productions:

$accept: %entry EOF       { should not reduce }
%entry:  '\001' line      { stop parsing with result $2 }
line:    expr             { Some $1 }
      | /*nothing*/       { None }
...

where "$accept" is the real entry point and EOF is either a
user-provided token with that name, or if none was provided,
an internally-generated token distinct from any other.

(The %entry and '\001' business is used to select between multiple
entry points; you can ignore it for the purposes of this explanation.)

Here, the parsing automaton cannot reduce "line" by the "nothing"
production (thus terminating the parsing) without looking at the next
token, which is EoL in your case.  However, EoL and EOF are distinct
tokens, so the input doesn't match the grammar, and a parse error is
generated.

The best way to avoid this difficulty is to make sure that all your
grammar entry points are explicitly terminated by a token, as shown
in the example in section 12.6 of the manual:

    %start line

    %%
    line:
      | expr EoL { Some $1 }
      | /* nothing */ EoL { None }
    ...

Here, the parsing automaton will reduce "line" by the correct rule
just by seeing the EoL token; no test "is the next token EOF?" will
ever be performed.

The moral of this story is thus: always terminate your entry points.

> 2. My second question is what is the most efficient representation of
> the compiled expression.  I don't care how long (with some reasonable
> limits :)) it will take to compile but I want the result be the fastest
> afterwords.  The only representations I can come up with are:
> - keep AST and recursively pattern match on it
> - compile AST to stack like evaluator: there's stack of the results and
>   a list of functions that perform operations on it: push_var,
>   push_const, plus, minus etc... and the result is left as the only
>   value on the stack after applying all function in the list
> - compile AST to list of functions which will take in arguments the
>   operands and the "pointer" of the place where to store the result.
>   I don't have now a clear idea how to do it (I still think in C :)) but
>   the motivation is to remove pushing functions and stack management of
>   the previous solution

I don't understand your third option.  My advice would be to go for
the simplest, clearest solution, which is the first you mentioned.
Your second solution is closer to a bytecode interpreter (a virtual
machine), but to make it significantly more efficient than the first,
you'd need to go further: encode instructions as a datatype, not as
functions; represent the program by arrays or strings rather than
lists of instructions; etc.  Make sure it's worth the effort before
embarking on this task :-)

- Xavier Leroy
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [Caml-list] Re: yacc & lex: Does the contructor name matter?
  2002-12-03 10:01 ` Xavier Leroy
@ 2002-12-03 10:48   ` Andrzej M. Ostruszka
  0 siblings, 0 replies; 3+ messages in thread
From: Andrzej M. Ostruszka @ 2002-12-03 10:48 UTC (permalink / raw)
  To: Caml List

On Tue, Dec 03 (2002), Xavier Leroy wrote:
[...]

First of all thank you for an explanation

> The actual grammar that ocamlyacc compiles down to a PDA has some
> additional productions:
> 
> $accept: %entry EOF       { should not reduce }
> %entry:  '\001' line      { stop parsing with result $2 }
> line:    expr             { Some $1 }
>       | /*nothing*/       { None }

I've looked at the .ml and .mli but haven't found anything regarding EOF
so that's why I asked.

> The best way to avoid this difficulty is to make sure that all your
> grammar entry points are explicitly terminated by a token, as shown
> in the example in section 12.6 of the manual:
> 
>     %start line
> 
>     %%
>     line:
>       | expr EoL { Some $1 }
>       | /* nothing */ EoL { None }

:), I've read that section and knew the solution and just wanted to know
the cause of this behaviour.

> > - compile AST to list of functions which will take in arguments the
> >   operands and the "pointer" of the place where to store the result.
> >   I don't have now a clear idea how to do it (I still think in C :)) but
> >   the motivation is to remove pushing functions and stack management of
> >   the previous solution
> 
> I don't understand your third option.

Well I thought about having some list of records
{ double x, y; double* r; and maybe function pointer }
where the function would take x and y and store the result where
r points to and the job of the compiler would be to craft those pointers
(and x's in case of constants) appropriately.  As I've told I'm still
rather C than OCaml guy :).

						Best regards

PS. Doubling of the posts is not my fault it happens "somewhere" in
  France (one can check that by message ID).  Maybe this is because I'm
  sending from another address that I'm subscribed?
-- 
    ____   _  ___
   /  | \_/ |/ _ \		Andrzej Marek Ostruszka
  / _ |     | (_) | Instytut Fizyki, Uniwersytet Jagiellonski (Cracow)
 /_/ L|_|V|_|\___/	(PGP <-- finger ostruszk@order.if.uj.edu.pl)
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2002-12-03 19:16 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-12-02  8:55 [Caml-list] yacc & lex: Does the contructor name matter? Andrzej M. Ostruszka
2002-12-03 10:01 ` Xavier Leroy
2002-12-03 10:48   ` [Caml-list] " Andrzej M. Ostruszka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).