Re: [Caml-list] yacc & lex: Does the contructor name matter?

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Xavier Leroy <xavier.leroy@inria.fr>
To: Caml List <caml-list@inria.fr>, ostruszk@order.if.uj.edu.pl
Subject: Re: [Caml-list] yacc & lex: Does the contructor name matter?
Date: Tue, 3 Dec 2002 11:01:53 +0100	[thread overview]
Message-ID: <20021203110153.B7512@pauillac.inria.fr> (raw)
In-Reply-To: <20021202095557.A20380@order.if.uj.edu.pl>; from ostruszk@order.if.uj.edu.pl on Mon, Dec 02, 2002 at 09:55:57AM +0100

> 1. Does the constructor name matter in yacc and lex?
> 
> I've suddenly wanted to have an interpreter (I'm rather numeric guy so
> this comes to me as an surprise :)) so as you probably guess I started
> with a calculator :).  The problem is that I don't understand behaviour
> of the following implementation -- if I change the constructor name EoL
> into EOF then everything works OK but otherwise I have a "parse error".
> Could someone enlighten me?  If the grammar (line rule especially) is
> incorrect then I should always get "parse error", if it is correct then
> it should work no matter the name of the constructor --- at least this
> is what I think.

The EOF token is indeed special, in the sense that it is the only
valid token for the lookahead of a grammar entry point.
To understand why in full details, you need to look at the .output
file generated by ocamlyacc -v.  But here is a simplified account.
The actual grammar that ocamlyacc compiles down to a PDA has some
additional productions:

$accept: %entry EOF       { should not reduce }
%entry:  '\001' line      { stop parsing with result $2 }
line:    expr             { Some $1 }
      | /*nothing*/       { None }
...

where "$accept" is the real entry point and EOF is either a
user-provided token with that name, or if none was provided,
an internally-generated token distinct from any other.

(The %entry and '\001' business is used to select between multiple
entry points; you can ignore it for the purposes of this explanation.)

Here, the parsing automaton cannot reduce "line" by the "nothing"
production (thus terminating the parsing) without looking at the next
token, which is EoL in your case.  However, EoL and EOF are distinct
tokens, so the input doesn't match the grammar, and a parse error is
generated.

The best way to avoid this difficulty is to make sure that all your
grammar entry points are explicitly terminated by a token, as shown
in the example in section 12.6 of the manual:

    %start line

    %%
    line:
      | expr EoL { Some $1 }
      | /* nothing */ EoL { None }
    ...

Here, the parsing automaton will reduce "line" by the correct rule
just by seeing the EoL token; no test "is the next token EOF?" will
ever be performed.

The moral of this story is thus: always terminate your entry points.

> 2. My second question is what is the most efficient representation of
> the compiled expression.  I don't care how long (with some reasonable
> limits :)) it will take to compile but I want the result be the fastest
> afterwords.  The only representations I can come up with are:
> - keep AST and recursively pattern match on it
> - compile AST to stack like evaluator: there's stack of the results and
>   a list of functions that perform operations on it: push_var,
>   push_const, plus, minus etc... and the result is left as the only
>   value on the stack after applying all function in the list
> - compile AST to list of functions which will take in arguments the
>   operands and the "pointer" of the place where to store the result.
>   I don't have now a clear idea how to do it (I still think in C :)) but
>   the motivation is to remove pushing functions and stack management of
>   the previous solution

I don't understand your third option.  My advice would be to go for
the simplest, clearest solution, which is the first you mentioned.
Your second solution is closer to a bytecode interpreter (a virtual
machine), but to make it significantly more efficient than the first,
you'd need to go further: encode instructions as a datatype, not as
functions; represent the program by arrays or strings rather than
lists of instructions; etc.  Make sure it's worth the effort before
embarking on this task :-)

- Xavier Leroy
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

next prev parent reply	other threads:[~2002-12-03 10:02 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-12-02  8:55 Andrzej M. Ostruszka
2002-12-03 10:01 ` Xavier Leroy [this message]
2002-12-03 10:48   ` [Caml-list] " Andrzej M. Ostruszka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20021203110153.B7512@pauillac.inria.fr \
    --to=xavier.leroy@inria.fr \
    --cc=caml-list@inria.fr \
    --cc=ostruszk@order.if.uj.edu.pl \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).