Re: [Caml-list] Array.make exception and parser

From: Boris Yakobowski <ml@yakobowski.org>
To: The Caml Mailing List <caml-list@inria.fr>
Subject: Re: [Caml-list] Array.make exception and parser
Date: Wed, 5 Jan 2011 22:12:53 +0100
Message-ID: <AANLkTime5jiSBiFnSvsJMo1+ejKCycwnzDhwQMoawLq=@mail.gmail.com>
In-Reply-To: <20110104203152.GA3828@yquem.inria.fr>

On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier
<Francois.Pottier@inria.fr> wrote:
> It is true that ocamlyacc (and Menhir) offer essentially no support for
> explaining parse errors. (The "error" pseudo-token, inherited from yacc, is
> supposed to help, but in my opinion its use pollutes the grammar and makes it
> uncontrollable.) Nevertheless, as underlined by Yitzhak, I don't think there
> is a deep reason why LR parsers must be bad at explaining errors. In
> principle, upon detecting an error, an LR parser could easily dump the stack,
> which corresponds to the sentence (composed of terminal and non-terminal
> symbols) that has been recognized so far.

I think the stack would be useless for the user: too long, and
impossible to understand without the grammar. It would be barely
better for the writer of the grammar, who would need to recognize
the parsing state to produce an intelligible error report. I think the
error token is a good idea that just went too far. Its ability to
shift and reduce allows writing parsers that recover from syntax
errors, but we hardly do that nowadays. Instead, using the error token
causes bogus shift/reduce conflicts...

What I propose is the following: still use the error token, but do not
allow it to take part in ordinary reductions. Instead, a production
that contains the error token may only raise an exception. This way,
parse errors are caught inside the grammar, as they should be, but do
not pollute the parsing itself.
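
As a rough analogy in plain OCaml (a hand-written recursive-descent
parser, not ocamlyacc output), the sketch below keeps every error
alternative next to the rule it belongs to and lets it do nothing
except raise. The token type, the Syntax_error exception and the
function names are all invented for the illustration; the comment
shows what the corresponding grammar rule might look like under the
proposal.

```ocaml
(* Hypothetical names throughout: nothing here comes from ocamlyacc or Menhir. *)
exception Syntax_error of string

type token = INT of int | PLUS | LPAREN | RPAREN | EOF

(* Toy grammar:  expr ::= atom (PLUS atom)*
                 atom ::= INT | LPAREN expr RPAREN *)
let rec parse_expr toks =
  let v, rest = parse_atom toks in
  parse_sum v rest

and parse_sum acc = function
  | PLUS :: rest ->
    let v, rest = parse_atom rest in
    parse_sum (acc + v) rest
  | rest -> (acc, rest)

and parse_atom = function
  | INT n :: rest -> (n, rest)
  | LPAREN :: rest ->
    (match parse_expr rest with
     | v, RPAREN :: rest -> (v, rest)
     (* Counterpart of a rule such as
          LPAREN expr error { raise (Syntax_error "unclosed parenthesis") }
        The error alternative never produces a value, it only raises. *)
     | _ -> raise (Syntax_error "unclosed parenthesis"))
  | _ -> raise (Syntax_error "expected an integer or '('")

(* Entry point: a complete input is one expression followed by EOF. *)
let parse toks =
  match parse_expr toks with
  | v, [EOF] -> v
  | _ -> raise (Syntax_error "unexpected token after expression")
```

For instance, parse [LPAREN; INT 1; EOF] raises Syntax_error with the
message attached to the parenthesis rule, while well-formed input goes
through the normal alternatives untouched.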

> It could also display the set of
> look-ahead tokens that would *not* have caused an error in the current
> state. (Come to think of it, this is a feature that I would like to add to
> Menhir, if only time was not so much of the essence!)

This would be incredibly useful (provided the mly writer uses sane
names for the tokens, or ideally with some further cooperation from
the lexer).
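
A minimal sketch of what such a report could look like, assuming the
parser exposes its action table as a function from (state, token) to
an action. The token type, the two-state table and the describe
function are invented for the example and do not correspond to any
real ocamlyacc or Menhir interface.

```ocaml
(* Hypothetical token type; a real one would be generated from the .mly file. *)
type token = INT | PLUS | TIMES | LPAREN | RPAREN | EOF

let all_tokens = [INT; PLUS; TIMES; LPAREN; RPAREN; EOF]

(* Readable names for tokens: the "cooperation" between grammar and lexer. *)
let describe = function
  | INT -> "an integer literal"
  | PLUS -> "'+'"
  | TIMES -> "'*'"
  | LPAREN -> "'('"
  | RPAREN -> "')'"
  | EOF -> "end of input"

type action = Shift of int | Reduce of int | Accept | Error

(* Made-up action table with two states, standing in for the real automaton. *)
let action state tok =
  match state, tok with
  | 0, (INT | LPAREN) -> Shift 1
  | 1, (PLUS | TIMES) -> Shift 0
  | 1, RPAREN -> Reduce 1
  | 1, EOF -> Accept
  | _ -> Error

(* Tokens that would *not* have caused an error in the given state. *)
let expected state =
  List.filter (fun tok -> action state tok <> Error) all_tokens

let error_message state =
  Printf.sprintf "syntax error: expected %s"
    (String.concat " or " (List.map describe (expected state)))
```

With this toy table, error_message 0 yields "syntax error: expected an
integer literal or '('". The quality of the message depends entirely
on describe, which is where sane token names matter.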

Finally, a remark on the parse errors returned by the OCaml compiler.
Like many of us, I find them very frustrating. However, the fault does
not lie only in the parsing technology. The OCaml grammar is much too
ambiguous for its own good (no difference between toplevel lets and
inner ones, no delimiters for ifs and matches, etc.). As a result,
the compiler often reports the error far from where the mistake
actually is. Camlp4 explains what syntactic entity it expected when
it finds a parse error, but this only works if the error is detected
at the right place :-(
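
As a small illustration of the missing delimiters (a made-up example,
not taken from the compiler), the function below parses without
complaint although its indentation suggests a different structure:
the else attaches to the nearest if, not to the outer one. Because
the parser silently accepts a reading like this, a genuine mistake in
similar code can easily surface much later than where it was made.

```ocaml
(* Hypothetical example: the layout suggests that "else" belongs to
   "if a", but the parser attaches it to "if b". *)
let classify a b =
  let result = ref 0 in
  if a then
    if b then result := 1
  else result := 2;
  !result
```

Here classify false false leaves result at 0, because the whole
inner if/else is skipped, and classify true false gives 2. An
explicit begin ... end or parentheses around the inner if would be
needed to get the structure the indentation suggests.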

(BTW: a link to a changelog on the homepage of Menhir would be great.
And on http://yquem.inria.fr/cgi-bin/mailman/listinfo/menhir-list, the
link to the archives is broken.)

Cheers,