From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p05ECknt016741
	for <caml-list@sympa-roc.inria.fr>; Wed, 5 Jan 2011 15:12:46 +0100
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AsoAAHYLJE3RVdg2kGdsb2JhbACkHwgVAQEBAQkJExEEIKdIjBCEYoZYAQEDBYVHBIsKiUM
X-IronPort-AV: E=Sophos;i="4.60,278,1291590000"; 
   d="scan'208";a="72141672"
Received: from mail-qw0-f54.google.com ([209.85.216.54])
  by mail3-smtp-sop.national.inria.fr with ESMTP; 05 Jan 2011 15:12:40 +0100
Received: by qwj9 with SMTP id 9so15836013qwj.27
        for <caml-list@inria.fr>; Wed, 05 Jan 2011 06:12:40 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=domainkey-signature:received:mime-version:sender:reply-to:received
         :in-reply-to:references:from:date:x-google-sender-auth:message-id
         :subject:to:content-type;
        bh=nCASYMwo+Jx6kAxc/f8lJbOxjENFN64uJ0xROw82HWo=;
        b=MWZtTUErG2+z3kKiRv8V9fyP4IWkSY0VkJtnRvbq0hT1D3Khs4LzjJPwMWrccsBW/C
         69XOty6dheKd/M6zU0WUbKROyLewxz2GEcmsdnG2UauKGNqtJNSxKvIsJjnvX6+VcdWF
         t2ZOL0143y1C16k/foRDiUfM4OAkg+qpJkUuc=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:sender:reply-to:in-reply-to:references:from:date
         :x-google-sender-auth:message-id:subject:to:content-type;
        b=Ujf89CMfcJaOpYCVYSMqMHY5IfWjjhRKNhGIDHdHQusntHgMRvx6gk3TX8jG1xsjA4
         +VlZatkGkmRSpCmudpxV4aLkbWWh/oIyCKWB2wuRNwDyZYcvx2Z8YfeVsdch8EQzRbwR
         Gylx5bUrPFtuiSQMvgcwKZtBfXn7yIXIy8B7Q=
Received: by 10.224.11.18 with SMTP id r18mr21651480qar.226.1294236759933;
 Wed, 05 Jan 2011 06:12:39 -0800 (PST)
MIME-Version: 1.0
Sender: cppcrazy@gmail.com
Reply-To: boris@yakobowski.org
Received: by 10.220.188.139 with HTTP; Wed, 5 Jan 2011 06:12:19 -0800 (PST)
In-Reply-To: <20110104203152.GA3828@yquem.inria.fr>
References: <AANLkTinYBmZV94CpJsAN4sg94UYJayxrMZyW+LPm46HW@mail.gmail.com>
 <4D23353C.8020803@gmail.com> <AANLkTimtwMLh=54+Pe5413o=gxU16P=6d+cU0cBKmfqP@mail.gmail.com>
 <1259991756.440008.1294155536392.JavaMail.root@zmbs2.inria.fr>
 <20110104174545.GA1535@yquem.inria.fr> <1263353434.442766.1294169448342.JavaMail.root@zmbs2.inria.fr>
 <20110104203152.GA3828@yquem.inria.fr>
From: Boris Yakobowski <boris@yakobowski.org>
Date: Wed, 5 Jan 2011 15:12:19 +0100
X-Google-Sender-Auth: 11biyAzstOIPEmUQYI4RguOinMo
Message-ID: <AANLkTi=h371vF5ZCys7-nTGfnmAUnfn0=JuXurA78w4A@mail.gmail.com>
To: The Caml Mailing List <caml-list@inria.fr>
Content-Type: text/plain; charset=ISO-8859-1
Subject: Re: [Caml-list] Array.make exception and parser

On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier
<Francois.Pottier@inria.fr> wrote:
> It is true that ocamlyacc (and Menhir) offer essentially no support for
> explaining parse errors. (The "error" pseudo-token, inherited from yacc, is
> supposed to help, but in my opinion its use pollutes the grammar and makes it
> uncontrollable.) Nevertheless, as underlined by Yitzhak, I don't think there
> is a deep reason why LR parsers must be bad at explaining errors. In
> principle, upon detecting an error, an LR parser could easily dump the stack,
> which corresponds to the sentence (composed of terminal and non-terminal
> symbols) that has been recognized so far.

I think the stack would be useless for the user: too long, and
impossible to understand without the grammar. It would be barely
better for the writer of the grammar, who would need to recognize
the parsing state to produce an intelligible error report. I think the
error token is a good idea that just went too far. Its ability to
shift and reduce allows writing parsers that recover from syntax
errors, but we hardly do that nowadays. Instead, using the error token
causes bogus shift/reduce conflicts...

What I propose is the following: still use the error token, but do not
allow reduction. Instead, allow a production to contain the error
token only if its semantic action raises an exception. This way, parse
errors are caught inside the grammar, as they should be, but do not
pollute the parsing itself.
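
To make the restriction concrete, here is a minimal sketch in plain
OCaml of the check a parser generator could perform on each
production. Everything in it (the types, well_formed, rejected, the
Syntax_error exception named in the comment) is hypothetical and made
up for illustration; it does not reflect how ocamlyacc or Menhir are
actually implemented.

```ocaml
(* In an .mly file, the kind of rule I have in mind would look roughly like:
     expr: LPAREN expr error { raise (Syntax_error "unclosed parenthesis") }
   where Syntax_error is a hypothetical user-defined exception. *)

(* Hypothetical model of a grammar, for illustration only. *)
type symbol =
  | Terminal of string
  | Nonterminal of string
  | Error                    (* the yacc-style error pseudo-token *)

type action =
  | Build of string          (* ordinary semantic action, kept abstract *)
  | Raise of string          (* action that only raises, with this message *)

type production = { lhs : string; rhs : symbol list; action : action }

let mentions_error p = List.mem Error p.rhs

(* The proposed restriction: a production may mention [Error] only if
   its action raises, so it can never reduce to an ordinary value. *)
let well_formed p =
  match p.action with
  | Raise _ -> true
  | Build _ -> not (mentions_error p)

(* Productions that the generator would refuse. *)
let rejected grammar = List.filter (fun p -> not (well_formed p)) grammar

let grammar = [
  { lhs = "expr";
    rhs = [Terminal "LPAREN"; Nonterminal "expr"; Terminal "RPAREN"];
    action = Build "$2" };
  { lhs = "expr";
    rhs = [Terminal "LPAREN"; Nonterminal "expr"; Error];
    action = Raise "unclosed parenthesis" };
  (* Classic yacc-style recovery rule: refused under the proposal. *)
  { lhs = "stmt";
    rhs = [Error; Terminal "SEMI"];
    action = Build "Skip" };
]
```

With this sample grammar, only the stmt rule is rejected: it mentions
the error token but still builds a value, which is exactly the
recovery-style use that I would like to forbid.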

> It could also display the set of
> look-ahead tokens that would *not* have caused an error in the current
> state. (Come to think of it, this is a feature that I would like to add to
> Menhir, if only time was not so much of the essence!)

This would be incredibly useful (provided the author of the .mly file
uses sane names for the tokens, or ideally with some further
cooperation from the lexer).
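
As a rough sketch of what such a report could look like, assume the
generator hands us the token that was found and the list of tokens
that would have been accepted in the current state. The token type,
describe and expected_message below are all hypothetical names; none
of this is an existing ocamlyacc or Menhir API.

```ocaml
(* Hypothetical token type; a real one would be generated from the .mly file. *)
type token = INT of int | PLUS | TIMES | LPAREN | RPAREN | EOF

(* Human-readable names for tokens: this is the "cooperation from the
   lexer" part, since only the lexer knows the concrete syntax. *)
let describe = function
  | INT _ -> "an integer literal"
  | PLUS -> "'+'"
  | TIMES -> "'*'"
  | LPAREN -> "'('"
  | RPAREN -> "')'"
  | EOF -> "end of input"

(* [acceptable] is assumed to be supplied by the parser generator: the
   look-ahead tokens that would not have caused an error in the state
   where parsing stopped. *)
let expected_message ~found ~acceptable =
  let names = List.map describe acceptable in
  Printf.sprintf "syntax error: found %s, expected %s"
    (describe found) (String.concat " or " names)
```

For instance, expected_message ~found:RPAREN ~acceptable:[INT 0; LPAREN]
yields "syntax error: found ')', expected an integer literal or '('".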

Finally, a remark on the parse errors returned by the OCaml compiler.
Like many of us, I find them very frustrating. However, the fault does
not lie only in the parsing technology. The OCaml grammar is much too
ambiguous for its own good (no difference between toplevel lets and
inner ones, no delimiters for ifs and matches, etc.). As a result,
the compiler often reports the error much too far from its actual
cause. Camlp4 explains what syntactic entity it expected when it finds
a parse error, but this only works if the error is detected at the
right place. Language designers would do well to keep that in mind
when they design their language.

(BTW: a link to a changelog on the homepage of Menhir would be great.
And on http://yquem.inria.fr/cgi-bin/mailman/listinfo/menhir-list, the
link to the archives is broken.)

Cheers,

-- 
Boris