caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Array.make exception and parser
@ 2011-01-04 13:41 Jean Krivine
  2011-01-04 14:56 ` Daniel Bünzli
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Jean Krivine @ 2011-01-04 13:41 UTC (permalink / raw)
  To: caml users

[-- Attachment #1: Type: text/plain, Size: 509 bytes --]

Hi all,

I am encountering a weird problem, I am trying to parse a very large file
(around 1.2 GB) according to a grammar defined in ocamlyacc.
During the parsing I get the exception

Invalid_argument "Array.make".

This  is strange because I am not using any array.
My guess it that a big chunk of the file I am parsing is matching a non
terminal, something like

rule:
non_term END {blah};

where non_term is  going to be 1GB of characters. Does anyone know what
could be raising the exception ?

Thanks!
J

[-- Attachment #2: Type: text/html, Size: 585 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 13:41 [Caml-list] Array.make exception and parser Jean Krivine
@ 2011-01-04 14:56 ` Daniel Bünzli
  2011-01-04 14:57   ` Daniel Bünzli
  2011-01-04 14:57 ` Török Edwin
  2011-01-05 13:37 ` Xavier Leroy
  2 siblings, 1 reply; 26+ messages in thread
From: Daniel Bünzli @ 2011-01-04 14:56 UTC (permalink / raw)
  To: Jean Krivine; +Cc: caml users

Compile and link in debug mode (-g flag, or debug tag with
ocamlbuild). Run your program with :

> export OCAMLRUNPARAM='b'  myprogram.native

You should get a backtrace to the offender. If the trace is not
precise enough compile in bytecode.

Best,

Daniel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 13:41 [Caml-list] Array.make exception and parser Jean Krivine
  2011-01-04 14:56 ` Daniel Bünzli
@ 2011-01-04 14:57 ` Török Edwin
  2011-01-04 15:14   ` Jean Krivine
  2011-01-05 13:37 ` Xavier Leroy
  2 siblings, 1 reply; 26+ messages in thread
From: Török Edwin @ 2011-01-04 14:57 UTC (permalink / raw)
  To: caml-list

On 2011-01-04 15:41, Jean Krivine wrote:
> Hi all,
> 
> I am encountering a weird problem, I am trying to parse a very large file
> (around 1.2 GB) according to a grammar defined in ocamlyacc.

On a 32-bit or 64-bit architecture?

> During the parsing I get the exception
> 
> Invalid_argument "Array.make".
> 
> This  is strange because I am not using any array.

You can try calling 'Printexc.record_backtrace true' on startup, and
compile with -g. If you compile to bytecode the stacktrace will usually
be better, but that is not always accurate either.

Another way is to run your bytecode in ocamldebug, and use 'backstep'
from the point where exception is raised.

> My guess it that a big chunk of the file I am parsing is matching a non
> terminal, something like
> 
> rule:
> non_term END {blah};
> 
> where non_term is  going to be 1GB of characters. Does anyone know what
> could be raising the exception ?

Trying to allocate a too big array:

#ifdef ARCH_SIXTYFOUR
#define Max_wosize (((intnat)1 << 54) - 1)
#else
#define Max_wosize ((1 << 22) - 1)
#endif
    if (wsize > Max_wosize) caml_invalid_argument("Array.make");
> 
> Thanks!
> J
> 


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 14:56 ` Daniel Bünzli
@ 2011-01-04 14:57   ` Daniel Bünzli
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Bünzli @ 2011-01-04 14:57 UTC (permalink / raw)
  To: Jean Krivine; +Cc: caml users

> export OCAMLRUNPARAM='b'  myprogram.native

export OCAMLRUNPARAM='b';  myprogram.native

Sorry,

Daniel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 14:57 ` Török Edwin
@ 2011-01-04 15:14   ` Jean Krivine
  2011-01-04 15:31     ` Daniel Bünzli
                       ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Jean Krivine @ 2011-01-04 15:14 UTC (permalink / raw)
  To: Török Edwin; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1878 bytes --]

Thanks, it seems the exception is raised by the function grow_stacks() in
the 'Parsing' module of Ocaml (not mine).
Indeed, looking at the code I can see several calls to Array.create that are
not encapsulated in a try .. with. Isn't that a bug ?

Anyway I think I found the answer of my problem.

Cheers!

2011/1/4 Török Edwin <edwintorok@gmail.com>

> On 2011-01-04 15:41, Jean Krivine wrote:
> > Hi all,
> >
> > I am encountering a weird problem, I am trying to parse a very large file
> > (around 1.2 GB) according to a grammar defined in ocamlyacc.
>
> On a 32-bit or 64-bit architecture?
>
> > During the parsing I get the exception
> >
> > Invalid_argument "Array.make".
> >
> > This  is strange because I am not using any array.
>
> You can try calling 'Printexc.record_backtrace true' on startup, and
> compile with -g. If you compile to bytecode the stacktrace will usually
> be better, but that is not always accurate either.
>
> Another way is to run your bytecode in ocamldebug, and use 'backstep'
> from the point where exception is raised.
>
> > My guess it that a big chunk of the file I am parsing is matching a non
> > terminal, something like
> >
> > rule:
> > non_term END {blah};
> >
> > where non_term is  going to be 1GB of characters. Does anyone know what
> > could be raising the exception ?
>
> Trying to allocate a too big array:
>
> #ifdef ARCH_SIXTYFOUR
> #define Max_wosize (((intnat)1 << 54) - 1)
> #else
> #define Max_wosize ((1 << 22) - 1)
> #endif
>    if (wsize > Max_wosize) caml_invalid_argument("Array.make");
> >
> > Thanks!
> > J
> >
>
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>

[-- Attachment #2: Type: text/html, Size: 2698 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 15:14   ` Jean Krivine
@ 2011-01-04 15:31     ` Daniel Bünzli
  2011-01-04 16:22       ` Yitzhak Mandelbaum
  2011-01-04 17:04       ` Jean Krivine
  2011-01-04 15:38     ` bluestorm
       [not found]     ` <1259991756.440008.1294155536392.JavaMail.root@zmbs2.inria.fr>
  2 siblings, 2 replies; 26+ messages in thread
From: Daniel Bünzli @ 2011-01-04 15:31 UTC (permalink / raw)
  To: Jean Krivine; +Cc: caml-list

> I can see several calls to Array.create that are
> not encapsulated in a try .. with. Isn't that a bug ?

That's not the bug. The Invalid_argument exception is never supposed
to be caught, it denotes a programming error from the client of the
module.

The bug lies in calling Array.make with a value larger than
Sys.max_array_length.

Best,

Daniel

P.S.

I don't know if that's your case but so many languages to parse are
LL(k) for some k. I don't really understand why people insist on using
yacc like parser generators where a recursive descent parser with some
combinators fits perfectly, seems nearly as efficient and allow you to
give more precise syntax errors to your users.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 15:14   ` Jean Krivine
  2011-01-04 15:31     ` Daniel Bünzli
@ 2011-01-04 15:38     ` bluestorm
  2011-01-04 17:43       ` Jean Krivine
       [not found]       ` <1125074892.441923.1294163043602.JavaMail.root@zmbs2.inria.fr>
       [not found]     ` <1259991756.440008.1294155536392.JavaMail.root@zmbs2.inria.fr>
  2 siblings, 2 replies; 26+ messages in thread
From: bluestorm @ 2011-01-04 15:38 UTC (permalink / raw)
  To: Jean Krivine; +Cc: Török Edwin, caml-list

[-- Attachment #1: Type: text/plain, Size: 912 bytes --]

Indeed, Ocaml arrays are limited in length, and the parser generator may use
arrays internally that would run into the limit for specific grammars (I
don't think the memory used is linear in the size of the input in the common
use cases).

You may be interested in trying to use menhir [1] as a Parser generator
instead of ocamlyacc. Menhir is mostly compatible with ocamlyacc, but
doesn't use the Parsing module. While I don't think it as done anything
specific to support larger input files, the issue may go away (or don't
appear on the input sizes you need) using the different menhir
implementation.

 [1] http://gallium.inria.fr/~fpottier/menhir/

Of course, patching ocamlyacc (or any other generator) to fix this issue
would be the best way to handle this. But still, switching to a different
but 90% compatible software may be a least-effort solution for you --
provided it doesn't have the same issue.

[-- Attachment #2: Type: text/html, Size: 1218 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 15:31     ` Daniel Bünzli
@ 2011-01-04 16:22       ` Yitzhak Mandelbaum
  2011-01-04 16:42         ` Daniel Bünzli
  2011-01-04 17:04       ` Jean Krivine
  1 sibling, 1 reply; 26+ messages in thread
From: Yitzhak Mandelbaum @ 2011-01-04 16:22 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: Jean Krivine, caml-list

For one perspective on this issue, you might want to take a look here:

http://lambda-the-ultimate.org/node/4150

A short answer (among many) to your particular question: it often requires a lot off effort just to discover whether " a recursive descent parser with some
combinators fits perfectly," and many, many grammars do not fit perfectly.


On Jan 4, 2011, at 10:31 AM, Daniel Bünzli wrote:

> P.S.
> 
> I don't know if that's your case but so many languages to parse are
> LL(k) for some k. I don't really understand why people insist on using
> yacc like parser generators where a recursive descent parser with some
> combinators fits perfectly, seems nearly as efficient and allow you to
> give more precise syntax errors to your users.

-----------------------------
Yitzhak Mandelbaum





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 16:22       ` Yitzhak Mandelbaum
@ 2011-01-04 16:42         ` Daniel Bünzli
  2011-01-04 17:03           ` Yitzhak Mandelbaum
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel Bünzli @ 2011-01-04 16:42 UTC (permalink / raw)
  To: Yitzhak Mandelbaum; +Cc: Jean Krivine, caml-list

> A short answer (among many) to your particular question: it often requires a lot off effort just to discover whether " a recursive descent parser with some
> combinators fits perfectly,"

Well I don't know what "a lot of effort" means to you. But IIRC my
compiler course it's a matter of taking your EBNF grammar, eliminating
left recursion and left factoring the grammar. If you can't do that
with your grammar then you cannot parse it with LL(k).

Best,

Daniel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 16:42         ` Daniel Bünzli
@ 2011-01-04 17:03           ` Yitzhak Mandelbaum
  0 siblings, 0 replies; 26+ messages in thread
From: Yitzhak Mandelbaum @ 2011-01-04 17:03 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: Caml-list List

Daniel,

You're right about those transformations. Yet, when your grammar has hundreds of nonterminals, most or all with semantic actions attached, the transformations you describe can be quite tedious and error-prone.  Tools can certainly help here, but that would require people being familiar with them.  Its usually easier just to take Yacc for a spin than to go that effort. 

I'm not trying to advocate for one approach over another. I was just trying to answer your question about why people do the things they do.

Also, if you have any further questions, may I suggest we take it off the caml-list? I think we've wandered off-topic here. :-)

-- Yitzhak


On Jan 4, 2011, at 11:42 AM, Daniel Bünzli wrote:

>> A short answer (among many) to your particular question: it often requires a lot off effort just to discover whether " a recursive descent parser with some
>> combinators fits perfectly,"
> 
> Well I don't know what "a lot of effort" means to you. But IIRC my
> compiler course it's a matter of taking your EBNF grammar, eliminating
> left recursion and left factoring the grammar. If you can't do that
> with your grammar then you cannot parse it with LL(k).
> 
> Best,
> 
> Daniel
> 
> -- 
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 

-----------------------------
Yitzhak Mandelbaum





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 15:31     ` Daniel Bünzli
  2011-01-04 16:22       ` Yitzhak Mandelbaum
@ 2011-01-04 17:04       ` Jean Krivine
  2011-01-04 17:22         ` Daniel Bünzli
  1 sibling, 1 reply; 26+ messages in thread
From: Jean Krivine @ 2011-01-04 17:04 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 609 bytes --]

On Tue, Jan 4, 2011 at 4:31 PM, Daniel Bünzli
<daniel.buenzli@erratique.ch>wrote:


> That's not the bug. The Invalid_argument exception is never supposed
> to be caught, it denotes a programming error from the client of the
> module.
>
> The bug lies in calling Array.make with a value larger than
> Sys.max_array_length.
>
>
Agree, but here I am never calling Array.make so ocaml is "internally"
calling array.make with a wrong argument. So the bug is not really mine. An
easy "patch" would be to define a Parsing.Overflow exception to throw
whenever the function grow_stack() fails...

J

[-- Attachment #2: Type: text/html, Size: 922 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 17:04       ` Jean Krivine
@ 2011-01-04 17:22         ` Daniel Bünzli
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Bünzli @ 2011-01-04 17:22 UTC (permalink / raw)
  To: Jean Krivine; +Cc: caml-list

> So the bug is not really mine.

Of course it's not.

> An easy "patch" would be to define a Parsing.Overflow exception to throw
> whenever the function grow_stack() fails...

If you mean catching Invalid_argument and raising Overflow then no.
What needs to be done is to raise Overflow when you are about to call
Array.make with a a value greater than Sys.max_array_length.

Again, Invalid_argument exceptions are programming errors (here a
programming error of the Parsing module).

Code catching Invalid_argument to implement an algorithm is unsound.
The only place where you should catch Invalid_argument is at the
toplevel of your program to report the stack trace. One obvious way of
understanding that is to see what is raised on out of bounds access
and consider the -unsafe compiler option.

Daniel

P.S. Btw if you want that to be fixed you should file a bug report.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 15:38     ` bluestorm
@ 2011-01-04 17:43       ` Jean Krivine
       [not found]       ` <1125074892.441923.1294163043602.JavaMail.root@zmbs2.inria.fr>
  1 sibling, 0 replies; 26+ messages in thread
From: Jean Krivine @ 2011-01-04 17:43 UTC (permalink / raw)
  To: bluestorm; +Cc: Török Edwin, caml-list

[-- Attachment #1: Type: text/plain, Size: 1482 bytes --]

I just tried menhir. Looks great ! Problem is it generates an invalid
myParser.ml file:
it contains a integer

_menhir_shifted = 4611686018427387903;

which way beyond int32 integers. I guess it comes from my grammar file which
is certainly very bad, but in any case I should receive some insult from
menhir first no ?
I guess this error is linked to the Array.make exception that I get when I
use ocamlyacc!

J

On Tue, Jan 4, 2011 at 4:38 PM, bluestorm <bluestorm.dylc@gmail.com> wrote:

> Indeed, Ocaml arrays are limited in length, and the parser generator may
> use arrays internally that would run into the limit for specific grammars (I
> don't think the memory used is linear in the size of the input in the common
> use cases).
>
> You may be interested in trying to use menhir [1] as a Parser generator
> instead of ocamlyacc. Menhir is mostly compatible with ocamlyacc, but
> doesn't use the Parsing module. While I don't think it as done anything
> specific to support larger input files, the issue may go away (or don't
> appear on the input sizes you need) using the different menhir
> implementation.
>
>  [1] http://gallium.inria.fr/~fpottier/menhir/<http://gallium.inria.fr/%7Efpottier/menhir/>
>
> Of course, patching ocamlyacc (or any other generator) to fix this issue
> would be the best way to handle this. But still, switching to a different
> but 90% compatible software may be a least-effort solution for you --
> provided it doesn't have the same issue.
>

[-- Attachment #2: Type: text/html, Size: 1900 bytes --]

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
       [not found]     ` <1259991756.440008.1294155536392.JavaMail.root@zmbs2.inria.fr>
@ 2011-01-04 17:45       ` Francois Pottier
  2011-01-04 19:30         ` Daniel Bünzli
       [not found]         ` <1263353434.442766.1294169448342.JavaMail.root@zmbs2.inria.fr>
  0 siblings, 2 replies; 26+ messages in thread
From: Francois Pottier @ 2011-01-04 17:45 UTC (permalink / raw)
  To: bluestorm; +Cc: Jean Krivine, Török Edwin, caml-list


Hello all,

On Tue, Jan 04, 2011 at 04:38:56PM +0100, bluestorm wrote:
> You may be interested in trying to use menhir [1] as a Parser generator
> instead of ocamlyacc. Menhir is mostly compatible with ocamlyacc, but
> doesn't use the Parsing module. While I don't think it as done anything
> specific to support larger input files, the issue may go away (or don't
> appear on the input sizes you need) using the different menhir
> implementation.
> 
>  [1] http://gallium.inria.fr/~fpottier/menhir/

Interesting suggestion, indeed. Menhir does not use an ocaml array to
implement the parser's stack; instead, it uses a heap-allocated linked
list. It might scale up better than the ocamlyacc-generated parser
(although of course it might just fill up the heap; the linked list
approach probably consumes more space than the array-based approach
by a constant factor).

I would be curious to hear whether this works.

Earlier, Daniel Bünzli wrote:
> I don't really understand why people insist on using
> yacc like parser generators where a recursive descent parser
> with some combinators ...

For what it's worth, here is my answer: an LR parser generator (like Menhir)
accepts a larger class of grammars without refactoring (no need to eliminate
left recursion, identify common left factors, etc.) and is also able to
*explain* why the grammar is ambiguous (or rather, why it lies outside the LR
class), which a combinator-based approach cannot do.

-- 
François Pottier
Francois.Pottier@inria.fr
http://gallium.inria.fr/~fpottier/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
       [not found]       ` <1125074892.441923.1294163043602.JavaMail.root@zmbs2.inria.fr>
@ 2011-01-04 17:53         ` Francois Pottier
  0 siblings, 0 replies; 26+ messages in thread
From: Francois Pottier @ 2011-01-04 17:53 UTC (permalink / raw)
  To: Jean Krivine; +Cc: bluestorm, Török Edwin, caml-list

On Tue, Jan 04, 2011 at 06:44:03PM +0100, Jean Krivine wrote:
> I just tried menhir. Looks great ! Problem is it generates an invalid
> myParser.ml file:
> it contains a integer
> 
> _menhir_shifted = 4611686018427387903;
> 
> which way beyond int32 integers. I guess it comes from my grammar file which
> is certainly very bad, but in any case I should receive some insult from
> menhir first no ?

Weird. Could you send a bug report that includes your grammar (if possible)
as well as the generated file and send it either to me or (even better) to
menhir-list@yquem.inria.fr?

In the meantime, you may also try with the --table option (this uses a
different code generation scheme).

-- 
François Pottier
Francois.Pottier@inria.fr
http://gallium.inria.fr/~fpottier/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 17:45       ` Francois Pottier
@ 2011-01-04 19:30         ` Daniel Bünzli
  2011-01-04 19:52           ` Yitzhak Mandelbaum
       [not found]         ` <1263353434.442766.1294169448342.JavaMail.root@zmbs2.inria.fr>
  1 sibling, 1 reply; 26+ messages in thread
From: Daniel Bünzli @ 2011-01-04 19:30 UTC (permalink / raw)
  To: Francois.Pottier; +Cc: caml-list

> For what it's worth, here is my answer: an LR parser generator (like Menhir)
> accepts a larger class of grammars without refactoring (no need to eliminate
> left recursion, identify common left factors, etc.) and is also able to
> *explain* why the grammar is ambiguous (or rather, why it lies outside the LR
> class), which a combinator-based approach cannot do.

So everybody trades developer convenience for end user convenience --
I mean, the syntax errors produced by LR parsers are just terrible,
the ocaml compiler is here to witness that.

That's depressing.

Daniel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 19:30         ` Daniel Bünzli
@ 2011-01-04 19:52           ` Yitzhak Mandelbaum
  2011-01-04 20:36             ` Daniel Bünzli
  0 siblings, 1 reply; 26+ messages in thread
From: Yitzhak Mandelbaum @ 2011-01-04 19:52 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: Caml-list List

Daniel,

I'm not sure you want to bring the ocaml compiler as proof-case for your argument. Its parser is generated by ocamlyacc, which is a relatively ancient tool (the codebase is a modified version of the Berkeley Yacc, written in C. I'm guessing that the large majority of that code is well over 20 years old at this point). There are more modern LR parser generators which do a far better job and there's no theoretical reason they can't do just as well as LL tools.

Also, you might want to keep in mind that this is largely a zero-sum game. If the developers spend more time on the parser, that's less time on the rest of the compiler or new research. So, saving developer time does, in fact, mean *more* user convenience -- just in a different part of the compiler.

Yitzhak

On Jan 4, 2011, at 2:30 PM, Daniel Bünzli wrote:

>> For what it's worth, here is my answer: an LR parser generator (like Menhir)
>> accepts a larger class of grammars without refactoring (no need to eliminate
>> left recursion, identify common left factors, etc.) and is also able to
>> *explain* why the grammar is ambiguous (or rather, why it lies outside the LR
>> class), which a combinator-based approach cannot do.
> 
> So everybody trades developer convenience for end user convenience --
> I mean, the syntax errors produced by LR parsers are just terrible,
> the ocaml compiler is here to witness that.
> 
> That's depressing.
> 
> Daniel
> 
> -- 
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 

-----------------------------
Yitzhak Mandelbaum





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
       [not found]         ` <1263353434.442766.1294169448342.JavaMail.root@zmbs2.inria.fr>
@ 2011-01-04 20:31           ` Francois Pottier
  2011-01-04 20:40             ` Lukasz Stafiniak
                               ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Francois Pottier @ 2011-01-04 20:31 UTC (permalink / raw)
  To: Daniel Bünzli; +Cc: caml-list

On Tue, Jan 04, 2011 at 08:30:48PM +0100, Daniel Bünzli wrote:
> So everybody trades developer convenience for end user convenience --
> I mean, the syntax errors produced by LR parsers are just terrible,

It is true that ocamlyacc (and Menhir) offer essentially no support for
explaining parse errors. (The "error" pseudo-token, inherited from yacc, is
supposed to help, but in my opinion its use pollutes the grammar and makes it
uncontrollable.) Nevertheless, as underlined by Yitzhak, I don't think there
is a deep reason why LR parsers must be bad at explaining errors. In
principle, upon detecting an error, an LR parser could easily dump the stack,
which corresponds to the sentence (composed of terminal and non-terminal
symbols) that has been recognized so far. It could also display the set of
look-ahead tokens that would *not* have caused an error in the current
state. (Come to think of it, this is a feature that I would like to add to
Menhir, if only time was not so much of the essence!)

-- 
François Pottier
Francois.Pottier@inria.fr
http://gallium.inria.fr/~fpottier/

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 19:52           ` Yitzhak Mandelbaum
@ 2011-01-04 20:36             ` Daniel Bünzli
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel Bünzli @ 2011-01-04 20:36 UTC (permalink / raw)
  To: Yitzhak Mandelbaum; +Cc: Caml-list List

> There are more modern LR parser generators which do a far better job and there's no theoretical reason they can't do just as well as LL tools.

I'm not talking about theory. Practically I have never seen an LR
parser throw me good error messages. Besides ocaml users
(understandably) do use ocamlyacc since it comes with the distribution
and is needed by the ocaml compiler.

> Also, you might want to keep in mind that this is largely a zero-sum game. If the developers spend more time on the parser, that's less time on the rest of the compiler or new research. So, saving developer time does, in fact, mean *more* user convenience -- just in a different part of the compiler.

I dont buy this. Loosing the time of your users by producing crappy
error messages is never a user convenience. And for a programming
language like ocaml this also has the bad side effect to make the
beginner's learning curve much steeper, which doesn't benefit ocaml
developers as a whole.

I think you overestimate the time it takes to code a recursive descent
parser. The needed infrastructure is nothing more than a programming
pattern. After that your code pretty much follows your (LL factored)
grammar.

Best,

Daniel


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 20:31           ` Francois Pottier
@ 2011-01-04 20:40             ` Lukasz Stafiniak
  2011-01-04 21:03               ` Török Edwin
  2011-01-05 14:12             ` Boris Yakobowski
  2011-01-05 21:12             ` Boris Yakobowski
  2 siblings, 1 reply; 26+ messages in thread
From: Lukasz Stafiniak @ 2011-01-04 20:40 UTC (permalink / raw)
  To: Francois.Pottier; +Cc: Daniel Bünzli, caml-list

On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier
<Francois.Pottier@inria.fr> wrote:
> On Tue, Jan 04, 2011 at 08:30:48PM +0100, Daniel Bünzli wrote:
>> So everybody trades developer convenience for end user convenience --
>> I mean, the syntax errors produced by LR parsers are just terrible,
>
> It is true that ocamlyacc (and Menhir) offer essentially no support for
> explaining parse errors.

But Menhir's support for grammar debugging is very nice and already a
strong enough argument to switch over from ocamlyacc!

Happy New Year.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 20:40             ` Lukasz Stafiniak
@ 2011-01-04 21:03               ` Török Edwin
  2011-01-05  3:24                 ` Yitzhak Mandelbaum
  0 siblings, 1 reply; 26+ messages in thread
From: Török Edwin @ 2011-01-04 21:03 UTC (permalink / raw)
  To: caml-list

On 2011-01-04 22:40, Lukasz Stafiniak wrote:
> On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier
> <Francois.Pottier@inria.fr> wrote:
>> On Tue, Jan 04, 2011 at 08:30:48PM +0100, Daniel Bünzli wrote:
>>> So everybody trades developer convenience for end user convenience --
>>> I mean, the syntax errors produced by LR parsers are just terrible,
>>
>> It is true that ocamlyacc (and Menhir) offer essentially no support for
>> explaining parse errors.
> 
> But Menhir's support for grammar debugging is very nice and already a
> strong enough argument to switch over from ocamlyacc!
> 

How about camlp4, what kind of parser is it? LL/LR?
I found its error messages more helpful than ocamlc's (for example just
running my source code through camlp4o), though it could be better
(it could say ; expected insteam of [semi]).

Best regards,
--Edwin

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 21:03               ` Török Edwin
@ 2011-01-05  3:24                 ` Yitzhak Mandelbaum
  0 siblings, 0 replies; 26+ messages in thread
From: Yitzhak Mandelbaum @ 2011-01-05  3:24 UTC (permalink / raw)
  To: Török Edwin; +Cc: caml-list

Good question -- its languages fall into neither of LL(k) or LR(k). It uses a form of priority parsing, with one token lookahead and a few grammar transformations (assuming camlp4 shares the same parsing technology as camlp5, and nothing has changed since I last took a look at the docs and source code).

In more detail:

Its basic operation is to try each branch in a series of branches until one is found which successfully parses beyond one token. So, it uses only 1 token of lookahead, and it decides which branch to take at the beginning of set of branches. These characteristics make it similar to LL(1). However, it does not ensure that branches do not conflict. So, if two branches share the same token (see more on this below), then the second one will never be taken. IIRC, the parsec parser combinator library has similar semantics for its standard branch combinator.

Furthermore, it performs a number of transformations: First, it does a (shallow) left factoring. So, if a set of branches are defined, it will left factor them without looking into any of the nonterminals. Second, it supports a certain amount of left-recursion through dynamic tracking of the current nonterminal.

--Yitzhak

On Jan 4, 2011, at 4:03 PM, Török Edwin wrote:

> On 2011-01-04 22:40, Lukasz Stafiniak wrote:
>> On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier
>> <Francois.Pottier@inria.fr> wrote:
>>> On Tue, Jan 04, 2011 at 08:30:48PM +0100, Daniel Bünzli wrote:
>>>> So everybody trades developer convenience for end user convenience --
>>>> I mean, the syntax errors produced by LR parsers are just terrible,
>>> 
>>> It is true that ocamlyacc (and Menhir) offer essentially no support for
>>> explaining parse errors.
>> 
>> But Menhir's support for grammar debugging is very nice and already a
>> strong enough argument to switch over from ocamlyacc!
>> 
> 
> How about camlp4, what kind of parser is it? LL/LR?
> I found its error messages more helpful than ocamlc's (for example just
> running my source code through camlp4o), though it could be better
> (it could say ; expected insteam of [semi]).
> 
> Best regards,
> --Edwin
> 
> -- 
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 

-----------------------------
Yitzhak Mandelbaum





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 13:41 [Caml-list] Array.make exception and parser Jean Krivine
  2011-01-04 14:56 ` Daniel Bünzli
  2011-01-04 14:57 ` Török Edwin
@ 2011-01-05 13:37 ` Xavier Leroy
  2011-01-05 13:46   ` Jean Krivine
  2 siblings, 1 reply; 26+ messages in thread
From: Xavier Leroy @ 2011-01-05 13:37 UTC (permalink / raw)
  To: caml-list

On 01/04/2011 02:41 PM, Jean Krivine wrote:

> I am encountering a weird problem, I am trying to parse a very large
> file (around 1.2 GB) according to a grammar defined in ocamlyacc.
> During the parsing I get the exception 
> Invalid_argument "Array.make".
> This  is strange because I am not using any array.

As others said, you're just overflowing the stack of the parse engine.

As no one else said already, watch out for left recusion vs. right
recursion in your grammar, esp. the rules matching long sequences.  For more
details, see e.g.

http://tldp.org/HOWTO/Lex-YACC-HOWTO-6.html

section 6.2, or many compiler textbooks.

- Xavier Leroy

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-05 13:37 ` Xavier Leroy
@ 2011-01-05 13:46   ` Jean Krivine
  0 siblings, 0 replies; 26+ messages in thread
From: Jean Krivine @ 2011-01-05 13:46 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

Good point. I think the fundamental reason is my poor grammar. But I
guess this should deserve a better warning that an Invalid_argument in
my face :)

Thanks!

On Wed, Jan 5, 2011 at 2:37 PM, Xavier Leroy <Xavier.Leroy@inria.fr> wrote:
> On 01/04/2011 02:41 PM, Jean Krivine wrote:
>
>> I am encountering a weird problem, I am trying to parse a very large
>> file (around 1.2 GB) according to a grammar defined in ocamlyacc.
>> During the parsing I get the exception
>> Invalid_argument "Array.make".
>> This  is strange because I am not using any array.
>
> As others said, you're just overflowing the stack of the parse engine.
>
> As no one else said already, watch out for left recusion vs. right
> recursion in your grammar, esp. the rules matching long sequences.  For more
> details, see e.g.
>
> http://tldp.org/HOWTO/Lex-YACC-HOWTO-6.html
>
> section 6.2, or many compiler textbooks.
>
> - Xavier Leroy
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa-roc.inria.fr/wws/info/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 20:31           ` Francois Pottier
  2011-01-04 20:40             ` Lukasz Stafiniak
@ 2011-01-05 14:12             ` Boris Yakobowski
  2011-01-05 21:12             ` Boris Yakobowski
  2 siblings, 0 replies; 26+ messages in thread
From: Boris Yakobowski @ 2011-01-05 14:12 UTC (permalink / raw)
  To: The Caml Mailing List

On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier
<Francois.Pottier@inria.fr> wrote:
> It is true that ocamlyacc (and Menhir) offer essentially no support for
> explaining parse errors. (The "error" pseudo-token, inherited from yacc, is
> supposed to help, but in my opinion its use pollutes the grammar and makes it
> uncontrollable.) Nevertheless, as underlined by Yitzhak, I don't think there
> is a deep reason why LR parsers must be bad at explaining errors. In
> principle, upon detecting an error, an LR parser could easily dump the stack,
> which corresponds to the sentence (composed of terminal and non-terminal
> symbols) that has been recognized so far.

I think the stack would be useless for the user: too long, and
impossible to understand without the grammar. It would be barely
better for the writer of the grammar, as he would need to recognize
the parsing state to produce an intelligible error report. I think the
error token is a good idea, that just went too far. Its ability to
shift and reduce allows writing parsers that recovers from syntax
errors, but we hardly do that nowadays. Instead, using the error token
causes bogus shift/reduce conflicts...

What I propose is the following: still use the error token, but do not
allow reduction. Instead, only allow productions that return
exceptions when they contain the error token. This way, the parse
errors are caught inside the grammar, as they should, but do not
pollute the parsing itself.

> It could also display the set of
> look-ahead tokens that would *not* have caused an error in the current
> state. (Come to think of it, this is a feature that I would like to add to
> Menhir, if only time was not so much of the essence!)

This would be incredibly useful (provided the mly writer uses sane
names for its tokens, or ideally with some further cooperation from
the lexer).

Finally, a remark on the parse errors returned by the Ocaml compiler.
As many of us, I find them very frustrating. However, the fault does
not lie only in the parsing technology. The Ocaml grammar is much too
ambiguous for its own good (no difference between toplevel lets and
inner ones, no delimiters for ifs and matchs, etc...). As a result,
the compiler often reports the error much too far. Camlp4 explains
what syntactic entity it expected when it finds a parse error, but
this only works if the error is detected at the right place. Language
writers would be well inspired to keep that in mind when they design
their language.

(BTW: a link to a changelog on the homepage of Menhir would be great.
And on http://yquem.inria.fr/cgi-bin/mailman/listinfo/menhir-list, the
link to the archives is broken.)

Cheers,

-- 
Boris

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [Caml-list] Array.make exception and parser
  2011-01-04 20:31           ` Francois Pottier
  2011-01-04 20:40             ` Lukasz Stafiniak
  2011-01-05 14:12             ` Boris Yakobowski
@ 2011-01-05 21:12             ` Boris Yakobowski
  2 siblings, 0 replies; 26+ messages in thread
From: Boris Yakobowski @ 2011-01-05 21:12 UTC (permalink / raw)
  To: The Caml Mailing List

On Tue, Jan 4, 2011 at 9:31 PM, Francois Pottier
<Francois.Pottier@inria.fr> wrote:
> It is true that ocamlyacc (and Menhir) offer essentially no support for
> explaining parse errors. (The "error" pseudo-token, inherited from yacc, is
> supposed to help, but in my opinion its use pollutes the grammar and makes it
> uncontrollable.) Nevertheless, as underlined by Yitzhak, I don't think there
> is a deep reason why LR parsers must be bad at explaining errors. In
> principle, upon detecting an error, an LR parser could easily dump the stack,
> which corresponds to the sentence (composed of terminal and non-terminal
> symbols) that has been recognized so far.

I think the stack would be useless for the user: too long, and
impossible to understand without the grammar. It would be barely
better for the writer of the grammar, as he would need to recognize
the parsing state to produce an intelligible error report. I think the
error token is a good idea, that just went too far. Its ability to
shift and reduce allows writing parsers that recovers from syntax
errors, but we hardly do that nowadays. Instead, using the error token
causes bogus shift/reduce conflicts...

What I propose is the following: still use the error token, but do not
allow reduction. Instead, only allow productions that return
exceptions when they contain the error token. This way, the parse
errors are caught inside the grammar, as they should, but do not
pollute the parsing itself.

> It could also display the set of
> look-ahead tokens that would *not* have caused an error in the current
> state. (Come to think of it, this is a feature that I would like to add to
> Menhir, if only time was not so much of the essence!)

This would be incredibly useful (provided the mly writer uses sane
names for its tokens, or ideally with some further cooperation from
the lexer).

Finally, a remark on the parse errors returned by the Ocaml compiler.
As many of us, I find them very frustrating. However, the fault does
not lie only in the parsing technology. The Ocaml grammar is much too
ambiguous for its own good (no difference between toplevel lets and
inner ones, no delimiters for ifs and matchs, etc...). As a result,
the compiler often reports the error too far. Camlp4 explains
what syntactic entity it expected when it finds a parse error, but
this only works if the error is detected at the right place  :-(

(BTW: a link to a changelog on the homepage of Menhir would be great.
And on http://yquem.inria.fr/cgi-bin/mailman/listinfo/menhir-list, the
link to the archives is broken.)

Cheers,

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2011-01-05 21:13 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-04 13:41 [Caml-list] Array.make exception and parser Jean Krivine
2011-01-04 14:56 ` Daniel Bünzli
2011-01-04 14:57   ` Daniel Bünzli
2011-01-04 14:57 ` Török Edwin
2011-01-04 15:14   ` Jean Krivine
2011-01-04 15:31     ` Daniel Bünzli
2011-01-04 16:22       ` Yitzhak Mandelbaum
2011-01-04 16:42         ` Daniel Bünzli
2011-01-04 17:03           ` Yitzhak Mandelbaum
2011-01-04 17:04       ` Jean Krivine
2011-01-04 17:22         ` Daniel Bünzli
2011-01-04 15:38     ` bluestorm
2011-01-04 17:43       ` Jean Krivine
     [not found]       ` <1125074892.441923.1294163043602.JavaMail.root@zmbs2.inria.fr>
2011-01-04 17:53         ` Francois Pottier
     [not found]     ` <1259991756.440008.1294155536392.JavaMail.root@zmbs2.inria.fr>
2011-01-04 17:45       ` Francois Pottier
2011-01-04 19:30         ` Daniel Bünzli
2011-01-04 19:52           ` Yitzhak Mandelbaum
2011-01-04 20:36             ` Daniel Bünzli
     [not found]         ` <1263353434.442766.1294169448342.JavaMail.root@zmbs2.inria.fr>
2011-01-04 20:31           ` Francois Pottier
2011-01-04 20:40             ` Lukasz Stafiniak
2011-01-04 21:03               ` Török Edwin
2011-01-05  3:24                 ` Yitzhak Mandelbaum
2011-01-05 14:12             ` Boris Yakobowski
2011-01-05 21:12             ` Boris Yakobowski
2011-01-05 13:37 ` Xavier Leroy
2011-01-05 13:46   ` Jean Krivine

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).