Re: [Caml-list] Ocamlyacc vs stream parser

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* Re: [Caml-list] Ocamlyacc vs stream parser
@ 2003-06-05 14:02 Diego Olivier Fernandez Pons
  2003-06-10  9:03 ` Diego Olivier Fernandez Pons
  0 siblings, 1 reply; 12+ messages in thread
From: Diego Olivier Fernandez Pons @ 2003-06-05 14:02 UTC (permalink / raw)
  To: caml-list

    Bonjour,

Michal Moskal a écrit :
> Oh, so by proof-by-book you're right :-) But in practice LARL(1)
> seems more usefull for parsing, at parsing least programming
> languages.

It is easier to be right when one uses up-to-date books. Even if it is
an excellent book, the Dragon is almost 20 years old, which in
computer science is quite a lot.

Anyway, that is not the main point.

The question is 'are LR parsers really more powerfull/difficult to
implement/ difficult to hack/difficult to understand... than LL
parsers' ? And why ?

There are three fundamental parts in a parser :
- the control automaton
- the desambiguisation technique
- the non-determinism handling system

LL parsers are traditionaly compiled to push-down automata whereas LR
parsers usualy use LR(0) automata. In the first case the stack is
explicit and transitions have the form (q, E, t, E', q') which means q
-> q' is possible if (E, t) is the current stack * string, and if the
transition is taken, (E, t) will be removed and (E', epsilon) pushed.
In the second case the stack is implicit and contains the list of the
previously visited nodes. When a transition (q, t, q') is taken, q is
pushed in the stack. When a reduction takes place A -> abc, 3 symbols
of the stack are popped.

This may seem a big difference, but both stack-automata are
equivalent. You can transform one into the other and then write a LL
parser with a LR(0) control automata, and conversely a pda LR parser.
In fact, the Dragon book (chapter 4.4 of the french edition at least)
shows a LL parser controled by a LR(0) automaton. Wim Pijls wrote in
1993 a paper titled 'Unifying LL and LR parsing' in which he shows
that 'traversing the LR(0) automaton in a special way is equivalent to
LL(1) parsing'. (More recent papers 'LR, LC and LL parsing, some new
points of view'. All papers available on citeseer)

Then, LR and LL have the same (theoretical) power, which is the
ability of parsing algebraic (ie context-free) grammars.

Of course... but conflicts ?

When you clearly separate the control automaton and the
desambiguisation mechanism, you notice that all usual algorithms
(LL(k), LR(k), LALR(k), SLR(k)) use the same principle : approximating
algebraic languages by rational ones.

Let's take an LL conflict example :

sentence to parse 'abaa'

S -> aX | aY | cZ
X -> b...
Y -> c...
Z -> d...

Which one to chose, X or Y ?

For an LR algorithm,

a shift/reduce conflict

X -> a
Y -> aa

a reduce/reduce conflict

X -> a
Y -> a

Every non-terminal X in a grammar generates a context-free language
L(X) which can be approximated (upper bound) by a rational language
R(X). The idea is that checking if a sentence can be derivated from a
rational language is linear while it is cubic for the algebraic case.

(LL example)

L(X) is included in aE*
L(Y) is included in bE*

When your entery is ab... if it can be derived by the grammar, the
only way is to be derived by X (S -> aX -> ab...)

Both LL(k) and LR(k) use a k-prefix rational approximation technique.
That is why their tables occupate so much space (the cartesian product
of the control automaton and a k-trie). LALR and SLR follow the same
idea but merge some states according to some given rules.

Then, there is no reason for LR, LALR and SLR to be more conceptually
difficult than LL.

Notice that TAG (Tree Adjoint Grammars) and HPSG (Head driven Phrase
Structure Grammar) try to compute more discriminating approximations
using the properties of some particular terminals (named 'foot' in the
first, 'head' in the latter).

The difference of power can be explained by a few simple arguments :
- LL(1) should be compared to LR(0), not to LR(1)
- top-down algorithms elagate, bottom-up algorithms memorize. But the
approximation is essentially an elagation technique. Then, LR take
more advantage of this orthogonality. But the same power could be
achieved for LL parsers if they were added memorization (LL + CPS,
...).

Finally, the non-determinism handling question... There are some
conflicts you just cannot solve (otherwhise you would have proven
rational = algebraic). There are two techniques : breadth first search
(GLR) and depth-first search (backtracking). Both require memorization
not to compute several times the same thing.

Of course, Masaru Tomita named in 1986 his parsing technique (using
ideas of Bernard Lang in 1974, theirselves taked from Jan Earley 1970)
General LR. But it could have been General LL too, since graph
traversals are generic algorithms.

Brian Rogoff a écrit :
> I happen to think that recursive descent is the best way to write
> parsers, but note that recursive descent parsers are capable of
> parsing non-LL(1) grammars, even without the fairly obvious hacks.

Michal Moskal a écrit :
> But keyword parser build with camlp4 libraries can be modified at
> runtime

The ease of implementation is another classical discussion. The
parsing algorithm (ascending or descending) is orthogonal to its
recursive functions/table implementation or the static/dynamic data
structures used.

Most of the people think 'huge tables' as soon as they hear LR(1).
But this is only the case if you make the two following design choices
- collapsing the control automaton and the desambiguisation automata
- representing graphs by tables

You can of course implement a LR(0) automaton with recursive functions
(recursive ascent), using function calls and exceptions (or any
equivalent technique...). LR(0) has in general a quite moderated
number of states.

You can represent all your graphs (stack or desambiguisation automata)
with purely functional data structures, leading to a parser which can
be modified at runtime.

More over, separating correctly the three parts of your parsers allow
you to use different representation/optimisation techniques for every
element :  non-deterministic bit atomata for desambiguisation automata
of less than 31 states, matrix representation for dense graphs, ...
instead of one monolithic 'table compression' technique. And all this
algorithms can be hidden behind a library, not to be seen by the
low-level students, casual users, common programmers...

Last point, as said by Luc Maranget, exhaustiveness can be computed on
pda or lr(0) automata. Not having anything to do with conflict
resolution, there is no need to work on the desambiguisation automata.

        Diego Olivier

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] Ocamlyacc vs stream parser
  2003-06-05 14:02 [Caml-list] Ocamlyacc vs stream parser Diego Olivier Fernandez Pons
@ 2003-06-10  9:03 ` Diego Olivier Fernandez Pons
  0 siblings, 0 replies; 12+ messages in thread
From: Diego Olivier Fernandez Pons @ 2003-06-10  9:03 UTC (permalink / raw)
  To: caml-list

    Bonjour,

Michal Moskal a écrit :

> Sorry, I thought camlp4 recognizes LL(1) languages, and my dragon
> book copy states that LR(1) > LL(1) (I'm not sure about LARL(1)
> though.

Well... I must admit remembering that LL(1) < LALR(1) was almost true.
Which means that I knew from the beginning you weren't so far :-)

I found in comp.compilers an example by Terence Parr of a LL(1)
grammar which is not LALR(1) - I have not checked it ! -

http://compilers.iecc.com/comparch/article/93-09-025

He says 'there is at least one LL(1) grammar which is not LALR(1)'

Same comment in the Errata page of Andrew Appel's book, edition of
1997 (the figure of that edition was incorrect)

http://www.cs.princeton.edu/~appel/modern/basic/ml/errata.html

'Figure 3.26 incorrectly shows LL(1) as a subset of SLR. In fact,
LL(1) is not even a subset of LALR(1): there is an LL(1) grammar that
is not LALR(1)'

And a lot of people state that LL(1) < LALR(1) is 'essentially' true.

Wim Pijls writes in his paper 'unifying LL and LR' that LL(1) is a
subclass of LALR(1) but he seems to have removed the problematic
cases. (Once more, I have not checked the proofs since it wasn't the
point I was interested in when reading that paper).

        Diego Olivier

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] Ocamlyacc vs stream parser
  2003-05-27 21:12   ` Pierre Weis
  2003-05-28  9:20     ` Michal Moskal
@ 2003-05-28 20:34     ` Alain.Frisch
  1 sibling, 0 replies; 12+ messages in thread
From: Alain.Frisch @ 2003-05-28 20:34 UTC (permalink / raw)
  To: Pierre Weis; +Cc: Caml list

On Tue, 27 May 2003, Pierre Weis wrote:

> [...]
> > ocamlyacc gives you better error reporting (about conflicts) and
> > strictly bigger set of languages it can recognize.
>
> Interesting. Could you please, give usa proof of this fact (which
> is new to me BTW) ?

AFAIK, ocamlyacc gives some information about conflicts, and Camlp4
doesn't at all, right ?

-- Alain

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] Ocamlyacc vs stream parser
  2003-05-28 10:24         ` Michal Moskal
@ 2003-05-28 15:45           ` brogoff
  0 siblings, 0 replies; 12+ messages in thread
From: brogoff @ 2003-05-28 15:45 UTC (permalink / raw)
  To: Michal Moskal; +Cc: Diego Olivier Fernandez Pons, caml-list

On Wed, 28 May 2003, Michal Moskal wrote:
> On Wed, May 28, 2003 at 11:37:03AM +0200, Diego Olivier Fernandez Pons wrote:
> >     Bonjour,
> > 
> > > Sorry, I thought camlp4 recognizes LL(1) languages, and my dragon book
> > > copy states that LR(1) > LL(1) (I'm not sure about LARL(1) though).
> > 
> > My Appel states that LL(1) is not included in LALR(1).
> 
> Oh, so by proof-by-book you're right :-) 

That's pretty funny, but those Romans did have this case covered. I believe 
the Latin expression is "argumentum ad verecundiam". 

> But in practice LARL(1) seems
> more usefull for parsing, at parsing least programming languages.

This is esoteric flame bait for the parsing crowd. I happen to think that 
recursive descent is the best way to write parsers, but note that recursive 
descent parsers are capable of parsing non-LL(1) grammars, even without the 
fairly obvious hacks. As a real world proof by example, consider that Ada is 
often cited as a language not amenable to RDP (Fraser&Hanson : "For example, 
C is in the class of languages that can be recognized by recursive descent 
parsers, but other languages, like ADA (sic), are not."), yet somehow the 
authors of GNAT were ignorant enough to make it work. 

In the real world, you'll need to be proficient with YACC and RDP, of course. 

-- Brian

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] Ocamlyacc vs stream parser
  2003-05-23  9:26 ` Michal Moskal
  2003-05-26 12:28   ` Damien Doligez
  2003-05-27 21:12   ` Pierre Weis
@ 2003-05-28 10:48   ` Anton Moscal
  2 siblings, 0 replies; 12+ messages in thread
From: Anton Moscal @ 2003-05-28 10:48 UTC (permalink / raw)
  To: Michal Moskal; +Cc: caml-list

Michal Moskal wrote:

>>One question:
>>  Why should I use parser keyword instead of ocamlyacc?
> 
> 
> ocamlyacc gives you better error reporting (about conflicts) and
> strictly bigger set of languages it can recognize. But keyword parser
> build with camlp4 libraries can be modified at runtime. It also provides
> some higher-level constructs like LIST0.
> 
There is another useful feature of camlp4 - you can manually write 
parsing function which can perform  lookahead to any number of tokens 
and use criteria of arbitrary complexity. This provides simply and 
efficient workaround for resolving ambiguities.

(or use some external information to implement context dependecies - for 
example - I can easily write rule type_identifier, which can succeed 
only if current lexem is name which was declared before as type name) -

Example: "guard" rule, which don't eat any tokens from input strean and 
succeed iff next two tokens are:
"[ <int number>"
or "[]"
or "[..."



let is_bound =
   Grammar.Entry.of_parser gram "[is_bound]"
     (fun strm ->
       match Stream.npeek 2 strm with
       | ("", "[")::("INT", _)::_
       |	("", "[")::("", "]")::_
       |	("", "[")::("", "...")::_ -> ()
       | _ -> raise Stream.Failure
     )

usage:

....
  | LEFTA [ t=SELF; is_bound; "["; bounds=LIST0 bound SEP ","; "]"
         -> Type.Array (t, bounds)
....


Anton Moscal

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] Ocamlyacc vs stream parser
  2003-05-28  9:37       ` Diego Olivier Fernandez Pons
@ 2003-05-28 10:24         ` Michal Moskal
  2003-05-28 15:45           ` brogoff
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Moskal @ 2003-05-28 10:24 UTC (permalink / raw)
  To: Diego Olivier Fernandez Pons; +Cc: caml-list

On Wed, May 28, 2003 at 11:37:03AM +0200, Diego Olivier Fernandez Pons wrote:
>     Bonjour,
> 
> > Sorry, I thought camlp4 recognizes LL(1) languages, and my dragon book
> > copy states that LR(1) > LL(1) (I'm not sure about LARL(1) though).
> 
> My Appel states that LL(1) is not included in LALR(1).

Oh, so by proof-by-book you're right :-) But in practice LARL(1) seems
more usefull for parsing, at parsing least programming languages.

-- 
: Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv
: PLD Linux ::::::::: Wroclaw University, CS Dept : {E-,w}-- {b++,e}>+++ h

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] Ocamlyacc vs stream parser
  2003-05-28  9:20     ` Michal Moskal
@ 2003-05-28  9:37       ` Diego Olivier Fernandez Pons
  2003-05-28 10:24         ` Michal Moskal
  0 siblings, 1 reply; 12+ messages in thread
From: Diego Olivier Fernandez Pons @ 2003-05-28  9:37 UTC (permalink / raw)
  To: Michal Moskal; +Cc: caml-list

    Bonjour,

> Sorry, I thought camlp4 recognizes LL(1) languages, and my dragon book
> copy states that LR(1) > LL(1) (I'm not sure about LARL(1) though).

My Appel states that LL(1) is not included in LALR(1).

        Diego Olivier

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] Ocamlyacc vs stream parser
  2003-05-27 21:12   ` Pierre Weis
@ 2003-05-28  9:20     ` Michal Moskal
  2003-05-28  9:37       ` Diego Olivier Fernandez Pons
  2003-05-28 20:34     ` Alain.Frisch
  1 sibling, 1 reply; 12+ messages in thread
From: Michal Moskal @ 2003-05-28  9:20 UTC (permalink / raw)
  To: Pierre Weis; +Cc: ll189417, caml-list

On Tue, May 27, 2003 at 11:12:32PM +0200, Pierre Weis wrote:
> [...]
> > ocamlyacc gives you better error reporting (about conflicts) and
> > strictly bigger set of languages it can recognize.
> 
> Interesting. Could you please, give usa proof of this fact (which
> is new to me BTW) ?

Sorry, I thought camlp4 recognizes LL(1) languages, and my dragon book
copy states that LR(1) > LL(1) (I'm not sure about LARL(1) though). 

-- 
: Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv
: PLD Linux ::::::::: Wroclaw University, CS Dept : {E-,w}-- {b++,e}>+++ h

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] Ocamlyacc vs stream parser
  2003-05-23  9:26 ` Michal Moskal
  2003-05-26 12:28   ` Damien Doligez
@ 2003-05-27 21:12   ` Pierre Weis
  2003-05-28  9:20     ` Michal Moskal
  2003-05-28 20:34     ` Alain.Frisch
  2003-05-28 10:48   ` Anton Moscal
  2 siblings, 2 replies; 12+ messages in thread
From: Pierre Weis @ 2003-05-27 21:12 UTC (permalink / raw)
  To: Michal Moskal; +Cc: ll189417, caml-list

[...]
> ocamlyacc gives you better error reporting (about conflicts) and
> strictly bigger set of languages it can recognize.

Interesting. Could you please, give usa proof of this fact (which
is new to me BTW) ?

Thanks in advance,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] Ocamlyacc vs stream parser
  2003-05-23  9:26 ` Michal Moskal
@ 2003-05-26 12:28   ` Damien Doligez
  2003-05-27 21:12   ` Pierre Weis
  2003-05-28 10:48   ` Anton Moscal
  2 siblings, 0 replies; 12+ messages in thread
From: Damien Doligez @ 2003-05-26 12:28 UTC (permalink / raw)
  To: Michal Moskal; +Cc: Lukasz Lew, caml-list

On Friday, May 23, 2003, at 11:26 AM, Michal Moskal wrote:

> ocamlyacc gives you better error reporting (about conflicts) and
> strictly bigger set of languages it can recognize.

This is not true.  First, LALR(1) is not a superset of LL(1);
second, Camlp4 implements more than LL(1), for example with
parameterised parsers.

-- Damien

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Caml-list] Ocamlyacc vs stream parser
  2003-05-23  9:07 Lukasz Lew
@ 2003-05-23  9:26 ` Michal Moskal
  2003-05-26 12:28   ` Damien Doligez
                     ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Michal Moskal @ 2003-05-23  9:26 UTC (permalink / raw)
  To: Lukasz Lew; +Cc: caml-list

On Fri, May 23, 2003 at 11:07:27AM +0200, Lukasz Lew wrote:
> Hi
> One question:
>   Why should I use parser keyword instead of ocamlyacc?

ocamlyacc gives you better error reporting (about conflicts) and
strictly bigger set of languages it can recognize. But keyword parser
build with camlp4 libraries can be modified at runtime. It also provides
some higher-level constructs like LIST0.

-- 
: Michal Moskal :: http://www.kernel.pl/~malekith : GCS {C,UL}++++$ a? !tv
: PLD Linux ::::::::: Wroclaw University, CS Dept : {E-,w}-- {b++,e}>+++ h

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Caml-list] Ocamlyacc vs stream parser
@ 2003-05-23  9:07 Lukasz Lew
  2003-05-23  9:26 ` Michal Moskal
  0 siblings, 1 reply; 12+ messages in thread
From: Lukasz Lew @ 2003-05-23  9:07 UTC (permalink / raw)
  To: caml-list

Hi
One question:
  Why should I use parser keyword instead of ocamlyacc?

Regards
Lukasz Lew

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2003-06-10  9:03 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-06-05 14:02 [Caml-list] Ocamlyacc vs stream parser Diego Olivier Fernandez Pons
2003-06-10  9:03 ` Diego Olivier Fernandez Pons
  -- strict thread matches above, loose matches on Subject: below --
2003-05-23  9:07 Lukasz Lew
2003-05-23  9:26 ` Michal Moskal
2003-05-26 12:28   ` Damien Doligez
2003-05-27 21:12   ` Pierre Weis
2003-05-28  9:20     ` Michal Moskal
2003-05-28  9:37       ` Diego Olivier Fernandez Pons
2003-05-28 10:24         ` Michal Moskal
2003-05-28 15:45           ` brogoff
2003-05-28 20:34     ` Alain.Frisch
2003-05-28 10:48   ` Anton Moscal

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).