caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* yacc question
@ 2005-12-04 14:15 skaller
  2005-12-04 15:03 ` [Caml-list] " Robert W.
  2005-12-04 15:25 ` Brian Hurt
  0 siblings, 2 replies; 5+ messages in thread
From: skaller @ 2005-12-04 14:15 UTC (permalink / raw)
  To: caml-list

I have the 'usual' kind of parser for expressions, with two
nonterminals 'expr' and 'atom', the latter including ( expr ) 
and INTEGER of course.

When I have input like

1;
1 + 2 ;
(1 + 2) ;

none of the case parse as expressions, the first and
last do parse as atoms (leaving the ; in the buffer).

What I want is to parse the longest possible match.
The only way I can think of doing this is:

top_expr:
  | expr token_not_allowed_in_expressions

and then 'put back' the trailing token into the buffer.
Is there another way?

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] yacc question
  2005-12-04 14:15 yacc question skaller
@ 2005-12-04 15:03 ` Robert W.
  2005-12-04 16:46   ` skaller
  2005-12-04 15:25 ` Brian Hurt
  1 sibling, 1 reply; 5+ messages in thread
From: Robert W. @ 2005-12-04 15:03 UTC (permalink / raw)
  To: caml-list

On Mon, Dec 05, 2005 at 01:15:40AM +1100, skaller wrote:
> I have the 'usual' kind of parser for expressions, with two
> nonterminals 'expr' and 'atom', the latter including ( expr ) 
> and INTEGER of course.
> 
> When I have input like
> 
> 1;
> 1 + 2 ;
> (1 + 2) ;
> 
> none of the case parse as expressions, the first and
> last do parse as atoms (leaving the ; in the buffer).
> 
> What I want is to parse the longest possible match.
> The only way I can think of doing this is:
> 

ocamlyacc parses longest match by default.
Your rule for atoms seem to complex or you lexer isn't able to extract
the token for atoms correctly from the line.

>   | expr token_not_allowed_in_expressions
> 
> and then 'put back' the trailing token into the buffer.
> Is there another way?
> 
This shouldn't be necessary, normally you can redesign your parser
rules to avoid this.

-- 

	Robert...


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] yacc question
  2005-12-04 14:15 yacc question skaller
  2005-12-04 15:03 ` [Caml-list] " Robert W.
@ 2005-12-04 15:25 ` Brian Hurt
  2005-12-04 16:37   ` skaller
  1 sibling, 1 reply; 5+ messages in thread
From: Brian Hurt @ 2005-12-04 15:25 UTC (permalink / raw)
  To: skaller; +Cc: caml-list



On Mon, 5 Dec 2005, skaller wrote:

> I have the 'usual' kind of parser for expressions, with two
> nonterminals 'expr' and 'atom', the latter including ( expr )
> and INTEGER of course.
>
> When I have input like
>
> 1;
> 1 + 2 ;
> (1 + 2) ;
>
> none of the case parse as expressions, the first and
> last do parse as atoms (leaving the ; in the buffer).
>
> What I want is to parse the longest possible match.
> The only way I can think of doing this is:
>
> top_expr:
>  | expr token_not_allowed_in_expressions
>
> and then 'put back' the trailing token into the buffer.
> Is there another way?
>

The standard way to implement this is:

statement:
 	expression SEMICOLON

expression:
 	| mul_expression
 	| expression PLUS mul_expression
 	| expression MINUS mul_expression

mul_expression:
 	atom_expression
 	| mul_expression TIMES atom_expression
 	| mul_expression DIVIDE atom_expression
 	| mul_expression MODULO atom_expression

atom_expression:
 	ATOM
 	| OPEN_PAREN expression CLOSE_PAREN

Notice that precedence is taken care of immediately- 3 + 4 * 5 is parsed 
as 3 + (4 * 5).  Also notice that the semicolon is what terminates us- we 
keep collecting up an expression until we see a semicolon- only then do we 
complete the statement.

Hope this helps.

Brian


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] yacc question
  2005-12-04 15:25 ` Brian Hurt
@ 2005-12-04 16:37   ` skaller
  0 siblings, 0 replies; 5+ messages in thread
From: skaller @ 2005-12-04 16:37 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list

On Sun, 2005-12-04 at 09:25 -0600, Brian Hurt wrote:
> 
> On Mon, 5 Dec 2005, skaller wrote:
> 
> > I have the 'usual' kind of parser for expressions, with two
> > nonterminals 'expr' and 'atom', the latter including ( expr )
> > and INTEGER of course.
> >
> > When I have input like
> >
> > 1;
> > 1 + 2 ;
> > (1 + 2) ;
> >
> > none of the case parse as expressions, the first and
> > last do parse as atoms (leaving the ; in the buffer).
> >
> > What I want is to parse the longest possible match.
> > The only way I can think of doing this is:
> >
> > top_expr:
> >  | expr token_not_allowed_in_expressions
> >
> > and then 'put back' the trailing token into the buffer.
> > Is there another way?
> >
> 
> The standard way to implement this is:
> 
> statement:
>  	expression SEMICOLON
> 
> expression:
>  	| mul_expression
>  	| expression PLUS mul_expression
>  	| expression MINUS mul_expression
> 
> mul_expression:
>  	atom_expression
>  	| mul_expression TIMES atom_expression
>  	| mul_expression DIVIDE atom_expression
>  	| mul_expression MODULO atom_expression
> 
> atom_expression:
>  	ATOM
>  	| OPEN_PAREN expression CLOSE_PAREN
> 
> Notice that precedence is taken care of immediately- 3 + 4 * 5 is parsed 
> as 3 + (4 * 5).  Also notice that the semicolon is what terminates us- we 
> keep collecting up an expression until we see a semicolon- only then do we 
> complete the statement.

Expressions cannot be statements, but I have instead:

statement:
	NAME LEFT_ARROW expression SEMICOLON

for example, so basically your example holds with a modification.

The problem is I want to parse a *user defined statement*.
This will be done by a preprocessor definition, something like:

#keyword within
#statement select expr within statement ;

This makes 'select' and 'within' keywords, when ocamlyacc
sees the user defined statement keyword 'select', it looks
up a table and finds the grammar production. I then parse
by recursive descent -- backtracking is cool in an FPL,
my tokens are a list, so I can backtrack easily.

When interpreting the grammar, if I see the token 'expr'
then I parse an expression using the ocamlyacc entry point.
Similarly for statements -- in particular, since the 
user defined statements can be 'added' into the statement
grammar, recursion in the 'within' clause will also include
the new statements (the recursion is covariant?)

There is no problem with statements .. but for expressions,
I cannot just terminate on SEMICOLON -- I have to terminate
on many symbols, which can't be part of an expression --
and leave them in the token buffer too. In the example
grammar, 'within' terminates the expression.

The problem of course is that Ocamlyacc doesn't know about
the new grammar .. but I'm still using it to parse expressions.

Mixing LALR1 and an RD parser seems messy .. but I see
no other way to obtain high performance for 'the usual case'
whilst still allowing extensibility -- Ocaml doesn't come
with an LL parser generator.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] yacc question
  2005-12-04 15:03 ` [Caml-list] " Robert W.
@ 2005-12-04 16:46   ` skaller
  0 siblings, 0 replies; 5+ messages in thread
From: skaller @ 2005-12-04 16:46 UTC (permalink / raw)
  To: Robert W.; +Cc: caml-list

On Sun, 2005-12-04 at 16:03 +0100, Robert W. wrote:

> ocamlyacc parses longest match by default.
> Your rule for atoms seem to complex or you lexer isn't able to extract
> the token for atoms correctly from the line.

When I parse atoms only, it all works fine. The problem
is that if the user specifies a production like:

#statement select expr within statement ;

then if I parse 'expr' with 'atom' this will not be allowed:

select 1 + 2 within print x;

because '1 + 2' isn't an atom. The user would be forced to write:

select (1 + 2) within print x;

But if I parse with 'expr' instead of atom,
the parse fails when it hits the unknown symbol 'within'.

> >   | expr token_not_allowed_in_expressions
> > 
> > and then 'put back' the trailing token into the buffer.
> > Is there another way?
> > 
> This shouldn't be necessary, normally you can redesign your parser
> rules to avoid this.

The problem is I'm trying to add 'on the fly' user 
defined grammar productions -- so the 'grammar' is 
extensible. This will be done by recursive descent,
but hooked inside, and hooking, the existing ocamlyacc
grammar.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2005-12-04 16:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-12-04 14:15 yacc question skaller
2005-12-04 15:03 ` [Caml-list] " Robert W.
2005-12-04 16:46   ` skaller
2005-12-04 15:25 ` Brian Hurt
2005-12-04 16:37   ` skaller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).