using different lexers with one parser?

caml-list - the Caml user's mailing list

* using different lexers with one parser?
@ 2007-04-17 20:33 Oliver Bandel
  2007-04-18  7:38 ` [Caml-list] " Hendrik Tews
From: Oliver Bandel @ 2007-04-17 20:33 UTC
  To: caml-list

Hello,


Using several parsers with ocamlyacc
is possible, because ocamlyacc allows
more than one start symbol.

Using several lexers is possible with ocamllex,
because each lexer is a separate function.

Is it possible (without too much effort) to switch
the lexer during parsing (from within the parser)?

Or is it better to
  a) implement the calling of different lexers in the *.mll file
or
  b) use several entry points of the *.mly file and
     work on the data from outside the parser
     (I mean here: in the function that calls the
      yacc grammar's start rule)?
     So: does it make more sense to do a higher-level
     parse on the data that the *.mly parser gives back?

BTW: The (line-based) data I want to parse could otherwise
     be parsed relatively easily with regexps; maybe
     that is the better way?
     Or is there a way to express such context-dependent
     things in ocamllex/ocamlyacc syntax?
     (It seems to me that similar problems arise when parsing
      files like picture files or movie files (mpg or so).)

TIA,
   Oliver


* Re: [Caml-list] using different lexers with one parser?
  2007-04-17 20:33 using different lexers with one parser? Oliver Bandel
@ 2007-04-18  7:38 ` Hendrik Tews
  2007-04-18 10:01   ` Oliver Bandel
From: Hendrik Tews @ 2007-04-18  7:38 UTC
  To: caml-list

Oliver Bandel <oliver@first.in-berlin.de> writes:

   Is it possible (without too much effort) to switch
   the lexer during parsing (from within the parser)?

Yes, see
http://caml.inria.fr/pub/ml-archives/caml-list/2003/09/3e7f3495840e2bc851b91c3dba8abab9.en.html

The main problem is the lookahead token.

Bye,

Hendrik


* Re: [Caml-list] using different lexers with one parser?
  2007-04-18  7:38 ` [Caml-list] " Hendrik Tews
@ 2007-04-18 10:01   ` Oliver Bandel
  2007-04-18 14:21     ` skaller
From: Oliver Bandel @ 2007-04-18 10:01 UTC
  To: caml-list

On Wed, Apr 18, 2007 at 09:38:20AM +0200, Hendrik Tews wrote:
> Oliver Bandel <oliver@first.in-berlin.de> writes:
> 
>    Is it possible (without too much effort) to switch
>    the lexer during parsing (from within the parser)?
> 
> Yes, see
> http://caml.inria.fr/pub/ml-archives/caml-list/2003/09/3e7f3495840e2bc851b91c3dba8abab9.en.html
> 
[...]

OK; I hoped to find a solution without the global-switching-var hacks;
something that is built into ocamlyacc.

Maybe for my purposes it's better to write my own
parser and call the ocamllex-generated functions directly.


> The main problem is the lookahead token.

OK; then I'd better hack my own parser.
As the file format is not too hard,
this should be the better way.

Ciao,
   Oliver


* Re: [Caml-list] using different lexers with one parser?
  2007-04-18 10:01   ` Oliver Bandel
@ 2007-04-18 14:21     ` skaller
  2007-04-19  7:42       ` Hendrik Tews
From: skaller @ 2007-04-18 14:21 UTC
  To: Oliver Bandel; +Cc: caml-list

On Wed, 2007-04-18 at 12:01 +0200, Oliver Bandel wrote:
> On Wed, Apr 18, 2007 at 09:38:20AM +0200, Hendrik Tews wrote:
> > Oliver Bandel <oliver@first.in-berlin.de> writes:
> > 
> >    Is it possible (without too much effort) to switch
> >    the lexer during parsing (from within the parser)?
> > 
> > Yes, see
> > http://caml.inria.fr/pub/ml-archives/caml-list/2003/09/3e7f3495840e2bc851b91c3dba8abab9.en.html
> > 
> [...]
> 
> OK; I hoped to find a solution without the global-switching-var hacks;
> something that is built in in ocamlyacc.

There is a solution without global variables.

Referring to Hendrik's example, write:

rule lexer global_var =
   parse
     | ""          { match !global_var with
                     | Xlex -> xlex global_var lexbuf
                     | Ylex -> ylex global_var lexbuf
                   }

instead. Now, you must change at least some --
but I recommend all -- of your tokens to include
the global_var, for example

%token <bool ref * int> TOKEN

and xlex global_var =
  parse
    | ... { TOKEN (global_var,42) }

Now in any ocamlyacc action code you can write this

  | TOKEN ... { let global_var, attrib = $1 in ... }

to get the variable into the action code, and now you
can change it.
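
The idea above can be tried out without ocamllex or ocamlyacc. What
follows is only a minimal sketch in plain OCaml, under these assumptions:
the names mode, token, next_token, action and run are hypothetical, xlex
and ylex are trivial stand-ins for two generated lexer rules, and the
"parser" is a loop that reads no lookahead token. It shows only how a
mode reference carried inside each token lets action code switch the
lexer for the next call.

```ocaml
(* Hypothetical sketch: the lexer-selection state travels inside the tokens. *)
type mode = Xlex | Ylex

(* Every token carries the mode reference so that action code can reach it. *)
type token =
  | WORD of mode ref * string
  | NUM of mode ref * int
  | EOF

(* Stand-ins for two ocamllex rules; each turns one input item into a token. *)
let xlex mode item = WORD (mode, item)
let ylex mode item = NUM (mode, int_of_string item)

(* Dispatcher: consults the mode on every call, like the "lexer" rule above. *)
let next_token mode input =
  match !input with
  | [] -> EOF
  | item :: rest ->
    input := rest;
    (match !mode with
     | Xlex -> xlex mode item
     | Ylex -> ylex mode item)

(* Stand-in for parser action code: it gets the reference out of the token
   and changes it, which selects the lexer used for the next token. *)
let action tok =
  match tok with
  | WORD (mode, "numbers") -> mode := Ylex
  | NUM (mode, 0) -> mode := Xlex
  | _ -> ()

(* Driver without lookahead: lex one token, run its action, repeat. *)
let run items =
  let mode = ref Xlex and input = ref items in
  let rec loop acc =
    match next_token mode input with
    | EOF -> List.rev acc
    | tok -> action tok; loop (tok :: acc)
  in
  loop []

let describe = function
  | WORD (_, s) -> "WORD " ^ s
  | NUM (_, n) -> "NUM " ^ string_of_int n
  | EOF -> "EOF"
```

Because mode is created inside run and not at top level, two runs can
proceed independently, which is the re-entrancy a global variable
does not give you.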

This leaves the problem of the lookahead token, which
may or may not exist. You can predict whether it exists
for each reduction. You can do this 'mentally' by asking
"Does this production have a definite set of terminators,
(no lookahead) or does it rely on bumping 
into something unrecognizable (lookahead)"?

[If you can't tell .. you must refactor your grammar
so you can]

If the lookahead exists, you must 'put it back' into the
lexbuf. At present you can do this by studying the implementation
details in the Lexing module. The lexeme is certain to be IN the
buffer, so it is safe to reset the pointers to the lexbuf state
before it was lexed.
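
A rough sketch of such a 'put back', assuming the mutable fields
lex_start_pos, lex_curr_pos, lex_start_p and lex_curr_p of Lexing.lexbuf
behave as in the current standard library. The names next_word, push_back
and demo are hypothetical, and next_word is a toy hand-written scanner
standing in for an ocamllex rule so that the example runs on its own. It
is only meant for buffers made with Lexing.from_string, where the whole
input is already in the buffer.

```ocaml
(* Toy scanner standing in for an ocamllex rule: reads one space-delimited
   word and records its extent in lex_start_pos / lex_curr_pos.
   Returns None at end of input. *)
let next_word (lexbuf : Lexing.lexbuf) : string option =
  let open Lexing in
  let len = lexbuf.lex_buffer_len in
  let pos = ref lexbuf.lex_curr_pos in
  (* skip leading blanks *)
  while !pos < len && Bytes.get lexbuf.lex_buffer !pos = ' ' do incr pos done;
  lexbuf.lex_start_pos <- !pos;
  (* consume the word itself *)
  while !pos < len && Bytes.get lexbuf.lex_buffer !pos <> ' ' do incr pos done;
  lexbuf.lex_curr_pos <- !pos;
  if lexbuf.lex_start_pos = !pos then None else Some (Lexing.lexeme lexbuf)

(* Un-read the most recent lexeme: move the current pointers back to where
   that lexeme started. Relies on the lexeme still being in the buffer. *)
let push_back (lexbuf : Lexing.lexbuf) : unit =
  let open Lexing in
  lexbuf.lex_curr_pos <- lexbuf.lex_start_pos;
  lexbuf.lex_curr_p <- lexbuf.lex_start_p

let demo () =
  let lexbuf = Lexing.from_string "alpha beta" in
  let first = next_word lexbuf in      (* the token being reduced *)
  let lookahead = next_word lexbuf in  (* read too early by the parser *)
  push_back lexbuf;                    (* put the lookahead back *)
  let again = next_word lexbuf in      (* a different lexer could run here *)
  (first, lookahead, again)
```

Here demo () yields (Some "alpha", Some "beta", Some "beta"): the second
word is lexed twice, the second time after the push-back.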

To get at the lexbuf from the parser .. you have to modify
your tokens AGAIN to include the lexbuf.

I recommend considering using an OCaml class to encapsulate
all the state information you need. Passing state from the
lexer to the parser this way is ugly but it does work
and is fully re-entrant.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

* Re: [Caml-list] using different lexers with one parser?
  2007-04-18 14:21     ` skaller
@ 2007-04-19  7:42       ` Hendrik Tews
From: Hendrik Tews @ 2007-04-19  7:42 UTC
  To: caml-list

skaller <skaller@users.sourceforge.net> writes:

   This leaves the problem of the lookahead token, which
   may or may not exist. You can predict whether it exists
   for each reduction. You can do this 'mentally' by asking
   "Does this production have a definite set of terminators,
   (no lookahead) or does it rely on bumping 
   into something unrecognizable (lookahead)"?

I believe that in the grammar.output file you can pretty easily
see which rules require a lookahead token. A state with two
reduce rules, or with one shift and one reduce rule, obviously needs a
lookahead token. You would have to check that those reductions
that change the lexer are done in a state with precisely one
reduction. Of course you would have to do that every time you
change the grammar...

Bye,

Hendrik
