[Caml-list] On ocamlyacc and ocamllex

caml-list - the Caml user's mailing list

* [Caml-list] On ocamlyacc and ocamllex
@ 2001-09-22 21:09 Vesa Karvonen
  2001-09-23  1:10 ` Christian Lindig
  2001-09-24  1:05 ` Christian RINDERKNECHT
  0 siblings, 2 replies; 12+ messages in thread
From: Vesa Karvonen @ 2001-09-22 21:09 UTC (permalink / raw)
  To: caml-list

Hi,

Ocamlyacc with the standard Parsing module seems to generate non-reentrant
parsers, because the env is stored as a global variable in the Parsing module.
(BTW: Having a reentrant Parsing module would also eliminate the need for the
clear_parser function.)

Ocamllex with the Lexing module generates a lexer that seems to be difficult
to extend without using global variables. By extending the lexer I mean the
following: I would like the lexer to record the positions of line breaks so
that I could directly give line number and column information in error
messages. In effect, I would like to augment the lexbuf with additional
fields and write new functions that manipulate those fields.

Has someone written extensible and reentrant lexing and parsing modules for
Ocaml?

Is someone interested in such lexer and parser modules?

I was thinking about writing replacements for the Lexing and Parsing modules
that would use the (lex_)engine and parse_engine external functions. The
replacements would effectively be parameterized layers on top of some of the
facilities in Lexing and Parsing modules. In addition I also thought about
writing a program that modifies the files generated by ocamllex and ocamlyacc
so that they can be used with the replacement Lexing and Parsing modules.

Are there any hidden problems with the above approach?

Alternatively, the tools and modules of the Ocaml distribution could be
modified, but this isn't necessarily an easy route.

As my project schedule is tight and I may be able to live with the global
variables, I don't necessarily want to spend time on this, but if there are
others interested in such modules, I might put in some extra time to implement
them.

...

Another issue with ocamllex and ocamlyacc (and lex/flex and yacc/bison) is
that the dependencies between the generated lexer and parser are not quite
optimal. Currently the generated lexer is dependent on the parser, because the
parser generates the token type. This means that each time the grammar is
modified, but not the token definitions, the lexer is recompiled. This could
be avoided by making it so that the token type is defined in a separate
module.

Currently: lexer -> parser.token
Effect: lexer recompiled when grammar or token type is modified

Ideally: lexer -> token <- parser
Effect: lexer recompiled only when token type is modified
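
The ideal layout can be sketched in plain OCaml, with three modules standing
in for token.ml, lexer.mll and parser.mly. The module names, the toy token
set and the hand-written lexer and parser are assumptions made for
illustration; none of this is output of ocamllex or ocamlyacc.

```ocaml
(* token.ml: the only module that both the lexer and the parser depend on. *)
module Token = struct
  type t = INT of int | PLUS | EOF
end

(* lexer: depends on Token only, so a grammar change does not touch it. *)
module Lexer = struct
  let is_digit c = '0' <= c && c <= '9'

  (* Split a string such as "1 + 22" into tokens; fails on other input. *)
  let tokens (s : string) : Token.t list =
    let n = String.length s in
    let rec go i acc =
      if i >= n then List.rev (Token.EOF :: acc)
      else match s.[i] with
        | ' ' -> go (i + 1) acc
        | '+' -> go (i + 1) (Token.PLUS :: acc)
        | c when is_digit c ->
          let j = ref i in
          while !j < n && is_digit s.[!j] do incr j done;
          go !j (Token.INT (int_of_string (String.sub s i (!j - i))) :: acc)
        | c -> failwith (Printf.sprintf "unexpected character %c" c)
    in
    go 0 []
end

(* parser: also depends on Token only; it never mentions Lexer. *)
module Parser = struct
  (* sum ::= INT (PLUS INT)* EOF *)
  let sum (ts : Token.t list) : int =
    let rec rest acc = function
      | [Token.EOF] -> acc
      | Token.PLUS :: Token.INT n :: ts -> rest (acc + n) ts
      | _ -> failwith "syntax error"
    in
    match ts with
    | Token.INT n :: ts -> rest n ts
    | _ -> failwith "syntax error"
end
```

With this shape, `Parser.sum (Lexer.tokens "1 + 22 + 3")` evaluates to 26,
and editing the grammar in Parser leaves Lexer untouched.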

-------------------
Bug reports: http://caml.inria.fr/bin/caml-bugs  FAQ: http://caml.inria.fr/FAQ/
To unsubscribe, mail caml-list-request@inria.fr  Archives: http://caml.inria.fr

* Re: [Caml-list] On ocamlyacc and ocamllex
  2001-09-22 21:09 [Caml-list] On ocamlyacc and ocamllex Vesa Karvonen
@ 2001-09-23  1:10 ` Christian Lindig
  2001-09-23 16:27   ` Vesa Karvonen
  2001-09-24  1:05 ` Christian RINDERKNECHT
  1 sibling, 1 reply; 12+ messages in thread
From: Christian Lindig @ 2001-09-23  1:10 UTC (permalink / raw)
  To: Vesa Karvonen; +Cc: caml-list

On Sun, Sep 23, 2001 at 12:09:14AM +0300, Vesa Karvonen wrote:
> I would like to make it so that the lexer would record the positions
> of line breaks so that I could directly give line number and column
> information in error messages.

I agree that more flexible lexer and parser generators would be nice and
have myself lobbied for them in the past. On the other hand I have
always found my way with the existing ones which probably is the reason
that we still use them. 

The particular problem can be solved outside of Lex and Yacc: in the
Quick C-- compiler we have a mutable Sourcemap.map data type that
records the connection between character positions and
(file,line,column) triples. The scanner calls a function Sourcemap.nl for
every newline that it encounters to build up the connection. Later
the map can be used to find the (file,line,column) position for every
character offset. This method has the advantage that it can deal with
input streams that are created from different source files using a
pre-processor. You can find the module as part of the ocamlerror tool
that annotates stack traces with source code positions:

http://www.eecs.harvard.edu/~lindig/software/download/ocamlerror.tar.gz

> Another issue with ocamllex and ocamlyacc (and lex/flex and
> yacc/bison) is that the dependencies between the generated lexer and
> parser are not quite optimal. Currently the generated lexer is
> dependent on the parser, because the parser generates the token type.
> This means that each time the grammar is modified, but not the token
> definitions, the lexer is recompiled. This could be avoided by making
> it so that the token type is defined in a separate module.

This is a general problem with make: when you edit a comment, a file is
touched and all dependent files must be recompiled. Knowing nothing
about OCaml, Make must assume that a touched file has changed in a
significant way. You can help Make out by comparing files explicitly
(untested, but you get the idea): foo.mli is only updated if the token
type has changed.

foo.mli foo.ml:         bar.mly
                        ocamlyacc bar.mly
                        cp bar.ml  foo.ml
                        cmp -s bar.mli foo.mli || cp bar.mli foo.mli 
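
For readers who drive their build from OCaml rather than Make, the same
"copy only if different" step can be sketched as a small function. The names
read_file, write_file and copy_if_changed are assumptions made for this
sketch; it mirrors the cmp/cp line above and is not part of any build tool.

```ocaml
(* Read a whole file into a string. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Overwrite [path] with [contents]. *)
let write_file path contents =
  let oc = open_out_bin path in
  Fun.protect ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc contents)

(* Copy [src] to [dst] only when [dst] is missing or its contents differ.
   Returns true if [dst] was written, so its timestamp changes only then. *)
let copy_if_changed ~src ~dst =
  let fresh = read_file src in
  if Sys.file_exists dst && String.equal (read_file dst) fresh then false
  else begin
    write_file dst fresh;
    true
  end
```

Because an unchanged foo.mli keeps its old timestamp, modules that depend on
it are not considered out of date.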

-- Christian    

-- 
Christian Lindig          Harvard University - EECS 
lindig@eecs.harvard.edu   33 Oxford St, MD 242, Cambridge MA 02138
phone: (617) 496-7157     http://www.eecs.harvard.edu/~lindig/   

* Re: [Caml-list] On ocamlyacc and ocamllex
  2001-09-23  1:10 ` Christian Lindig
@ 2001-09-23 16:27   ` Vesa Karvonen
  2001-09-23 17:44     ` Christian Lindig
  0 siblings, 1 reply; 12+ messages in thread
From: Vesa Karvonen @ 2001-09-23 16:27 UTC (permalink / raw)
  To: caml-list

From: "Christian Lindig" <lindig@eecs.harvard.edu>
> On Sun, Sep 23, 2001 at 12:09:14AM +0300, Vesa Karvonen wrote:
> > I would like to make it so that the lexer would record the positions
> > of line breaks so that I could directly give line number and column
> > information in error messages.
[snip snip]
> The particular problem can be solved outside of Lex and Yacc: in the
> Quick C-- compiler we have a mutable Sourcemap.map data type that
> records the connection between character positions and
> (file,line,column) triples.

This is basically the same technique that I have been using. The problem is
that the map has to be global, because the only context passed to the lexer
actions is the lexbuf. Furthermore, the records need to be manually removed
(in order to save memory) after a file has been processed completely and the
recorded connections for the file are no longer needed.

An extensible lexer makes it possible to extend the context passed to the
lexer actions so that globals can be avoided.

> I agree that more flexible lexer and parser generators would be nice and
> have myself lobbied for them in the past. On the other hand I have
> always found my way with the existing ones which probably is the reason
> that we still use them.

Replacing the Lexing and Parsing modules turned out to be simpler than I
thought. I'm almost done writing the replacement modules, which are called
Lex and Yacc. The Lex module defines an abstract parameterized type lexbuf
like this:

    type 't lexbuf
    val access : 't lexbuf -> 't
    val from_channel : in_channel -> 't -> 't lexbuf
    ...
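
One way to picture such an interface is a record that pairs a standard
lexbuf with user state. This is only a sketch under assumptions: the
replacement described in this message is built on the lex engine externals
and is not shown here, and from_string and raw are names invented for the
example (from_string replaces from_channel so that the code runs without a
file).

```ocaml
(* Hypothetical stand-in for the Lex module: a lexbuf plus user state. *)
module Lex : sig
  type 't lexbuf
  val access : 't lexbuf -> 't
  val from_string : string -> 't -> 't lexbuf
  val raw : 't lexbuf -> Lexing.lexbuf
end = struct
  type 't lexbuf = { raw : Lexing.lexbuf; user : 't }
  let access lb = lb.user
  let from_string s user = { raw = Lexing.from_string s; user }
  let raw lb = lb.raw
end

(* Two buffers, each carrying its own counter: nothing is shared. *)
let a = Lex.from_string "first input" (ref 0)
let b = Lex.from_string "second input" (ref 0)

(* Updating the state of [a] leaves the state of [b] untouched. *)
let () = incr (Lex.access a)
```

Since the state travels with the buffer, no global variable is needed and
several lexers can be active at the same time.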

It is now possible to make a simple module for tracking line numbers:

    type t
    val make : unit -> t
    val new_line_at_pos : t -> int -> unit
    val line_and_col_of_pos : t -> int -> int * int
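
A possible implementation of this signature is sketched below. It assumes
that new_line_at_pos receives the offset of the first character after each
line break, that breaks are reported in increasing order, and that lines and
columns are 1-based; the message does not specify any of these details.

```ocaml
module Line_map : sig
  type t
  val make : unit -> t
  val new_line_at_pos : t -> int -> unit
  val line_and_col_of_pos : t -> int -> int * int
end = struct
  (* Offsets at which lines start, most recent first. Line 1 starts at 0. *)
  type t = { mutable starts : int list }

  let make () = { starts = [0] }

  (* Record that a new line starts at offset [pos]. *)
  let new_line_at_pos m pos = m.starts <- pos :: m.starts

  (* Find the last line start that is <= pos; linear in the number of lines. *)
  let line_and_col_of_pos m pos =
    let rec find line = function
      | start :: _ when start <= pos -> (line, pos - start + 1)
      | _ :: rest -> find (line - 1) rest
      | [] -> invalid_arg "Line_map.line_and_col_of_pos: negative position"
    in
    find (List.length m.starts) m.starts
end
```

For the input "ab\ncd\ne" the lexer would call new_line_at_pos with 3 and
then 6, after which offset 4 (the character 'd') maps to line 2, column 2.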

And then extend the lexbuf with the line map:

    val from_channel : in_channel -> Line_map.t Lex.lexbuf
    val new_line : Line_map.t Lex.lexbuf -> unit
    ...

and use those functions in the lexer actions:

    '\n' { new_line lexbuf; token lexbuf; }
    ...

The files generated by ocamlyacc and ocamllex go through sed commands that
change them to work with the Lex and Yacc modules instead of the Lexing and
Parsing modules.

> > Another issue with ocamllex and ocamlyacc (and lex/flex and
> > yacc/bison) is that the dependencies between the generated lexer and
> > parser are not quite optimal. Currently the generated lexer is
> > dependent on the parser, because the parser generates the token type.
> > This means that each time the grammar is modified, but not the token
> > definitions, the lexer is recompiled. This could be avoided by making
> > it so that the token type is defined in a separate module.
>
> This is a general problem with make: when you edit a comment, a file is
> touched and all dependent files must be recompiled.
[...]

I think that you slightly misunderstood.

The basic idea was to put the token type definition into a separate module.
Instead of two source files, you would have three source files:

    lexer.mll
    token.ml
    parser.mly

The token definition is now effectively demoted into its own module, which
both the lexer and parser modules depend on.

In parser.mly there would be code that would tell ocamlyacc to look at
token.ml for the token type.

* Re: [Caml-list] On ocamlyacc and ocamllex
  2001-09-23 16:27   ` Vesa Karvonen
@ 2001-09-23 17:44     ` Christian Lindig
  2001-09-23 19:32       ` Vesa Karvonen
  2001-10-22 16:47       ` John Max Skaller
  0 siblings, 2 replies; 12+ messages in thread
From: Christian Lindig @ 2001-09-23 17:44 UTC (permalink / raw)
  To: Vesa Karvonen; +Cc: caml-list

On Sun, Sep 23, 2001 at 07:27:36PM +0300, Vesa Karvonen wrote:
> From: "Christian Lindig" <lindig@eecs.harvard.edu>
> > The particular problem can be solved outside of Lex and Yacc: in the
> > Quick C-- compiler we have a mutable Sourcemap.map data type that
> > records the connection between character positions and
> > (file,line,column) triples.
> 
> This is basically the same technique that I have been using. The problem is
> that the map has to be global, because the only context passed to the lexer
> actions is the lexbuf. 

You can pass the map to the lexer so that it does not have to be
global:

    rule token = parse
        eof         { fun map -> P.EOF          }
      | ws+         { fun map -> token lexbuf map }
      | tab         { fun map -> tab lexbuf map; token lexbuf map }
      | nl          { fun map -> nl lexbuf map ; token lexbuf map }
      | nl '#'      { fun map -> line lexbuf map 0; token lexbuf map }
      ....

The lexer built from the above specification takes a lexbuf and map as
arguments. 
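
The currying involved can be seen without ocamllex in a hand-written
scanner of the same shape. Everything here is an assumption made for
illustration: the scanner and map records and the WORD token are invented,
and a real generated lexer would take a Lexing.lexbuf instead.

```ocaml
type token = WORD of string | EOF

(* Hypothetical map: just the offsets at which lines start. *)
type map = { mutable line_starts : int list }

(* Minimal replacement for a lexbuf: the input and a cursor. *)
type scanner = { text : string; mutable pos : int }

(* Every branch returns a function of [map], as the actions above do,
   so [token] has type scanner -> map -> token. *)
let rec token (sc : scanner) : map -> token =
  if sc.pos >= String.length sc.text then (fun _map -> EOF)
  else match sc.text.[sc.pos] with
    | ' ' ->
      sc.pos <- sc.pos + 1;
      (fun map -> token sc map)
    | '\n' ->
      sc.pos <- sc.pos + 1;
      let after = sc.pos in
      (* Record the new line start in the caller's map, then continue. *)
      (fun map -> map.line_starts <- after :: map.line_starts; token sc map)
    | _ ->
      let start = sc.pos in
      while sc.pos < String.length sc.text
            && sc.text.[sc.pos] <> ' ' && sc.text.[sc.pos] <> '\n' do
        sc.pos <- sc.pos + 1
      done;
      let word = String.sub sc.text start (sc.pos - start) in
      (fun _map -> WORD word)
```

Each caller supplies its own map, so two scanners can run side by side, and
a map becomes garbage once the caller drops it.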

> Furthermore, the records need to be manually removed (in order to save
> memory) after a file has been processed completely and the recorded
> connections for the file are no longer needed. 

I assume that in a functional programming style without a global mutable
value the garbage collector will remove the map once I cannot access it
any longer.

> The basic idea was to put the token type definition into a separate
> module.  Instead of two source files, you would have three source
> files:
> 
>     lexer.mll token.ml parser.mly

> In parser.mly there would be code that would tell ocamlyacc to look at
> token.ml for the token type.

Now you would have to keep the token type and the grammar up to date
manually.  The parser generator also needs more information than just the
token types: precedences, associativity, and return types are tied to a
token - where do you keep them? I still think that generating the token
type from the grammar is the easiest way. 

-- Christian

-- 
Christian Lindig          Harvard University - DEAS
lindig@eecs.harvard.edu   33 Oxford St, MD 242, Cambridge MA 02138
phone: +1 (617) 496-7157  http://www.eecs.harvard.edu/~lindig/

* Re: [Caml-list] On ocamlyacc and ocamllex
  2001-09-23 17:44     ` Christian Lindig
@ 2001-09-23 19:32       ` Vesa Karvonen
  2001-09-23 20:09         ` Christian Lindig
  2001-10-22 16:47       ` John Max Skaller
  1 sibling, 1 reply; 12+ messages in thread
From: Vesa Karvonen @ 2001-09-23 19:32 UTC (permalink / raw)
  To: caml-list

From: "Christian Lindig" <lindig@eecs.harvard.edu>
[snip]
> You can pass the map to the lexer so that it does not have to be
> global:
>
>     rule token = parse
>         eof         { fun map -> P.EOF          }
>       | ws+         { fun map -> token lexbuf map }
>       | tab         { fun map -> tab lexbuf map; token lexbuf map }
>       | nl          { fun map -> nl lexbuf map ; token lexbuf map }
>       | nl '#'      { fun map -> line lexbuf map 0; token lexbuf map }
>       ....
>
> The lexer built from the above specification takes a lexbuf and map as
> arguments.

That is neat, although there is a lot of repetition due to the function
definitions. Thanks for the tip, I'll have to try it. Can this technique be
used for adding context to parsers generated using ocamlyacc, too?

I'd prefer that the lexer generator would be extended so that additional
arguments could be added in a manner similar to this:

    rule token map = parse
        eof         { P.EOF }
      | ws+         { token map lexbuf }
      | tab         { tab map lexbuf; token map lexbuf }
      | nl          { nl map lexbuf ; token map lexbuf }
      | nl '#'      { line map lexbuf 0; token map lexbuf }
       ...

I think that it is quite common to need to pass additional context to the
lexer. Direct support for this in the lexer generator is quite justified.

> > Furthermore, the records need to be manually removed (in order to save
> > memory) after a file has been processed completely and the recorded
> > connections for the file are no longer needed.
>
> I assume that in a functional programming style without a global mutable
> value the garbage collector will remove the map once I cannot access it
> any longer.

Yes. Using the above functional technique that goal can be achieved.

> > The basic idea was to put the token type definition into a separate
> > module.  Instead of two source files, you would have three source
> > files:
> >
> >     lexer.mll token.ml parser.mly
>
> > In parser.mly there would be code that would tell ocamlyacc to look at
> > token.ml for the token type.
>
> Now you would have to keep the token type and the grammar up to date
> manually.

The parser/lexer generators, and especially the Ocaml compiler, would of
course perform sufficient type checking. So, should any file get out of sync,
it would be noticed at the next compile.

I don't see how having the %token definitions in the .mly file would make the
process of keeping the token type and the grammar in sync more automatic. The
%token definitions are effectively a completely separate part of the .mly
file.

>  The parser generator also needs more information
> than just the token types: precedences, associativity, and return types
> are tied to a token - where do you keep them?

First of all, the .mly file would not have any %token definitions. The %token
definitions are basically an elaborated kludge for defining a variant type
named 'token' so that the parser generator does not have to understand Ocaml.

The %left, %right, %nonassoc (and %type) definitions would still be in the
.mly file. The information in %left, %right and %nonassoc definitions is not
intrinsic to the tokens. They are just a way of resolving ambiguities in the
grammar.

Also note that the grammar does not define the tokens, although the grammar
does depend on the abstract tokens. The sets of lexemes comprising tokens
are defined by the lexer.

> I still think that
> generating the token type from the grammar is the easiest way.

I agree that it may be somewhat easier for the parser generator, but I find
that separating the token type definition from the grammar definition can be
justified using quantitative technical arguments.

Separating the token type definition from the parser definition breaks the
undesirable dependency from the lexer to the parser. This has at least the
following benefits:

1. You can freely choose the order in which you develop the lexer and parser.

Starting from the lexer, currently you either write a token type definition
and throw it away when you start writing the parser or begin by writing a
dummy parser from which the token type is generated.

2. Modifying the grammar does not result in recompiling the lexer.

Kludges, such as modifying file dates, will not be needed to prevent
recompilation.

The only liability is that an additional file will be needed. However, the
type checking of the Ocaml compiler is already powerful enough to make sure
that the files do not get out of sync.

I recommend reading chapters 3 to 7 of the book Large Scale C++ Software
Design by John Lakos, ISBN 0-201-63362-0. Although the language used in the
book is C++ (and management of physical dependencies is perhaps more important
in C++ than in Ocaml), the principles and techniques apply to practically every
programming language.

* Re: [Caml-list] On ocamlyacc and ocamllex
  2001-09-23 19:32       ` Vesa Karvonen
@ 2001-09-23 20:09         ` Christian Lindig
  2001-09-23 20:51           ` Vesa Karvonen
  2001-10-22 17:09           ` John Max Skaller
  0 siblings, 2 replies; 12+ messages in thread
From: Christian Lindig @ 2001-09-23 20:09 UTC (permalink / raw)
  To: Vesa Karvonen; +Cc: Caml Mailing List

On Sun, Sep 23, 2001 at 10:32:23PM +0300, Vesa Karvonen wrote:
> From: "Christian Lindig" <lindig@eecs.harvard.edu>
> I'd prefer that the lexer generator would be extended so that additional
> arguments could be added in a manner similar to this:
> 
>     rule token map = parse
>         eof         { P.EOF }
>       | ws+         { token map lexbuf }
>       | tab         { tab map lexbuf; token map lexbuf }
>       | nl          { nl map lexbuf ; token map lexbuf }
>       | nl '#'      { line map lexbuf 0; token map lexbuf }
>        ...

I lobbied for this three years ago and had a patch for ocamllex:

    http://www.eecs.harvard.edu/~lindig/software/lex-patch.html

> Can this technique be used for adding context to parsers generated
> using ocamlyacc, too?

I'm not sure what you mean here. A Yacc parser works bottom up - do you
want to inject "context" into the tokens that are received from the
lexer?

> I agree that it may be somewhat easier for the parser generator, but I
> find that separating the token type definition from the grammar
> definition can be justified using quantitative technical arguments.

I agree that this alternative avoids the dependency of the type
definition on the grammar. But I am not sure that manually keeping the
type definition and the %token declarations in the parser in sync is
better than automatic recompiles or a little Make hack.

-- Christian

-- 
Christian Lindig          Harvard University - DEAS
lindig@eecs.harvard.edu   33 Oxford St, MD 242, Cambridge MA 02138
phone: +1 (617) 496-7157  http://www.eecs.harvard.edu/~lindig/

* Re: [Caml-list] On ocamlyacc and ocamllex
  2001-09-23 20:09         ` Christian Lindig
@ 2001-09-23 20:51           ` Vesa Karvonen
  2001-10-22 17:09           ` John Max Skaller
  1 sibling, 0 replies; 12+ messages in thread
From: Vesa Karvonen @ 2001-09-23 20:51 UTC (permalink / raw)
  To: caml-list

From: "Christian Lindig" <lindig@eecs.harvard.edu>
> On Sun, Sep 23, 2001 at 10:32:23PM +0300, Vesa Karvonen wrote:

> I lobbied for this three years ago and had a patch for ocamllex:
>
>     http://www.eecs.harvard.edu/~lindig/software/lex-patch.html

Yes. I just visited your homepage. I find it odd that it has not yet been
adopted. Looking back at the Ocaml mailing list archives, I think that this
extension should perhaps be proposed again. Perhaps it was due to my search
parameters, but I found no responses from Ocaml team members to this
proposal. Perhaps they missed it or just forgot about it.

> > Can this technique be used for adding context to parsers generated
> > using ocamlyacc, too?
>
> I'm not sure what you mean here. A Yacc parser works bottom up - do you
> want to inject "context" into the tokens that are received from the
> lexer?

I want good error messages from the parser. I want the parser to have access
to the line number information generated by the lexer. A simple way to let the
parser have access to the information would be to pass it as a parameter to
the parsing action code.

So, is there a simple way to have a reentrant lexer and parser generated by
ocamllex and ocamlyacc that would have line number information?

I don't want to annotate all tokens by a line number. Furthermore, the Parsing
module needs to be replaced at any rate because it has global state.

The compiler we are implementing may be used by rather novice programmers (or
actually non-programmers), so I'm willing to spend extra time polishing just
the error messages, so that I don't have to spend the same amount of time (or
possibly a lot more) explaining the errors to the users.

> I agree that this alternative avoids the dependency of the type
> definition on the grammar. But I am not sure that manually keeping the
> type definition and the %token declarations in the parser in sync is
> better than automatic recompiles or a little Make hack.

The idea is that the .mly file would no longer have %token definitions.
Instead it would have a definition such as follows:

    %token_type My_token_module.my_token_type

This would cause ocamlyacc to read the my_token_type constructor names from
the My_token_module module.

Alternatively, the technique could be more low level. For instance:

    %token_type my_token_module.mli

This would simply read the first type definition from the
my_token_module.mli file.

* Re: [Caml-list] On ocamlyacc and ocamllex
  2001-09-22 21:09 [Caml-list] On ocamlyacc and ocamllex Vesa Karvonen
  2001-09-23  1:10 ` Christian Lindig
@ 2001-09-24  1:05 ` Christian RINDERKNECHT
  2001-09-24 11:17   ` Vesa Karvonen
  1 sibling, 1 reply; 12+ messages in thread
From: Christian RINDERKNECHT @ 2001-09-24  1:05 UTC (permalink / raw)
  To: Vesa Karvonen; +Cc: caml-list

Hi,

Vesa Karvonen wrote:
> Currently the generated lexer is dependent on the parser, because the
> parser generates the token type. This means that each time the grammar is
> modified, but not the token definitions, the lexer is recompiled. 

Some time ago, I wrote a piece of software that handled more than six lexers
and parsers, and hence maintenance was complex. So I decided to enhance
ocamllex with the following facilities, selected through the command line: 

  (1) It can produce a functorized lexer whose argument defines the
      tokens. You need to use the same %token clauses as in ocamlyacc.
      This way different parsers can share the same lexer (the only
      constraint is that the %token clauses in the ocamlyacc-generated
      parsers must be given in the same order).

  (2) It is possible to specify a signature to be shared among
      different generated lexers.

  (3) If you don't want a functorized lexer to be produced, you can
      nevertheless specify the module defining the tokens (not
      necessarily the ocamlyacc-generated parser). This allows you to
      produce from the same ocamllex specification either a
      functorized lexer or a non-functorized lexer.

  (3bis) You can even allow the functorized lexer to import a module
      whose signature is given by a new clause: %import.
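
The idea behind point (1) can be sketched with an ordinary functor. This is
an illustration under assumptions, not the output of the enhanced ocamllex:
the signature TOKENS, the functor Make_lexer and the toy token set are
invented, and the two Parser modules merely stand in for ocamlyacc-generated
parsers.

```ocaml
(* The tokens the lexer needs. A module matches this signature only if its
   [token] type declares the same constructors in the same order. *)
module type TOKENS = sig
  type token = INT of int | PLUS | EOF
end

(* A hand-written lexer parameterized over the module that owns the tokens. *)
module Make_lexer (T : TOKENS) = struct
  (* Single-character tokens only; anything else (e.g. blanks) is skipped. *)
  let token_of_char = function
    | '+' -> Some T.PLUS
    | '0' .. '9' as c -> Some (T.INT (Char.code c - Char.code '0'))
    | _ -> None

  let tokens (s : string) : T.token list =
    List.filter_map token_of_char (List.of_seq (String.to_seq s)) @ [T.EOF]
end

(* Stand-ins for two generated parsers that declare the same tokens. *)
module Parser_a = struct type token = INT of int | PLUS | EOF end
module Parser_b = struct type token = INT of int | PLUS | EOF end

(* One lexer definition, instantiated once per parser. *)
module Lexer_a = Make_lexer (Parser_a)
module Lexer_b = Make_lexer (Parser_b)
```

Lexer_a.tokens returns values of type Parser_a.token and Lexer_b.tokens
values of type Parser_b.token, so each parser gets tokens of its own type
from the shared lexer definition.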

Note also that:

  (4) By default, the behaviour is exactly the same as for the
      distributed ocamllex.

  (5) I integrated the patch of Christian Lindig, allowing you to give
      arguments to your lexing rules (e.g. rule skip_line (loc) =
      parse ...). I think this will help you to write functional
      lexers in a readable way.

  (6) I documented the options in an enhanced version of the man page.

  (7) It is up-to-date with respect to the latest distribution of
      ocamllex. 

If you are interested, please send me an e-mail.

Best regards,

-- 

Christian

------------------------------------------------------------------------
Christian Rinderknecht                     Phone  +82 42 866 6147
Network Architecture Laboratory            Fax    +82 42 866 6154
Information and Communications University  WWW    http://nalab.icu.ac.kr
58-4 Hwaam-dong, Yuseong-gu, Daejeon,      e-mail rinderkn@icu.ac.kr
305-752, Korea

* Re: [Caml-list] On ocamlyacc and ocamllex
  2001-09-24  1:05 ` Christian RINDERKNECHT
@ 2001-09-24 11:17   ` Vesa Karvonen
  2001-10-22 17:24     ` John Max Skaller
  0 siblings, 1 reply; 12+ messages in thread
From: Vesa Karvonen @ 2001-09-24 11:17 UTC (permalink / raw)
  To: Christian RINDERKNECHT, caml-list

From: "Christian RINDERKNECHT" <rinderkn@hugo.int-evry.fr>
[...]
> Some time ago, I wrote a piece of software that handled more than six lexers
> and parsers, and hence maintenance was complex. So I decided to enhance
> ocamllex with the following facilities, selected through the command line:
[lots of useful features]
> If you are interested, please send me an e-mail.

Hi,

Based on your description, I wasn't able to determine whether your lexing
module also supports extending the lexer state (as in Lindig's proposal):

    http://www.eecs.harvard.edu/~lindig/software/lexing.html

At any rate, I'm definitely interested. However, I'd like to see these
features included in the ocamllex that comes with the Ocaml distribution.

Is it too much to ask that these ocamllex enhancements (including the
enhancements by Christian Lindig) would be proposed for the Ocaml
distribution?

If necessary, I'm willing to share some of the necessary work in order to get
these enhancements into the Ocaml distribution.

I've already made a reentrant replacement for the Parsing module. I think that
parsers could also benefit from the ability to extend the parser state.

Best regards,
  Vesa Karvonen

* Re: [Caml-list] On ocamlyacc and ocamllex
  2001-09-23 17:44     ` Christian Lindig
  2001-09-23 19:32       ` Vesa Karvonen
@ 2001-10-22 16:47       ` John Max Skaller
  1 sibling, 0 replies; 12+ messages in thread
From: John Max Skaller @ 2001-10-22 16:47 UTC (permalink / raw)
  To: Christian Lindig; +Cc: Vesa Karvonen, caml-list

Christian Lindig wrote:
> You can pass the map to the lexer so that it does not have to be
> global:
> 
>     rule token = parse
>         eof         { fun map -> P.EOF          }
>       | ws+         { fun map -> token lexbuf map }
>       | tab         { fun map -> tab lexbuf map; token lexbuf map }
>       | nl          { fun map -> nl lexbuf map ; token lexbuf map }
>       | nl '#'      { fun map -> line lexbuf map 0; token lexbuf map }
>       ....

I use this technique, works fine. Pass an (OO style) object.
A bit boring writing 'fun map ->' in front of everything,
but it works.

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au 
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
New generation programming language Felix  http://felix.sourceforge.net
Literate Programming tool Interscript     
http://Interscript.sourceforge.net

* Re: [Caml-list] On ocamlyacc and ocamllex
  2001-09-23 20:09         ` Christian Lindig
  2001-09-23 20:51           ` Vesa Karvonen
@ 2001-10-22 17:09           ` John Max Skaller
  1 sibling, 0 replies; 12+ messages in thread
From: John Max Skaller @ 2001-10-22 17:09 UTC (permalink / raw)
  To: Christian Lindig; +Cc: Vesa Karvonen, Caml Mailing List

Christian Lindig wrote:

> > I agree that it may be somewhat easier for the parser generator, but I
> > find that separating the token type definition from the grammar
> > definition can be justified using quantitative technical arguments.
> 
> I agree that this alternative avoids the dependency of the type
> definition on the grammar. But I am not sure that manually keeping the
> type definition and the %token declarations in the parser in sync is
> better than automatic recompiles or a little Make hack.

Adding the tokens to the .mly file is a pain.
I have two other files which also need to list all tokens:
one to print them, and one to extract the source file/line/column
information. Both have to be manually tracked anyhow.

Worse, the %token declarations produce an ordinary variant type: how can
polymorphic variants be used instead?

Even worse, the lexer and parser are improperly connected,
with the parser incorrectly taking a lexer and lexbuf as
arguments -- a right pain if you want to preprocess the
tokens. I have to create a dummy lexbuf/lexer to drive
the parser. The parser should take a callback function
as an argument. A patch to make this alternative calling
technique available would be useful.
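
The dummy lexbuf workaround can be sketched as follows. The entry point sum
is a hand-written stand-in with the same shape as an ocamlyacc entry point,
(Lexing.lexbuf -> token) -> Lexing.lexbuf -> result; it and the helper names
lexer_of_list and parse_tokens are assumptions made for this example.

```ocaml
type token = INT of int | PLUS | EOF

(* Stand-in for a generated entry point: it pulls tokens by calling
   [lexer lexbuf], sums the INT tokens and ignores PLUS. *)
let sum (lexer : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : int =
  let rec loop acc =
    match lexer lexbuf with
    | INT n -> loop (acc + n)
    | PLUS -> loop acc
    | EOF -> acc
  in
  loop 0

(* Turn an already preprocessed token list into a "lexer" that ignores
   the lexbuf it is given. *)
let lexer_of_list (tokens : token list) : Lexing.lexbuf -> token =
  let remaining = ref tokens in
  fun _lexbuf ->
    match !remaining with
    | [] -> EOF
    | t :: rest -> remaining := rest; t

(* The dummy lexbuf is never read; it only satisfies the signature. *)
let parse_tokens (tokens : token list) : int =
  sum (lexer_of_list tokens) (Lexing.from_string "")
```

With a real generated parser the dummy lexbuf carries no useful position
information, which is one more reason a plain callback interface would be
cleaner.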

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au 
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
New generation programming language Felix  http://felix.sourceforge.net
Literate Programming tool Interscript     
http://Interscript.sourceforge.net

* Re: [Caml-list] On ocamlyacc and ocamllex
  2001-09-24 11:17   ` Vesa Karvonen
@ 2001-10-22 17:24     ` John Max Skaller
  0 siblings, 0 replies; 12+ messages in thread
From: John Max Skaller @ 2001-10-22 17:24 UTC (permalink / raw)
  To: Vesa Karvonen; +Cc: Christian RINDERKNECHT, caml-list

Vesa Karvonen wrote:

> I've already made a reentrant replacement for the Parsing module. I think that
> parsers could also benefit from the ability to extend the parser state.

Reentrancy is mandatory IMHO: it is needed for multithreaded parsing,
for example DB server threads parsing SQL.

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au 
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
New generation programming language Felix  http://felix.sourceforge.net
Literate Programming tool Interscript     
http://Interscript.sourceforge.net