caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* lexer, parser
@ 1999-06-03 11:55   ` John Skaller
  1999-06-12 12:02     ` Markus Mottl
  1999-06-14  7:15     ` Christian Lindig
  0 siblings, 2 replies; 5+ messages in thread
From: John Skaller @ 1999-06-03 11:55 UTC (permalink / raw)
  To: caml-list

Is there an 'object' version of the lexer and parser,
or can I use or adapt the existing code?

I need to maintain state, but the component also need
to be re-entrant.

-------------------------------------------------------
John Skaller    email: skaller@maxtal.com.au
		http://www.maxtal.com.au/~skaller
		phone: 61-2-96600850
		snail: 10/1 Toxteth Rd, Glebe NSW 2037, Australia





^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: lexer, parser
  1999-06-03 11:55   ` lexer, parser John Skaller
@ 1999-06-12 12:02     ` Markus Mottl
  1999-06-15  2:01       ` Jacques GARRIGUE
  1999-06-14  7:15     ` Christian Lindig
  1 sibling, 1 reply; 5+ messages in thread
From: Markus Mottl @ 1999-06-12 12:02 UTC (permalink / raw)
  To: John Skaller; +Cc: OCAML

> Is there an 'object' version of the lexer and parser,
> or can I use or adapt the existing code?
> 
> I need to maintain state, but the component also need
> to be re-entrant.

I guess you want to have just the syntactic parts in the scanner
and parser files, but semantics shall be kept within objects (very
convenient).

My solution to this is to have the parser return streams of functions.
These functions accept as final argument an object which implements the
appropriate semantics.

E.g.:

module (file) "Foo":

  class semantics = object
    method do_something param = ...
    ...
  end

  type 'a trans_fun  = semantics -> 'a -> unit
  and  'a trans_strm = 'a trans_fun Stream.t

  let do_something param obj = obj#do_something param
  ...

The parser (e.g.):

  %start main
  %type <'a Foo.trans_strm> main

  main
    : left_part A_TOKEN { [< $1; 'Foo.do_something $2 >] }
    | A_TOKEN           { [< $1 >] }
    | ...

  left_part
    : ANOTHER_TOKEN     { [< $1 >] }
    | ...

The main program (e.g.):

  let main () =
    let lexbuf     = Lexing.from_channel stdin
    and sm         = new semantics in
    let msg_stream = Parser.main Scanner.start lexbuf
    Stream.iter (fun f -> f sm) msg_stream (* this executes semantics *)

It is possible to "rearrange" streams very fast, because substreams
can be combined arbitrarily (e.g. concatenated) without loss of
efficiency. This is very important in parsers. Thus, I prefer them over
lists (of functions).

Execution of "semantics" is also much faster than versions that use
algebraic datatypes and pattern matching, because here we only have to
call methods (wrapped in the functions of the stream) on objects instead
of match abstract syntax trees or else.

Another advantage: the streams can be directed to any kind of object that
matches the interface of "semantics". Thus, we could have "different"
semantics and have the parsed program evaluated over them.  This adds
greatly to the modularity of the semantics implementation.

I have found this approach very expressive, efficient and easy to maintain
- I hope it will also help you.

Best regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: lexer, parser
  1999-06-03 11:55   ` lexer, parser John Skaller
  1999-06-12 12:02     ` Markus Mottl
@ 1999-06-14  7:15     ` Christian Lindig
  1 sibling, 0 replies; 5+ messages in thread
From: Christian Lindig @ 1999-06-14  7:15 UTC (permalink / raw)
  To: John Skaller, Jean-Marc EBER; +Cc: Caml Mailing List

Quoting John Skaller (skaller@maxtal.com.au):
> I need to maintain state, but the component also need
> to be re-entrant.

Quoting Jean-Marc EBER (Jean-Marc.Eber@socgen.com):
> You should certainly be in contact with Christian Lindig, who has
> done interesting work on OCamlLex.
> 
> lindig@ips.cs.tu-bs.de
> http://www.cs.tu-bs.de/softech/people/lindig/index.html

Hi,

Like Jean-Marc Eber pointed out have I suggested two extension for
OCamlLex that makes maintaining state easier:

http://www.cs.tu-bs.de/softech/people/lindig/software/lex-patch.html

A patched version of OCamlLex allows for additional parameters to
rules:

    rule scan x y = parse 
    ...

Parameters like x and y make it easier to accumulate results across
many calls of the same or different rules in a lexer.  They help to
avoid global variables which make a scanner no longer re-entrant.

Using the patched version of OCamlLex is fairly uncritical because
only the generated code if different from the original version. 

Maintaining state across different invocations of a lexer requires
some more effort:

http://www.cs.tu-bs.de/softech/people/lindig/software/lexing.html

I have proposed a new Lexing module with a new `lexbuf' type.  The
lexbuf type is used by scanners to maintain state but currently does
not hold user provided state.  The proposed module permits to store
user state in lexbuf as well. 

Unfortunately is it not possible to simply exchange the new module for
the old.  The whole OCaml system must be recompiled with the new
module.  However, all old code compiles with the new system.

I still would like to see both proposals added to the OCaml system
since both are backward compatible at the source code level.

-- Christian

-- 
 Christian Lindig       Technische Universitaet Braunschweig, Germany
                        http://www.cs.tu-bs.de/softech/people/lindig
                        lindig@ips.cs.tu-bs.de phone:+49-531-391-3276




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: lexer, parser
  1999-06-12 12:02     ` Markus Mottl
@ 1999-06-15  2:01       ` Jacques GARRIGUE
  1999-06-15 10:20         ` Markus Mottl
  0 siblings, 1 reply; 5+ messages in thread
From: Jacques GARRIGUE @ 1999-06-15  2:01 UTC (permalink / raw)
  To: mottl; +Cc: caml-list

From: Markus Mottl <mottl@miss.wu-wien.ac.at>

> Execution of "semantics" is also much faster than versions that use
> algebraic datatypes and pattern matching, because here we only have to
> call methods (wrapped in the functions of the stream) on objects instead
> of match abstract syntax trees or else.

Sorry to answer pinpoint, but I just want to avoid a confusion.

In caml pattern-matching is much more efficient than calling a method.
Calling a method is a dynamic operation, involving looking-up a table
and calling a possibly curried function, while pattern-matching is
completely statically compiled.

OO may help you structure your program, but when algebraic datatypes
are handy, I would suggest sticking with them.

	Jacques
---------------------------------------------------------------------------
Jacques Garrigue      Kyoto University     garrigue at kurims.kyoto-u.ac.jp
		<A HREF=http://wwwfun.kurims.kyoto-u.ac.jp/~garrigue/>JG</A>




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: lexer, parser
  1999-06-15  2:01       ` Jacques GARRIGUE
@ 1999-06-15 10:20         ` Markus Mottl
  0 siblings, 0 replies; 5+ messages in thread
From: Markus Mottl @ 1999-06-15 10:20 UTC (permalink / raw)
  To: Jacques GARRIGUE; +Cc: OCAML

> > Execution of "semantics" is also much faster than versions that use
> > algebraic datatypes and pattern matching, because here we only have to
> > call methods (wrapped in the functions of the stream) on objects instead
> > of match abstract syntax trees or else.
> 
> Sorry to answer pinpoint, but I just want to avoid a confusion.
> 
> In caml pattern-matching is much more efficient than calling a method.
> Calling a method is a dynamic operation, involving looking-up a table
> and calling a possibly curried function, while pattern-matching is
> completely statically compiled.

Sorry for having caused confusion - my statement above was indeed
misleading. I thought that the original writer wanted to use objects
anyway, so instead of calling methods and matching patterns in the object,
calling the methods alone would be faster even if there is a curried
function inbetween. Matching patterns without using objects would be
faster, of course - but in my eyes much less flexible.

Here an example:

---------------------------------------------------------------------------
(* definitions for version with algebraic datatypes *)
type action = Inc | Dec

class c1 = object
  val mutable x = 0
  method x = x
  method interpret = function
    | Inc -> x <- x + 1
    | Dec -> x <- x - 1
end

let action1 () =
  match Random.int 2 with
  | 0 -> Inc
  | 1 -> Dec
  | _ -> failwith "impossible pattern"

(* definitions for version with functions encapsulating method calls *)
class c2 = object
  val mutable x = 0
  method x = x
  method inc = x <- x + 1
  method dec = x <- x - 1
end

let inc obj = obj#inc
let dec obj = obj#inc

let action2 () =
  match Random.int 2 with
  | 0 -> inc
  | 1 -> dec
  | _ -> failwith "impossible pattern"

(* generate the stream - this is the optimized version with
   linear time behaviour *)
let rec action_stream action =
  Stream.lcons action (Stream.slazy (fun _ -> action_stream action))

(* uncomment the appropriate part to test the specific version *)
let _ =
  let o = new c1
  and strm = action_stream action1 in
  for i = 1 to 5000000 do Stream.next strm done;
(*
  let o = new c2
  and strm = action_stream action2 in
  for i = 1 to 5000000 do (Stream.next strm) o done;
*)
  Printf.printf "o#x: %d\n" o#x
---------------------------------------------------------------------------

If we count away the time that is spent during traversal of the stream,
the (compiled) version using encapsulated method calls is (on my machine)
a bit more than 20% faster.

In my opinion the version without pattern matching is much more natural:
although we have to define the encapsulating functions once, it is never
necessary to explicitely match the action type - we could thus easily
redirect the stream to other objects that match the signature.

The OO-version is much easier to maintain, because the different actions
are really different methods, whereas in the other version the correct
"semantics" has to be chosen in one function (where the patterns are
matched). Thus, we can (by inheritence, parameterized objects, etc..)
easily use abstraction mechanism for generating new "semantics".

Best regards,
Markus Mottl

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl




^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~1999-06-15 20:48 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <0580637621241002*/c=FR/admd=ATLAS/prmd=SG/o=INFI/s=EBER/g=JEAN-MARC/@MHS>
     [not found] ` <0579137620FCB001*/c=FR/admd=ATLAS/prmd=SG/o=INFI/s=EBER/g=JEAN-MARC/@MHS>
1999-06-03 11:55   ` lexer, parser John Skaller
1999-06-12 12:02     ` Markus Mottl
1999-06-15  2:01       ` Jacques GARRIGUE
1999-06-15 10:20         ` Markus Mottl
1999-06-14  7:15     ` Christian Lindig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).