* lexer, parser @ 1999-06-03 11:55 ` John Skaller 1999-06-12 12:02 ` Markus Mottl 1999-06-14 7:15 ` Christian Lindig 0 siblings, 2 replies; 5+ messages in thread From: John Skaller @ 1999-06-03 11:55 UTC (permalink / raw) To: caml-list Is there an 'object' version of the lexer and parser, or can I use or adapt the existing code? I need to maintain state, but the component also need to be re-entrant. ------------------------------------------------------- John Skaller email: skaller@maxtal.com.au http://www.maxtal.com.au/~skaller phone: 61-2-96600850 snail: 10/1 Toxteth Rd, Glebe NSW 2037, Australia ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: lexer, parser 1999-06-03 11:55 ` lexer, parser John Skaller @ 1999-06-12 12:02 ` Markus Mottl 1999-06-15 2:01 ` Jacques GARRIGUE 1999-06-14 7:15 ` Christian Lindig 1 sibling, 1 reply; 5+ messages in thread From: Markus Mottl @ 1999-06-12 12:02 UTC (permalink / raw) To: John Skaller; +Cc: OCAML > Is there an 'object' version of the lexer and parser, > or can I use or adapt the existing code? > > I need to maintain state, but the component also need > to be re-entrant. I guess you want to have just the syntactic parts in the scanner and parser files, but semantics shall be kept within objects (very convenient). My solution to this is to have the parser return streams of functions. These functions accept as final argument an object which implements the appropriate semantics. E.g.: module (file) "Foo": class semantics = object method do_something param = ... ... end type 'a trans_fun = semantics -> 'a -> unit and 'a trans_strm = 'a trans_fun Stream.t let do_something param obj = obj#do_something param ... The parser (e.g.): %start main %type <'a Foo.trans_strm> main main : left_part A_TOKEN { [< $1; 'Foo.do_something $2 >] } | A_TOKEN { [< $1 >] } | ... left_part : ANOTHER_TOKEN { [< $1 >] } | ... The main program (e.g.): let main () = let lexbuf = Lexing.from_channel stdin and sm = new semantics in let msg_stream = Parser.main Scanner.start lexbuf Stream.iter (fun f -> f sm) msg_stream (* this executes semantics *) It is possible to "rearrange" streams very fast, because substreams can be combined arbitrarily (e.g. concatenated) without loss of efficiency. This is very important in parsers. Thus, I prefer them over lists (of functions). Execution of "semantics" is also much faster than versions that use algebraic datatypes and pattern matching, because here we only have to call methods (wrapped in the functions of the stream) on objects instead of match abstract syntax trees or else. Another advantage: the streams can be directed to any kind of object that matches the interface of "semantics". Thus, we could have "different" semantics and have the parsed program evaluated over them. This adds greatly to the modularity of the semantics implementation. I have found this approach very expressive, efficient and easy to maintain - I hope it will also help you. Best regards, Markus Mottl -- Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: lexer, parser 1999-06-12 12:02 ` Markus Mottl @ 1999-06-15 2:01 ` Jacques GARRIGUE 1999-06-15 10:20 ` Markus Mottl 0 siblings, 1 reply; 5+ messages in thread From: Jacques GARRIGUE @ 1999-06-15 2:01 UTC (permalink / raw) To: mottl; +Cc: caml-list From: Markus Mottl <mottl@miss.wu-wien.ac.at> > Execution of "semantics" is also much faster than versions that use > algebraic datatypes and pattern matching, because here we only have to > call methods (wrapped in the functions of the stream) on objects instead > of match abstract syntax trees or else. Sorry to answer pinpoint, but I just want to avoid a confusion. In caml pattern-matching is much more efficient than calling a method. Calling a method is a dynamic operation, involving looking-up a table and calling a possibly curried function, while pattern-matching is completely statically compiled. OO may help you structure your program, but when algebraic datatypes are handy, I would suggest sticking with them. Jacques --------------------------------------------------------------------------- Jacques Garrigue Kyoto University garrigue at kurims.kyoto-u.ac.jp <A HREF=http://wwwfun.kurims.kyoto-u.ac.jp/~garrigue/>JG</A> ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: lexer, parser 1999-06-15 2:01 ` Jacques GARRIGUE @ 1999-06-15 10:20 ` Markus Mottl 0 siblings, 0 replies; 5+ messages in thread From: Markus Mottl @ 1999-06-15 10:20 UTC (permalink / raw) To: Jacques GARRIGUE; +Cc: OCAML > > Execution of "semantics" is also much faster than versions that use > > algebraic datatypes and pattern matching, because here we only have to > > call methods (wrapped in the functions of the stream) on objects instead > > of match abstract syntax trees or else. > > Sorry to answer pinpoint, but I just want to avoid a confusion. > > In caml pattern-matching is much more efficient than calling a method. > Calling a method is a dynamic operation, involving looking-up a table > and calling a possibly curried function, while pattern-matching is > completely statically compiled. Sorry for having caused confusion - my statement above was indeed misleading. I thought that the original writer wanted to use objects anyway, so instead of calling methods and matching patterns in the object, calling the methods alone would be faster even if there is a curried function inbetween. Matching patterns without using objects would be faster, of course - but in my eyes much less flexible. Here an example: --------------------------------------------------------------------------- (* definitions for version with algebraic datatypes *) type action = Inc | Dec class c1 = object val mutable x = 0 method x = x method interpret = function | Inc -> x <- x + 1 | Dec -> x <- x - 1 end let action1 () = match Random.int 2 with | 0 -> Inc | 1 -> Dec | _ -> failwith "impossible pattern" (* definitions for version with functions encapsulating method calls *) class c2 = object val mutable x = 0 method x = x method inc = x <- x + 1 method dec = x <- x - 1 end let inc obj = obj#inc let dec obj = obj#inc let action2 () = match Random.int 2 with | 0 -> inc | 1 -> dec | _ -> failwith "impossible pattern" (* generate the stream - this is the optimized version with linear time behaviour *) let rec action_stream action = Stream.lcons action (Stream.slazy (fun _ -> action_stream action)) (* uncomment the appropriate part to test the specific version *) let _ = let o = new c1 and strm = action_stream action1 in for i = 1 to 5000000 do Stream.next strm done; (* let o = new c2 and strm = action_stream action2 in for i = 1 to 5000000 do (Stream.next strm) o done; *) Printf.printf "o#x: %d\n" o#x --------------------------------------------------------------------------- If we count away the time that is spent during traversal of the stream, the (compiled) version using encapsulated method calls is (on my machine) a bit more than 20% faster. In my opinion the version without pattern matching is much more natural: although we have to define the encapsulating functions once, it is never necessary to explicitely match the action type - we could thus easily redirect the stream to other objects that match the signature. The OO-version is much easier to maintain, because the different actions are really different methods, whereas in the other version the correct "semantics" has to be chosen in one function (where the patterns are matched). Thus, we can (by inheritence, parameterized objects, etc..) easily use abstraction mechanism for generating new "semantics". Best regards, Markus Mottl -- Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: lexer, parser 1999-06-03 11:55 ` lexer, parser John Skaller 1999-06-12 12:02 ` Markus Mottl @ 1999-06-14 7:15 ` Christian Lindig 1 sibling, 0 replies; 5+ messages in thread From: Christian Lindig @ 1999-06-14 7:15 UTC (permalink / raw) To: John Skaller, Jean-Marc EBER; +Cc: Caml Mailing List Quoting John Skaller (skaller@maxtal.com.au): > I need to maintain state, but the component also need > to be re-entrant. Quoting Jean-Marc EBER (Jean-Marc.Eber@socgen.com): > You should certainly be in contact with Christian Lindig, who has > done interesting work on OCamlLex. > > lindig@ips.cs.tu-bs.de > http://www.cs.tu-bs.de/softech/people/lindig/index.html Hi, Like Jean-Marc Eber pointed out have I suggested two extension for OCamlLex that makes maintaining state easier: http://www.cs.tu-bs.de/softech/people/lindig/software/lex-patch.html A patched version of OCamlLex allows for additional parameters to rules: rule scan x y = parse ... Parameters like x and y make it easier to accumulate results across many calls of the same or different rules in a lexer. They help to avoid global variables which make a scanner no longer re-entrant. Using the patched version of OCamlLex is fairly uncritical because only the generated code if different from the original version. Maintaining state across different invocations of a lexer requires some more effort: http://www.cs.tu-bs.de/softech/people/lindig/software/lexing.html I have proposed a new Lexing module with a new `lexbuf' type. The lexbuf type is used by scanners to maintain state but currently does not hold user provided state. The proposed module permits to store user state in lexbuf as well. Unfortunately is it not possible to simply exchange the new module for the old. The whole OCaml system must be recompiled with the new module. However, all old code compiles with the new system. I still would like to see both proposals added to the OCaml system since both are backward compatible at the source code level. -- Christian -- Christian Lindig Technische Universitaet Braunschweig, Germany http://www.cs.tu-bs.de/softech/people/lindig lindig@ips.cs.tu-bs.de phone:+49-531-391-3276 ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~1999-06-15 20:48 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <0580637621241002*/c=FR/admd=ATLAS/prmd=SG/o=INFI/s=EBER/g=JEAN-MARC/@MHS> [not found] ` <0579137620FCB001*/c=FR/admd=ATLAS/prmd=SG/o=INFI/s=EBER/g=JEAN-MARC/@MHS> 1999-06-03 11:55 ` lexer, parser John Skaller 1999-06-12 12:02 ` Markus Mottl 1999-06-15 2:01 ` Jacques GARRIGUE 1999-06-15 10:20 ` Markus Mottl 1999-06-14 7:15 ` Christian Lindig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).