From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id A88C9BC69 for ; Fri, 20 Oct 2006 00:01:36 +0200 (CEST) Received: from h2o.hex.no (93.80-203-243.nextgentel.com [80.203.243.93]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id k9JM1YGf029614 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Fri, 20 Oct 2006 00:01:35 +0200 Received: from localhost (localhost [127.0.0.1]) (uid 1000) by h2o.hex.no with local; Fri, 20 Oct 2006 00:01:31 +0200 id 00261CC4.4537F5BB.0000343F Date: Fri, 20 Oct 2006 00:01:31 +0200 From: Jorgen Hermanrud Fjeld To: caml-list@yquem.inria.fr Subject: Haskell parser combinators in OCaml? Message-ID: <20061019220131.GA18656@hex.no> Mail-Followup-To: caml-list@yquem.inria.fr Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="=_h2o-13375-1161295291-0001-2" Content-Disposition: inline User-Agent: Mutt/1.5.12-2006-07-14 X-Miltered: at concorde with ID 4537F5BE.000 by Joe's j-chkmail (http://j-chkmail.ensmp.fr)! X-Spam: no; 0.00; haskell:01 parser:01 combinators:01 ocaml:01 haskell:01 combinator:01 parsers:01 citeseer:01 combinator:01 parsers:01 lalr:01 grammars:01 ocaml:01 parser:01 forall:01 X-Attachments: type="application/pgp-signature" name="signature.asc" This is a MIME-formatted message. If you see this text it means that your E-mail software does not support MIME-formatted messages. --=_h2o-13375-1161295291-0001-2 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hi. =46rom the world of Haskell, the work of S. Doaitse Swierstra in the paper "Combinator Parsers: From Toys to Tools"=20 "http://citeseer.ist.psu.edu/363886.html", introduces some very nice combinator parsers that parse LALR(k) grammars, and give good error messages. I would love too express something equivalent in OCaml, but I'm not sure how to translate the concepts used into concepts in OCaml. I am hoping some of the type theorists out there would glance at the paper, and bestow some reflection, advice or warning upon me. There are several issues: 1) How to express the lazy lookahead data structure? 3) How to express the type of the parser in OCaml? Some details: 1) The lazy data structure in 4.1 can not be expressed directly, and I believe some kind of explicit fixed point is needed. Would one need fixed points with deBruijn indexes? Do you know of any similar examples that I may look at for inspiration? 2) The parser has the haskell type=20 type Parser a =3D forall b result . Future b result -> Stack a b -> Errs -> Input -> Steps result which I can not express in OCaml. My attempts at encoding this=20 using an encoding that express existential types, have so far not=20 worked out. I always end up with a type error, and do not see how to better design it.=20 ######## The type error File "parser.ml", line 154, characters 21-26: This field value has type ('a, 'a) future -> (symbol, 'a) stack -> (errors -> errors) -> input -> ('a * errors) steps which is less general than 'b 'c. ('b, 'c) future -> ('d, 'b) stack -> (errors -> errors) -> input -> ('c * errors) steps ######## Begin code module BraunTree =3D=20 struct type ('key,'value) braun_tree =3D=20 | Node of ('key,'value) braun_tree * ('key * 'value) * ('key,'value= ) braun_tree | Nil ;; let tree_of_list (l:('key*'value) list) : ('key,'value) braun_tree =3D= =20 let rec tree_of_list len l =3D=20 match l with | [] -> (Nil,[]) | (h::[]) -> (Node (Nil,h,Nil),[]) | (h::t) -> =20 let left_len =3D (len - 1) / 2 in let right_len =3D len - 1 - left_len in let (left_tree,left_list) =3D tree_of_list left_len l in match left_list with | [] -> assert false | (left_head::left_tail) -> let (right_tree,right_tail) =3D tree_of_list ri= ght_len left_tail in (Node (left_tree,left_head,right_tree),right_ta= il) in let (tree,l) =3D tree_of_list (List.length l) l in match l with | [] -> tree | _ -> assert false ;; let find ~(key:'key) ~(tree:('key,'value) braun_tree) : 'value option = =3D=20 let rec find tree =3D=20 match tree with | Nil -> None | Node (left,(found_key,value),right) ->=20 match compare key found_key with | 0 -> Some value | 1 -> find left | -1 -> find right | _ -> assert false in find tree ;; end module ContinuationTrieParser =3D=20 struct type symbol =3D string type input =3D symbol list type 'result steps =3D=20 Ok of 'result steps | Fail of 'result steps=20 | Stop of 'result type ('a,'b) stack =3D 'a -> 'b;; type ('cont,'result) future =3D 'cont -> (errors->errors) -> input -> 're= sult steps and errors =3D=20 | Deleted of symbol * string * errors | Inserted of symbol * string * errors | Notused of string=20 type 'p automaton =3D=20 | Shift of 'p * (symbol * 'p automaton) list | ShiftReduce of 'p automaton * 'p automaton | Reduce of 'p | Found of 'p * 'p automaton type 'a combinator_parser =3D { parse:'cont 'result. ('cont,'result) future-> ('a,'cont) stack -> (errors->errors) -> inpu= t -> ('result*errors) steps } type 'a parser_generator =3D { automaton : ('a combinator_parser) automaton; generated : 'a combinator_parser } exception Ambigous_grammar ;; let rec best : 'result steps -> 'result steps -> 'result steps =3D=20 fun left right -> match (left,right) with (Ok left,Ok right) -> Ok (best left right)=20 | (Fail left,Fail right) -> Fail (best left right)=20 | (Ok _,Fail _) -> left=20 | (Fail _,Ok _) -> right=20 | (Stop _,_) -> left=20 | (_,Stop _) -> right=20 ;; let best_parser (left:'a combinator_parser) (right:'a combinator_parser) = : 'a combinator_parser =3D=20 let parse cont stack errors input =3D best (left.parse cont stack errors input) (right.parse cont stack= errors input) in {parse=3Dparse} ;; (** Also known as a catamorphism *) let transform_automaton ((transform_shift ,transform_shiftreduce ,transform_reduce ,transform_found ): ((('p * (symbol*'p automaton) list) -> 'b) *(('p automaton * 'p automaton) -> 'b) *('p -> 'b) *(('p * 'p automaton) -> 'b))) (automaton:'a automaton) : 'b =3D let rec transform (automaton:'a automaton) =3D=20 match automaton with | Shift (p,choices) ->=20 let rec foreach choices collected =3D=20 match choices with | [] -> collected | ((symbol,choice)::tail) ->=20 let collected =3D (symbol,transform = choice)::collected in foreach tail collected=20 in=20 transform_shift (p,foreach choices [])=20 | ShiftReduce (shift,reduce) -> transform_shiftreduce (transform shift,tra= nsform reduce) | Reduce reduce -> transform_reduce reduce | Found (found,more) -> transform_found (found,tra= nsform more) in transform automaton ;; let map_automaton (f:'a->'b) (automaton: 'a automaton) : 'b automaton =3D= =20 let transform_shift (p,choices) =3D Shift (f p,choices) in let transform_shiftreduce (shift,reduce) =3D ShiftReduce (shift,reduce) in let transform_reduce reduce =3D Reduce (f reduce) in let transform_found (found,more) =3D Found (f found,more) in transform_automaton (transform_shift,transform_shiftreduce,transform_reduce,transform_fou= nd) automaton ;; let rec mkparser (automaton: string automaton) : string parser_generator = =3D let choose (input:input) : string combinator_parser =3D=20 let transform_shift ((p,choices) :symbol * (symbol*symbol automaton) list) : string combinator= _parser =3D=20 let table : (symbol,symbol automaton) BraunTree.braun_tree = =3D BraunTree.tree_of_list choices in let find key =3D BraunTree.find ~key ~tree:table in let parse cont stack errors input : ('a*errors) steps =3D=20 match input with | [] ->=20 let error =3D=20 errors (Inserted (p,"Insert at end of file",N= otused "")) in Stop (stack p,error) | (h::t) -> begin match find h with | Some automaton -> Ok ((mkparser automaton).generated.parse cont= stack errors t) | None ->=20 let errors error =3D errors (Deleted (h,"Deleted = symbol",error)) in let errors error =3D errors (Inserted (p,"Insert = symbol",error)) in Fail (Fail (Stop (stack p,errors (Notused h)))) end=20 in=20 {parse=3Dparse} in let transform_shiftreduce ((shift,reduce) : symbol automaton * sy= mbol automaton ) =3D let parse cont stack errors input =3D=20 (best_parser (mkparser shift).generated (mkparser reduce)= =2Egenerated).parse cont stack errors input in=20 parse in let transform_reduce (reduce:symbol) =3D let parse cont stack errors input =3D reduce cont stack error= s input in parse in let transform_found ((found,more):symbol*symbol automaton) =3D let parse cont stack errors input =3D found cont stack errors= input in {parse=3Dparse} in transform_automaton (transform_shift,transform_shiftreduce,transform_reduce,transform= _found) automaton=20 in let parse cont stack errors input =3D (choose input).parse cont stack errors input in {automaton=3Dautomaton;parse=3Dparse} ;; (** <|> *) let either : ('a parser_generator * 'a parser_generator) -> 'a parser_gen= erator =3D=20 fun (p,q) ->=20 mkparser (merge_ch p.automaton q.automaton) ;; let rec combine (lefts: (symbol * 'p automaton) list ) (rights: (symbol * 'p automaton) list ) : (symbol * 'p automaton) list =3D=20 match (lefts,rights) with | ((((left_symbol,left_sentence) as left_head)::left_tail) ,(((right_symbol,right_sentence) as right_head)::right_tail) ) ->=20 begin match compare left_symbol right_symbol with | 1 -> left_head::(combine left_tail rights) | -1 -> right_head::(combine lefts right_tail) | 0 ->=20 let head =3D (left_symbol,either(left_sentence,right= _sentence)) in let tail =3D (combine left_tail right_tail) in head::tail | _ -> assert false end | ([],_) -> rights | (_,[]) -> lefts ;; (** <*> *) let rec both : ('a parser_generator * 'a parser_generator) -> 'a parser_g= enerator =3D fun (p,q) -> (** Use two combinator parsers in sequence * a both for combinator parsers *) let both_combinator_parsers first second =3D=20 let parse cont stack errors input =3D let stack f x =3D stack (f x) in first.parse (second.parse cont) stack errors input in {parse=3Dparse} in let transform_shift (p,choices) =3D Shift (both_combinator_parser= s p q.generated,choices) in let transform_shiftreduce (shift,reduce) =3D merge_ch shift reduce in let transform_reduce reduce =3D=20 let worker x =3D fwby reduce x in map_automaton worker q.automaton in let transform_found (found,more) =3D Found (both_combinator_parse= rs found q.generated,more) in let automaton =3D transform_automaton (transform_shift ,transform_shiftreduce ,transform_reduce ,transform_found) p.automaton in mkparser automaton ;; let merge_ch left right =3D=20 match (left,right) with | (Shift (left_parser,left_choices),Shift (right_parser,right_choices= )) -> let best =3D best_parser left_parser right_parser in let choices =3D combine left_choices right_choices in Shift (best,choices) | (Shift _,ShiftReduce (shift,reduce)) -> ShiftReduce (merge_ch left shift,reduce) | (Shift _,Reduce _) -> ShiftReduce (left,right) | (Shift _,Found (_,more)) ->merge_ch left more | (Found (_,more),_) -> merge_ch more right | (_,Shift _) -> merge_ch right left | (ShiftReduce _,_)=20 | (Reduce _,_) -> raise Ambigous_grammar ;; let symbol (a:symbol) : symbol combinator_parser =3D=20 let rec parse cont stack errors input =3D match input with | x::xs ->=20 if a =3D x=20 then Ok (cont (stack a) errors xs) else=20 let deleted_x =3D let errors e =3D errors (Deleted (x,position xs,e)) in parse cont stack errors xs in let inserted_a =3D let errors e =3D errors (Inserted (a,show_symbol a,e)= ) in cont (stack x) errors input in Fail (best deleted_x inserted_a) | [] ->=20 let errors e =3D errors (Inserted (a,eof,e)) in let inserted_a =3D cont (stack a) errors input in Fail inserted_a in let accept cont stack errors input =3D=20 match input with | [] -> assert false | (x::xs) ->=20 assert (a =3D x) ; Ok (cont (stack a) errors xs) in let shift =3D Shift ({parse=3Dparse},[(a,Reduce {parse=3Daccept})]) in let found =3D Found ({parse=3Dparse},shift) in mkparser found ;; let succeed f =3D let parse cont stack errors input =3D cont (stack f) errors input in mkparser (End {parse=3Dparse}) ;; end ;; ######## End code =20 --=20 Sincerely | Homepage: J=F8rgen | http://www.hex.no/jhf | Public GPG key: | http://www.hex.no/jhf/key.txt --=_h2o-13375-1161295291-0001-2 Content-Type: application/pgp-signature; name="signature.asc" Content-Transfer-Encoding: 7bit Content-Description: Digital signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFFN/W79jvTqPy5VsoRAr2BAJ9YoSFw/90kZyqeKBvuTYK+mexKQwCdFreC D4HlEsFydY2DMwEW0BFlT5U= =qeIE -----END PGP SIGNATURE----- --=_h2o-13375-1161295291-0001-2--