From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id E7F62BC6B for ; Fri, 22 Jun 2007 01:20:42 +0200 (CEST) Received: from pih-relay04.plus.net (pih-relay04.plus.net [212.159.14.131]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l5LNKgtO012185 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO) for ; Fri, 22 Jun 2007 01:20:42 +0200 Received: from [80.229.56.224] (helo=beast.local) by pih-relay04.plus.net with esmtp (Exim) id 1I1VxC-0006p3-0Q for caml-list@yquem.inria.fr; Fri, 22 Jun 2007 00:20:42 +0100 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@yquem.inria.fr Subject: Parsing Date: Fri, 22 Jun 2007 00:14:39 +0100 User-Agent: KMail/1.9.7 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200706220014.39510.jon@ffconsultancy.com> X-Miltered: at concorde with ID 467B07CA.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; parsing:01 parser:01 camlp:01 parsing:01 syntax:01 parsers:01 grammars:01 ocaml:01 parser:01 'int:01 expr:01 expr:01 val:01 genlex:01 val:01 I'm not a parser junkie, so please bear with me if this is silly... The camlp4 stream parsing syntax extension is a very cool way to write efficient parsers for simple grammars: http://www.ffconsultancy.com/ocaml/benefits/parsing.html # let rec parse_atom = parser | [< 'Int n >] -> Num n | [< 'Kwd "("; e=parse_expr; 'Kwd ")" >] -> e and parse_factor = parser | [< e1=parse_atom; stream >] -> (parser | [< 'Kwd "*"; e2=parse_factor >] -> e1 *: e2 | [< >] -> e1) stream and parse_expr = parser | [< e1=parse_factor; stream >] -> (parser | [< 'Kwd "+"; e2=parse_expr >] -> e1 +: e2 | [< 'Kwd "-"; e2=parse_expr >] -> e1 -: e2 | [< >] -> e1) stream;; val parse_atom : Genlex.token Stream.t -> expr = val parse_factor : Genlex.token Stream.t -> expr = val parse_expr : Genlex.token Stream.t -> expr = However, you must factor heads in the pattern matches. So instead of: and parse_expr = parser | [< e1=parse_factor; 'Kwd "+"; e2=parse_expr >] -> e1 +: e2 | [< e1=parse_factor; 'Kwd "-"; e2=parse_expr >] -> e1 -: e2 | [< e1=parse_factor >] -> e1 you must write: and parse_expr = parser | [< e1=parse_factor; stream >] -> (parser | [< 'Kwd "+"; e2=parse_expr >] -> e1 +: e2 | [< 'Kwd "-"; e2=parse_expr >] -> e1 -: e2 | [< >] -> e1) stream Why is this? Is it because the streams are destructive? Could the camlp4 macro not factor the pattern match for me? Are there other parser tools that integrate into the language and do a better job? -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. The OCaml Journal http://www.ffconsultancy.com/products/ocaml_journal/?e