From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 04705BC0A for ; Tue, 19 Jun 2007 03:48:20 +0200 (CEST) Received: from ipmail03.adl2.internode.on.net (ipmail03.adl2.internode.on.net [203.16.214.135]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l5J1mH5P015391 for ; Tue, 19 Jun 2007 03:48:19 +0200 X-IronPort-AV: E=Sophos;i="4.16,436,1175437800"; d="scan'208";a="104750241" Received: from ppp59-167-2-129.lns1.syd7.internode.on.net (HELO [192.168.1.201]) ([59.167.2.129]) by ipmail03.adl2.internode.on.net with ESMTP; 19 Jun 2007 11:18:13 +0930 Subject: camlp5? From: skaller To: caml-list Content-Type: text/plain Date: Tue, 19 Jun 2007 11:48:11 +1000 Message-Id: <1182217691.29571.117.camel@rosella.wigram> Mime-Version: 1.0 X-Mailer: Evolution 2.10.1 Content-Transfer-Encoding: 7bit X-Miltered: at discorde with ID 467735E1.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; camlp:01 ocaml:01 parser:01 parsers:01 grammars:01 lalr:01 ocamlyacc:01 ocaml:01 ocamlyacc:01 compilation:01 parsing:01 syntax:01 syntax:01 non-terminal:01 tokens:01 Hi, I thought I'd briefly report progress on a subsystem of Felix which may prove a useful technology for Ocaml. I am using Dypgen http://dypgen.free.fr/ and Ocs scheme http://will.iki.fi/software/ocs/ together for building Felix AST terms in a flexible way. Dypgen is an *extensible* GLR parser generator which is now working reasonably well. Syntactic forms in the input language can be parsed so that the user actions extend the grammar being parsed, including adding new non-terminals and handling conflicts. GLR parsers can parse any context free language, and work with many grammars for such languages, and although some care is required obtaining reasonable performance, an Ocamlyacc-like (LALR1) grammar seems to parse about the same speed as Ocamlyacc. The big problem with extensions is not the grammar, but the user actions. To solve this problem, I have chosen to use Scheme programs as user actions. Ocs_scheme is written in Ocaml and is very easy to work with, and appears quite fast. With this combination, user action codes are written as 'text' which is then compiled and executed to produce an Ocs_types.sval or scheme value. These values are basically s-expressions. Felix converts them to a simple s-expression format, then converts them to Felix AST_format with a laboriously coded mapping. Of course the typing is dynamic, up to the building of the AST. The AST built is the same one the same Ocamlyacc grammar would have yielded. At this point I am moving 95% of the Felix grammar into a user library, so that the 'initial' hard coded grammar is 'more or less' reduced to: (a) the top level compilation unit (start symbol) (b) some code for parsing syntax extension syntax (c) an effective top level dummy non-terminal (d) a set of bottom non-terminals which wrap tokens with attributes, such as literals and names The bits 'in between' are all programmed in the input source, that is, user written library code. The same idea could be used for Ocaml. Instead of using ocamlp4 to separately build extensions, the extensions are defined directly in the source code that needs them, and ocaml parses it. Note: dypgen syntax extensions are properly scoped. Felix uses the format: syntax schemer { s_term := sprefixed =># "_1"; s_term := s_term / spower =># "`(ast_apply (div (,_1 ,_3)))"; s_term := s_term % spower =># "`(ast_apply (mod (,_1 ,_3)))"; } open syntax schemer; so that the grammar definitions are defined separately from use. At the moment Felix #includes this input into the program that needs it, however the layout is design to support separate compilation of the grammar definitions, so that only the 'open syntax' is required. That statement provides the linkage to the language sorely missing from the current ocamlp4 engineering. However the above system is much more flexible because it is properly scoped -- the opened syntax is only accessible in the scope an Ocaml 'open' would normally make names available. Thus, you could easily define 'macros' to help implement modules without impacting other modules, even in the same translation unit. Whilst Scheme isn't the best language for processing Ocaml AST terms or extensions to them, it has the advantage of being a complete, well known, high power symbol processing languages. To use the system you only need to learn a bit of Scheme plus your base AST mapping. The technology appears different from MetaOcaml because the staging is localised, that is, the phasing is recursed bottom up not top down (but I may be wrong on that!) as well as using a distinct meta-language which is of course unpleasant, particularly the dynamic typing. When Ocaml 3.11 is released I hope to extend the technology to also support dynamic loading of semantic extensions. It's probable these extensions will be plugged directly into Ocs_scheme as primitives, so that the Ocaml based AST is discarded permanently. [The whole compiler will become a Scheme program, albiet with very specialised primitives written in Ocaml] Most term rewriting systems are 'effectively' dynamically typed, even if they use some lower level static typing as support. Rewrite rules have both pre- and post- conditions which can't be represented with static typing, and excess static typing appears to seriously impeded modification of the term rewriting systems. These systems are 'interpreting' terms no matter what -- that is, they're intrinsically dynamically typed. Using s-expressions is a rather extreme encoding, and is basically TOO dynamic IMHO, but it does seem to represent a good compromise when there is an emphasis on extension and experiment. The point is that whilst a more static encoding is safer for simpler extensions .. it really gets in the way badly when the extension is a complex function of an already complex term encoding. It would be interesting if someone wanted to use this idea to make a new camlp5 front end to Ocaml. -- John Skaller Felix, successor to C++: http://felix.sf.net