caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: oleg@okmij.org
To: caml-list@inria.fr
Subject: [Caml-list] ANN: Brand-new BER MetaOCaml for OCaml 4.00.1
Date: 31 Jan 2013 07:49:03 -0000	[thread overview]
Message-ID: <20130131074903.16892.qmail@www1.g3.pair.com> (raw)


BER MetaOCaml N100 is now available. It is a strict superset of OCaml
4.00.1, extending it with staging annotations to construct and run
typed code values. BER MetaOCaml has been completely re-implemented
and thus caught up with OCaml. For those who don't know what staging
or MetaOCaml is, a short introduction follows the news summary.

BER MetaOCaml is in the spirit of the original MetaOCaml by Taha and
Calcagno, but has been completely re-written using different
algorithms and techniques. BER MetaOCaml has been re-structured to
minimize the number of the changes to the OCaml type-checker and to
separate the `kernel' (type-checking and constructing code values)
from the `user-level'. Various ways of running the code -- compiling
to byte-code or machine instructions and executing them, or
translating code values to C or LLVM, or printing them -- can be done
in `user-level' libraries, without any need to hack into (Meta)OCaml.
(Printing and byte-code execution are included in BER N100.)  This
release of BER MetaOCaml is meant to incite future research into
type-safe meta-programming. 

To the user, the two major differences of BER N100 from the old
MetaOCaml are:

 -- constructor restriction: all data constructors and record labels 
 used within brackets must come from the types that are declared in 
 separately compiled modules.

 -- scope extrusion check: attempting to build code values with
 unbound or mistakenly bound variables (which is possible with
 mutation or other effects) is caught early, raising an exception
 with good diagnostics.

Both are explained after the short introduction to staging.  Smaller
visible differences are better printing of cross-stage persistent
values and the full support for labeled arguments. A long-standing
problem with records with polymorphic fields has fixed itself. The BER
N100 code is now extensively commented, and has a regression test
suite. BER N100 is much less invasive into OCaml: compare the size of
the patch to the main OCaml type-checker typing/typecore.ml, which
contains the bulk of the changes for type checking the staging
constructs. In the previous version BER N004, the patch had 564 lines
of additions, deletions and context; now, only 328 lines.
(The core MetaOCaml kernel is trx.ml, with 1800 lines.)


BER MetaOCaml N100 is available:

-- as a set of patches to the OCaml 4.00.1 distribution. 
        http://okmij.org/ftp/ML/ber-metaocaml-100.tar.gz
See the INSTALL document in that archive. You need the source
distribution of OCaml 4.00.1, see the following URL for details.
        http://ocaml.org/install.html

-- as a GIT bundle containing the changes relative to OCaml 4.00.1
        http://okmij.org/ftp/ML/metaocaml.bundle
First, you have to obtain the base
       git clone https://github.com/ocaml/ocaml.git -b 4.00 ometa4
then switch to tag 4.00.1 and then apply the bundle.

Jacques Carette has extensively re-written the printing of code
values, and is currently maintaining this part. I'm grateful to him
for encouragement and discussions.


Introduction to staging and MetaOCaml

The standard example of meta-programming -- the running example of
A.P.Ershov's 1977 paper that begat meta-programming -- is the power
function, computing x^n. In OCaml:

    let square x = x * x
    let rec power n x =
      if n = 0 then 1
      else if n mod 2 = 0 then square (power (n/2) x)
      else x * (power (n-1) x)
    (* val power : int -> int -> int = <fun> *)

Suppose our program has to compute x^7 many times. We may define
    let power7 x = power 7 x

In MetaOCaml, we may also specialize the power function to a
particular value n, obtaining the code which will later receive x
and compute x^n. We re-write power n x annotating expressions as
computed `now' (when n becomes known) and `later' (when x is given).

    let rec spower n x =
      if n = 0 then .<1>.
      else if n mod 2 = 0 then .<square .~(spower (n/2) x)>.
      else .<.~x * .~(spower (n-1) x)>.
    (* val spower : int -> ('cl, int) code -> ('cl, int) code = <fun> *)

The two annotations, or staging constructs, are brackets .< e >. and
escape .~e . Brackets .< e >. `quasi-quote' the expression e,
annotating it as computed later. Escape .~e, which must be used within
brackets, tells that e is computed now but produces the result for
later. That result, the code, is spliced-in the containing bracket. 
The inferred type of spower is different. The result is no longer an
int, but ('cl, int) code -- the code of expressions that compute an int.
The first type argument to 'code', often named 'cl, is a so-called
environment classifier and can be skipped on first reading. The type
of spower spells out which argument is received now, and which
later. To specialize spower to 7, we define

    let spower7_code = .<fun x -> .~(spower 7 .<x>.)>.;;
(*
    val spower7_code : ('cl, int -> int) code = .<
      fun x_1 ->
       (x_1 *
         (((* cross-stage persistent value (id: square) *))
           (x_1 * (((* cross-stage persistent value (id: square) *)) (x_1 * 1)))))>.
*)
and obtain the code of a function that will receive x and return
x^7. Code, even of functions, can be printed, which is what MetaOCaml
toplevel did. The print-out contains so-called cross-stage persistent
values, or CSP, which `quote' present-stage values such as square to
be used later. One may think of CSP as references to `external
libraries' -- only in our case the program acts as a library for the
code it generates.

If we want to use thus specialized x^7 now, in our code, we should
compile spower7_code and link it back to our program. This is called
`running the code'

    let spower7 = .! spower7_code
    (*
    val spower7 : int -> int = <fun>
    *)

The specialized spower7 has the same type as the partially applied
power7 above. They behave identically. However, power7 x will do
recursion on n, checking n's parity, etc. In contrast, specialized
spower7 has no recursion (as can be seen from spower7_code). All
operations on n have been done when the spower7_code was computed,
producing the straight-lined code spower7 that operates only on x.

MetaOCaml supports arbitrary number of later stages, letting us write
not only code generators but also generators of code generators, etc.


Data constructor restriction

BER MetaOCaml N100 imposes the restriction that all data constructors
and record labels used within brackets must come from the types that 
are declared in separately compiled modules. For example, the
following all work:

    .<true>.;;
    .<raise Not_found>.;;
    .<Some [1]>.;;
    .<{Complex.re = 1.0; im = 2.0}>.;;
    let open Complex in .<{re = 1.0; im = 2.0}>.;;
    .<let open Complex in {re = 1.0; im = 2.0}>.;;

because data types bool, option, list, Complex are either Pervasive or
defined in the (separately compiled) standard library. However, the
following are not allowed and flagged as compile-time error:

       type foo = Bar;;
       .<Bar>.

       module Foo = struct exception E end;;
      .<raise Foo.E>.

The type declaration foo or the module declaration Foo must be moved
into a separate file. The corresponding .cmi file must also be
available at run-time: either placed into the same directory as the
executable, or somewhere within the OCaml library search path.

Scope extrusion test

Although MetaOCaml permits manipulation and splicing of open code, its
type system statically ensures that only closed code can be printed or
run: We can't run the code we haven't finished constructing. For example,
     .<fun x -> .~(let y = .! .<x>. in .<y>.)>.;;
gives a type error since .<x>. is obviously open code.

This static guarantee holds only for pure code. Effects such as
storing code values in mutable cells void the guarantee. Here is the
example using the _old_ MetaOCaml from 2006 (version 3.09.1 alpha
030).  (The problem can be illustrated simpler, but the following
example is more realistic and devious.)

    let c =
     let r = ref .<fun z->z>. in 
     let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in 
     .<fun x -> .~f (.~(!r) 1)>. ;;
    (*
    val c : ('a, '_b -> int) code =
      .<fun x_4 -> ((fun x_2 -> 0) ((fun y_3 -> x_2) 1))>.
    *)

One must look hard to see that x_2 is actually unbound in the resulting
code. The problem is revealed when we attempt to run that code:

    .! c;;
    (*
    Characters 77-78:
      Unbound value x_2

    Exception: Trx.TypeCheckingError.
    *)

Since we get the error anyway (without much diagnostics though), one
may discount the problem. Alas, sometimes scope extrusion results in
no error -- just in wrong results.

    let c1 =
     let r = ref .<fun z->z>. in 
     let _ = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in 
     !r;;
    (*
    val c1 : ('a, '_b -> '_b) code = .<fun y_3 -> x_2>.
    *)

we then use c1 to construct the code c2:

    let c2 = .<fun y -> fun x -> .~c1>.;;
    (*
    val c2 : ('a, 'b -> 'c -> '_d -> '_d) code =
      .<fun y_1 -> fun x_2 -> fun y_3 -> x_2>.
    *)

which contains no unbound variables and can be run without problems.

    (.! c2) 1 2 3;;
    (* - : int = 2 *)

It is most likely that the user did not intend for 'fun x ->' in c2 to
bind x in c1. This is the blatant violation of lexical scope. And yet we
get no error or other overt indication that something went wrong.

BER MetaOCaml N100 has none of this. Although the type system still
permits code values with escaped variables, attempting to use such
code in any way -- splice, print or run -- immediately raises an
exception. For example, entering the expression c in the top-level BER
MetaOCaml N100 gives

Exception: Failure
Scope extrusion at Characters 89-99:
       let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in 
                                    ^^^^^^^^^^
 for the identifier x_7 bound at Characters 74-75:
       let f = .<fun x -> .~(r := .<fun y -> x>.; .<0>.)>. in 
                     ^
The exception message tells which variable got away, where it was
bound and where it eloped.


The file NOTES.txt in the BER MetaOCaml distribution describes the
features of BER MetaOCaml in more detail and outlines directions for
further development. Hopefully the release of BER MetaOCaml N100 would
stimulate using and researching typed meta-programming.


             reply	other threads:[~2013-01-31  7:49 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-31  7:49 oleg [this message]
2013-01-31 12:23 ` rixed
2013-02-01  2:12 ` Francois Berenger
2013-02-01  6:53 oleg
2013-02-01  7:53 ` Francois Berenger
2013-02-26 18:09   ` Anil Madhavapeddy
2013-02-19  3:37 bob zhang
2013-02-20  1:51 ` Jacques Carette
2013-02-20 14:03   ` bob zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130131074903.16892.qmail@www1.g3.pair.com \
    --to=oleg@okmij.org \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).