From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104]) by sympa.inria.fr (Postfix) with ESMTPS id 0025C7EE99 for ; Sun, 19 Jan 2014 19:01:32 +0100 (CET) Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of bob.atkey@gmail.com) identity=pra; client-ip=209.85.217.176; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="bob.atkey@gmail.com"; x-sender="bob.atkey@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail3-smtp-sop.national.inria.fr: domain of bob.atkey@gmail.com designates 209.85.217.176 as permitted sender) identity=mailfrom; client-ip=209.85.217.176; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="bob.atkey@gmail.com"; x-sender="bob.atkey@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-lb0-f176.google.com) identity=helo; client-ip=209.85.217.176; receiver=mail3-smtp-sop.national.inria.fr; envelope-from="bob.atkey@gmail.com"; x-sender="postmaster@mail-lb0-f176.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtIEAPgR3FLRVdmwlGdsb2JhbABZg0NWuzAEgQAIFg4BAQEBBwsLCRIqglMZARseAxIQXAEBEQEFATUIh2gBAxENmFWDCIxcgwmSFgoZJw1khHIRAQUMjF+BKIUrBJY2gWyBMosrg04YKYRZPIE1 X-IPAS-Result: AtIEAPgR3FLRVdmwlGdsb2JhbABZg0NWuzAEgQAIFg4BAQEBBwsLCRIqglMZARseAxIQXAEBEQEFATUIh2gBAxENmFWDCIxcgwmSFgoZJw1khHIRAQUMjF+BKIUrBJY2gWyBMosrg04YKYRZPIE1 X-IronPort-AV: E=Sophos;i="4.95,685,1384297200"; d="scan'208";a="45346173" Received: from mail-lb0-f176.google.com ([209.85.217.176]) by mail3-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 19 Jan 2014 19:01:32 +0100 Received: by mail-lb0-f176.google.com with SMTP id w7so3813470lbi.35 for ; Sun, 19 Jan 2014 10:01:31 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=0AVuL6QN7Th30n5QnVMw7w1Sffpa65U25ynyWTn4fmI=; b=fty6QDfOnJAuEsQXC8Ps49/5cnKlaVciFurI279FdLJYimuygz9Xt8zqElujWAf2/V 8pdUdwlQLtpgN414mN0jXSmu/hkoxClflZUtRHMW6QQFFKQWvl9CTztluhYtNdOqJCJS b/nZokFKvPo+GQ2w0LOLTaKdTcUVpSmh+vnb++muvaPCGjh3Z4tdW6lI/GlUJLP3ykpq xNRHaKKWXKHWzAS9h0iciVpWRkqd0CKwT458Z4wYyZOsyr7pbgRmAQcvBlCocJA6NPhN rD014AXREfDlK04mUT7Evrvz98uoGrP29jm3kHt4+iLjGCpkMmI0b3tXxpkdC75LrMMM V+8w== MIME-Version: 1.0 X-Received: by 10.152.206.104 with SMTP id ln8mr52243lac.67.1390154491125; Sun, 19 Jan 2014 10:01:31 -0800 (PST) Received: by 10.112.57.195 with HTTP; Sun, 19 Jan 2014 10:01:31 -0800 (PST) Date: Sun, 19 Jan 2014 18:01:31 +0000 Message-ID: From: Bob Atkey To: caml-list@inria.fr Content-Type: text/plain; charset=ISO-8859-1 Subject: [Caml-list] ANN: unboxed-arrays-0.1: Unboxed arrays for OCaml Hello all, Apropos the recent discussion about optimizing values of type 'a option, I'd like to announce my little prototype library for the related problem of defining unboxed array types in OCaml. The library uses GADTs to describe the memory layout of unboxed elements, and the module system to hide the unsafe operations used to implement the arrays. Arrays of monomorphic and polymorphic values are supported. The library is available on GitHub: https://github.com/bobatkey/unboxed-arrays For your convenience, I have reproduced the README below (in markdown format), which includes simple example of usage: # Unboxed Arrays An implementation of unboxed arrays for OCaml by Robert Atkey . The runtime representation of OCaml arrays usually stores the elements using a 'boxed' representation, meaning that if the value of a particular element cannot be stored within a single machine word (i.e., it is not an argument-less constructor or an `int` value), then the array actually contains a pointer to the representation of the element elsewhere in the heap. (Note that `float` arrays are special cased.) This boxing representation leads to an extra indirection when accessing elements of the array, means that memory is wasted, and decreases data locality, to the detriment of efficent cache usage. The `unboxed-arrays` library implements functors that generate new abstract types representing arrays, whose run-time representation is 'unboxed', so that each of the elements in stored inline in the array. The library uses unsafe operations from the `Obj` module in its implementation, but this unsafety is completely hidden from clients behind abstract types. Note that this implementation has not yet been benchmarked, and so can only be regarded as a proof-of-concept for implementing unboxed arrays. In particular, it is not clear to me that the overhead of allocating `Data` constructor (see the `Element_Descriptor` signature below) is not too expensive. In the future, it would be nice to explore alternative representations for more specialised representations of unboxed arrays. For example, using bitmaps to representation occupancy information for unboxed `'a option array`s. ## To build ```` $ ocaml setup.ml -build ```` There are no dependencies beyond plain OCaml (version 4.01.0 tested). ## Interface New unboxed array implementations are generated by using the following functor: ````ocaml module Make (Elt : Element_Descriptor) : S with type elt = Elt.t ```` where the signature `S` describes a basic imperative array interface: ````ocaml module type S = sig type t type elt val create : int -> elt -> t val init : int -> (int -> elt) -> t val get : t -> int -> elt val set : t -> int -> elt -> unit val length : t -> int end ```` The element type is described using modules matching the following signature. Modules implementing this signature set out (a) the allowable constructors for the elements; (b) the types of the data associated with each constructor; (c) the maximum size of an element; and (d) the specific width associated with each constructor. ````ocaml module type Element_Descriptor = sig type 'a constructor type size val size : size is_a_natural val width_of : 'a constructor -> ('a, size) width type t = Data : 'a constructor * 'a -> t end ```` The type `'a constructor` is usually a GADT (Generalised Algebraic Datatype) that uses the type parameter to describe the types of the data associated with that constructor. The *type* `size` is expected to be of the form `zero succ ... succ`, and represents in unary notation the maximum size of any element. The *value* `size` is a runtime representation of the number represented by `size`. The function `width_of` describes the necessary width to store the data associated with each constructor, with a static check that all the widths are less than `size`. Finally, the type `t` defines a GADT that represents 'boxed' representations used to represent element values outwith the array. The types `is_a_natural` and `width` are defined by the `UnboxedArray` library. There is also another functor `Make1` that allows for the generation of arrays with polymorphic element types. ## A Simple Example The following implementation of `Element_Descriptor` describes a datatype with three constructors: ````ocaml module Elt = struct open UnboxedArray type 'a constructor = | Empty : unit constructor | Deleted : unit constructor | Value : (int * string) constructor type size = zero succ succ let size = Succ (Succ Zero) let width_of (type d) (c : d constructor) : (d,size) width = match c with | Empty -> Width0 | Deleted -> Width0 | Value -> Width2 type t = Data : 'd constructor * 'd -> t end module A = UnboxedArray.Make (Elt) open Elt ```` Values of type `A.t` now represent boxed values of type `Elt.t`, and can be manipulated through the interface `S with type elt = Elt.t`, in a fairly natural way: ````ocaml # let arr = A.create 10 (Data (Empty, ()));; val arr : A.t = # A.set arr 0 (Data (Value, (1, "foo")));; - : unit = () # A.set arr 5 (Data (Deleted, ()));; - : unit = () # let elt_to_string arr i = match A.get arr i with | Data (Empty, ()) -> "Empty" | Data (Deleted, ()) -> "Deleted" | Data (Value, (k,v)) -> Printf.sprintf "Value (%d,%S)" k v;; val elt_to_string : A.t -> int -> string = # elt_to_string arr 0 |> print_endline;; Value (1,"foo") - : unit = () # elt_to_string arr 1 |> print_endline;; Empty - : unit = () # elt_to_string arr 5 |> print_endline;; Deleted - : unit = () # ```` There is a more substantial example using unboxed arrays to implement a hashtable with linear probing in the file `hashtable.ml`. This example also demonstrates the use of the `UnboxedArray.Make1` functor for generated polymorphic unboxed array types. Bob