caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Bob Atkey <bob.atkey@gmail.com>
To: caml-list@inria.fr
Subject: [Caml-list] ANN: unboxed-arrays-0.1: Unboxed arrays for OCaml
Date: Sun, 19 Jan 2014 18:01:31 +0000	[thread overview]
Message-ID: <CA+aevF8OTm2644EhnZ5pkr44yDXOSwJF1Y3OzuuHNJzEm6oSdg@mail.gmail.com> (raw)

Hello all,

Apropos the recent discussion about optimizing values of type 'a
option, I'd like to announce my little prototype library for the
related problem of defining unboxed array types in OCaml. The library
uses GADTs to describe the memory layout of unboxed elements, and the
module system to hide the unsafe operations used to implement the
arrays. Arrays of monomorphic and polymorphic values are supported.

The library is available on GitHub:
   https://github.com/bobatkey/unboxed-arrays

For your convenience, I have reproduced the README below (in markdown
format), which includes simple example of usage:


# Unboxed Arrays

An implementation of unboxed arrays for OCaml by Robert Atkey
<bob.atkey@gmail.com>.

The runtime representation of OCaml arrays usually stores the elements
using a 'boxed' representation, meaning that if the value of a
particular element cannot be stored within a single machine word
(i.e., it is not an argument-less constructor or an `int` value), then
the array actually contains a pointer to the representation of the
element elsewhere in the heap. (Note that `float` arrays are special
cased.) This boxing representation leads to an extra indirection when
accessing elements of the array, means that memory is wasted, and
decreases data locality, to the detriment of efficent cache usage.

The `unboxed-arrays` library implements functors that generate new
abstract types representing arrays, whose run-time representation is
'unboxed', so that each of the elements in stored inline in the
array. The library uses unsafe operations from the `Obj` module in its
implementation, but this unsafety is completely hidden from clients
behind abstract types.

Note that this implementation has not yet been benchmarked, and so can
only be regarded as a proof-of-concept for implementing unboxed
arrays. In particular, it is not clear to me that the overhead of
allocating `Data` constructor (see the `Element_Descriptor` signature
below) is not too expensive.

In the future, it would be nice to explore alternative representations
for more specialised representations of unboxed arrays. For example,
using bitmaps to representation occupancy information for unboxed `'a
option array`s.

## To build

````
$ ocaml setup.ml -build
````

There are no dependencies beyond plain OCaml (version 4.01.0 tested).

## Interface

New unboxed array implementations are generated by using the following
functor:

````ocaml
module Make (Elt : Element_Descriptor) : S with type elt = Elt.t
````

where the signature `S` describes a basic imperative array interface:

````ocaml
module type S = sig
  type t
  type elt
  val create : int -> elt -> t
  val init   : int -> (int -> elt) -> t
  val get    : t -> int -> elt
  val set    : t -> int -> elt -> unit
  val length : t -> int
end
````

The element type is described using modules matching the following
signature. Modules implementing this signature set out (a) the
allowable constructors for the elements; (b) the types of the data
associated with each constructor; (c) the maximum size of an element;
and (d) the specific width associated with each constructor.

````ocaml
module type Element_Descriptor = sig
  type 'a constructor
  type size
  val size : size is_a_natural
  val width_of : 'a constructor -> ('a, size) width
  type t = Data : 'a constructor * 'a -> t
end
````

The type `'a constructor` is usually a GADT (Generalised Algebraic
Datatype) that uses the type parameter to describe the types of the
data associated with that constructor. The *type* `size` is expected
to be of the form `zero succ ... succ`, and represents in unary
notation the maximum size of any element. The *value* `size` is a
runtime representation of the number represented by `size`. The
function `width_of` describes the necessary width to store the data
associated with each constructor, with a static check that all the
widths are less than `size`. Finally, the type `t` defines a GADT that
represents 'boxed' representations used to represent element values
outwith the array.

The types `is_a_natural` and `width` are defined by the `UnboxedArray`
library.

There is also another functor `Make1` that allows for the generation
of arrays with polymorphic element types.

## A Simple Example

The following implementation of `Element_Descriptor` describes a
datatype with three constructors:

````ocaml
module Elt = struct
  open UnboxedArray

  type 'a constructor =
    | Empty   : unit constructor
    | Deleted : unit constructor
    | Value   : (int * string) constructor

  type size = zero succ succ
  let size = Succ (Succ Zero)

  let width_of (type d) (c : d constructor) : (d,size) width =
      match c with
        | Empty   -> Width0
        | Deleted -> Width0
        | Value   -> Width2

  type t = Data : 'd constructor * 'd -> t
end

module A = UnboxedArray.Make (Elt)

open Elt
````

Values of type `A.t` now represent boxed values of type `Elt.t`, and
can be manipulated through the interface `S with type elt = Elt.t`, in
a fairly natural way:

````ocaml
# let arr = A.create 10 (Data (Empty, ()));;
val arr : A.t = <abstr>
# A.set arr 0 (Data (Value, (1, "foo")));;
- : unit = ()
# A.set arr 5 (Data (Deleted, ()));;
- : unit = ()
# let elt_to_string arr i =
    match A.get arr i with
      | Data (Empty, ()) -> "Empty"
      | Data (Deleted, ()) -> "Deleted"
      | Data (Value, (k,v)) -> Printf.sprintf "Value (%d,%S)" k v;;
val elt_to_string : A.t -> int -> string = <fun>
# elt_to_string arr 0 |> print_endline;;
Value (1,"foo")
- : unit = ()
# elt_to_string arr 1 |> print_endline;;
Empty
- : unit = ()
# elt_to_string arr 5 |> print_endline;;
Deleted
- : unit = ()
#
````

There is a more substantial example using unboxed arrays to implement
a hashtable with linear probing in the file `hashtable.ml`. This
example also demonstrates the use of the `UnboxedArray.Make1` functor
for generated polymorphic unboxed array types.



Bob

             reply	other threads:[~2014-01-19 18:01 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-19 18:01 Bob Atkey [this message]
2014-01-20  8:47 ` Mark Shinwell
2014-01-20 12:24   ` Bob Atkey
2014-01-20 12:33     ` Mark Shinwell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+aevF8OTm2644EhnZ5pkr44yDXOSwJF1Y3OzuuHNJzEm6oSdg@mail.gmail.com \
    --to=bob.atkey@gmail.com \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).