I have a few questions related to the internal representation of
recursive data types, having implications on the suitability of Ocaml
for highly efficient data structures.

One of the most efficient comparison-based search data structures in
the presence of memory hierachies (caching/paging) are (a,b)-trees.
In an (a,b)-tree, the number d of children of a non-root node is in {a..b}
and all leaves (search items) are at the same height of the tree. In case
the keys can be stored in the node itself (e.g. int or float), and suitable
values for (a, b), the child to follow from a node can be determined without
additional cache faults. In other words, by adapting parameters a and b
a favorable cache-aware data structure can be constructed.

After measuring performance and analyzing reasons, I found that the
favorable cache locality of the data structure is for a large part destroyed
by the way the Ocaml (native code) compiler (on i386) chooses the internal
representation of the nodes and trees. To find out about this quantitatively,
I then implemented (a,b)-trees again---this time working around the type
system (i.e. using Obj) in order to choose a more compact internal
representation of the trees. I will detail the layouts below.

It turns out that this made the program about 30% faster (and this also
means the result is only about a factor 3 slower than an optimized C++
implementation). This potential for optimization is a little too much to
ignore, but letting go of the type system is also not an attractive option.

Now my question is
a) Is this effect known?
b) Is it considered relevant?
c) What were the design considerations that have lead to the design
of the current compiler in this regard?

Any ideas? I can provide the measurements and also the programs
(provided my employer, Philips Research Eindhoven, agrees to that)
for anyone to have a look for him- or herself in case there is any interest.

Sebastian.



----------

1. Here is a piece of the 'good' Ocaml program defining the trees:

-- snip --
(* --- (a,b)-trees --- *)

assert ((!a >= 2) && (!b >= 2 * !a - 1));;

type ('key, 'value) item =
       {         k: 'key;
         mutable v: 'value
         (* <<more fiels, e.g. linking>> *)
       };;

type ('key, 'value) node =
       { mutable d: int;                      (* degree d in {2..b}  *)
                 s: 'key array;               (* splitters s[0..d-2] *)
                 c: ('key, 'value) tree array (* children  c[0..d-1] *)
       }

and  ('key, 'value) tree =
       | Empty                               (* the empty (a,b)-tree *)
       | Item of ('key, 'value) item         (* an item as leaf      *)
       | Node of ('key, 'value) node         (* a splitting node     *)
       ;;

type ('key, 'value) t =
       { mutable root: ('key, 'value) tree
         (* <<more fields, e.g. size, parameters>> *)
       };;
 
(* Invariants of the representation:
   (1) The key of an Item reachable through child c[i], i in {0..d-1},
       of a Node of degree d with splitters s[0..d-2] satisfies

         s[i-1] < key <= s[i].
 
       It is understood that s[-1] = -inf and s[d-1] = +inf,
       but neither of these sentinel values is ever stored in s.
   (2) All Items have the same depth (distance from the root).
   (3) The degree d of a a non-root Node is in {a..b}, where
       (a,b) are constant parameters with a >= 2, b >= 2 a - 1.
       The degree of the root Node is in {2..b}.
   (4) The arrays s and c of a Node have length b-1 and b.
   (5) If the tree contains Empty then it is equal to Empty.
*)
-- snip --

If I analyzed it correctly, then these definitions result in the following
internal representation for trees and nodes

('key, 'value) tree =
        Empty = (unboxed) int 0
        Item of i = pointer to (heap-allocated) block with tag 0 and single field i: .. item
        Node of n = point to block with tag 1 and single field n: ('key, 'value) node

('key, 'value) node =
        pointer to block with tag 0 and 3 fields for d, s, and c.
        d = unboxed int
        s, c = pointers to arrays of length b

------------

2. Here is a piece of the 'bad' Ocaml program defining the trees:

-- snip --
(* --- (a,b)-trees --- *)

assert ((!a >= 2) && (!b >= 2 * !a - 1));;

(* The internal representation of an (a,b)-tree is

     empty: the unboxed integer 0,
     item:  an array [-1, key, value], or
     node:  an array [d, s[0..d-2], 0..0, c[0..d-1], 0..0]
            where c[0] is located at position b.

   Here
     d         : degree, unboxed integer in {2..b}
     s[0..d-2] : splitters of type key
     c[0..d-1] : children of type tree

   Invariants of the representation:
   (1) The key of an item reachable through child c[i], i in {0..d-1},
       of a node of degree d with splitters s[0..d-2] satisfies

         s[i-1] < key <= s[i].
 
       It is understood that s[-1] = -inf and s[d-1] = +inf,
       but neither of these sentinel values is ever stored in s.
   (2) All items have the same depth (distance from the root).
   (3) The degree d of a a non-root node is in {a..b}, where
       (a,b) are constant parameters with a >= 2, b >= 2 a - 1.
       The degree of the root node is in {2..b}.
   (4) The arrays s and c of a node have length b-1 and b.
   (5) If the tree contains empty then it is equal to empty.

   Remark:
   * The internal representation escapes the type system. This
     is necessary in order to avoid intermediate pointers, e.g.
     from a node record to the array of splitters, which would
     otherwise produce lots of cache misses.
   * We do not use the tag bits of heap-allocated blocks to
     distinguish between items and nodes in order to avoid the
     overhead of calling the C-function caml_obj_tag(). Instead
     we store our own tag into field 0.
   * We use only Obj.field and Obj.set_field to access the objects.
*)

type ('key, 'value) tree = Obj.t;;

type ('key, 'value) t =
       { mutable root: ('key, 'value) tree
         (* <<more fields, e.g. size, parameters>> *)
       };;
-- snip --

The internal representation of trees and nodes is like this:

('key, 'value) tree =
        Empty = unboxed int 0
        Item = pointer to block with first field the unboxed int -1
        Node = pointer to node, i.e. block with first field unboxed int >=0

node =
        array of length 2 b consisting of degree d, d-1 splitters and
        d pointers to child-trees, where  d in {2..b}.

Remark: I chose not to use the tag bits of a block to distinguish items
and nodes because Obj.tag results in a C-call whereas getting the
degree-field results in a single machine instruction.        

----
Dr. Sebastian Egner
Senior Scientist
Philips Research Laboratories
Prof. Holstlaan 4 (WDC 1-051, 1st floor, room 51)
5656 AA Eindhoven
The Netherlands
tel:       +31 40 27-43166
fax:      +31 40 27-44004
email: sebastian.egner@philips.com