I have a few questions related to the internal representation of recursive data types, having implications on the suitability of Ocaml for highly efficient data structures. One of the most efficient comparison-based search data structures in the presence of memory hierachies (caching/paging) are (a,b)-trees. In an (a,b)-tree, the number d of children of a non-root node is in {a..b} and all leaves (search items) are at the same height of the tree. In case the keys can be stored in the node itself (e.g. int or float), and suitable values for (a, b), the child to follow from a node can be determined without additional cache faults. In other words, by adapting parameters a and b a favorable cache-aware data structure can be constructed. After measuring performance and analyzing reasons, I found that the favorable cache locality of the data structure is for a large part destroyed by the way the Ocaml (native code) compiler (on i386) chooses the internal representation of the nodes and trees. To find out about this quantitatively, I then implemented (a,b)-trees again---this time working around the type system (i.e. using Obj) in order to choose a more compact internal representation of the trees. I will detail the layouts below. It turns out that this made the program about 30% faster (and this also means the result is only about a factor 3 slower than an optimized C++ implementation). This potential for optimization is a little too much to ignore, but letting go of the type system is also not an attractive option. Now my question is a) Is this effect known? b) Is it considered relevant? c) What were the design considerations that have lead to the design of the current compiler in this regard? Any ideas? I can provide the measurements and also the programs (provided my employer, Philips Research Eindhoven, agrees to that) for anyone to have a look for him- or herself in case there is any interest. Sebastian. ---------- 1. Here is a piece of the 'good' Ocaml program defining the trees: -- snip -- (* --- (a,b)-trees --- *) assert ((!a >= 2) && (!b >= 2 * !a - 1));; type ('key, 'value) item = { k: 'key; mutable v: 'value (* <> *) };; type ('key, 'value) node = { mutable d: int; (* degree d in {2..b} *) s: 'key array; (* splitters s[0..d-2] *) c: ('key, 'value) tree array (* children c[0..d-1] *) } and ('key, 'value) tree = | Empty (* the empty (a,b)-tree *) | Item of ('key, 'value) item (* an item as leaf *) | Node of ('key, 'value) node (* a splitting node *) ;; type ('key, 'value) t = { mutable root: ('key, 'value) tree (* <> *) };; (* Invariants of the representation: (1) The key of an Item reachable through child c[i], i in {0..d-1}, of a Node of degree d with splitters s[0..d-2] satisfies s[i-1] < key <= s[i]. It is understood that s[-1] = -inf and s[d-1] = +inf, but neither of these sentinel values is ever stored in s. (2) All Items have the same depth (distance from the root). (3) The degree d of a a non-root Node is in {a..b}, where (a,b) are constant parameters with a >= 2, b >= 2 a - 1. The degree of the root Node is in {2..b}. (4) The arrays s and c of a Node have length b-1 and b. (5) If the tree contains Empty then it is equal to Empty. *) -- snip -- If I analyzed it correctly, then these definitions result in the following internal representation for trees and nodes ('key, 'value) tree = Empty = (unboxed) int 0 Item of i = pointer to (heap-allocated) block with tag 0 and single field i: .. item Node of n = point to block with tag 1 and single field n: ('key, 'value) node ('key, 'value) node = pointer to block with tag 0 and 3 fields for d, s, and c. d = unboxed int s, c = pointers to arrays of length b ------------ 2. Here is a piece of the 'bad' Ocaml program defining the trees: -- snip -- (* --- (a,b)-trees --- *) assert ((!a >= 2) && (!b >= 2 * !a - 1));; (* The internal representation of an (a,b)-tree is empty: the unboxed integer 0, item: an array [-1, key, value], or node: an array [d, s[0..d-2], 0..0, c[0..d-1], 0..0] where c[0] is located at position b. Here d : degree, unboxed integer in {2..b} s[0..d-2] : splitters of type key c[0..d-1] : children of type tree Invariants of the representation: (1) The key of an item reachable through child c[i], i in {0..d-1}, of a node of degree d with splitters s[0..d-2] satisfies s[i-1] < key <= s[i]. It is understood that s[-1] = -inf and s[d-1] = +inf, but neither of these sentinel values is ever stored in s. (2) All items have the same depth (distance from the root). (3) The degree d of a a non-root node is in {a..b}, where (a,b) are constant parameters with a >= 2, b >= 2 a - 1. The degree of the root node is in {2..b}. (4) The arrays s and c of a node have length b-1 and b. (5) If the tree contains empty then it is equal to empty. Remark: * The internal representation escapes the type system. This is necessary in order to avoid intermediate pointers, e.g. from a node record to the array of splitters, which would otherwise produce lots of cache misses. * We do not use the tag bits of heap-allocated blocks to distinguish between items and nodes in order to avoid the overhead of calling the C-function caml_obj_tag(). Instead we store our own tag into field 0. * We use only Obj.field and Obj.set_field to access the objects. *) type ('key, 'value) tree = Obj.t;; type ('key, 'value) t = { mutable root: ('key, 'value) tree (* <> *) };; -- snip -- The internal representation of trees and nodes is like this: ('key, 'value) tree = Empty = unboxed int 0 Item = pointer to block with first field the unboxed int -1 Node = pointer to node, i.e. block with first field unboxed int >=0 node = array of length 2 b consisting of degree d, d-1 splitters and d pointers to child-trees, where d in {2..b}. Remark: I chose not to use the tag bits of a block to distinguish items and nodes because Obj.tag results in a C-call whereas getting the degree-field results in a single machine instruction. ---- Dr. Sebastian Egner Senior Scientist Philips Research Laboratories Prof. Holstlaan 4 (WDC 1-051, 1st floor, room 51) 5656 AA Eindhoven The Netherlands tel: +31 40 27-43166 fax: +31 40 27-44004 email: sebastian.egner@philips.com