I have a few questions related to the
internal representation of
recursive data types, having implications
on the suitability of Ocaml
for highly efficient data structures.
One of the most efficient comparison-based
search data structures in
the presence of memory hierachies (caching/paging)
are (a,b)-trees.
In an (a,b)-tree, the number d of children
of a non-root node is in {a..b}
and all leaves (search items) are at
the same height of the tree. In case
the keys can be stored in the node itself
(e.g. int or float), and suitable
values for (a, b), the child to follow
from a node can be determined without
additional cache faults. In other words,
by adapting parameters a and b
a favorable cache-aware data structure
can be constructed.
After measuring performance and analyzing
reasons, I found that the
favorable cache locality of the data
structure is for a large part destroyed
by the way the Ocaml (native code) compiler
(on i386) chooses the internal
representation of the nodes and trees.
To find out about this quantitatively,
I then implemented (a,b)-trees again---this
time working around the type
system (i.e. using Obj) in order to
choose a more compact internal
representation of the trees. I will
detail the layouts below.
It turns out that this made the program
about 30% faster (and this also
means the result is only about a factor
3 slower than an optimized C++
implementation). This potential for
optimization is a little too much to
ignore, but letting go of the type system
is also not an attractive option.
Now my question is
a) Is this effect known?
b) Is it considered relevant?
c) What were the design considerations
that have lead to the design
of the current compiler in this regard?
Any ideas? I can provide the measurements
and also the programs
(provided my employer, Philips Research
Eindhoven, agrees to that)
for anyone to have a look for him- or
herself in case there is any interest.
Sebastian.
----------
1. Here is a piece of the 'good' Ocaml
program defining the trees:
-- snip --
(* --- (a,b)-trees --- *)
assert ((!a >= 2) && (!b >= 2 * !a -
1));;
type ('key, 'value) item =
{
k: 'key;
mutable v: 'value
(* <<more
fiels, e.g. linking>> *)
};;
type ('key, 'value) node =
{ mutable d: int;
(*
degree d in {2..b} *)
s: 'key array; (*
splitters s[0..d-2] *)
c: ('key, 'value) tree array (* children c[0..d-1] *)
}
and ('key, 'value) tree =
| Empty
(* the empty (a,b)-tree *)
| Item of ('key, 'value)
item (* an item as leaf *)
| Node of ('key, 'value)
node (* a splitting node *)
;;
type ('key, 'value) t =
{ mutable root: ('key,
'value) tree
(* <<more
fields, e.g. size, parameters>> *)
};;
(* Invariants of the representation:
(1) The key of an Item reachable through
child c[i], i in {0..d-1},
of a Node of degree d with
splitters s[0..d-2] satisfies
s[i-1] < key
<= s[i].
It is understood that s[-1]
= -inf and s[d-1] = +inf,
but neither of these sentinel
values is ever stored in s.
(2) All Items have the same depth (distance
from the root).
(3) The degree d of a a non-root Node
is in {a..b}, where
(a,b) are constant parameters
with a >= 2, b >= 2 a - 1.
The degree of the root
Node is in {2..b}.
(4) The arrays s and c of a Node have
length b-1 and b.
(5) If the tree contains Empty then it
is equal to Empty.
*)
-- snip --
If I analyzed it correctly, then these
definitions result in the following
internal representation for trees and
nodes
('key, 'value) tree =
Empty
= (unboxed) int 0
Item
of i = pointer to (heap-allocated) block with tag 0 and single field i:
.. item
Node
of n = point to block with tag 1 and single field n: ('key, 'value) node
('key, 'value) node =
pointer
to block with tag 0 and 3 fields for d, s, and c.
d
= unboxed int
s,
c = pointers to arrays of length b
------------
2. Here is a piece of the 'bad' Ocaml
program defining the trees:
-- snip --
(* --- (a,b)-trees --- *)
assert ((!a >= 2) && (!b >= 2 * !a -
1));;
(* The internal representation of an (a,b)-tree is
empty: the unboxed integer 0,
item: an array [-1, key,
value], or
node: an array [d, s[0..d-2],
0..0, c[0..d-1], 0..0]
where c[0]
is located at position b.
Here
d
: degree, unboxed integer in {2..b}
s[0..d-2] : splitters of type
key
c[0..d-1] : children of type tree
Invariants of the representation:
(1) The key of an item reachable through
child c[i], i in {0..d-1},
of a node of degree d with
splitters s[0..d-2] satisfies
s[i-1] < key
<= s[i].
It is understood that s[-1]
= -inf and s[d-1] = +inf,
but neither of these sentinel
values is ever stored in s.
(2) All items have the same depth (distance
from the root).
(3) The degree d of a a non-root node
is in {a..b}, where
(a,b) are constant parameters
with a >= 2, b >= 2 a - 1.
The degree of the root
node is in {2..b}.
(4) The arrays s and c of a node have
length b-1 and b.
(5) If the tree contains empty then it
is equal to empty.
Remark:
* The internal representation escapes
the type system. This
is necessary in order to avoid
intermediate pointers, e.g.
from a node record to the array
of splitters, which would
otherwise produce lots of cache
misses.
* We do not use the tag bits of heap-allocated
blocks to
distinguish between items and
nodes in order to avoid the
overhead of calling the C-function
caml_obj_tag(). Instead
we store our own tag into field
0.
* We use only Obj.field and Obj.set_field
to access the objects.
*)
type ('key, 'value) tree = Obj.t;;
type ('key, 'value) t =
{ mutable root: ('key,
'value) tree
(* <<more
fields, e.g. size, parameters>> *)
};;
-- snip --
The internal representation of trees
and nodes is like this:
('key, 'value) tree =
Empty
= unboxed int 0
Item
= pointer to block with first field the unboxed int -1
Node
= pointer to node, i.e. block with first field unboxed int >=0
node =
array
of length 2 b consisting of degree d, d-1 splitters and
d
pointers to child-trees, where d in {2..b}.
Remark: I chose not to use the tag bits
of a block to distinguish items
and nodes because Obj.tag results in
a C-call whereas getting the
degree-field results in a single machine
instruction.
----
Dr. Sebastian Egner
Senior Scientist
Philips Research Laboratories
Prof. Holstlaan 4 (WDC 1-051, 1st floor, room 51)
5656 AA Eindhoven
The Netherlands
tel: +31 40 27-43166
fax: +31 40 27-44004
email: sebastian.egner@philips.com