No but it does have bindings for leveldb which does just what I want. The question is, how do I hide those dirty db mutations.


On Sun, Apr 13, 2014 at 11:43 PM, Francois Berenger <berenger@riken.jp> wrote:
And that's when I realize that "Oh! No! We don't have bindings to Berkeley DB in OPAM!". That's unfortunate. :(


On 2014年04月14日 14:25, Arthur Breitman wrote:
Hello all,

This is my first post to the mailing list, I hope it will be relevant and
in line with the posting guidelines. If not, please let me know and pardon
my ignorance.

I am working on a large problem that I'm attempting to tackle in Ocaml. It
requires me to optimize a function over a tree

- the tree has on the order of 100,000 to 1,000,000 nodes
- each node on the tree contains on the order of 100kB to 1MB of data

So total, I'm looking at 10GB to 1TB one data which is typically stored in
a database on disk. The tree is loaded in memory using keys that represent
entries in the database.

Another structure I'm dealing with is a key value store, whose size is on
the same order of magnitude as the size of the data above. Initially, this
key-value store is empty.

I am given an "apply" function that, for a given key-value store and for a
given node in the tree produces a new key-value store. Typically by
modifying a few keys in the key-value store. The amount of bits changed is
on the same order of magnitude as the size of the node. When I reach a leaf
of the tree, I can compute the optimality of that leaf from the state of
the key-value store. I want to find the best leaf.

Here's my problem...
If I write the following in a imperative fashion, it seems straightforward
enough. I should be doing a DFS over my tree, mutating my key=value store
as I go, and saving on disk "backup" information to backtrack through the
tree. This doesn't feel very elegant. I've been considering two solutions

1) since there is a one to one mapping between nodes in the tree and
key-value stores, I could make my key-value store type to actually be the
same as a node in the tree. I would put it in a module along with a
reference to the node representing which key-value store is being
represented on disk. The "apply" function would be straightforward and
lazy, but whenever the keys of a key-value store are to be accessed, the
following would happen

a) Check if the key-value store on disk is the same as the one requested,
if so read the key
b) If that's not the case, find the least common ancestor in the tree,
migrate the disk cache to that version then down to the one requested. This
would require me to have an "undo" function to complement the "apply"
function, which is an acceptable compromise.

2) the second solution would be to represent the key-value store as a
binary tree. Since each operation changes only a few keys, I wouldn't be
making a dumb copy of the entire key-value store at each step, only of the
keys that matter. The problem is that though that works great in memory,
I'm not sure how to make it work with an on-disk storage, a necessity given
the size of it.

Thanks,
Arthurto



--
Best regards,
Francois Berenger.

--
Caml-list mailing list.  Subscription management and archives:
https://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



--
Arthur Breitman