caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Francois Berenger <berenger@riken.jp>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Keeping A big data optimization problem functional
Date: Mon, 14 Apr 2014 14:43:35 +0900	[thread overview]
Message-ID: <534B7587.6090301@riken.jp> (raw)
In-Reply-To: <CAAYUt0P4AzcYPbNLYsHRpCnBGa=oJT3MRfKwZv-wE4vv-JXNkw@mail.gmail.com>

And that's when I realize that "Oh! No! We don't have bindings to 
Berkeley DB in OPAM!". That's unfortunate. :(

On 2014年04月14日 14:25, Arthur Breitman wrote:
> Hello all,
>
> This is my first post to the mailing list, I hope it will be relevant and
> in line with the posting guidelines. If not, please let me know and pardon
> my ignorance.
>
> I am working on a large problem that I'm attempting to tackle in Ocaml. It
> requires me to optimize a function over a tree
>
> - the tree has on the order of 100,000 to 1,000,000 nodes
> - each node on the tree contains on the order of 100kB to 1MB of data
>
> So total, I'm looking at 10GB to 1TB one data which is typically stored in
> a database on disk. The tree is loaded in memory using keys that represent
> entries in the database.
>
> Another structure I'm dealing with is a key value store, whose size is on
> the same order of magnitude as the size of the data above. Initially, this
> key-value store is empty.
>
> I am given an "apply" function that, for a given key-value store and for a
> given node in the tree produces a new key-value store. Typically by
> modifying a few keys in the key-value store. The amount of bits changed is
> on the same order of magnitude as the size of the node. When I reach a leaf
> of the tree, I can compute the optimality of that leaf from the state of
> the key-value store. I want to find the best leaf.
>
> Here's my problem...
> If I write the following in a imperative fashion, it seems straightforward
> enough. I should be doing a DFS over my tree, mutating my key=value store
> as I go, and saving on disk "backup" information to backtrack through the
> tree. This doesn't feel very elegant. I've been considering two solutions
>
> 1) since there is a one to one mapping between nodes in the tree and
> key-value stores, I could make my key-value store type to actually be the
> same as a node in the tree. I would put it in a module along with a
> reference to the node representing which key-value store is being
> represented on disk. The "apply" function would be straightforward and
> lazy, but whenever the keys of a key-value store are to be accessed, the
> following would happen
>
> a) Check if the key-value store on disk is the same as the one requested,
> if so read the key
> b) If that's not the case, find the least common ancestor in the tree,
> migrate the disk cache to that version then down to the one requested. This
> would require me to have an "undo" function to complement the "apply"
> function, which is an acceptable compromise.
>
> 2) the second solution would be to represent the key-value store as a
> binary tree. Since each operation changes only a few keys, I wouldn't be
> making a dumb copy of the entire key-value store at each step, only of the
> keys that matter. The problem is that though that works great in memory,
> I'm not sure how to make it work with an on-disk storage, a necessity given
> the size of it.
>
> Thanks,
> Arthurto
>


-- 
Best regards,
Francois Berenger.

  reply	other threads:[~2014-04-14  5:43 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-14  5:25 Arthur Breitman
2014-04-14  5:43 ` Francois Berenger [this message]
2014-04-14  6:07   ` Arthur Breitman
2014-04-14  8:12 ` Goswin von Brederlow
2014-04-14 19:32   ` Arthur Breitman
2014-04-15  9:05     ` Goswin von Brederlow
2014-04-15 13:16       ` Arthur Breitman
2014-04-15 13:50     ` Thomas Gazagnaire
2014-04-16 11:02       ` Arthur Breitman
2014-04-14 20:38 ` Richard W.M. Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=534B7587.6090301@riken.jp \
    --to=berenger@riken.jp \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).