caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Arthur Breitman <arthur.breitman@gmail.com>
To: Goswin von Brederlow <goswin-v-b@web.de>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Keeping A big data optimization problem functional
Date: Mon, 14 Apr 2014 13:32:18 -0600	[thread overview]
Message-ID: <CAAYUt0OA_59j+E70TWAgDq+2dWStFOLgNa=NJku-jtQ7y_7nXA@mail.gmail.com> (raw)
In-Reply-To: <20140414081214.GD25532@frosties>

[-- Attachment #1: Type: text/plain, Size: 2326 bytes --]

Responses in-line

On Mon, Apr 14, 2014 at 2:12 AM, Goswin von Brederlow <goswin-v-b@web.de>wrote:

> On Sun, Apr 13, 2014 at 11:25:18PM -0600, Arthur Breitman wrote:
> > Hello all,
>
> Another structure I'm dealing with is a key value store, whose size is on
> > the same order of magnitude as the size of the data above. Initially,
> this
> > key-value store is empty.
>
> You probably don't expect to keep 1TB data in ram. Does the key-value
> store just reference values in the DB?


Indeed it does. That's part of the difficulty.


 > I am given an "apply" function that, for a given key-value store and for
> a
> > given node in the tree produces a new key-value store. Typically by
> > modifying a few keys in the key-value store. The amount of bits changed
> is
> > on the same order of magnitude as the size of the node. When I reach a
> leaf
> > of the tree, I can compute the optimality of that leaf from the state of
> > the key-value store. I want to find the best leaf.
>
> Not sure what you mean there. Maybe you could give an example with a
> small data set (maybe 10 keys).
>

It would be hard to make it a small example, but abstractly I have:

type node = {value: string; children: node list option}
module type Context = begin
  type t
  val apply: t -> node -> t
  val get: t -> string -> string option
  val set: t -> string -> string -> t
  val fitness -> int
  val empty -> t
end

Given a tree of nodes, and starting with Context.empty find the leaf node
with the highest fitness. You can think of every node in the tree as an
operation acting on the context.


>
> Wouldn't a simple Map work as your key-value store? The apply function
> would then modify the map, creating a new map that shares most of the
> unchanged data with the original map. That way keeping the results of
> many apply functions in memory should be possible and applying should
> be fast.
>

It would if I didn't potentially have a terabyte of data.



>
> But if memory becomes a problem then you probably want to use some
> form of B-Tree to.
>
> MfG
>         Goswin
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>



-- 
Arthur Breitman

[-- Attachment #2: Type: text/html, Size: 4022 bytes --]

  reply	other threads:[~2014-04-14 19:32 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-14  5:25 Arthur Breitman
2014-04-14  5:43 ` Francois Berenger
2014-04-14  6:07   ` Arthur Breitman
2014-04-14  8:12 ` Goswin von Brederlow
2014-04-14 19:32   ` Arthur Breitman [this message]
2014-04-15  9:05     ` Goswin von Brederlow
2014-04-15 13:16       ` Arthur Breitman
2014-04-15 13:50     ` Thomas Gazagnaire
2014-04-16 11:02       ` Arthur Breitman
2014-04-14 20:38 ` Richard W.M. Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAAYUt0OA_59j+E70TWAgDq+2dWStFOLgNa=NJku-jtQ7y_7nXA@mail.gmail.com' \
    --to=arthur.breitman@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=goswin-v-b@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).