caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Goswin von Brederlow <goswin-v-b@web.de>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Keeping A big data optimization problem functional
Date: Tue, 15 Apr 2014 11:05:42 +0200	[thread overview]
Message-ID: <20140415090542.GB10918@frosties> (raw)
In-Reply-To: <CAAYUt0OA_59j+E70TWAgDq+2dWStFOLgNa=NJku-jtQ7y_7nXA@mail.gmail.com>

On Mon, Apr 14, 2014 at 01:32:18PM -0600, Arthur Breitman wrote:
> Responses in-line
> 
> On Mon, Apr 14, 2014 at 2:12 AM, Goswin von Brederlow <goswin-v-b@web.de>wrote:
> 
> > On Sun, Apr 13, 2014 at 11:25:18PM -0600, Arthur Breitman wrote:
> > > Hello all,
> >
> > Another structure I'm dealing with is a key value store, whose size is on
> > > the same order of magnitude as the size of the data above. Initially,
> > this
> > > key-value store is empty.
> >
> > You probably don't expect to keep 1TB data in ram. Does the key-value
> > store just reference values in the DB?
> 
> 
> Indeed it does. That's part of the difficulty.
> 
> 
>  > I am given an "apply" function that, for a given key-value store and for
> > a
> > > given node in the tree produces a new key-value store. Typically by
> > > modifying a few keys in the key-value store. The amount of bits changed
> > is
> > > on the same order of magnitude as the size of the node. When I reach a
> > leaf
> > > of the tree, I can compute the optimality of that leaf from the state of
> > > the key-value store. I want to find the best leaf.
> >
> > Not sure what you mean there. Maybe you could give an example with a
> > small data set (maybe 10 keys).
> >
> 
> It would be hard to make it a small example, but abstractly I have:
> 
> type node = {value: string; children: node list option}

A list can be [], no need for an option usualy. Or is there a
difference between None and Some []?

> module type Context = begin
>   type t
>   val apply: t -> node -> t
>   val get: t -> string -> string option
>   val set: t -> string -> string -> t
>   val fitness -> int
>   val empty -> t
> end

Shouldn't that be:

module type Context = begin
  type t
  type db_ref
  val apply: t -> node -> t
  val get: t -> string -> db_ref option
  val set: t -> string -> db_ref -> t
  val fitness -> int
  val empty -> t
end
 
> Given a tree of nodes, and starting with Context.empty find the leaf node
> with the highest fitness. You can think of every node in the tree as an
> operation acting on the context.
> 
> 
> >
> > Wouldn't a simple Map work as your key-value store? The apply function
> > would then modify the map, creating a new map that shares most of the
> > unchanged data with the original map. That way keeping the results of
> > many apply functions in memory should be possible and applying should
> > be fast.
> >
> 
> It would if I didn't potentially have a terabyte of data.

But you wouldn't put the data into the map, only the reference to the
DB. And maybe have a cache of recently used DB references so you don't
have to lookup the same object over and over.

> > But if memory becomes a problem then you probably want to use some
> > form of B-Tree to.
> >
> > MfG
> >         Goswin
> -- 
> Arthur Breitman

MfG
	Goswin

  reply	other threads:[~2014-04-15  9:05 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-14  5:25 Arthur Breitman
2014-04-14  5:43 ` Francois Berenger
2014-04-14  6:07   ` Arthur Breitman
2014-04-14  8:12 ` Goswin von Brederlow
2014-04-14 19:32   ` Arthur Breitman
2014-04-15  9:05     ` Goswin von Brederlow [this message]
2014-04-15 13:16       ` Arthur Breitman
2014-04-15 13:50     ` Thomas Gazagnaire
2014-04-16 11:02       ` Arthur Breitman
2014-04-14 20:38 ` Richard W.M. Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140415090542.GB10918@frosties \
    --to=goswin-v-b@web.de \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).