caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Arthur Breitman <arthur.breitman@gmail.com>
To: Goswin von Brederlow <goswin-v-b@web.de>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Keeping A big data optimization problem functional
Date: Tue, 15 Apr 2014 07:16:22 -0600	[thread overview]
Message-ID: <CAAYUt0PLMS0j9dvQvMtAc-kOhNpk8oFCskaNfKiRKwspKFLOeg@mail.gmail.com> (raw)
In-Reply-To: <20140415090542.GB10918@frosties>

[-- Attachment #1: Type: text/plain, Size: 3613 bytes --]

On Apr 15, 2014 5:06 AM, "Goswin von Brederlow" <goswin-v-b@web.de> wrote:
>
> On Mon, Apr 14, 2014 at 01:32:18PM -0600, Arthur Breitman wrote:
> > Responses in-line
> >
> > On Mon, Apr 14, 2014 at 2:12 AM, Goswin von Brederlow <goswin-v-b@web.de
>wrote:
> >
> > > On Sun, Apr 13, 2014 at 11:25:18PM -0600, Arthur Breitman wrote:
> > > > Hello all,
> > >
> > > Another structure I'm dealing with is a key value store, whose size
is on
> > > > the same order of magnitude as the size of the data above.
Initially,
> > > this
> > > > key-value store is empty.
> > >
> > > You probably don't expect to keep 1TB data in ram. Does the key-value
> > > store just reference values in the DB?
> >
> >
> > Indeed it does. That's part of the difficulty.
> >
> >
> >  > I am given an "apply" function that, for a given key-value store and
for
> > > a
> > > > given node in the tree produces a new key-value store. Typically by
> > > > modifying a few keys in the key-value store. The amount of bits
changed
> > > is
> > > > on the same order of magnitude as the size of the node. When I
reach a
> > > leaf
> > > > of the tree, I can compute the optimality of that leaf from the
state of
> > > > the key-value store. I want to find the best leaf.
> > >
> > > Not sure what you mean there. Maybe you could give an example with a
> > > small data set (maybe 10 keys).
> > >
> >
> > It would be hard to make it a small example, but abstractly I have:
> >
> > type node = {value: string; children: node list option}
>
> A list can be [], no need for an option usualy. Or is there a
> difference between None and Some []?

1) yes the "option" part is useless, I missed it in the wrote

> > module type Context = begin
> >   type t
> >   val apply: t -> node -> t
> >   val get: t -> string -> string option
> >   val set: t -> string -> string -> t
> >   val fitness -> int
> >   val empty -> t
> > end
>
> Shouldn't that be:
>
> module type Context = begin
>   type t
>   type db_ref
>   val apply: t -> node -> t
>   val get: t -> string -> db_ref option
>   val set: t -> string -> db_ref -> t
>   val fitness -> int
>   val empty -> t
> end
>

2) I'm not sure what you mean by a "db_ref", but I don't think the module
signature you describe is what I want. I stand by the signature I wrote.
What I want is a functional map that just happens to be so large it needs a
disk database to be stored.


> > Given a tree of nodes, and starting with Context.empty find the leaf
node
> > with the highest fitness. You can think of every node in the tree as an
> > operation acting on the context.
> >
> >
> > >
> > > Wouldn't a simple Map work as your key-value store? The apply function
> > > would then modify the map, creating a new map that shares most of the
> > > unchanged data with the original map. That way keeping the results of
> > > many apply functions in memory should be possible and applying should
> > > be fast.
> > >
> >
> > It would if I didn't potentially have a terabyte of data.
>
> But you wouldn't put the data into the map, only the reference to the
> DB.

That would work if I had a small number of keys with large value types.
Instead I have on the order of a billion keyvalue pairs.

>
> > > But if memory becomes a problem then you probably want to use some
> > > form of B-Tree to.
> > >
> > > MfG
> > >         Goswin
> > --
> > Arthur Breitman
>
> MfG
>         Goswin
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs

[-- Attachment #2: Type: text/html, Size: 5264 bytes --]

  reply	other threads:[~2014-04-15 13:16 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-14  5:25 Arthur Breitman
2014-04-14  5:43 ` Francois Berenger
2014-04-14  6:07   ` Arthur Breitman
2014-04-14  8:12 ` Goswin von Brederlow
2014-04-14 19:32   ` Arthur Breitman
2014-04-15  9:05     ` Goswin von Brederlow
2014-04-15 13:16       ` Arthur Breitman [this message]
2014-04-15 13:50     ` Thomas Gazagnaire
2014-04-16 11:02       ` Arthur Breitman
2014-04-14 20:38 ` Richard W.M. Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAAYUt0PLMS0j9dvQvMtAc-kOhNpk8oFCskaNfKiRKwspKFLOeg@mail.gmail.com \
    --to=arthur.breitman@gmail.com \
    --cc=caml-list@inria.fr \
    --cc=goswin-v-b@web.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).