On Apr 15, 2014 5:06 AM, "Goswin von Brederlow" <goswin-v-b@web.de> wrote:
>
> On Mon, Apr 14, 2014 at 01:32:18PM -0600, Arthur Breitman wrote:
> > Responses in-line
> >
> > On Mon, Apr 14, 2014 at 2:12 AM, Goswin von Brederlow <goswin-v-b@web.de>wrote:
> >
> > > On Sun, Apr 13, 2014 at 11:25:18PM -0600, Arthur Breitman wrote:
> > > > Hello all,
> > >
> > > Another structure I'm dealing with is a key value store, whose size is on
> > > > the same order of magnitude as the size of the data above. Initially,
> > > this
> > > > key-value store is empty.
> > >
> > > You probably don't expect to keep 1TB data in ram. Does the key-value
> > > store just reference values in the DB?
> >
> >
> > Indeed it does. That's part of the difficulty.
> >
> >
> >  > I am given an "apply" function that, for a given key-value store and for
> > > a
> > > > given node in the tree produces a new key-value store. Typically by
> > > > modifying a few keys in the key-value store. The amount of bits changed
> > > is
> > > > on the same order of magnitude as the size of the node. When I reach a
> > > leaf
> > > > of the tree, I can compute the optimality of that leaf from the state of
> > > > the key-value store. I want to find the best leaf.
> > >
> > > Not sure what you mean there. Maybe you could give an example with a
> > > small data set (maybe 10 keys).
> > >
> >
> > It would be hard to make it a small example, but abstractly I have:
> >
> > type node = {value: string; children: node list option}
>
> A list can be [], no need for an option usualy. Or is there a
> difference between None and Some []?

1) yes the "option" part is useless, I missed it in the wrote

> > module type Context = begin
> >   type t
> >   val apply: t -> node -> t
> >   val get: t -> string -> string option
> >   val set: t -> string -> string -> t
> >   val fitness -> int
> >   val empty -> t
> > end
>
> Shouldn't that be:
>
> module type Context = begin
>   type t
>   type db_ref
>   val apply: t -> node -> t
>   val get: t -> string -> db_ref option
>   val set: t -> string -> db_ref -> t
>   val fitness -> int
>   val empty -> t
> end
>

2) I'm not sure what you mean by a "db_ref", but I don't think the module signature you describe is what I want. I stand by the signature I wrote. What I want is a functional map that just happens to be so large it needs a disk database to be stored.


> > Given a tree of nodes, and starting with Context.empty find the leaf node
> > with the highest fitness. You can think of every node in the tree as an
> > operation acting on the context.
> >
> >
> > >
> > > Wouldn't a simple Map work as your key-value store? The apply function
> > > would then modify the map, creating a new map that shares most of the
> > > unchanged data with the original map. That way keeping the results of
> > > many apply functions in memory should be possible and applying should
> > > be fast.
> > >
> >
> > It would if I didn't potentially have a terabyte of data.
>
> But you wouldn't put the data into the map, only the reference to the
> DB.

That would work if I had a small number of keys with large value types. Instead I have on the order of a billion keyvalue pairs.

>
> > > But if memory becomes a problem then you probably want to use some
> > > form of B-Tree to.
> > >
> > > MfG
> > >         Goswin
> > --
> > Arthur Breitman
>
> MfG
>         Goswin
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs