On Apr 15, 2014 5:06 AM, "Goswin von Brederlow" wrote: > > On Mon, Apr 14, 2014 at 01:32:18PM -0600, Arthur Breitman wrote: > > Responses in-line > > > > On Mon, Apr 14, 2014 at 2:12 AM, Goswin von Brederlow wrote: > > > > > On Sun, Apr 13, 2014 at 11:25:18PM -0600, Arthur Breitman wrote: > > > > Hello all, > > > > > > Another structure I'm dealing with is a key value store, whose size is on > > > > the same order of magnitude as the size of the data above. Initially, > > > this > > > > key-value store is empty. > > > > > > You probably don't expect to keep 1TB data in ram. Does the key-value > > > store just reference values in the DB? > > > > > > Indeed it does. That's part of the difficulty. > > > > > > > I am given an "apply" function that, for a given key-value store and for > > > a > > > > given node in the tree produces a new key-value store. Typically by > > > > modifying a few keys in the key-value store. The amount of bits changed > > > is > > > > on the same order of magnitude as the size of the node. When I reach a > > > leaf > > > > of the tree, I can compute the optimality of that leaf from the state of > > > > the key-value store. I want to find the best leaf. > > > > > > Not sure what you mean there. Maybe you could give an example with a > > > small data set (maybe 10 keys). > > > > > > > It would be hard to make it a small example, but abstractly I have: > > > > type node = {value: string; children: node list option} > > A list can be [], no need for an option usualy. Or is there a > difference between None and Some []? 1) yes the "option" part is useless, I missed it in the wrote > > module type Context = begin > > type t > > val apply: t -> node -> t > > val get: t -> string -> string option > > val set: t -> string -> string -> t > > val fitness -> int > > val empty -> t > > end > > Shouldn't that be: > > module type Context = begin > type t > type db_ref > val apply: t -> node -> t > val get: t -> string -> db_ref option > val set: t -> string -> db_ref -> t > val fitness -> int > val empty -> t > end > 2) I'm not sure what you mean by a "db_ref", but I don't think the module signature you describe is what I want. I stand by the signature I wrote. What I want is a functional map that just happens to be so large it needs a disk database to be stored. > > Given a tree of nodes, and starting with Context.empty find the leaf node > > with the highest fitness. You can think of every node in the tree as an > > operation acting on the context. > > > > > > > > > > Wouldn't a simple Map work as your key-value store? The apply function > > > would then modify the map, creating a new map that shares most of the > > > unchanged data with the original map. That way keeping the results of > > > many apply functions in memory should be possible and applying should > > > be fast. > > > > > > > It would if I didn't potentially have a terabyte of data. > > But you wouldn't put the data into the map, only the reference to the > DB. That would work if I had a small number of keys with large value types. Instead I have on the order of a billion keyvalue pairs. > > > > But if memory becomes a problem then you probably want to use some > > > form of B-Tree to. > > > > > > MfG > > > Goswin > > -- > > Arthur Breitman > > MfG > Goswin > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs