From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 7C9347F722 for ; Tue, 15 Apr 2014 11:05:46 +0200 (CEST) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of goswin-v-b@web.de) identity=pra; client-ip=212.227.15.14; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of goswin-v-b@web.de) identity=mailfrom; client-ip=212.227.15.14; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="goswin-v-b@web.de"; x-conformance=sidf_compatible Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mout.web.de) identity=helo; client-ip=212.227.15.14; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="goswin-v-b@web.de"; x-sender="postmaster@mout.web.de"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AmIBAJP1TFPU4w8OnGdsb2JhbABYx0CBIBYOAQEBAQEGDQkJFCiCJQEBAQMBJxNECwsYCSUPBSg0h2cBAwkMxH0fKyKGTheOahaDDoEUBJR1g2yGPhKPKQ X-IPAS-Result: AmIBAJP1TFPU4w8OnGdsb2JhbABYx0CBIBYOAQEBAQEGDQkJFCiCJQEBAQMBJxNECwsYCSUPBSg0h2cBAwkMxH0fKyKGTheOahaDDoEUBJR1g2yGPhKPKQ X-IronPort-AV: E=Sophos;i="4.97,862,1389740400"; d="scan'208";a="68295245" Received: from mout.web.de ([212.227.15.14]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/DHE-RSA-AES256-SHA; 15 Apr 2014 11:05:46 +0200 Received: from frosties.localnet ([78.43.112.216]) by smtp.web.de (mrweb003) with ESMTPSA (Nemesis) id 0Lpeys-1XCsIw09lG-00fQGe for ; Tue, 15 Apr 2014 11:05:44 +0200 Received: from mrvn by frosties.localnet with local (Exim 4.80) (envelope-from ) id 1WZzJG-00032X-GW for caml-list@inria.fr; Tue, 15 Apr 2014 11:05:42 +0200 Date: Tue, 15 Apr 2014 11:05:42 +0200 From: Goswin von Brederlow To: caml-list@inria.fr Message-ID: <20140415090542.GB10918@frosties> References: <20140414081214.GD25532@frosties> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-Provags-ID: V03:K0:UzEeayfUfbxcNnr5Ql57SObtJvnqRfJj/as4694LA5d1fC7j3IU ZnKoyg9Dexyl06QIqx4LcmxrXKm+F0uP5zRZMaGaVnxTI5znLi+SYPI+oEDMY5qruu6dAQi B2Dm/iFfFGcVAFqhzjMa4BNbabTijyM7pt696tOFGi4l5Xz887XZm1IrI6GjNPLFXx7ptna xkp92SVHr3OPoe0aQpfKQ== Subject: Re: [Caml-list] Keeping A big data optimization problem functional On Mon, Apr 14, 2014 at 01:32:18PM -0600, Arthur Breitman wrote: > Responses in-line > > On Mon, Apr 14, 2014 at 2:12 AM, Goswin von Brederlow wrote: > > > On Sun, Apr 13, 2014 at 11:25:18PM -0600, Arthur Breitman wrote: > > > Hello all, > > > > Another structure I'm dealing with is a key value store, whose size is on > > > the same order of magnitude as the size of the data above. Initially, > > this > > > key-value store is empty. > > > > You probably don't expect to keep 1TB data in ram. Does the key-value > > store just reference values in the DB? > > > Indeed it does. That's part of the difficulty. > > > > I am given an "apply" function that, for a given key-value store and for > > a > > > given node in the tree produces a new key-value store. Typically by > > > modifying a few keys in the key-value store. The amount of bits changed > > is > > > on the same order of magnitude as the size of the node. When I reach a > > leaf > > > of the tree, I can compute the optimality of that leaf from the state of > > > the key-value store. I want to find the best leaf. > > > > Not sure what you mean there. Maybe you could give an example with a > > small data set (maybe 10 keys). > > > > It would be hard to make it a small example, but abstractly I have: > > type node = {value: string; children: node list option} A list can be [], no need for an option usualy. Or is there a difference between None and Some []? > module type Context = begin > type t > val apply: t -> node -> t > val get: t -> string -> string option > val set: t -> string -> string -> t > val fitness -> int > val empty -> t > end Shouldn't that be: module type Context = begin type t type db_ref val apply: t -> node -> t val get: t -> string -> db_ref option val set: t -> string -> db_ref -> t val fitness -> int val empty -> t end > Given a tree of nodes, and starting with Context.empty find the leaf node > with the highest fitness. You can think of every node in the tree as an > operation acting on the context. > > > > > > Wouldn't a simple Map work as your key-value store? The apply function > > would then modify the map, creating a new map that shares most of the > > unchanged data with the original map. That way keeping the results of > > many apply functions in memory should be possible and applying should > > be fast. > > > > It would if I didn't potentially have a terabyte of data. But you wouldn't put the data into the map, only the reference to the DB. And maybe have a cache of recently used DB references so you don't have to lookup the same object over and over. > > But if memory becomes a problem then you probably want to use some > > form of B-Tree to. > > > > MfG > > Goswin > -- > Arthur Breitman MfG Goswin