From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@sympa.inria.fr Delivered-To: caml-list@sympa.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by sympa.inria.fr (Postfix) with ESMTPS id 75D977F720 for ; Mon, 14 Apr 2014 21:32:39 +0200 (CEST) Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of arthur.breitman@gmail.com) identity=pra; client-ip=209.85.217.180; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="arthur.breitman@gmail.com"; x-sender="arthur.breitman@gmail.com"; x-conformance=sidf_compatible Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of arthur.breitman@gmail.com designates 209.85.217.180 as permitted sender) identity=mailfrom; client-ip=209.85.217.180; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="arthur.breitman@gmail.com"; x-sender="arthur.breitman@gmail.com"; x-conformance=sidf_compatible; x-record-type="v=spf1" Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender authenticity information available from domain of postmaster@mail-lb0-f180.google.com) identity=helo; client-ip=209.85.217.180; receiver=mail2-smtp-roc.national.inria.fr; envelope-from="arthur.breitman@gmail.com"; x-sender="postmaster@mail-lb0-f180.google.com"; x-conformance=sidf_compatible X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AiMDAA83TFPRVdm0Ymdsb2JhbABZgwY7V4MQqR2ODIh0gRwIFgMWCwsJPIIlAQEBAwEjBBkBGxILAQMBCwYDAgsNDR0CAiIBEQEFAQoSBhMSh1UBAwkIDYxdkAuMDlGDDpdiChknAwoVT4V/EQEFDI5eBAeCb4FJBIlcixiDbYE2jx4YKYR+Hg X-IPAS-Result: AiMDAA83TFPRVdm0Ymdsb2JhbABZgwY7V4MQqR2ODIh0gRwIFgMWCwsJPIIlAQEBAwEjBBkBGxILAQMBCwYDAgsNDR0CAiIBEQEFAQoSBhMSh1UBAwkIDYxdkAuMDlGDDpdiChknAwoVT4V/EQEFDI5eBAeCb4FJBIlcixiDbYE2jx4YKYR+Hg X-IronPort-AV: E=Sophos;i="4.97,858,1389740400"; d="scan'208";a="68212473" Received: from mail-lb0-f180.google.com ([209.85.217.180]) by mail2-smtp-roc.national.inria.fr with ESMTP; 14 Apr 2014 21:32:38 +0200 Received: by mail-lb0-f180.google.com with SMTP id 10so5961683lbg.39 for ; Mon, 14 Apr 2014 12:32:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=CzGVTC3HpK6yhA3JritdaPfJLmiMtBYuClv1v2LIsvw=; b=SORdsE1662mvlhkycd8oKl4z33RVSSWfluQiFm1sAf0q4RBXTxHm5E/ElfgP9z3PQT dc1XX6iH4rmvZPGYbZKD3sjWf1yNRhAMJJpvTrHQ9eUh0kYcmo4BOr28IB9gZUMZKAYc MHZPuhplzb6fb24MNO2gso73vfQ8thpMksmAsZkOk/l4De2Tep74zvGeCGe6bfSQMD3d ap79yiDYy9zFnLPQY8z95Tw+AbTBZECkwrXptsgxSXFDd9hBQR95SvZeYVGnHbrL+EdP 46tNla8zluWx4i6WK/jNC5HyKe6RStRVO02lf84PXwEYlIxUWSScrmc1Z7FX6g2z8zAC gdhQ== X-Received: by 10.153.4.134 with SMTP id ce6mr30738061lad.21.1397503958656; Mon, 14 Apr 2014 12:32:38 -0700 (PDT) MIME-Version: 1.0 Received: by 10.114.158.130 with HTTP; Mon, 14 Apr 2014 12:32:18 -0700 (PDT) In-Reply-To: <20140414081214.GD25532@frosties> References: <20140414081214.GD25532@frosties> From: Arthur Breitman Date: Mon, 14 Apr 2014 13:32:18 -0600 Message-ID: To: Goswin von Brederlow Cc: caml-list Content-Type: multipart/alternative; boundary=001a1134de96072a4704f705bff4 Subject: Re: [Caml-list] Keeping A big data optimization problem functional --001a1134de96072a4704f705bff4 Content-Type: text/plain; charset=UTF-8 Responses in-line On Mon, Apr 14, 2014 at 2:12 AM, Goswin von Brederlow wrote: > On Sun, Apr 13, 2014 at 11:25:18PM -0600, Arthur Breitman wrote: > > Hello all, > > Another structure I'm dealing with is a key value store, whose size is on > > the same order of magnitude as the size of the data above. Initially, > this > > key-value store is empty. > > You probably don't expect to keep 1TB data in ram. Does the key-value > store just reference values in the DB? Indeed it does. That's part of the difficulty. > I am given an "apply" function that, for a given key-value store and for > a > > given node in the tree produces a new key-value store. Typically by > > modifying a few keys in the key-value store. The amount of bits changed > is > > on the same order of magnitude as the size of the node. When I reach a > leaf > > of the tree, I can compute the optimality of that leaf from the state of > > the key-value store. I want to find the best leaf. > > Not sure what you mean there. Maybe you could give an example with a > small data set (maybe 10 keys). > It would be hard to make it a small example, but abstractly I have: type node = {value: string; children: node list option} module type Context = begin type t val apply: t -> node -> t val get: t -> string -> string option val set: t -> string -> string -> t val fitness -> int val empty -> t end Given a tree of nodes, and starting with Context.empty find the leaf node with the highest fitness. You can think of every node in the tree as an operation acting on the context. > > Wouldn't a simple Map work as your key-value store? The apply function > would then modify the map, creating a new map that shares most of the > unchanged data with the original map. That way keeping the results of > many apply functions in memory should be possible and applying should > be fast. > It would if I didn't potentially have a terabyte of data. > > But if memory becomes a problem then you probably want to use some > form of B-Tree to. > > MfG > Goswin > > -- > Caml-list mailing list. Subscription management and archives: > https://sympa.inria.fr/sympa/arc/caml-list > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > -- Arthur Breitman --001a1134de96072a4704f705bff4 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
Responses in-line

On Mon, Apr 14, 2014 at 2:12 AM, Goswin von Brederlow <go= swin-v-b@web.de> wrote:
On Sun, Apr 13, 2014 at 11:2= 5:18PM -0600, Arthur Breitman wrote:
> Hello all,=C2=A0
> Another structure I'm dealing with is a key value stor= e, whose size is on
> the same order of magnitude as the size of the data above. Initially, = this
> key-value store is empty.

You probably don't expect to keep 1TB data in ram. Does the key-v= alue
store just reference values in the DB?
=C2=A0
In= deed it does. That's part of the difficulty.

<= br>
> I am given an "apply" function that, for a given key-value s= tore and for a
> given node in the tree produces a new key-value store. Typically by
> modifying a few keys in the key-value store. The amount of bits change= d is
> on the same order of magnitude as the size of the node. When I reach a= leaf
> of the tree, I can compute the optimality of that leaf from the state = of
> the key-value store. I want to find the best leaf.

Not sure what you mean there. Maybe you could give an example with a<= br> small data set (maybe 10 keys).

It woul= d be hard to make it a small example, but abstractly I have:

=
type node =3D {value: string; children: node list option} =C2=A0=
module type Context =3D begin
=C2=A0 type t
=C2=A0= val apply: t -> node -> t=C2=A0
=C2=A0 val get: t -> st= ring -> string option=C2=A0
=C2=A0 val set: t -> string -&g= t; string -> t
=C2=A0 val fitness -> int
=C2=A0 val empty -> t
e= nd

Given a tree of nodes, and starting with Contex= t.empty find the leaf node with the highest fitness. You can think of every= node in the tree as an operation acting on the context.
=C2=A0

Wouldn't a simple Map work as your key-value store? The app= ly function
would then modify the map, creating a new map that shares most of the
unchanged data with the original map. That way keeping the results of
many apply functions in memory should be possible and applying should
be fast.

It would if I didn't poten= tially have a terabyte of data.

=C2=A0

But if memory becomes a problem then you probably want to use some
form of B-Tree to.

MfG
=C2=A0 =C2=A0 =C2=A0 =C2=A0 = Goswin

--
Caml-list mailing list. =C2=A0Subscription management and archives:
ht= tps://sympa.inria.fr/sympa/arc/caml-list
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs



--
=
Arthur Breitman
--001a1134de96072a4704f705bff4--