From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <arthur.breitman@gmail.com>
X-Original-To: caml-list@sympa.inria.fr
Delivered-To: caml-list@sympa.inria.fr
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83])
	by sympa.inria.fr (Postfix) with ESMTPS id CADD97F720
	for <caml-list@sympa.inria.fr>; Mon, 14 Apr 2014 08:08:04 +0200 (CEST)
Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender
  authenticity information available from domain of
  arthur.breitman@gmail.com) identity=pra;
  client-ip=209.85.215.53;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="arthur.breitman@gmail.com";
  x-sender="arthur.breitman@gmail.com";
  x-conformance=sidf_compatible
Received-SPF: Pass (mail2-smtp-roc.national.inria.fr: domain of
  arthur.breitman@gmail.com designates 209.85.215.53 as
  permitted sender) identity=mailfrom; client-ip=209.85.215.53;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="arthur.breitman@gmail.com";
  x-sender="arthur.breitman@gmail.com";
  x-conformance=sidf_compatible; x-record-type="v=spf1"
Received-SPF: None (mail2-smtp-roc.national.inria.fr: no sender
  authenticity information available from domain of
  postmaster@mail-la0-f53.google.com) identity=helo;
  client-ip=209.85.215.53;
  receiver=mail2-smtp-roc.national.inria.fr;
  envelope-from="arthur.breitman@gmail.com";
  x-sender="postmaster@mail-la0-f53.google.com";
  x-conformance=sidf_compatible
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApADABJ7S1PRVdc1Y2dsb2JhbABagwY7V4MQqQWODIh0gRYIFgMWDQkJFSeCJQEBAQMBIx0BGxILAQMBCwYFCw0NHQICIgERAQUBChIGEwgKCYdMAQMJCA2cQowOUYMOlxkKGScDCmSFfxEBBQyJOoQ9IkUEB4JvgUkEiVyLGINtgTaPHhgpgm6CEB6BLQ
X-IPAS-Result: ApADABJ7S1PRVdc1Y2dsb2JhbABagwY7V4MQqQWODIh0gRYIFgMWDQkJFSeCJQEBAQMBIx0BGxILAQMBCwYFCw0NHQICIgERAQUBChIGEwgKCYdMAQMJCA2cQowOUYMOlxkKGScDCmSFfxEBBQyJOoQ9IkUEB4JvgUkEiVyLGINtgTaPHhgpgm6CEB6BLQ
X-IronPort-AV: E=Sophos;i="4.97,855,1389740400"; 
   d="scan'208";a="68062258"
Received: from mail-la0-f53.google.com ([209.85.215.53])
  by mail2-smtp-roc.national.inria.fr with ESMTP; 14 Apr 2014 08:07:50 +0200
Received: by mail-la0-f53.google.com with SMTP id b8so5362021lan.12
        for <caml-list@inria.fr>; Sun, 13 Apr 2014 23:07:50 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=mime-version:in-reply-to:references:from:date:message-id:subject:to
         :cc:content-type;
        bh=5BEycZwf/gm6u8qjhNN5H3IxR5tzZuFoMs8scsNd8io=;
        b=nmeMQkQlZJrJTxTVMkOJOWCoHpDJQ2dvXIpZPMNi9/MluUqTN6kW8KAxwGFsurZZCt
         GCPs0WHSIl/msmsW+utQrItWhPqMEk6MjjIQigk/+J+OBZyNNlrXjgdWwSg9PchXchVR
         0HGYgwDaFWC9yYdBBMiRnDepw4fj7G80CtILumrBfbBHJDAoNMocDVEA33P9qGskXd0p
         p8GvmANUhX4ePYOJaJPdUOeEzFsBvBF2hfqLwaJQMRa2WqMsWEPs8ypzhBbWzk+IOePO
         dTSLGuIXVQZsZ9BgfMcr73E3KNSUIUYn3ldY3JvkTBbcIf5fsQHg1MSQdausFyn3kNUN
         rsGg==
X-Received: by 10.152.7.97 with SMTP id i1mr429946laa.36.1397455670121; Sun,
 13 Apr 2014 23:07:50 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.114.158.130 with HTTP; Sun, 13 Apr 2014 23:07:30 -0700 (PDT)
In-Reply-To: <534B7587.6090301@riken.jp>
References: <CAAYUt0P4AzcYPbNLYsHRpCnBGa=oJT3MRfKwZv-wE4vv-JXNkw@mail.gmail.com>
 <534B7587.6090301@riken.jp>
From: Arthur Breitman <arthur.breitman@gmail.com>
Date: Mon, 14 Apr 2014 00:07:30 -0600
Message-ID: <CAAYUt0NLS+w-ZtS0JgnonzvkY4YcsigCq2fGvuf6dvWd5N6sPw@mail.gmail.com>
To: Francois Berenger <berenger@riken.jp>
Cc: caml-list <caml-list@inria.fr>
Content-Type: multipart/alternative; boundary=001a11c27b7ace99ec04f6fa80ce
Subject: Re: [Caml-list] Keeping A big data optimization problem functional

--001a11c27b7ace99ec04f6fa80ce
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

No but it does have bindings for leveldb which does just what I want. The
question is, how do I hide those dirty db mutations.


On Sun, Apr 13, 2014 at 11:43 PM, Francois Berenger <berenger@riken.jp>wrot=
e:

> And that's when I realize that "Oh! No! We don't have bindings to Berkeley
> DB in OPAM!". That's unfortunate. :(
>
>
> On 2014=E5=B9=B404=E6=9C=8814=E6=97=A5 14:25, Arthur Breitman wrote:
>
>> Hello all,
>>
>> This is my first post to the mailing list, I hope it will be relevant and
>> in line with the posting guidelines. If not, please let me know and pard=
on
>> my ignorance.
>>
>> I am working on a large problem that I'm attempting to tackle in Ocaml. =
It
>> requires me to optimize a function over a tree
>>
>> - the tree has on the order of 100,000 to 1,000,000 nodes
>> - each node on the tree contains on the order of 100kB to 1MB of data
>>
>> So total, I'm looking at 10GB to 1TB one data which is typically stored =
in
>> a database on disk. The tree is loaded in memory using keys that represe=
nt
>> entries in the database.
>>
>> Another structure I'm dealing with is a key value store, whose size is on
>> the same order of magnitude as the size of the data above. Initially, th=
is
>> key-value store is empty.
>>
>> I am given an "apply" function that, for a given key-value store and for=
 a
>> given node in the tree produces a new key-value store. Typically by
>> modifying a few keys in the key-value store. The amount of bits changed =
is
>> on the same order of magnitude as the size of the node. When I reach a
>> leaf
>> of the tree, I can compute the optimality of that leaf from the state of
>> the key-value store. I want to find the best leaf.
>>
>> Here's my problem...
>> If I write the following in a imperative fashion, it seems straightforwa=
rd
>> enough. I should be doing a DFS over my tree, mutating my key=3Dvalue st=
ore
>> as I go, and saving on disk "backup" information to backtrack through the
>> tree. This doesn't feel very elegant. I've been considering two solutions
>>
>> 1) since there is a one to one mapping between nodes in the tree and
>> key-value stores, I could make my key-value store type to actually be the
>> same as a node in the tree. I would put it in a module along with a
>> reference to the node representing which key-value store is being
>> represented on disk. The "apply" function would be straightforward and
>> lazy, but whenever the keys of a key-value store are to be accessed, the
>> following would happen
>>
>> a) Check if the key-value store on disk is the same as the one requested,
>> if so read the key
>> b) If that's not the case, find the least common ancestor in the tree,
>> migrate the disk cache to that version then down to the one requested.
>> This
>> would require me to have an "undo" function to complement the "apply"
>> function, which is an acceptable compromise.
>>
>> 2) the second solution would be to represent the key-value store as a
>> binary tree. Since each operation changes only a few keys, I wouldn't be
>> making a dumb copy of the entire key-value store at each step, only of t=
he
>> keys that matter. The problem is that though that works great in memory,
>> I'm not sure how to make it work with an on-disk storage, a necessity
>> given
>> the size of it.
>>
>> Thanks,
>> Arthurto
>>
>>
>
> --
> Best regards,
> Francois Berenger.
>
> --
> Caml-list mailing list.  Subscription management and archives:
> https://sympa.inria.fr/sympa/arc/caml-list
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>


--=20
Arthur Breitman

--001a11c27b7ace99ec04f6fa80ce
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">No but it does have bindings for leveldb which does just w=
hat I want. The question is, how do I hide those dirty db mutations.</div><=
div class=3D"gmail_extra"><br><br><div class=3D"gmail_quote">On Sun, Apr 13=
, 2014 at 11:43 PM, Francois Berenger <span dir=3D"ltr">&lt;<a href=3D"mail=
to:berenger@riken.jp" target=3D"_blank">berenger@riken.jp</a>&gt;</span> wr=
ote:<br>

<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">And that&#39;s when I realize that &quot;Oh!=
 No! We don&#39;t have bindings to Berkeley DB in OPAM!&quot;. That&#39;s u=
nfortunate. :(<div class=3D"HOEnZb">

<div class=3D"h5"><br>
<br>
On 2014=E5=B9=B404=E6=9C=8814=E6=97=A5 14:25, Arthur Breitman wrote:<br>
<blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1p=
x #ccc solid;padding-left:1ex">
Hello all,<br>
<br>
This is my first post to the mailing list, I hope it will be relevant and<b=
r>
in line with the posting guidelines. If not, please let me know and pardon<=
br>
my ignorance.<br>
<br>
I am working on a large problem that I&#39;m attempting to tackle in Ocaml.=
 It<br>
requires me to optimize a function over a tree<br>
<br>
- the tree has on the order of 100,000 to 1,000,000 nodes<br>
- each node on the tree contains on the order of 100kB to 1MB of data<br>
<br>
So total, I&#39;m looking at 10GB to 1TB one data which is typically stored=
 in<br>
a database on disk. The tree is loaded in memory using keys that represent<=
br>
entries in the database.<br>
<br>
Another structure I&#39;m dealing with is a key value store, whose size is =
on<br>
the same order of magnitude as the size of the data above. Initially, this<=
br>
key-value store is empty.<br>
<br>
I am given an &quot;apply&quot; function that, for a given key-value store =
and for a<br>
given node in the tree produces a new key-value store. Typically by<br>
modifying a few keys in the key-value store. The amount of bits changed is<=
br>
on the same order of magnitude as the size of the node. When I reach a leaf=
<br>
of the tree, I can compute the optimality of that leaf from the state of<br>
the key-value store. I want to find the best leaf.<br>
<br>
Here&#39;s my problem...<br>
If I write the following in a imperative fashion, it seems straightforward<=
br>
enough. I should be doing a DFS over my tree, mutating my key=3Dvalue store=
<br>
as I go, and saving on disk &quot;backup&quot; information to backtrack thr=
ough the<br>
tree. This doesn&#39;t feel very elegant. I&#39;ve been considering two sol=
utions<br>
<br>
1) since there is a one to one mapping between nodes in the tree and<br>
key-value stores, I could make my key-value store type to actually be the<b=
r>
same as a node in the tree. I would put it in a module along with a<br>
reference to the node representing which key-value store is being<br>
represented on disk. The &quot;apply&quot; function would be straightforwar=
d and<br>
lazy, but whenever the keys of a key-value store are to be accessed, the<br>
following would happen<br>
<br>
a) Check if the key-value store on disk is the same as the one requested,<b=
r>
if so read the key<br>
b) If that&#39;s not the case, find the least common ancestor in the tree,<=
br>
migrate the disk cache to that version then down to the one requested. This=
<br>
would require me to have an &quot;undo&quot; function to complement the &qu=
ot;apply&quot;<br>
function, which is an acceptable compromise.<br>
<br>
2) the second solution would be to represent the key-value store as a<br>
binary tree. Since each operation changes only a few keys, I wouldn&#39;t b=
e<br>
making a dumb copy of the entire key-value store at each step, only of the<=
br>
keys that matter. The problem is that though that works great in memory,<br>
I&#39;m not sure how to make it work with an on-disk storage, a necessity g=
iven<br>
the size of it.<br>
<br>
Thanks,<br>
Arthurto<br>
<br>
</blockquote>
<br>
<br></div></div><span class=3D"HOEnZb"><font color=3D"#888888">
-- <br>
Best regards,<br>
Francois Berenger.<br>
<br>
-- <br>
Caml-list mailing list. =C2=A0Subscription management and archives:<br>
<a href=3D"https://sympa.inria.fr/sympa/arc/caml-list" target=3D"_blank">ht=
tps://sympa.inria.fr/sympa/<u></u>arc/caml-list</a><br>
Beginner&#39;s list: <a href=3D"http://groups.yahoo.com/group/ocaml_beginne=
rs" target=3D"_blank">http://groups.yahoo.com/group/<u></u>ocaml_beginners<=
/a><br>
Bug reports: <a href=3D"http://caml.inria.fr/bin/caml-bugs" target=3D"_blan=
k">http://caml.inria.fr/bin/caml-<u></u>bugs</a><br>
</font></span></blockquote></div><br><br clear=3D"all"><div><br></div>-- <b=
r><div>Arthur Breitman</div>
</div>

--001a11c27b7ace99ec04f6fa80ce--