From: Brian Hurt
Date: Wed, 27 Jun 2007 16:35:14 -0400
To: Quôc Peyrot
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Book about functional design patterns
Message-ID: <4682CA02.4040006@janestcapital.com>

Quôc Peyrot wrote:

> On Jun 27, 2007, at 9:48 PM, Brian Hurt wrote:
>
>> In OCaml, allocations are relatively cheap: a cost similar to that
>> of allocating on the stack.
>> Which is why when you tell long-time OCaml programmers that you
>> want to avoid an allocation cost by allocating on the stack, they
>> tend to go "um, why?"
>>
>> Mutable data structures have their cost as well. When you assign a
>> pointer into an object old enough to be in the major heap, OCaml
>> kicks off a minor collection. For small N, this can often make
>> O(log N) purely functional structures faster than their O(1)
>> imperative counterparts.
>>
>> Not to mention the correctness advantages, plus other advantages.
>
> If I have a tree/map data structure and I add an element to it, my
> understanding is that, when building the new tree, all the nodes up to
> the root are going to be replaced. Is my understanding correct?

No. Only those elements that change need to be reallocated; the rest
can be shared between the old and new tree. So, assuming the tree is
balanced, only the log N or so nodes between the root and the changed
node need to be reallocated, out of the N nodes in the tree, so you're
sharing N - log N nodes between the two trees.

> Now let's say I want to build a tree with millions of elements, and
> I'm only interested in the final result, i.e. I don't need to be able
> to roll back to a previous state or fancy stuff like that (I therefore
> never keep a reference to the root of the old tree each time I add a
> new element).
> Is there a way of writing the building code (with a fold-type of
> construct or something like that) such that the compiler will
> understand it, and will generate code equivalent to the imperative
> code (i.e. we don't allocate new nodes up to the root each time we
> insert an element)?

With a million-element tree, you're only reallocating 20-30 nodes
(assuming it's balanced), and sharing the other 999,970-999,980 nodes
between the two trees. Allocating each new node is only going to be a
small number of clock cycles, like 5-10 clocks. So the total allocation
cost is only 100-300 clocks or so.
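A minimal sketch of that sharing, assuming a plain unbalanced binary
search tree of ints. The `tree` type and `insert` function are made up
for illustration; the standard library's Map is balanced, but it shares
untouched subtrees in the same way.

```ocaml
(* Hypothetical unbalanced BST, for illustration only. *)
type tree =
  | Leaf
  | Node of tree * int * tree

(* Insert by path copying: only the nodes on the path from the root to
   the insertion point are reallocated; every other subtree is reused. *)
let rec insert x t =
  match t with
  | Leaf -> Node (Leaf, x, Leaf)
  | Node (l, v, r) ->
    if x < v then Node (insert x l, v, r)        (* r is shared *)
    else if x > v then Node (l, v, insert x r)   (* l is shared *)
    else t                                       (* already present *)

let () =
  (* Build with a fold, never keeping a reference to the old roots. *)
  let t1 = List.fold_left (fun t x -> insert x t) Leaf [50; 30; 70; 20; 40] in
  let t2 = insert 60 t1 in
  match t1, t2 with
  | Node (l1, _, _), Node (l2, _, _) ->
    (* Physical equality: both roots point at the same left subtree. *)
    assert (l1 == l2);
    (* The root itself lies on the changed path, so it is a new block. *)
    assert (t1 != t2)
  | _ -> assert false
```

Building with a fold as above still allocates the path on every insert.
As far as I know the compiler does not turn that into in-place updates;
the superseded path nodes just become garbage for the collector.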
What kills performance, and makes hash tables faster at this point, is
the balancing logic and the cache misses you take walking down the tree
(the minor heap stays in cache). But even there, we're not talking
about huge performance differences: for all except the most
time-critical code, tree-based maps are generally fast enough.

Brian