From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.9 required=5.0 tests=HTML_10_20,HTML_MESSAGE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 59D9EBC69 for ; Wed, 18 Jul 2007 19:32:56 +0200 (CEST) Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.230]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6IHWrTp030401 for ; Wed, 18 Jul 2007 19:32:55 +0200 Received: by wr-out-0506.google.com with SMTP id i23so208495wra for ; Wed, 18 Jul 2007 10:32:52 -0700 (PDT) Received: by 10.143.38.6 with SMTP id q6mr59635wfj.1184779969684; Wed, 18 Jul 2007 10:32:49 -0700 (PDT) Received: by 10.142.72.4 with HTTP; Wed, 18 Jul 2007 10:32:49 -0700 (PDT) Message-ID: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> Date: Wed, 18 Jul 2007 10:32:49 -0700 From: "Luca de Alfaro" To: caml-list@yquem.inria.fr Subject: Vec: a functional implementation of extensible arrays MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_79529_4284042.1184779969666" X-Miltered: at discorde with ID 469E4EC5.002 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; arrays:01 ocaml:01 arrays:01 iterating:01 stack:01 bounded:01 stack:01 ocaml:01 iterating:01 bounded:01 iterators:01 iterators:01 overflows:01 overflows:01 extensible:01 ------=_Part_79529_4284042.1184779969666 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Dear All, I would like to share with you an Ocaml implementation of extensible arrays. The implementation is functional, based on balanced trees (and on the code for Set and Map); I called the module Vec (for vector - I like short names). You can find it at http://www.dealfaro.com/home/vec.html Module Vec provides, in log-time: - Access and modification to arbitrary elements (Vec.put n el v puts element el in position n of vector v, for instance). - Concatenation - Insertion and removal of elements from arbitrary positions (auto-enlarging and auto-shrinking the vector). as well as: - All kind of iterators and some visitor functions. - Efficient translation to/from lists and arrays. An advantage of Vec over List, for very large data structures, is that iterating over a Vec of size n requires always stack depth bounded by log n: with lists, non-tail-recursive functions can cause stack overflows. I have been using this data structure for some months, and it has been very handy in a large number of occasions. I hope it can be as useful to you. I would appreciate all advice and feedback. Also, is there a repository where I should upload it? Do you think it is worth it? All the best, Luca ------=_Part_79529_4284042.1184779969666 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline Dear All,

I would like to share with you an Ocaml implementation of extensible arrays.  The implementation is functional, based on balanced trees (and on the code for Set and Map); I called the module Vec (for vector - I like
short names).  You can find it at http://www.dealfaro.com/home/vec.html
Module Vec provides, in log-time:
  •  Access and modification to arbitrary elements ( Vec.put n el v puts element el in position n of vector v, for instance).
  • Concatenation
  • Insertion and removal of elements from arbitrary positions (auto-enlarging and auto-shrinking the vector).
as well as:
  • All kind of iterators and some visitor functions.
  • Efficient translation to/from lists and arrays.
An advantage of Vec over List, for very large data structures, is that iterating over a Vec of size n requires always stack depth bounded by log n: with lists, non-tail-recursive functions can cause stack overflows.

I have been using this data structure for some months, and it has been very handy in a large number of occasions.  I hope it can be as useful to you.

I would appreciate all advice and feedback.  Also, is there a repository where I should upload it?  Do you think it is worth it?

All the best,

Luca


------=_Part_79529_4284042.1184779969666-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.0 required=5.0 tests=AWL,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id C4FE3BC69 for ; Thu, 19 Jul 2007 09:45:49 +0200 (CEST) Received: from an-out-0708.google.com (an-out-0708.google.com [209.85.132.245]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6J7jmm5024008 for ; Thu, 19 Jul 2007 09:45:49 +0200 Received: by an-out-0708.google.com with SMTP id b15so119087ana for ; Thu, 19 Jul 2007 00:45:48 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=Gl5QUjezgGVYZbDg4Oh5/dJS6/1vUljMFjW9qfaKbq/LNyLrTR0MPgH2yhI+S5AA6targwUTUwhnYS/Dvo53GZ405RzCDNptdz39AIE4oi7scVS+m9ZGc1p+/hfiVFCls+8jPqMNG/OklVWHvkpbPqh4GGY9KPhPP2pjAYUGWSo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=ES2Tku5Vqp6uNLftwsnr2OTL7mxiXkaA6g7sFg7tWsOvyCkIbCYzigURs91Uk3s0NCps2ftCTr2/W9WKOFTUsnbn9haoKLdNBzE40iO6E1wBh4yOMVShH5M39HAnyrB3gwzoF7weMSmz1IpVTyAxjsPUstTtTP/H5leGsiG8z4U= Received: by 10.100.10.20 with SMTP id 20mr759947anj.1184831148308; Thu, 19 Jul 2007 00:45:48 -0700 (PDT) Received: by 10.100.33.15 with HTTP; Thu, 19 Jul 2007 00:45:48 -0700 (PDT) Message-ID: <6f9f8f4a0707190045g609abca4kfa3bb39d93604081@mail.gmail.com> Date: Thu, 19 Jul 2007 09:45:48 +0200 From: "Loup Vaillant" To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays In-Reply-To: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> X-j-chkmail-Score: MSGID : 469F16AC.002 on discorde : j-chkmail score : X : 0/20 1 0.000 -> 1 X-Miltered: at discorde with ID 469F16AC.002 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; arrays:01 ocaml:01 arrays:01 iterating:01 stack:01 bounded:01 stack:01 worst-case:01 amortized:01 uneasy:98 iterators:01 iterators:01 overflows:01 extensible:01 extensible:01 2007/7/18, Luca de Alfaro : > Dear All, > > I would like to share with you an Ocaml implementation of extensible arrays. > The implementation is functional, based on balanced trees (and on the code > for Set and Map); I called the module Vec (for vector - I like > short names). You can find it at > http://www.dealfaro.com/home/vec.html > Module Vec provides, in log-time: > > Access and modification to arbitrary elements ( Vec.put n el v puts element > el in position n of vector v, for instance). > Concatenation > Insertion and removal of elements from arbitrary positions (auto-enlarging > and auto-shrinking the vector). > as well as: > > All kind of iterators and some visitor functions. > Efficient translation to/from lists and arrays. > An advantage of Vec over List, for very large data structures, is that > iterating over a Vec of size n requires always stack depth bounded by log n: > with lists, non-tail-recursive functions can cause stack overflows. > > I have been using this data structure for some months, and it has been very > handy in a large number of occasions. I hope it can be as useful to you. > > I would appreciate all advice and feedback. Also, is there a repository > where I should upload it? Do you think it is worth it? > > All the best, > > Luca Very interesting. I always felt uneasy about the presence of imperative arrays without a functional counterpart. I can't wait to try it. Looking at your array type definition, I assume that the timings you specified are worst-case? Is it possible to achieve better (but amortized) bounds? Do you think it would be worth the trouble? I didn't see in your specs the complexity of your iterators. Does these work in linear time, like those of the List and Array module? Regards, Loup From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.3 required=5.0 tests=AWL,SPF_SOFTFAIL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id B20ADBC69 for ; Thu, 19 Jul 2007 10:18:32 +0200 (CEST) Received: from animal.inescn.pt (ns.inescn.pt [192.35.246.1]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6J8IVXe012041 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO) for ; Thu, 19 Jul 2007 10:18:32 +0200 Received: from localhost (localhost [127.0.0.1]) by animal.inescn.pt (8.13.8/8.13.6/9) with ESMTP id l6J8IT33023225; Thu, 19 Jul 2007 09:18:29 +0100 (WEST) X-Virus-Scanned: amavisd-new at inescporto.pt Received: from animal.inescn.pt ([127.0.0.1]) by localhost (animal.inescn.pt [127.0.0.1]) (amavisd-new, port 10024) with LMTP id p3LkEVxcP212; Thu, 19 Jul 2007 09:18:04 +0100 (WEST) Received: from [194.117.30.94] (morfina.inescn.pt [194.117.30.94]) by animal.inescn.pt (8.13.8/8.13.8/11) with ESMTP id l6J8Hsu3023197; Thu, 19 Jul 2007 09:17:56 +0100 (WEST) Message-ID: <469F1E32.3040502@inescporto.pt> Date: Thu, 19 Jul 2007 09:17:54 +0100 From: Hugo Ferreira User-Agent: Thunderbird 2.0.0.4 (X11/20070618) MIME-Version: 1.0 To: Loup Vaillant Cc: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays References: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> <6f9f8f4a0707190045g609abca4kfa3bb39d93604081@mail.gmail.com> In-Reply-To: <6f9f8f4a0707190045g609abca4kfa3bb39d93604081@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit X-Miltered: at concorde with ID 469F1E57.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; arrays:01 filliatre:01 arrays:01 backtracking:01 nodes:01 backtracking:01 lri:01 filliatr:01 publis:01 ocaml:01 iterating:01 stack:01 bounded:01 stack:01 worst-case:01 Hello, For those of you interested in functional array consider Sylvain Conchon and Jean-Christophe Filliātre paper in [1]. The Union-Find (UF) uses persistent arrays as its base data structure. I have made tests with the UF using the code provided, an implementation of k-BUF data structure (delayed backtracking) and altered version of the persistent array (fat nodes + delayed backtracking). The tests I did show that this version of persistent arrays is very efficient (especially for single threaded backtracking). Maybe Luca could pit his implementation against that of the article and report on how they fare? Regards, Hugo Ferreira. [1] http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.ps Loup Vaillant wrote: > 2007/7/18, Luca de Alfaro : >> Dear All, >> >> I would like to share with you an Ocaml implementation of extensible >> arrays. >> The implementation is functional, based on balanced trees (and on the >> code >> for Set and Map); I called the module Vec (for vector - I like >> short names). You can find it at >> http://www.dealfaro.com/home/vec.html >> Module Vec provides, in log-time: >> >> Access and modification to arbitrary elements ( Vec.put n el v puts >> element >> el in position n of vector v, for instance). >> Concatenation >> Insertion and removal of elements from arbitrary positions >> (auto-enlarging >> and auto-shrinking the vector). >> as well as: >> >> All kind of iterators and some visitor functions. >> Efficient translation to/from lists and arrays. >> An advantage of Vec over List, for very large data structures, is that >> iterating over a Vec of size n requires always stack depth bounded by >> log n: >> with lists, non-tail-recursive functions can cause stack overflows. >> >> I have been using this data structure for some months, and it has been >> very >> handy in a large number of occasions. I hope it can be as useful to you. >> >> I would appreciate all advice and feedback. Also, is there a repository >> where I should upload it? Do you think it is worth it? >> >> All the best, >> >> Luca > > Very interesting. I always felt uneasy about the presence of > imperative arrays without a functional counterpart. I can't wait to > try it. > > Looking at your array type definition, I assume that the timings you > specified are worst-case? Is it possible to achieve better (but > amortized) bounds? Do you think it would be worth the trouble? > > I didn't see in your specs the complexity of your iterators. Does > these work in linear time, like those of the List and Array module? > > Regards, > Loup > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.0 required=5.0 tests=AWL,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 3A0F3BC69 for ; Fri, 20 Jul 2007 09:35:41 +0200 (CEST) Received: from an-out-0708.google.com (an-out-0708.google.com [209.85.132.250]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6K7ZesE026014 for ; Fri, 20 Jul 2007 09:35:40 +0200 Received: by an-out-0708.google.com with SMTP id b15so209605ana for ; Fri, 20 Jul 2007 00:35:40 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=hlD7QybHG2NMKmlGu+tv2YzOimgM02qyjIfcS5Q3jp8pvk7S8pa2y6My31I8F6SzbmV9WQD2j8pS7MkBh6wAFOJedWnN2fD9LAzThvEpWzybFZBOiNfqJZjDm9HWrYdTe+u43MLyi5kxk+cB8QV2Im/M4c0RUzNVNGex0jcX1kQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=U/nb+w0FDw70KSqG794EhGSJ5DlanA3Fadf5l7t7Kx1SftJqRl6KUWsowsRz1jT/nTCZAasD5TqexaG4PBiX0ZdqrqKwfgsn9W0UWsrR3vpTwCnHpTbllFE9yMBexhrJ7U8G0I7eQsXxbe3O2j86JNwFZZrzsQPqyk23drmgcl4= Received: by 10.100.124.5 with SMTP id w5mr100527anc.1184916939968; Fri, 20 Jul 2007 00:35:39 -0700 (PDT) Received: by 10.100.33.15 with HTTP; Fri, 20 Jul 2007 00:35:39 -0700 (PDT) Message-ID: <6f9f8f4a0707200035l4351c6b6g585b4edd2c51faa6@mail.gmail.com> Date: Fri, 20 Jul 2007 09:35:39 +0200 From: "Loup Vaillant" To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays In-Reply-To: <28fa90930707190959g66cb2e6wc7a446af2a6dfb72@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> <6f9f8f4a0707190045g609abca4kfa3bb39d93604081@mail.gmail.com> <28fa90930707190959g66cb2e6wc7a446af2a6dfb72@mail.gmail.com> X-j-chkmail-Score: MSGID : 46A065CC.000 on discorde : j-chkmail score : X : 0/20 1 0.000 -> 1 X-Miltered: at discorde with ID 46A065CC.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; arrays:01 logarithmic:01 inserting:01 okasaki's:01 iterator:01 subrange:01 iterators:01 syntactic:01 extensible:01 caml-list:01 imperative:01 lazy:02 lazy:02 data:02 bounds:02 2007/7/19, Luca de Alfaro : > > For get/set, the worst case and the average case are both logarithmic: it's > a balanced tree (if you are lucky, you can find your answer at the root! I did. :) > ;-). I am open to new ideas. In part, I wanted a simple data structure > (easier to extend, among other things). Also, I use Set, Map, etc, quite > often, and those are also balanced trees: I thought that if I can live with > those, I can probably live with Vec as well. So can I. Your current implementation is already very attractive, and looks very usable. For the new idea, have you thought of making (or specifying) syntactic sugar to use your array? About improving performance, here is my guess : there is no way to lower the bounds on get and set. However, the average cost of insert may already be O(1), provided you use your array the same way you would use an imperative version of it (more accurately, not inserting an element to an old version of your array). The same may be true for remove. Therefore, if I guess right, to take advantage of persistence AND have insert perform in O(1) average, you would have to use (and pay for) lazy evaluation. How, I don't know (yet). (Note that I have stolen this idea from Okasaki's book) > For an iterator, the worst case is as follows, where n is the size of the > Vec: > > if you iterate on the whole Vec, then O(n) > if you iterate over m elements (you can iterate on a subrange), then O(m + > log n). > That's why I have iterators: you can also iterate via a for loop, using get > to access the elements, but then the time becomes O(n log n) for the first > case, and O(m log n) for the second case. That is why I wondered if lazy evaluation was worth the trouble at all : most of the time, we iterate rather than insert or remove elements. I only regret the absence of filter. Is there a way to obtain a efficient filter? (Well, if my guess above is right, a naive implementation of filter would already be quite efficient...) Regards, Loup From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.5 required=5.0 tests=AWL,HTML_MESSAGE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 06726BD8A for ; Fri, 20 Jul 2007 09:55:24 +0200 (CEST) Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.231]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6JGpMXa031477 for ; Thu, 19 Jul 2007 18:51:23 +0200 Received: by wr-out-0506.google.com with SMTP id i21so528621wra for ; Thu, 19 Jul 2007 09:51:22 -0700 (PDT) Received: by 10.142.76.4 with SMTP id y4mr219827wfa.1184863880949; Thu, 19 Jul 2007 09:51:20 -0700 (PDT) Received: by 10.142.72.4 with HTTP; Thu, 19 Jul 2007 09:51:20 -0700 (PDT) Message-ID: <28fa90930707190951y6e324750haf4067587fe1dcc7@mail.gmail.com> Date: Thu, 19 Jul 2007 09:51:20 -0700 From: "Luca de Alfaro" To: "Hugo Ferreira" Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays Cc: "Loup Vaillant" , caml-list@yquem.inria.fr In-Reply-To: <469F1E32.3040502@inescporto.pt> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_91344_15498134.1184863880917" References: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> <6f9f8f4a0707190045g609abca4kfa3bb39d93604081@mail.gmail.com> <469F1E32.3040502@inescporto.pt> X-Miltered: at discorde with ID 469F968A.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; arrays:01 pointer:01 arrays:01 logarithmic:01 filliatre:01 backtracking:01 nodes:01 backtracking:01 lri:01 filliatr:01 publis:01 ocaml:01 iterating:01 bounded:01 worst-case:01 ------=_Part_91344_15498134.1184863880917 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Dear All, thanks for the pointer to the excellent paper. First, let me say that my Vec data structure was born to fill a need I had while working on a project= : while it has been useful to me, I certainly do not claim it is the best tha= t can be done, so I am very grateful for these suggestions! My Vec data structure is different from persistent arrays. It is likely to be less efficient for get/set use. However, it offers at logarithmic cost insertion/removal operations that ar= e not present in the standard persistent arrays. Consider a Vec a of size 10. - Vec.insert 3 d a inserts value d in position 3 of a, shifting elements 3..9 of a to positions 4..10. - Vec.remove 3 a removes the element in position 3 of a, shifting elements 4..9 to positions 3..8. Vec.pop is similar and returns the removed element as well. - Vec.concat works in log-time. These operations are necessary if you want to use a Vec as a FIFO, for example (you append elements at the end, and you get the first element via Vec.pop 0 a). In many algorithms, it is often handy to be able to remove/insert elements in the middle of a list. In summary, I don't think the Vec data structure is a replacement for array= s or persistent arrays in numerically-intensive work. But if you want a flexible data structure for the 90% of the code that is not peformance critical, they can be useful. Now the question is: can one get better get/set efficiency while retaining the ability to insert/remove elements? (I am sure that there is something better to be done...). Luca On 7/19/07, Hugo Ferreira wrote: > > Hello, > > For those of you interested in functional array consider Sylvain Conchon > and Jean-Christophe Filli=E2tre paper in [1]. The Union-Find (UF) uses > persistent arrays as its base data structure. I have made tests with the > UF using the code provided, an implementation of k-BUF data structure > (delayed backtracking) and altered version of the persistent array (fat > nodes + delayed backtracking). The tests I did show that this version of > persistent arrays is very efficient (especially for single threaded > backtracking). > > Maybe Luca could pit his implementation against that of the article and > report on how they fare? > > Regards, > Hugo Ferreira. > > [1] http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.ps > > Loup Vaillant wrote: > > 2007/7/18, Luca de Alfaro : > >> Dear All, > >> > >> I would like to share with you an Ocaml implementation of extensible > >> arrays. > >> The implementation is functional, based on balanced trees (and on the > >> code > >> for Set and Map); I called the module Vec (for vector - I like > >> short names). You can find it at > >> http://www.dealfaro.com/home/vec.html > >> Module Vec provides, in log-time: > >> > >> Access and modification to arbitrary elements ( Vec.put n el v puts > >> element > >> el in position n of vector v, for instance). > >> Concatenation > >> Insertion and removal of elements from arbitrary positions > >> (auto-enlarging > >> and auto-shrinking the vector). > >> as well as: > >> > >> All kind of iterators and some visitor functions. > >> Efficient translation to/from lists and arrays. > >> An advantage of Vec over List, for very large data structures, is that > >> iterating over a Vec of size n requires always stack depth bounded by > >> log n: > >> with lists, non-tail-recursive functions can cause stack overflows. > >> > >> I have been using this data structure for some months, and it has been > >> very > >> handy in a large number of occasions. I hope it can be as useful to > you. > >> > >> I would appreciate all advice and feedback. Also, is there a > repository > >> where I should upload it? Do you think it is worth it? > >> > >> All the best, > >> > >> Luca > > > > Very interesting. I always felt uneasy about the presence of > > imperative arrays without a functional counterpart. I can't wait to > > try it. > > > > Looking at your array type definition, I assume that the timings you > > specified are worst-case? Is it possible to achieve better (but > > amortized) bounds? Do you think it would be worth the trouble? > > > > I didn't see in your specs the complexity of your iterators. Does > > these work in linear time, like those of the List and Array module? > > > > Regards, > > Loup > > > > _______________________________________________ > > Caml-list mailing list. Subscription management: > > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > > Archives: http://caml.inria.fr > > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > > Bug reports: http://caml.inria.fr/bin/caml-bugs > > > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > ------=_Part_91344_15498134.1184863880917 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Dear All,

thanks for the pointer to the excellent paper.  Firs= t, let me say that my Vec data structure was born to fill a need I had whil= e working on a project: while it has been useful to me, I certainly do not = claim it is the best that can be done, so I am very grateful for these sugg= estions!=20

My Vec data structure is different from persistent arrays.  It= is likely to be less efficient for get/set use.
However, it offers at = logarithmic cost insertion/removal operations that are not present in the s= tandard persistent arrays. =20
Consider a Vec a of size 10.
  • Vec.insert 3 d a inserts value= d in position 3 of a, shifting elements 3..9 of a to positions 4..10.
    =
  • Vec.remove 3 a removes the element in position 3 of a, shifting el= ements 4..9 to positions 3..8. =20 Vec.pop is similar and returns the removed element as well.
  • Ve= c.concat works in log-time.
These operations are necessary if= you want to use a Vec as a FIFO, for example (you append elements at the e= nd, and you get the first element via=20 Vec.pop 0 a).  In many algorithms, it is often handy to be able to rem= ove/insert elements in the middle of a list.

In summary, I don'= t think the Vec data structure is a replacement for arrays or persistent ar= rays in numerically-intensive work.  But if you want a flexible data s= tructure for the 90% of the code that is not peformance critical, they can = be useful.=20
Now the question is: can one get better get/set efficiency while retain= ing the ability to insert/remove elements?  (I am sure that there is s= omething better to be done...).

Luca

On 7/19/07, Hugo Ferreira <hmf@inescporto.pt> wrote: Hello,

For those of you interested in functional array consider Sylv= ain Conchon
and Jean-Christophe Filli=E2tre paper in [1]. The Union-Find= (UF) uses
persistent arrays as its base data structure. I have made tes= ts with the
UF using the code provided, an implementation of k-BUF data structure(delayed backtracking) and altered version of the persistent array (fatnodes + delayed backtracking). The tests I did show that this version of
persistent arrays is very efficient (especially for single threaded
= backtracking).

Maybe Luca could pit his implementation against that = of the article and
report on how they fare?

Regards,
Hugo Ferr= eira.

[1] http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.ps

Loup Vailla= nt wrote:
> 2007/7/18, Luca de Alfaro < luca@dealfaro.org>:
>> Dear All,
>>
>> I = would like to share with you an Ocaml implementation of extensible
>&= gt; arrays.
>>  The implementation is functional, based = on balanced trees (and on the
>> code
>> for Set and Map); I called the module Vec (fo= r vector - I like
>> short names).  You can find it at>> http://www.dea= lfaro.com/home/vec.html
>> Module Vec provides, in log-time:
>>
>>&= nbsp; Access and modification to arbitrary elements ( Vec.put n el v p= uts
>> element
>> el in position n of vector v, for insta= nce).
>> Concatenation
>> Insertion and removal of elements from a= rbitrary positions
>> (auto-enlarging
>> and auto-shrinki= ng the vector).
>> as well as:
>>
>> All kind of= iterators and some visitor functions.
>> Efficient translation to/from lists and arrays.
>> An= advantage of Vec over List, for very large data structures, is that
>= ;> iterating over a Vec of size n requires always stack depth bounded by
>> log n:
>> with lists, non-tail-recursive functions ca= n cause stack overflows.
>>
>> I have been using this dat= a structure for some months, and it has been
>> very
>> h= andy in a large number of occasions.  I hope it can be as useful = to you.
>>
>> I would appreciate all advice and feedback. &= nbsp;Also, is there a repository
>> where I should upload it? = ; Do you think it is worth it?
>>
>> All the best,>>
>> Luca
>
> Very interesting. I always felt uneasy about the presence = of
> imperative arrays without a functional counterpart. I can't = wait to
> try it.
>
> Looking at your array type definiti= on, I assume that the timings you
> specified are worst-case? Is it possible to achieve better (but> amortized) bounds? Do you think it would be worth the trouble?
>= ;
> I didn't see in your specs the complexity of your iterators. = Does
> these work in linear time, like those of the List and Array module= ?
>
> Regards,
> Loup
>
> ___________________= ____________________________
> Caml-list mailing list. Subscription m= anagement:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Arc= hives: http://caml.inria.fr
> Be= ginner's list:=20 http://groups.yah= oo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
_______________________________________________
Caml-list mailing li= st. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/ca= ml-list
Archives: http://caml.inria.fr=
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: <= a href=3D"http://caml.inria.fr/bin/caml-bugs"> http://caml.inria.fr/bin/caml-bugs

------=_Part_91344_15498134.1184863880917-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.3 required=5.0 tests=AWL,HTML_MESSAGE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 23677BDC1 for ; Fri, 20 Jul 2007 09:55:36 +0200 (CEST) Received: from wr-out-0506.google.com (wr-out-0506.google.com [64.233.184.224]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6JHDiZF025204 for ; Thu, 19 Jul 2007 19:13:46 +0200 Received: by wr-out-0506.google.com with SMTP id i21so534431wra for ; Thu, 19 Jul 2007 10:13:44 -0700 (PDT) Received: by 10.142.77.11 with SMTP id z11mr216835wfa.1184865222504; Thu, 19 Jul 2007 10:13:42 -0700 (PDT) Received: by 10.142.72.4 with HTTP; Thu, 19 Jul 2007 10:13:42 -0700 (PDT) Message-ID: <28fa90930707191013y11fa73abx617c5b813e72b646@mail.gmail.com> Date: Thu, 19 Jul 2007 10:13:42 -0700 From: "Luca de Alfaro" To: "Hugo Ferreira" Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays Cc: "Loup Vaillant" , caml-list@yquem.inria.fr In-Reply-To: <469F1E32.3040502@inescporto.pt> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_91668_27897791.1184865222473" References: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> <6f9f8f4a0707190045g609abca4kfa3bb39d93604081@mail.gmail.com> <469F1E32.3040502@inescporto.pt> X-Miltered: at concorde with ID 469F9BC8.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; arrays:01 nonstandard:01 filliatre:01 arrays:01 backtracking:01 nodes:01 backtracking:01 lri:01 filliatr:01 publis:01 ocaml:01 iterating:01 bounded:01 worst-case:01 amortized:01 ------=_Part_91668_27897791.1184865222473 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Reading the paper, I noticed that my name "put" for assigning a value to a position is highly nonstandard. I have changed it to "set" for version 1.1 of Vec. Hopefully, changing so soon will avoid confusion... sorry. Luca On 7/19/07, Hugo Ferreira wrote: > > Hello, > > For those of you interested in functional array consider Sylvain Conchon > and Jean-Christophe Filli=E2tre paper in [1]. The Union-Find (UF) uses > persistent arrays as its base data structure. I have made tests with the > UF using the code provided, an implementation of k-BUF data structure > (delayed backtracking) and altered version of the persistent array (fat > nodes + delayed backtracking). The tests I did show that this version of > persistent arrays is very efficient (especially for single threaded > backtracking). > > Maybe Luca could pit his implementation against that of the article and > report on how they fare? > > Regards, > Hugo Ferreira. > > [1] http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.ps > > Loup Vaillant wrote: > > 2007/7/18, Luca de Alfaro : > >> Dear All, > >> > >> I would like to share with you an Ocaml implementation of extensible > >> arrays. > >> The implementation is functional, based on balanced trees (and on the > >> code > >> for Set and Map); I called the module Vec (for vector - I like > >> short names). You can find it at > >> http://www.dealfaro.com/home/vec.html > >> Module Vec provides, in log-time: > >> > >> Access and modification to arbitrary elements ( Vec.put n el v puts > >> element > >> el in position n of vector v, for instance). > >> Concatenation > >> Insertion and removal of elements from arbitrary positions > >> (auto-enlarging > >> and auto-shrinking the vector). > >> as well as: > >> > >> All kind of iterators and some visitor functions. > >> Efficient translation to/from lists and arrays. > >> An advantage of Vec over List, for very large data structures, is that > >> iterating over a Vec of size n requires always stack depth bounded by > >> log n: > >> with lists, non-tail-recursive functions can cause stack overflows. > >> > >> I have been using this data structure for some months, and it has been > >> very > >> handy in a large number of occasions. I hope it can be as useful to > you. > >> > >> I would appreciate all advice and feedback. Also, is there a > repository > >> where I should upload it? Do you think it is worth it? > >> > >> All the best, > >> > >> Luca > > > > Very interesting. I always felt uneasy about the presence of > > imperative arrays without a functional counterpart. I can't wait to > > try it. > > > > Looking at your array type definition, I assume that the timings you > > specified are worst-case? Is it possible to achieve better (but > > amortized) bounds? Do you think it would be worth the trouble? > > > > I didn't see in your specs the complexity of your iterators. Does > > these work in linear time, like those of the List and Array module? > > > > Regards, > > Loup > > > > _______________________________________________ > > Caml-list mailing list. Subscription management: > > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > > Archives: http://caml.inria.fr > > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > > Bug reports: http://caml.inria.fr/bin/caml-bugs > > > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > ------=_Part_91668_27897791.1184865222473 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline Reading the paper, I noticed that my name "put" for assigning a v= alue to a position is highly nonstandard.
I have changed it to "se= t" for version 1.1 of Vec.  Hopefully, changing so soon will avoi= d confusion... sorry.=20

Luca

On 7/19/07, Hugo Ferreira <hmf@inescporto.pt> wrote:
Hello,

For those of you interested in functional array consider Sylv= ain Conchon
and Jean-Christophe Filli=E2tre paper in [1]. The Union-Find= (UF) uses
persistent arrays as its base data structure. I have made tes= ts with the
UF using the code provided, an implementation of k-BUF data structure(delayed backtracking) and altered version of the persistent array (fatnodes + delayed backtracking). The tests I did show that this version of
persistent arrays is very efficient (especially for single threaded
= backtracking).

Maybe Luca could pit his implementation against that = of the article and
report on how they fare?

Regards,
Hugo Ferr= eira.

[1] http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.ps

Loup Vailla= nt wrote:
> 2007/7/18, Luca de Alfaro < luca@dealfaro.org>:
>> Dear All,
>>
>> I = would like to share with you an Ocaml implementation of extensible
>&= gt; arrays.
>>  The implementation is functional, based = on balanced trees (and on the
>> code
>> for Set and Map); I called the module Vec (fo= r vector - I like
>> short names).  You can find it at>> http://www.dea= lfaro.com/home/vec.html
>> Module Vec provides, in log-time:
>>
>>&= nbsp; Access and modification to arbitrary elements ( Vec.put n el v p= uts
>> element
>> el in position n of vector v, for insta= nce).
>> Concatenation
>> Insertion and removal of elements from a= rbitrary positions
>> (auto-enlarging
>> and auto-shrinki= ng the vector).
>> as well as:
>>
>> All kind of= iterators and some visitor functions.
>> Efficient translation to/from lists and arrays.
>> An= advantage of Vec over List, for very large data structures, is that
>= ;> iterating over a Vec of size n requires always stack depth bounded by
>> log n:
>> with lists, non-tail-recursive functions ca= n cause stack overflows.
>>
>> I have been using this dat= a structure for some months, and it has been
>> very
>> h= andy in a large number of occasions.  I hope it can be as useful = to you.
>>
>> I would appreciate all advice and feedback. &= nbsp;Also, is there a repository
>> where I should upload it? = ; Do you think it is worth it?
>>
>> All the best,>>
>> Luca
>
> Very interesting. I always felt uneasy about the presence = of
> imperative arrays without a functional counterpart. I can't = wait to
> try it.
>
> Looking at your array type definiti= on, I assume that the timings you
> specified are worst-case? Is it possible to achieve better (but> amortized) bounds? Do you think it would be worth the trouble?
>= ;
> I didn't see in your specs the complexity of your iterators. = Does
> these work in linear time, like those of the List and Array module= ?
>
> Regards,
> Loup
>
> ___________________= ____________________________
> Caml-list mailing list. Subscription m= anagement:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Arc= hives: http://caml.inria.fr
> Be= ginner's list:=20 http://groups.yah= oo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
_______________________________________________
Caml-list mailing li= st. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/ca= ml-list
Archives: http://caml.inria.fr=
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: <= a href=3D"http://caml.inria.fr/bin/caml-bugs"> http://caml.inria.fr/bin/caml-bugs

------=_Part_91668_27897791.1184865222473-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=HTML_MESSAGE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id A6B68BE29 for ; Fri, 20 Jul 2007 10:15:43 +0200 (CEST) Received: from el-out-1112.google.com (el-out-1112.google.com [209.85.162.178]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6JGxI6g000488 for ; Thu, 19 Jul 2007 18:59:19 +0200 Received: by el-out-1112.google.com with SMTP id o28so24976ele for ; Thu, 19 Jul 2007 09:59:16 -0700 (PDT) Received: by 10.142.157.15 with SMTP id f15mr213976wfe.1184864354817; Thu, 19 Jul 2007 09:59:14 -0700 (PDT) Received: by 10.142.72.4 with HTTP; Thu, 19 Jul 2007 09:59:14 -0700 (PDT) Message-ID: <28fa90930707190959g66cb2e6wc7a446af2a6dfb72@mail.gmail.com> Date: Thu, 19 Jul 2007 09:59:14 -0700 From: "Luca de Alfaro" To: "Loup Vaillant" Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays Cc: caml-list@yquem.inria.fr In-Reply-To: <6f9f8f4a0707190045g609abca4kfa3bb39d93604081@mail.gmail.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_91464_8084132.1184864354783" References: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> <6f9f8f4a0707190045g609abca4kfa3bb39d93604081@mail.gmail.com> X-Miltered: at discorde with ID 469F9866.001 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; arrays:01 ocaml:01 arrays:01 iterating:01 bounded:01 worst-case:01 amortized:01 logarithmic:01 iterator:01 subrange:01 beginner's:01 ocaml:01 bug:01 iterating:01 bounded:01 ------=_Part_91464_8084132.1184864354783 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline On 7/19/07, Loup Vaillant wrote: > > 2007/7/18, Luca de Alfaro : > > Dear All, > > > > I would like to share with you an Ocaml implementation of extensible > arrays. > > The implementation is functional, based on balanced trees (and on the > code > > for Set and Map); I called the module Vec (for vector - I like > > short names). You can find it at > > http://www.dealfaro.com/home/vec.html > > Module Vec provides, in log-time: > > > > Access and modification to arbitrary elements ( Vec.put n el v puts > element > > el in position n of vector v, for instance). > > Concatenation > > Insertion and removal of elements from arbitrary positions > (auto-enlarging > > and auto-shrinking the vector). > > as well as: > > > > All kind of iterators and some visitor functions. > > Efficient translation to/from lists and arrays. > > An advantage of Vec over List, for very large data structures, is that > > iterating over a Vec of size n requires always stack depth bounded by > log n: > > with lists, non-tail-recursive functions can cause stack overflows. > > > > I have been using this data structure for some months, and it has been > very > > handy in a large number of occasions. I hope it can be as useful to > you. > > > > I would appreciate all advice and feedback. Also, is there a repository > > where I should upload it? Do you think it is worth it? > > > > All the best, > > > > Luca > > Very interesting. I always felt uneasy about the presence of > imperative arrays without a functional counterpart. I can't wait to > try it. > > Looking at your array type definition, I assume that the timings you > specified are worst-case? Is it possible to achieve better (but > amortized) bounds? Do you think it would be worth the trouble? > > I didn't see in your specs the complexity of your iterators. Does > these work in linear time, like those of the List and Array module? > > Regards, > Loup For get/set, the worst case and the average case are both logarithmic: it's a balanced tree (if you are lucky, you can find your answer at the root! ;-). I am open to new ideas. In part, I wanted a simple data structure (easier to extend, among other things). Also, I use Set, Map, etc, quite often, and those are also balanced trees: I thought that if I can live with those, I can probably live with Vec as well. For an iterator, the worst case is as follows, where n is the size of the Vec: - if you iterate on the whole Vec, then O(n) - if you iterate over m elements (you can iterate on a subrange), then O(m + log n). That's why I have iterators: you can also iterate via a for loop, using get to access the elements, but then the time becomes O(n log n) for the first case, and O(m log n) for the second case. Luca _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > ------=_Part_91464_8084132.1184864354783 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline On 7/19/07, Loup Vaillant <loup.vaillant@gmail.com> wrote:
2007/7/18, Luca de Alfaro <luca@dealfaro.org>:
> Dear All,
>
> I would like to share with you an Ocaml implementation of extensible arrays.
>  The implementation is functional, based on balanced trees (and on the code
> for Set and Map); I called the module Vec (for vector - I like
> short names).  You can find it at
> http://www.dealfaro.com/home/vec.html
> Module Vec provides, in log-time:
>
>  Access and modification to arbitrary elements ( Vec.put n el v puts element
> el in position n of vector v, for instance).
> Concatenation
> Insertion and removal of elements from arbitrary positions (auto-enlarging
> and auto-shrinking the vector).
> as well as:
>
> All kind of iterators and some visitor functions.
> Efficient translation to/from lists and arrays.
> An advantage of Vec over List, for very large data structures, is that
> iterating over a Vec of size n requires always stack depth bounded by log n:
> with lists, non-tail-recursive functions can cause stack overflows.
>
> I have been using this data structure for some months, and it has been very
> handy in a large number of occasions.  I hope it can be as useful to you.
>
> I would appreciate all advice and feedback.  Also, is there a repository
> where I should upload it?  Do you think it is worth it?
>
> All the best,
>
> Luca

Very interesting. I always felt uneasy about the presence of
imperative arrays without a functional counterpart. I can't wait to
try it.

Looking at your array type definition, I assume that the timings you
specified are worst-case? Is it possible to achieve better (but
amortized) bounds? Do you think it would be worth the trouble?

I didn't see in your specs the complexity of your iterators. Does
these work in linear time, like those of the List and Array module?

Regards,
Loup

For get/set, the worst case and the average case are both logarithmic: it's a balanced tree (if you are lucky, you can find your answer at the root! ;-).  I am open to new ideas.  In part, I wanted a simple data structure (easier to extend, among other things).  Also, I use Set, Map, etc, quite often, and those are also balanced trees: I thought that if I can live with those, I can probably live with Vec as well.

For an iterator, the worst case is as follows, where n is the size of the Vec:
  • if you iterate on the whole Vec, then O(n)
  • if you iterate over m elements (you can iterate on a subrange), then O(m + log n).
That's why I have iterators: you can also iterate via a for loop, using get to access the elements, but then the time becomes O(n log n) for the first case, and O(m log n) for the second case.

 Luca

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

------=_Part_91464_8084132.1184864354783-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id DACB5BE3C for ; Fri, 20 Jul 2007 10:23:37 +0200 (CEST) Received: from ptb-relay01.plus.net (ptb-relay01.plus.net [212.159.14.212]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6K8Nbja002355 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO) for ; Fri, 20 Jul 2007 10:23:37 +0200 Received: from [80.229.56.224] (helo=beast.local) by ptb-relay01.plus.net with esmtp (Exim) id 1IBnlw-00043l-EL for caml-list@yquem.inria.fr; Fri, 20 Jul 2007 09:23:36 +0100 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays Date: Fri, 20 Jul 2007 09:14:18 +0100 User-Agent: KMail/1.9.7 References: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> <28fa90930707190959g66cb2e6wc7a446af2a6dfb72@mail.gmail.com> <6f9f8f4a0707200035l4351c6b6g585b4edd2c51faa6@mail.gmail.com> In-Reply-To: <6f9f8f4a0707200035l4351c6b6g585b4edd2c51faa6@mail.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200707200914.18557.jon@ffconsultancy.com> X-Miltered: at discorde with ID 46A07109.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; arrays:01 camlp:01 notation:01 node:01 node:01 unbalanced:01 unbalanced:01 subnodes:01 subnode:01 ocaml:01 ocaml:01 frog:98 wrote:01 syntactic:01 extensible:01 On Friday 20 July 2007 08:35:39 Loup Vaillant wrote: > > ;-). I am open to new ideas. In part, I wanted a simple data structure > > (easier to extend, among other things). Also, I use Set, Map, etc, quite > > often, and those are also balanced trees: I thought that if I can live > > with those, I can probably live with Vec as well. This is the beginnings of an awesome data structure! > So can I. Your current implementation is already very attractive, and > looks very usable. For the new idea, have you thought of making (or > specifying) syntactic sugar to use your array? Should be very easy using the new camlp4. You might like to add a slicing notation as well. :-) > About improving performance... I have two suggestions: 1. Add an extra node representing single elements that replaces Node(Empty, _, Empty). The reduces GC stress enormously and makes the whole thing ~30% faster. 2. Allow unbalanced sub trees. Balancing is slow and folds and maps don't need to rebalance, but "get" should force rebalancing. Extracting subarrays should return an unbalanced result. > Is there a way to obtain a efficient filter? Yes. I discovered a most-excellent way to do this. It requires arbitrary metadata in every node, a constructor that composes subnodes to create the metadata for the parent and a filter function that can cull branches from the search tree. I used this in my Mathematica implementation to provide asymptotically fast filtering based upon lazily evaluated sets of symbols in each subnode. This gave huge performance improvements with no significant performance overhead. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. OCaml for Scientists http://www.ffconsultancy.com/products/ocaml_for_scientists/?e From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=HTML_MESSAGE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 3D449BC69 for ; Fri, 20 Jul 2007 17:42:50 +0200 (CEST) Received: from qb-out-0506.google.com (qb-out-0506.google.com [72.14.204.236]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6KFgmTb019539 for ; Fri, 20 Jul 2007 17:42:49 +0200 Received: by qb-out-0506.google.com with SMTP id e11so1581344qbe for ; Fri, 20 Jul 2007 08:42:48 -0700 (PDT) Received: by 10.142.97.20 with SMTP id u20mr43655wfb.1184946167443; Fri, 20 Jul 2007 08:42:47 -0700 (PDT) Received: by 10.142.72.4 with HTTP; Fri, 20 Jul 2007 08:42:47 -0700 (PDT) Message-ID: <28fa90930707200842q742d5f77y29885d0c7ebcc74c@mail.gmail.com> Date: Fri, 20 Jul 2007 08:42:47 -0700 From: "Luca de Alfaro" To: "Jon Harrop" Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays Cc: caml-list@yquem.inria.fr In-Reply-To: <200707200914.18557.jon@ffconsultancy.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_103877_9026133.1184946167406" References: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> <28fa90930707190959g66cb2e6wc7a446af2a6dfb72@mail.gmail.com> <6f9f8f4a0707200035l4351c6b6g585b4edd2c51faa6@mail.gmail.com> <200707200914.18557.jon@ffconsultancy.com> X-Miltered: at discorde with ID 46A0D7F8.002 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; arrays:01 camlp:01 notation:01 node:01 node:01 unbalanced:01 unbalanced:01 sub-array:01 subranges:01 subnodes:01 subnode:01 ocaml:01 ocaml:01 beginner's:01 bug:01 ------=_Part_103877_9026133.1184946167406 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline On 7/20/07, Jon Harrop wrote: > > On Friday 20 July 2007 08:35:39 Loup Vaillant wrote: > > > ;-). I am open to new ideas. In part, I wanted a simple data > structure > > > (easier to extend, among other things). Also, I use Set, Map, etc, > quite > > > often, and those are also balanced trees: I thought that if I can live > > > with those, I can probably live with Vec as well. > > This is the beginnings of an awesome data structure! Thanks! > So can I. Your current implementation is already very attractive, and > > looks very usable. For the new idea, have you thought of making (or > > specifying) syntactic sugar to use your array? > > Should be very easy using the new camlp4. You might like to add a slicing > notation as well. :-) I have to study how to do it ... this would be very interesting. Would you be interested in helping? > About improving performance... > > I have two suggestions: > > 1. Add an extra node representing single elements that replaces > Node(Empty, _, > Empty). The reduces GC stress enormously and makes the whole thing ~30% > faster. This is easy. I can give it a try soon, and see if I get something reasonable, or if the code blows up. 2. Allow unbalanced sub trees. Balancing is slow and folds and maps don't > need > to rebalance, but "get" should force rebalancing. Extracting subarrays > should > return an unbalanced result. This is almost easy. I would need to add a bit to each node to keep track of whether it's balanced... The penalty would be that the balancing function would need to do slightly more work to find out what has to be balanced. So perhaps it's not a good idea for append, insert, but it could make sense for concat (?), and especially for filter and sub... But I am hesitant. If one does concat, or one does sub to extract a sub-array, I wrote the code already so that sharing is maximized. What is the percentage of cases in which you get a Vec, but then don't do any get/set on it, and only iterate? Especially since you already have iterators on subranges? Do you think it's worth it? Anyone has advice? > > Is there a way to obtain a efficient filter? > > Yes. I discovered a most-excellent way to do this. It requires arbitrary > metadata in every node, a constructor that composes subnodes to create the > metadata for the parent and a filter function that can cull branches from > the > search tree. > > I used this in my Mathematica implementation to provide asymptotically > fast > filtering based upon lazily evaluated sets of symbols in each subnode. > This > gave huge performance improvements with no significant performance > overhead. I don't provide filter because..., well, I guess because I forgot: of all iterators, filter is the one I need most rarely. I should at least provide a simple implementation of it... Another operation I would like to implement is splice: splice i v1 v2 replaces the element in position i of vec v2 with vec v1. A sort of generalized insert. Dr Jon D Harrop, Flying Frog Consultancy Ltd. > OCaml for Scientists > http://www.ffconsultancy.com/products/ocaml_for_scientists/?e > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > BTW, Jon (and anyone else as well), let me know if you would like to help... I could create a Google Code project so that we get a svn repository for the code. Luca ------=_Part_103877_9026133.1184946167406 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline On 7/20/07, Jon Harrop <jon@ffconsultancy.com> wrote:
On Friday 20 July 2007 08:35:39 Loup Vaillant wrote:
> > ;-).  I am open to new ideas.  In part, I wanted a simple data structure
> > (easier to extend, among other things).  Also, I use Set, Map, etc, quite
> > often, and those are also balanced trees: I thought that if I can live
> > with those, I can probably live with Vec as well.

This is the beginnings of an awesome data structure!

Thanks!

> So can I. Your current implementation is already very attractive, and
> looks very usable. For the new idea, have you thought of making (or
> specifying) syntactic sugar to use your array?

Should be very easy using the new camlp4. You might like to add a slicing
notation as well. :-)

I have to study how to do it ... this would be very interesting. 
Would you be interested in helping?

> About improving performance...

I have two suggestions:

1. Add an extra node representing single elements that replaces Node(Empty, _,
Empty). The reduces GC stress enormously and makes the whole thing ~30%
faster.

This is easy.  I can give it a try soon, and see if I get something reasonable, or if the code blows up.  

2. Allow unbalanced sub trees. Balancing is slow and folds and maps don't need
to rebalance, but "get" should force rebalancing. Extracting subarrays should
return an unbalanced result.

This is almost easy.  I would need to add a bit to each node to keep track of whether it's balanced...
The penalty would be that the balancing function would need to do slightly more work to find out what has to be balanced.
So perhaps it's not a good idea for append, insert, but it could make sense for concat (?), and especially for filter and sub...
But I am hesitant.  If one does concat, or one does sub to extract a sub-array, I wrote the code already so that sharing is maximized. What is the percentage of cases in which you get a Vec, but then don't do any get/set on it, and only iterate?
Especially since you already have iterators on subranges?  Do you think it's worth it?  Anyone has advice?
 
> Is there a way to obtain a efficient filter?

Yes. I discovered a most-excellent way to do this. It requires arbitrary
metadata in every node, a constructor that composes subnodes to create the
metadata for the parent and a filter function that can cull branches from the
search tree.

I used this in my Mathematica implementation to provide asymptotically fast
filtering based upon lazily evaluated sets of symbols in each subnode. This
gave huge performance improvements with no significant performance overhead.

I don't provide filter because..., well, I guess because I forgot: of all iterators, filter is the one I need most rarely.
I should at least provide a simple implementation of it...

Another operation I would like to implement is splice:

splice i v1 v2

replaces the element in position i of vec v2 with vec v1.  A sort of generalized insert.

Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


BTW, Jon (and anyone else as well), let me know if you would like to help... I could create a Google Code project so that we get a svn repository for the code.

Luca
------=_Part_103877_9026133.1184946167406-- From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id D44C3BC69 for ; Fri, 20 Jul 2007 18:45:11 +0200 (CEST) Received: from smtp.janestcapital.com (www.janestcapital.com [66.155.124.107]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6KGjAr0028985 for ; Fri, 20 Jul 2007 18:45:11 +0200 Received: from [172.25.129.161] [38.96.172.125] by janestcapital.com with ESMTP (SMTPD-9.10) id A6A503B4; Fri, 20 Jul 2007 12:45:25 -0400 Message-ID: <46A0E690.2060805@janestcapital.com> Date: Fri, 20 Jul 2007 12:45:04 -0400 From: Brian Hurt User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Luca de Alfaro Cc: Jon Harrop , caml-list@yquem.inria.fr Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays References: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> <28fa90930707190959g66cb2e6wc7a446af2a6dfb72@mail.gmail.com> <6f9f8f4a0707200035l4351c6b6g585b4edd2c51faa6@mail.gmail.com> <200707200914.18557.jon@ffconsultancy.com> <28fa90930707200842q742d5f77y29885d0c7ebcc74c@mail.gmail.com> In-Reply-To: <28fa90930707200842q742d5f77y29885d0c7ebcc74c@mail.gmail.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Miltered: at discorde with ID 46A0E696.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; arrays:01 node:01 sub-array:01 subranges:01 citeseer:01 red-black:01 psu:98 iterators:01 wrote:01 wrote:01 extensible:01 caml-list:01 append:02 functional:02 slightly:03 Luca de Alfaro wrote: > > > This is almost easy. I would need to add a bit to each node to keep > track of whether it's balanced... > The penalty would be that the balancing function would need to do > slightly more work to find out what has to be balanced. > So perhaps it's not a good idea for append, insert, but it could make > sense for concat (?), and especially for filter and sub... > But I am hesitant. If one does concat, or one does sub to extract a > sub-array, I wrote the code already so that sharing is maximized. What > is the percentage of cases in which you get a Vec, but then don't do > any get/set on it, and only iterate? > Especially since you already have iterators on subranges? Do you > think it's worth it? Anyone has advice? I don't think that with laziness you can avoid enough work to make inserts O(1). On the other hand, sub and filter can be done in O(M + log N) easily enough, see: http://citeseer.ist.psu.edu/236207.html The paper is about red-black trees, but it's applicable to all rotation-balanced trees. Brian