From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.2 required=5.0 tests=AWL,HTML_MESSAGE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id 461FBBC69 for ; Tue, 21 Aug 2007 23:39:14 +0200 (CEST) Received: from nz-out-0506.google.com (nz-out-0506.google.com [64.233.162.226]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l7LLdCjj029191 for ; Tue, 21 Aug 2007 23:39:13 +0200 Received: by nz-out-0506.google.com with SMTP id z3so651941nzf for ; Tue, 21 Aug 2007 14:39:12 -0700 (PDT) Received: by 10.142.222.21 with SMTP id u21mr1130328wfg.1187732350125; Tue, 21 Aug 2007 14:39:10 -0700 (PDT) Received: by 10.142.80.4 with HTTP; Tue, 21 Aug 2007 14:39:10 -0700 (PDT) Message-ID: <28fa90930708211439k1a5c251cv20831c47877a904f@mail.gmail.com> Date: Tue, 21 Aug 2007 14:39:10 -0700 From: "Luca de Alfaro" To: caml-list Subject: Re: [Caml-list] Ropes and rope-like functional extensible vectors with O(1) prepend/append. In-Reply-To: <20070728233305.GB28879@tux-chan> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_141516_4740819.1187732350061" References: <20070728233305.GB28879@tux-chan> X-Miltered: at concorde with ID 46CB5B80.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; vectors:01 prepend:01 node:01 icfp:01 bounded:01 amortized:01 vectors:01 appending:01 arbitrarily:01 amortized:01 logarithmic:01 logarithmic:01 worst-case:01 locality:01 rebuilding:01 ------=_Part_141516_4740819.1187732350061 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline Great work! One small question: someone suggested me to rewrite Vec to add a Leaf (Node_without_children) construct, to cut down on the number of Empty instances I use. I have so far not done that (no reason in particular, but I was busy with other things). In light of your experiments, would you also advise me to do it? All the best, Luca On 7/28/07, Mauricio Fernandez wrote: > > In the aftermath of the ICFP contest, during which I used Luca de Alfaro's > Vec, I felt like implementing ropes, based on Boehm's paper and the > well-known > code included in his conservative garbage collector. > > I later realized that the basic implementation strategies ("dense" leaves, > bounded tree height and amortized constant time concatenation of small > strings) could be generalized to build general extensible vectors similar > to > Vec. > > Such vectors (tentatively named "Vect" until I find a better name) have > some > interesting properties: > * lower space overhead than other structures based on balanced trees such > as > Vec. The overhead can be adjusted, allowing to make "get" faster at the > expense of "set" and viceversa. > * appending or prepending a small vector to an arbitrarily large one in > amortized constant time > * concat, subarray, insert, remove operations in amortized logarithmic > time > * access and modification (get, set) in logarithmic time > > The first one is probably the most compelling. Currently, Vec imposes a > 6-word > overhead per element. Even after the obvious modification consisting in > adding > a new constructor for leaves, the overhead would still be 350%... Vect > uses > compact leaves with a configurable number of elements (32-64 seem good > choices, leading to worst-case overheads of 100% and 50% respectively), > which > also helps with "get" due to the improved spatial locality. > > You can find the code for both Rope and Vect at > http://eigenclass.org/repos/oropes/head/ > It is still young and experimental, but it's maybe worth a look. Any > feedback > is very welcome. > > The documentation can be found under > http://eigenclass.org/repos/oropes/head/doc/ > > I've spent some time benchmarking it against Vec; you can also find the > code I used and the resulting graphs at the above address. > > To summarise how it compares to Vec: > * Vec can be used when persistence is required, but Vect would probably be > a > poor choice in this case (until that is fixed using lazy rebuilding, > which > doesn't seem too hard), unless rebalancing explicitly before "taking the > snapshot" is an option > * Vect can append/prepend single elements or small vectors very quickly, > in > amortized constant time. See > http://eigenclass.org/repos/oropes/head/append.png > * as expected, Vec.set is faster than Vect's in general > http://eigenclass.org/repos/oropes/head/set.png > However, if the vector is balanced explicitly shortly before an update > burst, Vect is somewhat surprisingly faster > http://eigenclass.org/repos/oropes/head/set-balanced.png > This might be attributed to Vect's smaller memory profile and the fact > that > it allows better use of the L2 cache, but there seems to be another > factor > that I have yet to discover. > * Vect.get is considerably faster than Vec.get > http://eigenclass.org/repos/oropes/head/get.png > > The above URL is a darcs repository, so if anybody shoots me a patch I'll > be > happy to apply it :) > > Regards, > > -- > Mauricio Fernandez - http://eigenclass.org > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > ------=_Part_141516_4740819.1187732350061 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline Great work!

One small question: someone suggested me to rewrite Vec to add a Leaf (Node_without_children) construct, to cut down on the number of Empty instances I use.  I have so far not done that (no reason in particular, but I was busy with other things).  In light of your experiments, would you also advise me to do it? 

All the best,

Luca

On 7/28/07, Mauricio Fernandez <mfp@acm.org> wrote:
In the aftermath of the ICFP contest, during which I used Luca de Alfaro's
Vec, I felt like implementing ropes, based on Boehm's paper and the well-known
code included in his conservative garbage collector.

I later realized that the basic implementation strategies ("dense" leaves,
bounded tree height and amortized constant time concatenation of small
strings) could be generalized to build general extensible vectors similar to
Vec.

Such vectors (tentatively named "Vect" until I find a better name) have some
interesting properties:
* lower space overhead than other structures based on balanced trees such as
  Vec.  The overhead can be adjusted, allowing to make "get" faster at the
  expense of "set" and viceversa.
* appending or prepending a small vector to an arbitrarily large one in
  amortized constant time
* concat, subarray, insert, remove operations in amortized logarithmic time
* access and modification (get, set) in logarithmic time

The first one is probably the most compelling. Currently, Vec imposes a 6-word
overhead per element. Even after the obvious modification consisting in adding
a new constructor for leaves, the overhead would still be 350%... Vect uses
compact leaves with a configurable number of elements (32-64 seem good
choices, leading to worst-case overheads of 100% and 50% respectively), which
also helps with "get" due to the improved spatial locality.

You can find the code for both Rope and Vect at
http://eigenclass.org/repos/oropes/head/
It is still young and experimental, but it's maybe worth a look. Any feedback
is very welcome.

The documentation can be found under
http://eigenclass.org/repos/oropes/head/doc/

I've spent some time benchmarking it against Vec; you can also find the
code I used and the resulting graphs at the above address.

To summarise how it compares to Vec:
* Vec can be used when persistence is required, but Vect would probably be a
  poor choice in this case (until that is fixed using lazy rebuilding, which
  doesn't seem too hard), unless rebalancing explicitly before "taking the
  snapshot" is an option
* Vect can append/prepend single elements or small vectors very quickly, in
  amortized constant time. See
  http://eigenclass.org/repos/oropes/head/append.png
* as expected, Vec.set is faster than Vect's in general
  http://eigenclass.org/repos/oropes/head/set.png
  However, if the vector is balanced explicitly shortly before an update
  burst, Vect is somewhat surprisingly faster
  http://eigenclass.org/repos/oropes/head/set-balanced.png
  This might be attributed to Vect's smaller memory profile and the fact that
  it allows better use of the L2 cache, but there seems to be another factor
  that I have yet to discover.
* Vect.get is considerably faster than Vec.get
   http://eigenclass.org/repos/oropes/head/get.png

The above URL is a darcs repository, so if anybody shoots me a patch I'll be
happy to apply it :)

Regards,

--
Mauricio Fernandez  -   http://eigenclass.org

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

------=_Part_141516_4740819.1187732350061--