From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.4 required=5.0 tests=SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by yquem.inria.fr (Postfix) with ESMTP id C8F6ABC6B for ; Sun, 29 Jul 2007 01:33:07 +0200 (CEST) Received: from ctsmtpout1.frontal.correo (smtp.telefonica.net [213.4.149.66]) by concorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6SNX7sc010258 for ; Sun, 29 Jul 2007 01:33:07 +0200 Received: from tux-chan (193.153.2.138) by ctsmtpout1.frontal.correo (7.2.056.6) (authenticated as ferferse$telefonica.net) id 46A8D64500046F9E for caml-list@inria.fr; Sun, 29 Jul 2007 01:33:06 +0200 Received: from batsman by tux-chan with local (Exim 3.36 #1 (Debian)) id 1IEvmT-0002gM-00 for ; Sun, 29 Jul 2007 01:33:05 +0200 Date: Sun, 29 Jul 2007 01:33:05 +0200 From: Mauricio Fernandez To: caml-list Subject: Ropes and rope-like functional extensible vectors with O(1) prepend/append. Message-ID: <20070728233305.GB28879@tux-chan> Mail-Followup-To: caml-list MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.5.11 Sender: =?iso-8859-1?Q?Mauricio_Julio_Fern=E1ndez_Pradier?= X-Miltered: at concorde with ID 46ABD233.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; vectors:01 prepend:01 icfp:01 bounded:01 amortized:01 vectors:01 appending:01 arbitrarily:01 amortized:01 logarithmic:01 logarithmic:01 worst-case:01 locality:01 rebuilding:01 prepend:01 In the aftermath of the ICFP contest, during which I used Luca de Alfaro's Vec, I felt like implementing ropes, based on Boehm's paper and the well-known code included in his conservative garbage collector. I later realized that the basic implementation strategies ("dense" leaves, bounded tree height and amortized constant time concatenation of small strings) could be generalized to build general extensible vectors similar to Vec. Such vectors (tentatively named "Vect" until I find a better name) have some interesting properties: * lower space overhead than other structures based on balanced trees such as Vec. The overhead can be adjusted, allowing to make "get" faster at the expense of "set" and viceversa. * appending or prepending a small vector to an arbitrarily large one in amortized constant time * concat, subarray, insert, remove operations in amortized logarithmic time * access and modification (get, set) in logarithmic time The first one is probably the most compelling. Currently, Vec imposes a 6-word overhead per element. Even after the obvious modification consisting in adding a new constructor for leaves, the overhead would still be 350%... Vect uses compact leaves with a configurable number of elements (32-64 seem good choices, leading to worst-case overheads of 100% and 50% respectively), which also helps with "get" due to the improved spatial locality. You can find the code for both Rope and Vect at http://eigenclass.org/repos/oropes/head/ It is still young and experimental, but it's maybe worth a look. Any feedback is very welcome. The documentation can be found under http://eigenclass.org/repos/oropes/head/doc/ I've spent some time benchmarking it against Vec; you can also find the code I used and the resulting graphs at the above address. To summarise how it compares to Vec: * Vec can be used when persistence is required, but Vect would probably be a poor choice in this case (until that is fixed using lazy rebuilding, which doesn't seem too hard), unless rebalancing explicitly before "taking the snapshot" is an option * Vect can append/prepend single elements or small vectors very quickly, in amortized constant time. See http://eigenclass.org/repos/oropes/head/append.png * as expected, Vec.set is faster than Vect's in general http://eigenclass.org/repos/oropes/head/set.png However, if the vector is balanced explicitly shortly before an update burst, Vect is somewhat surprisingly faster http://eigenclass.org/repos/oropes/head/set-balanced.png This might be attributed to Vect's smaller memory profile and the fact that it allows better use of the L2 cache, but there seems to be another factor that I have yet to discover. * Vect.get is considerably faster than Vec.get http://eigenclass.org/repos/oropes/head/get.png The above URL is a darcs repository, so if anybody shoots me a patch I'll be happy to apply it :) Regards, -- Mauricio Fernandez - http://eigenclass.org