From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 257B7BC6B for ; Sat, 4 Aug 2007 12:45:32 +0200 (CEST) Received: from pih-relay06.plus.net (pih-relay06.plus.net [212.159.14.133]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l74AjVLB004581 (version=TLSv1/SSLv3 cipher=AES256-SHA bits=256 verify=NO) for ; Sat, 4 Aug 2007 12:45:31 +0200 Received: from [80.229.56.224] (helo=beast.local) by pih-relay06.plus.net with esmtp (Exim) id 1IHH8U-00086z-Sr for caml-list@inria.fr; Sat, 04 Aug 2007 11:45:31 +0100 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@inria.fr Subject: Re: [Caml-list] Ropes and rope-like functional extensible vectors with O(1) prepend/append. Date: Sat, 4 Aug 2007 11:36:37 +0100 User-Agent: KMail/1.9.7 References: <20070728233305.GB28879@tux-chan> <200707301447.24010.jon@ffconsultancy.com> <20070731235514.GA31718@tux-chan> In-Reply-To: <20070731235514.GA31718@tux-chan> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200708041136.37354.jon@ffconsultancy.com> X-Miltered: at discorde with ID 46B458CB.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; vectors:01 prepend:01 literals:01 functor:01 node:01 run-time:01 associative:01 bool:01 node:01 recursion:01 abstraction:01 recursive:01 reimplement:01 inserting:01 allocations:01 Incidentally, this data structure is a completely generic sequence. I think it would be extremely valuable to provide pattern matching over such a sequence. This could be done either using the active patterns macro or by adding a new macro (that could also allow sequence literals). I did something similar for the original Vec data structure (independently). On Wednesday 01 August 2007 00:55:14 Mauricio Fernandez wrote: > Yes, linear search over <100 elements should be acceptable if the > structure is to hold several orders of magnitude more... Yes. After all, this optimization is replacing linear search anyway... > You're very right, a functor makes so much more sense here: it saves one > word per node and allows stronger typing (the alternative would be ugly, > lots of if mycombine != hiscombine then invalid_arg "operation" and errors > found at run-time). Exactly. > So combine would be combine : 'meta -> 'meta -> 'meta; commutative and > associative, so that it can be used in leaves as > Array.fold_left combine arr default > which shows that a default value for the metadata would have to be provided > too. Sounds good. > What about cull? a control_cull : 'meta -> bool that tells the vect > whether the search goes on recursively for each node (so the search is > carried out by a function in Vect) , or a function that handles recursion > itself, using some get_metadata : ('a, 'meta) t -> 'meta ? It seems the > latter could lead to a leaky abstraction though. Which type would it have > anyway? Both > ('a, 'meta) t -> 'a and say ('a, 'meta) t -> 'a list could be useful > (the former can be used for find and the latter e.g. for select). What about writing the search in continuation passing style. So the search function calls a continuation with a left, current and right just like the recursive call within "Set.find". On a related note, I think the Set module would be much more powerful if the choose function chose a roughly central element by extracting from the root. Not only is this faster, it allows many more algorithmic optimizations to be performance from outside Set by recursively choosing and splitting. I think the set-theoretic operations could then be implemented from outside the AVL tree. > > You could then reimplement the Set module on top of your data structure > > by searching for the index of the given element and inserting it if it is > > new. > > For the sake of better space efficiency? And performance. The current Set implementation performs huge amount of unnecessary allocations and is an order of magnitude slower than hashset as a consequence. Your rope-based implementation could close that gap considerably without sacrificing the purely functional style. > Set uses 5 words per element, but > it could be brought down to 3.5 words by adding a new constructor. Still, > Vect's ~1.125 to ~2.0 would remain considerably better. > It'd be great to find a way to make good use of the O(1) append to improve > on Set's logarithmic bounds, but I can't see how right now (again, it's > late :) There are lots more potential improvements, like adding a set_of_array function that avoids repeated insertion. > > Yes. You could also use recursive subdivision to create a perfectly > > balanced result. > > The problem is that the obvious implementation, using Array, would run > against the max_array_length limit. Avoiding it is pretty easy but there > are still a few more interesting things to be done :) I'd forget about it to be honest. In 2 years, most desktops will be 64-bit... > Last but not least, I've added destructive_set : int -> 'a -> 'a t -> unit. > It's evil but so much faster... > http://eigenclass.org/repos/oropes/head/set-balanced.png > It brings Vect one order of magnitude closer to Array for ephemeral usage. Cool. -- Dr Jon D Harrop, Flying Frog Consultancy Ltd. OCaml for Scientists http://www.ffconsultancy.com/products/ocaml_for_scientists/?e