From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=HTML_MESSAGE autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id 3D449BC69 for ; Fri, 20 Jul 2007 17:42:50 +0200 (CEST) Received: from qb-out-0506.google.com (qb-out-0506.google.com [72.14.204.236]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l6KFgmTb019539 for ; Fri, 20 Jul 2007 17:42:49 +0200 Received: by qb-out-0506.google.com with SMTP id e11so1581344qbe for ; Fri, 20 Jul 2007 08:42:48 -0700 (PDT) Received: by 10.142.97.20 with SMTP id u20mr43655wfb.1184946167443; Fri, 20 Jul 2007 08:42:47 -0700 (PDT) Received: by 10.142.72.4 with HTTP; Fri, 20 Jul 2007 08:42:47 -0700 (PDT) Message-ID: <28fa90930707200842q742d5f77y29885d0c7ebcc74c@mail.gmail.com> Date: Fri, 20 Jul 2007 08:42:47 -0700 From: "Luca de Alfaro" To: "Jon Harrop" Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays Cc: caml-list@yquem.inria.fr In-Reply-To: <200707200914.18557.jon@ffconsultancy.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_103877_9026133.1184946167406" References: <28fa90930707181032q7681340pc30fb47434aff5fc@mail.gmail.com> <28fa90930707190959g66cb2e6wc7a446af2a6dfb72@mail.gmail.com> <6f9f8f4a0707200035l4351c6b6g585b4edd2c51faa6@mail.gmail.com> <200707200914.18557.jon@ffconsultancy.com> X-Miltered: at discorde with ID 46A0D7F8.002 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; arrays:01 camlp:01 notation:01 node:01 node:01 unbalanced:01 unbalanced:01 sub-array:01 subranges:01 subnodes:01 subnode:01 ocaml:01 ocaml:01 beginner's:01 bug:01 ------=_Part_103877_9026133.1184946167406 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline On 7/20/07, Jon Harrop wrote: > > On Friday 20 July 2007 08:35:39 Loup Vaillant wrote: > > > ;-). I am open to new ideas. In part, I wanted a simple data > structure > > > (easier to extend, among other things). Also, I use Set, Map, etc, > quite > > > often, and those are also balanced trees: I thought that if I can live > > > with those, I can probably live with Vec as well. > > This is the beginnings of an awesome data structure! Thanks! > So can I. Your current implementation is already very attractive, and > > looks very usable. For the new idea, have you thought of making (or > > specifying) syntactic sugar to use your array? > > Should be very easy using the new camlp4. You might like to add a slicing > notation as well. :-) I have to study how to do it ... this would be very interesting. Would you be interested in helping? > About improving performance... > > I have two suggestions: > > 1. Add an extra node representing single elements that replaces > Node(Empty, _, > Empty). The reduces GC stress enormously and makes the whole thing ~30% > faster. This is easy. I can give it a try soon, and see if I get something reasonable, or if the code blows up. 2. Allow unbalanced sub trees. Balancing is slow and folds and maps don't > need > to rebalance, but "get" should force rebalancing. Extracting subarrays > should > return an unbalanced result. This is almost easy. I would need to add a bit to each node to keep track of whether it's balanced... The penalty would be that the balancing function would need to do slightly more work to find out what has to be balanced. So perhaps it's not a good idea for append, insert, but it could make sense for concat (?), and especially for filter and sub... But I am hesitant. If one does concat, or one does sub to extract a sub-array, I wrote the code already so that sharing is maximized. What is the percentage of cases in which you get a Vec, but then don't do any get/set on it, and only iterate? Especially since you already have iterators on subranges? Do you think it's worth it? Anyone has advice? > > Is there a way to obtain a efficient filter? > > Yes. I discovered a most-excellent way to do this. It requires arbitrary > metadata in every node, a constructor that composes subnodes to create the > metadata for the parent and a filter function that can cull branches from > the > search tree. > > I used this in my Mathematica implementation to provide asymptotically > fast > filtering based upon lazily evaluated sets of symbols in each subnode. > This > gave huge performance improvements with no significant performance > overhead. I don't provide filter because..., well, I guess because I forgot: of all iterators, filter is the one I need most rarely. I should at least provide a simple implementation of it... Another operation I would like to implement is splice: splice i v1 v2 replaces the element in position i of vec v2 with vec v1. A sort of generalized insert. Dr Jon D Harrop, Flying Frog Consultancy Ltd. > OCaml for Scientists > http://www.ffconsultancy.com/products/ocaml_for_scientists/?e > > _______________________________________________ > Caml-list mailing list. Subscription management: > http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list > Archives: http://caml.inria.fr > Beginner's list: http://groups.yahoo.com/group/ocaml_beginners > Bug reports: http://caml.inria.fr/bin/caml-bugs > BTW, Jon (and anyone else as well), let me know if you would like to help... I could create a Google Code project so that we get a svn repository for the code. Luca ------=_Part_103877_9026133.1184946167406 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline On 7/20/07, Jon Harrop <jon@ffconsultancy.com> wrote:
On Friday 20 July 2007 08:35:39 Loup Vaillant wrote:
> > ;-).  I am open to new ideas.  In part, I wanted a simple data structure
> > (easier to extend, among other things).  Also, I use Set, Map, etc, quite
> > often, and those are also balanced trees: I thought that if I can live
> > with those, I can probably live with Vec as well.

This is the beginnings of an awesome data structure!

Thanks!

> So can I. Your current implementation is already very attractive, and
> looks very usable. For the new idea, have you thought of making (or
> specifying) syntactic sugar to use your array?

Should be very easy using the new camlp4. You might like to add a slicing
notation as well. :-)

I have to study how to do it ... this would be very interesting. 
Would you be interested in helping?

> About improving performance...

I have two suggestions:

1. Add an extra node representing single elements that replaces Node(Empty, _,
Empty). The reduces GC stress enormously and makes the whole thing ~30%
faster.

This is easy.  I can give it a try soon, and see if I get something reasonable, or if the code blows up.  

2. Allow unbalanced sub trees. Balancing is slow and folds and maps don't need
to rebalance, but "get" should force rebalancing. Extracting subarrays should
return an unbalanced result.

This is almost easy.  I would need to add a bit to each node to keep track of whether it's balanced...
The penalty would be that the balancing function would need to do slightly more work to find out what has to be balanced.
So perhaps it's not a good idea for append, insert, but it could make sense for concat (?), and especially for filter and sub...
But I am hesitant.  If one does concat, or one does sub to extract a sub-array, I wrote the code already so that sharing is maximized. What is the percentage of cases in which you get a Vec, but then don't do any get/set on it, and only iterate?
Especially since you already have iterators on subranges?  Do you think it's worth it?  Anyone has advice?
 
> Is there a way to obtain a efficient filter?

Yes. I discovered a most-excellent way to do this. It requires arbitrary
metadata in every node, a constructor that composes subnodes to create the
metadata for the parent and a filter function that can cull branches from the
search tree.

I used this in my Mathematica implementation to provide asymptotically fast
filtering based upon lazily evaluated sets of symbols in each subnode. This
gave huge performance improvements with no significant performance overhead.

I don't provide filter because..., well, I guess because I forgot: of all iterators, filter is the one I need most rarely.
I should at least provide a simple implementation of it...

Another operation I would like to implement is splice:

splice i v1 v2

replaces the element in position i of vec v2 with vec v1.  A sort of generalized insert.

Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs


BTW, Jon (and anyone else as well), let me know if you would like to help... I could create a Google Code project so that we get a svn repository for the code.

Luca
------=_Part_103877_9026133.1184946167406--