caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: "Luca de Alfaro" <luca@dealfaro.org>
To: "Jon Harrop" <jon@ffconsultancy.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Vec: a functional implementation of extensible arrays
Date: Fri, 20 Jul 2007 08:42:47 -0700	[thread overview]
Message-ID: <28fa90930707200842q742d5f77y29885d0c7ebcc74c@mail.gmail.com> (raw)
In-Reply-To: <200707200914.18557.jon@ffconsultancy.com>

[-- Attachment #1: Type: text/plain, Size: 3622 bytes --]

On 7/20/07, Jon Harrop <jon@ffconsultancy.com> wrote:
>
> On Friday 20 July 2007 08:35:39 Loup Vaillant wrote:
> > > ;-).  I am open to new ideas.  In part, I wanted a simple data
> structure
> > > (easier to extend, among other things).  Also, I use Set, Map, etc,
> quite
> > > often, and those are also balanced trees: I thought that if I can live
> > > with those, I can probably live with Vec as well.
>
> This is the beginnings of an awesome data structure!


Thanks!

> So can I. Your current implementation is already very attractive, and
> > looks very usable. For the new idea, have you thought of making (or
> > specifying) syntactic sugar to use your array?
>
> Should be very easy using the new camlp4. You might like to add a slicing
> notation as well. :-)


I have to study how to do it ... this would be very interesting.
Would you be interested in helping?

> About improving performance...
>
> I have two suggestions:
>
> 1. Add an extra node representing single elements that replaces
> Node(Empty, _,
> Empty). The reduces GC stress enormously and makes the whole thing ~30%
> faster.


This is easy.  I can give it a try soon, and see if I get something
reasonable, or if the code blows up.

2. Allow unbalanced sub trees. Balancing is slow and folds and maps don't
> need
> to rebalance, but "get" should force rebalancing. Extracting subarrays
> should
> return an unbalanced result.


This is almost easy.  I would need to add a bit to each node to keep track
of whether it's balanced...
The penalty would be that the balancing function would need to do slightly
more work to find out what has to be balanced.
So perhaps it's not a good idea for append, insert, but it could make sense
for concat (?), and especially for filter and sub...
But I am hesitant.  If one does concat, or one does sub to extract a
sub-array, I wrote the code already so that sharing is maximized. What is
the percentage of cases in which you get a Vec, but then don't do any
get/set on it, and only iterate?
Especially since you already have iterators on subranges?  Do you think it's
worth it?  Anyone has advice?


> > Is there a way to obtain a efficient filter?
>
> Yes. I discovered a most-excellent way to do this. It requires arbitrary
> metadata in every node, a constructor that composes subnodes to create the
> metadata for the parent and a filter function that can cull branches from
> the
> search tree.
>
> I used this in my Mathematica implementation to provide asymptotically
> fast
> filtering based upon lazily evaluated sets of symbols in each subnode.
> This
> gave huge performance improvements with no significant performance
> overhead.


I don't provide filter because..., well, I guess because I forgot: of all
iterators, filter is the one I need most rarely.
I should at least provide a simple implementation of it...

Another operation I would like to implement is splice:

splice i v1 v2

replaces the element in position i of vec v2 with vec v1.  A sort of
generalized insert.

Dr Jon D Harrop, Flying Frog Consultancy Ltd.
> OCaml for Scientists
> http://www.ffconsultancy.com/products/ocaml_for_scientists/?e
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>


BTW, Jon (and anyone else as well), let me know if you would like to help...
I could create a Google Code project so that we get a svn repository for the
code.

Luca

[-- Attachment #2: Type: text/html, Size: 5231 bytes --]

  reply	other threads:[~2007-07-20 15:42 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-18 17:32 Luca de Alfaro
2007-07-19  7:45 ` [Caml-list] " Loup Vaillant
2007-07-19  8:17   ` Hugo Ferreira
2007-07-19 16:51     ` Luca de Alfaro
2007-07-19 17:13     ` Luca de Alfaro
2007-07-19 16:59   ` Luca de Alfaro
2007-07-20  7:35     ` Loup Vaillant
2007-07-20  8:14       ` Jon Harrop
2007-07-20 15:42         ` Luca de Alfaro [this message]
2007-07-20 16:45           ` Brian Hurt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=28fa90930707200842q742d5f77y29885d0c7ebcc74c@mail.gmail.com \
    --to=luca@dealfaro.org \
    --cc=caml-list@yquem.inria.fr \
    --cc=jon@ffconsultancy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).