caml-list - the Caml user's mailing list
* Vec: a functional implementation of extensible arrays
@ 2007-07-18 17:32 Luca de Alfaro
  2007-07-19  7:45 ` [Caml-list] " Loup Vaillant
  0 siblings, 1 reply; 10+ messages in thread
From: Luca de Alfaro @ 2007-07-18 17:32 UTC (permalink / raw)
  To: caml-list

Dear All,

I would like to share with you an OCaml implementation of extensible
arrays.  The implementation is functional, based on balanced trees (and on
the code for Set and Map); I called the module Vec (for vector - I like
short names).  You can find it at http://www.dealfaro.com/home/vec.html
Module Vec provides, in log-time:

   - Access to and modification of arbitrary elements (Vec.put n el v puts
   element el in position n of vector v, for instance).
   - Concatenation
   - Insertion and removal of elements from arbitrary positions
   (auto-enlarging and auto-shrinking the vector).

as well as:

   - All kinds of iterators and some visitor functions.
   - Efficient translation to/from lists and arrays.

An advantage of Vec over List, for very large data structures, is that
iterating over a Vec of size n always requires stack depth bounded by log n,
whereas with lists, non-tail-recursive functions can cause stack overflows.
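
As a rough illustration of how a balanced tree can give log-time access and
modification by position, here is a minimal sketch.  It is not the Vec
source: the type, the cached size field, and the helper names (node, of_sub,
of_array) are assumptions made for the example, and only the argument order
of put follows the description above.

```ocaml
(* Hypothetical sketch, not the actual Vec implementation.  Each node caches
   the size of its subtree, so position n is found by comparing n against
   the size of the left child. *)
type 'a t =
  | Empty
  | Node of 'a t * 'a * 'a t * int   (* left, element, right, size *)

let size = function
  | Empty -> 0
  | Node (_, _, _, s) -> s

let node l x r = Node (l, x, r, size l + 1 + size r)

(* Build a balanced tree from the array slice [lo, hi). *)
let rec of_sub a lo hi =
  if lo >= hi then Empty
  else
    let mid = lo + (hi - lo) / 2 in
    node (of_sub a lo mid) a.(mid) (of_sub a (mid + 1) hi)

let of_array a = of_sub a 0 (Array.length a)

(* Read position n: one step down the tree per recursive call. *)
let rec get n = function
  | Empty -> invalid_arg "get: index out of bounds"
  | Node (l, x, r, _) ->
      let sl = size l in
      if n < sl then get n l
      else if n = sl then x
      else get (n - sl - 1) r

(* Functional update: only the nodes on the path to position n are rebuilt;
   everything else is shared with the old tree. *)
let rec put n el = function
  | Empty -> invalid_arg "put: index out of bounds"
  | Node (l, x, r, s) ->
      let sl = size l in
      if n < sl then Node (put n el l, x, r, s)
      else if n = sl then Node (l, el, r, s)
      else Node (l, x, put (n - sl - 1) el r, s)
```

Because put returns a new tree and leaves the old one untouched, both
versions remain usable afterwards.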

I have been using this data structure for some months, and it has been very
handy in a large number of occasions.  I hope it can be as useful to you.

I would appreciate all advice and feedback.  Also, is there a repository
where I should upload it?  Do you think it is worth it?

All the best,

Luca

* Re: [Caml-list] Vec: a functional implementation of extensible arrays
  2007-07-18 17:32 Vec: a functional implementation of extensible arrays Luca de Alfaro
@ 2007-07-19  7:45 ` Loup Vaillant
  2007-07-19  8:17   ` Hugo Ferreira
  2007-07-19 16:59   ` Luca de Alfaro
  0 siblings, 2 replies; 10+ messages in thread
From: Loup Vaillant @ 2007-07-19  7:45 UTC (permalink / raw)
  To: caml-list

Very interesting. I always felt uneasy about the presence of
imperative arrays without a functional counterpart. I can't wait to
try it.

Looking at your array type definition, I assume that the timings you
specified are worst-case? Is it possible to achieve better (but
amortized) bounds? Do you think it would be worth the trouble?

I didn't see the complexity of your iterators in your specs. Do
these work in linear time, like those of the List and Array modules?

Regards,
Loup


* Re: [Caml-list] Vec: a functional implementation of extensible arrays
  2007-07-19  7:45 ` [Caml-list] " Loup Vaillant
@ 2007-07-19  8:17   ` Hugo Ferreira
  2007-07-19 16:51     ` Luca de Alfaro
  2007-07-19 17:13     ` Luca de Alfaro
  2007-07-19 16:59   ` Luca de Alfaro
  1 sibling, 2 replies; 10+ messages in thread
From: Hugo Ferreira @ 2007-07-19  8:17 UTC (permalink / raw)
  To: Loup Vaillant; +Cc: caml-list

Hello,

For those of you interested in functional arrays, consider the paper by
Sylvain Conchon and Jean-Christophe Filliâtre [1]. Its Union-Find (UF) uses
persistent arrays as its base data structure. I have run tests with the
UF using the code provided, an implementation of the k-BUF data structure
(delayed backtracking), and an altered version of the persistent array (fat
nodes + delayed backtracking). These tests show that this version of
persistent arrays is very efficient (especially for single-threaded
backtracking).

Maybe Luca could pit his implementation against that of the article and
report on how they fare?

Regards,
Hugo Ferreira.

[1] http://www.lri.fr/~filliatr/ftp/publis/puf-wml07.ps

* Re: [Caml-list] Vec: a functional implementation of extensible arrays
  2007-07-19  8:17   ` Hugo Ferreira
@ 2007-07-19 16:51     ` Luca de Alfaro
  2007-07-19 17:13     ` Luca de Alfaro
  1 sibling, 0 replies; 10+ messages in thread
From: Luca de Alfaro @ 2007-07-19 16:51 UTC (permalink / raw)
  To: Hugo Ferreira; +Cc: Loup Vaillant, caml-list

Dear All,

thanks for the pointer to the excellent paper.  First, let me say that my
Vec data structure was born to fill a need I had while working on a project:
while it has been useful to me, I certainly do not claim it is the best that
can be done, so I am very grateful for these suggestions!

My Vec data structure is different from persistent arrays.  It is likely to
be less efficient for get/set use.
However, it offers, at logarithmic cost, insertion and removal operations
that are not present in the standard persistent arrays.
Consider a Vec a of size 10.

   - Vec.insert 3 d a inserts value d in position 3 of a, shifting
   elements 3..9 of a to positions 4..10.
   - Vec.remove 3 a removes the element in position 3 of a, shifting
   elements 4..9 to positions 3..8.  Vec.pop is similar and returns the
   removed element as well.
   - Vec.concat works in log-time.

These operations are necessary if you want to use a Vec as a FIFO, for
example (you append elements at the end, and you get the first element via
Vec.pop 0 a).  In many algorithms, it is often handy to be able to
remove/insert elements in the middle of a list.
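
To make the index shifting concrete, here is a minimal sketch of positional
insert and pop on a height-balanced tree.  It is not the Vec source: the
node layout, the helper names, and the (element, vector) return order of pop
are assumptions, and the rebalancing function simply follows the rotation
scheme used by the standard library's Set.

```ocaml
(* Hypothetical sketch, not the actual Vec implementation. *)
type 'a t =
  | Empty
  | Node of 'a t * 'a * 'a t * int * int  (* left, elt, right, height, size *)

let height = function Empty -> 0 | Node (_, _, _, h, _) -> h
let size = function Empty -> 0 | Node (_, _, _, _, s) -> s

let create l x r =
  Node (l, x, r, 1 + max (height l) (height r), size l + 1 + size r)

(* Rebalance with single or double rotations, in the style of Set.bal. *)
let bal l x r =
  let hl = height l and hr = height r in
  if hl > hr + 2 then begin
    match l with
    | Empty -> invalid_arg "bal"
    | Node (ll, lx, lr, _, _) ->
        if height ll >= height lr then create ll lx (create lr x r)
        else begin
          match lr with
          | Empty -> invalid_arg "bal"
          | Node (lrl, lrx, lrr, _, _) ->
              create (create ll lx lrl) lrx (create lrr x r)
        end
  end else if hr > hl + 2 then begin
    match r with
    | Empty -> invalid_arg "bal"
    | Node (rl, rx, rr, _, _) ->
        if height rr >= height rl then create (create l x rl) rx rr
        else begin
          match rl with
          | Empty -> invalid_arg "bal"
          | Node (rll, rlx, rlr, _, _) ->
              create (create l x rll) rlx (create rlr rx rr)
        end
  end else create l x r

(* insert n x v: x ends up at position n; later elements shift up by one. *)
let rec insert n x = function
  | Empty -> if n = 0 then create Empty x Empty else invalid_arg "insert"
  | Node (l, y, r, _, _) ->
      let sl = size l in
      if n <= sl then bal (insert n x l) y r
      else bal l y (insert (n - sl - 1) x r)

(* Remove and return the leftmost element of a non-empty tree. *)
let rec pop_min = function
  | Empty -> invalid_arg "pop_min"
  | Node (Empty, x, r, _, _) -> (x, r)
  | Node (l, x, r, _, _) ->
      let (m, l') = pop_min l in
      (m, bal l' x r)

(* pop n v: remove position n; later elements shift down by one. *)
let rec pop n = function
  | Empty -> invalid_arg "pop"
  | Node (l, x, r, _, _) ->
      let sl = size l in
      if n < sl then begin
        let (e, l') = pop n l in
        (e, bal l' x r)
      end else if n > sl then begin
        let (e, r') = pop (n - sl - 1) r in
        (e, bal l x r')
      end else begin
        match r with
        | Empty -> (x, l)
        | _ ->
            let (m, r') = pop_min r in
            (x, bal l m r')
      end

(* In-order listing, used here only to inspect the result. *)
let to_list v =
  let rec go acc = function
    | Empty -> acc
    | Node (l, x, r, _, _) -> go (x :: go acc r) l
  in
  go [] v
```

Under these assumptions a FIFO is insert (size v) x v to enqueue and
pop 0 v to dequeue.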

In summary, I don't think the Vec data structure is a replacement for arrays
or persistent arrays in numerically-intensive work.  But if you want a
flexible data structure for the 90% of the code that is not performance
critical, it can be useful.
Now the question is: can one get better get/set efficiency while retaining
the ability to insert/remove elements?  (I am sure that there is something
better to be done...).

Luca

* Re: [Caml-list] Vec: a functional implementation of extensible arrays
  2007-07-19  7:45 ` [Caml-list] " Loup Vaillant
  2007-07-19  8:17   ` Hugo Ferreira
@ 2007-07-19 16:59   ` Luca de Alfaro
  2007-07-20  7:35     ` Loup Vaillant
  1 sibling, 1 reply; 10+ messages in thread
From: Luca de Alfaro @ 2007-07-19 16:59 UTC (permalink / raw)
  To: Loup Vaillant; +Cc: caml-list

On 7/19/07, Loup Vaillant <loup.vaillant@gmail.com> wrote:
>
> Very interesting. I always felt uneasy about the presence of
> imperative arrays without a functional counterpart. I can't wait to
> try it.
>
> Looking at your array type definition, I assume that the timings you
> specified are worst-case? Is it possible to achieve better (but
> amortized) bounds? Do you think it would be worth the trouble?
>
> I didn't see in your specs the complexity of your iterators. Does
> these work in linear time, like those of the List and Array module?
>
> Regards,
> Loup


For get/set, the worst case and the average case are both logarithmic: it's
a balanced tree (if you are lucky, you can find your answer at the root!
;-).  I am open to new ideas.  In part, I wanted a simple data structure
(easier to extend, among other things).  Also, I use Set, Map, etc, quite
often, and those are also balanced trees: I thought that if I can live with
those, I can probably live with Vec as well.

For an iterator, the worst case is as follows, where n is the size of the
Vec:

   - if you iterate on the whole Vec, then O(n)
   - if you iterate over m elements (you can iterate on a subrange), then
   O(m + log n).

That's why I have iterators: you can also iterate via a for loop, using get
to access the elements, but then the time becomes O(n log n) for the first
case, and O(m log n) for the second case.
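
The difference between the two ways of iterating can be sketched as
follows.  This is not the Vec source: the tree type, the half-open range
convention [i, j), and the names iter_range and iter_by_get are assumptions
made for the example.

```ocaml
(* Hypothetical sketch, not the actual Vec implementation. *)
type 'a t =
  | Empty
  | Node of 'a t * 'a * 'a t * int   (* left, element, right, size *)

let size = function Empty -> 0 | Node (_, _, _, s) -> s

(* Build a balanced tree from the array slice [lo, hi). *)
let rec of_sub a lo hi =
  if lo >= hi then Empty
  else
    let mid = lo + (hi - lo) / 2 in
    let l = of_sub a lo mid and r = of_sub a (mid + 1) hi in
    Node (l, a.(mid), r, size l + 1 + size r)

let of_array a = of_sub a 0 (Array.length a)

let rec get n = function
  | Empty -> invalid_arg "get: index out of bounds"
  | Node (l, x, r, _) ->
      let sl = size l in
      if n < sl then get n l
      else if n = sl then x
      else get (n - sl - 1) r

(* For loop with get: every access walks down from the root again. *)
let iter_by_get f i j v =
  for k = i to j - 1 do f (get k v) done

(* Direct traversal of positions i .. j-1.  A subtree that does not overlap
   the range is rejected by the guard in constant time, so only the nodes in
   the range and those on the two boundary paths do any work.  The recursion
   depth is at most the height of the tree. *)
let rec iter_range f i j = function
  | Empty -> ()
  | Node (l, x, r, s) ->
      if i < j && i < s && j > 0 then begin
        let sl = size l in
        iter_range f i j l;
        if i <= sl && sl < j then f x;
        iter_range f (i - sl - 1) (j - sl - 1) r
      end
```

Both functions visit the same elements in the same order; only the amount of
tree walking differs.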

 Luca

* Re: [Caml-list] Vec: a functional implementation of extensible arrays
  2007-07-19  8:17   ` Hugo Ferreira
  2007-07-19 16:51     ` Luca de Alfaro
@ 2007-07-19 17:13     ` Luca de Alfaro
  1 sibling, 0 replies; 10+ messages in thread
From: Luca de Alfaro @ 2007-07-19 17:13 UTC (permalink / raw)
  To: Hugo Ferreira; +Cc: Loup Vaillant, caml-list

Reading the paper, I noticed that my name "put" for assigning a value to a
position is highly nonstandard.
I have changed it to "set" for version 1.1 of Vec.  Hopefully, changing so
soon will avoid confusion... sorry.

Luca

* Re: [Caml-list] Vec: a functional implementation of extensible arrays
  2007-07-19 16:59   ` Luca de Alfaro
@ 2007-07-20  7:35     ` Loup Vaillant
  2007-07-20  8:14       ` Jon Harrop
  0 siblings, 1 reply; 10+ messages in thread
From: Loup Vaillant @ 2007-07-20  7:35 UTC (permalink / raw)
  To: caml-list

2007/7/19, Luca de Alfaro <luca@dealfaro.org>:
>
> For get/set, the worst case and the average case are both logarithmic: it's
> a balanced tree (if you are lucky, you can find your answer at the root!

I did. :)

> ;-).  I am open to new ideas.  In part, I wanted a simple data structure
> (easier to extend, among other things).  Also, I use Set, Map, etc, quite
> often, and those are also balanced trees: I thought that if I can live with
> those, I can probably live with Vec as well.

So can I. Your current implementation is already very attractive, and
looks very usable. For the new idea, have you thought of making (or
specifying) syntactic sugar to use your array?

About improving performance, here is my guess: there is no way to
lower the bounds on get and set. However, the average cost of insert
may already be O(1), provided you use your array the same way you
would use an imperative version of it (more accurately, not inserting
an element to an old version of your array). The same may be true for
remove.

Therefore, if I guess right, to take advantage of persistence AND have
insert perform in O(1) average, you would have to use (and pay for)
lazy evaluation. How, I don't know (yet).

(Note that I have stolen this idea from Okasaki's book)

> For an iterator, the worst case is as follows, where n is the size of the
> Vec:
>
> if you iterate on the whole Vec, then O(n)
> if you iterate over m elements (you can iterate on a subrange), then O(m +
> log n).
> That's why I have iterators: you can also iterate via a for loop, using get
> to access the elements, but then the time becomes O(n log n) for the first
> case, and O(m log n) for the second case.

That is why I wondered if lazy evaluation was worth the trouble at all:
most of the time, we iterate rather than insert or remove elements.
I only regret the absence of filter. Is there a way to obtain an
efficient filter? (Well, if my guess above is right, a naive
implementation of filter would already be quite efficient...)

Regards,
Loup


* Re: [Caml-list] Vec: a functional implementation of extensible arrays
  2007-07-20  7:35     ` Loup Vaillant
@ 2007-07-20  8:14       ` Jon Harrop
  2007-07-20 15:42         ` Luca de Alfaro
  0 siblings, 1 reply; 10+ messages in thread
From: Jon Harrop @ 2007-07-20  8:14 UTC (permalink / raw)
  To: caml-list

On Friday 20 July 2007 08:35:39 Loup Vaillant wrote:
> > ;-).  I am open to new ideas.  In part, I wanted a simple data structure
> > (easier to extend, among other things).  Also, I use Set, Map, etc, quite
> > often, and those are also balanced trees: I thought that if I can live
> > with those, I can probably live with Vec as well.

This is the beginnings of an awesome data structure!

> So can I. Your current implementation is already very attractive, and
> looks very usable. For the new idea, have you thought of making (or
> specifying) syntactic sugar to use your array?

Should be very easy using the new camlp4. You might like to add a slicing 
notation as well. :-)

> About improving performance...

I have two suggestions:

1. Add an extra node representing single elements that replaces Node(Empty, _,
Empty). This reduces GC stress enormously and makes the whole thing ~30%
faster.

2. Allow unbalanced subtrees. Balancing is slow and folds and maps don't need
to rebalance, but "get" should force rebalancing. Extracting subarrays should
return an unbalanced result.
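
One possible reading of suggestion 1 is sketched below.  The type and the
names (Single, node, of_array) are assumptions for illustration and are not
taken from the Vec source; the point is only that a one-field Single block
stands in for a full Node with two Empty children.

```ocaml
(* Hypothetical sketch of a dedicated single-element constructor. *)
type 'a t =
  | Empty
  | Single of 'a                     (* stands in for Node (Empty, x, Empty, 1) *)
  | Node of 'a t * 'a * 'a t * int   (* left, element, right, size *)

let size = function
  | Empty -> 0
  | Single _ -> 1
  | Node (_, _, _, s) -> s

(* Smart constructor: never builds a Node with two Empty children. *)
let node l x r =
  match l, r with
  | Empty, Empty -> Single x
  | _ -> Node (l, x, r, size l + 1 + size r)

(* Build a balanced tree from the array slice [lo, hi). *)
let rec of_sub a lo hi =
  if lo >= hi then Empty
  else
    let mid = lo + (hi - lo) / 2 in
    node (of_sub a lo mid) a.(mid) (of_sub a (mid + 1) hi)

let of_array a = of_sub a 0 (Array.length a)

(* Every function on the tree gains one extra, very cheap case. *)
let rec get n = function
  | Empty -> invalid_arg "get: index out of bounds"
  | Single x -> if n = 0 then x else invalid_arg "get: index out of bounds"
  | Node (l, x, r, _) ->
      let sl = size l in
      if n < sl then get n l
      else if n = sl then x
      else get (n - sl - 1) r
```

The cost of this design is an extra match case in every function that
inspects the tree.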

> Is there a way to obtain a efficient filter?

Yes. I discovered a most-excellent way to do this. It requires arbitrary 
metadata in every node, a constructor that composes subnodes to create the 
metadata for the parent and a filter function that can cull branches from the 
search tree.

I used this in my Mathematica implementation to provide asymptotically fast 
filtering based upon lazily evaluated sets of symbols in each subnode. This 
gave huge performance improvements with no significant performance overhead.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


* Re: [Caml-list] Vec: a functional implementation of extensible arrays
  2007-07-20  8:14       ` Jon Harrop
@ 2007-07-20 15:42         ` Luca de Alfaro
  2007-07-20 16:45           ` Brian Hurt
  0 siblings, 1 reply; 10+ messages in thread
From: Luca de Alfaro @ 2007-07-20 15:42 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On 7/20/07, Jon Harrop <jon@ffconsultancy.com> wrote:
>
> On Friday 20 July 2007 08:35:39 Loup Vaillant wrote:
> > > ;-).  I am open to new ideas.  In part, I wanted a simple data structure
> > > (easier to extend, among other things).  Also, I use Set, Map, etc, quite
> > > often, and those are also balanced trees: I thought that if I can live
> > > with those, I can probably live with Vec as well.
>
> This is the beginnings of an awesome data structure!


Thanks!

> > So can I. Your current implementation is already very attractive, and
> > looks very usable. For the new idea, have you thought of making (or
> > specifying) syntactic sugar to use your array?
>
> Should be very easy using the new camlp4. You might like to add a slicing
> notation as well. :-)


I have to study how to do it ... this would be very interesting.
Would you be interested in helping?

> > About improving performance...
>
> I have two suggestions:
>
> 1. Add an extra node representing single elements that replaces
> Node(Empty, _, Empty). The reduces GC stress enormously and makes the
> whole thing ~30% faster.


This is easy.  I can give it a try soon, and see if I get something
reasonable, or if the code blows up.

> 2. Allow unbalanced sub trees. Balancing is slow and folds and maps don't
> need to rebalance, but "get" should force rebalancing. Extracting subarrays
> should return an unbalanced result.


This is almost easy.  I would need to add a bit to each node to keep track
of whether it's balanced...
The penalty would be that the balancing function would need to do slightly
more work to find out what has to be balanced.
So perhaps it's not a good idea for append and insert, but it could make sense
for concat (?), and especially for filter and sub...
But I am hesitant.  If one does concat, or one does sub to extract a
sub-array, I wrote the code already so that sharing is maximized. What is
the percentage of cases in which you get a Vec, but then don't do any
get/set on it, and only iterate?
Especially since you already have iterators on subranges?  Do you think it's
worth it?  Does anyone have advice?


> > Is there a way to obtain a efficient filter?
>
> Yes. I discovered a most-excellent way to do this. It requires arbitrary
> metadata in every node, a constructor that composes subnodes to create the
> metadata for the parent and a filter function that can cull branches from
> the search tree.
>
> I used this in my Mathematica implementation to provide asymptotically
> fast filtering based upon lazily evaluated sets of symbols in each subnode.
> This gave huge performance improvements with no significant performance
> overhead.


I don't provide filter because..., well, I guess because I forgot: of all
iterators, filter is the one I need most rarely.
I should at least provide a simple implementation of it...
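
A simple filter could look roughly like the sketch below: collect the kept
elements with one in-order pass, then rebuild a balanced tree from them, for
O(n) work overall.  This is not the Vec source and not the metadata-based
scheme described above; the type and the helper names are assumptions made
for the example.

```ocaml
(* Hypothetical sketch of a naive, linear-time filter. *)
type 'a t =
  | Empty
  | Node of 'a t * 'a * 'a t * int   (* left, element, right, size *)

let size = function Empty -> 0 | Node (_, _, _, s) -> s
let node l x r = Node (l, x, r, size l + 1 + size r)

(* Build a balanced tree from the array slice [lo, hi). *)
let rec of_sub a lo hi =
  if lo >= hi then Empty
  else
    let mid = lo + (hi - lo) / 2 in
    node (of_sub a lo mid) a.(mid) (of_sub a (mid + 1) hi)

let of_array a = of_sub a 0 (Array.length a)

(* Fold from the last position to the first; the recursion depth is the
   height of the tree, not the number of elements. *)
let rec fold_right f t acc =
  match t with
  | Empty -> acc
  | Node (l, x, r, _) -> fold_right f l (f x (fold_right f r acc))

let to_list t = fold_right (fun x acc -> x :: acc) t []

(* Keep the elements satisfying p, in their original order. *)
let filter p t =
  let kept = fold_right (fun x acc -> if p x then x :: acc else acc) t [] in
  of_array (Array.of_list kept)
```

The result shares nothing with the input tree, which is the price of this
simple approach.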

Another operation I would like to implement is splice:

splice i v1 v2

replaces the element in position i of vec v2 with vec v1.  A sort of
generalized insert.
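
The intended behaviour of splice can be pinned down with a small reference
version on plain lists.  This is only a specification sketch under my own
assumptions about the edge cases (an out-of-range i is rejected); a Vec
version would be built from sub and concat rather than from lists.

```ocaml
(* Hypothetical reference semantics for splice, on lists:
   splice i v1 v2 replaces the element at position i of v2 by all of v1. *)
let splice i v1 v2 =
  let rec go k = function
    | [] -> invalid_arg "splice: index out of bounds"
    | x :: rest -> if k = 0 then v1 @ rest else x :: go (k - 1) rest
  in
  if i < 0 then invalid_arg "splice: index out of bounds" else go i v2
```

With this definition, splicing a one-element vector is the same as set, and
splicing an empty vector is the same as remove.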

BTW, Jon (and anyone else as well), let me know if you would like to help...
I could create a Google Code project so that we get a svn repository for the
code.

Luca

* Re: [Caml-list] Vec: a functional implementation of extensible arrays
  2007-07-20 15:42         ` Luca de Alfaro
@ 2007-07-20 16:45           ` Brian Hurt
  0 siblings, 0 replies; 10+ messages in thread
From: Brian Hurt @ 2007-07-20 16:45 UTC (permalink / raw)
  To: Luca de Alfaro; +Cc: Jon Harrop, caml-list

Luca de Alfaro wrote:

>
>
> This is almost easy.  I would need to add a bit to each node to keep 
> track of whether it's balanced...
> The penalty would be that the balancing function would need to do 
> slightly more work to find out what has to be balanced.
> So perhaps it's not a good idea for append, insert, but it could make 
> sense for concat (?), and especially for filter and sub...
> But I am hesitant.  If one does concat, or one does sub to extract a 
> sub-array, I wrote the code already so that sharing is maximized. What 
> is the percentage of cases in which you get a Vec, but then don't do 
> any get/set on it, and only iterate?
> Especially since you already have iterators on subranges?  Do you 
> think it's worth it?  Anyone has advice?

I don't think that with laziness you can avoid enough work to make 
inserts O(1).

On the other hand, sub and filter can be done in O(M + log N) easily 
enough, see:
http://citeseer.ist.psu.edu/236207.html

The paper is about red-black trees, but it's applicable to all 
rotation-balanced trees.

Brian


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2007-07-20 16:45 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-18 17:32 Vec: a functional implementation of extensible arrays Luca de Alfaro
2007-07-19  7:45 ` [Caml-list] " Loup Vaillant
2007-07-19  8:17   ` Hugo Ferreira
2007-07-19 16:51     ` Luca de Alfaro
2007-07-19 17:13     ` Luca de Alfaro
2007-07-19 16:59   ` Luca de Alfaro
2007-07-20  7:35     ` Loup Vaillant
2007-07-20  8:14       ` Jon Harrop
2007-07-20 15:42         ` Luca de Alfaro
2007-07-20 16:45           ` Brian Hurt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).