caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Jon Harrop <jon@ffconsultancy.com>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Ropes and rope-like functional extensible vectors with O(1) prepend/append.
Date: Sat, 4 Aug 2007 11:36:37 +0100	[thread overview]
Message-ID: <200708041136.37354.jon@ffconsultancy.com> (raw)
In-Reply-To: <20070731235514.GA31718@tux-chan>


Incidentally, this data structure is a completely generic sequence. I think it 
would be extremely valuable to provide pattern matching over such a sequence. 
This could be done either using the active patterns macro or by adding a new 
macro (that could also allow sequence literals). I did something similar for 
the original Vec data structure (independently).

On Wednesday 01 August 2007 00:55:14 Mauricio Fernandez wrote:
> Yes, linear search over <100 elements should be acceptable if the
> structure is to hold several orders of magnitude more...

Yes. After all, this optimization is replacing linear search anyway...

> You're very right, a functor makes so much more sense here: it saves one
> word per node and allows stronger typing (the alternative would be ugly,
> lots of  if mycombine != hiscombine then invalid_arg "operation" and errors
> found at run-time).

Exactly.

> So combine would be  combine : 'meta -> 'meta -> 'meta; commutative and
> associative, so that it can be used in leaves as
>   Array.fold_left combine arr default
> which shows that a default value for the metadata would have to be provided
> too.

Sounds good.

> What about cull? a  control_cull : 'meta -> bool  that tells the vect
> whether the search goes on recursively for each node (so the search is
> carried out by a function in Vect) , or a function that handles recursion
> itself, using some get_metadata : ('a, 'meta) t -> 'meta ? It seems the
> latter could lead to a leaky abstraction though. Which type would it have
> anyway? Both
> ('a, 'meta) t -> 'a  and say  ('a, 'meta) t -> 'a list  could be useful
> (the former can be used for find and the latter e.g. for select).

What about writing the search in continuation passing style. So the search 
function calls a continuation with a left, current and right just like the 
recursive call within "Set.find".

On a related note, I think the Set module would be much more powerful if the 
choose function chose a roughly central element by extracting from the root. 
Not only is this faster, it allows many more algorithmic optimizations to be 
performance from outside Set by recursively choosing and splitting. I think 
the set-theoretic operations could then be implemented from outside the AVL 
tree.

> > You could then reimplement the Set module on top of your data structure
> > by searching for the index of the given element and inserting it if it is
> > new.
>
> For the sake of better space efficiency?

And performance. The current Set implementation performs huge amount of 
unnecessary allocations and is an order of magnitude slower than hashset as a 
consequence. Your rope-based implementation could close that gap considerably 
without sacrificing the purely functional style.

> Set uses 5 words per element, but 
> it could be brought down to 3.5 words by adding a new constructor. Still,
> Vect's ~1.125 to ~2.0 would remain considerably better.
> It'd be great to find a way to make good use of the O(1) append to improve
> on Set's logarithmic bounds, but I can't see how right now (again, it's
> late :)

There are lots more potential improvements, like adding a set_of_array 
function that avoids repeated insertion.

> > Yes. You could also use recursive subdivision to create a perfectly
> > balanced result.
>
> The problem is that the obvious implementation, using Array, would run
> against the max_array_length limit. Avoiding it is pretty easy but there
> are still a few more interesting things to be done :)

I'd forget about it to be honest. In 2 years, most desktops will be 64-bit...

> Last but not least, I've added destructive_set : int -> 'a -> 'a t -> unit.
> It's evil but so much faster...
> http://eigenclass.org/repos/oropes/head/set-balanced.png
> It brings Vect one order of magnitude closer to Array for ephemeral usage.

Cool.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


  reply	other threads:[~2007-08-04 10:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-28 23:33 Mauricio Fernandez
2007-07-30  0:46 ` [Caml-list] " Jon Harrop
2007-07-30 11:51   ` Mauricio Fernandez
2007-07-30 13:47     ` Jon Harrop
2007-07-31 23:55       ` Mauricio Fernandez
2007-08-04 10:36         ` Jon Harrop [this message]
2007-08-09 22:07 ` Nathaniel Gray
2007-08-21 21:39 ` Luca de Alfaro
2007-08-22  2:54   ` skaller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200708041136.37354.jon@ffconsultancy.com \
    --to=jon@ffconsultancy.com \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).