caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Jon Harrop <jon@ffconsultancy.com>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Ropes and rope-like functional extensible vectors with O(1) prepend/append.
Date: Mon, 30 Jul 2007 14:47:23 +0100	[thread overview]
Message-ID: <200707301447.24010.jon@ffconsultancy.com> (raw)
In-Reply-To: <20070730115129.GD28879@tux-chan>

On Monday 30 July 2007 12:51:29 Mauricio Fernandez wrote:
> The structure is persistent and all operations are non-destructive.
> Mutation is used only in the rebalancing operation and in set, but they
> affect an ephemeral forest of ropes/vects and a new string/array
> respectively, so the original structure is never modified and in principle
> all operations should be thread-safe.
>
> Ropes/vects being functional doesn't mean that they will perform well in a
> persistent setting however, see the clarification at
>
> http://eigenclass.org/repos/oropes/head/doc/Vect.html

Ok.

> > I have some suggestions:
> >
> > I'd like metadata in every node, so I can provide a constructor that
> > combines the metadata of child nodes and a cull function to accelerate
> > searches.
>
> If I understand it correctly, that scheme could in the limit turn some O(n)
> searches into O(log n)?), right?

Something like that, yes.

> Unlike Vec, Vect uses "compact" leaves (Leaf of 'a array) of bounded size
> (leaf_size, typically 16-64), which might not fit very well.

I can think of two solutions:

1. Linear search.

2. Store an array of metadata corresponding to a binary search of the array of 
elements in a leaf.

The latter sounds sexy but the former would probably suffice. :-)

> Vect would need to know how to combine the metadata, wouldn't it?

Users would have to provide a function to do that, yes.

> I was 
> thinking about something like
>   Leaf of ('meta -> 'meta -> 'meta) * 'meta * 'a array
> but I've realized that this wouldn't suffice. So, given that Vect.t is
> currently
>
>     type 'a t =
> 	Empty
>
>       | Concat of 'a t * int * 'a t * int * int
>       | Leaf of 'a array
>
> something like this maybe?
>
>   type ('a, 'meta) t =
>   	Empty of ('meta -> 'meta -> 'meta)
>
>       | Concat of ('meta -> 'meta -> 'meta) * 'meta *
>
>                   ('a, 'meta) t * int *
>                   ('a, 'meta) t * int *
> 		  int
>
>       | Leaf of ('meta -> 'meta -> meta) * 'meta array * 'a array
>
> 		(* maybe also  * 'meta  to cache the last computation? *)
>
> or even without the ('meta -> 'meta -> 'meta) part, forcing the user to
> pass the function on each modification?  Just thinking out loud.

I would use a functor to pass it the "combine" and "cull" functions, rather 
than putting them in the tree itself. This is analogous to the Set.Make 
functor accepting a comparison function.

> At any rate, it'd be better to provide it as a separate structure, any
> suggestions for the name?.

You could just call it Tree and try to make it as generic as possible.

You could then reimplement the Set module on top of your data structure by 
searching for the index of the given element and inserting it if it is new. 
Hmm, maybe a function that combines a search with an update (to avoid two 
traversals) would be a good idea...

> > The usual HOFs, like map.
>
> I just pushed a patch with filter and map. The former is trivially
> implemented with fold + append (thanks to the O(1) append). I was going to
> code map the same way but I ended up making one that returns an isomorphic
> vect and is faster (since there's no need to rebalance).

Yes. You could also use recursive subdivision to create a perfectly balanced 
result.

> So Vect currently has iter, iteri, rangeiter, fold, map and filter. I'm
> considering renaming fold to fold_left and providing fold_right too.

I'd definitely provide both folds, yes.

I'd also like specialized rewrite functions that avoid allocations when the 
outputs are the same as the inputs. So I'd make the set function bail via an 
exception when the element is left unchanged, returning the original Vect and 
doing no allocation in this case. I'd also like an id_map function that did a 
map but reused old branches whenever they were left unchanged.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


  reply	other threads:[~2007-07-30 13:56 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-07-28 23:33 Mauricio Fernandez
2007-07-30  0:46 ` [Caml-list] " Jon Harrop
2007-07-30 11:51   ` Mauricio Fernandez
2007-07-30 13:47     ` Jon Harrop [this message]
2007-07-31 23:55       ` Mauricio Fernandez
2007-08-04 10:36         ` Jon Harrop
2007-08-09 22:07 ` Nathaniel Gray
2007-08-21 21:39 ` Luca de Alfaro
2007-08-22  2:54   ` skaller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200707301447.24010.jon@ffconsultancy.com \
    --to=jon@ffconsultancy.com \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).