caml-list - the Caml user's mailing list
From: Alain Frisch <alain@frisch.fr>
To: Berke Durak <berke.durak@exalead.com>
Cc: caml-list <caml-list@inria.fr>
Subject: Re: [Caml-list] Canonical Set/Map datastructure?
Date: Wed, 05 Mar 2008 18:27:38 +0100
Message-ID: <47CED80A.1010504@frisch.fr>
In-Reply-To: <47CECF23.1020508@exalead.com>

Berke Durak wrote:
> The Map and Set modules use AVL trees which are efficient but not 
> canonical - a given
> set of elements can have more than one representation.  This means that 
> you cannot use
> ad hoc comparison on sets and maps, and this is why they are presented 
> as functors.
> 
> Does anyone know if, in the many years that have passed since the 
> implementation of
> those fine modules, someone has invented a (functional) datastructure 
> that is as
> efficient while being canonic?

Well, Patricia trees have been around for many years and they satisfy 
this property. They also allow set operations (union, intersection, ...) 
in linear time (and I explain below how this can be optimized to 
something which is really efficient for some applications). 
Jean-Christophe Filliâtre has an implementation on his web page.

Patricia trees work fine when the set elements can easily be represented 
as strings of bits. So if you can map your elements to integers, that's 
ok. Otherwise, you can hash-cons your elements to get unique integers 
for them.
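
To make the last point concrete, here is a minimal sketch of giving each 
distinct element its own integer. It is a plain interning table rather 
than full hash-consing, and the names make_interner and intern are 
hypothetical. It assumes the elements can be hashed and compared 
structurally (no closures, no cyclic values).

```ocaml
(* Hypothetical helper: returns a function that maps each distinct
   element to a fresh integer, and the same element to the same integer. *)
let make_interner () =
  let table = Hashtbl.create 64 in
  let next = ref 0 in
  fun x ->
    match Hashtbl.find_opt table x with
    | Some id -> id                 (* already seen: reuse its integer *)
    | None ->
      let id = !next in
      incr next;
      Hashtbl.add table x id;
      id

(* Example instance for string elements. *)
let intern : string -> int = make_interner ()
```

The integers returned by intern can then serve as the keys of a Patricia 
tree.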

Something that Jean-Christophe's implementation doesn't do but which is 
quite easy to add is to use hash-consing on the Patricia trees themselves, 
that is, to memoize their constructors in order to get a unique physical 
representation and maximal sharing. That way, you get:

  structural equality = physical equality = set equality

With this property, set operations on patricia trees can be optimized 
with reflexivity properties (e.g. the inner loop of the union function 
can start by checking equality of its arguments).
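
Here is a rough sketch of what this could look like. It is not 
Jean-Christophe's code: it follows the usual little-endian Patricia tree 
layout, and the record field id, the hashcons function and the global 
table are my own assumptions. It also assumes non-negative integer 
elements and uses an ordinary Hashtbl, so trees are never garbage 
collected; a weak table (e.g. Weak.Make) would be the natural refinement.

```ocaml
(* Hash-consed little-endian Patricia trees over non-negative ints. *)
type t = { id : int; node : node }
and node =
  | Empty
  | Leaf of int
  | Branch of int * int * t * t   (* prefix, branching bit, left, right *)

(* Shallow equality and hash: children are already unique, so they are
   compared physically and hashed through their id. *)
module Node = struct
  type t = node
  let equal a b = match a, b with
    | Empty, Empty -> true
    | Leaf i, Leaf j -> i = j
    | Branch (p, m, l, r), Branch (p', m', l', r') ->
      p = p' && m = m' && l == l' && r == r'
    | _ -> false
  let hash = function
    | Empty -> 0
    | Leaf i -> Hashtbl.hash (1, i)
    | Branch (p, m, l, r) -> Hashtbl.hash (2, p, m, l.id, r.id)
end
module H = Hashtbl.Make (Node)

let table : t H.t = H.create 251
let counter = ref 0

(* Memoized constructor: one physical tree per distinct node. *)
let hashcons node =
  match H.find_opt table node with
  | Some t -> t
  | None ->
    let t = { id = !counter; node } in
    incr counter;
    H.add table node t;
    t

let empty = hashcons Empty
let leaf k = hashcons (Leaf k)
let branch p m l r =
  if l == empty then r else if r == empty then l
  else hashcons (Branch (p, m, l, r))

let zero_bit k m = k land m = 0
let lowest_bit x = x land (-x)
let mask k m = k land (m - 1)
let match_prefix k p m = mask k m = p

(* Join two trees whose prefixes p0 and p1 differ. *)
let join p0 t0 p1 t1 =
  let m = lowest_bit (p0 lxor p1) in
  if zero_bit p0 m then branch (mask p0 m) m t0 t1
  else branch (mask p0 m) m t1 t0

let rec add k t = match t.node with
  | Empty -> leaf k
  | Leaf j -> if j = k then t else join k (leaf k) j t
  | Branch (p, m, l, r) ->
    if match_prefix k p m then
      (if zero_bit k m then branch p m (add k l) r
       else branch p m l (add k r))
    else join k (leaf k) p t

let rec union s t =
  if s == t then s                (* reflexivity shortcut *)
  else match s.node, t.node with
    | Empty, _ -> t
    | _, Empty -> s
    | Leaf k, _ -> add k t
    | _, Leaf k -> add k s
    | Branch (p, m, s0, s1), Branch (q, n, t0, t1) ->
      if m = n && p = q then branch p m (union s0 t0) (union s1 t1)
      else if m < n && match_prefix q p m then
        (if zero_bit q m then branch p m (union s0 t) s1
         else branch p m s0 (union s1 t))
      else if m > n && match_prefix p q n then
        (if zero_bit p n then branch q n (union s t0) t1
         else branch q n t0 (union s t1))
      else join p s q t

let of_list l = List.fold_left (fun s k -> add k s) empty l
```

With this sketch, of_list [3; 1; 2] and of_list [2; 3; 1] return the 
same physical value, so set equality can be tested with ==.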

Also, you get a nice unique integer for each tree. This allows you to 
memoize set operations efficiently (like union and intersection, for which 
you can use memoization in the inner loop, not only at toplevel), and to 
build sets of sets (and so on).
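
As an illustration of the inner-loop memoization, here is a small generic 
sketch. The name memo_rec2 and the ~id argument are hypothetical; the 
only assumption is that every value carries a unique integer, as 
hash-consed trees do. The recursive function receives the memoized 
version of itself, so recursive calls hit the cache too.

```ocaml
(* Hypothetical combinator: memoize a recursive binary operation using
   the pair of unique integers of its arguments as the cache key. *)
let memo_rec2 ~id body =
  let cache = Hashtbl.create 251 in
  let rec self a b =
    let key = (id a, id b) in
    match Hashtbl.find_opt cache key with
    | Some r -> r                   (* result already computed *)
    | None ->
      let r = body self a b in      (* recursive calls go through [self] *)
      Hashtbl.add cache key r;
      r
  in
  self
```

For trees that store their unique integer in a field, a union would be 
written as memo_rec2 ~id:(fun t -> t.id) (fun union s t -> ...), calling 
union on the subtrees inside the body. For a commutative operation, the 
key can be normalized (smaller integer first) to improve the hit rate.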

-- Alain


