caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Canonical Set/Map datastructure?
@ 2008-03-05 16:49 Berke Durak
  2008-03-05 17:16 ` [Caml-list] " Brian Hurt
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Berke Durak @ 2008-03-05 16:49 UTC (permalink / raw)
  To: Caml-list List

The Map and Set modules use AVL trees which are efficient but not canonical - a given
set of elements can have more than one representation.  This means that you cannot use
ad hoc comparison on sets and maps, and this is why they are presented as functors.

Does anyone know if, in the many years that have passed since the implementation of
those fine modules, someone has invented a (functional) datastructure that is as
efficient while being canonic?

That would permit polymorphic set and map modules that work correctly on sets of sets, for
instance.  Of course, the order induced on sets by the adhoc comparison doesn't have to
be a useful order; just being a good order would suffice.
-- 
Berke DURAK


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Canonical Set/Map datastructure?
  2008-03-05 16:49 Canonical Set/Map datastructure? Berke Durak
@ 2008-03-05 17:16 ` Brian Hurt
  2008-03-05 17:27 ` Alain Frisch
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Brian Hurt @ 2008-03-05 17:16 UTC (permalink / raw)
  To: Berke Durak; +Cc: Caml-list List

Berke Durak wrote:

> The Map and Set modules use AVL trees which are efficient but not 
> canonical - a given
> set of elements can have more than one representation.  This means 
> that you cannot use
> ad hoc comparison on sets and maps, and this is why they are presented 
> as functors.

However, as you can walk the tree in O(N), it's still possible to do 
set/map compare in O(N) worst case.  All this means is that 
Pervasives.compare is not equivalent to Set.compare.

>
> Does anyone know if, in the many years that have passed since the 
> implementation of
> those fine modules, someone has invented a (functional) datastructure 
> that is as
> efficient while being canonic?

The preserves "fast" (O(log N)) insert and removal?  No.  If you're 
willing to accept O(N) insert/removal cost, simple sorted lists work.  
I'm not sure if it's possible to have both fast insert/removal and a 
canonical form.

Brian


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Canonical Set/Map datastructure?
  2008-03-05 16:49 Canonical Set/Map datastructure? Berke Durak
  2008-03-05 17:16 ` [Caml-list] " Brian Hurt
@ 2008-03-05 17:27 ` Alain Frisch
  2008-03-05 19:53   ` Jean-Christophe Filliâtre
  2008-03-05 20:03   ` Jon Harrop
  2008-03-05 17:34 ` Harrison, John R
  2008-03-06  9:53 ` Berke Durak
  3 siblings, 2 replies; 13+ messages in thread
From: Alain Frisch @ 2008-03-05 17:27 UTC (permalink / raw)
  To: Berke Durak; +Cc: caml-list

Berke Durak wrote:
> The Map and Set modules use AVL trees which are efficient but not 
> canonical - a given
> set of elements can have more than one representation.  This means that 
> you cannot use
> ad hoc comparison on sets and maps, and this is why they are presented 
> as functors.
> 
> Does anyone know if, in the many years that have passed since the 
> implementation of
> those fine modules, someone has invented a (functional) datastructure 
> that is as
> efficient while being canonic?

Well, Patricia trees have been around for many years and they satisfy 
this property. They also allow set operations (union, intersection, ...) 
in linear time (and I explain below how this can be optimized to 
something which is really efficient for some applications). 
Jean-Christophe Filliâtre has an implementation on its web page.

Patricia trees work fine when the set elements can easily be represented 
as strings of bits. So if you can map your elements to integers, that's 
ok. Otherwise, you can hash-cons your elements to get unique integers 
for them.

Something that Jean-Christophe's implementation doesn't do but which is 
quite easy to add is to use hash-consing on patricia trees themselves, 
that is, to memoize their constructors in order to get unique physical 
representation and maximal sharing. That way, you get:

  structural equality = physical equality = set equality

With this property, set operations on patricia trees can be optimized 
with reflexivity properties (e.g. the inner loop of the union function 
can start by checking equality of its arguments).

Also, you get a nice unique integer for each tree. This allow you to 
memoize efficiently set operations (like union, intersection, for which 
you can use memoization in the inner loop, not only at toplevel), and to 
build sets of sets (and so on).

-- Alain


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [Caml-list] Canonical Set/Map datastructure?
  2008-03-05 16:49 Canonical Set/Map datastructure? Berke Durak
  2008-03-05 17:16 ` [Caml-list] " Brian Hurt
  2008-03-05 17:27 ` Alain Frisch
@ 2008-03-05 17:34 ` Harrison, John R
  2008-03-06  9:53 ` Berke Durak
  3 siblings, 0 replies; 13+ messages in thread
From: Harrison, John R @ 2008-03-05 17:34 UTC (permalink / raw)
  To: Berke Durak, Caml-list List

Hi Berke,

| Does anyone know if, in the many years that have passed since the
| implementation of those fine modules, someone has invented a
| (functional) datastructure that is as efficient while being
| canonic?

I have an implementation, which you're welcome to use/adapt. It's
certainly not particularly well optimized, but I've relied on it
for a few years with no problems. The code is in this file:

  http://www.cl.cam.ac.uk/~jrh13/atp/OCaml/lib.ml

Search for "Diego" and the relevant stuff starts there; a few
earlier functions (like "uniq") are used, but they are easy to
copy or replace.

This was the outcome of a similar question I asked on the OCaml
list a few years ago. Diego Olivier Fernandez Pons not only
pointed me at some interesting theoretical results, but also
suggested an idea using hashes and Patricia trees that works very
well in practice. That's what I implemented. My implementation
is for maps only, but it's easy to adapt for sets. Or you could
just use maps to type "unit" if you're lazy like me.

John.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Canonical Set/Map datastructure?
  2008-03-05 17:27 ` Alain Frisch
@ 2008-03-05 19:53   ` Jean-Christophe Filliâtre
  2008-03-05 20:03   ` Jon Harrop
  1 sibling, 0 replies; 13+ messages in thread
From: Jean-Christophe Filliâtre @ 2008-03-05 19:53 UTC (permalink / raw)
  To: Alain Frisch; +Cc: Berke Durak, caml-list

Alain Frisch a écrit :
> Something that Jean-Christophe's implementation doesn't do but which is
> quite easy to add is to use hash-consing on patricia trees themselves,

This is a nice idea, thanks; I will eventually add this to my
implementation.

Meanwhile, I can mention that there is also an hash-consing library on
my web page (together with a paper describing the technique) and thus
it should be as ``simple'' as combining the two codes :-)

-- 
Jean-Christophe Filliâtre
http://www.lri.fr/~filliatr/



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Canonical Set/Map datastructure?
  2008-03-05 17:27 ` Alain Frisch
  2008-03-05 19:53   ` Jean-Christophe Filliâtre
@ 2008-03-05 20:03   ` Jon Harrop
  2008-03-05 21:56     ` Alain Frisch
  2008-03-06  7:45     ` Jean-Christophe Filliâtre
  1 sibling, 2 replies; 13+ messages in thread
From: Jon Harrop @ 2008-03-05 20:03 UTC (permalink / raw)
  To: caml-list

On Wednesday 05 March 2008 17:27:38 Alain Frisch wrote:
> Patricia trees work fine when the set elements can easily be represented
> as strings of bits.
> ...
>   structural equality = physical equality = set equality

This is very interesting. What are the disadvantages? I'm guessing: slow value 
creation for small values and heavy memory consumption.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Canonical Set/Map datastructure?
  2008-03-05 20:03   ` Jon Harrop
@ 2008-03-05 21:56     ` Alain Frisch
  2008-03-06  7:45     ` Jean-Christophe Filliâtre
  1 sibling, 0 replies; 13+ messages in thread
From: Alain Frisch @ 2008-03-05 21:56 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop wrote:
> On Wednesday 05 March 2008 17:27:38 Alain Frisch wrote:
>> Patricia trees work fine when the set elements can easily be represented
>> as strings of bits.
>> ...
>>   structural equality = physical equality = set equality
> 
> This is very interesting. What are the disadvantages? I'm guessing: slow value 
> creation for small values and heavy memory consumption.

I haven't done any serious benchmark comparing Patricia trees and 
OCaml's Set module. For the situations where we're using hash-consed 
Patricia trees, it is simply not an option to use balanced trees (that 
would turn almost linear algorithms into quadratic-or-more ones).

Of course, hash-consing (and memoization of set operations) means extra 
tables and lookups, so there will be some overhead. Last time I checked, 
it was much more efficient to use regular hash tables rather than weak 
ones, but this might have changed since then.

Another disadvantage is that hash-consing usually does not play well 
with marshaling (e.g. you need to manually traverse unmarshaled values 
to restore invariants).

Alain


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Canonical Set/Map datastructure?
  2008-03-05 20:03   ` Jon Harrop
  2008-03-05 21:56     ` Alain Frisch
@ 2008-03-06  7:45     ` Jean-Christophe Filliâtre
  1 sibling, 0 replies; 13+ messages in thread
From: Jean-Christophe Filliâtre @ 2008-03-06  7:45 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop wrote:
> On Wednesday 05 March 2008 17:27:38 Alain Frisch wrote:
>> Patricia trees work fine when the set elements can easily be represented
>> as strings of bits.
>> ...
>>   structural equality = physical equality = set equality
> 
> This is very interesting. What are the disadvantages? I'm guessing: slow value 
> creation for small values and heavy memory consumption.

Regarding memory consumpion, it is true that hash-consing tables are
using memory, but sharing structurally equal values may also save some
space; see for instance the benchmarks on page 4 of
http://www.lri.fr/~filliatr/ftp/publis/hash-consing2.ps

-- 
Jean-Christophe


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Canonical Set/Map datastructure?
  2008-03-05 16:49 Canonical Set/Map datastructure? Berke Durak
                   ` (2 preceding siblings ...)
  2008-03-05 17:34 ` Harrison, John R
@ 2008-03-06  9:53 ` Berke Durak
  2008-03-06 17:36   ` Harrison, John R
  2008-03-07 10:19   ` Alain Frisch
  3 siblings, 2 replies; 13+ messages in thread
From: Berke Durak @ 2008-03-06  9:53 UTC (permalink / raw)
  To: Berke Durak; +Cc: Caml-list List

Berke Durak a écrit :
> The Map and Set modules use AVL trees which are efficient but not 
> canonical - a given
> set of elements can have more than one representation.  This means that 
> you cannot use
> ad hoc comparison on sets and maps, and this is why they are presented 
> as functors.
> 
> Does anyone know if, in the many years that have passed since the 
> implementation of
> those fine modules, someone has invented a (functional) datastructure 
> that is as
> efficient while being canonic?
> 
> That would permit polymorphic set and map modules that work correctly on 
> sets of sets, for
> instance.  Of course, the order induced on sets by the adhoc comparison 
> doesn't have to
> be a useful order; just being a good order would suffice.

Thanks for all your replies.

I did not know that Patricia trees were canonical.

However, the idea of combining hash-consing and Patricia trees, however
elegant, does not suit my problem.  Basically, you are cheating by using
an auxiliary data structure, the hashtable (which is also O(n^2) worst-case).

As I was improving my IO combinator library with sets and maps, the structures
need to be self-contained, and not need a description as a bitstring (which
could be done by using Marshal.to_string but I don't think the performance
would be there).  Maybe some wizardry relying on the physical representation
of objects would permit storage of arbitrary values in Patricia trees, but I
remain skeptical.
-- 
Berke DURAK


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [Caml-list] Canonical Set/Map datastructure?
  2008-03-06  9:53 ` Berke Durak
@ 2008-03-06 17:36   ` Harrison, John R
  2008-03-07 10:09     ` Berke Durak
  2008-03-07 10:19   ` Alain Frisch
  1 sibling, 1 reply; 13+ messages in thread
From: Harrison, John R @ 2008-03-06 17:36 UTC (permalink / raw)
  To: Berke Durak; +Cc: Caml-list List

Hi Berke,

| However, the idea of combining hash-consing and Patricia trees,
| however elegant, does not suit my problem.  Basically, you are
| cheating by using an auxiliary data structure, the hashtable (which is
| also O(n^2) worst-case).

The code I pointed you to uses hashes, but not hash tables. As for the
worst-case efficiency, what you say is true, but it's unlikely to
matter in practice. And I suspect it may be unavoidable in principle,
which is the interesting thing I learned last time this was discussed.
See for example the following paper and its references:

 "New Tight Bounds on Uniquely Represented Dictionaries"
 Arne Andersson and Thomas Ottmann
 SIAM J. vol 24, pp. 1091-1103, 1995.

The paper is available online as

  http://user.it.uu.se/~arnea/ps/andetigh.ps

Note in particular the following paragraph:

 "Hence we show that there is a huge gap between unique and non-unique
  representations of dictionaries. We feel that these findings point
  to a fundamental fact in the theory of data structures".

| As I was improving my IO combinator library with sets and maps, the
| structures need to be self-contained, and not need a description as a
| bitstring (which could be done by using Marshal.to_string but I don't
| think the performance would be there).  Maybe some wizardry relying on
| the physical representation of objects would permit storage of
| arbitrary values in Patricia trees, but I remain skeptical. --

My code uses OCaml's polymorphic hashes internally, but these do not
appear in the interface and the user doesn't need to supply anything.
Indeed, you may regard OCaml's polymorphic hash as a hack, but it is
available in OCaml for every type just as surely as polymorphic
equality. So I'm not sure I quite understand the nature of your
objection.

John.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Canonical Set/Map datastructure?
  2008-03-06 17:36   ` Harrison, John R
@ 2008-03-07 10:09     ` Berke Durak
  2008-03-07 17:13       ` Harrison, John R
  0 siblings, 1 reply; 13+ messages in thread
From: Berke Durak @ 2008-03-07 10:09 UTC (permalink / raw)
  To: Harrison, John R; +Cc: Caml-list List

Harrison, John R a écrit :
> Hi Berke,
> 
> | However, the idea of combining hash-consing and Patricia trees,
> | however elegant, does not suit my problem.  Basically, you are
> | cheating by using an auxiliary data structure, the hashtable (which is
> | also O(n^2) worst-case).
> 
> The code I pointed you to uses hashes, but not hash tables. As for the
> worst-case efficiency, what you say is true, but it's unlikely to
> matter in practice. And I suspect it may be unavoidable in principle,
> which is the interesting thing I learned last time this was discussed.
> See for example the following paper and its references:

Hi John,

Thanks for your code.  I'm sorry I thought you maintained a separate hash table.
It's very interesting code and I'll give it a try.

But it seems to have two properties which make it unsuitable for
the particular application I had in mind:

- The theoretical worst-case performance of hash-based data structures
can be attained if the hash function has bad dispersal.  As the code
relies on Hashtbl.hash, which does not hash deeply, this could
potentially be a proble, in particular if your keys have long "common
prefixes" that are not distinguished by the hash function.  It might
work well in practice but I feel a little uncomfortable.

- Also, it does not preserve the natural order for keys, and that
is particularly bad, because I often use, for instance, float-indexed
maps or sets as a substitute for heaps.

The paper with the inverse cubic lower bound is *very* interesting; we don't
see plausible lower bounds often in complexity theory, especially with such
large assumptions (just bounded out-degree for the graph nodes).

So it seems there is little hope to have a drop-in replacement for Set
or Map that is compatible with the natural order of things, a.k.a.,
Pervasives.compare.

-- 
Berke DURAK


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Caml-list] Canonical Set/Map datastructure?
  2008-03-06  9:53 ` Berke Durak
  2008-03-06 17:36   ` Harrison, John R
@ 2008-03-07 10:19   ` Alain Frisch
  1 sibling, 0 replies; 13+ messages in thread
From: Alain Frisch @ 2008-03-07 10:19 UTC (permalink / raw)
  To: Berke Durak; +Cc: caml-list

Berke Durak wrote:
> However, the idea of combining hash-consing and Patricia trees, however
> elegant, does not suit my problem.  Basically, you are cheating by using
> an auxiliary data structure, the hashtable (which is also O(n^2) 
> worst-case).

You can also implement hash-consing with the Map module (or a 
polymorphic version of it) to avoid the worst case complexity. But yes, 
you need an extra data structure.

-- Alain


^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [Caml-list] Canonical Set/Map datastructure?
  2008-03-07 10:09     ` Berke Durak
@ 2008-03-07 17:13       ` Harrison, John R
  0 siblings, 0 replies; 13+ messages in thread
From: Harrison, John R @ 2008-03-07 17:13 UTC (permalink / raw)
  To: Berke Durak; +Cc: Caml-list List

[-- Attachment #1: Type: text/plain, Size: 2308 bytes --]

Hi Berke,

| Thanks for your code.  I'm sorry I thought you maintained a separate
| hash table. It's very interesting code and I'll give it a try.

It would in fact have been more efficient to use Jean-Christophe
Filliatre's implementation of Patricia trees instead of writing my
own. However, it was important to me to release the code under a
BSD-like license. For what it's worth, I attach a version of my code
that's extracted from its context so it can be used independently. (I
replaced a few of my pet functions with standard library ones, and I
hope I didn't screw anything up in the process.)

| - The theoretical worst-case performance of hash-based data structures
| can be attained if the hash function has bad dispersal.  As the code
| relies on Hashtbl.hash, which does not hash deeply, this could
| potentially be a proble, in particular if your keys have long "common
| prefixes" that are not distinguished by the hash function.  It might
| work well in practice but I feel a little uncomfortable.

Yes, sure. My applications are mainly in theorem proving and symbolic
computation where this isn't an issue, and I can imagine it might not
be suitable elsewhere.

| - Also, it does not preserve the natural order for keys, and that
| is particularly bad, because I often use, for instance,
| float-indexed maps or sets as a substitute for heaps.

When I was looking at this area last time (maybe just following the
references from the paper I cited) I came across a kind of mixed
heap/tree structure with distinct "horizontal" and "vertical"
orderings. That might be something to consider. But once again there
is a bad worst-case performance if the two orders happen to be
correlated.

| The paper with the inverse cubic lower bound is *very* interesting; we
| don't see plausible lower bounds often in complexity theory,
| especially with such large assumptions (just bounded out-degree for
| the graph nodes).
|
| So it seems there is little hope to have a drop-in replacement for Set
| or Map that is compatible with the natural order of things, a.k.a.,
| Pervasives.compare.

Yes, I really found it striking that in a fundamental sense, you need
to pay the price of noncanonicality in order to get the guaranteed
O(log n) lookup.

John.


[-- Attachment #2: map.ml --]
[-- Type: application/octet-stream, Size: 10766 bytes --]

(* ------------------------------------------------------------------------- *)
(* Polymorphic finite partial functions via Patricia trees.                  *)
(*                                                                           *)
(* The point of this strange representation is that it is canonical (equal   *)
(* functions have the same encoding) yet reasonably efficient on average.    *)
(*                                                                           *)
(*                  (c) John Harrison 2003.                                  *)
(*                                                                           *)
(*    See http://www.cl.cam.ac.uk/~jrh13/atp/OCaml/LICENSE.txt               *)
(*                                                                           *)
(* Idea due to Diego Olivier Fernandez Pons (OCaml list, 2003/11/10).        *)
(* ------------------------------------------------------------------------- *)

type ('a,'b)func =
   Empty
 | Leaf of int * ('a*'b)list
 | Branch of int * int * ('a,'b)func * ('a,'b)func;;

(* ------------------------------------------------------------------------- *)
(* Undefined function.                                                       *)
(* ------------------------------------------------------------------------- *)

let undefined = Empty;;

(* ------------------------------------------------------------------------- *)
(* In case of equality comparison worries, better use this.                  *)
(* ------------------------------------------------------------------------- *)

let is_undefined f =
  match f with
    Empty -> true
  | _ -> false;;

(* ------------------------------------------------------------------------- *)
(* Operation analogous to "map" for lists.                                   *)
(* ------------------------------------------------------------------------- *)

let mapf =
  let rec map_list f l =
    match l with
      [] -> []
    | (x,y)::t -> (x,f(y))::(map_list f t) in
  let rec mapf f t =
    match t with
      Empty -> Empty
    | Leaf(h,l) -> Leaf(h,map_list f l)
    | Branch(p,b,l,r) -> Branch(p,b,mapf f l,mapf f r) in
  mapf;;

(* ------------------------------------------------------------------------- *)
(* Operations analogous to "fold" for lists.                                 *)
(* ------------------------------------------------------------------------- *)

let foldl =
  let rec foldl_list f a l =
    match l with
      [] -> a
    | (x,y)::t -> foldl_list f (f a x y) t in
  let rec foldl f a t =
    match t with
      Empty -> a
    | Leaf(h,l) -> foldl_list f a l
    | Branch(p,b,l,r) -> foldl f (foldl f a l) r in
  foldl;;

let foldr =
  let rec foldr_list f l a =
    match l with
      [] -> a
    | (x,y)::t -> f x y (foldr_list f t a) in
  let rec foldr f t a =
    match t with
      Empty -> a
    | Leaf(h,l) -> foldr_list f l a
    | Branch(p,b,l,r) -> foldr f l (foldr f r a) in
  foldr;;

(* ------------------------------------------------------------------------- *)
(* Application.                                                              *)
(* ------------------------------------------------------------------------- *)

let applyd =
  let rec apply_listd l d x =
    match l with
      (a,b)::t -> let c = Pervasives.compare x a in
                  if c = 0 then b else if c > 0 then apply_listd t d x else d x
    | [] -> d x in
  fun f d x ->
    let k = Hashtbl.hash x in
    let rec look t =
      match t with
        Leaf(h,l) when h = k -> apply_listd l d x
      | Branch(p,b,l,r) when (k lxor p) land (b - 1) = 0
                -> look (if k land b = 0 then l else r)
      | _ -> d x in
    look f;;

let apply f = applyd f (fun x -> failwith "apply");;

let tryapplyd f a d = applyd f (fun x -> d) a;;

let tryapplyl f x = tryapplyd f x [];;

let defined f x = try apply f x; true with Failure _ -> false;;

(* ------------------------------------------------------------------------- *)
(* Undefinition.                                                             *)
(* ------------------------------------------------------------------------- *)

let undefine =
  let rec undefine_list x l =
    match l with
      (a,b as ab)::t ->
          let c = Pervasives.compare x a in
          if c = 0 then t
          else if c < 0 then l else
          let t' = undefine_list x t in
          if t' == t then l else ab::t'
    | [] -> [] in
  fun x ->
    let k = Hashtbl.hash x in
    let rec und t =
      match t with
        Leaf(h,l) when h = k ->
          let l' = undefine_list x l in
          if l' == l then t
          else if l' = [] then Empty
          else Leaf(h,l')
      | Branch(p,b,l,r) when k land (b - 1) = p ->
          if k land b = 0 then
            let l' = und l in
            if l' == l then t
            else if is_undefined l' then r
            else Branch(p,b,l',r)
          else
            let r' = und r in
            if r' == r then t
            else if is_undefined r' then l
            else Branch(p,b,l,r')
      | _ -> t in
    und;;

(* ------------------------------------------------------------------------- *)
(* Redefinition and combination.                                             *)
(* ------------------------------------------------------------------------- *)

let (|->),combine =
  let ldb x y = let z = x lxor y in z land (-z) in
  let newbranch p1 t1 p2 t2 =
    let b = ldb p1 p2 in
    let p = p1 land (b - 1) in
    if p1 land b = 0 then Branch(p,b,t1,t2)
    else Branch(p,b,t2,t1) in
  let rec define_list (x,y as xy) l =
    match l with
      (a,b as ab)::t ->
          let c = Pervasives.compare x a in
          if c = 0 then xy::t
          else if c < 0 then xy::l
          else ab::(define_list xy t)
    | [] -> [xy]
  and combine_list op z l1 l2 =
    match (l1,l2) with
      [],_ -> l2
    | _,[] -> l1
    | ((x1,y1 as xy1)::t1,(x2,y2 as xy2)::t2) ->
          let c = Pervasives.compare x1 x2 in
          if c < 0 then xy1::(combine_list op z t1 l2)
          else if c > 0 then xy2::(combine_list op z l1 t2) else
          let y = op y1 y2 and l = combine_list op z t1 t2 in
          if z(y) then l else (x1,y)::l in
  let (|->) x y =
    let k = Hashtbl.hash x in
    let rec upd t =
      match t with
        Empty -> Leaf (k,[x,y])
      | Leaf(h,l) ->
           if h = k then Leaf(h,define_list (x,y) l)
           else newbranch h t k (Leaf(k,[x,y]))
      | Branch(p,b,l,r) ->
          if k land (b - 1) <> p then newbranch p t k (Leaf(k,[x,y]))
          else if k land b = 0 then Branch(p,b,upd l,r)
          else Branch(p,b,l,upd r) in
    upd in
  let rec combine op z t1 t2 =
    match (t1,t2) with
      Empty,_ -> t2
    | _,Empty -> t1
    | Leaf(h1,l1),Leaf(h2,l2) ->
          if h1 = h2 then
            let l = combine_list op z l1 l2 in
            if l = [] then Empty else Leaf(h1,l)
          else newbranch h1 t1 h2 t2
    | (Leaf(k,lis) as lf),(Branch(p,b,l,r) as br) ->
          if k land (b - 1) = p then
            if k land b = 0 then
              let l' = combine op z lf l in
              if is_undefined l' then r else Branch(p,b,l',r)
            else
              let r' = combine op z lf r in
              if is_undefined r' then l else Branch(p,b,l,r')
          else
            newbranch k lf p br
    | (Branch(p,b,l,r) as br),(Leaf(k,lis) as lf) ->
          if k land (b - 1) = p then
            if k land b = 0 then
              let l' = combine op z l lf in
              if is_undefined l' then r else Branch(p,b,l',r)
            else
              let r' = combine op z r lf in
              if is_undefined r' then l else Branch(p,b,l,r')
          else
            newbranch p br k lf
    | Branch(p1,b1,l1,r1),Branch(p2,b2,l2,r2) ->
          if b1 < b2 then
            if p2 land (b1 - 1) <> p1 then newbranch p1 t1 p2 t2
            else if p2 land b1 = 0 then
              let l = combine op z l1 t2 in
              if is_undefined l then r1 else Branch(p1,b1,l,r1)
            else
              let r = combine op z r1 t2 in
              if is_undefined r then l1 else Branch(p1,b1,l1,r)
          else if b2 < b1 then
            if p1 land (b2 - 1) <> p2 then newbranch p1 t1 p2 t2
            else if p1 land b2 = 0 then
              let l = combine op z t1 l2 in
              if is_undefined l then r2 else Branch(p2,b2,l,r2)
            else
              let r = combine op z t1 r2 in
              if is_undefined r then l2 else Branch(p2,b2,l2,r)
          else if p1 = p2 then
            let l = combine op z l1 l2 and r = combine op z r1 r2 in
            if is_undefined l then r
            else if is_undefined r then l else Branch(p1,b1,l,r)
          else
            newbranch p1 t1 p2 t2 in
  (|->),combine;;

(* ------------------------------------------------------------------------- *)
(* Special case of point function.                                           *)
(* ------------------------------------------------------------------------- *)

let (|=>) = fun x y -> (x |-> y) undefined;;

(* ------------------------------------------------------------------------- *)
(* Grab an arbitrary element.                                                *)
(* ------------------------------------------------------------------------- *)

let rec choose t =
  match t with
    Empty -> failwith "choose: completely undefined function"
  | Leaf(h,l) -> List.hd l
  | Branch(b,p,t1,t2) -> choose t1;;

(* ------------------------------------------------------------------------- *)
(* Mapping to sorted-list representation of the graph, domain and range.     *)
(* ------------------------------------------------------------------------- *)

let rec uniq l =
  match l with
    x::(y::_ as t) -> let t' = uniq t in
                      if Pervasives.compare x y = 0 then t' else
                      if t'==t then l else x::t'
 | _ -> l;;

let setify l = uniq(List.sort Pervasives.compare l);;

let graph f = setify (foldl (fun a x y -> (x,y)::a) [] f);;

let dom f = setify(foldl (fun a x y -> x::a) [] f);;

let ran f = setify(foldl (fun a x y -> y::a) [] f);;

(* ------------------------------------------------------------------------- *)
(* Idiom for a mapping zipping domain and range lists.                       *)
(* ------------------------------------------------------------------------- *)

let fpf xs ys = List.fold_right2 (|->) xs ys undefined;;

(* ------------------------------------------------------------------------- *)
(* Install a (trivial) printer for finite partial functions.                 *)
(* ------------------------------------------------------------------------- *)

let print_fpf (f:('a,'b)func) = print_string "<func>";;

#install_printer print_fpf;;

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2008-03-07 17:13 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-03-05 16:49 Canonical Set/Map datastructure? Berke Durak
2008-03-05 17:16 ` [Caml-list] " Brian Hurt
2008-03-05 17:27 ` Alain Frisch
2008-03-05 19:53   ` Jean-Christophe Filliâtre
2008-03-05 20:03   ` Jon Harrop
2008-03-05 21:56     ` Alain Frisch
2008-03-06  7:45     ` Jean-Christophe Filliâtre
2008-03-05 17:34 ` Harrison, John R
2008-03-06  9:53 ` Berke Durak
2008-03-06 17:36   ` Harrison, John R
2008-03-07 10:09     ` Berke Durak
2008-03-07 17:13       ` Harrison, John R
2008-03-07 10:19   ` Alain Frisch

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).