caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Efficient and canonical set representation?
@ 2003-11-06 16:41 Harrison, John R
  2003-11-06 17:04 ` Brian Hurt
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Harrison, John R @ 2003-11-06 16:41 UTC (permalink / raw)
  To: caml-list; +Cc: Harrison, John R

Does anyone know a representation of finite sets over an orderable polymorphic type
that's (1) efficient and (2) canonical? Even better would be a CAML or OCaml
implementation. More precisely I'm looking for:

  1. Log-time lookup and insertion, and linear-time union, intersection etc.

  2. Equal sets are represented by the same object.

For example, ordered lists satisfy (2) but only part of (1), while all the variants
of balanced trees I can remember that satisfy (1) --- AVL trees etc. --- fail (2).

Thanks,

John.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-07 14:15 Harrison, John R
  0 siblings, 0 replies; 17+ messages in thread
From: Harrison, John R @ 2003-11-07 14:15 UTC (permalink / raw)
  To: erayo, caml-list; +Cc: Harrison, John R

| You basically want O(1) for set equality, I suppose.

Actually, no --- perhaps I should have made clearer what I *really* want.
The efficiency of comparison wasn't my motivation, but rather elegance
and aesthetics. And I meant "canonical" with respect to ordinary
structural equality, not necessarily pointer equality, so the problem is
potentially a bit easier than you might have thought.

I want to be able to treat an abstract type in a truly abstract way,
and not worry about special-purpose equality relations on certain types.
Otherwise it's an ugly mess dealing with complicated nestings like sets
of pairs of lists of sets.

Now, I think the right solution, conceptually speaking, is to allow
user-defined equality on abstract types. But as far as I know this cannot
be done in OCaml, and I've never met much enthusiasm for the idea among
the CAML or SML experts.

So a poor second best is to define abstract types in a canonical way, 
which was the starting-point of my question.

After your remarks and Brian's, I'm starting to wonder if it is possible
at all to do what I want. Maybe I should be looking for an impossibility
proof instead...

John.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-07 15:27 Fred Smith
  2003-11-07 15:44 ` Samuel Lacas
  0 siblings, 1 reply; 17+ messages in thread
From: Fred Smith @ 2003-11-07 15:27 UTC (permalink / raw)
  To: Harrison, John R, erayo, caml-list


I guess what you're looking for are sorted arrays:
  1) O(log n) lookup and insertion via binary search
  2) O(n) union and intersection are simple
  3) Equal sets are represented by structurally equivalent objects.

-Fred

> -----Original Message-----
> From: owner-caml-list@pauillac.inria.fr 
> [mailto:owner-caml-list@pauillac.inria.fr] On Behalf Of 
> Harrison, John R
> Sent: Friday, November 07, 2003 9:16 AM
> To: erayo@cs.bilkent.edu.tr; caml-list@inria.fr
> Cc: Harrison, John R
> Subject: RE: [Caml-list] Efficient and canonical set representation?
> 
> 
> | You basically want O(1) for set equality, I suppose.
> 
> Actually, no --- perhaps I should have made clearer what I 
> *really* want. The efficiency of comparison wasn't my 
> motivation, but rather elegance and aesthetics. And I meant 
> "canonical" with respect to ordinary structural equality, not 
> necessarily pointer equality, so the problem is potentially a 
> bit easier than you might have thought.
> 
> I want to be able to treat an abstract type in a truly 
> abstract way, and not worry about special-purpose equality 
> relations on certain types. Otherwise it's an ugly mess 
> dealing with complicated nestings like sets of pairs of lists of sets.
> 
> Now, I think the right solution, conceptually speaking, is to 
> allow user-defined equality on abstract types. But as far as 
> I know this cannot be done in OCaml, and I've never met much 
> enthusiasm for the idea among the CAML or SML experts.
> 
> So a poor second best is to define abstract types in a canonical way, 
> which was the starting-point of my question.
> 
> After your remarks and Brian's, I'm starting to wonder if it 
> is possible at all to do what I want. Maybe I should be 
> looking for an impossibility proof instead...
> 
> John.
> 
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: 
http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs
FAQ: http://caml.inria.fr/FAQ/ Beginner's list:
http://groups.yahoo.com/group/ocaml_beginners

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-07 17:27 Fred Smith
  2003-11-10 13:24 ` Diego Olivier Fernandez Pons
  0 siblings, 1 reply; 17+ messages in thread
From: Fred Smith @ 2003-11-07 17:27 UTC (permalink / raw)
  To: Samuel Lacas, caml-list


Ocaml has a structural comparison function (<=) : 'a -> 'a -> bool.  

But the bigger problem with this approach (as Jean-Baptiste Rouquier) kindly pointed out to me is that insertion takes O(n), not O(log(n)) time. Duh.

-Fred


> -----Original Message-----
> From: owner-caml-list@pauillac.inria.fr 
> [mailto:owner-caml-list@pauillac.inria.fr] On Behalf Of Samuel Lacas
> Sent: Friday, November 07, 2003 10:44 AM
> To: caml-list@inria.fr
> Subject: Re: [Caml-list] Efficient and canonical set representation?
> 
> 
> Fred Smith a écrit 2.2K le Fri, Nov 07, 2003 at 10:27:25AM -0500: # 
> # I guess what you're looking for are sorted arrays:
> #   1) O(log n) lookup and insertion via binary search
> #   2) O(n) union and intersection are simple
> #   3) Equal sets are represented by structurally equivalent objects.
> # 
> # -Fred
> 
> Hmm, except that, if I'm not wrong, it was required the 
> structure to hold any kind of object. Sorted arrays require 
> the elements to be sortable. Using the hash of the objects 
> may be an answer ?
> 
> sL
> 
> -------------------
> To unsubscribe, mail caml-list-request@inria.fr Archives: 
http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-12  0:20 Harrison, John R
  2003-11-12  2:04 ` Brian Hurt
  2003-11-12 16:16 ` Diego Olivier Fernandez Pons
  0 siblings, 2 replies; 17+ messages in thread
From: Harrison, John R @ 2003-11-12  0:20 UTC (permalink / raw)
  To: Diego Olivier Fernandez Pons; +Cc: caml-list, Harrison, John R

That seems to be the best suggestion so far. I guess it would work well
in practice. But theoretically it still doesn't give O(log n) lookup
and insertion without the kinds of assumptions you noted about the
distribution of elements w.r.t. the hash function. And relying on
polymorphic hashing seems a bit of a hack.

So I still can't help wondering if there's an elegant solution with the
desired worst-case behaviour, preferably relying only on pairwise
comparison. Is it just a coincidence that the numerous varieties of
balanced tree (AVL, 2-3-4, red-black, ...) all seem to be non-canonical?
Or is it essential to their efficiency? (Perhaps this is a question for
another forum.)

John.

-----Original Message-----
From: owner-caml-list@pauillac.inria.fr
[mailto:owner-caml-list@pauillac.inria.fr]On Behalf Of Diego Olivier
Fernandez Pons
Sent: Monday, November 10, 2003 5:25 AM
To: Harrison, John R
Cc: caml-list@inria.fr
Subject: RE: [Caml-list] Efficient and canonical set representation?


    Bonjour,

> After your remarks and Brian's, I'm starting to wonder if it is
> possible at all to do what I want. Maybe I should be looking for an
> impossibility proof instead...

Patricia sets seem to be what you are looking for.
 (1). Efficient usual operations (lookup, insertion, union)
 (2). Structural equality

Their only problem is that they cannot handle polymorphic orderable
types but only integers...

Hash the data, use this key to insert it in a patricia map and solve
the collisions by chaining in an ordered list (with the polymorphic
[compare] function). (1) and (2) still hold under usual hypothesis on
the rate of collisions.

A few changes to JCF's implementation should be enough.

        Diego Olivier



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-12  3:34 Harrison, John R
  2003-11-12  7:50 ` Brian Hurt
  0 siblings, 1 reply; 17+ messages in thread
From: Harrison, John R @ 2003-11-12  3:34 UTC (permalink / raw)
  To: Brian Hurt; +Cc: Ocaml Mailing List, Harrison, John R

| I've been batting around ideas for ways to do balanced trees so that no 
| matter what order you add things, you always get the same tree.  But even 
| assuming you could do this, doing a structural compare is still O(N).  So 
| you might as well let the trees be different.

Right, but see my second message --- I'm only interested in canonicity
up to structural equality and I'm happy with O(N) comparison. So it's
just the "no matter what order you add things you get the same tree"
property that I care about. But it's not yet obvious to me whether I
can even achieve that much.

John.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread
* RE: [Caml-list] Efficient and canonical set representation?
@ 2003-11-12 17:18 Harrison, John R
  0 siblings, 0 replies; 17+ messages in thread
From: Harrison, John R @ 2003-11-12 17:18 UTC (permalink / raw)
  To: Diego Olivier Fernandez Pons; +Cc: caml-list, Harrison, John R

| It is not a coincidence. I remember having read somewhere that it
| could be proven that if a balanced tree (based on comparisons) did not
| have enough different representations of a same set, then the
| insertion could not be done in logarithmic time.

Interesting. I was beginning to suspect that might be the case. Does
anyone have an exact reference for such a result?

John.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2003-11-12 17:18 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-11-06 16:41 [Caml-list] Efficient and canonical set representation? Harrison, John R
2003-11-06 17:04 ` Brian Hurt
2003-11-07  3:43 ` Eray Ozkural
2003-11-07  3:52 ` Eray Ozkural
2003-11-07 14:15 Harrison, John R
2003-11-07 15:27 Fred Smith
2003-11-07 15:44 ` Samuel Lacas
2003-11-08 16:50   ` Eray Ozkural
2003-11-07 17:27 Fred Smith
2003-11-10 13:24 ` Diego Olivier Fernandez Pons
2003-11-10 19:28   ` Julien Signoles
2003-11-12  0:20 Harrison, John R
2003-11-12  2:04 ` Brian Hurt
2003-11-12 16:16 ` Diego Olivier Fernandez Pons
2003-11-12  3:34 Harrison, John R
2003-11-12  7:50 ` Brian Hurt
2003-11-12 17:18 Harrison, John R

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).