caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Re: sets
@ 1993-05-06 19:08 Xavier Leroy
  1993-05-07  7:31 ` sets Pierre Weis
  0 siblings, 1 reply; 3+ messages in thread
From: Xavier Leroy @ 1993-05-06 19:08 UTC (permalink / raw)
  To: caml-list

In reply to Pierre Weis' message:

> As Xavier pointed out, a naive implementation based on lists is very
> easy to implement. Its drawback is evidently efficiency (O(n^2) for
> basic operations union and intersection). But as far as I know, it is
> the only one which is absolutely compatible with ML polymorphism: you
> get a truly polymorphic empty set and a truly polymorphic function to
> add an element to a set.

I disagree with this point. The list-based implementation of sets
assumes that you're either satisfied with the existing polymorphic
equality primitive, or able to define a reasonable polymorphic
equality function incorporating user-defined equality functions at
some types. In this case, it is easy to extend the polymorphic
equality function to a polymorphic comparison function (just do some
kind of lexicographic ordering over records and sums).
Then, we can have balanced tree implementations of sets, with
operations in O(log n) or O(n log n) instead of O(n) and O(n^2). This
implementation would be as "truly polymorphic" as the one based on
lists.

The problem with both lists and trees is: we don't know how to build a
good polymorphic equality/ordering function.

> If we agree that a library implementing sets must be very efficient or
> useless, then the problem becomes very hard. The reason is that there
> exists many clever ways to represent sets, each one adapted to a particular
> usage of sets and to different underlying type.

You set out to do something very ambitious here: define a polymorphic
data structure with non-uniform representations. Sets are not the only
data structure that could benefit from this technique: for instance,
arrays of characters and arrays of floating-point numbers could be
represented more compactly than generic arrays. But I still need to be
convinced we need to go that far. Generic arrays are a little bit
wasteful, but this seems tolerable in practice. Similarly, I think
generic sets implemented as balanced trees would also be tolerable in
many situations. Granted, the union of two binary trees is never going
to be as fast as an "or" operations between two integers. Do we really
need this efficiency? In SETL, the answer is certainly "yes", because
sets are the main data structure. But this is not so in ML.

- Xavier





^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: sets
  1993-05-06 19:08 sets Xavier Leroy
@ 1993-05-07  7:31 ` Pierre Weis
  0 siblings, 0 replies; 3+ messages in thread
From: Pierre Weis @ 1993-05-07  7:31 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list

Xavier is right about set implementation using lists: the equality
problem is still there, as it is for trees. I skip on equality because
it is a ``generic'' problem for many other data structure in ML,
including lists (mem and memq) and association lists (assoc versus
assq). This problem is a real one and is not yet solved in ML.

I would like to precise that my preceding message is just some kind of
warning about the difficulty to implement set primitives, due to the
ML type system which forbids generic functions whose behaviour depends
on the type of their arguments.

> Similarly, I think
> generic sets implemented as balanced trees would also be tolerable in
> many situations.
Perfectly right. I think it's worth trying to get this
implementation, even if it is not as smart or as general as we could
imagine. I think your package is very useful, since it's a good
compromise between efficiency and generality, and I would be glad to
try and use it. So, Xavier, don't stop, go on coding for the Caml community!

I was just pointing out that, the same old story arose again with sets
as with lists or association, or I/Os, or printing, or equality, or
whatever needs to be a bit more clever than a good old (parametric)
polymorphic function.

Pierre Weis
----------------------------------------------------------------------------
Formel Project
INRIA, BP 105, F-78153 Le Chesnay Cedex (France)
E-mail: Pierre.Weis@inria.fr
Telephone: +33 1 39 63 55 98
----------------------------------------------------------------------------




^ permalink raw reply	[flat|nested] 3+ messages in thread

* sets
@ 1993-05-06  8:22 
  0 siblings, 0 replies; 3+ messages in thread
From:  @ 1993-05-06  8:22 UTC (permalink / raw)
  To: caml-redistribution

This is a very long message about sets. Read the abstract and skip
it, if it does not interest you.

Abstract:
  Sets are a very useful data structure, but are very difficult to
  implement. The basic problem is that ML polymorphism is not adequate
  to a proper representation of sets. It's not a problem about modules,
  but about polymorphism. We need a more powerful type system to get
  really efficient sets.

As Xavier pointed out, a naive implementation based on lists is very
easy to implement. Its drawback is evidently efficiency (O(n^2) for
basic operations union and intersection). But as far as I know, it is
the only one which is absolutely compatible with ML polymorphism: you
get a truly polymorphic empty set and a truly polymorphic function to
add an element to a set.

If we agree that a library implementing sets must be very efficient or
useless, then the problem becomes very hard. The reason is that there
exists many clever ways to represent sets, each one adapted to a particular
usage of sets and to different underlying type. (I guess we want to
implement finite and homogeneous sets over any ML type.) Consider for
instance sets based on an enumerated type: if this type has less than,
say 32 or 31 constant constructors, you have to represent sets over
this type as integers so that union and intersection cost only one
machine instruction. If the enumerated type has more than 32 constant
constructors, you must represent sets on this type as strings
(considered as bit vectors) in order to get a still very fast
implementation of union and intersection. If the underlying type gets
non constant constructors, things become even worse: it can be the
case that you have to stick to the representation of sets as string,
if there are not many elements of the underlying type used in the sets
manipulated by the program, otherwise you must adopt a tree-like
representation (balanced tree or even lists in some cases). You may
choose even more complex representation, such that a mixt of bit
vector and balanced trees. Well, my claim is that efficient set
implementation requires various representations of sets coexisting at
the same time in the program, either a single implementation for each
basic universe (underlying type), or even better a dynamically
modified scheme of representation for the sets of a given universe.

This is very difficult to handle in ML, since data representation
should be uniform to be compatible with polymorphism (and we want
polymorphic sets, don't we?). My own opinion is that this is not a
question of modules, but a problem of typechecking system of the core
language. This is analoguous to I/Os and printing. There exists some
situations where you need to write functions which are able to run
differently according to the type of their arguments or results
(polymorphic write and read as in Pascal). The problem is clear when
you imagine to offer a polymorphic map_set function in the set
library: map_set is the higher-order function on sets, equivalent to
map on lists. Thus map_set has type:
 For all 'a, 'b. ('a -> 'b) -> 'a set -> 'b set.
The ``object oriented'' approach of Damien or Didier is the trivial 
solution that comes in mind in the first 5 minutes when you imagine to
implement sets. Since it took 5 minutes of Damien's or Didier's time
to be found, it's a very clever idea (I'm not flaming).
Thus, this method works very fine when you design functions which
get as arguments every sets involved in the problem: they dynamically asked to
these arguments ``Hey guy, what's your real implementation?'', get the
answer and work accordingly. This method (not in the OO sense!)
completely fails for map_set: you have to ask to the function what can
be the representation of a set built with elements of its co-domain
type! We don't want to add this ``method'' to each closure, for the
sole reason that we want to implement sets.

Well you can say: I don't mind map_set. You can, but that's not a good
way to deal with the problem: the problem is a real one, it will not desappear
like that. What about singleton : 'a -> 'a set ? or empty : 'a set ?
There are other examples of this scheme in the field of primitives
over ``maps'' (in the set litterature sense).

We need a type-system to treat this cases, and in particular
polymorphic printing (for user's convenience and debugging) and input
(I want a safe function read : unit -> 'a).

Catherine Dubois and I have are working since now 1.5 year about set
implementation in Caml. She knows very well set implementation, and I
know the Caml system. Thus, we got very efficient schemes to represent
sets, and all was fine, until we faced the map_set problem. We then
turned to the design of a new type-system for Caml integrating parametric
polymorphism and a new one which allows the IMPLEMENTATION of map_set.
Thanks to this new type-system we can write the map_set primitive.
Without it, even the most dirty tricks (including OO features), those
which cannot be told in manuals, failed to implement the map_set
function (providing you keep it its natural and required type).
For the time being, we are writing a paper (article or research report) on
this type-system. Wait and see ...

Pierre Weis
----------------------------------------------------------------------------
Formel Project
INRIA, BP 105, F-78153 Le Chesnay Cedex (France)
E-mail: Pierre.Weis@inria.fr
Telephone: +33 1 39 63 55 98
----------------------------------------------------------------------------



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~1993-05-07  7:32 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1993-05-06 19:08 sets Xavier Leroy
1993-05-07  7:31 ` sets Pierre Weis
  -- strict thread matches above, loose matches on Subject: below --
1993-05-06  8:22 sets 

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).