sets

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

To: caml-redistribution@margaux
Subject: sets
Date: Thu, 6 May 1993 10:22:36 +0200 (MET DST)	[thread overview]
Message-ID: <9370.732388392@margaux.inria.fr> (raw)

This is a very long message about sets. Read the abstract and skip
it, if it does not interest you.

Abstract:
  Sets are a very useful data structure, but are very difficult to
  implement. The basic problem is that ML polymorphism is not adequate
  to a proper representation of sets. It's not a problem about modules,
  but about polymorphism. We need a more powerful type system to get
  really efficient sets.

As Xavier pointed out, a naive implementation based on lists is very
easy to implement. Its drawback is evidently efficiency (O(n^2) for
basic operations union and intersection). But as far as I know, it is
the only one which is absolutely compatible with ML polymorphism: you
get a truly polymorphic empty set and a truly polymorphic function to
add an element to a set.

If we agree that a library implementing sets must be very efficient or
useless, then the problem becomes very hard. The reason is that there
exists many clever ways to represent sets, each one adapted to a particular
usage of sets and to different underlying type. (I guess we want to
implement finite and homogeneous sets over any ML type.) Consider for
instance sets based on an enumerated type: if this type has less than,
say 32 or 31 constant constructors, you have to represent sets over
this type as integers so that union and intersection cost only one
machine instruction. If the enumerated type has more than 32 constant
constructors, you must represent sets on this type as strings
(considered as bit vectors) in order to get a still very fast
implementation of union and intersection. If the underlying type gets
non constant constructors, things become even worse: it can be the
case that you have to stick to the representation of sets as string,
if there are not many elements of the underlying type used in the sets
manipulated by the program, otherwise you must adopt a tree-like
representation (balanced tree or even lists in some cases). You may
choose even more complex representation, such that a mixt of bit
vector and balanced trees. Well, my claim is that efficient set
implementation requires various representations of sets coexisting at
the same time in the program, either a single implementation for each
basic universe (underlying type), or even better a dynamically
modified scheme of representation for the sets of a given universe.

This is very difficult to handle in ML, since data representation
should be uniform to be compatible with polymorphism (and we want
polymorphic sets, don't we?). My own opinion is that this is not a
question of modules, but a problem of typechecking system of the core
language. This is analoguous to I/Os and printing. There exists some
situations where you need to write functions which are able to run
differently according to the type of their arguments or results
(polymorphic write and read as in Pascal). The problem is clear when
you imagine to offer a polymorphic map_set function in the set
library: map_set is the higher-order function on sets, equivalent to
map on lists. Thus map_set has type:
 For all 'a, 'b. ('a -> 'b) -> 'a set -> 'b set.
The ``object oriented'' approach of Damien or Didier is the trivial 
solution that comes in mind in the first 5 minutes when you imagine to
implement sets. Since it took 5 minutes of Damien's or Didier's time
to be found, it's a very clever idea (I'm not flaming).
Thus, this method works very fine when you design functions which
get as arguments every sets involved in the problem: they dynamically asked to
these arguments ``Hey guy, what's your real implementation?'', get the
answer and work accordingly. This method (not in the OO sense!)
completely fails for map_set: you have to ask to the function what can
be the representation of a set built with elements of its co-domain
type! We don't want to add this ``method'' to each closure, for the
sole reason that we want to implement sets.

Well you can say: I don't mind map_set. You can, but that's not a good
way to deal with the problem: the problem is a real one, it will not desappear
like that. What about singleton : 'a -> 'a set ? or empty : 'a set ?
There are other examples of this scheme in the field of primitives
over ``maps'' (in the set litterature sense).

We need a type-system to treat this cases, and in particular
polymorphic printing (for user's convenience and debugging) and input
(I want a safe function read : unit -> 'a).

Catherine Dubois and I have are working since now 1.5 year about set
implementation in Caml. She knows very well set implementation, and I
know the Caml system. Thus, we got very efficient schemes to represent
sets, and all was fine, until we faced the map_set problem. We then
turned to the design of a new type-system for Caml integrating parametric
polymorphism and a new one which allows the IMPLEMENTATION of map_set.
Thanks to this new type-system we can write the map_set primitive.
Without it, even the most dirty tricks (including OO features), those
which cannot be told in manuals, failed to implement the map_set
function (providing you keep it its natural and required type).
For the time being, we are writing a paper (article or research report) on
this type-system. Wait and see ...

Pierre Weis
----------------------------------------------------------------------------
Formel Project
INRIA, BP 105, F-78153 Le Chesnay Cedex (France)
E-mail: Pierre.Weis@inria.fr
Telephone: +33 1 39 63 55 98
----------------------------------------------------------------------------

next             reply	other threads:[~1993-05-06  8:22 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1993-05-06  8:22  [this message]
1993-05-06 19:08 sets Xavier Leroy
1993-05-07  7:31 ` sets Pierre Weis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9370.732388392@margaux.inria.fr \
    --to=caml-redistribution@margaux \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).