Re: sets - Xavier Leroy

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Xavier Leroy <xavier@Theory.Stanford.EDU>
To: caml-list@margaux
Subject: Re: sets
Date: Thu, 6 May 1993 12:08:25 -0700 (PDT)	[thread overview]
Message-ID: <9305061908.AA04126@Tamuz.Stanford.EDU> (raw)

In reply to Pierre Weis' message:

> As Xavier pointed out, a naive implementation based on lists is very
> easy to implement. Its drawback is evidently efficiency (O(n^2) for
> basic operations union and intersection). But as far as I know, it is
> the only one which is absolutely compatible with ML polymorphism: you
> get a truly polymorphic empty set and a truly polymorphic function to
> add an element to a set.

I disagree with this point. The list-based implementation of sets
assumes that you're either satisfied with the existing polymorphic
equality primitive, or able to define a reasonable polymorphic
equality function incorporating user-defined equality functions at
some types. In this case, it is easy to extend the polymorphic
equality function to a polymorphic comparison function (just do some
kind of lexicographic ordering over records and sums).
Then, we can have balanced tree implementations of sets, with
operations in O(log n) or O(n log n) instead of O(n) and O(n^2). This
implementation would be as "truly polymorphic" as the one based on
lists.

The problem with both lists and trees is: we don't know how to build a
good polymorphic equality/ordering function.

> If we agree that a library implementing sets must be very efficient or
> useless, then the problem becomes very hard. The reason is that there
> exists many clever ways to represent sets, each one adapted to a particular
> usage of sets and to different underlying type.

You set out to do something very ambitious here: define a polymorphic
data structure with non-uniform representations. Sets are not the only
data structure that could benefit from this technique: for instance,
arrays of characters and arrays of floating-point numbers could be
represented more compactly than generic arrays. But I still need to be
convinced we need to go that far. Generic arrays are a little bit
wasteful, but this seems tolerable in practice. Similarly, I think
generic sets implemented as balanced trees would also be tolerable in
many situations. Granted, the union of two binary trees is never going
to be as fast as an "or" operations between two integers. Do we really
need this efficiency? In SETL, the answer is certainly "yes", because
sets are the main data structure. But this is not so in ML.

- Xavier

next             reply	other threads:[~1993-05-07  7:12 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
1993-05-06 19:08 Xavier Leroy [this message]
1993-05-07  7:31 ` sets Pierre Weis
  -- strict thread matches above, loose matches on Subject: below --
1993-05-06  8:22 sets

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9305061908.AA04126@Tamuz.Stanford.EDU \
    --to=xavier@theory.stanford.edu \
    --cc=caml-list@margaux \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).