From mboxrd@z Thu Jan 1 00:00:00 1970 Received: by margaux.inria.fr, Fri, 7 May 93 09:12:01 +0200 Received: from concorde.inria.fr by margaux.inria.fr, Thu, 6 May 93 21:08:10 +0200 Received: from Tamuz.Stanford.EDU by concorde.inria.fr; Thu, 6 May 1993 21:08:33 +0200 Received: by Tamuz.Stanford.EDU (5.61/25-theory-eef) id AA04126; Thu, 6 May 93 12:08:26 -0700 From: Xavier Leroy Message-Id: <9305061908.AA04126@Tamuz.Stanford.EDU> Subject: Re: sets To: caml-list@margaux Date: Thu, 6 May 1993 12:08:25 -0700 (PDT) Content-Type: text/plain Sender: weis@margaux In reply to Pierre Weis' message: > As Xavier pointed out, a naive implementation based on lists is very > easy to implement. Its drawback is evidently efficiency (O(n^2) for > basic operations union and intersection). But as far as I know, it is > the only one which is absolutely compatible with ML polymorphism: you > get a truly polymorphic empty set and a truly polymorphic function to > add an element to a set. I disagree with this point. The list-based implementation of sets assumes that you're either satisfied with the existing polymorphic equality primitive, or able to define a reasonable polymorphic equality function incorporating user-defined equality functions at some types. In this case, it is easy to extend the polymorphic equality function to a polymorphic comparison function (just do some kind of lexicographic ordering over records and sums). Then, we can have balanced tree implementations of sets, with operations in O(log n) or O(n log n) instead of O(n) and O(n^2). This implementation would be as "truly polymorphic" as the one based on lists. The problem with both lists and trees is: we don't know how to build a good polymorphic equality/ordering function. > If we agree that a library implementing sets must be very efficient or > useless, then the problem becomes very hard. The reason is that there > exists many clever ways to represent sets, each one adapted to a particular > usage of sets and to different underlying type. You set out to do something very ambitious here: define a polymorphic data structure with non-uniform representations. Sets are not the only data structure that could benefit from this technique: for instance, arrays of characters and arrays of floating-point numbers could be represented more compactly than generic arrays. But I still need to be convinced we need to go that far. Generic arrays are a little bit wasteful, but this seems tolerable in practice. Similarly, I think generic sets implemented as balanced trees would also be tolerable in many situations. Granted, the union of two binary trees is never going to be as fast as an "or" operations between two integers. Do we really need this efficiency? In SETL, the answer is certainly "yes", because sets are the main data structure. But this is not so in ML. - Xavier