caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Announce: xsetxmap, unfunctorized, Sexp-lib aware versions of Set and Map
@ 2008-04-21  9:39 Berke Durak
  2008-04-21 10:59 ` [Caml-list] " Jean-Christophe Filliâtre
  0 siblings, 1 reply; 9+ messages in thread
From: Berke Durak @ 2008-04-21  9:39 UTC (permalink / raw)
  To: Caml-list List

Hello,

I decided to share my modification of the standard Set and Map modules.

Basically, I took Set and Map, unfunctorized them, passing the comparison
function as an optional argument with Pervasives.compare as the default,
added or exposed some missing functions (reverse_fold, reverse_iter, min_key...)
and added converters for Sexplib.

It is under LGPL and feel free to include it in Extlib, in Ocaml "batteries"
or anywhere else.

   http://abaababa.ouvaton.org/caml/xsetxmap-20080421.tar.gz
-- 
Berke DURAK


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib aware versions of Set and Map
  2008-04-21  9:39 Announce: xsetxmap, unfunctorized, Sexp-lib aware versions of Set and Map Berke Durak
@ 2008-04-21 10:59 ` Jean-Christophe Filliâtre
  2008-04-21 11:19   ` Berke Durak
  0 siblings, 1 reply; 9+ messages in thread
From: Jean-Christophe Filliâtre @ 2008-04-21 10:59 UTC (permalink / raw)
  To: Berke Durak; +Cc: Caml-list List

Berke Durak wrote:
> I decided to share my modification of the standard Set and Map modules.
> 
> Basically, I took Set and Map, unfunctorized them, passing the comparison
> function as an optional argument with Pervasives.compare as the default,

Since your library is allowing the user to pass the comparison function
to any function which requires it

  type 'elt cp = 'elt -> 'elt -> int
  val add: ?cp:'elt cp -> 'elt -> 'elt t -> 'elt t
  val mem: ?cp:'elt cp -> 'elt -> 'elt t -> bool
  ...

you should really add a *huge* warning in the documentation saying that
it is unsound to insert an element with a comparison function and then
to search for it using another one.

This is precisely where the functor approach is increasing your program
correctness, with a static binding of a single comparison function into
the code of all operations.

Even without functors, you can provide a safer approach where the
comparison function is given when creating the data structure (a la Caml
light), and stored inside. This can be an optional argument of create
for instance, so that you keep your idea of a default comparison function:

	val create: ?cp:('elt -> 'elt -> int) -> unit -> 'elt t

(Note that I agree with you that it is sometimes annoying that Set and
Map do not provide polymorphic versions using Pervasives.compare, as
Hashtbl does with the polymorphic hash funtion. I'm only concerned with
the potential risk with your solution...)

-- 
Jean-Christophe


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib aware versions of Set and Map
  2008-04-21 10:59 ` [Caml-list] " Jean-Christophe Filliâtre
@ 2008-04-21 11:19   ` Berke Durak
  2008-04-23 12:35     ` Jon Harrop
  0 siblings, 1 reply; 9+ messages in thread
From: Berke Durak @ 2008-04-21 11:19 UTC (permalink / raw)
  To: Jean-Christophe Filliâtre; +Cc: Caml List

[-- Attachment #1: Type: text/plain, Size: 828 bytes --]

Yes you are right, I should have put a warning.  These modules are for
peculiar cases.
So here it is:

WARNING:  The default comparison function is not sound for non-canonical
datastructures such
as Sets and Maps.  Xset is not canonical.  Do not use Xsets or Xmap without
providing the correct
comparison function.  You must always pass the same comparison function.

(That's why I was asking if there was a Set/Map structure with canonical
representation the other day.)

Extlib provides a version where the comparison function for the keys is
stored in the map.
But I chose to not do this for some peculiar reasons.
  - I'm serializing Xmap and Xsets using Marshal, and I don't want to have
functions in the structure
  - I don't like the overhead of a wrapper structure and I don't want to
complicate the API
-- 
Berke Durak

[-- Attachment #2: Type: text/html, Size: 971 bytes --]

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib aware versions of Set and Map
  2008-04-21 11:19   ` Berke Durak
@ 2008-04-23 12:35     ` Jon Harrop
  2008-04-23 13:25       ` Brian Hurt
  2008-04-23 13:41       ` [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib aware versions " Berke Durak
  0 siblings, 2 replies; 9+ messages in thread
From: Jon Harrop @ 2008-04-23 12:35 UTC (permalink / raw)
  To: caml-list

On Monday 21 April 2008 12:19:09 Berke Durak wrote:
> Yes you are right, I should have put a warning.  These modules are for
> peculiar cases.

Actually I would say that your style is more useful than the built-in Set and 
Map modules because you don't have to jump through hoops defining your 
own "Int" module with its own "int" type and its own comparison function over 
ints every time you want a set of integers. I would put the comparison 
function in the set itself though.

Your modules cover 99% of use cases with the lowest syntactic overhead 
possible in OCaml. However, you have inherited a couple of design flaws from 
OCaml's standard library:

. The "height" function has been inlined which has no significant effect on 
performance but makes it harder to refactor the code.

. If you special case Node(Empty, v, Empty) to a new Node1(v) type constructor 
then you can improve performance by 30% by alleviating the GC.

As Jean-Christophe says, it is dangerous. However, the whole language is 
already dangerous in this sense because it is so easy to erroneously apply 
polymorphic comparison or equality to the existing sets and maps and get 
silent data corruption. The only solution is to fix the language itself by 
allowing types to override comparison, equality and hashing.

Another inherent problem is the inefficiency of performing comparisons in this 
way in OCaml. Type specialization (e.g. using a JIT) greatly accelerates this 
kind of code by allowing the comparison function (which is almost always 
trivial) to be inlined and optimized.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib aware versions of Set and Map
  2008-04-23 12:35     ` Jon Harrop
@ 2008-04-23 13:25       ` Brian Hurt
  2008-04-23 13:56         ` [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib awareversions " David Allsopp
  2008-04-23 13:41       ` [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib aware versions " Berke Durak
  1 sibling, 1 reply; 9+ messages in thread
From: Brian Hurt @ 2008-04-23 13:25 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop wrote:

>Actually I would say that your style is more useful than the built-in Set and 
>Map modules because you don't have to jump through hoops defining your 
>own "Int" module with its own "int" type and its own comparison function over 
>ints every time you want a set of integers. I would put the comparison 
>function in the set itself though.
>
>  
>
IMHO, the Int module should be in the standard library, and the Set and 
Map modules should have already instantiated sets and maps for the 
standard base types (int, float, string, char).

Also, I'm not as down on functors as a lot of programmers seem to be.  
While not perfect, they solve a number of problems very well.  For 
example, there are a number of operations on sets and maps which are 
O(N) if and only if you know the two trees are in the same order, but 
O(N log N) if you don't know they're in the same order.  Functors lift 
this check into the type system.

Brian


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib aware versions of Set and Map
  2008-04-23 12:35     ` Jon Harrop
  2008-04-23 13:25       ` Brian Hurt
@ 2008-04-23 13:41       ` Berke Durak
  1 sibling, 0 replies; 9+ messages in thread
From: Berke Durak @ 2008-04-23 13:41 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop wrote:
> On Monday 21 April 2008 12:19:09 Berke Durak wrote:
>> Yes you are right, I should have put a warning.  These modules are for
>> peculiar cases.
> 
> Actually I would say that your style is more useful than the built-in Set and 
> Map modules because you don't have to jump through hoops defining your 
> own "Int" module with its own "int" type and its own comparison function over 
> ints every time you want a set of integers. I would put the comparison 
> function in the set itself though.

Well, thanks.

> Your modules cover 99% of use cases with the lowest syntactic overhead 
> possible in OCaml. However, you have inherited a couple of design flaws from 
> OCaml's standard library:
> 
> . The "height" function has been inlined which has no significant effect on 
> performance but makes it harder to refactor the code.
> 
> . If you special case Node(Empty, v, Empty) to a new Node1(v) type constructor 
> then you can improve performance by 30% by alleviating the GC.

That's a good idea.

> As Jean-Christophe says, it is dangerous.

Have you checked out the second version where I use phantom types to prevent
the more severe cases of data corruption?  Yes, you can still end up with
multiple copies of the same "set", but at least the tree won't get
inconsistent because you mistakenly called add with different comparison functions.
So it's as safe/unsafe as using List.assoc or the generic hash table to
store sets or hash tables.

That's still a problem, but it's not that much worse than with
functors.  After all, nothing prevents you from incorrectly writing

   module StringSet = Set.Make(String)
   module SetSet = Set.Make(struct type t = StringSet.t let compare = compare)

 > However, the whole language is
> already dangerous in this sense because it is so easy to erroneously apply 
> polymorphic comparison or equality to the existing sets and maps and get 
> silent data corruption. The only solution is to fix the language itself by 
> allowing types to override comparison, equality and hashing.

Overriding equality etc. would require run-time type-based dispatch
or implicitly carrying the comparison function (as Haskell apparently does
with type classes) as an extra argument.  Could be expensive.

> Another inherent problem is the inefficiency of performing comparisons in this 
> way in OCaml. Type specialization (e.g. using a JIT) greatly accelerates this 
> kind of code by allowing the comparison function (which is almost always 
> trivial) to be inlined and optimized.

Did anyone try the GNU lightning-based ocamljit in the Ocaml CVS?
-- 
Berke DURAK


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib awareversions of Set and Map
  2008-04-23 13:25       ` Brian Hurt
@ 2008-04-23 13:56         ` David Allsopp
  2008-04-23 14:22           ` Berke Durak
  2008-04-26 14:44           ` Alexandre Pilkiewicz
  0 siblings, 2 replies; 9+ messages in thread
From: David Allsopp @ 2008-04-23 13:56 UTC (permalink / raw)
  To: 'Brian Hurt', 'Jon Harrop'; +Cc: caml-list

Brian Hurt wrote:

> Jon Harrop wrote:
>
> > Actually I would say that your style is more useful than the built-in
> > Set and Map modules because you don't have to jump through hoops
> > defining your own "Int" module with its own "int" type and its own 
> > comparison function over ints every time you want a set of integers. I
> >  would put the comparison function in the set itself though.
> >
> >  
> >
> IMHO, the Int module should be in the standard library, and the Set and 
> Map modules should have already instantiated sets and maps for the 
> standard base types (int, float, string, char).

Agreed - then we could also have more sensibly located functions such as
Int.of_string (note that it's the same length as int_of_string!!) and remove
lots of random functions from Pervasives!

All that said, and especially as StdLib changes are reasonably rare, I find
having files IntSet.ml and IntSet.mli containing:

include Set.Make(struct type t = int let compare = Pervasives.compare end)

and

include Set.S with type elt = int

isn't too bad (except that you have to include IntSet.cmo/.cmx when
compiling, obviously)


David


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib awareversions of Set and Map
  2008-04-23 13:56         ` [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib awareversions " David Allsopp
@ 2008-04-23 14:22           ` Berke Durak
  2008-04-26 14:44           ` Alexandre Pilkiewicz
  1 sibling, 0 replies; 9+ messages in thread
From: Berke Durak @ 2008-04-23 14:22 UTC (permalink / raw)
  To: David Allsopp; +Cc: 'Brian Hurt', 'Jon Harrop', caml-list

David Allsopp wrote:
> Brian Hurt wrote:
> 
>> Jon Harrop wrote:
>>
> 
> Agreed - then we could also have more sensibly located functions such as
> Int.of_string (note that it's the same length as int_of_string!!) and remove
> lots of random functions from Pervasives!

Of course we'll just add an extra Int module and leave Pervasives alone, or at
least make it trigger a warning.

> All that said, and especially as StdLib changes are reasonably rare, I find
> having files IntSet.ml and IntSet.mli containing:
> 
> include Set.Make(struct type t = int let compare = Pervasives.compare end)
> 
> and
> 
> include Set.S with type elt = int
> 
> isn't too bad (except that you have to include IntSet.cmo/.cmx when
> compiling, obviously)
> 
> 

I agree, and someone might write an optimized version for ints or floats.

But Xset & Xmap also provide opportunities to add some missing functions.
In particular, I often use Sets or Maps as a good priority queue/heap substitute, and
reverse_fold / reverse_iter come in handy.

The only problem is that I have an allergy to CamlCaseIdentifiers but I'll
just swallow some antiHistaminics.

-- 
Berke DURAK


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib awareversions of Set and Map
  2008-04-23 13:56         ` [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib awareversions " David Allsopp
  2008-04-23 14:22           ` Berke Durak
@ 2008-04-26 14:44           ` Alexandre Pilkiewicz
  1 sibling, 0 replies; 9+ messages in thread
From: Alexandre Pilkiewicz @ 2008-04-26 14:44 UTC (permalink / raw)
  To: caml-list

Le Wednesday 23 April 2008 15:56:47 David Allsopp, vous avez écrit :
> All that said, and especially as StdLib changes are reasonably rare, I find
> having files IntSet.ml and IntSet.mli containing:
>
> include Set.Make(struct type t = int let compare = Pervasives.compare end)
>
> and
>
> include Set.S with type elt = int

One should probably use an other function than Pervasives.compare in that 
case. Some "benchmarks" show that 100 000 000 uses of comp with different 
definitions lead to some important differences for performance

let comp = compare   
-> 1,54s


let comp (i:int) (j:int) = compare i j
-> 0.7s


let comp (i:int) (j:int) = 
  if i = j then 0 else if i < j then -1 else 1
-> 0.18s

-- 
Alexandre Pilkiewicz


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2008-04-26 14:44 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-04-21  9:39 Announce: xsetxmap, unfunctorized, Sexp-lib aware versions of Set and Map Berke Durak
2008-04-21 10:59 ` [Caml-list] " Jean-Christophe Filliâtre
2008-04-21 11:19   ` Berke Durak
2008-04-23 12:35     ` Jon Harrop
2008-04-23 13:25       ` Brian Hurt
2008-04-23 13:56         ` [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib awareversions " David Allsopp
2008-04-23 14:22           ` Berke Durak
2008-04-26 14:44           ` Alexandre Pilkiewicz
2008-04-23 13:41       ` [Caml-list] Announce: xsetxmap, unfunctorized, Sexp-lib aware versions " Berke Durak

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).