caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* implementation of set (standard library)
@ 1999-01-26  0:15 Markus Mottl
  1999-02-02 18:07 ` Alexey Nogin
  0 siblings, 1 reply; 3+ messages in thread
From: Markus Mottl @ 1999-01-26  0:15 UTC (permalink / raw)
  To: OCAML

Hello again,

I have taken a look at the implementation of sets in the standard library
and thought that the "add" function could be implemented differently
(possibly faster). Here the alternative code:

  exception Found

  let rec add2 x s =
    let rec add2' = function
      Empty -> Node(Empty, x, Empty, 1)
    | Node(l, v, r, _) as t ->
        let c = Ord.compare x v in
        if c = 0 then raise Found else
        if c < 0 then bal (add2' l) v r else bal l v (add2' r) in
    try add2' s with Found -> s

I have measured the performance of this function:

If about 5% (or more, probably less) of all "adds" meet an already
existing element in the set, this function is faster.

One should also note that the memory requirements are lower: there is
no copying of paths in the case of already existing elements.

This trick could also be applied to "add" in "Map".
Maybe there are other functions where it can be applied.

Two functions I would really like to see in "Set":

  exception Found

  let add_elt x s =
    let el = ref x in
    let rec add_elt' = function
        Empty -> Node(Empty, x, Empty, 1)
      | Node(l, v, r, _) as t ->
          let c = Ord.compare x v in
          if c = 0 then begin el := v; raise Found end else
          if c < 0 then bal (add_elt' l) v r
          else bal l v (add_elt' r)
    in try (add_elt' s, x) with Found -> (s, !el)

  let rec find x = function
      Empty -> raise Not_found
    | Node(l, v, r, _) ->
        let c = Ord.compare x v in
        if c = 0 then v else
        if c < 0 then find x l
        else find x r

The "alternative" add function returns a tuple of the set and the element.
Note: the element could be a (mutable) object. It could be possible that
one wants to insert an object into a set and send further messages to it.
But if the object already exists in the set (or better: the comparison
function tells me so), it would be fine to send the messages to the object
already in the set and to discard the object that would have been added.

At least the "find"-function in "set" would be fine. The reason is the
same as above: one could send messages to an object found in the set.

The only workaround I have at the moment is to use a map instead, where
key and value are the same object -> this allows me to use find. Probably
not very elegant...

A "last wish": an iteration function that respects the order of elements
in set and also map. At the moment there is just the function "elements"
(only in set), which allows to get the elements "in order".  An ordered
insertion function would be very powerful, because one could emulate other
functions like e.g. "fold" with it (with mutable accumulators), too - but
the elements would be used in the computation in the appropriate order...

Best regards,
Markus

-- 
Markus Mottl, mottl@miss.wu-wien.ac.at, http://miss.wu-wien.ac.at/~mottl




^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: implementation of set (standard library)
  1999-01-26  0:15 implementation of set (standard library) Markus Mottl
@ 1999-02-02 18:07 ` Alexey Nogin
  1999-02-04  4:06   ` Alexey Nogin
  0 siblings, 1 reply; 3+ messages in thread
From: Alexey Nogin @ 1999-02-02 18:07 UTC (permalink / raw)
  To: Markus Mottl; +Cc: OCAML

Hi,

Markus Mottl wrote:

> I have taken a look at the implementation of sets in the standard library
> and thought that the "add" function could be implemented differently
> (possibly faster).

Our group wrote several implementation of sets - based on splay trees,
red-black trees, etc. Implementation based on the splay trees turned out to
be much faster than the OCAML Set module when the "mem" operation is more
frequent than add/union/remove/inter/diff operations. But on our usage
pattern it turned out that red-black trees are much faster than both Set and
SplaySet. If somebody is interested, I can put the code on ftp.

Alexey
--------------------------------------------------------------
Home Page: http://www.cs.cornell.edu/nogin/
E-Mail: nogin@cs.cornell.edu (office), ayn2@cornell.edu (home)
Office: Upson 4139, tel: 1-607-255-4934
ICQ #: 24708107 (office), 24678341 (home)





^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: implementation of set (standard library)
  1999-02-02 18:07 ` Alexey Nogin
@ 1999-02-04  4:06   ` Alexey Nogin
  0 siblings, 0 replies; 3+ messages in thread
From: Alexey Nogin @ 1999-02-04  4:06 UTC (permalink / raw)
  To: Markus Mottl, OCAML

Alexey Nogin wrote:

> Hi,
>
> Markus Mottl wrote:
>
> > I have taken a look at the implementation of sets in the standard library
> > and thought that the "add" function could be implemented differently
> > (possibly faster).
>
> Our group wrote several implementation of sets - based on splay trees,
> red-black trees, etc. Implementation based on the splay trees turned out to
> be much faster than the OCAML Set module when the "mem" operation is more
> frequent than add/union/remove/inter/diff operations. But on our usage
> pattern it turned out that red-black trees are much faster than both Set and
> SplaySet. If somebody is interested, I can put the code on ftp.

You may get the code from ftp://ftp.cs.cornell.edu/pub/nogin/sets/ (it's not
there yet, but it should appear within an hour).

Alexey
--------------------------------------------------------------
Home Page: http://www.cs.cornell.edu/nogin/
E-Mail: nogin@cs.cornell.edu (office), ayn2@cornell.edu (home)
Office: Upson 4139, tel: 1-607-255-4934
ICQ #: 24708107 (office), 24678341 (home)





^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~1999-02-04  7:54 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
1999-01-26  0:15 implementation of set (standard library) Markus Mottl
1999-02-02 18:07 ` Alexey Nogin
1999-02-04  4:06   ` Alexey Nogin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).