caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Set union
@ 2005-02-24  9:20 Jon Harrop
  2005-02-25 10:56 ` [Caml-list] " Radu Grigore
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Harrop @ 2005-02-24  9:20 UTC (permalink / raw)
  To: caml-list


Following my last post about the speed of set union in OCaml compared to 
C++/STL. What is the complexity of set union in OCaml in terms of the number 
of elements and the number of comparisons?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://ffconsultancy.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Set union
  2005-02-24  9:20 Set union Jon Harrop
@ 2005-02-25 10:56 ` Radu Grigore
  2005-02-25 17:30   ` Jon Harrop
  0 siblings, 1 reply; 9+ messages in thread
From: Radu Grigore @ 2005-02-25 10:56 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Thu, 24 Feb 2005 09:20:04 +0000, Jon Harrop <jon@jdh30.plus.com> wrote:
> 
> Following my last post about the speed of set union in OCaml compared to
> C++/STL. What is the complexity of set union in OCaml in terms of the number
> of elements and the number of comparisons?

As no one gave an informed answer I'll risk a hand-waiving one.

When all elements of a set are bigger than all elements of the other
set the Set.union function seems to simply add the elements of the
small set to the bigger set one at a time. So the complexity looks
like O(m lg n) where m is the size of the smaller set. For other cases
the process is a bit more complex: take the root of the short tree,
split the large tree into smaller/larger elements based on that root,
compute union of "small" trees, "compute union of "large" trees",
merge them. If I'm not mistaken this is O(m lg n) too.

-- 
regards,
 radu
http://rgrig.blogspot.com/


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Set union
  2005-02-25 10:56 ` [Caml-list] " Radu Grigore
@ 2005-02-25 17:30   ` Jon Harrop
  2005-02-25 17:48     ` Xavier Leroy
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Harrop @ 2005-02-25 17:30 UTC (permalink / raw)
  To: caml-list

On Friday 25 February 2005 10:56, Radu Grigore wrote:
> When all elements of a set are bigger than all elements of the other
> set the Set.union function seems to simply add the elements of the
> small set to the bigger set one at a time. So the complexity looks
> like O(m lg n) where m is the size of the smaller set. For other cases
> the process is a bit more complex: take the root of the short tree,
> split the large tree into smaller/larger elements based on that root,
> compute union of "small" trees, "compute union of "large" trees",
> merge them. If I'm not mistaken this is O(m lg n) too.

Yes, my guess was O(n ln(n+N)) because the last insertions from the smaller 
set may be into a set with n+N elements. In contrast, the STL set_union 
function is T(n+N), which explains why it is so slow.

Now, what about best case behaviour: In the case of the union of two equal 
height, distinct sets, is OCaml's union T(1)?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://ffconsultancy.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Set union
  2005-02-25 17:30   ` Jon Harrop
@ 2005-02-25 17:48     ` Xavier Leroy
  2005-02-25 19:47       ` [Caml-list] Complexity of Set.union Jon Harrop
  0 siblings, 1 reply; 9+ messages in thread
From: Xavier Leroy @ 2005-02-25 17:48 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

[ Complexity of Set.union ]

> > For other cases
> > the process is a bit more complex: take the root of the short tree,
> > split the large tree into smaller/larger elements based on that root,
> > compute union of "small" trees, "compute union of "large" trees",
> > merge them. If I'm not mistaken this is O(m lg n) too.

My hope is that union takes time O(N log N) where N is the sum of the
numbers of elements in the two sets.  I'm thoroughly unable to prove
that, though, in particular because the complexity of the "split"
operation is unclear to me.

This bound is "reasonable", however, in that the trivial union
algorithm (repeatedly add every element of one of the sets to the
other one) achieves this bound, and the trick with "joining" is,
intuitively, just an optimization of this trivial algorithm.

> Now, what about best case behaviour: In the case of the union of two equal 
> height, distinct sets, is OCaml's union T(1)?

Did you mean "of two equal height sets such that all elements of the
first set are smaller than all elements of the second set"?  That
could indeed run in constant time (just join the two sets with a
"Node" constructor), but I doubt the current implementation achieves
this because of the repeated splitting.

- Xavier Leroy


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Complexity of Set.union
  2005-02-25 17:48     ` Xavier Leroy
@ 2005-02-25 19:47       ` Jon Harrop
  2005-02-25 21:50         ` Radu Grigore
  0 siblings, 1 reply; 9+ messages in thread
From: Jon Harrop @ 2005-02-25 19:47 UTC (permalink / raw)
  To: caml-list

On Friday 25 February 2005 17:48, Xavier Leroy wrote:
> My hope is that union takes time O(N log N) where N is the sum of the
> numbers of elements in the two sets.  I'm thoroughly unable to prove
> that, though, in particular because the complexity of the "split"
> operation is unclear to me.

Am I correct in thinking that your derivation of this assumes roughly 
equal-sized sets and that your complexity could be tightened a bit by using 
the two different set cardinalities explicitly?

I ask this because the STL set_union is probably O(n+N) (inserting an already 
sorted range into a set is apparently linear) which is worse than the O((n+N) 
log(n+N)) which you've suggested for OCaml.

But my OCaml code is vastly faster, so OCaml's complexity seems to be 
significantly better than that. At least in the special case of one small and 
one large set, which my code is bound by. Specifically, the sets have O(1) 
and O(i^2) elements when looking for the "i"th nearest neighbour. In reality 
this corresponds to computing the unions of sets containing 4 elements with 
sets containing 10^4 elements.

Hmm, now that I come to think of it, my performance measurements have all been 
specific to silicon (that's where the 4 comes from). I'll try retiming on 
some other atomic structures, where the small sets will contain about 12 
elements. I predict the OCaml code will do better relative to the C++ code, 
because the smaller sets won't be so small...

> This bound is "reasonable", however, in that the trivial union
> algorithm (repeatedly add every element of one of the sets to the
> other one) achieves this bound, and the trick with "joining" is,
> intuitively, just an optimization of this trivial algorithm.

I see. This could be improved in the unsymmetric case, by adding elements from 
the smaller set to the larger set. But the size of the set isn't stored so 
you'd have to make do with adding elements from the shallower set to the 
deeper set. I've no idea what the complexity of that would be...

As I know which of the two sets will be the larger and which will be the 
smaller, I'll try a customized union function which folds Set.add over the 
smaller set.

> > Now, what about best case behaviour: In the case of the union of two
> > equal height, distinct sets, is OCaml's union T(1)?
>
> Did you mean "of two equal height sets such that all elements of the
> first set are smaller than all elements of the second set"?

Yes, that's what I meant. :-)

> That 
> could indeed run in constant time (just join the two sets with a
> "Node" constructor), but I doubt the current implementation achieves
> this because of the repeated splitting.

Having said that, wouldn't it take the Set.union function O(log n + log N) 
time to prove that the inputs are non-overlapping, because it would have to 
traverse to the min/max elements of both sets?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://ffconsultancy.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Complexity of Set.union
  2005-02-25 19:47       ` [Caml-list] Complexity of Set.union Jon Harrop
@ 2005-02-25 21:50         ` Radu Grigore
  2005-02-25 21:52           ` Radu Grigore
                             ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Radu Grigore @ 2005-02-25 21:50 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Fri, 25 Feb 2005 19:47:45 +0000, Jon Harrop <jon@jdh30.plus.com> wrote:

> I ask this because the STL set_union is probably O(n+N) (inserting an already
> sorted range into a set is apparently linear) which is worse than the O((n+N)
> log(n+N)) which you've suggested for OCaml.

The complexity of set_union is indeed O(n+N), see [0]. It is basically
a merge of sorted _sequences_ [1]. I assume n is the size of the small
set and N is the size of the small set and the heights are h=O(lg n),
H=O(lg N). With this the complexity of Set.union is more like O(n
lg(n+N)), at least when all elements in one set are smaller than the
elements of the other set.

> I see. This could be improved in the unsymmetric case, by adding elements from
> the smaller set to the larger set. But the size of the set isn't stored so
> you'd have to make do with adding elements from the shallower set to the
> deeper set. I've no idea what the complexity of that would be...

That is how it works now. As Xavier said the trickiest part is split.

> > Did you mean "of two equal height sets such that all elements of the
> > first set are smaller than all elements of the second set"?
> 
> Yes, that's what I meant. :-)

In that case the current Set.union simply adds elements repeatedly
from the set with small height to the set with big height.

> > That
> > could indeed run in constant time (just join the two sets with a
> > "Node" constructor), but I doubt the current implementation achieves
> > this because of the repeated splitting.

What splitting? I see none in this case.

> Having said that, wouldn't it take the Set.union function O(log n + log N)
> time to prove that the inputs are non-overlapping, because it would have to
> traverse to the min/max elements of both sets?

I agree. Also, such a check looks ugly to me (for a standard library).

-- 
regards,
 radu
http://rgrig.blogspot.com/

[0] http://library.n0i.net/programming/c/cp-iso/lib-algorithms.html#lib.set.union
[1] http://rgrig.blogspot.com/2004/11/merging-lists.html


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Complexity of Set.union
  2005-02-25 21:50         ` Radu Grigore
@ 2005-02-25 21:52           ` Radu Grigore
  2005-02-25 22:31           ` Radu Grigore
  2005-02-25 22:36           ` Jon Harrop
  2 siblings, 0 replies; 9+ messages in thread
From: Radu Grigore @ 2005-02-25 21:52 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Fri, 25 Feb 2005 23:50:25 +0200, Radu Grigore <radugrigore@gmail.com> wrote:
> > > That
> > > could indeed run in constant time (just join the two sets with a
> > > "Node" constructor), but I doubt the current implementation achieves
> > > this because of the repeated splitting.
> 
> What splitting? I see none in this case.

30s later and I see it :)

-- 
regards,
 radu
http://rgrig.blogspot.com/


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Complexity of Set.union
  2005-02-25 21:50         ` Radu Grigore
  2005-02-25 21:52           ` Radu Grigore
@ 2005-02-25 22:31           ` Radu Grigore
  2005-02-25 22:36           ` Jon Harrop
  2 siblings, 0 replies; 9+ messages in thread
From: Radu Grigore @ 2005-02-25 22:31 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Fri, 25 Feb 2005 23:50:25 +0200, Radu Grigore <radugrigore@gmail.com> wrote:
> The complexity of set_union is indeed O(n+N), see [0]. 

Ah, BTW, if you do something like
  set_union(..., inserter(s));
where s is a set then inserter has logarithmic complexity so overall
you get O((n+N) lg(n+N)).

-- 
regards,
 radu
http://rgrig.blogspot.com/


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Caml-list] Complexity of Set.union
  2005-02-25 21:50         ` Radu Grigore
  2005-02-25 21:52           ` Radu Grigore
  2005-02-25 22:31           ` Radu Grigore
@ 2005-02-25 22:36           ` Jon Harrop
  2 siblings, 0 replies; 9+ messages in thread
From: Jon Harrop @ 2005-02-25 22:36 UTC (permalink / raw)
  To: caml-list

On Friday 25 February 2005 21:50, Radu Grigore wrote:
> > > Did you mean "of two equal height sets such that all elements of the
> > > first set are smaller than all elements of the second set"?
> >
> > Yes, that's what I meant. :-)
>
> In that case the current Set.union simply adds elements repeatedly
> from the set with small height to the set with big height.

Yes, I agree with everything you said.

> > Having said that, wouldn't it take the Set.union function O(log n + log
> > N) time to prove that the inputs are non-overlapping, because it would
> > have to traverse to the min/max elements of both sets?
>
> I agree. Also, such a check looks ugly to me (for a standard library).

I think I agree here too. Assuming that such special cases cannot be handled 
without incurring a performance cost (which I'm not sure about), I'd 
definitely go for the simplest code.

Regarding Don's question about what an imperative programmer would have to do 
to write efficient code. My interpretation is now that they would have to 
work out which of the data structures they could destroy. Then they would 
have to convert the purely functional implementation into one which performs 
in-place operations.

This is fairly easy in the case of my "nth" program because the main function 
"accumulates" its result and, therefore, this accumulator can be overwritten. 
I can imagine it quickly becoming a headache though. For example, if there 
were any non-trivial branches in the middle of the function.

Whilst writing the C++ implementation I remembered how I first became 
interested in functional programming - I noticed that I could only write 
correct C++ programs when I littered everything with "const". :-)

I'll post my detailed performance analysis when its done...

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://ffconsultancy.com


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2005-02-25 22:35 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-02-24  9:20 Set union Jon Harrop
2005-02-25 10:56 ` [Caml-list] " Radu Grigore
2005-02-25 17:30   ` Jon Harrop
2005-02-25 17:48     ` Xavier Leroy
2005-02-25 19:47       ` [Caml-list] Complexity of Set.union Jon Harrop
2005-02-25 21:50         ` Radu Grigore
2005-02-25 21:52           ` Radu Grigore
2005-02-25 22:31           ` Radu Grigore
2005-02-25 22:36           ` Jon Harrop

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).