caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Jon Harrop <jon@jdh30.plus.com>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Complexity of Set.union
Date: Fri, 25 Feb 2005 19:47:45 +0000	[thread overview]
Message-ID: <200502251947.46657.jon@jdh30.plus.com> (raw)
In-Reply-To: <20050225174853.GA25527@yquem.inria.fr>

On Friday 25 February 2005 17:48, Xavier Leroy wrote:
> My hope is that union takes time O(N log N) where N is the sum of the
> numbers of elements in the two sets.  I'm thoroughly unable to prove
> that, though, in particular because the complexity of the "split"
> operation is unclear to me.

Am I correct in thinking that your derivation of this assumes roughly 
equal-sized sets and that your complexity could be tightened a bit by using 
the two different set cardinalities explicitly?

I ask this because the STL set_union is probably O(n+N) (inserting an already 
sorted range into a set is apparently linear) which is worse than the O((n+N) 
log(n+N)) which you've suggested for OCaml.

But my OCaml code is vastly faster, so OCaml's complexity seems to be 
significantly better than that. At least in the special case of one small and 
one large set, which my code is bound by. Specifically, the sets have O(1) 
and O(i^2) elements when looking for the "i"th nearest neighbour. In reality 
this corresponds to computing the unions of sets containing 4 elements with 
sets containing 10^4 elements.

Hmm, now that I come to think of it, my performance measurements have all been 
specific to silicon (that's where the 4 comes from). I'll try retiming on 
some other atomic structures, where the small sets will contain about 12 
elements. I predict the OCaml code will do better relative to the C++ code, 
because the smaller sets won't be so small...

> This bound is "reasonable", however, in that the trivial union
> algorithm (repeatedly add every element of one of the sets to the
> other one) achieves this bound, and the trick with "joining" is,
> intuitively, just an optimization of this trivial algorithm.

I see. This could be improved in the unsymmetric case, by adding elements from 
the smaller set to the larger set. But the size of the set isn't stored so 
you'd have to make do with adding elements from the shallower set to the 
deeper set. I've no idea what the complexity of that would be...

As I know which of the two sets will be the larger and which will be the 
smaller, I'll try a customized union function which folds Set.add over the 
smaller set.

> > Now, what about best case behaviour: In the case of the union of two
> > equal height, distinct sets, is OCaml's union T(1)?
>
> Did you mean "of two equal height sets such that all elements of the
> first set are smaller than all elements of the second set"?

Yes, that's what I meant. :-)

> That 
> could indeed run in constant time (just join the two sets with a
> "Node" constructor), but I doubt the current implementation achieves
> this because of the repeated splitting.

Having said that, wouldn't it take the Set.union function O(log n + log N) 
time to prove that the inputs are non-overlapping, because it would have to 
traverse to the min/max elements of both sets?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://ffconsultancy.com


  reply	other threads:[~2005-02-25 19:46 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2005-02-24  9:20 Set union Jon Harrop
2005-02-25 10:56 ` [Caml-list] " Radu Grigore
2005-02-25 17:30   ` Jon Harrop
2005-02-25 17:48     ` Xavier Leroy
2005-02-25 19:47       ` Jon Harrop [this message]
2005-02-25 21:50         ` [Caml-list] Complexity of Set.union Radu Grigore
2005-02-25 21:52           ` Radu Grigore
2005-02-25 22:31           ` Radu Grigore
2005-02-25 22:36           ` Jon Harrop

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200502251947.46657.jon@jdh30.plus.com \
    --to=jon@jdh30.plus.com \
    --cc=caml-list@yquem.inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).