caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Jon Harrop <jon@ffconsultancy.com>
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] strange timing between search trees
Date: Fri, 25 Jan 2008 02:28:07 +0000	[thread overview]
Message-ID: <200801250228.07262.jon@ffconsultancy.com> (raw)
In-Reply-To: <47993C41.1050900@univ-savoie.fr>

On Friday 25 January 2008 01:32:49 Christophe Raffalli wrote:
> Dear list members,
>
> I wanted to compare 2-3-4 trees (look in wikipedia if you do not know
> them) with the balanced trees of the standard library.
> I expected the 2-3-4 to be much faster for search because they use much
> less indirections. However, I thought
> that construction should make little difference ...
>
> I was wrong :
> Construction is arround 20% faster for 2-3-4 trees
> Search is slower arround 5-10% for 2-3-4 trees (the diff gets smaller
> when the trees are larger which is expected)
>
> I wonder if the difference in code size is the explanation (the search
> function for balanced trees is really small and fits better
> in cache) ?

I doubt it. You can make the built-in Set module much faster by increasing the 
code size.

> I attach my code with two files (the code and the test, compile with
> ocamlopt unix.cmxa set234.ml test234tree.ml)
>
> Any remarks or comments ?

I get quantitatively similar performance results. You're missing a lot of 
information in your benchmarking though:

What happens if the sets are constructed in-order (affects locality)?

What happens if you iterate the benchmark over an array of preallocated sets?

I did lots of benchmarking along these lines for the optimization chapter in 
OCaml for Scientists (including benchmarking sets). I found that there are 
several alternative set implementations out there but they all give almost 
identical performance (as you're observing). However, the AVL trees of the 
built-in sets provide asymptotically more efficient set-theoretic operations 
(union, diff, inter) than most other implementations.

I also found that the benchmarking strategy more representative of the real 
code that I had was to iterate over preallocate sets, i.e. more cache misses.

Ultimately, I found it much more productive to simply optimize the Set module 
that comes with OCaml. I'll describe how in a later OCaml Journal article but 
the main ideas are to get the comparison function inlined (OCaml doesn't 
inline across functor boundaries) and add a new node type for leaves. The 
latter greatly reduces the stress on the GC, which is the main performance 
bottleneck of the current implementation, and I found it to be up to 30% 
faster.

We've also discussed this in detail before here so you might find more 
information in the caml-list archives.

HTH.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


  reply	other threads:[~2008-01-25  2:34 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-01-25  1:32 Christophe Raffalli
2008-01-25  2:28 ` Jon Harrop [this message]
2008-01-25 18:12   ` [Caml-list] " Christophe Raffalli
2008-01-25  2:31 ` Jon Harrop
2008-01-30 17:50 ` Christophe Raffalli
2008-01-30 18:10   ` Edgar Friendly

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200801250228.07262.jon@ffconsultancy.com \
    --to=jon@ffconsultancy.com \
    --cc=caml-list@yquem.inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).