Performance of threaded interpreter on hyper-threaded CPU

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Michel Schinz <Michel.Schinz@epfl.ch>
To: caml-list@inria.fr
Subject: Performance of threaded interpreter on hyper-threaded CPU
Date: Tue, 18 Apr 2006 10:04:54 +0200	[thread overview]
Message-ID: <m2psjfpavt.fsf@torre.local> (raw)

Hi,

In order to get an idea of the speedup provided by a threaded code
interpreter, I've been comparing the performance of the switch-based
OCaml interpreter with the one which uses threaded code.

On some architectures, threaded code provides a speedup which matches
my expectations (around 20%), but on a hyper-threaded Pentium IV, I
actually get a massive slowdown: the threaded code interpreter is more
than two times slower than the switch-based one! I just wanted to
present my results here, since it seems that threaded code might not
always be the fastest option. I'd also be interested in knowing why
threaded code is so much slower in some cases, provided of course that
my results are not flawed.

My testing methodology and result are described below.

To get the two versions of the OCaml interpreter, I uncompress
ocaml-3.09.1 in two separate directories, and in one of them I simply
delete the line which defines THREADED_CODE in byterun/config.h. I
also add the following lines at the very beginning of ocaml_interprete
in byterun/interp.c (in both directories):

#ifdef THREADED_CODE
  fprintf(stderr, "threaded code\n");
#else
  fprintf(stderr, "switch-based\n");
#endif

They enable me to be sure that the version which is running is the one
I expect.

Once this is done, I compile the two versions by first launching
configure with a different prefix for both directories, and then
letting "make world install" do its job. When this is complete, I have
two versions of the interpreter, one using threaded code, the other
using a big switch, and my measurements can start.

To perform my measures, I use a small program which computes the
factorial of 5000 using a naive implementation of big integers
(represented as lists of "digits" in base 10000). I compile this
program once then run it with the switch-based interpreter, and then
with the threaded code one. I run the benchmark five times in a row,
and select the lowest time, as given by the "time" command. The
results are summarised in the following table. When the ratio given in
the last column is greater than 1, then threaded code is faster than
the switch-based solution. As you can see, this is only true in my
case for non-hyper-threaded architectures. Concerning the OS, the
first machine runs OS X 10.4.6, while the other ones run various
versions of Linux.

| architecture                      | switch | threaded |   ratio |
|-----------------------------------+--------+----------+---------|
| 1.25 GHzPower PC G4               |   9.04 |     7.24 |  1.2486 |
| 1.70 GHz Pentium 4                |   6.36 |     4.81 |  1.3222 |
| 3.0 GHz Pentium 4, hyper-threaded |   2.51 |     6.13 | 0.40946 |
| dual 3.0 GHz Xeon, hyper-threaded |   3.32 |     3.59 | 0.92479 |

I also measured the time taken by "make world" on the third machine,
and the results confirm that the threaded code interpreter is slower
than the switch-based one. Here are the timings:

  switch-based : 89.53s user, 12.73s system
  threaded code: 114.77s user, 13.03 system
  ratio (sw/th): 0.78

I will gladly provide more information about the various systems used
for testing if anyone is interested.

The small benchmark program I'm using is available there:

http://lamp.epfl.ch/~schinz/bignums.ml

Michel.

next             reply	other threads:[~2006-04-18  8:10 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-04-18  8:04 Michel Schinz [this message]
2006-04-18  8:33 ` [Caml-list] " Xavier Leroy
2006-04-18 10:27   ` Michel Schinz
2006-04-18 11:40     ` [Caml-list] " Till Varoquaux
2006-04-18 11:59       ` Michel Schinz
2006-04-18 12:56     ` Stefan Monnier
2006-04-18 16:18     ` [Caml-list] " Xavier Leroy
2006-04-18 16:42       ` gang chen
2006-04-19  8:24       ` Michel Schinz
2006-04-25 22:52     ` [Caml-list] " Joaquin Cuenca Abela
2006-04-27 11:42       ` Joaquin Cuenca Abela
2006-04-18  8:34 ` [Caml-list] " Christophe TROESTLER
2006-04-18  8:46   ` Jonathan Roewen
2006-04-18  8:57     ` Christophe TROESTLER

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=m2psjfpavt.fsf@torre.local \
    --to=michel.schinz@epfl.ch \
    --cc=caml-list@inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).