Optimising: div and mod on AMD64

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: Jon Harrop <jon@ffconsultancy.com>
To: caml-list@yquem.inria.fr
Subject: Optimising: div and mod on AMD64
Date: Wed, 30 Nov 2005 04:29:56 +0000	[thread overview]
Message-ID: <200511300429.56862.jon@ffconsultancy.com> (raw)

I've spent a little time playing around with my 19-line Sudoku program:

  http://www.ffconsultancy.com/free/sudoku

Algorithmic optimisations are clearly the way forward from my brute force 
approach. However, I am curious about low-level aspects of the performance of 
the existing implementation in various ways.

Note that 85-90% of the time is spent in the "invalid" function:

let rec invalid ?(i=0) x y n =
  i<9 && (m.(y).[i] = n || m.(i).[x] = n ||
      m.(y/3*3 + i/3).[x/3*3 + i mod 3] = n || invalid ~i:(i+1) x y n)

This is a simple recursive function. You can improve its performance a bit by 
getting rid of the optional argument:

let rec invalid i x y n =
  i<9 && (m.(y).[i] = n || m.(i).[x] = n ||
      m.(y/3*3 + i/3).[x/3*3 + i mod 3] = n || invalid (i+1) x y n)

The C++ equivalent of this function (including bounds checking) is:

bool invalid(int i, int x, int y, int n) {
  return i<9 && (m.at(y).at(i) == n || m.at(i).at(x) == n ||
                 m.at(y/3*3 + i/3).at(x/3*3 + i%3) == n ||
                 invalid(i+1, x, y, n));
}

When switching from 32-bit x86 to 64-bit AMD64, the C++ becomes slightly 
faster but the OCaml becomes much slower, to the extent that the OCaml is 
4.8x slower than the C++ on AMD64:

Platform  x86 (900MHz) AMD64 (800MHz)
OCaml     3.716        6.209
C++       1.351        1.284

After quite a bit of playing I found out some interesting things. Allocation 
is very slow, but the "invalid" function does no allocation and incurs no GC. 
Recursion can be slow. Unrolling the invalid function makes little difference 
to performance. Removing bounds checking only improves performance by 6% on 
both platforms.

To my surprise, it seems that the vast majority of the time in the OCaml 
implementation is spent computing x/3*3, i/3 and i mod 3. This is surprising 
because these operations are "normally" very fast, e.g. in the C++.

Getting rid of integer div and mod in the "invalid" function by hoisting the 
computation of x/3*3 out of "invalid" and then out of the fold, and recursing 
over ix=[0,3) and iy=[0,3) rather than i=[0,9) gives:

let rec invalid ix iy fx fy x y n =
  if ix=3 then invalid 0 (iy+1) fx fy x y n else
    iy<3 && (m.(y).[ix + 3*iy] = n || m.(ix + 3*iy).[x] = n ||
	m.(fy + iy).[fx + ix] = n || invalid (ix+1) iy fx fy x y n)

Greatly improves performance, particularly on AMD64:

OCaml     1.946        1.308

Hoisting x/3*3 out of the fold (which isn't even the inner loop!) in the 
"search" function almost doubles the performance of the whole program:

	let fx = x/3*3 and fy = y/3*3 in
        fold (fun accu n ->
                let n = Char.chr (n + 48) in
                if invalid 0 0 fx fy x y n then accu else
                  (m.(y).[x] <- n;
                   let accu = search (x+1) y f accu in
                   m.(y).[x] <- '0';
                   accu)) accu 1 10

From my mediocre knowledge of x86 and AMD64 assembler, it looks as though 
ocamlopt is generating naive integer divisions and modulos, even when the 
divisors are small, constant integers. In contrast, g++ is converting these 
operations into multiplications by constants, shifts and 
addition/subtraction.

I am surprised that this makes such a big difference but, assuming I'm right, 
may we have optimised integer arithmetic for constant divisors please?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists

                 reply	other threads:[~2005-11-30  4:35 UTC|newest]

Thread overview: [no followups] expand[flat|nested]  mbox.gz  Atom feed

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200511300429.56862.jon@ffconsultancy.com \
    --to=jon@ffconsultancy.com \
    --cc=caml-list@yquem.inria.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).