From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <edwintorok@gmail.com>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82])
	by yquem.inria.fr (Postfix) with ESMTP id 4B66CBC57
	for <caml-list@yquem.inria.fr>; Sun, 12 Dec 2010 16:55:47 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ArgAANd/BE3RVaEzimdsb2JhbACDXKArCBUBAQEKCQwHDwYlpViJNDyCGINtLohWAQEDBYEcgzV0BI51
X-IronPort-AV: E=Sophos;i="4.59,332,1288566000"; 
   d="scan'208";a="92060716"
Received: from mail-fx0-f51.google.com ([209.85.161.51])
  by mail1-smtp-roc.national.inria.fr with ESMTP; 12 Dec 2010 16:55:32 +0100
Received: by fxm5 with SMTP id 5so4913088fxm.10
        for <caml-list@inria.fr>; Sun, 12 Dec 2010 07:55:31 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=gamma;
        h=domainkey-signature:received:received:date:from:to:cc:subject
         :message-id:in-reply-to:references:x-mailer:mime-version
         :content-type:content-transfer-encoding;
        bh=4gl9hlPAlynynFzcUIPNEARoz3ZO6IBrUpJErpL6JLw=;
        b=fs/y3OwtJxDKk4lUXEejtCWhsbUysJJ7GFWwQad7b15br8+E3GrJSNQgdk7VUTbMh2
         fMkdyejJsxxS4Sbe6EdogmvzPUaf9LjyyDrEkkdZBoHYQDvWaUOUvQ/EkncMiT3LPJhj
         j7jJJ2XvRGWA4ZimsKnSzuHG7mABHvqarHSxc=
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=date:from:to:cc:subject:message-id:in-reply-to:references:x-mailer
         :mime-version:content-type:content-transfer-encoding;
        b=eqhdA9s8bs1M145SqYcNnzGFSZqIjbl/cPbSWlwbcq6VLsB6/4u72VYc+lzWeOsiQE
         /kxoIWeAoV2C6gPCP1hg6N4Davt6+y1Ir8XBmgEj7AMYMMg1xbez06NB4td5N5EjC4yI
         xQBV2iH3lJBibq6cF+xwEmFR80Ii3VczPXHwE=
Received: by 10.223.97.75 with SMTP id k11mr3274607fan.85.1292169331776;
        Sun, 12 Dec 2010 07:55:31 -0800 (PST)
Received: from deb0 ([79.114.96.209])
        by mx.google.com with ESMTPS id e6sm1420304fav.32.2010.12.12.07.55.30
        (version=SSLv3 cipher=RC4-MD5);
        Sun, 12 Dec 2010 07:55:31 -0800 (PST)
Date: Sun, 12 Dec 2010 17:55:24 +0200
From: =?UTF-8?B?VMO2csO2aw==?= Edwin <edwintorok@gmail.com>
To: "Jon Harrop" <jon@ffconsultancy.com>
Cc: <caml-list@inria.fr>
Subject: Re: Value types (Was: [Caml-list] ocamlopt LLVM support)
Message-ID: <20101212175524.73a8e285@deb0>
In-Reply-To: <036001cb9a0c$725acef0$57106cd0$@com>
References: <036001cb9a0c$725acef0$57106cd0$@com>
X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; x86_64-pc-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Spam: no; 0.00; ocamlopt:01 haskell:01 ocaml:01 compiler:01 compiler:01 'int:01 'int:01 ocamlopt:01 gcc:01 ocaml:01 unboxing:01 unboxing:01 unboxed:01 edwin:98 offenders:98 

On Sun, 12 Dec 2010 14:54:14 -0000
"Jon Harrop" <jon@ffconsultancy.com> wrote:

> The Haskell guys got their best performance improvement moving to
> LLVM from the hailstone benchmark so it is interesting to examine
> this case as well. I just added support for 64-bit ints to HLVM to
> implement that benchmark and my code is:
>=20
> Here=E2=80=99s the equivalent OCaml code:
>=20
>   let rec collatzLen(c, n) : int =3D
>     if n =3D 1L then c else
>       collatzLen (c+1, if Int64.rem n 2L =3D 0L then Int64.div n 2L else

OK, but boxing itself has nothing to do with the performance degration
here. It is the lack of compiler optimizations on the Int64 type. This
could be solved by implementing compiler optimizations for it (or
manually optimizing some integer arithmetic that is slow).

Lets run the code under a profiler, or look at the assembly (I used
'perf record' and 'perf report'). 2 'idiv' instructions turn up as top
offenders in the profile.

Problem #1: Int64.rem n 2 -> another idiv instruction

A C compiler would optimize this to an 'and' instruction.
Change that to 'Int64.logand n 1L =3D 0L'/

Problem #2: Int64.div n 2 -> idiv instruction.=20

A C compiler would optimize this to a right shift. Changing that to 'Int64.=
shift_right n 1' speeds
up the code.

With these changes I get almost the same speed as the C code:
$ ocamlopt x.ml -o x && time ./x
837799
real    0m0.664s
user    0m0.667s
sys     0m0.000s

$ gcc -O3 x.c && time ./a.out
837799
real    0m0.635s
user    0m0.633s
sys     0m0.000s

Here's the OCaml code:
let rec collatzLen(c, n) : int =3D
    if n =3D 1L then c else
      collatzLen (c+1, if Int64.logand n 1L =3D 0L then Int64.shift_right
n 1 else Int64.add (Int64.mul 3L n) 1L);;

  let rec loop(i, (nlen, n)) =3D
    if i =3D 1L then n else
      let ilen =3D collatzLen(1, i) in
      let nlen, n =3D if ilen > nlen then ilen, i else nlen, n in
      loop (Int64.sub i 1L, (nlen, n));;

  let _ =3D
      let s =3D loop (1000000L, (1,1000000L)) in
      print_int (Int64.to_int s);;

> 1. Unboxing can give huge performance improvements on serial code,

s/Unboxing/arithmetic optimizations/
Please find an example where the performance benefit is due to
unboxing, and not due to arithmetic optimizations performed on the
unboxed code.

> let alone parallel code. The optimized HLVM is running 32=C3=97 faster
> than the OCaml here.
>=20
> 2. LLVM makes it easy to JIT fast code from OCaml. HLVM is using it
> to beat GCC-compiled C code here.
>=20

One advantage of using LLVM is that it would notice arithmetic
optimizations like this and perform it itself (even if you use the
boxed representation).

Best regards,
--Edwin