Date: Sun, 12 Dec 2010 17:55:24 +0200
From: Török Edwin
To: "Jon Harrop"
Subject: Re: Value types (Was: [Caml-list] ocamlopt LLVM support)

On Sun, 12 Dec 2010 14:54:14 -0000
"Jon Harrop" wrote:

> The Haskell guys got their best performance improvement moving to
> LLVM from the hailstone benchmark so it is interesting to examine
> this case as well. I just added support for 64-bit ints to HLVM to
> implement that benchmark and my code is:
>
> Here’s the equivalent OCaml code:
>
>   let rec collatzLen(c, n) : int =
>     if n = 1L then c else
>       collatzLen (c+1, if Int64.rem n 2L = 0L then Int64.div n 2L else

OK, but boxing itself has nothing to do with the performance degradation
here. The cause is the lack of compiler optimizations on the Int64 type.
This could be solved by implementing compiler optimizations for it (or
by manually optimizing the integer arithmetic that is slow).

Let's run the code under a profiler, or look at the assembly (I used
'perf record' and 'perf report'). Two 'idiv' instructions turn up as the
top offenders in the profile.

Problem #1: Int64.rem n 2L -> an idiv instruction.

A C compiler would optimize this to an 'and' instruction.
Change that to 'Int64.logand n 1L = 0L'.

Problem #2: Int64.div n 2L -> another idiv instruction.

A C compiler would optimize this to a right shift. Changing that to
'Int64.shift_right n 1' speeds up the code.
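
To make the two rewrites concrete, here is a minimal sketch that puts each original form next to its replacement and checks that they agree. The helper names (is_even_rem, is_even_and, half_div, half_shift, agree) are hypothetical and only there for illustration; the Int64 calls are the ones named above.

```ocaml
(* Hypothetical helper names; only the Int64 calls come from the message. *)

(* Evenness test: remainder form (idiv) vs. bitmask form (and). *)
let is_even_rem (n : int64) : bool = Int64.rem n 2L = 0L
let is_even_and (n : int64) : bool = Int64.logand n 1L = 0L

(* Halving: division form (idiv) vs. arithmetic right shift. *)
let half_div (n : int64) : int64 = Int64.div n 2L
let half_shift (n : int64) : int64 = Int64.shift_right n 1

(* Both pairs agree on non-negative inputs, which is the range the
   benchmark works in. *)
let agree (n : int64) : bool =
  is_even_rem n = is_even_and n && half_div n = half_shift n

let () =
  let ok = ref true in
  for i = 0 to 100_000 do
    if not (agree (Int64.of_int i)) then ok := false
  done;
  print_endline (if !ok then "rewrites agree" else "rewrites differ")
```

One caveat on the design choice: the two halving forms differ for negative odd values, because Int64.div rounds toward zero while Int64.shift_right rounds toward negative infinity. That does not matter for collatzLen, which only halves even values.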

With these changes I get almost the same speed as the C code:

$ ocamlopt x.ml -o x && time ./x
837799
real    0m0.664s
user    0m0.667s
sys     0m0.000s

$ gcc -O3 x.c && time ./a.out
837799
real    0m0.635s
user    0m0.633s
sys     0m0.000s

Here's the OCaml code:

(* Length of the Collatz sequence starting at n; c is the running count. *)
let rec collatzLen(c, n) : int =
  if n = 1L then c else
    collatzLen (c+1, if Int64.logand n 1L = 0L then Int64.shift_right n 1
                     else Int64.add (Int64.mul 3L n) 1L);;

(* Count i down to 1, keeping the start value n with the longest length nlen. *)
let rec loop(i, (nlen, n)) =
  if i = 1L then n else
    let ilen = collatzLen(1, i) in
    let nlen, n = if ilen > nlen then ilen, i else nlen, n in
    loop (Int64.sub i 1L, (nlen, n));;

let _ =
  let s = loop (1000000L, (1,1000000L)) in
  print_int (Int64.to_int s);;

> 1. Unboxing can give huge performance improvements on serial code,

s/Unboxing/arithmetic optimizations/
Please find an example where the performance benefit is due to
unboxing, and not due to arithmetic optimizations performed on the
unboxed code.

> let alone parallel code. The optimized HLVM is running 32× faster
> than the OCaml here.
>
> 2. LLVM makes it easy to JIT fast code from OCaml. HLVM is using it
> to beat GCC-compiled C code here.
>

One advantage of using LLVM is that it would notice arithmetic
optimizations like these and perform them itself (even if you use the
boxed representation).

Best regards,
--Edwin