From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail1-relais-roc.national.inria.fr (mail1-relais-roc.national.inria.fr [192.134.164.82]) by yquem.inria.fr (Postfix) with ESMTP id EE4FDBC57 for ; Sun, 12 Dec 2010 18:26:37 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArgAAPCUBE3RVaEzimdsb2JhbACDXKArCBUBAQEKCQwHDwYlpX2JNDyCGINvLohWAQEDBYRRdASOdQ X-IronPort-AV: E=Sophos;i="4.59,333,1288566000"; d="s'?scan'208";a="92062789" Received: from mail-fx0-f51.google.com ([209.85.161.51]) by mail1-smtp-roc.national.inria.fr with ESMTP; 12 Dec 2010 18:26:37 +0100 Received: by fxm5 with SMTP id 5so4955968fxm.10 for ; Sun, 12 Dec 2010 09:26:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:cc:subject :message-id:in-reply-to:references:x-mailer:mime-version :content-type; bh=lQMbolbIpQLn1ko20wUkoSTyousVJinDmGFJoJTmndw=; b=kzS3WyqPplloQP2QSNnR0X++lb8Iw641GjDvQvVmchU3iO+zuthHx38Hmf7Z5Au1YC UyR6f57x3XBjKJ/w6A7Elf7TpoADcv/4lkyQNm/190vObU98BtFzYontrWRJGB6UYY1D GfPnW7TJvAbTMkuBLZCjHjpEGT9sXTkldcvi4= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:in-reply-to:references:x-mailer :mime-version:content-type; b=Cbf+6JwkDwUql5gr/MbHNNpSn2owpONe3+nGRN52I32Xm9YzVkGenKRlLv+1TspOoj Gqzeog+OrWzuSyKy57HSJO3Hdyg18TWMf0rNzHEZQA2t+6dBkFp2wHl8f13laJEsChjJ mbjUsxsGBkGeddXcCK7hltIh5804JMXRCPlB0= Received: by 10.223.83.201 with SMTP id g9mr3361053fal.140.1292174797193; Sun, 12 Dec 2010 09:26:37 -0800 (PST) Received: from deb0 ([79.114.96.209]) by mx.google.com with ESMTPS id y1sm1438493fak.15.2010.12.12.09.26.36 (version=SSLv3 cipher=RC4-MD5); Sun, 12 Dec 2010 09:26:36 -0800 (PST) Date: Sun, 12 Dec 2010 19:26:32 +0200 From: =?UTF-8?B?VMO2csO2aw==?= Edwin To: "Jon Harrop" Cc: Subject: Re: Value types (Was: [Caml-list] ocamlopt LLVM support) Message-ID: <20101212192632.6536a647@deb0> In-Reply-To: <038301cb9a20$13af7d10$3b0e7730$@com> References: <036001cb9a0c$725acef0$57106cd0$@com> <20101212175524.73a8e285@deb0> <038301cb9a20$13af7d10$3b0e7730$@com> X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="MP_/fwTjk1oGbYzqxDnp/iQrgC." X-Spam: no; 0.00; ocamlopt:01 compiler:01 'int:01 compiler:01 'int:01 ocamlopt:01 ocaml:01 hash:01 gcc:01 ocaml:01 fwiw:01 edwin:98 edwin:98 'and':98 1090:98 X-Attachments: cset="UTF-8" type="application/octet-stream" name="x.s" name="x.s" --MP_/fwTjk1oGbYzqxDnp/iQrgC. Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Content-Disposition: inline On Sun, 12 Dec 2010 17:14:45 -0000 "Jon Harrop" wrote: > T=C3=B6r=C3=B6k Edwin wrote: > > Problem #1: Int64.rem n 2 -> another idiv instruction > >=20 > > A C compiler would optimize this to an 'and' instruction. > > Change that to 'Int64.logand n 1L =3D 0L'/ >=20 > Yes. LLVM did that for me. >=20 > > Problem #2: Int64.div n 2 -> idiv instruction. > >=20 > > A C compiler would optimize this to a right shift. Changing that to > > 'Int64.shift_right n 1' speeds > > up the code. >=20 > Yes. LLVM also did that for me. In fact, I have been bitten by > ocamlopt not optimizing div and mod by a constant in real OCaml code > before. This problem also turns up in the context of hash table > implementations where you want to % by the length of the spine. Do you really need to use Int64 for that though? Won't the 63-bit version do? >=20 > > With these changes I get almost the same speed as the C code: > > $ ocamlopt x.ml -o x && time ./x > > 837799 > > real 0m0.664s > > user 0m0.667s > > sys 0m0.000s > >=20 > > $ gcc -O3 x.c && time ./a.out > > 837799 > > real 0m0.635s > > user 0m0.633s > > sys 0m0.000s > >=20 > > Here's the OCaml code: > > let rec collatzLen(c, n) : int =3D > > if n =3D 1L then c else > > collatzLen (c+1, if Int64.logand n 1L =3D 0L then > > Int64.shift_right n 1 else Int64.add (Int64.mul 3L n) 1L);; > >=20 > > let rec loop(i, (nlen, n)) =3D > > if i =3D 1L then n else > > let ilen =3D collatzLen(1, i) in > > let nlen, n =3D if ilen > nlen then ilen, i else nlen, n in > > loop (Int64.sub i 1L, (nlen, n));; > >=20 > > let _ =3D > > let s =3D loop (1000000L, (1,1000000L)) in > > print_int (Int64.to_int s);; >=20 > I am unable to reproduce your results. Here, the time falls from 24s > to 19.5s (using ocamlopt 3.12.0 on Intel x86) which is still 26=C3=97 > slower than HLVM. Do you still have 'idiv' in the compiled code? See my attached assembly, and compare it with yours please. I was doing the test on 64-bit, with ocamlopt 3.11.2 and 3.12.0. FWIW the original code took 2.8 seconds here, so only 4x slower (this is an AMD Phenom II x6 1090T CPU). It probably depends how fast/slow the 'idiv' is on your CPU. --Edwin --MP_/fwTjk1oGbYzqxDnp/iQrgC. Content-Type: application/octet-stream; name=x.s Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename=x.s CS5zZWN0aW9uICAgICAgICAucm9kYXRhLmNzdDgsImEiLEBwcm9nYml0cwoJLmFsaWduCTE2CmNh bWxfbmVnZl9tYXNrOgkucXVhZCAgIDB4ODAwMDAwMDAwMDAwMDAwMCwgMAoJLmFsaWduCTE2CmNh bWxfYWJzZl9tYXNrOgkucXVhZCAgIDB4N0ZGRkZGRkZGRkZGRkZGRiwgMHhGRkZGRkZGRkZGRkZG RkZGCgkuZGF0YQoJLmdsb2JsCWNhbWxYX19kYXRhX2JlZ2luCmNhbWxYX19kYXRhX2JlZ2luOgoJ LnRleHQKCS5nbG9ibAljYW1sWF9fY29kZV9iZWdpbgpjYW1sWF9fY29kZV9iZWdpbjoKCS5kYXRh CgkucXVhZAkyMDQ4CgkuZ2xvYmwJY2FtbFgKY2FtbFg6Cgkuc3BhY2UJMTYKCS5kYXRhCgkucXVh ZAkzMzE5CmNhbWxYX18yOgoJLnF1YWQJY2FtbF90dXBsaWZ5MgoJLnF1YWQJLTMKCS5xdWFkCWNh bWxYX19sb29wXzEwMzMKCS5kYXRhCgkucXVhZAkzMzE5CmNhbWxYX18zOgoJLnF1YWQJY2FtbF90 dXBsaWZ5MgoJLnF1YWQJLTMKCS5xdWFkCWNhbWxYX19jb2xsYXR6TGVuXzEwMzAKCS5kYXRhCgku cXVhZAkyMDQ4CmNhbWxYX18xOgoJLnF1YWQJLkwxMDAwMDQKCS5xdWFkCS5MMTAwMDA1CgkucXVh ZAkyMDQ4Ci5MMTAwMDA1OgoJLnF1YWQJMwoJLnF1YWQJLkwxMDAwMDYKCS5xdWFkCTIzMDMKLkwx MDAwMDY6CgkucXVhZAljYW1sX2ludDY0X29wcwoJLnF1YWQJMTAwMDAwMAoJLnF1YWQJMjMwMwou TDEwMDAwNDoKCS5xdWFkCWNhbWxfaW50NjRfb3BzCgkucXVhZAkxMDAwMDAwCgkudGV4dAoJLmFs aWduCTE2CgkuZ2xvYmwJY2FtbFhfX2NvbGxhdHpMZW5fMTAzMApjYW1sWF9fY29sbGF0ekxlbl8x MDMwOgoJc3VicQkkOCwgJXJzcAouTDEwMzoKCW1vdnEJJXJheCwgJXJkaQoJbW92cQkkMSwgJXJz aQoJbW92cQk4KCVyYngpLCAlcmF4CgljbXBxCSVyc2ksICVyYXgKCWpuZQkuTDEwMgoJbW92cQkl cmRpLCAlcmF4CglhZGRxCSQ4LCAlcnNwCglyZXQKCS5hbGlnbgk0Ci5MMTAyOgoJeG9ycQklcmR4 LCAlcmR4Cgltb3ZxCSQxLCAlcnNpCgltb3ZxCTgoJXJieCksICVyYXgKCWFuZHEJJXJzaSwgJXJh eAoJY21wcQklcmR4LCAlcmF4CglqbmUJLkwxMDEKLkwxMDQ6CXN1YnEJJDI0LCAlcjE1Cgltb3Zx CWNhbWxfeW91bmdfbGltaXRAR09UUENSRUwoJXJpcCksICVyYXgKCWNtcHEJKCVyYXgpLCAlcjE1 CglqYgkuTDEwNQoJbGVhcQk4KCVyMTUpLCAlcmN4Cgltb3ZxCSQyMzAzLCAtOCglcmN4KQoJbW92 cQljYW1sX2ludDY0X29wc0BHT1RQQ1JFTCglcmlwKSwgJXJheAoJbW92cQklcmF4LCAoJXJjeCkK CW1vdnEJOCglcmJ4KSwgJXJheAoJc2FycQkkMSwgJXJheAoJbW92cQklcmF4LCA4KCVyY3gpCglq bXAJLkwxMDAKCS5hbGlnbgk0Ci5MMTAxOgouTDEwNzoJc3VicQkkMjQsICVyMTUKCW1vdnEJY2Ft bF95b3VuZ19saW1pdEBHT1RQQ1JFTCglcmlwKSwgJXJheAoJY21wcQkoJXJheCksICVyMTUKCWpi CS5MMTA4CglsZWFxCTgoJXIxNSksICVyY3gKCW1vdnEJJDIzMDMsIC04KCVyY3gpCgltb3ZxCWNh bWxfaW50NjRfb3BzQEdPVFBDUkVMKCVyaXApLCAlcmF4Cgltb3ZxCSVyYXgsICglcmN4KQoJbW92 cQkkMSwgJXJkeAoJbW92cQk4KCVyYngpLCAlcnNpCgltb3ZxCSQzLCAlcmF4CglpbXVscQklcnNp LCAlcmF4CglhZGRxCSVyZHgsICVyYXgKCW1vdnEJJXJheCwgOCglcmN4KQouTDEwMDoKCW1vdnEJ JXJkaSwgJXJheAoJYWRkcQkkMiwgJXJheAoJbW92cQklcmN4LCAlcmJ4CglqbXAJLkwxMDMKLkwx MDg6CWNhbGwJY2FtbF9jYWxsX2djQFBMVAouTDEwOToJam1wCS5MMTA3Ci5MMTA1OgljYWxsCWNh bWxfY2FsbF9nY0BQTFQKLkwxMDY6CWptcAkuTDEwNAoJLnR5cGUJY2FtbFhfX2NvbGxhdHpMZW5f MTAzMCxAZnVuY3Rpb24KCS5zaXplCWNhbWxYX19jb2xsYXR6TGVuXzEwMzAsLi1jYW1sWF9fY29s bGF0ekxlbl8xMDMwCgkudGV4dAoJLmFsaWduCTE2CgkuZ2xvYmwJY2FtbFhfX2xvb3BfMTAzMwpj YW1sWF9fbG9vcF8xMDMzOgoJc3VicQkkMjQsICVyc3AKLkwxMTM6Cgltb3ZxCSVyYXgsICVyZHgK CW1vdnEJOCglcmJ4KSwgJXJheAoJbW92cQkkMSwgJXJzaQoJbW92cQk4KCVyZHgpLCAlcmRpCglj bXBxCSVyc2ksICVyZGkKCWpuZQkuTDExMgoJYWRkcQkkMjQsICVyc3AKCXJldAoJLmFsaWduCTQK LkwxMTI6Cgltb3ZxCSVyYXgsIDgoJXJzcCkKCW1vdnEJJXJkeCwgMTYoJXJzcCkKCW1vdnEJKCVy YngpLCAlcmF4Cgltb3ZxCSVyYXgsIDAoJXJzcCkKCW1vdnEJJDMsICVyYXgKCW1vdnEJJXJkeCwg JXJieAoJY2FsbAljYW1sWF9fY29sbGF0ekxlbl8xMDMwQFBMVAouTDExNDoKCW1vdnEJJXJheCwg JXJzaQoJbW92cQkwKCVyc3ApLCAlcmJ4CgljbXBxCSVyYngsICVyc2kKCWpsZQkuTDExMQouTDEx NToJc3VicQkkMjQsICVyMTUKCW1vdnEJY2FtbF95b3VuZ19saW1pdEBHT1RQQ1JFTCglcmlwKSwg JXJheAoJY21wcQkoJXJheCksICVyMTUKCWpiCS5MMTE2CglsZWFxCTgoJXIxNSksICVyZGkKCW1v dnEJJDIwNDgsIC04KCVyZGkpCgltb3ZxCSVyc2ksICglcmRpKQoJbW92cQkxNiglcnNwKSwgJXJh eAoJbW92cQklcmF4LCA4KCVyZGkpCglqbXAJLkwxMTAKCS5hbGlnbgk0Ci5MMTExOgouTDExODoJ c3VicQkkMjQsICVyMTUKCW1vdnEJY2FtbF95b3VuZ19saW1pdEBHT1RQQ1JFTCglcmlwKSwgJXJh eAoJY21wcQkoJXJheCksICVyMTUKCWpiCS5MMTE5CglsZWFxCTgoJXIxNSksICVyZGkKCW1vdnEJ JDIwNDgsIC04KCVyZGkpCgltb3ZxCSVyYngsICglcmRpKQoJbW92cQk4KCVyc3ApLCAlcmF4Cglt b3ZxCSVyYXgsIDgoJXJkaSkKLkwxMTA6Ci5MMTIxOglzdWJxCSQ0OCwgJXIxNQoJbW92cQljYW1s X3lvdW5nX2xpbWl0QEdPVFBDUkVMKCVyaXApLCAlcmF4CgljbXBxCSglcmF4KSwgJXIxNQoJamIJ LkwxMjIKCWxlYXEJOCglcjE1KSwgJXJieAoJbW92cQkkMjA0OCwgLTgoJXJieCkKCW1vdnEJKCVy ZGkpLCAlcmF4Cgltb3ZxCSVyYXgsICglcmJ4KQoJbW92cQk4KCVyZGkpLCAlcmF4Cgltb3ZxCSVy YXgsIDgoJXJieCkKCWxlYXEJMjQoJXJieCksICVyYXgKCW1vdnEJJDIzMDMsIC04KCVyYXgpCglt b3ZxCWNhbWxfaW50NjRfb3BzQEdPVFBDUkVMKCVyaXApLCAlcmRpCgltb3ZxCSVyZGksICglcmF4 KQoJbW92cQkkMSwgJXJzaQoJbW92cQkxNiglcnNwKSwgJXJkaQoJbW92cQk4KCVyZGkpLCAlcmRp CglzdWJxCSVyc2ksICVyZGkKCW1vdnEJJXJkaSwgOCglcmF4KQoJam1wCS5MMTEzCi5MMTIyOglj YWxsCWNhbWxfY2FsbF9nY0BQTFQKLkwxMjM6CWptcAkuTDEyMQouTDExOToJY2FsbAljYW1sX2Nh bGxfZ2NAUExUCi5MMTIwOglqbXAJLkwxMTgKLkwxMTY6CWNhbGwJY2FtbF9jYWxsX2djQFBMVAou TDExNzoJam1wCS5MMTE1CgkudHlwZQljYW1sWF9fbG9vcF8xMDMzLEBmdW5jdGlvbgoJLnNpemUJ Y2FtbFhfX2xvb3BfMTAzMywuLWNhbWxYX19sb29wXzEwMzMKCS50ZXh0CgkuYWxpZ24JMTYKCS5n bG9ibAljYW1sWF9fZW50cnkKY2FtbFhfX2VudHJ5OgoJc3VicQkkOCwgJXJzcAouTDEyNDoKCW1v dnEJY2FtbFhfXzNAR09UUENSRUwoJXJpcCksICVyYngKCW1vdnEJY2FtbFhAR09UUENSRUwoJXJp cCksICVyYXgKCW1vdnEJJXJieCwgKCVyYXgpCgltb3ZxCWNhbWxYX18yQEdPVFBDUkVMKCVyaXAp LCAlcmJ4Cgltb3ZxCWNhbWxYQEdPVFBDUkVMKCVyaXApLCAlcmF4Cgltb3ZxCSVyYngsIDgoJXJh eCkKCW1vdnEJY2FtbFhAR09UUENSRUwoJXJpcCksICVyYXgKCW1vdnEJOCglcmF4KSwgJXJieAoJ bW92cQljYW1sWF9fMUBHT1RQQ1JFTCglcmlwKSwgJXJheAoJbW92cQkoJXJieCksICVyZGkKCWNh bGwJKiVyZGkKLkwxMjU6Cgltb3ZxCTgoJXJheCksICVyYXgKCXNhbHEJJDEsICVyYXgKCW9ycQkk MSwgJXJheAoJY2FsbAljYW1sUGVydmFzaXZlc19fc3RyaW5nX29mX2ludF8xMTMwQFBMVAouTDEy NjoKCW1vdnEJJXJheCwgJXJieAoJbW92cQljYW1sUGVydmFzaXZlc0BHT1RQQ1JFTCglcmlwKSwg JXJheAoJbW92cQkxODQoJXJheCksICVyYXgKCWNhbGwJY2FtbFBlcnZhc2l2ZXNfX291dHB1dF9z dHJpbmdfMTE5MUBQTFQKLkwxMjc6Cgltb3ZxCSQxLCAlcmF4CglhZGRxCSQ4LCAlcnNwCglyZXQK CS50eXBlCWNhbWxYX19lbnRyeSxAZnVuY3Rpb24KCS5zaXplCWNhbWxYX19lbnRyeSwuLWNhbWxY X19lbnRyeQoJLmRhdGEKCS50ZXh0CgkuZ2xvYmwJY2FtbFhfX2NvZGVfZW5kCmNhbWxYX19jb2Rl X2VuZDoKCS5kYXRhCgkuZ2xvYmwJY2FtbFhfX2RhdGFfZW5kCmNhbWxYX19kYXRhX2VuZDoKCS5s b25nCTAKCS5nbG9ibAljYW1sWF9fZnJhbWV0YWJsZQpjYW1sWF9fZnJhbWV0YWJsZToKCS5xdWFk CTkKCS5xdWFkCS5MMTI3Cgkud29yZAkxNwoJLndvcmQJMAoJLmFsaWduCTgKCS5sb25nCSguTDIw MDAwMCAtIC4pICsgMHhlMDAwMDAwMAoJLmxvbmcJMHgxNjgxMjAKCS5xdWFkCS5MMTI2Cgkud29y ZAkxNwoJLndvcmQJMAoJLmFsaWduCTgKCS5sb25nCSguTDIwMDAwMCAtIC4pICsgMHhlMDAwMDAw MAoJLmxvbmcJMHgxNjgyNzAKCS5xdWFkCS5MMTI1Cgkud29yZAkxNgoJLndvcmQJMAoJLmFsaWdu CTgKCS5xdWFkCS5MMTIzCgkud29yZAkzMgoJLndvcmQJMgoJLndvcmQJMTYKCS53b3JkCTUKCS5h bGlnbgk4CgkucXVhZAkuTDEyMAoJLndvcmQJMzIKCS53b3JkCTMKCS53b3JkCTMKCS53b3JkCTgK CS53b3JkCTE2CgkuYWxpZ24JOAoJLnF1YWQJLkwxMTcKCS53b3JkCTMyCgkud29yZAkyCgkud29y ZAkxNgoJLndvcmQJNwoJLmFsaWduCTgKCS5xdWFkCS5MMTE0Cgkud29yZAkzMgoJLndvcmQJMwoJ LndvcmQJMAoJLndvcmQJOAoJLndvcmQJMTYKCS5hbGlnbgk4CgkucXVhZAkuTDEwOQoJLndvcmQJ MTYKCS53b3JkCTIKCS53b3JkCTMKCS53b3JkCTUKCS5hbGlnbgk4CgkucXVhZAkuTDEwNgoJLndv cmQJMTYKCS53b3JkCTIKCS53b3JkCTMKCS53b3JkCTUKCS5hbGlnbgk4Ci5MMjAwMDAwOgoJLmFz Y2l6CSJwZXJ2YXNpdmVzLm1sIgoJLmFsaWduCTgKCS5zZWN0aW9uIC5ub3RlLkdOVS1zdGFjaywi IiwlcHJvZ2JpdHMK --MP_/fwTjk1oGbYzqxDnp/iQrgC.--