From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id EBEF1BC57 for ; Sun, 12 Dec 2010 19:22:22 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ArgAAJehBE3RVaEzimdsb2JhbACDXKArCBUBAQEKCQwHDwYlpgOJNDyCGINvLohWAQEDBYEcgzV0BI51 X-IronPort-AV: E=Sophos;i="4.59,333,1288566000"; d="scan'208";a="83622190" Received: from mail-fx0-f51.google.com ([209.85.161.51]) by mail2-smtp-roc.national.inria.fr with ESMTP; 12 Dec 2010 19:22:22 +0100 Received: by fxm5 with SMTP id 5so4980452fxm.10 for ; Sun, 12 Dec 2010 10:22:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:received:received:date:from:to:cc:subject :message-id:in-reply-to:references:x-mailer:mime-version :content-type:content-transfer-encoding; bh=4yoMKRWQmzDKmiQ3WhESII6a6KqXSIJwtCJZzl+82ao=; b=n4lzfbEzYkr1EiclXhIhFqww4SUQBzeDXPEMdzAy9BAi4N1XNEDSl9GphQ4FQFOo9o JYm3eddS72uI98/zleoGyPb3o1zukacHa0144LG6LGQ4lvAQ4JYxw5hYdqFNsPgiH8rs 11XiMY97iZqSNv+kVCqWpk+eHJfHz+7TIOBOc= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=date:from:to:cc:subject:message-id:in-reply-to:references:x-mailer :mime-version:content-type:content-transfer-encoding; b=Gmigjxyah75REfP+m0cpVsXlPBHKh24vnL9IX5eV3BnvHf5WiOppsJW+XJmC8qtT8D J+bq/Lmh4pe9iPJxM3uMKlkprtLXkhbt/kyFPHedQ6y2V0p+JXiwT8dD6UOyjjwfDDYS vav7tSlAuuV+VhIDQt8PXGJMDs0Gd6gIKX7uc= Received: by 10.223.72.197 with SMTP id n5mr3404260faj.8.1292178142271; Sun, 12 Dec 2010 10:22:22 -0800 (PST) Received: from deb0 ([79.114.96.209]) by mx.google.com with ESMTPS id n7sm1448455fam.11.2010.12.12.10.22.21 (version=SSLv3 cipher=RC4-MD5); Sun, 12 Dec 2010 10:22:21 -0800 (PST) Date: Sun, 12 Dec 2010 20:22:17 +0200 From: =?UTF-8?B?VMO2csO2aw==?= Edwin To: "Jon Harrop" Cc: Subject: Re: Value types (Was: [Caml-list] ocamlopt LLVM support) Message-ID: <20101212202217.4ee72dc3@deb0> In-Reply-To: <039301cb9a26$9214c2e0$b63e48a0$@com> References: <036001cb9a0c$725acef0$57106cd0$@com> <20101212175524.73a8e285@deb0> <038301cb9a20$13af7d10$3b0e7730$@com> <20101212192632.6536a647@deb0> <039301cb9a26$9214c2e0$b63e48a0$@com> X-Mailer: Claws Mail 3.7.6 (GTK+ 2.20.1; x86_64-pc-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable X-Spam: no; 0.00; ocamlopt:01 ocamlopt:01 ocaml's:01 fwiw:01 gcc:01 cheers:01 edwin:98 edwin:98 1090:98 wrote:01 wrote:01 inline:01 caml-list:01 int:01 caml:02 On Sun, 12 Dec 2010 18:01:13 -0000 "Jon Harrop" wrote: > T=C3=B6r=C3=B6k Edwin wrote: > > Do you really need to use Int64 for that though? Won't the 63-bit > > version do? >=20 > I'm running 32-bit. That explains it, in a 32-bit chroot my modified version is still slow. I thought that 32-bit systems are not that relevant anymore (except for Windows, but then people start moving to 64-bit there also). >=20 > > > I am unable to reproduce your results. Here, the time falls from > > > 24s to 19.5s (using ocamlopt 3.12.0 on Intel x86) which is still > > > 26=C3=97 slower than HLVM. >=20 > Sorry, I'm actually using an Opteron x86 (logged in from an Intel > x86!). >=20 > > Do you still have 'idiv' in the compiled code? See my attached > > assembly, and compare it with yours please. > > I was doing the test on 64-bit, with ocamlopt 3.11.2 and 3.12.0. >=20 > I get what appear to be calls to C code: >=20 > camlCollatz__collatzLen_1030: > subl $8, %esp > .L103: > movl %eax, 4(%esp) > movl %ebx, 0(%esp) > pushl $camlCollatz__10 > pushl %ebx > movl $caml_equal, %eax > call caml_c_call Yes, that is quite bad. I don't know how OCaml's code generator works, but it looks like it calls the C implementation if the CPU doesn't support the operation directly. And since this is 32-bit you need all the extra pushes and movs to do actually call something. If only it could inline those calls, then it could optimize away most of the overhead (LLVM would help here again). >=20 > > FWIW the original code took 2.8 seconds here, so only 4x slower > > (this is an AMD Phenom II x6 1090T CPU). It probably depends how > > fast/slow the 'idiv' is on your CPU. >=20 > The performance of idiv is irrelevant here. The bottleneck may be > those C calls but I don't understand why they are being generated. I think for the same reason gcc has __udivdi3 in libgcc: there is no direct way of executing a 64-bit divide on a 32-bit machine, and it saves code space to do it in a function. However that doesn't make much sense for mul and add, which don't need that many instructions to implement on 32-bit. >=20 > Cheers, > Jon. >=20 >=20