From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: * X-Spam-Status: No, score=1.2 required=5.0 tests=AWL,SPF_NEUTRAL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from discorde.inria.fr (discorde.inria.fr [192.93.2.38]) by yquem.inria.fr (Postfix) with ESMTP id A162FBC69 for ; Sun, 22 Apr 2007 18:12:17 +0200 (CEST) Received: from ik-out-1112.google.com (ik-out-1112.google.com [66.249.90.178]) by discorde.inria.fr (8.13.6/8.13.6) with ESMTP id l3MGCHoH024584 for ; Sun, 22 Apr 2007 18:12:17 +0200 Received: by ik-out-1112.google.com with SMTP id c29so1496007ika for ; Sun, 22 Apr 2007 09:12:17 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=Iu2RpBx/tbzGD4BwdrbGNPl0LazSrpWeQE3YS7JM9Fozv7aEHRwdJKg9Ci2M0UpYEj27WlmGzzTqVJOLRw4q5RHaFo+todrGho3LZw2kT0jJfgPDRCnNezmp52jWcyZBTqkus6zK8wgzh+KyPCvSgs8u3l6ag92lFs2ZZMu1pFo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:cc:in-reply-to:mime-version:content-type:content-transfer-encoding:content-disposition:references; b=ga1tmGHbCGeee1wM+xsS2tnsxmdR5nqTsG581yq6eUi5rnSWoEv5taD/JdM/ZJUwFDAfHhnSPm/2YJCIm6oHZDLcpnW0kpTRRKEXjueDsVrzBPQRxnAJj0QehbkiDeeE/HLBEsMiKM4FCkCDVqcmt5qtCwNipYSsh1zfZiAJUHs= Received: by 10.78.204.20 with SMTP id b20mr868336hug.1177258336867; Sun, 22 Apr 2007 09:12:16 -0700 (PDT) Received: by 10.78.16.3 with HTTP; Sun, 22 Apr 2007 09:12:16 -0700 (PDT) Message-ID: Date: Sun, 22 Apr 2007 12:12:16 -0400 From: "Markus Mottl" To: "Xavier Leroy" Subject: Re: [Caml-list] Slow allocations with 64bit code? Cc: ocaml , "yaron jane" In-Reply-To: <462B37A0.4050205@inria.fr> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <462B37A0.4050205@inria.fr> X-j-chkmail-Score: MSGID : 462B8961.000 on discorde : j-chkmail score : X : 0/20 1 0.000 -> 1 X-Miltered: at discorde with ID 462B8961.000 by Joe's j-chkmail (http://j-chkmail . ensmp . fr)! X-Spam: no; 0.00; markus:01 mottl:01 markus:01 mottl:01 allocations:01 allocations:01 arrays:01 computations:01 locality:01 model:01 ocaml:01 timings:01 ocamlopt:01 model:01 alloc:01 On 4/22/07, Xavier Leroy wrote: > > I wonder whether others have already noticed that allocations may > > surprisingly be slower on 64bit platforms than on 32bit ones. > > As already mentioned, on 64-bit platforms almost all Caml data > representations are twice as large as on 32-bit platforms (exceptions: > strings, float arrays), so the processor has twice as much data to > move through its memory subsystem. Interesting, I was obviously under the wrong assumption that a 64bit machine would scale appropriately when accessing 64bit words in memory. Of course, I'm aware that cache effects also play a role, but the minor heap should easily fit into the cache of any modern machine in any case, and it's not like this experiment is eating memory. > However, you certainly don't get a slowdown by a factor of 2, for two > reasons: 1- the processor doesn't spend all its time doing memory > accesses, there are some computations here and there; 2- cache lines > are much bigger than 32 bits, meaning that accessing 64 bits at a > given address is much cheaper than accessing two 32-bit > quantities at two random addresses (spatial locality). > > Moreover, x86 in 64-bit mode is much more compiler-friendly than in > 32-bit mode: twice as many registers, a sensible floating-point model > at last. So, OCaml in 64-bit mode generates better code than in > 32-bit mode. > > All in all, your 10% slowdown seems reasonable and in line with what > others reported using C benchmarks. This seems reasonable. It just seemed surprising to me that in some of my tests a 64bit machine could be slower handling even "large" Int64-values than in 32bit-mode, in which it always has to perform two memory accesses and possibly some additional computation steps. > Be careful with timings: I've seen simple changes in code placement > (e.g. introducing or removing dead code) cause performance differences > in excess of 20%. It's an unfortunate fact of today's processors that > their performance is very hard to predict. This surely also requires some caution when interpreting mini-benchmarks. > ocamlopt compiles module initialization code in the so-called > "compact" model, where code size is reduced by not open-coding some > operations such as heap allocation, but instead going through > auxiliary functions like "caml_alloc2". This makes sense since > initialization code is usually large but not performance-critical. > I recommend you put performance-critical code in functions, not in the > initialization code. Thanks, this is a very important bit of information that I wasn't aware of! I used to run mini-benchmarks from initialization code in most cases, which is obviously a bad idea... Regards, Markus -- Markus Mottl http://www.ocaml.info markus.mottl@gmail.com