From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <Xavier.Leroy@inria.fr>
X-Original-To: caml-list@sympa.inria.fr
Delivered-To: caml-list@sympa.inria.fr
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by sympa.inria.fr (Postfix) with ESMTPS id E30BB7EE51
	for <caml-list@sympa.inria.fr>; Wed, 24 Apr 2013 19:40:16 +0200 (CEST)
Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender
  authenticity information available from domain of
  Xavier.Leroy@inria.fr) identity=pra; client-ip=212.27.42.2;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="Xavier.Leroy@inria.fr";
  x-sender="Xavier.Leroy@inria.fr";
  x-conformance=sidf_compatible
Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender
  authenticity information available from domain of
  Xavier.Leroy@inria.fr) identity=mailfrom;
  client-ip=212.27.42.2;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="Xavier.Leroy@inria.fr";
  x-sender="Xavier.Leroy@inria.fr";
  x-conformance=sidf_compatible
Received-SPF: None (mail3-smtp-sop.national.inria.fr: no sender
  authenticity information available from domain of
  postmaster@smtp2-g21.free.fr) identity=helo;
  client-ip=212.27.42.2;
  receiver=mail3-smtp-sop.national.inria.fr;
  envelope-from="Xavier.Leroy@inria.fr";
  x-sender="postmaster@smtp2-g21.free.fr";
  x-conformance=sidf_compatible
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AvsBAJUXeFHUGyoClGdsb2JhbABQgzwBvmSBABYOAQEBAQcNCQkUAyWCHwEBAwEBUyUGCwshFg8JAwIBAgFFEwYCAQEQh3oKCL4xjW2BShaDNQOXHIYOjiI
X-IPAS-Result: AvsBAJUXeFHUGyoClGdsb2JhbABQgzwBvmSBABYOAQEBAQcNCQkUAyWCHwEBAwEBUyUGCwshFg8JAwIBAgFFEwYCAQEQh3oKCL4xjW2BShaDNQOXHIYOjiI
X-IronPort-AV: E=Sophos;i="4.87,542,1363129200"; 
   d="scan'208";a="12095464"
Received: from smtp2-g21.free.fr ([212.27.42.2])
  by mail3-smtp-sop.national.inria.fr with ESMTP; 24 Apr 2013 19:40:15 +0200
Received: from [192.168.1.2] (unknown [82.237.71.191])
	by smtp2-g21.free.fr (Postfix) with ESMTP id 4F4E14B00C6
	for <caml-list@inria.fr>; Wed, 24 Apr 2013 19:40:12 +0200 (CEST)
Message-ID: <517818FB.8050002@inria.fr>
Date: Wed, 24 Apr 2013 19:40:11 +0200
From: Xavier Leroy <Xavier.Leroy@inria.fr>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130329 Thunderbird/17.0.5
MIME-Version: 1.0
To: caml-list@inria.fr
References: <20130424183543.e3a4290382f7f9ce7b522a57@gmail.com>
In-Reply-To: <20130424183543.e3a4290382f7f9ce7b522a57@gmail.com>
X-Enigmail-Version: 1.4.6
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Caml-list] ackermann microbenchmark strange results

On 24/04/13 12:35, ygrek wrote:

> Running `make bench` here consistently gives the following (ack1,
  ack3 - tuples, ack2, ack4 - curried) : [...]
> Moreover, the generated assembly code for the main loop is the same,
> afaics. The only difference is the initialization of structure
> fields and the initial call to ack. Please can anybody explain the
> performance difference?

As others said, probably code placement.  A good tool to explore these
issues under Linux is "perf", in particular "perf stats", which lets
you count various hardware events.  Running it on your 4 programs,
you'll see that the number of instructions executed is almost exactly
the same, but the branch mispredictions (for instance) vary
significantly, in a way correlated with the overall execution time.

(Note that this doesn't have anything to do with OCaml: you'd observe
crazy variations like these with any compiled language.)

> I understand that microbenchmarks are no way the basis to draw
> performance conclusions upon, but I cannot explain these results to
> myself in any meaninful way.

Nobody can except perhaps a few engineers at Intel or AMD who have the
whole microarchitecture of their processors imprinted in their heads.
(And then they will not tell you.)  See, modern microarchitectures
contain so many clever heuristics that performance is generally very
good but essentially unpredictable...

rixed@happyleptic.org adds:

> Remember me of a paper I read some years ago that was measuring the
> effect of the various optimisation levels of gcc against the effect
> of addresses choices (randomised using environment strings of
> various lengths!), and which conclusion was that the effect of -O3
> compared to -O2 was less than the effect of "choosing" a good
> environment string :-) Couldn't find it again using Google ; maybe
> someone remember this paper?

It's probably "Producing Wrong Data Without Doing Anything Obviously
Wrong!" by Diwan et al, ASPLOS 2009,
http://www-plan.cs.colorado.edu/diwan/asplos09.pdf
This paper made quite a splash in the compiler community...

- Xavier Leroy