From: Xavier Leroy
Date: Tue, 18 Apr 2006 18:18:32 +0200
To: Michel Schinz
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Re: Performance of threaded interpreter on hyper-threaded CPU
Message-ID: <44451158.8020004@inria.fr>
References: <4444A46C.5000102@inria.fr>

> However, your remark motivated me to measure the performance of a
> single ocamlrun executable running on the various Pentium 4 I have at
> hand, and the results are interesting...

Random thoughts:

The performance variations between the gcc versions confirm my
impression that gcc is getting "too clever for its own good" --
carefully hand-optimized code like the OCaml bytecode interpreter is
best served by a compiler that compiles code nearly as written.
(Think gcc 2.95.)

The P4 microarchitecture is known for its weird performance model:
some code runs very fast, some similar code very slow.  In my
experience, AMD processors as well as the Pentium-M are much more
consistent performance-wise.

If you really want to understand what's going on, you need a good
performance analysis tool.  Timing runs will tell you nothing.
Intel's VTUNE is king of the hill here, but the Windows version is
costly and I could never install the free Linux version.

- Xavier Leroy