Date: Thu, 7 Feb 2008 13:29:50 +0100
From: Mauricio Fernandez
To: caml-list@inria.fr
Subject: Re: [Caml-list] OCaml image blending performance
Message-ID: <20080207122950.GA11491@tux-chan>
References: <854c25eb0802061229o34a6155dncca9d8492cfe6932@mail.gmail.com> <200802062334.02485.jon@ffconsultancy.com> <20080207110154.GA24561@annexia.org>
In-Reply-To: <20080207110154.GA24561@annexia.org>

On Thu, Feb 07, 2008 at 11:01:54AM +0000, Richard Jones wrote:
> On Wed, Feb 06, 2008 at 11:34:02PM +0000, Jon Harrop wrote:
> > In this case, most of the speed loss can be regained by simply
> > aggressively inlining everything, which is exactly what you (ab)used
> > C macros for in the C code.
>
> I don't understand this. In 'blend2.ml' (which I was responsible for)
> C macros are used to inline all the OCaml functions the same as in the
> C version. Yet it's still 70% slower than the C version.
>
> My suspicion was that it was to do with his use of a string as a byte
> array.

I get a 30% speedup by unrolling the BLEND loop and performing some additional
CSE (just getting rid of that extra (j_+3), actually).

I think the major culprit is 31-bit integer arithmetic: there are lots of
"orl $1, ...", "sarl $1, ...", decl and incl in the generated code.

If Bigarray.Array1 had unsafe_get and unsafe_set, operating on int32 values
could be faster (ocamlopt is often smart enough to do without boxed int32s).
Currently, bigarray bound checking is performed even with -unsafe, however.

-- 
Mauricio Fernandez - http://eigenclass.org
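
To illustrate the point about 31-bit integer arithmetic, here is a minimal
sketch that emulates OCaml's tagged integer representation in OCaml itself.
An int n is stored as the machine word 2n+1, so ordinary arithmetic needs
small fix-ups, which is roughly where instructions like "orl $1", "sarl $1",
decl and incl come from. The names tag, untag, tagged_add, tagged_sub and
tagged_mul are made up for this example and are not part of any library; the
mapping to specific instructions is an approximation, not a trace of the
actual blend code.

```ocaml
(* Emulation of the tagged representation: an int n is held as 2n+1.
   All function names here are hypothetical, for illustration only. *)

(* Shift left and set the low bit (compare "orl $1"). *)
let tag n = (n lsl 1) lor 1

(* An arithmetic shift right drops the tag bit (compare "sarl $1"). *)
let untag v = v asr 1

(* (2x+1) + (2y+1) - 1 = 2(x+y)+1: one extra decrement (compare "decl"). *)
let tagged_add a b = a + b - 1

(* (2x+1) - (2y+1) + 1 = 2(x-y)+1: one extra increment (compare "incl"). *)
let tagged_sub a b = a - b + 1

(* x * 2y + 1 = 2xy+1: untag one operand, clear the tag bit of the
   other, multiply, then restore the tag. *)
let tagged_mul a b = (a asr 1) * (b - 1) + 1

let () =
  let x = 200 and y = 55 in
  (* Each tagged operation agrees with plain arithmetic after untagging. *)
  assert (untag (tagged_add (tag x) (tag y)) = x + y);
  assert (untag (tagged_sub (tag x) (tag y)) = x - y);
  assert (untag (tagged_mul (tag x) (tag y)) = x * y)
```

Each operation costs one or two extra instructions compared with untagged C
arithmetic, which adds up in a tight per-byte loop such as a blend.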