From: William Chesters
Date: Mon, 4 Jun 2001 21:51:36 +0200 (CEST)
Subject: [Caml-list] OCaml Speed for Block Convolutions

David McClain writes:
 > ....well, I hate to say this... but I did recode the innermost loop in C
 > and it now runs more than 5 times faster... Straight recoding into C
 > bought 4x, and using better math brought that up to 5x.
 >
 > I think the big thing here is that the OCaml code was producing huge
 > amounts of garbage, despite preallocated buffers with which all the
 > processing was reading and writing data. The ancillary closures and
 > tuple args were just eating my shirt...

I can easily imagine that, with two caveats: tuples passed directly to
functions do seem to get elided, while on the other hand seemingly atomic
float accumulators can cause more garbage than you might think.  E.g.

    let a = ref 0. in
      for i = 0 to n-1 do
        a := !a +. Array.unsafe_get xs i
      done

makes garbage (`a' isn't unboxed), while

    type floatref = { mutable it: float }

    let a = { it = 0. } in
      for i = 0 to n-1 do
        a.it <- a.it +. Array.unsafe_get xs i
      done

doesn't.  The effect on the quality of the generated assembler for the
inner loop is dramatic, and so is the speed improvement: basically, using
polymorphic data structures such as `float ref' for hot float accumulators
is a bad idea.  I wonder if this is a limitation you have run up against?

Some years ago I wrote a camlp4-based library supporting tensor notation,
e.g.

    tens x[I] = a[I J] b[J]

with both automatically generated C and vanilla Caml back ends.  When I
recently applied the above point about avoiding polymorphic references, I
found there was rather little difference in performance between the C and
OCaml versions.
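
For completeness, here is a self-contained sketch of the accumulator
comparison above.  It is illustrative only, not a benchmark from this
thread: the array size, the names sum_ref/sum_rec and the use of Sys.time
for rough timing are my own assumptions.

    type floatref = { mutable it : float }

    (* Accumulate in a polymorphic 'float ref': each intermediate sum is
       boxed, so the loop allocates as it goes. *)
    let sum_ref xs =
      let a = ref 0. in
      for i = 0 to Array.length xs - 1 do
        a := !a +. Array.unsafe_get xs i
      done;
      !a

    (* Accumulate in a record with a single mutable float field: the field
       is stored flat, so the loop runs without allocating. *)
    let sum_rec xs =
      let a = { it = 0. } in
      for i = 0 to Array.length xs - 1 do
        a.it <- a.it +. Array.unsafe_get xs i
      done;
      a.it

    let () =
      let xs = Array.init 10_000_000 float_of_int in
      let time name f =
        let t0 = Sys.time () in
        let s = f xs in
        Printf.printf "%s: sum = %g  (%.2fs)\n" name s (Sys.time () -. t0)
      in
      time "sum_ref" sum_ref;
      time "sum_rec" sum_rec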
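
The camlp4 extension itself is not reproduced here, but for a reader
wondering what the contraction above amounts to: x[I] = a[I J] b[J] is just
a matrix-vector product, and the hand-written (or generated) Caml can use
the same mutable-record trick to keep the inner loop allocation-free.  The
representation of `a' as an array of rows is an assumption of this sketch.

    type floatref = { mutable it : float }   (* as above *)

    (* x.(i) <- sum over j of a.(i).(j) *. b.(j) *)
    let contract (a : float array array) (b : float array) (x : float array) =
      let nj = Array.length b in
      for i = 0 to Array.length x - 1 do
        let row = Array.unsafe_get a i in
        let acc = { it = 0. } in
        for j = 0 to nj - 1 do
          acc.it <- acc.it +. Array.unsafe_get row j *. Array.unsafe_get b j
        done;
        Array.unsafe_set x i acc.it
      done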