From: William Chesters
Date: Mon, 4 Jun 2001 21:51:36 +0200 (CEST)
Subject: [Caml-list] OCaml Speed for Block Convolutions

David McClain writes:
 > ....well, I hate to say this... but I did recode the innermost loop in C
 > and it now runs more than 5 times faster... Straight recoding into C
 > bought 4x, and using better math brought that up to 5x.
 >
 > I think the big thing here is that the OCaml code was producing huge
 > amounts of garbage, despite preallocated buffers with which all the
 > processing was reading and writing data. The ancillary closures and
 > tuple args were just eating my shirt...

I can easily imagine that, with two caveats: tuples passed directly to
functions do seem to get elided, while on the other hand seemingly atomic
float accumulators can cause more garbage than you might think.  E.g.

    let a = ref 0. in
      for i = 0 to n-1 do
        a := !a +. Array.unsafe_get xs i
      done

makes garbage (`a' isn't unboxed), while

    type floatref = { mutable it: float }

    let a = { it = 0. } in
      for i = 0 to n-1 do
        a.it <- a.it +. Array.unsafe_get xs i
      done

doesn't.  The effect on the quality of the generated assembler for the
inner loop is dramatic, and so is the speed improvement: basically, using
polymorphic data structures such as `float ref' for hot float accumulators
is a bad idea.  I wonder if this is a limitation you have run up against?

Some years ago I wrote a camlp4-based library supporting tensor notation,
e.g.

    tens x[I] = a[I J] b[J]

with both automatically generated C and vanilla Caml back ends.  When I
recently applied the above point about avoiding polymorphic references, I
found there was rather little difference in performance between the C and
OCaml versions.
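
For completeness, here is a self-contained sketch of the accumulator
comparison above.  It is illustrative only, not a benchmark from this
thread: the array size, the names sum_ref/sum_rec and the use of Sys.time
for rough timing are my own assumptions.

    type floatref = { mutable it : float }

    (* Accumulate in a polymorphic 'float ref': each intermediate sum is
       boxed, so the loop allocates as it goes. *)
    let sum_ref xs =
      let a = ref 0. in
      for i = 0 to Array.length xs - 1 do
        a := !a +. Array.unsafe_get xs i
      done;
      !a

    (* Accumulate in a record with a single mutable float field: the field
       is stored flat, so the loop runs without allocating. *)
    let sum_rec xs =
      let a = { it = 0. } in
      for i = 0 to Array.length xs - 1 do
        a.it <- a.it +. Array.unsafe_get xs i
      done;
      a.it

    let () =
      let xs = Array.init 10_000_000 float_of_int in
      let time name f =
        let t0 = Sys.time () in
        let s = f xs in
        Printf.printf "%s: sum = %g  (%.2fs)\n" name s (Sys.time () -. t0)
      in
      time "sum_ref" sum_ref;
      time "sum_rec" sum_rec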
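
The camlp4 extension itself is not reproduced here, but for a reader
wondering what the contraction above amounts to: x[I] = a[I J] b[J] is just
a matrix-vector product, and the hand-written (or generated) Caml can use
the same mutable-record trick to keep the inner loop allocation-free.  The
representation of `a' as an array of rows is an assumption of this sketch.

    type floatref = { mutable it : float }   (* as above *)

    (* x.(i) <- sum over j of a.(i).(j) *. b.(j) *)
    let contract (a : float array array) (b : float array) (x : float array) =
      let nj = Array.length b in
      for i = 0 to Array.length x - 1 do
        let row = Array.unsafe_get a i in
        let acc = { it = 0. } in
        for j = 0 to nj - 1 do
          acc.it <- acc.it +. Array.unsafe_get row j *. Array.unsafe_get b j
        done;
        Array.unsafe_set x i acc.it
      done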