From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id AAA05929; Sun, 19 Jan 2003 00:40:31 +0100 (MET)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id AAA05835 for <caml-list@pauillac.inria.fr>; Sun, 19 Jan 2003 00:40:30 +0100 (MET)
Received: from speakeasy.org (dsl081-017-141.sea1.dsl.speakeasy.net [64.81.17.141])
	by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id h0INeSv19907
	for <caml-list@inria.fr>; Sun, 19 Jan 2003 00:40:28 +0100 (MET)
Received: (from shawnw@localhost)
	by speakeasy.org (8.10.1/8.10.1) id h0INotY11756
	for caml-list@inria.fr; Sat, 18 Jan 2003 15:50:55 -0800
Date: Sat, 18 Jan 2003 15:50:54 -0800
From: Shawn Wagner <shawnw@speakeasy.org>
To: caml-list@inria.fr
Subject: Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
Message-ID: <20030118155054.C22850@speakeasy.org>
Mail-Followup-To: caml-list@inria.fr
References: <3E15B3B3.3040106@163.com> <20030103071042.T22850@speakeasy.org> <20030104193118.A26208@pauillac.inria.fr> <200301181749.48295.oleg_inconnu@myrealbox.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200301181749.48295.oleg_inconnu@myrealbox.com>; from oleg_inconnu@myrealbox.com on Sat, Jan 18, 2003 at 05:49:47PM -0500
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

On Sat, Jan 18, 2003 at 05:49:47PM -0500, Oleg wrote:
> On Saturday 04 January 2003 01:31 pm, Xavier Leroy wrote:
> > Apparently, the ocamlopt-generated code
> > offers less instruction-level parallelism than the g++-generated code
> > for the float computations.  Still, I haven't really understood where
> > the factor of 2 comes from.  
> 
> It's been a couple of weeks. I'm wondering if you got any new insights into 
> this?
> 
> Just as wild guess: the code contains calls to "sin" and "cos" on the same 
> value. Perhaps GCC manages to optimize those into one call to "sincos"

It doesn't. I tried making a C++ version that does when I was fooling around
with it. Didn't really help. The single greatest speed increase I got (Which
did something like cut the runtime in half) was -ffast-math, which cuts out
the trig function calls in favor of direct use of the proper x86
instructions. But the inlined-sincos (__sincos()) in glibc causes a segfault
on my athlon when I tried using it. :P

Something like gcc's -ffast-math for ocamlopt would be nice, but improving
the scheduler is probably of more general use, making it able to target code
for specific CPUs like gcc does with good results. Targeting athlon instead
of i386 cut the almabench time by 30 seconds, for example.

I don't know anything about the code-generation bits of ocamlopt, though, so
I have no idea how big of a project that would be.

-- 
Shawn Wagner
shawnw@speakeasy.org
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners