Re: [Caml-list] speed

caml-list - the Caml user's mailing list

* Re: [Caml-list] speed
  2003-01-03 11:47 ` [Caml-list] speed Noel Welsh
@ 2003-01-02 16:45   ` Chet Murthy
  0 siblings, 0 replies; 29+ messages in thread
From: Chet Murthy @ 2003-01-02 16:45 UTC (permalink / raw)
  To: Noel Welsh; +Cc: onlyclimb, caml-list

[I work for IBM.  Notwithstanding, I'm really not a Java shill.  But,
well, try whatever you do on the IBM JDKs, or the _Sunsoft_ _Solaris_
JDKs.]

Actually, I think you'll find that for tight integer loops and
floating-point stuff, Java is already as fast as good C++.  After all,
that's what the JIT guys optimized first and best.  That said, I've
found that in fact, you can get as good performance from Java, as from
Perl or Caml.  You just gotta really optimize your Java code in
strange (non-JDK-compliant) ways.

--chet--

>>>>> "NW" == Noel Welsh <noelwelsh@yahoo.com> writes:

    NW> --- onlyclimb <onlyclimb@163.com> wrote:

    >> Is it normal that my ocaml program is only 2 times faster than
    >> the java counterpart? (using the same method and compiled into
    >> native; jdk is 1.4.1)

    NW> It depends entirely on the program.  I wouldn't expect a huge
    NW> difference in speed in, say, an HTTP server where most of the
    NW> time is spent waiting for the disk. In numeric applications I
    NW> would expect O'Caml to be significantly faster than Java.
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

* Re: [Caml-list] speed
  2003-01-03 13:32 ` Xavier Leroy
@ 2003-01-02 17:52   ` Chet Murthy
  2003-01-03 14:53     ` Sven Luther
  2003-01-02 17:53   ` Coyote Gulch test in Caml (was Re: [Caml-list] speed ) Chet Murthy
  2003-01-05  1:13   ` [Caml-list] speed Brian Hurt
  2 siblings, 1 reply; 29+ messages in thread
From: Chet Murthy @ 2003-01-02 17:52 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: onlyclimb, caml-list

Not to contradict Xavier, because in essence, he is right -- Caml is
indeed far faster than Java on any realistic applications in almost
any area I have ever bothered to try -- but the story as to Java is
actually rather complicated.

(1) different JDKs exhibit remarkably different results on real-world
examples, as their implementors have different backgrounds.  I
remember that the first JITs all did great on integer and
floating-point loops, and that was _it_ -- the rest of the time, they
were often slower than just a hack like inlining interpreter
code-segments.  This is just a human thing.

(2) different JDKs from different manufacturers exhibit different
behaviours.  E.g., I find that the Sunsoft JDKs on Solaris are a lot
faster than the Javasoft JDKs on Solaris.  I also find (no, I'm not
shilling for IBM) that IBM's JDK on Linux is a lot faster than
Javasoft's.  There are, again, social issues involved here, which I am
not sure I am at liberty to discuss.

That said, by and large I find that when you don't go near issues of
allocation and interprocedural optimization, Java is and can be as
fast as Caml.  *However*, when you _do_ go near those things, e.g. if
you do anything I/O or string-processing-intensive, well,

  go get a rocking chair, 'cos you're gonna have a looong wait.

--chet--

P.S. Or get thee to a caml and get it done.  *grin*

* Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-03 13:32 ` Xavier Leroy
  2003-01-02 17:52   ` Chet Murthy
@ 2003-01-02 17:53   ` Chet Murthy
  2003-01-03 15:10     ` Shawn Wagner
  2003-01-05  1:13   ` [Caml-list] speed Brian Hurt
  2 siblings, 1 reply; 29+ messages in thread
From: Chet Murthy @ 2003-01-02 17:53 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: onlyclimb, caml-list


If anybody's ported this to Caml, I'd love to get a copy.

Cheers,
--chet--

P.S. If nobody has, well, I guess I'll have to roll up my sleeves and
do it.

>>>>> "XL" == Xavier Leroy <xavier.leroy@inria.fr> writes:

    XL> More seriously: Java is nowhere as fast as a good C++ compiler
    XL> (see e.g. http://www.coyotegulch.com/reviews/almabench.html
    XL> for an independent, cross-language benchmark in numerical
    XL> computing),

* [Caml-list] Re: speed
  2003-01-03 16:00 [Caml-list] speed onlyclimb
@ 2003-01-03 11:38 ` Clemens Hintze
  2003-01-03 11:47 ` [Caml-list] speed Noel Welsh
  2003-01-03 13:32 ` Xavier Leroy
  2 siblings, 0 replies; 29+ messages in thread
From: Clemens Hintze @ 2003-01-03 11:38 UTC (permalink / raw)
  To: caml-list

In mailing-list.ocaml, you wrote:
> Is it normal that my ocaml program is only 2 times faster than the
> java counterpart? (using the same method and compiled into native;
> jdk is 1.4.1)

I do not believe that this is normal, to be expected, or the general
case.  There are, for sure, programs that may be faster in Java than
in O'Caml.  But those should be exceptions!

It could also be that you have an excellently written Java application
using every performance improvement possible in Java, and are
comparing it against a poorly written O'Caml application using every
anti-performance trick possible ;-)

Would you mind posting some code showing the performance difference?  I
am very interested in that, as I "sell" O'Caml as generally faster than
Java to some of my colleagues, and I do not want to be a storyteller
;-)


Thanks in advance,
Clemens.

-- 
Clemens Hintze  mailto: c.hintze (at) gmx.net

* Re: [Caml-list] speed
  2003-01-03 16:00 [Caml-list] speed onlyclimb
  2003-01-03 11:38 ` [Caml-list] speed Clemens Hintze
@ 2003-01-03 11:47 ` Noel Welsh
  2003-01-02 16:45   ` Chet Murthy
  2003-01-03 13:32 ` Xavier Leroy
  2 siblings, 1 reply; 29+ messages in thread
From: Noel Welsh @ 2003-01-03 11:47 UTC (permalink / raw)
  To: onlyclimb, caml-list


--- onlyclimb <onlyclimb@163.com> wrote:
> Is it normal that my ocaml program is only 2 times faster than the
> java counterpart? (using the same method and compiled into native;
> jdk is 1.4.1)

It depends entirely on the program.  I wouldn't expect
a huge difference in speed in, say, an HTTP server
where most of the time is spent waiting for the disk. 
In numeric applications I would expect O'Caml to be
significantly faster than Java.

Noel


* Re: [Caml-list] speed
  2003-01-03 16:00 [Caml-list] speed onlyclimb
  2003-01-03 11:38 ` [Caml-list] speed Clemens Hintze
  2003-01-03 11:47 ` [Caml-list] speed Noel Welsh
@ 2003-01-03 13:32 ` Xavier Leroy
  2003-01-02 17:52   ` Chet Murthy
                     ` (2 more replies)
  2 siblings, 3 replies; 29+ messages in thread
From: Xavier Leroy @ 2003-01-03 13:32 UTC (permalink / raw)
  To: onlyclimb; +Cc: caml-list

> Is it normal that my ocaml program is only 2 times faster than the java
> counterpart? (using the same method and compiled into native; jdk is 1.4.1)

You know, many compiler researchers would kill their whole families to
get speedups by a factor of 2 :-)

James Gosling gave a talk at INRIA recently where he repeated the
party line that JDK 1.4 runs as fast, or even faster, than C++.
So, by transitivity, you're implying that OCaml is twice as fast as C++.
Yippee! 

More seriously: Java is nowhere as fast as a good C++ compiler (see
e.g. http://www.coyotegulch.com/reviews/almabench.html for an
independent, cross-language benchmark in numerical computing),
but it's not that slow either.  A factor of 2 slower than ocamlopt
sounds broadly reasonable, especially if the program doesn't stress
the GC too much.  Bagley's shootout (http://www.bagley.org/~doug/shootout/)
seems to suggest a larger factor (JDK 1.3 slightly slower than OCaml
bytecode), but his figures may be lowered by Java's slow start-up times.

- Xavier Leroy

* Re: [Caml-list] speed
  2003-01-02 17:52   ` Chet Murthy
@ 2003-01-03 14:53     ` Sven Luther
  2003-01-03 15:28       ` Erol Akarsu
  0 siblings, 1 reply; 29+ messages in thread
From: Sven Luther @ 2003-01-03 14:53 UTC (permalink / raw)
  To: Chet Murthy; +Cc: Xavier Leroy, onlyclimb, caml-list

On Thu, Jan 02, 2003 at 12:52:25PM -0500, Chet Murthy wrote:
> 
> Not to contradict Xavier, because in essence, he is right -- Caml is
> indeed far faster than Java on any realistic applications in almost
> any area I have ever bothered to try -- but the story as to Java is
> actually rather complicated.
> 
> (1) different JDKs exhibit remarkably different results on real-world
> examples, as their implementors have different backgrounds.  I
> remember that the first JITs all did great on integer and
> floating-point loops, and that was _it_ -- the rest of the time, they
> were often slower than just a hack like inlining interpreter
> code-segments.  This is just a human thing.

Do you have any idea how gcj does compared to ocamlopt?  After
all, if I am not wrong, both generate native code.

Friendly,

Sven Luther

* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-02 17:53   ` Coyote Gulch test in Caml (was Re: [Caml-list] speed ) Chet Murthy
@ 2003-01-03 15:10     ` Shawn Wagner
  2003-01-03 15:56       ` Oleg
  2003-01-04 18:31       ` Xavier Leroy
  0 siblings, 2 replies; 29+ messages in thread
From: Shawn Wagner @ 2003-01-03 15:10 UTC (permalink / raw)
  To: caml-list

On Thu, Jan 02, 2003 at 12:53:07PM -0500, Chet Murthy wrote:
> 
> If anybody's ported this to Caml, I'd love to get a copy.

http://raevnos.pennmush.org/code/almabench-ocaml.tar.gz

It's pretty much a straight translation of the C++ version, and not very
impressive speed-wise on my system compared to the C++ one.

-- 
Shawn Wagner
shawnw@speakeasy.org

* Re: [Caml-list] speed
  2003-01-03 14:53     ` Sven Luther
@ 2003-01-03 15:28       ` Erol Akarsu
  0 siblings, 0 replies; 29+ messages in thread
From: Erol Akarsu @ 2003-01-03 15:28 UTC (permalink / raw)
  To: Sven Luther; +Cc: Chet Murthy, Xavier Leroy, onlyclimb, caml-list

Hi All,

What is the best way to access, from OCaml, the huge body of Java libraries
written so far? I am looking for an efficient approach with easy-to-write
wrappers. I am particularly interested in accessing Web services utilities,
RDF parsers and some multi-agent libraries, all written in Java.

Thanks

Erol Akarsu

Sven Luther wrote:

> On Thu, Jan 02, 2003 at 12:52:25PM -0500, Chet Murthy wrote:
> >
> > Not to contradict Xavier, because in essence, he is right -- Caml is
> > indeed far faster than Java on any realistic applications in almost
> > any area I have ever bothered to try -- but the story as to Java is
> > actually rather complicated.
> >
> > (1) different JDKs exhibit remarkably different results on real-world
> > examples, as their implementors have different backgrounds.  I
> > remember that the first JITs all did great on integer and
> > floating-point loops, and that was _it_ -- the rest of the time, they
> > were often slower than just a hack like inlining interpreter
> > code-segments.  This is just a human thing.
>
> Do you have any idea how gcj does compared to ocamlopt?  After
> all, if I am not wrong, both generate native code.
>
> Friendly,
>
> Sven Luther

* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-03 15:10     ` Shawn Wagner
@ 2003-01-03 15:56       ` Oleg
  2003-01-04 18:31       ` Xavier Leroy
  1 sibling, 0 replies; 29+ messages in thread
From: Oleg @ 2003-01-03 15:56 UTC (permalink / raw)
  To: Shawn Wagner, caml-list

On Friday 03 January 2003 10:10 am, Shawn Wagner wrote:
> On Thu, Jan 02, 2003 at 12:53:07PM -0500, Chet Murthy wrote:
> > If anybody's ported this to Caml, I'd love to get a copy.
>
> http://raevnos.pennmush.org/code/almabench-ocaml.tar.gz
>
> It's pretty much a straight translation of the C++ version, and not very
> impressive speed-wise on my system compared to the C++ one.

Here are my results (Pentium 4M 1.8 GHz, 256 MB 266 MHz RAM):

IFC 7.0   : 18
ICC 7.0   : 26
G++ 3.2.1 : 47
Ocaml     : 90

Adding "-xW -fno-alias" makes ICC as fast as IFC.
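
To put these timings on a common scale, each one can be divided by the
fastest.  The snippet below is only a sketch: it assumes the figures are
run times where lower is better (the units are not stated), and the helper
name `slowdowns` is made up for illustration.

```python
# Timings as posted above; assumed to be run times (lower is better).
results = {"IFC 7.0": 18, "ICC 7.0": 26, "G++ 3.2.1": 47, "Ocaml": 90}


def slowdowns(timings):
    """Return each entry's time divided by the fastest time in the table."""
    fastest = min(timings.values())
    return {name: t / fastest for name, t in timings.items()}


if __name__ == "__main__":
    for name, factor in slowdowns(results).items():
        print(f"{name:10s} {factor:4.2f}x")
    # G++ time divided by Ocaml time, i.e. Ocaml's speed relative to G++.
    ratio = results["G++ 3.2.1"] / results["Ocaml"]
    print(f"Ocaml speed relative to G++: {ratio:.0%}")
```

Under that reading, Ocaml takes 5 times as long as IFC and runs at roughly
half the speed of G++ on this machine.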

Oleg

* [Caml-list] speed
@ 2003-01-03 16:00 onlyclimb
  2003-01-03 11:38 ` [Caml-list] speed Clemens Hintze
                   ` (2 more replies)
  0 siblings, 3 replies; 29+ messages in thread
From: onlyclimb @ 2003-01-03 16:00 UTC (permalink / raw)
  To: caml-list

Is it normal that my ocaml program is only 2 times faster than the java
counterpart? (using the same method and compiled into native; jdk is 1.4.1)




* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-03 15:10     ` Shawn Wagner
  2003-01-03 15:56       ` Oleg
@ 2003-01-04 18:31       ` Xavier Leroy
  2003-01-18 22:49         ` Oleg
  1 sibling, 1 reply; 29+ messages in thread
From: Xavier Leroy @ 2003-01-04 18:31 UTC (permalink / raw)
  To: caml-list

> http://raevnos.pennmush.org/code/almabench-ocaml.tar.gz
> It's pretty much a straight translation of the C++ version, and not very
> impressive speed-wise on my system compared to the C++ one.

Thanks a lot for the OCaml translation.  As you say, the speed of the
OCaml version is about 50% of that of the C++ version, both on Athlon
with g++, and on Alpha with the Tru64 cxx compiler.  This is both
reassuring and disappointing:

Reassuring, because our blanket performance statement "OCaml
delivers at least 50% of the performance of a decent C compiler" is
not invalidated :-)

Disappointing, because the assembly code generated by ocamlopt isn't
too ugly despite the code not being very Caml-ish in style.  In
particular, (almost) all float and ref boxing is correctly eliminated.
Given this, I was expecting maybe 75% of the performance of C++, not
50%.  Simple hand optimization (CSE, loop unrolling) doesn't affect
the speed significantly.  Apparently, the ocamlopt-generated code
offers less instruction-level parallelism than the g++-generated code
for the float computations.  Still, I haven't really understood where
the factor of 2 comes from.  

Thanks again for an interesting benchmark,

- Xavier Leroy

* Re: [Caml-list] speed
  2003-01-03 13:32 ` Xavier Leroy
  2003-01-02 17:52   ` Chet Murthy
  2003-01-02 17:53   ` Coyote Gulch test in Caml (was Re: [Caml-list] speed ) Chet Murthy
@ 2003-01-05  1:13   ` Brian Hurt
  2003-01-05  1:48     ` Michael Vanier
  2 siblings, 1 reply; 29+ messages in thread
From: Brian Hurt @ 2003-01-05  1:13 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: onlyclimb, caml-list

Woo hoo.  Language advocacy with benchmarks again.

Feel free to replace this whole post with a comment about "lies, damned 
lies, and cross language benchmarks".  It amounts to the same thing.

On Fri, 3 Jan 2003, Xavier Leroy wrote:

> > Is it normal that my ocaml program is only 2 times faster than the java
> > counterpart? (using the same method and compiled into native; jdk is 1.4.1)
> 
> You know, many compiler researchers would kill their whole families to
> get speedups by a factor of 2 :-)
> 
> James Gosling gave a talk at INRIA recently where he repeated the
> party line that JDK 1.4 runs as fast, or even faster, than C++.

Quibble #1: *what* C++?  Most of the time, when I see C++ benchmarked,
what's really being benchmarked is C compiled with a C++ compiler, or at
most C with classes.  My experience with C++ tells me that if you actually
use the features of C++ (RTTI, templates, STL, exceptions, operator
overloading, etc.), the code you produce is often much *slower* than Java.
With a language as feature rich/bloated as C++, which subset of the
language you use makes a huge difference in your performance.  Ocaml has
the same problem in a lot of ways.

Quibble #2: define "equivalent program".

> So, by transitivity, you're implying that OCaml is twice as fast as C++.
> Yippee! 
> 
> More seriously: Java is nowhere as fast as a good C++ compiler (see
> e.g. http://www.coyotegulch.com/reviews/almabench.html for an
> independent, cross-language benchmark in numerical computing),

I note the coyote gulch benchmark shows IBM's Java to be more-or-less on
par with GCC 3.2.  I note, btw, that GCC 3.2 is significantly better at
optimization than GCC 2.9x, producing code about 10% faster on average
IIRC according to the GCC maintainers themselves.  Which tells me that
IBM's Java *is* better than GCC 2.9x.  Which is still the most commonly
used compiler on Linux systems.  Ditto for Windows.  My own experience and
tests show me that MS VC++ 6.0 is no better than, and in many cases worse
than, GCC 2.9x for optimization.

> but it's not that slow either.  A factor of 2 slower than ocamlopt
> sounds broadly reasonable, especially if the program doesn't stress
> the GC too much.  Bagley's shootout (http://www.bagley.org/~doug/shootout/)
> seems to suggest a larger factor (JDK 1.3 slightly slower than OCaml
> bytecode), but his figures may be lowered by Java's slow start-up times.

Startup costs dominate in Bagley's shootout.  Look at matrix
multiplication: the fastest tests (C, C++, and Ocaml) are running in
70-110 milliseconds.  Most timers are accurate only to ~10 milliseconds,
which means the time for the C program to run could be anything from
60 milliseconds to 80 milliseconds, for an error of +/-14.3%.
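
The +/-14.3% figure is the timer granularity divided by the measured
time: 10 ms out of 70 ms.  A small sketch of that arithmetic, assuming a
granularity of exactly 10 ms (the function name is illustrative only):

```python
def relative_timer_error(measured_ms, granularity_ms=10.0):
    """Worst-case relative error when a timer only resolves granularity_ms."""
    return granularity_ms / measured_ms


if __name__ == "__main__":
    # 70 and 110 ms are the shootout timings; 10 s is a hypothetical long run.
    for measured in (70.0, 110.0, 10_000.0):
        low, high = measured - 10.0, measured + 10.0
        err = relative_timer_error(measured)
        print(f"{measured:8.0f} ms: between {low:.0f} and {high:.0f} ms "
              f"(+/-{err:.1%})")
```

The same 10 ms uncertainty shrinks to a fraction of a percent once the
measured run lasts several seconds, which is why longer runs make for
more trustworthy timings.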

Java has huge start up costs.  First off, you have the JIT.  Then, there
is a time delay before hotspot kicks in and actually starts optimizing the
code to any significant extent.  Notice that the pro-Java benchmarks run
the code to be benchmarked a few thousands or tens of thousands of times 
before starting the timer, so that the hotspot optimizer has already been 
over the code a couple of times.  Or at least once, to bypass JIT time.   
Is this a legitimate tactic?  Lies, damned lies, and cross-language 
benchmarks.
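
For reference, the warm-up tactic described above has roughly the
following shape.  This is a sketch of the harness structure only, not
anyone's actual benchmark: `time_with_warmup` and `workload` are
hypothetical names, and a runtime without a JIT will show little
difference between the two measurements.

```python
import time


def time_with_warmup(fn, warmup_runs, timed_runs):
    """Call fn warmup_runs times untimed, then return mean seconds per timed call."""
    for _ in range(warmup_runs):
        fn()  # untimed: gives a JIT (if there is one) a chance to optimize
    start = time.perf_counter()
    for _ in range(timed_runs):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed / timed_runs


def workload():
    # Stand-in for the code under test (hypothetical).
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    cold = time_with_warmup(workload, warmup_runs=0, timed_runs=1)
    warm = time_with_warmup(workload, warmup_runs=1000, timed_runs=100)
    print(f"first call: {cold:.6f}s, after warm-up: {warm:.6f}s per call")
```

Whether the "cold" or the "warm" number is the honest one depends on
whether the program being modelled is short-lived or long-running.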

Note that I can also claim, with a straight face, that Ocaml is 5x 
*slower* than Java.  Take a look at Bagley's shootout on matrix 
multiplication, comparing byte-code interpreted Java with byte-code 
interpreted Ocaml.  Which is a much more apples to apples comparison.

Then there is the question of *future* performance of the languages.  In 
the pro-Java camp, I direct your attention to HP's Dynamo project:
http://www.arstechnica.com/reviews/1q00/dynamo/dynamo-1.html
http://www.hpl.hp.com/cambridge/projects/Dynamo/
which showed that a virtual PA-RISC emulator could run the code up to 20%
faster than running the same code native.  In the pro-Ocaml camp, Caml's
innate ease of reasoning about code opens up, I think, a much larger array
of potential optimizations for the compiler.

Of course, Java, Ocaml, and C++ all pale in comparison to the performance 
of hand-tuned assembly language.  Ergo, anyone who is using performance of 
the generated code as the primary reason for picking a language should, by 
all logic, be coding in assembly language.

Note that I, personally, think that performance should be the last reason 
used to pick a language.  Things like correctness of the code, available 
libraries and environments, and existing talents and skills of the 
workforce, should instead take precedence.

Brian


* Re: [Caml-list] speed
  2003-01-05  1:13   ` [Caml-list] speed Brian Hurt
@ 2003-01-05  1:48     ` Michael Vanier
  0 siblings, 0 replies; 29+ messages in thread
From: Michael Vanier @ 2003-01-05  1:48 UTC (permalink / raw)
  To: brian.hurt; +Cc: xavier.leroy, onlyclimb, caml-list


> Date: Sat, 4 Jan 2003 19:13:11 -0600 (CST)
> From: Brian Hurt <brian.hurt@qlogic.com>
> 
> Startup costs dominate in Bagley's shootout.  Look at matrix
> multiplication: the fastest tests (C, C++, and Ocaml) are running in
> 70-110 milliseconds.  Most timers are accurate only to ~10 milliseconds,
> which means the time for the C program to run could be anything from
> 60 milliseconds to 80 milliseconds, for an error of +/-14.3%.
> 
> Java has huge start up costs.  First off, you have the JIT.  Then, there
> is a time delay before hotspot kicks in and actually starts optimizing the
> code to any significant extent.  Notice that the pro-Java benchmarks run
> the code to be benchmarked a few thousands or tens of thousands of times 
> before starting the timer, so that the hotspot optimizer has already been 
> over the code a couple of times.  Or at least once, to bypass JIT time.   
> Is this a legitimate tactic?  Lies, damned lies, and cross-language 
> benchmarks.

I think it is a legitimate tactic.  If your code can run in 100
milliseconds, I couldn't care less about performance.  I want high performance
for programs that are going to run for hours, days, or weeks.  For these,
startup costs should hardly matter.
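
To make that concrete, here is a small sketch of how quickly a fixed
startup cost fades as the run gets longer.  The 0.5 s startup figure is
made up for illustration, not a measurement of any JVM.

```python
def startup_share(startup_s, run_s):
    """Fraction of total wall-clock time spent in startup."""
    return startup_s / (startup_s + run_s)


if __name__ == "__main__":
    startup = 0.5  # hypothetical startup + JIT warm-up cost, in seconds
    for run in (0.1, 60.0, 3600.0):
        share = startup_share(startup, run)
        print(f"run of {run:7.1f}s: startup is {share:.2%} of the total")
```

With these assumed numbers startup dominates a 100 ms run but is lost in
the noise for an hour-long one.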


> Note that I, personally, think that performance should be the last reason 
> used to pick a language.  Things like correctness of the code, available 
> libraries and environments, and existing talents and skills of the 
> workforce, should instead take precedence.
> 
> Brian
> 

True, but it depends a lot on the application.  If you're doing heavy
graphics or big simulations, you simply can't ignore performance.

Mike

* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-04 18:31       ` Xavier Leroy
@ 2003-01-18 22:49         ` Oleg
  2003-01-18 23:50           ` Shawn Wagner
                             ` (3 more replies)
  0 siblings, 4 replies; 29+ messages in thread
From: Oleg @ 2003-01-18 22:49 UTC (permalink / raw)
  To: Xavier Leroy, caml-list

On Saturday 04 January 2003 01:31 pm, Xavier Leroy wrote:
> Apparently, the ocamlopt-generated code
> offers less instruction-level parallelism than the g++-generated code
> for the float computations.  Still, I haven't really understood where
> the factor of 2 comes from.  

It's been a couple of weeks. I'm wondering if you got any new insights into 
this?

Just a wild guess: the code contains calls to "sin" and "cos" on the same
value. Perhaps GCC manages to optimize those into one call to "sincos".

* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-18 22:49         ` Oleg
@ 2003-01-18 23:50           ` Shawn Wagner
  2003-01-20 21:23             ` David Chase
  2003-01-19 10:33           ` Siegfried Gonzi
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 29+ messages in thread
From: Shawn Wagner @ 2003-01-18 23:50 UTC (permalink / raw)
  To: caml-list

On Sat, Jan 18, 2003 at 05:49:47PM -0500, Oleg wrote:
> On Saturday 04 January 2003 01:31 pm, Xavier Leroy wrote:
> > Apparently, the ocamlopt-generated code
> > offers less instruction-level parallelism than the g++-generated code
> > for the float computations.  Still, I haven't really understood where
> > the factor of 2 comes from.  
> 
> It's been a couple of weeks. I'm wondering if you got any new insights into 
> this?
> 
> Just a wild guess: the code contains calls to "sin" and "cos" on the same
> value. Perhaps GCC manages to optimize those into one call to "sincos".

It doesn't. I tried making a C++ version that does when I was fooling around
with it. Didn't really help. The single greatest speed increase I got (Which
did something like cut the runtime in half) was -ffast-math, which cuts out
the trig function calls in favor of direct use of the proper x86
instructions. But the inlined-sincos (__sincos()) in glibc causes a segfault
on my athlon when I tried using it. :P

Something like gcc's -ffast-math for ocamlopt would be nice, but improving
the scheduler is probably of more general use, making it able to target code
for specific CPUs like gcc does with good results. Targeting athlon instead
of i386 cut the almabench time by 30 seconds, for example.

I don't know anything about the code-generation bits of ocamlopt, though, so
I have no idea how big of a project that would be.

-- 
Shawn Wagner
shawnw@speakeasy.org

* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-18 22:49         ` Oleg
  2003-01-18 23:50           ` Shawn Wagner
@ 2003-01-19 10:33           ` Siegfried Gonzi
  2003-01-19 10:34           ` Siegfried Gonzi
  2003-01-21  9:56           ` [Caml-list] Re: Coyote Gulch test in Caml Xavier Leroy
  3 siblings, 0 replies; 29+ messages in thread
From: Siegfried Gonzi @ 2003-01-19 10:33 UTC (permalink / raw)
  To: caml-list

Oleg wrote:

>On Saturday 04 January 2003 01:31 pm, Xavier Leroy wrote:
>
>>Apparently, the ocamlopt-generated code
>>offers less instruction-level parallelism than the g++-generated code
>>for the float computations.  Still, I haven't really understood where
>>the factor of 2 comes from.  
>>
>
>It's been a couple of weeks. I'm wondering if you got any new insights into 
>this?
>
I am wondering whether they analyzed the Bigloo (Scheme) results:

[according to Manuel based on code by S. Gonzi; see comp.lang.scheme]

          Compiler                                              usr+sys
-----------------------------------------------------------+---------------
ocamlopt -unsafe -noassert -inline 2:                           95.01s
bigloo -Obench -jvm (jdk1.3.1):                                 55.73s
java (jdk1.3.1):                                                52.53s
bigloo -Obench -copt "-ffast-math -fomit-frame-pointer -O3":    40.57s
gcc -ffast-math -fomit-frame-pointer -O3:                       38.37s

Btw: the Stalin compiler produces code (note: common Scheme operators)
which runs even faster than the C++ version.


S. Gonzi



* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-18 22:49         ` Oleg
  2003-01-18 23:50           ` Shawn Wagner
  2003-01-19 10:33           ` Siegfried Gonzi
@ 2003-01-19 10:34           ` Siegfried Gonzi
  2003-01-21  9:56           ` [Caml-list] Re: Coyote Gulch test in Caml Xavier Leroy
  3 siblings, 0 replies; 29+ messages in thread
From: Siegfried Gonzi @ 2003-01-19 10:34 UTC (permalink / raw)
  To: caml-list

Oleg wrote:

>On Saturday 04 January 2003 01:31 pm, Xavier Leroy wrote:
>
>>Apparently, the ocamlopt-generated code
>>offers less instruction-level parallelism than the g++-generated code
>>for the float computations.  Still, I haven't really understood where
>>the factor of 2 comes from.  
>>
>
>It's been a couple of weeks. I'm wondering if you got any new insights into 
>this?
>
I am wondering whether they did analysize the Bigloo (Scheme) results:

[according to Manuel based on code by S. Gonzi; see comp.lang.scheme]

          Compiler                                              usr+usr
-----------------------------------------------------------+---------------
ocamlopt -unsafe -noassert -inline 2:                           95.01s
bigloo -Obench -jvm (jdk1.3.1):                                 55.73s
java (jdk1.3.1):                                                52.53s
bigloo -Obench -copt "-ffast-math -fomit-frame-pointer -O3":    40.57s
gcc -ffast-math -fomit-frame-pointer -O3:                       38.37s

Btw: the Stalin compiler produces code (note: common Scheme operators) 
which runs even faster than the C++ version.


S. Gonzi



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-18 23:50           ` Shawn Wagner
@ 2003-01-20 21:23             ` David Chase
  2003-01-20 21:39               ` Nickolay Semyonov-Kolchin
  0 siblings, 1 reply; 29+ messages in thread
From: David Chase @ 2003-01-20 21:23 UTC (permalink / raw)
  To: caml-list

At 03:50 PM 1/18/2003 -0800, Shawn Wagner wrote:
>On Sat, Jan 18, 2003 at 05:49:47PM -0500, Oleg wrote:
>> On Saturday 04 January 2003 01:31 pm, Xavier Leroy wrote:
>> > Apparently, the ocamlopt-generated code
>> > offers less instruction-level parallelism than the g++-generated code
>> > for the float computations.  Still, I haven't really understood where
>> > the factor of 2 comes from.  
>
>> Just as a wild guess: the code contains calls to "sin" and "cos" on the same 
>> value. Perhaps GCC manages to optimize those into one call to "sincos"
>
>It doesn't. I tried making a C++ version that does when I was fooling around
>with it. Didn't really help. The single greatest speed increase I got (which
>did something like cut the runtime in half) was -ffast-math, which cuts out
>the trig function calls in favor of direct use of the proper x86
>instructions. But the inlined-sincos (__sincos()) in glibc causes a segfault
>on my athlon when I tried using it. :P

Just a silly question, but if you want sin and cos to go faster,
how much accuracy are you willing to trade away for improved
performance?  Just for example, by using the Pentium instructions,
you reduce the number of (accurate) significant bits in the result
from 53 (IEEE double) to 13 (for some inputs between zero and 2*PI).
(If you are using 64-bit mantissas, the worst case is only 4 bits of
accuracy.)

I've noticed over the years that people focus on speed over many other
things, usually because they can measure it.  Well, we can measure
accuracy of transcendental functions, too, so I thought I would
ask the question.  How much is enough for your application?  Of the
languages being benchmarked, which one has the most accurate
transcendental functions?  Is this less important than speed?
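
One way to put a number on the accuracy question is to count the bits on
which an approximation agrees with a reference.  The OCaml sketch below does
this for a hypothetical fast_sin (a degree-7 Taylor polynomial, chosen only
for illustration and not taken from any of the benchmarked systems).  It
takes the platform's own sin as the reference, which is itself an assumption,
since the accuracy of that function is part of what is being asked.

```ocaml
(* Bits of agreement between an approximation and a reference value:
   -log2 of the relative error, capped at the 53 bits of an IEEE double. *)
let accurate_bits ~reference ~approx =
  if approx = reference then 53.0
  else
    let rel = abs_float ((approx -. reference) /. reference) in
    min 53.0 (-. (log rel /. log 2.0))

(* Hypothetical "fast" sine: x - x^3/3! + x^5/5! - x^7/7! in Horner form. *)
let fast_sin x =
  let x2 = x *. x in
  x *. (1.0 -. x2 /. 6.0 *. (1.0 -. x2 /. 20.0 *. (1.0 -. x2 /. 42.0)))

(* Worst case over evenly spaced sample points in (0, hi]. *)
let worst_bits ~hi ~samples =
  let worst = ref 53.0 in
  for i = 1 to samples do
    let x = hi *. float_of_int i /. float_of_int samples in
    worst :=
      min !worst (accurate_bits ~reference:(sin x) ~approx:(fast_sin x))
  done;
  !worst

let () =
  Printf.printf "worst case on (0, pi/4]: %.1f accurate bits\n"
    (worst_bits ~hi:(atan 1.0) ~samples:1000)
```

On (0, pi/4] this polynomial keeps roughly 21 of the 53 bits, and the same
harness can be pointed at any other pair of implementations.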

David Chase


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-20 21:23             ` David Chase
@ 2003-01-20 21:39               ` Nickolay Semyonov-Kolchin
  2003-01-21  0:54                 ` Brian Hurt
  2003-01-21 13:09                 ` David Chase
  0 siblings, 2 replies; 29+ messages in thread
From: Nickolay Semyonov-Kolchin @ 2003-01-20 21:39 UTC (permalink / raw)
  To: David Chase; +Cc: caml-list

On Tuesday 21 January 2003 02:23, David Chase wrote:
>
> I've noticed over the years that people focus on speed over many other
> things, usually because they can measure it.  Well, we can measure
> accuracy of transcendental functions, too, so I thought I would
> ask the question.  How much is enough for your application?  Of the
> languages being benchmarked, which one has the most accurate
> transcendental functions?  Is this less important than speed?
>

Speed and accuracy are different things. Matlab class software needs accuracy, 
most computer games need speed. This is the reason for the "-ffast-math" key in 
gcc. Ocaml lacks such a key, and always produces inefficient floating-point 
code.

Nickolay


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-20 21:39               ` Nickolay Semyonov-Kolchin
@ 2003-01-21  0:54                 ` Brian Hurt
  2003-01-21 13:09                 ` David Chase
  1 sibling, 0 replies; 29+ messages in thread
From: Brian Hurt @ 2003-01-21  0:54 UTC (permalink / raw)
  To: Nickolay Semyonov-Kolchin; +Cc: David Chase, caml-list

On Tue, 21 Jan 2003, Nickolay Semyonov-Kolchin wrote:

> On Tuesday 21 January 2003 02:23, David Chase wrote:
> 
> Speed and accuracy are different things. Matlab class software needs
> accuracy, most computer games need speed. This is the reason for the
> "-ffast-math" key in gcc. Ocaml lacks such a key, and always produces
> inefficient floating-point code.
> 

From gcc's info:

`-ffast-math'
     This option allows GCC to violate some ANSI or IEEE rules and/or
     specifications in the interest of optimizing code for speed.  For
     example, it allows the compiler to assume arguments to the `sqrt'
     function are non-negative numbers and that no floating-point values
     are NaNs.

     This option should never be turned on by any `-O' option since it
     can result in incorrect output for programs which depend on an
     exact implementation of IEEE or ANSI rules/specifications for math
     functions.

Which raises a couple of questions.  The first question is whether 
-ffast-math mainly violates ANSI or IEEE rules.  If ANSI, we're OK: we 
just define the Ocaml rules so we don't have to violate them.

But then this brings up the issue of conformity vs. performance.  For
example, the x86 has its 80-bit FP registers in 8087-legacy mode, but
64-bit registers if you're using SSE2.  And PowerPC and PA-RISC both have
extended precision fused multiply-adds (that keep higher precision, i.e.
don't round, between the multiply and the adds).  For that matter, could a 
"conforming" implementation of Ocaml use the 32-bit single precision SSE-1 
registers?
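
To make the fused multiply-add point concrete, here is a small OCaml sketch
in which the fused and unfused forms give different answers.  It assumes an
OCaml whose standard library provides Float.fma and strict 64-bit double
arithmetic (as with SSE2); the values of a and c are picked by hand for the
illustration.

```ocaml
(* Assumes Float.fma is available and doubles are rounded to 64 bits
   after every operation (no 80-bit x87 intermediates). *)

let a = 1.0 +. ldexp 1.0 (-27)   (* 1 + 2^-27 *)
let c = 1.0 +. ldexp 1.0 (-26)   (* 1 + 2^-26 *)

(* Exactly, a*a = 1 + 2^-26 + 2^-54.  Rounding the product to a double
   drops the 2^-54 term.  Sys.opaque_identity keeps the compiler from
   fusing the multiply and the subtraction on back ends that would. *)
let unfused = Sys.opaque_identity (a *. a) -. c

(* A fused multiply-add rounds only once, after the addition, so the
   2^-54 term survives. *)
let fused = Float.fma a a (-. c)

let () = Printf.printf "unfused = %h\nfused   = %h\n" unfused fused
```

Under those assumptions the unfused result is zero and the fused one is
2^-54, so a program's output can legitimately depend on which form the
compiler picks.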

http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
http://www.cs.berkeley.edu/~wkahan/Curmudge.pdf

As a general rule, I prefer the higher precision when it doesn't hurt 
enormously.  Basically, keeping things at 64-bit IEEE FP or better is a 
good idea; except in special cases, the speed advantage of going down to 
single precision isn't worth the loss.

Oh, and if we're talking about performance, memory behavior is much more 
important than precision of floating point primitives (so long as FP is in 
hardware).  A complex FP operation may take tens of clock cycles, but a 
cache miss now takes hundreds.  The most important paper about numeric 
performance of Ocaml might be this one:
http://www.cs.princeton.edu/~mjrg/fpca95.ps.Z
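
A rough way to see the memory effect from OCaml is to sum the same matrix
twice, once walking memory sequentially and once with a large stride.  This
is only a sketch: the size n and the flat-array layout are assumptions made
for the illustration, and the timings will vary with the machine and its
caches.

```ocaml
(* An n x n matrix stored row by row in one flat float array
   (float arrays are normally unboxed in OCaml). *)
let n = 1024
let m = Array.init (n * n) (fun i -> float_of_int (i land 7))

(* Inner loop touches consecutive elements. *)
let sum_row_major () =
  let s = ref 0.0 in
  for i = 0 to n - 1 do
    for j = 0 to n - 1 do
      s := !s +. m.(i * n + j)
    done
  done;
  !s

(* Inner loop jumps n elements at a time. *)
let sum_col_major () =
  let s = ref 0.0 in
  for j = 0 to n - 1 do
    for i = 0 to n - 1 do
      s := !s +. m.(i * n + j)
    done
  done;
  !s

(* Processor time used by one call to f. *)
let time f =
  let t0 = Sys.time () in
  let r = f () in
  (r, Sys.time () -. t0)

let () =
  let (s1, t1) = time sum_row_major in
  let (s2, t2) = time sum_col_major in
  Printf.printf "row-major:    sum %.0f in %.4fs\n" s1 t1;
  Printf.printf "column-major: sum %.0f in %.4fs\n" s2 t2
```

Both loops do exactly the same floating-point additions, so any difference
in the two timings comes from the access pattern alone.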

Brian


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [Caml-list] Re: Coyote Gulch test in Caml
  2003-01-18 22:49         ` Oleg
                             ` (2 preceding siblings ...)
  2003-01-19 10:34           ` Siegfried Gonzi
@ 2003-01-21  9:56           ` Xavier Leroy
  2003-01-21 15:57             ` Brian Hurt
  2003-01-27 16:58             ` Daniel Andor
  3 siblings, 2 replies; 29+ messages in thread
From: Xavier Leroy @ 2003-01-21  9:56 UTC (permalink / raw)
  To: Oleg; +Cc: caml-list

> On Saturday 04 January 2003 01:31 pm, Xavier Leroy wrote:
> > Apparently, the ocamlopt-generated code
> > offers less instruction-level parallelism than the g++-generated code
> > for the float computations.  Still, I haven't really understood where
> > the factor of 2 comes from.  

Oleg asks:

> It's been a couple of weeks. I'm wondering if you got any new insights into 
> this?

Yes: I'm just back from a trip to the US and had plenty of time to
kill during the transatlantic flights :-)

Apparently, one cause of inefficiency is excessive storing of
float results in memory temporaries.  The x86 is a weird beast: while
loading floats from memory is quite fast (almost as fast as using a
float already on the register stack), storing (the fstp instruction)
seems to be quite expensive.  

Fortunately, a small modification to the ocamlopt x86 code generator
can remove many of these stores to temporaries in the case of
the Almabench test.   With this modification, the OCaml code runs at
2/3 the speed of the code generated by g++ -O3, which is still not
great but more in line with previous numerical benchmarks.

I also played with a "-ffast-math" flag for ocamlopt, whereby some math
functions (sin, cos, sqrt, log) are directly expanded into x86
instructions.  With this, we get 85% of the performance of g++ -O3,
which isn't bad, and 2/3 of the performance of g++ -O3 -ffast-math.

At any rate, the changes above to the OCaml code generator need to be
tested more before possible inclusion in the next release.  Never
trust code that you wrote in an airplane, especially while fighting
for the armrest with an elderly central European lady who doesn't
understand any of the languages that you speak :-)

> Just as a wild guess: the code contains calls to "sin" and "cos" on the same 
> value. Perhaps GCC manages to optimize those into one call to "sincos"

No, gcc doesn't do that.  But perhaps the Intel compiler does.

David Chase warns:

> Just a silly question, but if you want sin and cos to go faster,
> how much accuracy are you willing to trade away for improved
> performance?  Just for example, by using the Pentium instructions,
> you reduce the number of (accurate) significant bits in the result
> from 53 (IEEE double) to 13 (for some inputs between zero and 2*PI).
> (If you are using 64-bit mantissas, the worst case is only 4 bits of
> accuracy.)

I didn't know that.  At any rate, the sin() and cos() functions from
the Linux libm probably suffer from this loss of precision too,
because they are of the following form:

cos:    fcos instruction
        if operand was in the [-2^64,2^64] range, return
        reduce operand modulo 2pi
        fcos instruction
        return

Hence, using fcos rather than calling cos() should give the same (not
very exact) result as long as the operand is in the [-2^64,2^64] range,
and return a nonsensical result otherwise.

Nickolay Semyonov-Kolchin asks:

> But then this brings up the issue of conformity vs. performance.  For
> example, the x86 has its 80-bit FP registers in 8087-legacy mode, but
> 64-bit registers if you're using SSE2.  And PowerPC and PA-RISC both have
> extended precision fused multiply-adds (that keep higher precision, i.e.
> don't round, between the multiply and the adds).

ocamlopt uses 80-bit floats for intermediate results on the x86, and
the multiply-add instruction on the PowerPC.  It is true that this can
cause the final results to differ from those of the bytecode compiler,
which uses strict 64-bit float arithmetic, but I believe this is
acceptable, both for the additional speed and because the result is
"more exact" from a numerical analysis standpoint.

> For that matter, could a 
> "conforming" implementation of Ocaml use the 32-bit single precision SSE-1 
> registers?

Using single-precision FP is questionable because of the important
loss in precision.  However, SSE-2 supports double precision
arithmetic on SSE registers, and that could be an adequate target for
ocamlopt-generated code.  I plan to experiment with this soon.

- Xavier Leroy

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-20 21:39               ` Nickolay Semyonov-Kolchin
  2003-01-21  0:54                 ` Brian Hurt
@ 2003-01-21 13:09                 ` David Chase
  2003-01-21 13:15                   ` Daniel Andor
  2003-01-21 20:26                   ` Nickolay Semyonov-Kolchin
  1 sibling, 2 replies; 29+ messages in thread
From: David Chase @ 2003-01-21 13:09 UTC (permalink / raw)
  To: caml-list

At 02:39 AM 1/21/2003 +0500, Nickolay Semyonov-Kolchin wrote:
>Speed and accuracy are different things. Matlab class software needs accuracy, 
>most computer games need speed. This is the reason for the "-ffast-math" key in 
>gcc. Ocaml lacks such a key, and always produces inefficient floating-point 
>code.

But how much accuracy do computer games need?  First-class implementations
of sin/cos in software are quite fast, indeed faster (in certain non-trivial
ranges) than the hardware itself.  If it happens that you could determine
how *little* accuracy you actually need, it could go faster yet :-).

David Chase




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-21 13:09                 ` David Chase
@ 2003-01-21 13:15                   ` Daniel Andor
  2003-01-21 20:26                   ` Nickolay Semyonov-Kolchin
  1 sibling, 0 replies; 29+ messages in thread
From: Daniel Andor @ 2003-01-21 13:15 UTC (permalink / raw)
  To: David Chase, caml-list

On Tuesday 21 January 2003 1:09 pm, David Chase wrote:
> At 02:39 AM 1/21/2003 +0500, Nickolay Semyonov-Kolchin wrote:
> >Speed and accuracy are different things. Matlab class software needs
> > accuracy, most computer games need speed. This is the reason for the
> > "-ffast-math" key in gcc. Ocaml lacks such a key, and always produces
> > inefficient floating-point code.
>
> But how much accuracy do computer games need?  First-class implementations
> of sin/cos in software are quite fast, indeed faster (in certain
> non-trivial ranges) than the hardware itself.  If it happens that you could
> determine how *little* accuracy you actually need, it could go faster yet
> :-).

If only a few significant figures are of interest to you then you could use 
lookup tables, no?  How would the lookup overhead compare with the actual 
computation time?
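
As a sketch of what such a table could look like in OCaml: the code below
precomputes sin at table_size points over one period and interpolates
linearly between neighbours.  The table size, the names, and the choice of
linear interpolation are all assumptions made for the illustration; it
measures the error against the library sin but says nothing about speed,
which would have to be timed on the target machine.

```ocaml
let two_pi = 8.0 *. atan 1.0

(* Hypothetical table size; larger tables are more accurate but
   take more cache space. *)
let table_size = 1024

(* One extra entry so interpolation can always read index i + 1. *)
let table =
  Array.init (table_size + 1)
    (fun i -> sin (two_pi *. float_of_int i /. float_of_int table_size))

let table_sin x =
  (* Reduce the argument into [0, 2*pi]. *)
  let r = mod_float x two_pi in
  let r = if r < 0.0 then r +. two_pi else r in
  let pos = r /. two_pi *. float_of_int table_size in
  let i = int_of_float pos in
  (* Guard against rounding pushing us onto the last entry. *)
  let i = if i >= table_size then table_size - 1 else i in
  let frac = pos -. float_of_int i in
  table.(i) +. frac *. (table.(i + 1) -. table.(i))

(* Largest absolute difference from the library sin on [-10, 10]. *)
let max_error samples =
  let worst = ref 0.0 in
  for k = 0 to samples do
    let x = -10.0 +. 20.0 *. float_of_int k /. float_of_int samples in
    worst := max !worst (abs_float (table_sin x -. sin x))
  done;
  !worst

let () = Printf.printf "max abs error: %g\n" (max_error 10_000)
```

With 1024 entries the interpolation error stays below 1e-5, i.e. four to
five significant decimal figures; doubling the table size cuts it by about
a factor of four.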

D



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Re: Coyote Gulch test in Caml
  2003-01-21  9:56           ` [Caml-list] Re: Coyote Gulch test in Caml Xavier Leroy
@ 2003-01-21 15:57             ` Brian Hurt
  2003-01-27 16:58             ` Daniel Andor
  1 sibling, 0 replies; 29+ messages in thread
From: Brian Hurt @ 2003-01-21 15:57 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Oleg, caml-list

On Tue, 21 Jan 2003, Xavier Leroy wrote:

> Nickolay Semyonov-Kolchin asks:
> 
> > But then this brings up the issue of conformity vs. performance.  For
> > example, the x86 has its 80-bit FP registers in 8087-legacy mode, but
> > 64-bit registers if you're using SSE2.  And PowerPC and PA-RISC both have
> > extended precision fused multiply-adds (that keep higher precision, i.e.
> > don't round, between the multiply and the adds).
> 
> > For that matter, could a 
> > "conforming" implementation of Ocaml use the 32-bit single precision SSE-1 
> > registers?
> 

I think these were me.  But thanks for answering them.

BTW, I'm willing to try out your code.  I'm writing some numerical code 
for Ocaml, and would like all the optimization I can get :-).

Brian




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: Coyote Gulch test in Caml (was Re: [Caml-list] speed )
  2003-01-21 13:09                 ` David Chase
  2003-01-21 13:15                   ` Daniel Andor
@ 2003-01-21 20:26                   ` Nickolay Semyonov-Kolchin
  1 sibling, 0 replies; 29+ messages in thread
From: Nickolay Semyonov-Kolchin @ 2003-01-21 20:26 UTC (permalink / raw)
  To: David Chase, caml-list

On Tuesday 21 January 2003 18:09, David Chase wrote:
> At 02:39 AM 1/21/2003 +0500, Nickolay Semyonov-Kolchin wrote:
> >Speed and accuracy are different things. Matlab class software needs
> > accuracy, most computer games need speed. This is the reason for the
> > "-ffast-math" key in gcc. Ocaml lacks such a key, and always produces
> > inefficient floating-point code.
>
> But how much accuracy do computer games need?  First-class implementations
> of sin/cos in software are quite fast, indeed faster (in certain
> non-trivial ranges) than the hardware itself.  If it happens that you could
> determine how *little* accuracy you actually need, it could go faster yet
> :-).
>

That depends. "Computer games" was just an example of software that doesn't 
require great accuracy.

Nickolay


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Re: Coyote Gulch test in Caml
  2003-01-21  9:56           ` [Caml-list] Re: Coyote Gulch test in Caml Xavier Leroy
  2003-01-21 15:57             ` Brian Hurt
@ 2003-01-27 16:58             ` Daniel Andor
  2003-01-28  8:27               ` Christian Lindig
  1 sibling, 1 reply; 29+ messages in thread
From: Daniel Andor @ 2003-01-27 16:58 UTC (permalink / raw)
  To: caml-list; +Cc: Brian Hurt, Xavier Leroy

On Tuesday 21 January 2003 9:56 am, Xavier Leroy wrote:
[snip various optimisations, including:]
> I also played with a "-ffast-math" flag for ocamlopt, whereby some math
> functions (sin, cos, sqrt, log) are directly expanded into x86
> instructions.  With this, we get 85% of the performance of g++ -O3,
> which isn't bad, and 2/3 of the performance of g++ -O3 -ffast-math.
>
> At any rate, the changes above to the OCaml code generator need to be
> tested more before possible inclusion in the next release.  Never
> trust code that you wrote in an airplane, especially while fighting
> for the armrest with an elderly central European lady who doesn't
> understand any of the languages that you speak :-)

Would it be possible to try some of these optimisation patches, if you have 
them? (such as float storing, inlining, SSE2?) 

I would be particularly interested to see how you include it in the ocamlopt 
compiler code, which I'm trying to understand...

Incidentally, what's a good way to learn the layout and functioning of the 
asmcomp compiler? (docs?)

Thanks,
Daniel.


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] Re: Coyote Gulch test in Caml
  2003-01-27 16:58             ` Daniel Andor
@ 2003-01-28  8:27               ` Christian Lindig
  0 siblings, 0 replies; 29+ messages in thread
From: Christian Lindig @ 2003-01-28  8:27 UTC (permalink / raw)
  To: Daniel Andor; +Cc: caml-list

On Mon, Jan 27, 2003 at 04:58:37PM +0000, Daniel Andor wrote:
> Incidentally, what's a good way to learn the layout and functioning of the 
> asmcomp compiler? (docs?)

I once gave a talk for an internal compiler series that looked at the
intermediate representation of the OCaml native code compiler. You can
find the slides "Data Structures in the O'Caml Back End" on my home page
under 'talks'. The slides contain mostly data structures rather than
explanations but might serve as a road map.

-- Christian

-- 
Christian Lindig         http://www.eecs.harvard.edu/~lindig/


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Caml-list] speed
@ 2003-01-07 16:03 isaac gouy
  0 siblings, 0 replies; 29+ messages in thread
From: isaac gouy @ 2003-01-07 16:03 UTC (permalink / raw)
  To: caml-list

>> Java has huge start up costs... 
>> Lies, damned lies, and cross-language benchmarks.

> I think it is a legitimate tactic.
> If your code can run in 100 milliseconds, I could 
> care less about performance. 

You're both right.
Benchmarks give invaluable insight into specific
problems.

If the specific problem involves long-running
processes then that's what we should be benchmarking:
it doesn't matter what happened in the first second.

If the specific problem involves starting a program
hundreds-of-times a day to do a small computation then
that's what we should be benchmarking: it doesn't
matter what the performance was after millions of
iterations.
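
A minimal OCaml sketch of keeping the two measurements apart: it times the
first call to a function separately from the average of the later calls.
The function names and the toy workload are made up for the illustration.
Note that Sys.time only sees processor time inside the running process, so
the start-up cost of the process itself (the part at issue for a JVM) would
have to be measured from outside, for example with the shell's time command.

```ocaml
(* Returns (seconds for the first call, average seconds per later call). *)
let first_and_steady ~iterations f =
  let t0 = Sys.time () in
  ignore (f ());
  let t1 = Sys.time () in
  for _i = 1 to iterations do
    ignore (f ())
  done;
  let t2 = Sys.time () in
  (t1 -. t0, (t2 -. t1) /. float_of_int iterations)

(* Toy workload, for illustration only. *)
let work () =
  let s = ref 0.0 in
  for i = 1 to 100_000 do
    s := !s +. sqrt (float_of_int i)
  done;
  !s

let () =
  let (first, steady) = first_and_steady ~iterations:100 work in
  Printf.printf "first call: %.6fs, steady state: %.6fs per call\n"
    first steady
```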

best wishes, Isaac


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2003-01-30  7:41 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-01-03 16:00 [Caml-list] speed onlyclimb
2003-01-03 11:38 ` [Caml-list] speed Clemens Hintze
2003-01-03 11:47 ` [Caml-list] speed Noel Welsh
2003-01-02 16:45   ` Chet Murthy
2003-01-03 13:32 ` Xavier Leroy
2003-01-02 17:52   ` Chet Murthy
2003-01-03 14:53     ` Sven Luther
2003-01-03 15:28       ` Erol Akarsu
2003-01-02 17:53   ` Coyote Gulch test in Caml (was Re: [Caml-list] speed ) Chet Murthy
2003-01-03 15:10     ` Shawn Wagner
2003-01-03 15:56       ` Oleg
2003-01-04 18:31       ` Xavier Leroy
2003-01-18 22:49         ` Oleg
2003-01-18 23:50           ` Shawn Wagner
2003-01-20 21:23             ` David Chase
2003-01-20 21:39               ` Nickolay Semyonov-Kolchin
2003-01-21  0:54                 ` Brian Hurt
2003-01-21 13:09                 ` David Chase
2003-01-21 13:15                   ` Daniel Andor
2003-01-21 20:26                   ` Nickolay Semyonov-Kolchin
2003-01-19 10:33           ` Siegfried Gonzi
2003-01-19 10:34           ` Siegfried Gonzi
2003-01-21  9:56           ` [Caml-list] Re: Coyote Gulch test in Caml Xavier Leroy
2003-01-21 15:57             ` Brian Hurt
2003-01-27 16:58             ` Daniel Andor
2003-01-28  8:27               ` Christian Lindig
2003-01-05  1:13   ` [Caml-list] speed Brian Hurt
2003-01-05  1:48     ` Michael Vanier
2003-01-07 16:03 isaac gouy
