caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Q: Efficient operations on complex numbers
@ 2004-01-23 10:24 Jan Kybic
  2004-01-24  0:31 ` Nuutti Kotivuori
  0 siblings, 1 reply; 5+ messages in thread
From: Jan Kybic @ 2004-01-23 10:24 UTC (permalink / raw)
  To: caml-list

Hello,
        I would appreciate if you could give me same help on the
following topic:

Q: How to make efficient computation with complex values in OCaml?

Subquestions:

Q1: Can the boxing of arguments be avoided?

Q2: My ocamlopt (from Linux Mandrake ocaml-3.06-12mdk.i586.rpm) does
    not like the '-ffast-math' switch. Why?

Q2: Is OCaml the right language for me, is there any other 
    you would recommend for me to consider?

Background
==========

I want to develop a fairly complex numerical algorithm
involving hierarchical Boundary Element Method for the electromagnetic
problem, i.e. working with surface meshes, huge data, complicated data
structures, and computational kernel using complex numbers.
 
My first attempt at coding it was in C++ but I did not achieve to
program the full version of the algorithm, since I was simply getting
lost in details of the implementation, handling of the data etc. 

My second attempt was to recode the algorithm in Haskell using ghc. This went
fine and I was quite enjoying it. However, when I started to test the
program on realistic size problems. I found that unwanted laziness
kept creeping in that I could neither detect nor control. The program
consumed a lot of memory and was quite slow, even though the
computational kernel was fast enough. I improved
things a little by adding `seq` all over the place but I gave up in
the end.

My third attempt was to code again in C++, this time using a
third-party library. It went fine while I was doing simple things but
then what I could not make the library do what I needed and with the
limited documentation I had, I was not able to extend it.

So here I am, trying to do the whole thing again, this time in OCaml,
because I hope it would allow me to design the data
structures I will need and at the same time the resulting algorithm
should be sufficiently efficiently compiled, run fast and not used too
much memory.

My problem
==========

So far, I have implemented a part of the computational kernel in
OCaml that calculates approximations using spherical harmonics. It
uses complex values. The Ocaml version is basically is straightforward
rewrite of the C++ version and is almost exactly twice as slow as the
C++ version. This is intriguing because the parts code that only uses real
arithmetic is about as fast as the corresponding C++ version. So why
is working with complex numbers so expensive in OCaml and what can be
done about it?

Experiments
===========

1) I compiled the following trivial function (using ocamlopt on a Linux
box):

    let calc_sqrt _ = 
      let rec calc_sqrt' x i = 
        if i<=0 then x 
        else calc_sqrt' (0.5 *. ( x +. 2.0 /. x )) (i-1)
      in calc_sqrt' 2.0 10000000

It takes 0.4s while the corresponding C++ version 

    double calc_sqrt()
    {
      double x=2.0 ; 
      for (int i=0;i<10000000;i++)  x=0.5 * (x+2.0/x) ;
      return x ;
    }

takes 0.34s. That's fair enough, very result good for OCaml indeed.

2) I did the same thing, only with complex numbers from the Complex
module:

    let calc_sqrt_c _ = 
      let two = { Complex.re=2.0 ; Complex.im=0.0 } and 
          half = { Complex.re=0.5 ; Complex.im=0.0 }
      in
      let rec calc_sqrt_c' x i = 
        if i<=0 then x 
        else calc_sqrt_c' (Complex.mul half 
               ( Complex.add x (Complex.div two x ))) 
                             (i-1)
      in calc_sqrt_c' two 10000000

and in C++:

        
    typedef std::complex<double> complex_type ;

    complex_type calc_sqrt_c()
    {
      complex_type two(2.0,0), half(0.5,0) ; 
      complex_type x=two ;
      for (int i=0;i<10000000;i++)  x=half * (x+two/x) ;
      return x ;
    }


Here however, the OCaml version takes 1.25s against 0.56s in C++,
i.e. a 2-fold slowdown.

3) I could improve the OCaml time to 0.98s by including the code from
   the Complex module in my source file. (Is there a way of achieving
   the same result without actually copying it?) On the other hand, it
   did not matter whether I used tuples or records to represent the
   complex numbers, whether I defined the 'add' and 'mul' functions
   locally or at the top level, and whether I asked the compiler to
   inline or not. 

Speculation
===========

I deduce, that the main factor contributing to the slowdown, is the
boxing and unboxing of values. Can this be optimized away? Would MLton
do it? 

Conclusions
===========

I can live with 2-fold slowdown with respect to C++ if this permits me
to actually write the program. However, it seems to be a shame that
OCaml can generate code on par with C++ in some cases (real
arithmetic) but not in others. So if there is a way to do it, I would
like to know.

Thank you,
                Jan


-- 
-------------------------------------------------------------------------
Jan Kybic <kybic@ieee.org>                  tel. +420 2 2435 7264
       or <kybic@fel.cvut.cz>,     http://cmp.felk.cvut.cz/~kybic

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Q: Efficient operations on complex numbers
  2004-01-23 10:24 [Caml-list] Q: Efficient operations on complex numbers Jan Kybic
@ 2004-01-24  0:31 ` Nuutti Kotivuori
  2004-01-24  9:36   ` Xavier Leroy
  0 siblings, 1 reply; 5+ messages in thread
From: Nuutti Kotivuori @ 2004-01-24  0:31 UTC (permalink / raw)
  To: Jan Kybic; +Cc: caml-list

Jan Kybic wrote:
> 3) I could improve the OCaml time to 0.98s by including the code
> from the Complex module in my source file. (Is there a way of
> achieving the same result without actually copying it?) On the other
> hand, it did not matter whether I used tuples or records to
> represent the complex numbers, whether I defined the 'add' and 'mul'
> functions locally or at the top level, and whether I asked the
> compiler to inline or not.

I am not well versed enough to answer your other questions properly,
but this made me think there might be something fishy going around.

OCaml doesn't inline functions from other compilation units right
now. So inlining would be the natural cause of getting the speed
increase.

The OCaml compiler inlines by default - and turning off that inlining
isn't all that obvious. Calling 'ocamlopt -inline 0' will not turn it
off; but it will only inline if it doesn't increase code size. If you
truly wish to turn inlining off, you need to say 'ocamlopt -inline
-1'. And in the opposite direction it's similar. There is no simple
inlining enabled rule, but you have to specify the aggressivity to
inline yourself, for example by saying 'ocamlopt -inline 100'.

But, I have no idea if these will really make a difference, or even if
I have totally missed something here. Hopefully someone who knows a
bit more about OCaml performance with floats will chime in to tell
about it.

-- Naked

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Q: Efficient operations on complex numbers
  2004-01-24  0:31 ` Nuutti Kotivuori
@ 2004-01-24  9:36   ` Xavier Leroy
  2004-01-24 15:40     ` Nuutti Kotivuori
  2004-01-26  9:00     ` Jan Kybic
  0 siblings, 2 replies; 5+ messages in thread
From: Xavier Leroy @ 2004-01-24  9:36 UTC (permalink / raw)
  To: Nuutti Kotivuori; +Cc: Jan Kybic, caml-list

> > 3) I could improve the OCaml time to 0.98s by including the code
> > from the Complex module in my source file. (Is there a way of
> > achieving the same result without actually copying it?) On the other
> > hand, it did not matter whether I used tuples or records to
> > represent the complex numbers, whether I defined the 'add' and 'mul'
> > functions locally or at the top level, and whether I asked the
> > compiler to inline or not.
> 
> I am not well versed enough to answer your other questions properly,
> but this made me think there might be something fishy going around.
> 
> OCaml doesn't inline functions from other compilation units right
> now.

Yes, it does, if the .cmx files for the other compilation units are
available.  So, I'm a bit surprised about what Jan Kybic observed.
What I would have expected is that referring to the Complex module or
copying it inside the source file shouldn't change the generated
code, at least if you stay at the default inlining level.

Whether a function from a module M is inlinable or not is determined
when A is compiled, as a function of the inlining level (the -inline option)
given for that compilation.  Hence, copying the source for Complex in
your module and compiling with high -inline could allow more inlining
than what you'd get by referring to the precompiled Complex and
setting a high -inline level.  But you should get the same results by
recompiling Complex with the same -inline level.

Moreover, using tuples instead of records to represent complex numbers
results in additional boxing and should therefore be somewhat slower.

Coming back to Jan's initial question:

> > So far, I have implemented a part of the computational kernel in
> > OCaml that calculates approximations using spherical harmonics. It
> > uses complex values. The Ocaml version is basically is straightforward
> > rewrite of the C++ version and is almost exactly twice as slow as the
> > C++ version. This is intriguing because the parts code that only uses real
> > arithmetic is about as fast as the corresponding C++ version. So why
> > is working with complex numbers so expensive in OCaml and what can be
> > done about it?

You're correct that boxing (of complex numbers) contributes
significantly to the overhead.  For float arithmetic, the OCaml
compiler has special local unboxing elimination rules that enables it
to avoid the overhead of boxing in many situations (but not always).
These unboxing optimizations don't extend to data structures such as
complex numbers.

This said, a slowdown by a factor of 2 due to boxing alone is a bit
surprising -- usually, it's more like 1.5.  There might be other
tricks that the C++ compiler plays and that OCaml doesn't.  I'll look
at your code more carefully when I have some more time.

- Xavier Leroy

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Q: Efficient operations on complex numbers
  2004-01-24  9:36   ` Xavier Leroy
@ 2004-01-24 15:40     ` Nuutti Kotivuori
  2004-01-26  9:00     ` Jan Kybic
  1 sibling, 0 replies; 5+ messages in thread
From: Nuutti Kotivuori @ 2004-01-24 15:40 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Jan Kybic, caml-list

Xavier Leroy wrote:
>> OCaml doesn't inline functions from other compilation units right
>> now.
>
> Yes, it does, if the .cmx files for the other compilation units are
> available.

Ah! Sorry!

Hmm. That might actually create some troubles for me, I'll see about
it. Have to make sure the .cmx files are available in some cases and
are not in some other.

Thanks for setting me straight!

-- Naked

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Caml-list] Q: Efficient operations on complex numbers
  2004-01-24  9:36   ` Xavier Leroy
  2004-01-24 15:40     ` Nuutti Kotivuori
@ 2004-01-26  9:00     ` Jan Kybic
  1 sibling, 0 replies; 5+ messages in thread
From: Jan Kybic @ 2004-01-26  9:00 UTC (permalink / raw)
  To: caml-list

Xavier Leroy:
> Yes, it does, if the .cmx files for the other compilation units are
> available.  So, I'm a bit surprised about what Jan Kybic observed.
> What I would have expected is that referring to the Complex module or
> copying it inside the source file shouldn't change the generated
> code, at least if you stay at the default inlining level.

Thank you for your reply. Yes, you are right, if I inline at
sufficiently high level (say 'ocamplopt -inline 10), then using the
Complex module is as fast as copying the code. 

As for the slowdown, it apparently depends on the compiler and/or
processor. On my system:

    Intel(R) Pentium(R) M processor 1400MHz
    Mandrake Linux release 9.1 (Bamboo) for i586
    gcc version 3.2.2 (Mandrake Linux 9.1 3.2.2-3mdk)
    The Objective Caml native-code compiler, version 3.07

I find that Ocaml program takes about 170~200% of the time of the C++
version, while on Christophe TROESTLER's machine (Intel(R) Pentium(R)
4 CPU 2.40GHz)), who kindly repeated my experiments, it was just
140%. So this is really a complex issue with many variables.

If I gain more insight into it, I will keep you posted.

Jan

-- 
-------------------------------------------------------------------------
Jan Kybic <kybic@ieee.org>                  tel. +420 2 2435 7264
       or <kybic@fel.cvut.cz>,     http://cmp.felk.cvut.cz/~kybic

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2004-01-26  9:00 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-23 10:24 [Caml-list] Q: Efficient operations on complex numbers Jan Kybic
2004-01-24  0:31 ` Nuutti Kotivuori
2004-01-24  9:36   ` Xavier Leroy
2004-01-24 15:40     ` Nuutti Kotivuori
2004-01-26  9:00     ` Jan Kybic

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).