caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Comparison of OCaml and MLton for numerics
@ 2007-05-31  5:50 Yuanchen Zhu
  2007-05-31  6:17 ` [Caml-list] " Jon Harrop
                   ` (3 more replies)
  0 siblings, 4 replies; 71+ messages in thread
From: Yuanchen Zhu @ 2007-05-31  5:50 UTC (permalink / raw)
  To: caml-list

Greetings!

I've been using Ocaml for implementing computer graphics algorithms.
One of the algorithms that I implemented was:

Li etal, "Compressing and Companding High Dynamic Range Images with
Subband Architectures"

For intellectual entertainment, I next converted the code to SML, and
got some interesting performance numbers on Ocaml and MLton. Since the
implementation is not a trivial micro benchmark, I thought I'd share
it with the group. I also have some questions regarding the future of
ocaml.

The most expensive part of the algorithm is image convolution. The
convolution kernel is separable, so the actual computation is done for
one horizontal/vertical strip at a time, i.e., it is 1-d convolution.

For rapid prototyping and easier maintenance, I used high-order
functions a lot in my code. For example, the horizontal convolution
routine looks something like the following:

let hconvolve kern (img:t) r =
  let ac y x s (kx, ky) = s +. ky *. getReflected img y (x + kx) 1.0 r in
    mapi (fun y x _ -> Array.fold_left (ac y x) 0.0 kern) img

Where the mapi function is basically a 2D version of Array.mapi.

The code is compiled using ocamlopt with arguments: -inline 4 -S
-ffast-math -unsafe. Next I translated the Caml code to Standard ML
and compiled using MLton with no additional arguments.

The performance numbers were as following:

Ocaml (unsafe) : user: 39.674s, real: 41.356s
MLton (safe):  user:  17.981s, real: 21.968s

Note that I didn't find an option that turns off array bound-checking
for MLton, so the MLton version is safe.

Clearly polymorphic function application in the inner-loop is killing
Ocaml. So I recoded the convolution function using for-loops. The code
is now much uglier:

let hconvolve kern (img:t) r =
  let sup = Array.length kern - 1 in
  let img' = create (size img) in
    for y = 0 to height img - 1 do
      for x = 0 to width img - 1 do
        let s = ref 0.0 in
          for i = 0 to sup do
            let (kx, ky) = kern.(i) in
              s := !s +. ky *. getReflected img y (x + kx) 1.0 r
          done;
          img'.(y).(x) <- !s;
      done
    done;
    img'

The new running time is:

Ocaml (unsafe) : user: 21.477s, real: 23.366s

which is much in line with MLton:

MLton (safe):  user:  17.981s, real: 21.968s

Although note that the MLton version has array-bound check enabled and
used the two-line high order function version of hconvolve.

So the moral of the story: To use Ocaml for numerically intensive
work, code in C style in the inner loops.

This brings me to the next question: is there any plan to implement
type specialization optimization for ocamlopt? For numerics, this is
really crucial if you want write both in an elegant functional style
and get good performance. Also, I remember reading somewhere that the
current code base of Ocaml is ill-suited for implementing this kind of
optimization. May I ask what exactly needs to be done to the current
code base in order to support that? I have some compiler-writing
background and this sounds like an interesting project to work in my
past time.


Regards,
Yuanchen


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  5:50 Comparison of OCaml and MLton for numerics Yuanchen Zhu
@ 2007-05-31  6:17 ` Jon Harrop
  2007-05-31  6:32   ` skaller
  2007-05-31  7:31   ` Yuanchen Zhu
  2007-05-31  7:11 ` Daniel Bünzli
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 71+ messages in thread
From: Jon Harrop @ 2007-05-31  6:17 UTC (permalink / raw)
  To: caml-list

On Thursday 31 May 2007 06:50:05 Yuanchen Zhu wrote:
> The performance numbers were as following:
>
> Ocaml (unsafe) : user: 39.674s, real: 41.356s
> MLton (safe):  user:  17.981s, real: 21.968s

You may be interested to know that there are no optimizing SML compilers for 
AMD64, which is a much better platform for numerical work:

  http://www.ffconsultancy.com/languages/ray_tracer/results.html

OCaml is over 60% faster on this benchmark.

Having said that, I notice that twice as many people are downloading the x86 
demos on my site compared to the x64.

> let hconvolve kern (img:t) r =
>   let sup = Array.length kern - 1 in
>   let img' = create (size img) in
>     for y = 0 to height img - 1 do
>       for x = 0 to width img - 1 do
>         let s = ref 0.0 in
>           for i = 0 to sup do
>             let (kx, ky) = kern.(i) in
>               s := !s +. ky *. getReflected img y (x + kx) 1.0 r

I can think of various ways to rearrange this that might help performance.

> The new running time is:
>
> Ocaml (unsafe) : user: 21.477s, real: 23.366s

What is the running time for safe OCaml?

> which is much in line with MLton:
>
> MLton (safe):  user:  17.981s, real: 21.968s

What platforms and architectures did you benchmark on? May we have the code to 
benchmark it ourselves?

> Although note that the MLton version has array-bound check enabled and
> used the two-line high order function version of hconvolve.

You might also try an FFT-based convolution if your filter is dense.

> So the moral of the story: To use Ocaml for numerically intensive
> work, code in C style in the inner loops.

Absolutely.

> This brings me to the next question: is there any plan to implement
> type specialization optimization for ocamlopt? For numerics, this is
> really crucial if you want write both in an elegant functional style
> and get good performance. Also, I remember reading somewhere that the
> current code base of Ocaml is ill-suited for implementing this kind of
> optimization. May I ask what exactly needs to be done to the current
> code base in order to support that? I have some compiler-writing
> background and this sounds like an interesting project to work in my
> past time.

Writing OCaml programs that generate OCaml programs is by far your best bet 
here. We use a replacement standard library that uses autogenerated code to 
eliminate boxing and perform unrolling and type specialization where 
possible.

As I can autogenerate my code, I would much rather the OCaml developers 
concentrated on things that I cannot get around, like the lack of a 32-bit 
float storage type and a more efficient internal representation of complex 
numbers.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  6:17 ` [Caml-list] " Jon Harrop
@ 2007-05-31  6:32   ` skaller
  2007-05-31  7:31   ` Yuanchen Zhu
  1 sibling, 0 replies; 71+ messages in thread
From: skaller @ 2007-05-31  6:32 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Thu, 2007-05-31 at 07:17 +0100, Jon Harrop wrote:
> On Thursday 31 May 2007 06:50:05 Yuanchen Zhu wrote:
> > The performance numbers were as following:
> >
> > Ocaml (unsafe) : user: 39.674s, real: 41.356s
> > MLton (safe):  user:  17.981s, real: 21.968s
> 
> You may be interested to know that there are no optimizing SML compilers for 
> AMD64, which is a much better platform for numerical work:
> 
>   http://www.ffconsultancy.com/languages/ray_tracer/results.html

I'm afraid your information is out of date. MLton runs on AMD64 now.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  5:50 Comparison of OCaml and MLton for numerics Yuanchen Zhu
  2007-05-31  6:17 ` [Caml-list] " Jon Harrop
@ 2007-05-31  7:11 ` Daniel Bünzli
  2007-05-31 15:15 ` Christophe Raffalli
  2007-05-31 15:16 ` Christophe Raffalli
  3 siblings, 0 replies; 71+ messages in thread
From: Daniel Bünzli @ 2007-05-31  7:11 UTC (permalink / raw)
  To: Yuanchen Zhu; +Cc: caml-list


Le 31 mai 07 à 07:50, Yuanchen Zhu a écrit :

> So the moral of the story: To use Ocaml for numerically intensive
> work, code in C style in the inner loops.

Look here http://caml.inria.fr/pub/old_caml_site/ocaml/numerical.html  
for an explanation.

Daniel


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  6:17 ` [Caml-list] " Jon Harrop
  2007-05-31  6:32   ` skaller
@ 2007-05-31  7:31   ` Yuanchen Zhu
  2007-05-31  9:08     ` Jon Harrop
  2007-05-31 12:12     ` skaller
  1 sibling, 2 replies; 71+ messages in thread
From: Yuanchen Zhu @ 2007-05-31  7:31 UTC (permalink / raw)
  To: caml-list

> > The new running time is:
> >
> > Ocaml (unsafe) : user: 21.477s, real: 23.366s
>
> What is the running time for safe OCaml?

Safe OCaml adds another 4.5s.

>
> > which is much in line with MLton:
> >
> > MLton (safe):  user:  17.981s, real: 21.968s
>
> What platforms and architectures did you benchmark on? May we have the code to
> benchmark it ourselves?

I am on a Pentium M 1.8GHz with 1GB mem, running Ubuntu 7.04, with
mlton_20061107 and ocaml 3.09.2. I've packed the code and uploaded it
at:

http://www.people.fas.harvard.edu/~yzhu/hdrRc.tar.bz2

Just wget it and unpack, go into the hdrRc directory, and run
./build-and-test.sh. Take a look at build-and-test.sh for the build
and test parameters.

I couldn't get Mlton to include the SMLNJ-lib's ArrayQSort module, so
I just copied the file containing it into the same directory. Also,
the Ocaml version parses its parameters (using the incredibly helpful
Arg module) so the executable requires some extra argument. The SML
version just hard code the parameters.

> You might also try an FFT-based convolution if your filter is dense.

It's true that I might be able to get better performance using FFT, or
even using a GPU based convolution, if I'm willing to invest a lot
more time and energy. But wouldn't it be nice to be able to write in
terse high-level style, but at the same time be able to compile it to
exectuable with performance close to C?

> > This brings me to the next question: is there any plan to implement
> > type specialization optimization for ocamlopt? For numerics, this is
> > really crucial if you want write both in an elegant functional style
> > and get good performance. Also, I remember reading somewhere that the
> > current code base of Ocaml is ill-suited for implementing this kind of
> > optimization. May I ask what exactly needs to be done to the current
> > code base in order to support that? I have some compiler-writing
> > background and this sounds like an interesting project to work in my
> > past time.
>
> Writing OCaml programs that generate OCaml programs is by far your best bet
> here. We use a replacement standard library that uses autogenerated code to
> eliminate boxing and perform unrolling and type specialization where
> possible.
>
> As I can autogenerate my code, I would much rather the OCaml developers
> concentrated on things that I cannot get around, like the lack of a 32-bit
> float storage type and a more efficient internal representation of complex
> numbers.

That sounds very interesting. Could you elaborate or give an example?


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  7:31   ` Yuanchen Zhu
@ 2007-05-31  9:08     ` Jon Harrop
  2007-05-31  9:22       ` Yuanchen Zhu
                         ` (2 more replies)
  2007-05-31 12:12     ` skaller
  1 sibling, 3 replies; 71+ messages in thread
From: Jon Harrop @ 2007-05-31  9:08 UTC (permalink / raw)
  To: caml-list

On Thursday 31 May 2007 08:31:04 Yuanchen Zhu wrote:
> http://www.people.fas.harvard.edu/~yzhu/hdrRc.tar.bz2

You are doing all this computation in your inner loop unnecessarily:

let getReflected (img:t) y x yr xr = 
  let w = width img and h = height img in
    if x >= 0 && x < w && y >= 0 && y < h then
      img.(y).(x)
    else
      let x = abs x and y = abs y in
      let xx = x/w and x' = x mod w and yy = y/h and y' = y mod h in
      let x',xr = if xx mod 2 = 0 then (x',1.0) else (w - 1 - x',xr) in
      let y',yr = if yy mod 2 = 0 then (y',1.0) else (h - 1 - y',yr) in
        img.(y').(x') *. xr *.yr

Hoist as much as you can from the inner loop and this program will run much 
faster in any language.

> > As I can autogenerate my code, I would much rather the OCaml developers
> > concentrated on things that I cannot get around, like the lack of a
> > 32-bit float storage type and a more efficient internal representation of
> > complex numbers.
>
> That sounds very interesting. Could you elaborate or give an example?

Complex numbers are unboxed in OCaml. If they were equivalent to a C struct 
then performance would be much better.

OCaml only handles 64-bit floats because there is no point in computing with 
32-bit floats any more. However, there is point in storing 32-bit floats as, 
when you have a lot of them, your program uses half as much heap and is twice 
as cache coherent. My ray tracer is an excellent example of a program that 
can benefit from this.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  9:08     ` Jon Harrop
@ 2007-05-31  9:22       ` Yuanchen Zhu
  2007-05-31 10:27         ` Jon Harrop
  2007-05-31  9:24       ` Yuanchen Zhu
  2007-05-31 10:25       ` Loup Vaillant
  2 siblings, 1 reply; 71+ messages in thread
From: Yuanchen Zhu @ 2007-05-31  9:22 UTC (permalink / raw)
  To: caml-list

>
> You are doing all this computation in your inner loop unnecessarily:
>
> let getReflected (img:t) y x yr xr =
>   let w = width img and h = height img in
>     if x >= 0 && x < w && y >= 0 && y < h then
>       img.(y).(x)
>     else
>       let x = abs x and y = abs y in
>       let xx = x/w and x' = x mod w and yy = y/h and y' = y mod h in
>       let x',xr = if xx mod 2 = 0 then (x',1.0) else (w - 1 - x',xr) in
>       let y',yr = if yy mod 2 = 0 then (y',1.0) else (h - 1 - y',yr) in
>         img.(y').(x') *. xr *.yr
>
> Hoist as much as you can from the inner loop and this program will run much
> faster in any language.

Yes, I am aware of the inefficiency here. Although because of branch
prediction on modern CPUs, the actual running time overhead would not
be that big.

My point, however, is that MLton and OCaml are being fed the same
code, and if OCaml performs specializing and proper inlining, it will
get almost twice its current performance.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  9:08     ` Jon Harrop
  2007-05-31  9:22       ` Yuanchen Zhu
@ 2007-05-31  9:24       ` Yuanchen Zhu
  2007-05-31 10:25       ` Loup Vaillant
  2 siblings, 0 replies; 71+ messages in thread
From: Yuanchen Zhu @ 2007-05-31  9:24 UTC (permalink / raw)
  To: caml-list

> > > As I can autogenerate my code, I would much rather the OCaml developers
> > > concentrated on things that I cannot get around, like the lack of a
> > > 32-bit float storage type and a more efficient internal representation of
> > > complex numbers.
> >
> > That sounds very interesting. Could you elaborate or give an example?
>
> Complex numbers are unboxed in OCaml. If they were equivalent to a C struct
> then performance would be much better.
>
> OCaml only handles 64-bit floats because there is no point in computing with
> 32-bit floats any more. However, there is point in storing 32-bit floats as,
> when you have a lot of them, your program uses half as much heap and is twice
> as cache coherent. My ray tracer is an excellent example of a program that
> can benefit from this.
>

Umm sorry. I mean the autogeneration part.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  9:08     ` Jon Harrop
  2007-05-31  9:22       ` Yuanchen Zhu
  2007-05-31  9:24       ` Yuanchen Zhu
@ 2007-05-31 10:25       ` Loup Vaillant
  2007-05-31 10:30         ` Jon Harrop
  2 siblings, 1 reply; 71+ messages in thread
From: Loup Vaillant @ 2007-05-31 10:25 UTC (permalink / raw)
  To: caml-list

2007/5/31, Jon Harrop <jon@ffconsultancy.com>:
>
> OCaml only handles 64-bit floats because there is no point in computing with
> 32-bit floats any more. However, there is point in storing 32-bit floats as,
> when you have a lot of them, your program uses half as much heap and is twice
> as cache coherent. My ray tracer is an excellent example of a program that
> can benefit from this.

Talking about lots of float, I suppose you meant arrays. In that case,
can't you use the bigarray module? Or does the overhead of the FFI
cancel out the benefit of the improved cache coherency?

(note: I know little about bigarrays, and nothing about your code)

Cheers,
Loup


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  9:22       ` Yuanchen Zhu
@ 2007-05-31 10:27         ` Jon Harrop
  2007-05-31 21:30           ` Alain Frisch
  0 siblings, 1 reply; 71+ messages in thread
From: Jon Harrop @ 2007-05-31 10:27 UTC (permalink / raw)
  To: caml-list

On Thursday 31 May 2007 10:22:34 Yuanchen Zhu wrote:
> Yes, I am aware of the inefficiency here. Although because of branch
> prediction on modern CPUs, the actual running time overhead would not
> be that big.

That is not true.

> My point, however, is that MLton and OCaml are being fed the same
> code, and if OCaml performs specializing and proper inlining, it will  
> get almost twice its current performance.

The OCaml compilers are designed to handle good code.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31 10:25       ` Loup Vaillant
@ 2007-05-31 10:30         ` Jon Harrop
  0 siblings, 0 replies; 71+ messages in thread
From: Jon Harrop @ 2007-05-31 10:30 UTC (permalink / raw)
  To: caml-list

On Thursday 31 May 2007 11:25:40 Loup Vaillant wrote:
> Talking about lots of float, I suppose you meant arrays.

Not necessarily big arrays. Look at the ray tracer, for example:

  http://www.ffconsultancy.com/languages/ray_tracer/

Lots of float^3 vectors, float * float^3 spheres and so on.

> In that case, 
> can't you use the bigarray module? Or does the overhead of the FFI
> cancel out the benefit of the improved cache coherency?

I couldn't get big arrays to give better performance here.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  7:31   ` Yuanchen Zhu
  2007-05-31  9:08     ` Jon Harrop
@ 2007-05-31 12:12     ` skaller
  1 sibling, 0 replies; 71+ messages in thread
From: skaller @ 2007-05-31 12:12 UTC (permalink / raw)
  To: Yuanchen Zhu; +Cc: caml-list, MLton developers

On Thu, 2007-05-31 at 15:31 +0800, Yuanchen Zhu wrote:
> > > The new running time is:
> > >
> > > Ocaml (unsafe) : user: 21.477s, real: 23.366s
> >
> > What is the running time for safe OCaml?
> 
> Safe OCaml adds another 4.5s.
> 
> >
> > > which is much in line with MLton:
> > >
> > > MLton (safe):  user:  17.981s, real: 21.968s

http://www.people.fas.harvard.edu/~yzhu/hdrRc.tar.bz2

Results on my box, amd64 single core 3200 Athlon, 1MG, Ubuntu 7.04:
remove the -align 8 from mlton, it crashes the experimental build,
remove -ffast-math from ocaml, this is not a valid option for 3.10:

MLton:        27.15
Unsafe Ocaml: 19.59
Safe Ocaml:   21.38

Note the mlton amd64 build is NOT optimised for machine
level performance (it's a bootstrap build being checked for
correctness).

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  5:50 Comparison of OCaml and MLton for numerics Yuanchen Zhu
  2007-05-31  6:17 ` [Caml-list] " Jon Harrop
  2007-05-31  7:11 ` Daniel Bünzli
@ 2007-05-31 15:15 ` Christophe Raffalli
  2007-05-31 15:23   ` Jon Harrop
  2007-05-31 15:16 ` Christophe Raffalli
  3 siblings, 1 reply; 71+ messages in thread
From: Christophe Raffalli @ 2007-05-31 15:15 UTC (permalink / raw)
  To: Doctor Jerri

Yuanchen Zhu a écrit :
>
>            let (kx, ky) = kern.(i) in

You should avoid array of tuples of float in OCaml, tuple of float are
(unfortunatly) not unboxed and array of tuple neither, so you have two
inderection (boxes) inside the array.
Either use array of records (you will have one box), one array of double
size or two separate arrays (no boxing at all). The difference should be
quite big (it would be interesting if you post the result).

One of the point of MLTon (if I remember well)  is to always unbox
floats  ... While in OCaml, you have to be carefull.

I really would like an "unboxed array" type in OCaml rather than a
special optimisation for floats. Then, the solution with an unboxed
array of records would be reasonnable.

Christophe Raffalli





^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31  5:50 Comparison of OCaml and MLton for numerics Yuanchen Zhu
                   ` (2 preceding siblings ...)
  2007-05-31 15:15 ` Christophe Raffalli
@ 2007-05-31 15:16 ` Christophe Raffalli
  3 siblings, 0 replies; 71+ messages in thread
From: Christophe Raffalli @ 2007-05-31 15:16 UTC (permalink / raw)
  To: caml-list

Yuanchen Zhu a écrit :
>
>            let (kx, ky) = kern.(i) in

You should avoid array of tuples of float in OCaml, tuple of float are
(unfortunatly) not unboxed and array of tuple neither, so you have two
inderection (boxes) inside the array.
Either use array of records (you will have one box), one array of double
size or two separate arrays (no boxing at all). The difference should be
quite big (it would be interesting if you post the result).

One of the point of MLTon (if I remember well)  is to always unbox
floats  ... While in OCaml, you have to be carefull.

I really would like an "unboxed array" type in OCaml rather than a
special optimisation for floats. Then, the solution with an unboxed
array of records would be reasonnable.

Christophe Raffalli






^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31 15:15 ` Christophe Raffalli
@ 2007-05-31 15:23   ` Jon Harrop
  2007-05-31 15:35     ` Christophe Raffalli
  0 siblings, 1 reply; 71+ messages in thread
From: Jon Harrop @ 2007-05-31 15:23 UTC (permalink / raw)
  To: caml-list

On Thursday 31 May 2007 16:15:50 Christophe Raffalli wrote:
> Either use array of records (you will have one box), one array of double
> size or two separate arrays (no boxing at all).

That would be good advice were kx and ky both floats but, alas, one is not. So 
using a record doesn't buy you anything (only all-float records are unboxed) 
and you cannot use a double length array. I tried two arrays and it was 
significantly faster.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31 15:23   ` Jon Harrop
@ 2007-05-31 15:35     ` Christophe Raffalli
       [not found]       ` <604682010705310923o5a1ee0eiee5ae697da9d3c60@mail.gmail.com>
  0 siblings, 1 reply; 71+ messages in thread
From: Christophe Raffalli @ 2007-05-31 15:35 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop a écrit :
> On Thursday 31 May 2007 16:15:50 Christophe Raffalli wrote:
>   
>> Either use array of records (you will have one box), one array of double
>> size or two separate arrays (no boxing at all).
>>     
>
> That would be good advice were kx and ky both floats but, alas, one is not. So 
> using a record doesn't buy you anything (only all-float records are unboxed) 
> and you cannot use a double length array. I tried two arrays and it was 
> significantly faster.
>
>   
Oh, I did not see that kx was an int ... So two records in the only 
solution (and it is also preferable for MLTon as I imagine that
array of tuple are not optimized ...)

Christophe Raffalli


^ permalink raw reply	[flat|nested] 71+ messages in thread

* [Caml-list] Comparison of OCaml and MLton for numerics
       [not found]       ` <604682010705310923o5a1ee0eiee5ae697da9d3c60@mail.gmail.com>
@ 2007-05-31 20:14         ` Stephen Weeks
  0 siblings, 0 replies; 71+ messages in thread
From: Stephen Weeks @ 2007-05-31 20:14 UTC (permalink / raw)
  To: caml-list

> Oh, I did not see that kx was an int ... So two records in the only
> solution (and it is also preferable for MLTon as I imagine that
> array of tuple are not optimized ...)

MLton does flatten arrays of tuples, depending on how they are used.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31 10:27         ` Jon Harrop
@ 2007-05-31 21:30           ` Alain Frisch
  2007-06-01  1:22             ` skaller
  0 siblings, 1 reply; 71+ messages in thread
From: Alain Frisch @ 2007-05-31 21:30 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop wrote:
>> My point, however, is that MLton and OCaml are being fed the same
>> code, and if OCaml performs specializing and proper inlining, it will  
>> get almost twice its current performance.
> 
> The OCaml compilers are designed to handle good code.

Could you elaborate? Do you mean that a code than would benefit from
inlining is not a good code?

 Alain


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-05-31 21:30           ` Alain Frisch
@ 2007-06-01  1:22             ` skaller
  2007-06-01  1:36               ` Erik de Castro Lopo
  2007-06-01  5:30               ` Alain Frisch
  0 siblings, 2 replies; 71+ messages in thread
From: skaller @ 2007-06-01  1:22 UTC (permalink / raw)
  To: Alain Frisch; +Cc: Jon Harrop, caml-list

On Thu, 2007-05-31 at 23:30 +0200, Alain Frisch wrote:
> Jon Harrop wrote:
> >> My point, however, is that MLton and OCaml are being fed the same
> >> code, and if OCaml performs specializing and proper inlining, it will  
> >> get almost twice its current performance.
> > 
> > The OCaml compilers are designed to handle good code.
> 
> Could you elaborate? Do you mean that a code than would benefit from
> inlining is not a good code?

A general comment may explain this: some systems specifically
provide performance which is readily computable. For example
in the design of STL all the functions provided are fast
with specified O() performance. Slower functions like
'List.nth'  are not provided because the speed of a program
is not evident in the syntax.

So what I believe Jon and Xavier mean here is that the
Ocaml compilers compile code down to stuff which is easily
predicted in terms of the input syntax. no magic like
invariant code motion: What You See is What You Get.

The idea is that this gives the programmer *control* over
performance. It may require more work, but the lack of
'magic' which can defeat manual optimisation attempts is seen
as a virtue.

Basically the code is seen as that: an encoding of an algorithm.
If you want it to run faster, change your encoding.

The opposite approach -- to add as much smarts to the optimiser
as possible -- can generate much better code in many circumstances,
but it requires much more knowledge of complex internals by the
programmer to change the generated encoding where the magic didn't
work so well -- and in turn this puts pressure on the compiler vendor
to improve the 'smartness' of their optimisation heuristics ..
simply because on one else has the expertise to do so.

Someone (as usual no URL sorry) wrote a paper roughly titled
'guaranteed optimisations' which is actually an interesting
perspective on this whole scenario.

The fact is, no programmer can possible handle the complex
recoding an automatic algorithm can, so there is always going
to be a tension between 'do it yourself' and 'automagical'
optimisation strategies. 

Ocaml seems to pick a good mix. CF: dypgen GLR parser,
old version: 95++% of all compile time. New version with
recoding of data structures etc is down to about 20--%
of compile time .. it's over an order of magnitude faster.

IMHO: whilst quite a lot is known about how to optimise
executable code .. almost nothing is understood about how
to optimise data structures (automatically I mean).

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  1:22             ` skaller
@ 2007-06-01  1:36               ` Erik de Castro Lopo
  2007-06-01  2:21                 ` skaller
  2007-06-01  5:30               ` Alain Frisch
  1 sibling, 1 reply; 71+ messages in thread
From: Erik de Castro Lopo @ 2007-06-01  1:36 UTC (permalink / raw)
  To: caml-list

skaller wrote:

> Someone (as usual no URL sorry) wrote a paper roughly titled
> 'guaranteed optimisations' which is actually an interesting
> perspective on this whole scenario.

I found a bunch of slides titled "The Guaranteed Optimization
Clause of the Macro-Writer's Bill of Rights":

   http://www.cs.indiana.edu/~chaynes/danfest/dyb.pdf

Erik
-- 
-----------------------------------------------------------------
Erik de Castro Lopo
-----------------------------------------------------------------
"OSX - because making unix useable was easier than fixing windows."
    -- from slashdot


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  1:36               ` Erik de Castro Lopo
@ 2007-06-01  2:21                 ` skaller
  2007-06-01  2:49                   ` Erick Tryzelaar
  0 siblings, 1 reply; 71+ messages in thread
From: skaller @ 2007-06-01  2:21 UTC (permalink / raw)
  To: Erik de Castro Lopo; +Cc: caml-list

On Fri, 2007-06-01 at 11:36 +1000, Erik de Castro Lopo wrote:
> skaller wrote:
> 
> > Someone (as usual no URL sorry) wrote a paper roughly titled
> > 'guaranteed optimisations' which is actually an interesting
> > perspective on this whole scenario.
> 
> I found a bunch of slides titled "The Guaranteed Optimization
> Clause of the Macro-Writer's Bill of Rights":
> 
>    http://www.cs.indiana.edu/~chaynes/danfest/dyb.pdf

The paper i think of was a master or PhD thesis..

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  2:21                 ` skaller
@ 2007-06-01  2:49                   ` Erick Tryzelaar
  2007-06-01  3:05                     ` skaller
  0 siblings, 1 reply; 71+ messages in thread
From: Erick Tryzelaar @ 2007-06-01  2:49 UTC (permalink / raw)
  To: skaller; +Cc: Erik de Castro Lopo, caml-list

skaller wrote:
> On Fri, 2007-06-01 at 11:36 +1000, Erik de Castro Lopo wrote:
>   
>> skaller wrote:
>>
>>     
>>> Someone (as usual no URL sorry) wrote a paper roughly titled
>>> 'guaranteed optimisations' which is actually an interesting
>>> perspective on this whole scenario.
>>>       
>> I found a bunch of slides titled "The Guaranteed Optimization
>> Clause of the Macro-Writer's Bill of Rights":
>>
>>    http://www.cs.indiana.edu/~chaynes/danfest/dyb.pdf
>>     
>
> The paper i think of was a master or PhD thesis..
>
>   

How about this one?

Guaranteed Optimization for Domain-Specific Programming
http://citeseer.ist.psu.edu/veldhuizen03guaranteed.html


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  2:49                   ` Erick Tryzelaar
@ 2007-06-01  3:05                     ` skaller
  0 siblings, 0 replies; 71+ messages in thread
From: skaller @ 2007-06-01  3:05 UTC (permalink / raw)
  To: Erick Tryzelaar; +Cc: caml-list

On Thu, 2007-05-31 at 19:49 -0700, Erick Tryzelaar wrote:
> skaller wrote:

> How about this one?
> 
> Guaranteed Optimization for Domain-Specific Programming
> http://citeseer.ist.psu.edu/veldhuizen03guaranteed.html

Yep, that's the one. Thanks.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  1:22             ` skaller
  2007-06-01  1:36               ` Erik de Castro Lopo
@ 2007-06-01  5:30               ` Alain Frisch
  2007-06-01  5:39                 ` Jon Harrop
                                   ` (3 more replies)
  1 sibling, 4 replies; 71+ messages in thread
From: Alain Frisch @ 2007-06-01  5:30 UTC (permalink / raw)
  To: skaller; +Cc: Jon Harrop, caml-list

skaller wrote:
> So what I believe Jon and Xavier mean here is that the
> Ocaml compilers compile code down to stuff which is easily
> predicted in terms of the input syntax. no magic like
> invariant code motion: What You See is What You Get.

I generally agree that automatic optimizations should not change
too much the behavior of programs unless they are very easy to
understand and predict; indeed we don't want to rely on subtle
properties of the code and see the performances degrade unexpectedly
when we change a tiny thing.

Having said that, I don't see how opposing a more effective inlining and
specialization strategy would let the programmers write better code.
Quite the contrary (manual inlining? come on...) in my opinion. These
optimizations are not so much about producing more efficient programs;
there're about letting the programmer write cleaner code. So maybe the
current Ocaml compiler expects good code as Jon says, but it forces us
to write not-so-good one :-/  A functional programmer has reasons to
become schizophrenic if he must ask himself questions such as "should I
make this piece of code a function or write it inline?" or "should I
restrict the type of this polymorphic function?" or "should I use
Array.iter or write a for loop?" all the time. MLton's strength is that
you don't have to pay the price for abstraction, i.e. cleaning up your
program (by factorizing it or making it more modular) does not degrade
performance. I have no experience with MLton, but I don't believe that
performances are much more difficult to predict than with OCaml (Stephen?).

  Alain


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  5:30               ` Alain Frisch
@ 2007-06-01  5:39                 ` Jon Harrop
  2007-06-01  6:36                   ` Yuanchen Zhu
  2007-06-01  8:09                 ` skaller
                                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 71+ messages in thread
From: Jon Harrop @ 2007-06-01  5:39 UTC (permalink / raw)
  To: Alain Frisch; +Cc: caml-list

On Friday 01 June 2007 06:30:51 you wrote:
> Having said that, I don't see how opposing a more effective inlining and
> specialization strategy would let the programmers write better code.

My impression is that this is unsolved in the context of OCaml. The compiler 
lets you fiddle with the inlining threshold but the results are 
unpredictable: inlining often worsens performance in symbolic code.

> Quite the contrary (manual inlining? come on...) in my opinion. These
> optimizations are not so much about producing more efficient programs;
> there're about letting the programmer write cleaner code. So maybe the
> current Ocaml compiler expects good code as Jon says, but it forces us
> to write not-so-good one :-/

I see no better alternative. In this case, I don't even think a well-written 
version would be significantly longer. Look at my ray tracer, the fast OCaml 
is as ugly as the fast C++ even though the C++ compiler does many of the 
optimizations that people have cited.

> A functional programmer has reasons to 
> become schizophrenic if he must ask himself questions such as "should I
> make this piece of code a function or write it inline?" or "should I
> restrict the type of this polymorphic function?" or "should I use
> Array.iter or write a for loop?" all the time.

I think you should do whatever is clearest if you ask yourself such questions. 
Only rewrite in a more efficient form if the profiler tells you that you 
must.

I agree that more optimization by the OCaml compiler might be nice (e.g. 
specializing higher-order functions over float arrays) but I do not believe 
it would have helped in this case.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  5:39                 ` Jon Harrop
@ 2007-06-01  6:36                   ` Yuanchen Zhu
  0 siblings, 0 replies; 71+ messages in thread
From: Yuanchen Zhu @ 2007-06-01  6:36 UTC (permalink / raw)
  To: Jon Harrop; +Cc: Alain Frisch, caml-list

> I agree that more optimization by the OCaml compiler might be nice (e.g.
> specializing higher-order functions over float arrays) but I do not believe
> it would have helped in this case.
>
Haven't the tests shown that specializing higher-order functions would
help A LOT in this case? Running time is almost halfed when the
high-order function is manually expanded using for-loops and reference
cells, approaching that of MLton.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  5:30               ` Alain Frisch
  2007-06-01  5:39                 ` Jon Harrop
@ 2007-06-01  8:09                 ` skaller
  2007-06-01  8:53                   ` Richard Jones
  2007-06-01 11:29                 ` Yaron Minsky
  2007-06-01 14:15                 ` Stephen Weeks
  3 siblings, 1 reply; 71+ messages in thread
From: skaller @ 2007-06-01  8:09 UTC (permalink / raw)
  To: Alain Frisch; +Cc: Jon Harrop, caml-list

On Fri, 2007-06-01 at 07:30 +0200, Alain Frisch wrote:
> skaller wrote:
> > So what I believe Jon and Xavier mean here is that the
> > Ocaml compilers compile code down to stuff which is easily
> > predicted in terms of the input syntax. no magic like
> > invariant code motion: What You See is What You Get.
> 
> I generally agree that automatic optimizations should not change
> too much the behavior of programs unless they are very easy to
> understand and predict; indeed we don't want to rely on subtle
> properties of the code and see the performances degrade unexpectedly
> when we change a tiny thing.
> 
> Having said that, I don't see how opposing a more effective inlining and
> specialization strategy would let the programmers write better code.

Clearly I'm not opposing anything: just making a point that there
is a good reason to have 'dumb' code: not that the reason is universal
or pervasive.

> Quite the contrary (manual inlining? come on...) in my opinion. 

Well actually, inlining is a good example to consider IMHO.

Good inlining algorithms are very hard. Felix relies *heavily*
on inlining to create optimisation opportunities via monomorphisation,
parameter replacement, and simply eliminating calling overheads.

But still, the algorithm is crude and I found it necessary
to allow:

	fun f ...           // maybe inline
	inline fun f 	    // try REALLY HARD to inline
	noinline fun f      // never inline


Whilst this isn't "hand inlining" it is a weak form of manual 
control over a difficult algorithm.

>  A functional programmer has reasons to
> become schizophrenic if he must ask himself questions such as "should I
> make this piece of code a function or write it inline?"

hehe .. obviously best to start off schiz, then there's no
problem becoming so! :)

>  or "should I
> restrict the type of this polymorphic function?" or "should I use
> Array.iter or write a for loop?" all the time. 

Or ..

> MLton's strength is that
> you don't have to pay the price for abstraction, i.e. cleaning up your
> program (by factorizing it or making it more modular) does not degrade
> performance. 

.. choice of functors vs type variables for polymorphism? Which all
SML/Ocaml programmers have .. the language is clearly Schizoid here:)

We need a good doctor (theorist) to get rid of this malady.



-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  8:09                 ` skaller
@ 2007-06-01  8:53                   ` Richard Jones
  2007-06-01  8:59                     ` Richard Jones
  2007-06-01 10:02                     ` skaller
  0 siblings, 2 replies; 71+ messages in thread
From: Richard Jones @ 2007-06-01  8:53 UTC (permalink / raw)
  To: skaller; +Cc: Alain Frisch, caml-list

On Fri, Jun 01, 2007 at 06:09:27PM +1000, skaller wrote:
> But still, the algorithm is crude and I found it necessary
> to allow:
> 
> 	fun f ...           // maybe inline
> 	inline fun f 	    // try REALLY HARD to inline
> 	noinline fun f      // never inline

Actually it's more useful to control the inlining of function when
they are applied, rather than when they are defined.  You probably
only want f to be inlined in a few known places, and not inlined the
rest of the time (unless f is extremely trivial).

In more general terms, it was found that turning off inlining in the
Linux kernel reduced the code size by 25%:

  http://lwn.net/Articles/166172/

I bet that actually improved performance too, but unfortunately in the
article above they don't measure that.

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  8:53                   ` Richard Jones
@ 2007-06-01  8:59                     ` Richard Jones
  2007-06-01  9:22                       ` Stephan Tolksdorf
  2007-06-01  9:32                       ` Stephan Tolksdorf
  2007-06-01 10:02                     ` skaller
  1 sibling, 2 replies; 71+ messages in thread
From: Richard Jones @ 2007-06-01  8:59 UTC (permalink / raw)
  To: caml-list

Another article on the same topic:

  http://lwn.net/Articles/82495/

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  8:59                     ` Richard Jones
@ 2007-06-01  9:22                       ` Stephan Tolksdorf
  2007-06-01  9:49                         ` Richard Jones
  2007-06-01  9:32                       ` Stephan Tolksdorf
  1 sibling, 1 reply; 71+ messages in thread
From: Stephan Tolksdorf @ 2007-06-01  9:22 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list

Richard Jones wrote:
> Another article on the same topic:
> 
>   http://lwn.net/Articles/82495/
> 

3% probably lies well in the error of margin. I find this comment much 
more informative and in accordance with my experiences:

http://lwn.net/Articles/167474/

Without inlining a lot of modern C++ code would be unusable slow. As 
Alain said, inlining allows you to use abstractions without having to 
pay the usual penalty for it.

Regards,
   Stephan


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  8:59                     ` Richard Jones
  2007-06-01  9:22                       ` Stephan Tolksdorf
@ 2007-06-01  9:32                       ` Stephan Tolksdorf
  1 sibling, 0 replies; 71+ messages in thread
From: Stephan Tolksdorf @ 2007-06-01  9:32 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list

Richard Jones wrote:
> Another article on the same topic:
> 
>   http://lwn.net/Articles/82495/
> 

3% probably lies well in the error of margin. I find this comment much
more informative and in accordance with my experiences:

http://lwn.net/Articles/167474/

Without inlining a lot of modern C++ code would be unusable slow. As
Alain said, inlining allows you to use abstractions without having to
pay the usual penalty for it.

Regards,
   Stephan


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  9:22                       ` Stephan Tolksdorf
@ 2007-06-01  9:49                         ` Richard Jones
  0 siblings, 0 replies; 71+ messages in thread
From: Richard Jones @ 2007-06-01  9:49 UTC (permalink / raw)
  To: Stephan Tolksdorf; +Cc: caml-list

On Fri, Jun 01, 2007 at 11:22:43AM +0200, Stephan Tolksdorf wrote:
> Richard Jones wrote:
> >Another article on the same topic:
> >
> >  http://lwn.net/Articles/82495/
> >
> 
> 3% probably lies well in the error of margin. I find this comment much 
> more informative and in accordance with my experiences:
> 
> http://lwn.net/Articles/167474/
> 
> Without inlining a lot of modern C++ code would be unusable slow. As 
> Alain said, inlining allows you to use abstractions without having to 
> pay the usual penalty for it.

Good point.  It just seems to confirm that you (or a compiler) cannot
possibly do good inlining unless you have a great deal of information
about the runtime behaviour of the program.

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  8:53                   ` Richard Jones
  2007-06-01  8:59                     ` Richard Jones
@ 2007-06-01 10:02                     ` skaller
  1 sibling, 0 replies; 71+ messages in thread
From: skaller @ 2007-06-01 10:02 UTC (permalink / raw)
  To: Richard Jones; +Cc: Alain Frisch, caml-list

On Fri, 2007-06-01 at 09:53 +0100, Richard Jones wrote:
> On Fri, Jun 01, 2007 at 06:09:27PM +1000, skaller wrote:
> > But still, the algorithm is crude and I found it necessary
> > to allow:
> > 
> > 	fun f ...           // maybe inline
> > 	inline fun f 	    // try REALLY HARD to inline
> > 	noinline fun f      // never inline
> 
> Actually it's more useful to control the inlining of function when
> they are applied, rather than when they are defined. 

I plan to implement that too.

> In more general terms, it was found that turning off inlining in the
> Linux kernel reduced the code size by 25%:

> I bet that actually improved performance too, but unfortunately in the
> article above they don't measure that.

Yes, I believe that. Smaller is usually faster.

However in Felix it isn't quite so simple. The 'general' method
of calling a function involves:

(a) call 'malloc()' to allocate heap store
(b) link the store into the garbage collector
(c) construct the object by binding its environment in
(d) clone the object so recursion works
(e) apply the object by passing its apply() method the arguments
(f) later on, garbage collect the object

ALL these steps are eliminated by inlining, and that also
provides further optimisation opportunities. In particular,
it typically reduces all non-recursive function call chains
to a flat sequence of primitive monomorphic mutators.

The resulting code is VERY fast. The code with all closures
isn't quite so fast :)

The compiler does other optimisations, these are just
the extreme cases. 

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  5:30               ` Alain Frisch
  2007-06-01  5:39                 ` Jon Harrop
  2007-06-01  8:09                 ` skaller
@ 2007-06-01 11:29                 ` Yaron Minsky
  2007-06-01 11:43                   ` Erik de Castro Lopo
  2007-06-01 11:49                   ` David MENTRE
  2007-06-01 14:15                 ` Stephen Weeks
  3 siblings, 2 replies; 71+ messages in thread
From: Yaron Minsky @ 2007-06-01 11:29 UTC (permalink / raw)
  To: Alain Frisch; +Cc: skaller, caml-list

[-- Attachment #1: Type: text/plain, Size: 929 bytes --]

On 6/1/07, Alain Frisch <Alain.Frisch@inria.fr> wrote:
>
>  MLton's strength is that
> you don't have to pay the price for abstraction, i.e. cleaning up your
> program (by factorizing it or making it more modular) does not degrade
> performance. I have no experience with MLton, but I don't believe that
> performances are much more difficult to predict than with OCaml
> (Stephen?).


I could not agree with this sentiment more.  Stephen actually now works at
Jane Street, and since his arrival he's taught us a  number of techniques
for using modules and functors to organize and factor code more
effectively.  Some of these techniques were obviously born in the context of
Mlton, where they have no performance penalty.  It's downright annoying to
have to avoid these techniques in performance-sensitive code in OCaml.  In
other words, factoring out with functors and modules is good style, but
OCaml penalizes you for it.

y

[-- Attachment #2: Type: text/html, Size: 1276 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 11:29                 ` Yaron Minsky
@ 2007-06-01 11:43                   ` Erik de Castro Lopo
  2007-06-01 11:58                     ` Jon Harrop
  2007-06-01 12:40                     ` Erik de Castro Lopo
  2007-06-01 11:49                   ` David MENTRE
  1 sibling, 2 replies; 71+ messages in thread
From: Erik de Castro Lopo @ 2007-06-01 11:43 UTC (permalink / raw)
  To: caml-list

Yaron Minsky wrote:

> I could not agree with this sentiment more.  Stephen actually now works at
> Jane Street, and since his arrival he's taught us a  number of techniques
> for using modules and functors to organize and factor code more
> effectively.  Some of these techniques were obviously born in the context of
> Mlton, where they have no performance penalty.  It's downright annoying to
> have to avoid these techniques in performance-sensitive code in OCaml.  In
> other words, factoring out with functors and modules is good style, but
> OCaml penalizes you for it.

Can the defunctorizor help?

    http://www.lri.fr/~signoles/ocamldefun/manual.html

Erik
-- 
-----------------------------------------------------------------
Erik de Castro Lopo
-----------------------------------------------------------------
"He who writes the code gets to choose his license, and nobody
else gets to complain" -- Linus Torvalds


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 11:29                 ` Yaron Minsky
  2007-06-01 11:43                   ` Erik de Castro Lopo
@ 2007-06-01 11:49                   ` David MENTRE
  2007-06-01 14:41                     ` skaller
  2007-06-01 16:14                     ` Markus Mottl
  1 sibling, 2 replies; 71+ messages in thread
From: David MENTRE @ 2007-06-01 11:49 UTC (permalink / raw)
  To: Yaron Minsky; +Cc: Alain Frisch, caml-list, skaller

Hello,

2007/6/1, Yaron Minsky <yminsky@cs.cornell.edu>:
> In other words,
> factoring out with functors and modules is good style, but OCaml penalizes
> you for it.

A naive and somewhat provocative question: is the performance penalty
a real issue in your production code or just a known overhead that is
easily solved by having a more powerful computer? In other words, is
the complexity price of better optimizations justified considering its
real impact in production code?

And if you consider this debate in a more general view: OCaml has a
number of known deficiencies that pop up regularly on this list (new
calmp4 doc, handling of parallelism for multi-core machines, lack of a
recognised CPAN-like OCaml repository, etc.). We all know that the
INRIA team has limited manpower. On which topic should they invest
their time? How can we work, as a community, to solve those issues?

Yours,
david


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 11:43                   ` Erik de Castro Lopo
@ 2007-06-01 11:58                     ` Jon Harrop
  2007-06-01 13:49                       ` Julien Signoles
  2007-06-01 12:40                     ` Erik de Castro Lopo
  1 sibling, 1 reply; 71+ messages in thread
From: Jon Harrop @ 2007-06-01 11:58 UTC (permalink / raw)
  To: caml-list

On Friday 01 June 2007 12:43:26 Erik de Castro Lopo wrote:
> Yaron Minsky wrote:
> > I could not agree with this sentiment more.  Stephen actually now works
> > at Jane Street, and since his arrival he's taught us a  number of
> > techniques for using modules and functors to organize and factor code
> > more
> > effectively.  Some of these techniques were obviously born in the context
> > of Mlton, where they have no performance penalty.  It's downright
> > annoying to have to avoid these techniques in performance-sensitive code
> > in OCaml.  In other words, factoring out with functors and modules is
> > good style, but OCaml penalizes you for it.
>
> Can the defunctorizor help?
>
>     http://www.lri.fr/~signoles/ocamldefun/manual.html
>
> Erik

Indeed, after you defunctorize what performance penalties are left by modules?

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 11:43                   ` Erik de Castro Lopo
  2007-06-01 11:58                     ` Jon Harrop
@ 2007-06-01 12:40                     ` Erik de Castro Lopo
  2007-06-01 13:56                       ` Julien Signoles
  1 sibling, 1 reply; 71+ messages in thread
From: Erik de Castro Lopo @ 2007-06-01 12:40 UTC (permalink / raw)
  To: caml-list

Erik de Castro Lopo wrote:

> Can the defunctorizor help?
> 
>     http://www.lri.fr/~signoles/ocamldefun/manual.html

Oops, it seems that the ocamldefun will only compile with
ocaml-3.06.

Erik
-- 
-----------------------------------------------------------------
Erik de Castro Lopo
-----------------------------------------------------------------
"Microsoft treats security vulnerabilities as public relations
problems."  -- Bruce Schneier


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 11:58                     ` Jon Harrop
@ 2007-06-01 13:49                       ` Julien Signoles
  2007-06-01 14:18                         ` Stephen Weeks
  0 siblings, 1 reply; 71+ messages in thread
From: Julien Signoles @ 2007-06-01 13:49 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list


> Indeed, after you defunctorize what performance penalties are left by modules?

Indeed, really depends on the context ;-).

If you don't use many operations which come from a functor application 
(compared with the number of costly operations in your application), the 
performance penalty of a functor application is probably not relevant.

However, in some cases, defunctorization may produce a good speedup, 
especially if you use massive inlining (e.g. ocamlopt -inline 1000). On 
the contrary, defunctorization may produce cache problem because the size 
of the defunctorized code may be very bigger than the size of the initial 
code.

Julien
-- 
mailto:Julien.Signoles@lri.fr ; http://www.lri.fr/~signoles
"In theory, practice and theory are the same,
but in practice they are different" (Larry McVoy)


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 12:40                     ` Erik de Castro Lopo
@ 2007-06-01 13:56                       ` Julien Signoles
  0 siblings, 0 replies; 71+ messages in thread
From: Julien Signoles @ 2007-06-01 13:56 UTC (permalink / raw)
  To: Erik de Castro Lopo; +Cc: caml-list


>> Can the defunctorizor help?
>>
>>     http://www.lri.fr/~signoles/ocamldefun/manual.html
>
> Oops, it seems that the ocamldefun will only compile with
> ocaml-3.06.

You're right (unfortunatly). No plan in the next few months to produce a 
new version. Maybe latter but it does not completely depends on me...

Julien
-- 
mailto:Julien.Signoles@lri.fr ; http://www.lri.fr/~signoles
"In theory, practice and theory are the same,
but in practice they are different" (Larry McVoy)


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01  5:30               ` Alain Frisch
                                   ` (2 preceding siblings ...)
  2007-06-01 11:29                 ` Yaron Minsky
@ 2007-06-01 14:15                 ` Stephen Weeks
  2007-06-01 14:37                   ` Brian Hurt
  2007-06-01 14:39                   ` Eric Cooper
  3 siblings, 2 replies; 71+ messages in thread
From: Stephen Weeks @ 2007-06-01 14:15 UTC (permalink / raw)
  To: Alain Frisch; +Cc: skaller, caml-list

> MLton's strength is that you don't have to pay the price for abstraction,
> i.e. cleaning up your program (by factorizing it or making it more modular)
> does not degrade performance. I have no experience with MLton, but I don't
> believe that performances are much more difficult to predict than with OCaml
> (Stephen?).

Performance is much more difficult to predict with MLton than with OCaml.  Even
worse, with whole-program optimization a small change in one part of the program
can affect performance in another part.  That having been said, I would gladly
take that drawback in exchange for the benefits of whole-program optimization.
Eliminating the price of abstraction causes a change in mindset that helps one
to avoid the mistake of premature optimization and to worry more about
correctness and getting the code done.  Also, although the performance of MLton
is less predictable, it's not like the generated code is sometimes twice as fast
as it would be with separate compilation and sometimes twice as slow.  In
reality, it's always as fast, and often, when the optimizations kick in, it's
significantly faster (2, 3, 5 times).  BTW, I'm not specifically comparing OCaml
and MLton here -- I'm making the general observation that whole-program
optimization adds optimization opportunities.

Defunctorization is one example of the many whole-program optimizations in MLton
that allow more abstract code without penalties.  To answer a couple other
questions on the list -- defunctorization can indeed give significant
performance improvements, and after defunctorization, there are no performance
penalties left from modules (indeed, there are no modules left :-).  Also, the
performance penalty at Jane Street (and other places) from functors is real, and
causes people to rewrite code, manually duplicate code, etc.  And, it prevents
people from using functors in the first place, because they have a (conscious or
unconscious) understanding of the cost.

Unpredictable performance is a fact of life with computers.  It's not possible
to eliminate it, even with a sufficiently simple compilation approach.  There
are way too many other factors.  I think it's better to encourage people to
program in the cleanest way possible, and then to profile and improve their code
if necessary.  The same arguments that one can make for simple compilation
strategies leading to predictable performance could be used to argue that we
should all program in C or in assembly.


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 13:49                       ` Julien Signoles
@ 2007-06-01 14:18                         ` Stephen Weeks
  2007-06-01 14:43                           ` Julien Signoles
  2007-06-01 14:57                           ` Brian Hurt
  0 siblings, 2 replies; 71+ messages in thread
From: Stephen Weeks @ 2007-06-01 14:18 UTC (permalink / raw)
  To: Julien Signoles; +Cc: Jon Harrop, caml-list

> However, in some cases, defunctorization may produce a good speedup,
> especially if you use massive inlining (e.g. ocamlopt -inline 1000). On the
> contrary, defunctorization may produce cache problem because the size of the
> defunctorized code may be very bigger than the size of the initial code.

I've never observed this problem in practice using MLton, and don't know anyone
in the MLton world who has.  Has this actually been observed using the OCaml
defunctorizer?


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 14:15                 ` Stephen Weeks
@ 2007-06-01 14:37                   ` Brian Hurt
  2007-06-01 14:39                   ` Eric Cooper
  1 sibling, 0 replies; 71+ messages in thread
From: Brian Hurt @ 2007-06-01 14:37 UTC (permalink / raw)
  To: Stephen Weeks; +Cc: caml-list

Stephen Weeks wrote:

> The same arguments that one can make for simple compilation
> strategies leading to predictable performance could be used to argue 
> that we
> should all program in C or in assembly.


What's humorous about this statement is that most C compilers now are 
implementing a lot of these fancy optimizations, even whole-program 
optimizations (see LLVM), which makes predicting their performance 
equally tricky to predict. 

I think the real reason Ocaml doesn't have advanced optimizations and 
whole program analysis is just one of time vr.s value.  It hasn't been 
valuable enough to someone yet to take the time and put in the work to 
implement them.  The position of most people on this list, including 
INRIA, seems to be "it'd be nice, and we'd definately use it if it were 
available, but at the moment we're doing something more important/more 
interesting/else..."

Brian


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 14:15                 ` Stephen Weeks
  2007-06-01 14:37                   ` Brian Hurt
@ 2007-06-01 14:39                   ` Eric Cooper
  1 sibling, 0 replies; 71+ messages in thread
From: Eric Cooper @ 2007-06-01 14:39 UTC (permalink / raw)
  To: caml-list

On Fri, Jun 01, 2007 at 10:15:04AM -0400, Stephen Weeks wrote:
> I think it's better to encourage people to program in the cleanest
> way possible, and then to profile and improve their code if
> necessary.

It seems natural to use a version control system so that one can
evolve and maintain the "clean" trunk of the code, while keeping the
accumulated wisdom of manual performance optimizations as branches /
patch sets.  I wonder if there are any opportunities for integration
of this model into the language itself (type-safe patch sets?) or its
tools.

-- 
Eric Cooper             e c c @ c m u . e d u


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 11:49                   ` David MENTRE
@ 2007-06-01 14:41                     ` skaller
  2007-06-01 16:52                       ` Jon Harrop
  2007-06-01 16:14                     ` Markus Mottl
  1 sibling, 1 reply; 71+ messages in thread
From: skaller @ 2007-06-01 14:41 UTC (permalink / raw)
  To: David MENTRE; +Cc: Yaron Minsky, Alain Frisch, caml-list

On Fri, 2007-06-01 at 13:49 +0200, David MENTRE wrote:
> Hello,
> 
> 2007/6/1, Yaron Minsky <yminsky@cs.cornell.edu>:
> > In other words,
> > factoring out with functors and modules is good style, but OCaml penalizes
> > you for it.
> 
> A naive and somewhat provocative question: is the performance penalty
> a real issue in your production code or just a known overhead that is
> easily solved by having a more powerful computer? 

Many calculations such as financial option pricing have 
performance exponential in lookahead time. These calculations are run 
regularly with varying parameters.

Being able to run the calculations with twice the number of 
parameter values in an overnight run is valuable, and gives
the finance house an edge over their competitors.

Paying 4 times more dollars for a CPU that is twice as fast is a 
very expensive solution compared to an optimising compiler ..
and if you paid this money you'd be even MORE inclined to want
to use optimised software to get best use out of your investment.

So, in my opinion .. yes, performance matters, and faster CPUs
don't really help. However the choice of a good language like
Ocaml is also made on the basis of programmer performance,
not just run time performance (otherwise Jane St would be writing
everything in assembler .. :)

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 14:18                         ` Stephen Weeks
@ 2007-06-01 14:43                           ` Julien Signoles
  2007-06-01 14:57                           ` Brian Hurt
  1 sibling, 0 replies; 71+ messages in thread
From: Julien Signoles @ 2007-06-01 14:43 UTC (permalink / raw)
  To: Stephen Weeks; +Cc: Jon Harrop, caml-list


>> However, in some cases, defunctorization may produce a good speedup,
>> especially if you use massive inlining (e.g. ocamlopt -inline 1000). On the
>> contrary, defunctorization may produce cache problem because the size of 
>> the
>> defunctorized code may be very bigger than the size of the initial code.
>
> I've never observed this problem in practice using MLton, and don't know 
> anyone
> in the MLton world who has.  Has this actually been observed using the OCaml
> defunctorizer?

Indeed, I don't observe such a problem in practice but I guess it is 
possible because the size of the defunctorized code may be up to 20x 
(maybe more) the size of the initial code (observed in practice). Just 
combine this with massive inlining on very-big massively-functorized 
application...

Of course, MLton is much more used than ocamldefun. So I hope you're right.

Julien
-- 
mailto:Julien.Signoles@lri.fr ; http://www.lri.fr/~signoles
"In theory, practice and theory are the same,
but in practice they are different" (Larry McVoy)


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 14:18                         ` Stephen Weeks
  2007-06-01 14:43                           ` Julien Signoles
@ 2007-06-01 14:57                           ` Brian Hurt
  2007-06-01 15:40                             ` Alain Frisch
  2007-06-01 23:26                             ` skaller
  1 sibling, 2 replies; 71+ messages in thread
From: Brian Hurt @ 2007-06-01 14:57 UTC (permalink / raw)
  To: Stephen Weeks; +Cc: caml-list

Stephen Weeks wrote:

>> However, in some cases, defunctorization may produce a good speedup,
>> especially if you use massive inlining (e.g. ocamlopt -inline 1000). 
>> On the
>> contrary, defunctorization may produce cache problem because the size 
>> of the
>> defunctorized code may be very bigger than the size of the initial code.
>
>
> I've never observed this problem in practice using MLton, and don't 
> know anyone
> in the MLton world who has.  Has this actually been observed using the 
> OCaml
> defunctorizer?
>

Not with the Ocaml defunctorizor, but in other contexts I have indeed 
seen issues where inlining functions signifigantly decreased 
performance, due to cache thrashing.

And I know people (my dad) who've seen program sizes reduce by a factor 
of 3 with a one *word* change in the source code.  Short story: A base 
class in a large C++ function had an inline virtual destructor, which 
then had to be inlined everywhere in the code where an object that 
inherited from that class was being freed.  Removing the inline keyword 
signifigantly increased performance and radically decreased code size.  
The code change was opposed because "inlining functions makes code faster".

Another example I've seen, although it's smaller, is in branch 
prediction.  CPUs keep track, per branch, of wether branches tend to be 
taken or not.  Branch prediction is then used to speculatively execute 
code- but the problem is that if they're mispredicted, the cost is large 
(10-20+ clock cycles, smaller than the 100-350+ clock cycles of a cache 
miss, but still signifigant compared to the cost of a function call).  
They only keep track of a limited number of branches, however.  By 
inlining, and duplicating, the code, you're putting more pressure on the 
branch prediction logic, and are having more branches be mispredicted, 
with associated cost.

My experience has been that inlining is only a win in three cases: 1) 
where the function being inlined is so trivial and so small that the 
size and cost of the function call is the same as the rest of the 
function. Given that the size of a function call to a known location is 
like 5 bytes on the x86, and the cost of a function call the last time I 
measured it was like 1.5 cycles for the call, plus 1-2 cycles per 
argument, I mean really effin small and simple functions here.  Or, 2) 
where the function is only called from one place, or 3) where inlining 
opened up signifigant other optimization opportunities.  The classic 
example for Ocaml here is replacing a call to Pervasives.compare with an 
integer compare.  Most of the rest of the time, inlining is either a 
break even proposition, or often a loss.

Which is why I consider Linus Torvals "real programmer" attitude dumb.  
In the first two cases, the compiler can easily determine that the 
inlining is a good idea. Counting the cost or size of a function is easy 
enough, and counting the number of places where the function is called 
from real easy.  And the third case, where inlining opens up new 
possibilities for optimization- that almost has to be done by the 
compiler, as it depends upon what optimizations the compiler can, and 
will, apply to the newly inlined function.  This is something I trust 
the compiler to do more than I trust even me to do correctly.

Brian


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 14:57                           ` Brian Hurt
@ 2007-06-01 15:40                             ` Alain Frisch
  2007-06-01 15:58                               ` Brian Hurt
  2007-06-01 16:47                               ` Jon Harrop
  2007-06-01 23:26                             ` skaller
  1 sibling, 2 replies; 71+ messages in thread
From: Alain Frisch @ 2007-06-01 15:40 UTC (permalink / raw)
  Cc: caml-list

Brian Hurt wrote:
> where the function is only called from one place, or 3) where inlining
> opened up signifigant other optimization opportunities.  The classic
> example for Ocaml here is replacing a call to Pervasives.compare with an
> integer compare.  Most of the rest of the time, inlining is either a
> break even proposition, or often a loss.

Another good situation is when inlining allows the compiler to turn a
function call to an unknown location into a direct function call (or no
function call at all). This happens as soon as you write "List.map (fun
x -> ...)". Inlining List.map would avoid the allocation of the closure
and a computed call (and then the local abstraction will also be inlined
in the body of the inlined List.map because it is used only once).
Currently, OCaml never inlines recursive functions.

-- Alain


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 15:40                             ` Alain Frisch
@ 2007-06-01 15:58                               ` Brian Hurt
  2007-06-01 16:25                                 ` Markus Mottl
  2007-06-01 16:47                               ` Jon Harrop
  1 sibling, 1 reply; 71+ messages in thread
From: Brian Hurt @ 2007-06-01 15:58 UTC (permalink / raw)
  To: Alain Frisch; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1305 bytes --]

Alain Frisch wrote:

>Brian Hurt wrote:
>  
>
>>where the function is only called from one place, or 3) where inlining
>>opened up signifigant other optimization opportunities.  The classic
>>example for Ocaml here is replacing a call to Pervasives.compare with an
>>integer compare.  Most of the rest of the time, inlining is either a
>>break even proposition, or often a loss.
>>    
>>
>
>Another good situation is when inlining allows the compiler to turn a
>function call to an unknown location into a direct function call (or no
>function call at all). This happens as soon as you write "List.map (fun
>x -> ...)". Inlining List.map would avoid the allocation of the closure
>and a computed call (and then the local abstraction will also be inlined
>in the body of the inlined List.map because it is used only once).
>Currently, OCaml never inlines recursive functions.
>
>  
>

This qualifies as an optimization opportunity- turning a call to 
caml_apply into a direct function call is an optimization.  Which may 
open up other optimization possibilities.  But that was my point- if the 
only thing you're getting out of inlining a function is skipping a 
function call (to a known location), then inlining generally isn't worth 
it- it's only worth it if it opens up other possibilities.

Brian


[-- Attachment #2: Type: text/html, Size: 1729 bytes --]

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 11:49                   ` David MENTRE
  2007-06-01 14:41                     ` skaller
@ 2007-06-01 16:14                     ` Markus Mottl
  2007-06-01 16:46                       ` Jon Harrop
  2007-06-01 17:13                       ` Jon Harrop
  1 sibling, 2 replies; 71+ messages in thread
From: Markus Mottl @ 2007-06-01 16:14 UTC (permalink / raw)
  To: David MENTRE; +Cc: caml-list

On 6/1/07, David MENTRE <dmentre@linux-france.org> wrote:
> 2007/6/1, Yaron Minsky <yminsky@cs.cornell.edu>:
> > In other words,
> > factoring out with functors and modules is good style, but OCaml penalizes
> > you for it.
>
> A naive and somewhat provocative question: is the performance penalty
> a real issue in your production code or just a known overhead that is
> easily solved by having a more powerful computer? In other words, is
> the complexity price of better optimizations justified considering its
> real impact in production code?

Absolutely!  E.g. we had to specialize hash tables for integer and
string keys, because the generic implementation calls a function for
each key comparison rather than generating specialized code for e.g.
integer comparisons.  This has a noticable impact in production
systems.

There are plenty of cases where I had to manually unfold function
definitions like e.g.:

  let f = g h

where h is some function.  This leads to a lot of code duplication,
and makes code considerably less readable (though a lot more efficient
if e.g. h is applied to each character in a string).

I'd surely be happy to see the addition of some (optional)
higher-level code transformations to OCaml.  Not just inlining, maybe
some partial evaluation of the resulting code, which could also reduce
code size if the compiler can prove that certain branches will not be
taken.

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 15:58                               ` Brian Hurt
@ 2007-06-01 16:25                                 ` Markus Mottl
  0 siblings, 0 replies; 71+ messages in thread
From: Markus Mottl @ 2007-06-01 16:25 UTC (permalink / raw)
  To: Brian Hurt; +Cc: Alain Frisch, caml-list

On 6/1/07, Brian Hurt <bhurt@janestcapital.com> wrote:
> But that was my point- if the only thing you're
> getting out of inlining a function is skipping a function call (to a known
> location), then inlining generally isn't worth it- it's only worth it if it
> opens up other possibilities.

I disagree.  Function calls can be pretty darn expensive.  If you have
a function that does not much more than branch on some value (many
branches, the first discriminated cases being taken more often than
others), then the function call alone may costs more than the
operation in average, but inlining would still lead to a lot of text
being duplicated.

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 16:14                     ` Markus Mottl
@ 2007-06-01 16:46                       ` Jon Harrop
  2007-06-01 17:13                       ` Jon Harrop
  1 sibling, 0 replies; 71+ messages in thread
From: Jon Harrop @ 2007-06-01 16:46 UTC (permalink / raw)
  To: caml-list


You guys seem to have left me in the dust in this discussion. :-)

On Friday 01 June 2007 17:14:36 Markus Mottl wrote:
> Absolutely!  E.g. we had to specialize hash tables for integer and
> string keys

I wholeheartedly agree with this. OCaml is lightning fast for 1=2 but 
dreadfully slow for (1,2)=(2,3). I'm sure this can be addressed easily 
enough.

> I'd surely be happy to see the addition of some (optional)
> higher-level code transformations to OCaml.  Not just inlining, maybe
> some partial evaluation of the resulting code, which could also reduce
> code size if the compiler can prove that certain branches will not be
> taken.

General partial evaluation/specialization is another area that is too 
unpredictable to leave it entirely up to the compiler, IMHO. Like inlining, 
simple changes to existing programs have shown that "obvious" optimizations 
can slow things down a lot.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 15:40                             ` Alain Frisch
  2007-06-01 15:58                               ` Brian Hurt
@ 2007-06-01 16:47                               ` Jon Harrop
  1 sibling, 0 replies; 71+ messages in thread
From: Jon Harrop @ 2007-06-01 16:47 UTC (permalink / raw)
  To: caml-list; +Cc: Alain Frisch

On Friday 01 June 2007 16:40:46 Alain Frisch wrote:
> Another good situation is when inlining allows the compiler to turn a
> function call to an unknown location into a direct function call (or no
> function call at all). This happens as soon as you write "List.map (fun
> x -> ...)". Inlining List.map would avoid the allocation of the closure
> and a computed call (and then the local abstraction will also be inlined
> in the body of the inlined List.map because it is used only once).
> Currently, OCaml never inlines recursive functions.

This is an excellent example of something that can be done simply and 
effectively by specializing only stdlib calls.

Float array operations are another place where OCaml leaves expensive checks 
in the inner loop unnecessarily:

  Array.fold_left (+.) 0.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 14:41                     ` skaller
@ 2007-06-01 16:52                       ` Jon Harrop
  2007-06-01 23:33                         ` skaller
  0 siblings, 1 reply; 71+ messages in thread
From: Jon Harrop @ 2007-06-01 16:52 UTC (permalink / raw)
  To: caml-list

On Friday 01 June 2007 15:41:43 skaller wrote:
> Paying 4 times more dollars for a CPU that is twice as fast is a
> very expensive solution compared to an optimising compiler ..

Much more cost effective to implement more effective compiler features in 
house. All you need is the author of an excellent whole-program optimizing 
compiler for some related language.

On Friday 01 June 2007 12:29:00 Yaron Minsky wrote:
> I could not agree with this sentiment more.  Stephen actually now works at
> Jane Street, and since his arrival he's taught us a  number of techniques...

See. ;-)

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 16:14                     ` Markus Mottl
  2007-06-01 16:46                       ` Jon Harrop
@ 2007-06-01 17:13                       ` Jon Harrop
  2007-06-04 14:03                         ` Mike Furr
  1 sibling, 1 reply; 71+ messages in thread
From: Jon Harrop @ 2007-06-01 17:13 UTC (permalink / raw)
  To: caml-list

On Friday 01 June 2007 17:14:36 Markus Mottl wrote:
> Absolutely!  E.g. we had to specialize hash tables for integer and
> string keys, because the generic implementation calls a function for
> each key comparison rather than generating specialized code for e.g.
> integer comparisons.  This has a noticable impact in production
> systems.

Sets are another example. In that case, function call in trivial comparison is 
a side effect of using functors.

> ...
> I'd surely be happy to see the addition of some (optional)
> higher-level code transformations to OCaml.  Not just inlining, maybe
> some partial evaluation of the resulting code, which could also reduce
> code size if the compiler can prove that certain branches will not be
> taken.

The stdlib has a lot of scope for optimization. The Set implementation 
currently does not specialize 1-element nodes. Doing this improves 
performance by ~30%, partly by relieving the GC and partly by avoiding 
branches in common cases. I believe some compilers automate this optimization 
(trimming the bottom layer from trees).

Non tail-recursive functions in the stdlib is another example. You can easily 
get quadratic instead of linear behaviour without realising. Hash tables can 
have big hidden performance costs, especially for soft real-time work.

Actually, while we're here. I've long thought that the stdlib should provide 
abstract implementations of concrete data structures like RB trees and AVL 
trees, and functorize the Set and Map modules over the tree type they use. 
This would let people add new abstract data structures (I like purely 
functional sequences based on AVL trees) built upon solid concrete data 
structures from the stdlib rather than cutting and pasting code (one of the 
more embarassing OCaml FAQs).

Making this feasible by optimizing away the abstractions requires more than 
just defunctorizing though. You need to partially specialize by type, as 
Markus says. You also need to do whole-program transformations to flatten 
data structures. For example, a set would require:

  Node of 'a t * 'a * 'a t * height

and a sequence would require:

  Node of 'a t * 'a * 'a t * height * size

So a generic OCaml solution would add an indirection to the metadata that 
could be flattened out.

Not being hardcore enough to tinker with the OCaml compiler itself, I'd write 
an OCaml program that generated OCaml implementations of data structures with 
the necessary specialization. Indeed, I've already done this for 
low-dimensional vectors and matrices, but doing it for trees would be more 
interesting.

Just out of curiosity, how many of the optimizations being discussed can be 
done with camlp4?

Anyway, we should try to build a coherent list of optimizations we'd like and 
then try to prioritize them.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 14:57                           ` Brian Hurt
  2007-06-01 15:40                             ` Alain Frisch
@ 2007-06-01 23:26                             ` skaller
  2007-06-01 23:49                               ` Brian Hurt
  1 sibling, 1 reply; 71+ messages in thread
From: skaller @ 2007-06-01 23:26 UTC (permalink / raw)
  To: Brian Hurt; +Cc: Stephen Weeks, caml-list

On Fri, 2007-06-01 at 10:57 -0400, Brian Hurt wrote:
>  And the third case, where inlining opens up new 
> possibilities for optimization- that almost has to be done by the 
> compiler, as it depends upon what optimizations the compiler can, and 
> will, apply to the newly inlined function.  This is something I trust 
> the compiler to do more than I trust even me to do correctly.

It's NOT so easy to predict how much optimisation will result
from inlining. Just think about it, you have a tree of inlining
opportunities, if do you really want to attempt to estimate the
coefficients on N inlining choices just to decide if you'll
inline or not?

I doubt it. The compiler will make one guess whether to inline
or not based on a some fast heuristic, and then commit.

Felix inlines based on the number of statements in a function.
The default threshhold is 50. Inlining is done bottom up.
Recursive calls to children are inlined, others are not.
Tail-rec optimisation is done before each inlining operation.

Then, a monormorphisation pass is done, and the inlining
process repeated.

The HARD problem I haven't solved at all is how to improve
the recursive function inlining rule: since only recursive
calls to children are inlined, recursive calls to siblings
are not. This is really bad, because a self-tail-rec optimisation
might apply after a sibling inline, but the opportunity is lost
because I don't know how to safely inline siblings.

This sounds easy .. just inline until you spot a self call.

But it isn't easy, because that doesn't work. This is because
inlining a function requires 'cloning' all its children,
and a recursive call can be left as a call to the
original function .. so the call isn't recursive anymore.
Sibling inlining can then spew an infinite expansion.

Typeclasses make a real mess here: instances need to be kept
around until you're sure they're not needed, which in general
is after the post monomorphisation inlining phase, and, replacing
a 'virtual' with a concrete call messes up tracking of recursion
as well .. I'm not even sure my algorithm is safe in respect
of typeclass instantiation if the typeclass instance function
is recursive on the typeclass.


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 16:52                       ` Jon Harrop
@ 2007-06-01 23:33                         ` skaller
  0 siblings, 0 replies; 71+ messages in thread
From: skaller @ 2007-06-01 23:33 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Fri, 2007-06-01 at 17:52 +0100, Jon Harrop wrote:
> On Friday 01 June 2007 15:41:43 skaller wrote:
> > Paying 4 times more dollars for a CPU that is twice as fast is a
> > very expensive solution compared to an optimising compiler ..
> 
> Much more cost effective to implement more effective compiler features in 
> house. All you need is the author of an excellent whole-program optimizing 
> compiler for some related language.

Or you could make an SML->Ocaml preprocessor .. allowing
rapid prototyping with Ocaml and production compilation with
another tool.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 23:26                             ` skaller
@ 2007-06-01 23:49                               ` Brian Hurt
  2007-06-02  3:26                                 ` skaller
  0 siblings, 1 reply; 71+ messages in thread
From: Brian Hurt @ 2007-06-01 23:49 UTC (permalink / raw)
  To: skaller; +Cc: Brian Hurt, caml-list



On Sat, 2 Jun 2007, skaller wrote:

> On Fri, 2007-06-01 at 10:57 -0400, Brian Hurt wrote:
>>  And the third case, where inlining opens up new
>> possibilities for optimization- that almost has to be done by the
>> compiler, as it depends upon what optimizations the compiler can, and
>> will, apply to the newly inlined function.  This is something I trust
>> the compiler to do more than I trust even me to do correctly.
>
> It's NOT so easy to predict how much optimisation will result
> from inlining. Just think about it, you have a tree of inlining
> opportunities, if do you really want to attempt to estimate the
> coefficients on N inlining choices just to decide if you'll
> inline or not?

Nor is it easy for the programmer to guess how much optimization will 
result from inlining!  What with different compilers with different 
optimization strategies, complex interactions between compiler strategies, 
and even compiler strategies being enabled or disabled depending upon what 
compilation flags given.  Plus you have the effect of changing codes 
bases- any decision as to wether to inline or not has to be revisited 
every time either code changes.  Plus, the decision to inline is dependent 
upon code in probably widely disseperate locations.

>
> I doubt it. The compiler will make one guess whether to inline
> or not based on a some fast heuristic, and then commit.

Yep.  And I'm saying that heuristic is likely to be more accurate than the 
programmers guess.

Brian


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 23:49                               ` Brian Hurt
@ 2007-06-02  3:26                                 ` skaller
  0 siblings, 0 replies; 71+ messages in thread
From: skaller @ 2007-06-02  3:26 UTC (permalink / raw)
  To: Brian Hurt; +Cc: caml-list

On Fri, 2007-06-01 at 19:49 -0400, Brian Hurt wrote:
> 
> On Sat, 2 Jun 2007, skaller wrote:
> 
> > On Fri, 2007-06-01 at 10:57 -0400, Brian Hurt wrote:
> >>  And the third case, where inlining opens up new
> >> possibilities for optimization- that almost has to be done by the
> >> compiler, as it depends upon what optimizations the compiler can, and
> >> will, apply to the newly inlined function.  This is something I trust
> >> the compiler to do more than I trust even me to do correctly.
> >
> > It's NOT so easy to predict how much optimisation will result
> > from inlining. Just think about it, you have a tree of inlining
> > opportunities, if do you really want to attempt to estimate the
> > coefficients on N inlining choices just to decide if you'll
> > inline or not?
> 
> Nor is it easy for the programmer to guess how much optimization will 
> result from inlining! 

That's partly true. For Felix, some functions are 'generated
by the system'. For example

	val x  = if a then b else c endif;

looks cute, but actually is just sugar for:

	val x = match a with
	| true => b
	| _ => c
	endmatch

which in turn is just sugar for

	val arg = a;
	fun matches_1 () => a == true;
	fun matches_2 () => true;
	fun handler_1() { tmp = b; }
	fun handler_2() { tmp = c; }

	if matches_1 () then handler_1()
	elif matches_2 () then handler_2()
	else error "Match failure";
	x = tmp;

except the real code is even worse than that. Felix guarantees
compiler generated functions are inlined: such functions are
always children and never recursive without going through a
user function .. however they might
exceed the inlining threshold .. they're inlined anyhow.

This is because such 'automagically' generated functions can't
be tagged 'inline' or 'noinline' by the end user.

The reason the guarantee is made is simple: it's the only way
to be sure the compiler reduces 'dumb C looking code' into
'dumb C code', that is, to ensure WYSIWIG. For example
the above code 'had better' result in code like:

	if(!a) goto l1;
	x = b;
	goto l2;
l1:
	x = c;
l2:


or Felix won't be as fast as C. The guarantee makes it possible
to simplify the front end, by reducing 'everything' to lambdas,
applications, and other fairly simply terms.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-01 17:13                       ` Jon Harrop
@ 2007-06-04 14:03                         ` Mike Furr
  2007-06-04 14:39                           ` Jon Harrop
  2007-06-10 12:10                           ` Jon Harrop
  0 siblings, 2 replies; 71+ messages in thread
From: Mike Furr @ 2007-06-04 14:03 UTC (permalink / raw)
  To: caml-list

Jon Harrop wrote:
> Actually, while we're here. I've long thought that the stdlib should provide 
> abstract implementations of concrete data structures like RB trees and AVL 
> trees, and functorize the Set and Map modules over the tree type they use. 
> This would let people add new abstract data structures (I like purely 
> functional sequences based on AVL trees) built upon solid concrete data 
> structures from the stdlib rather than cutting and pasting code (one of the 
> more embarassing OCaml FAQs).

My OCaml summer project to build an improved data structure library will 
(hopefully) address many of the issues you have raised.  Although I'm 
just getting started, I have already started using your suggestion of 
including a 1-element constructor for my trees as it does indeed seem to 
give a noticeable speedup.

> Making this feasible by optimizing away the abstractions requires more than 
> just defunctorizing though. You need to partially specialize by type, as 
> Markus says. You also need to do whole-program transformations to flatten 
> data structures. For example, a set would require:
> 
>   Node of 'a t * 'a * 'a t * height
> 
> and a sequence would require:
> 
>   Node of 'a t * 'a * 'a t * height * size
> 
> So a generic OCaml solution would add an indirection to the metadata that 
> could be flattened out.

Indeed this would be preferable.  Right now, my tree implementations 
include a generic tree mixin with the type constructor:

    Node of ('data,'inv) t * 'data * ('data,'inv) t * 'inv

forcing all invariant information into the last cell.  Implementations 
that require more than a single value here must use an extra level of 
indirection (so in your example, 'inv = height * size).  I don't think 
this will be too bad for performance since these values only need to be 
retrieved for re-balancing leaving read-only operations unaffected. 
However, this has the huge benefit of allowing me to only write *one* 
version of find, min/max, fold, iter, etc. for all of the trees I 
implement, which in and of itself is a big win. :)

-Mike


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-04 14:03                         ` Mike Furr
@ 2007-06-04 14:39                           ` Jon Harrop
  2007-06-04 15:33                             ` Mike Furr
  2007-06-04 18:08                             ` skaller
  2007-06-10 12:10                           ` Jon Harrop
  1 sibling, 2 replies; 71+ messages in thread
From: Jon Harrop @ 2007-06-04 14:39 UTC (permalink / raw)
  To: caml-list

On Monday 04 June 2007 15:03:45 Mike Furr wrote:
> My OCaml summer project to build an improved data structure library

This is an absolutely incredible project and I am eagerly awaiting your 
results. As I am sure you already know, Jean-Christophe Filliatre, Chris 
Okasaki and Markus Mottl have done fantastic work in this area already. I 
would be very happy to see this collated. Diego Fernandez also did some good 
work with Baire but it seems to have evaporated.

I'll gladly submit my stdlib2 if you're interested. Its main benefit is 
autogenerated functions (like iter, map, fold) that act upon tuples of 
different arities:

  let fa, fb, fc = T3.map foo (a, b, c)

You might also consider lazy rebalancing, as rebalancing is both expensive and 
often unnecessary.

> forcing all invariant information into the last cell.  Implementations
> that require more than a single value here must use an extra level of
> indirection (so in your example, 'inv = height * size).  I don't think
> this will be too bad for performance since these values only need to be
> retrieved for re-balancing leaving read-only operations unaffected.

A functional array implemented as a balanced binary tree needs the number of 
elements (size) of a sub tree just to index into the tree. So that read-only 
operation must indirect unnecessarily using your approach. However, the 
indirection is a much smaller overhead compared to the cost of functors or 
polymorphism when the comparison functions are trivial (which they normally 
are) using the current approach.

My feeling is that the use of functors in Set and Map are more of a hindrance 
than a benefit. The ubiquity of Hashtbl is testament to this. Also, the FAQ 
about having to use recursive modules to implement an expr type with a set of 
exprs in it. Using a record as F# does makes this easier. However, dispatch 
of comparison by run-time type makes the F# approach safer as well, which is 
always more important (and I have been bitten by this in OCaml).

> However, this has the huge benefit of allowing me to only write *one*
> version of find, min/max, fold, iter, etc. for all of the trees I
> implement, which in and of itself is a big win. :)

Perhaps we should consider the alternative approach where those functions are 
higher-order functions over the necessary low-level tree functions. This 
would only need support for inlining to recover optimal performance and it 
also lets you write the core functions only once.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-04 14:39                           ` Jon Harrop
@ 2007-06-04 15:33                             ` Mike Furr
  2007-06-04 18:08                             ` skaller
  1 sibling, 0 replies; 71+ messages in thread
From: Mike Furr @ 2007-06-04 15:33 UTC (permalink / raw)
  To: caml-list; +Cc: osp2007-ocaml-reins

   For now,  I just want to get Jon Harrop wrote:
> On Monday 04 June 2007 15:03:45 Mike Furr wrote:
>> My OCaml summer project to build an improved data structure library
> 
> This is an absolutely incredible project and I am eagerly awaiting your 
> results. As I am sure you already know, Jean-Christophe Filliatre, Chris 
> Okasaki and Markus Mottl have done fantastic work in this area already. I 
> would be very happy to see this collated. Diego Fernandez also did some good 
> work with Baire but it seems to have evaporated.

Yes, their work is a major influence of the project.

> I'll gladly submit my stdlib2 if you're interested. Its main benefit is 
> autogenerated functions (like iter, map, fold) that act upon tuples of 
> different arities:

Great, I would love to see what you've done!  Also, if you interested, 
you can subscribe to the mailing list for the project (no posts yet as 
I'm just getting started, but perhaps we should move our conversation 
there... CC'd).

http://groups.google.com/group/osp2007-ocaml-reins

> You might also consider lazy rebalancing, as rebalancing is both expensive and 
> often unnecessary.

Initially, I plan on supporting lazy rebalancing via the use of zippers. 
   This will also allow splay-like find semantics for arbitrary tree 
structures (although not quite as efficient as a native splay tree which 
of course will also be available).

> A functional array implemented as a balanced binary tree needs the number of 
> elements (size) of a sub tree just to index into the tree. So that read-only 
> operation must indirect unnecessarily using your approach. However, the 
> indirection is a much smaller overhead compared to the cost of functors or 
> polymorphism when the comparison functions are trivial (which they normally 
> are) using the current approach.

Indeed, this example may require a custom implementation to be 
efficient.  However, I have lots to do for the summer and will likely 
leave such performance tweaking until the end of the project.

> My feeling is that the use of functors in Set and Map are more of a hindrance 
> than a benefit. The ubiquity of Hashtbl is testament to this.

I'm not exactly sure what you mean by this.  Are you suggesting the 
performance is a hindrance or that the need for a separate 'module M = 
SMap.Make(...)' line is annoying?  One goal of my library will be that 
it will allow you to change data structures very easily.  For instance 
you might write the following while initially developing your code

   module MyMap = Poly.Map

This map would use the standard polymorphic compare function with keys 
of type 'a and values of type 'b (for some a,b).  Later, once your code 
has matured, you might know your keys are always of type (int*int) list.
Then you would simply change the module def to:

  module MyMap = Mono.AVLMap(List(Pair(Int,Int)))

and now you have a specific data structure which takes advantage of the 
faster monomorphic compare function.  The library will include modules 
for all of the base types (int, float, etc...) as well as hoisting 
common type constructors into functors (such as List() above). 
Therefore, you don't have to write any annoying boilerplate code to use 
the functors.

As far as performance goes, I plan on using functors where it makes the 
code most elegant and easy to use.  With all of the mention of 
ocamldefun on this mailing list, I imagine it won't be too long before 
we have version for ocaml 3.10 (or else I may write it myself!)

>> However, this has the huge benefit of allowing me to only write *one*
>> version of find, min/max, fold, iter, etc. for all of the trees I
>> implement, which in and of itself is a big win. :)
> 
> Perhaps we should consider the alternative approach where those functions are 
> higher-order functions over the necessary low-level tree functions. This 
> would only need support for inlining to recover optimal performance and it 
> also lets you write the core functions only once.

Perhaps.  This would require passing around a dictionary of functions 
and I'm not sure the resulting code would be as clean or any faster than 
functors.  Another way would be to build them on top of the libraries 
zipper-based iterators (but this will likely be less efficient).  I plan 
on doing this anyway to support arbitrary traversal strategies.  Again, 
perhaps at the end of the summer I can look into the performance 
characteristics of these ideas.

Cheers,
-Mike


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-04 14:39                           ` Jon Harrop
  2007-06-04 15:33                             ` Mike Furr
@ 2007-06-04 18:08                             ` skaller
       [not found]                               ` <9d3ec8300706041518y115d22bdsa120d4010261d841@mail.gmail.com>
  2007-06-04 22:44                               ` Pierre Etchemaïté
  1 sibling, 2 replies; 71+ messages in thread
From: skaller @ 2007-06-04 18:08 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Mon, 2007-06-04 at 15:39 +0100, Jon Harrop wrote:

> A functional array implemented as a balanced binary tree needs the number of 
> elements (size) of a sub tree just to index into the tree. 

It might be worth considering a functional version of Judy arrays:

http://judy.sourceforge.net/

They're fairly close to ideal: all operations are O(1),
they never need rebalancing, and they work just as well with
sparse arrays as compact ones. Judy arrays are digital tries
with compressed nodes. It's not clear whether a functional
variation would retain the performance claimed for the mutable
version.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Fwd: [Caml-list] Comparison of OCaml and MLton for numerics
       [not found]                               ` <9d3ec8300706041518y115d22bdsa120d4010261d841@mail.gmail.com>
@ 2007-06-04 22:19                                 ` Till Varoquaux
  2007-06-04 23:40                                   ` Jon Harrop
  2007-06-05  2:24                                   ` skaller
  0 siblings, 2 replies; 71+ messages in thread
From: Till Varoquaux @ 2007-06-04 22:19 UTC (permalink / raw)
  To: OCaml

Forgot the reply-to-all....

---------- Forwarded message ----------
From: Till Varoquaux <till.varoquaux@gmail.com>
Date: Jun 5, 2007 12:18 AM
Subject: Re: [Caml-list] Comparison of OCaml and MLton for numerics
To: skaller <skaller@users.sourceforge.net>


Hmm....

At a first glance these could be translated to persistent data
structures, although with a cost  (they are "upside down": inserting a
new element modifies one of the leafs... but this is also the case
with red/black trees). It doesn't seem to me that most operations
would be O(1) (sounds too good to be true anyways*), I'd rather guess
O(ln(n)) (where log is base 256 and n is length of the element we are
storing/querying etc...). I do wonder how well these would scale when
used with big elements. they might prove an interesting back-end for
an implementation of maps (values would be stored in the nodes).

Most of the optimizations seem to reside more on the low level details
of the implementation rather than on the algorithmic side (which seems
fairly classical from a distance).

For more functional data structures you might be interested by vlists.
There is even an OCaml version.

Regards,
Till

* Since hashtables have to deal with collisions and hashing of large
values, I do not consider their operations to be O(1).
[1]http://en.wikipedia.org/wiki/VList
[2]http://www.imada.sdu.dk/~bardur/personal/45-programs/index.html


On 6/4/07, skaller <skaller@users.sourceforge.net> wrote:
> On Mon, 2007-06-04 at 15:39 +0100, Jon Harrop wrote:
>
> > A functional array implemented as a balanced binary tree needs the number of
> > elements (size) of a sub tree just to index into the tree.
>
> It might be worth considering a functional version of Judy arrays:
>
> http://judy.sourceforge.net/
>
> They're fairly close to ideal: all operations are O(1),
> they never need rebalancing, and they work just as well with
> sparse arrays as compact ones. Judy arrays are digital tries
> with compressed nodes. It's not clear whether a functional
> variation would retain the performance claimed for the mutable
> version.
>
> --
> John Skaller <skaller at users dot sf dot net>
> Felix, successor to C++: http://felix.sf.net
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-04 18:08                             ` skaller
       [not found]                               ` <9d3ec8300706041518y115d22bdsa120d4010261d841@mail.gmail.com>
@ 2007-06-04 22:44                               ` Pierre Etchemaïté
  2007-06-05  1:42                                 ` Jon Harrop
  1 sibling, 1 reply; 71+ messages in thread
From: Pierre Etchemaïté @ 2007-06-04 22:44 UTC (permalink / raw)
  To: caml-list

Le Tue, 05 Jun 2007 04:08:08 +1000, skaller <skaller@users.sourceforge.net> a écrit :

> On Mon, 2007-06-04 at 15:39 +0100, Jon Harrop wrote:
> 
> > A functional array implemented as a balanced binary tree needs the number of 
> > elements (size) of a sub tree just to index into the tree. 
> 
> It might be worth considering a functional version of Judy arrays:
> 
> http://judy.sourceforge.net/

The VList looks interesting too: O(1) operations (amortized), and
functional; Someone even already wrote a pure OCaml implementation:

http://www.imada.sdu.dk/~bardur/personal/45-programs/


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Fwd: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-04 22:19                                 ` Fwd: " Till Varoquaux
@ 2007-06-04 23:40                                   ` Jon Harrop
  2007-06-05  2:24                                   ` skaller
  1 sibling, 0 replies; 71+ messages in thread
From: Jon Harrop @ 2007-06-04 23:40 UTC (permalink / raw)
  To: caml-list

On Monday 04 June 2007 23:19:20 Till Varoquaux wrote:
> For more functional data structures you might be interested by vlists.
> There is even an OCaml version.

Incidentally, has anyone tried 3-finger trees? Sounds like they might offset 
the disadvantage of 64-bit pointers...

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-04 22:44                               ` Pierre Etchemaïté
@ 2007-06-05  1:42                                 ` Jon Harrop
  2007-06-05 10:30                                   ` Jon Harrop
  0 siblings, 1 reply; 71+ messages in thread
From: Jon Harrop @ 2007-06-05  1:42 UTC (permalink / raw)
  To: caml-list

On Monday 04 June 2007 23:44:13 Pierre Etchemaïté wrote:
> The VList looks interesting too: O(1) operations (amortized), and
> functional; Someone even already wrote a pure OCaml implementation:

Looks like a skip list with arrays instead of trees.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: Fwd: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-04 22:19                                 ` Fwd: " Till Varoquaux
  2007-06-04 23:40                                   ` Jon Harrop
@ 2007-06-05  2:24                                   ` skaller
  1 sibling, 0 replies; 71+ messages in thread
From: skaller @ 2007-06-05  2:24 UTC (permalink / raw)
  To: Till Varoquaux; +Cc: OCaml

On Tue, 2007-06-05 at 00:19 +0200, Till Varoquaux wrote:
> Forgot the reply-to-all....

> At a first glance these could be translated to persistent data
> structures, although with a cost  (they are "upside down": inserting a
> new element modifies one of the leafs... but this is also the case
> with red/black trees). It doesn't seem to me that most operations
> would be O(1) (sounds too good to be true anyways*), I'd rather guess
> O(ln(n)) (where log is base 256 and n is length of the element we are
> storing/querying etc...). 

Integers are fixed size ... it's an array remember, integers
are the keys, so it's O(1) because ln 64 ~= 1 :) The main thing
is it isn't O(ln N) where N is the number of elements in the array.

> I do wonder how well these would scale when
> used with big elements. they might prove an interesting back-end for
> an implementation of maps (values would be stored in the nodes).

JudySL accepts C style null terminated strings. It sorts them faster
than the GNU sort command. JudyHS tries to work for strings with length
count, and combines a Trie with a Hashtable, but it doesn't supply
an iterator at the moment so it's fairly useless.

I make no claims for non-fixed sized keys.

> Most of the optimizations seem to reside more on the low level details
> of the implementation rather than on the algorithmic side (which seems
> fairly classical from a distance).

Correct .. Tries are already theoretically optimal .. they're just
not very fast in practice on modern machines because size matters
due to caching .. Judy arrays solve that problem, which isn't really
a theoretical one.

However the implementation is mutable .. in a functional version
if you write:

	newA = OldA + element

and OldA isn't reachable now, there's also a load on the garbage
collector which is part of the real cost, and that cost might
be proportional to the number of insertions.. so maybe this
data structure isn't very good in functional form?


-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-05  1:42                                 ` Jon Harrop
@ 2007-06-05 10:30                                   ` Jon Harrop
  0 siblings, 0 replies; 71+ messages in thread
From: Jon Harrop @ 2007-06-05 10:30 UTC (permalink / raw)
  To: caml-list

On Tuesday 05 June 2007 02:42:15 Jon Harrop wrote:
> On Monday 04 June 2007 23:44:13 Pierre Etchemaïté wrote:
> > The VList looks interesting too: O(1) operations (amortized), and
> > functional; Someone even already wrote a pure OCaml implementation:
>
> Looks like a skip list with arrays instead of trees.

Incidentally, I should probably explain that I like the idea of a functional 
array based upon balanced binary trees because it allows fast concatenation 
and subarrays.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-04 14:03                         ` Mike Furr
  2007-06-04 14:39                           ` Jon Harrop
@ 2007-06-10 12:10                           ` Jon Harrop
  2007-06-10 12:58                             ` skaller
  1 sibling, 1 reply; 71+ messages in thread
From: Jon Harrop @ 2007-06-10 12:10 UTC (permalink / raw)
  To: caml-list

On Monday 04 June 2007 15:03:45 Mike Furr wrote:
> I have already started using your suggestion of
> including a 1-element constructor for my trees as it does indeed seem to
> give a noticeable speedup.

I have another suggestion for you:

OCaml's exceptions are very fast and can be used to accelerate many no-op 
paths. For example, when inserting an element into a set that already 
contains that element, the current Set implementation will reallocate every 
node on the path to the element even though the resulting set is identical. 
You can improve performance enormously for the no-op case, avoiding this 
allocation, by raising an exception as soon as you realise the element is 
already present and catching it on the outside of the "insert" function to 
return the input set as the output.

A related optimization is to use physical equality to avoid allocation 
(copying) when the output will be equivalent to the input.

Insertion sort is a good example of this. A naive insertion sort may be 
written:

let rec insertion = function
  | [] -> []
  | h1::t ->
    match insertion t with 
    | h2::t when h1>h2 -> h2::insertion(h1::t)
    | t -> h1::t

but this copies already-sorted tail lists unnecessarily. It can be optimized 
by returning the original "list" when possible:

let rec insertion = function
  | [] -> []
  | h1::t as list ->
    match insertion t with 
    | h2::t when h1>h2 -> h2::insertion(h1::t)
    | t' -> if t==t' then list else h1::t'

More generally, applications such as term rewriters can benefit from 
specialized higher-order functions that incorporate this physical equality 
based optimization. My term rewriter used an "id_map" function:

val id_map : ('a -> 'a) -> 'a array -> 'a array

which returns the input when possible. This can avoid a lot of unnecessary 
allocation during rewriting and doubled the performance of the whole program!

My implementation of id_map was as follows:

let id_map f a =
  if a = [||] then a else
    let b = ref a in
    try
      for i = 0 to length a - 1 do
        let e = f a.(i) in
        if e != a.(i) then begin
          b := Array.copy a;
          (!b).(i) <- e;
          raise (Start (i+1))
        end
      done;
      a
    with Start start ->
      let b = !b in
      for i = start to length a - 1 do
        let e = f a.(i) in
        if e != a.(i) then b.(i) <- e;
      done;
      b

Another useful function along similar lines combines a map and a fold_left 
into one operation:

let id_map_fold_left f x a =
  if a = [||] then x, a else
    let r = ref x in
    let b = ref a in
    try
      for i = 0 to length a - 1 do
        let r', e = f !r a.(i) in
        r := r';
        if e != a.(i) then begin
          b := Array.copy a;
          (!b).(i) <- e;
          raise (Start (i+1))
        end
      done;
      !r, a
    with Start start ->
      let b = !b in
      for i = start to length a - 1 do
        let r', e = f !r a.(i) in
        r := r';
        if e != a.(i) then b.(i) <- e;
      done;
      !r, b

I used the "map" to rewrite one expression into another and the "fold_left" to 
accumulate the state of the interpreter.

I'm sure you can think of many combinations along similar lines. I think a new 
standard library would do very well to work such optimizations into the 
existing framework (e.g. the Set module) and providing some useful additional 
functions would be nice too.

I'll write articles about these sorts of things for the OCaml Journal.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e


^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [Caml-list] Comparison of OCaml and MLton for numerics
  2007-06-10 12:10                           ` Jon Harrop
@ 2007-06-10 12:58                             ` skaller
  0 siblings, 0 replies; 71+ messages in thread
From: skaller @ 2007-06-10 12:58 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Sun, 2007-06-10 at 13:10 +0100, Jon Harrop wrote:

> More generally, applications such as term rewriters can benefit from 
> specialized higher-order functions that incorporate this physical equality 
> based optimization. My term rewriter used an "id_map" function:
> 
> val id_map : ('a -> 'a) -> 'a array -> 'a array
> 
> which returns the input when possible. This can avoid a lot of unnecessary 
> allocation during rewriting and doubled the performance of the whole program!

Generally this often comes up rewriting terms:

	match x with
	| A a -> A (f a)
	| B b -> B b
	| ...

This is slower than

	match x with
	| A a -> A a
	| B _ as p -> p
	| ...

for the same reason: it avoids creating a new term equal
to the old one by just using the old term. This is particularly
useful with polymorphic variants:

	| #subtype as s -> s

In general though, Ocaml won't attain the performance Felix or
Mlton gain from whole program analysis, unboxing, and monomorphisation.
It compensates by providing (usually) rapid compile and build times,
a fast run time engine, and moderate optimisations.

It's somehow a pity the project isn't a bit more open, because
some of this stuff: more inlining, monomorphisation, additional
analysis, semantic axioms, etc, could be written by interested parties. 

However the only hooks available at the moment are in the 
front end with camlp4.

I might be interesting if the build system could "slot in"
third party analysis phases in a standardised way. The idea
here is that such modification would be strictly isolated,
extending the compiler by:

(a) an initialisation hook
(b) a term -> term analysis
(c) a command line switch to enable

so that the result is more or less guaranteed to be 
the standard Ocaml unless you switch on the extra feature.
When switched on, the analysis has an obligation to preserve
semantics (i.e. only optimisations are permitted).

With such a build system modification, more users could
choose to 'plug in' such optimisations in testing
builds of the compiler.

Obviously, such optimisations have limited opportunities
compared with arbitrary patches, but still, perhaps this
could provide a way for the community to build and assess
a limited but hopefully useful contribution. The INRIA team
could also supply testing optimisations using the same
mechanism.

With the bytecode compiler, it may even be possible to
direct loading of the optimisation phases dynamically
with the command line switches.

Well, I know nothing about the ocaml compiler at all,
so I have no idea whatsoever if this suggestion has
any merit. I do know, however, that at certain points
in compilation of Felix codes, it would be possible
to add in arbitrary optimisation passes which are close
to if not totally purely functional (and thus easy to
make optional).

I believe Mlton also supports "optional optimisations" at
various points in compilation.

How possible is this for Ocaml?

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net


^ permalink raw reply	[flat|nested] 71+ messages in thread

end of thread, other threads:[~2007-06-10 12:58 UTC | newest]

Thread overview: 71+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-05-31  5:50 Comparison of OCaml and MLton for numerics Yuanchen Zhu
2007-05-31  6:17 ` [Caml-list] " Jon Harrop
2007-05-31  6:32   ` skaller
2007-05-31  7:31   ` Yuanchen Zhu
2007-05-31  9:08     ` Jon Harrop
2007-05-31  9:22       ` Yuanchen Zhu
2007-05-31 10:27         ` Jon Harrop
2007-05-31 21:30           ` Alain Frisch
2007-06-01  1:22             ` skaller
2007-06-01  1:36               ` Erik de Castro Lopo
2007-06-01  2:21                 ` skaller
2007-06-01  2:49                   ` Erick Tryzelaar
2007-06-01  3:05                     ` skaller
2007-06-01  5:30               ` Alain Frisch
2007-06-01  5:39                 ` Jon Harrop
2007-06-01  6:36                   ` Yuanchen Zhu
2007-06-01  8:09                 ` skaller
2007-06-01  8:53                   ` Richard Jones
2007-06-01  8:59                     ` Richard Jones
2007-06-01  9:22                       ` Stephan Tolksdorf
2007-06-01  9:49                         ` Richard Jones
2007-06-01  9:32                       ` Stephan Tolksdorf
2007-06-01 10:02                     ` skaller
2007-06-01 11:29                 ` Yaron Minsky
2007-06-01 11:43                   ` Erik de Castro Lopo
2007-06-01 11:58                     ` Jon Harrop
2007-06-01 13:49                       ` Julien Signoles
2007-06-01 14:18                         ` Stephen Weeks
2007-06-01 14:43                           ` Julien Signoles
2007-06-01 14:57                           ` Brian Hurt
2007-06-01 15:40                             ` Alain Frisch
2007-06-01 15:58                               ` Brian Hurt
2007-06-01 16:25                                 ` Markus Mottl
2007-06-01 16:47                               ` Jon Harrop
2007-06-01 23:26                             ` skaller
2007-06-01 23:49                               ` Brian Hurt
2007-06-02  3:26                                 ` skaller
2007-06-01 12:40                     ` Erik de Castro Lopo
2007-06-01 13:56                       ` Julien Signoles
2007-06-01 11:49                   ` David MENTRE
2007-06-01 14:41                     ` skaller
2007-06-01 16:52                       ` Jon Harrop
2007-06-01 23:33                         ` skaller
2007-06-01 16:14                     ` Markus Mottl
2007-06-01 16:46                       ` Jon Harrop
2007-06-01 17:13                       ` Jon Harrop
2007-06-04 14:03                         ` Mike Furr
2007-06-04 14:39                           ` Jon Harrop
2007-06-04 15:33                             ` Mike Furr
2007-06-04 18:08                             ` skaller
     [not found]                               ` <9d3ec8300706041518y115d22bdsa120d4010261d841@mail.gmail.com>
2007-06-04 22:19                                 ` Fwd: " Till Varoquaux
2007-06-04 23:40                                   ` Jon Harrop
2007-06-05  2:24                                   ` skaller
2007-06-04 22:44                               ` Pierre Etchemaïté
2007-06-05  1:42                                 ` Jon Harrop
2007-06-05 10:30                                   ` Jon Harrop
2007-06-10 12:10                           ` Jon Harrop
2007-06-10 12:58                             ` skaller
2007-06-01 14:15                 ` Stephen Weeks
2007-06-01 14:37                   ` Brian Hurt
2007-06-01 14:39                   ` Eric Cooper
2007-05-31  9:24       ` Yuanchen Zhu
2007-05-31 10:25       ` Loup Vaillant
2007-05-31 10:30         ` Jon Harrop
2007-05-31 12:12     ` skaller
2007-05-31  7:11 ` Daniel Bünzli
2007-05-31 15:15 ` Christophe Raffalli
2007-05-31 15:23   ` Jon Harrop
2007-05-31 15:35     ` Christophe Raffalli
     [not found]       ` <604682010705310923o5a1ee0eiee5ae697da9d3c60@mail.gmail.com>
2007-05-31 20:14         ` Stephen Weeks
2007-05-31 15:16 ` Christophe Raffalli

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).