caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* OCaml image blending performance
@ 2008-02-06 20:29 Ilmari Heikkinen
  2008-02-06 22:12 ` [Caml-list] " Pal-Kristian Engstad
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Ilmari Heikkinen @ 2008-02-06 20:29 UTC (permalink / raw)
  To: caml-list

Hi,

I was writing some image blending operations to get to grips with OCaml,
and wrote the same code in C as well. Asking (and receiving) advice for
optimizing the code on freenode #ocaml, I was told to post the code here
as it might be an interesting compiler test.

The C and Caml versions don't produce the same results, but should
have the same amount of computation (don't take my word for it though,
I don't know why the results differ.)

The source files are:
 http://glimr.rubyforge.org/cake/blend.ml
 http://glimr.rubyforge.org/cake/blend2.ml
 http://glimr.rubyforge.org/cake/blend.c

Or as a tarball:

wget http://glimr.rubyforge.org/cake/blend_test.tar.gz
tar zxf blend_test.tar.gz
cd blend_test
./build.sh

cblend

real	0m1.466s
user	0m1.456s
sys	0m0.008s

blend

real	0m5.463s
user	0m5.456s
sys	0m0.012s

blend2

real	0m3.423s
user	0m3.404s
sys	0m0.012s

Use them as you wish.

--
Ilmari Heikkinen


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] OCaml image blending performance
  2008-02-06 20:29 OCaml image blending performance Ilmari Heikkinen
@ 2008-02-06 22:12 ` Pal-Kristian Engstad
  2008-02-06 23:30   ` Jon Harrop
  2008-02-06 23:34 ` Jon Harrop
       [not found] ` <854c25eb0802070607t4b2ed641xa3bc19e8384ce23@mail.gmail.com>
  2 siblings, 1 reply; 10+ messages in thread
From: Pal-Kristian Engstad @ 2008-02-06 22:12 UTC (permalink / raw)
  To: Ilmari Heikkinen; +Cc: caml-list

If you are looking for speed, this should be done in assembly...:

http://ompf.org/forum/viewtopic.php?f=11&t=494

PKE.

Ilmari Heikkinen wrote:
> Hi,
>
> I was writing some image blending operations to get to grips with OCaml,
> and wrote the same code in C as well. Asking (and receiving) advice for
> optimizing the code on freenode #ocaml, I was told to post the code here
> as it might be an interesting compiler test.
>
> The C and Caml versions don't produce the same results, but should
> have the same amount of computation (don't take my word for it though,
> I don't know why the results differ.)
>
> The source files are:
>  http://glimr.rubyforge.org/cake/blend.ml
>  http://glimr.rubyforge.org/cake/blend2.ml
>  http://glimr.rubyforge.org/cake/blend.c
>
> Or as a tarball:
>
> wget http://glimr.rubyforge.org/cake/blend_test.tar.gz
> tar zxf blend_test.tar.gz
> cd blend_test
> ./build.sh
>
> cblend
>
> real	0m1.466s
> user	0m1.456s
> sys	0m0.008s
>
> blend
>
> real	0m5.463s
> user	0m5.456s
> sys	0m0.012s
>
> blend2
>
> real	0m3.423s
> user	0m3.404s
> sys	0m0.012s
>
> Use them as you wish.
>
> --
> Ilmari Heikkinen
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
>   

-- 
Pål-Kristian Engstad (engstad@naughtydog.com), 
Lead Graphics & Engine Programmer,
Naughty Dog, Inc., 1601 Cloverfield Blvd, 6000 North,
Santa Monica, CA 90404, USA. Ph.: (310) 633-9112.

"Most of us would do well to remember that there is a reason Carmack
is Carmack, and we are not Carmack.",
                       Jonathan Blow, 2/1/2006, GD Algo Mailing List



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] OCaml image blending performance
  2008-02-06 22:12 ` [Caml-list] " Pal-Kristian Engstad
@ 2008-02-06 23:30   ` Jon Harrop
  0 siblings, 0 replies; 10+ messages in thread
From: Jon Harrop @ 2008-02-06 23:30 UTC (permalink / raw)
  To: pal_engstad; +Cc: caml-list List

On Wednesday 06 February 2008 22:12:28 Pal-Kristian Engstad wrote:
> If you are looking for speed, this should be done in assembly...:
>
> http://ompf.org/forum/viewtopic.php?f=11&t=494

It should be done on the GPU. :-)

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] OCaml image blending performance
  2008-02-06 20:29 OCaml image blending performance Ilmari Heikkinen
  2008-02-06 22:12 ` [Caml-list] " Pal-Kristian Engstad
@ 2008-02-06 23:34 ` Jon Harrop
  2008-02-07 11:01   ` Richard Jones
       [not found] ` <854c25eb0802070607t4b2ed641xa3bc19e8384ce23@mail.gmail.com>
  2 siblings, 1 reply; 10+ messages in thread
From: Jon Harrop @ 2008-02-06 23:34 UTC (permalink / raw)
  To: caml-list

On Wednesday 06 February 2008 20:29:05 Ilmari Heikkinen wrote:
> The C and Caml versions don't produce the same results, but should
> have the same amount of computation (don't take my word for it though,
> I don't know why the results differ.)

You should post working code if you want people to optimize it. I suspect 
you've incorrectly assumed that OCaml has full-width ints in this case 
because it does work here on my 64-bit machine.

> ./build.sh
>
> cblend
>
> real	0m1.466s
> user	0m1.456s
> sys	0m0.008s
>
> blend
>
> real	0m5.463s
> user	0m5.456s
> sys	0m0.012s
>
> blend2
>
> real	0m3.423s
> user	0m3.404s
> sys	0m0.012s

On 2.2GHz AMD64, I get:

$ ./build.sh

cblend

real    0m6.337s
user    0m6.048s
sys     0m0.032s

blend

real    0m14.965s
user    0m14.281s
sys     0m0.096s

blend2

real    0m9.639s
user    0m9.333s
sys     0m0.056s

OCaml is not competitive here for two main reasons:

. Full-size integer arithmetic is not very fast in OCaml.
. Abstractions often cost performance in OCaml.

In this case, most of the speed loss can be regained by simply aggressively 
inlining everything, which is exactly what you (ab)used C macros for in the C 
code.

If you want to get more interesting then you can look at metaprogramming in 
OCaml, either by generating OCaml code, using MetaOCaml or JIT compiling 
using LLVM. You should be able to meet or beat C that way. At the very least, 
you can write OCaml programs to take the pain away from writing C programs by 
hand... :-)

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] OCaml image blending performance
  2008-02-06 23:34 ` Jon Harrop
@ 2008-02-07 11:01   ` Richard Jones
  2008-02-07 11:13     ` Dominique Martinet
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Richard Jones @ 2008-02-07 11:01 UTC (permalink / raw)
  To: caml-list

On Wed, Feb 06, 2008 at 11:34:02PM +0000, Jon Harrop wrote:
> In this case, most of the speed loss can be regained by simply
> aggressively inlining everything, which is exactly what you (ab)used
> C macros for in the C code.

I don't understand this.  In 'blend2.ml' (which I was responsible for)
C macros are used to inline all the OCaml functions the same as in the
C version.  Yet it's still 70% slower than the C version.

My suspicion was that it was to do with his use of a string as a byte
array.

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] OCaml image blending performance
  2008-02-07 11:01   ` Richard Jones
@ 2008-02-07 11:13     ` Dominique Martinet
  2008-02-07 12:29     ` Mauricio Fernandez
  2008-02-07 15:04     ` Jon Harrop
  2 siblings, 0 replies; 10+ messages in thread
From: Dominique Martinet @ 2008-02-07 11:13 UTC (permalink / raw)
  To: Richard Jones; +Cc: caml-list

I haven't looked at the code nor the build options, but you might want
to try the unsafe get and set functions if you did your own bundary
checks before, instead of having caml check it everytime it access
some data.

It saved us roughtly 20% of running time on a project involving ALOT
of array accesses


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] OCaml image blending performance
  2008-02-07 11:01   ` Richard Jones
  2008-02-07 11:13     ` Dominique Martinet
@ 2008-02-07 12:29     ` Mauricio Fernandez
  2008-02-07 15:04     ` Jon Harrop
  2 siblings, 0 replies; 10+ messages in thread
From: Mauricio Fernandez @ 2008-02-07 12:29 UTC (permalink / raw)
  To: caml-list

On Thu, Feb 07, 2008 at 11:01:54AM +0000, Richard Jones wrote:
> On Wed, Feb 06, 2008 at 11:34:02PM +0000, Jon Harrop wrote:
> > In this case, most of the speed loss can be regained by simply
> > aggressively inlining everything, which is exactly what you (ab)used
> > C macros for in the C code.
> 
> I don't understand this.  In 'blend2.ml' (which I was responsible for)
> C macros are used to inline all the OCaml functions the same as in the
> C version.  Yet it's still 70% slower than the C version.
> 
> My suspicion was that it was to do with his use of a string as a byte
> array.

I get a 30% speedup by unrolling the BLEND loop and performing some additional
CSE (just getting rid of that extra (j_+3), actually). 

I think the major culprit is 31-bit integer arithmetic: there are lots of 
"orl $1, ...", "sarl $1, ...", decl and incl in the generated code.

If Bigarray.Array1 had unsafe_get and unsafe_set, operating on int32 values
could be faster (ocamlopt is often smart enough to do without boxed int32s).
Currently, bigarray bound checking is performed even with -unsafe, however.

-- 
Mauricio Fernandez  -   http://eigenclass.org


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] OCaml image blending performance
  2008-02-07 14:10   ` Ilmari Heikkinen
@ 2008-02-07 14:08     ` Jon Harrop
  0 siblings, 0 replies; 10+ messages in thread
From: Jon Harrop @ 2008-02-07 14:08 UTC (permalink / raw)
  To: caml-list

On Thursday 07 February 2008 14:10:22 Ilmari Heikkinen wrote:
> On Feb 7, 2008 1:34 AM, Jon Harrop <jon@ffconsultancy.com> wrote:
> > In this case, most of the speed loss can be regained by simply
> > aggressively inlining everything, which is exactly what you (ab)used C
> > macros for in the C code.
>
> The C macros were more about making the code shorter by emulating HOFs,
> blend2.ml does inline everything. Coming within 2x of C with no major
> hackery on this sort of code is impressive though.

Getting within 2x of C is often regarded as disappointing in the OCaml 
community. :-)

I've played with some similar int-intensive algorithms and never managed to 
get close to C. OCaml's float performance is where it really shines.

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] OCaml image blending performance
       [not found] ` <854c25eb0802070607t4b2ed641xa3bc19e8384ce23@mail.gmail.com>
@ 2008-02-07 14:10   ` Ilmari Heikkinen
  2008-02-07 14:08     ` Jon Harrop
  0 siblings, 1 reply; 10+ messages in thread
From: Ilmari Heikkinen @ 2008-02-07 14:10 UTC (permalink / raw)
  To: caml-list

On Feb 7, 2008 1:34 AM, Jon Harrop <jon@ffconsultancy.com> wrote:
> On Wednesday 06 February 2008 20:29:05 Ilmari Heikkinen wrote:
> > The C and Caml versions don't produce the same results, but should
> > have the same amount of computation (don't take my word for it though,
> > I don't know why the results differ.)
>
> You should post working code if you want people to optimize it. I suspect
> you've incorrectly assumed that OCaml has full-width ints in this case
> because it does work here on my 64-bit machine.

Actually, it does work here as well, I probably tested earlier with extra
blend modes enabled in C. Sorry about the confusion, should've done a
more thorough test before posting.


> OCaml is not competitive here for two main reasons:
>
> . Full-size integer arithmetic is not very fast in OCaml.
> . Abstractions often cost performance in OCaml.
>
> In this case, most of the speed loss can be regained by simply aggressively
> inlining everything, which is exactly what you (ab)used C macros for in the C
> code.

The C macros were more about making the code shorter by emulating HOFs,
blend2.ml does inline everything. Coming within 2x of C with no major hackery
on this sort of code is impressive though.

--
Ilmari Heikkinen


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Caml-list] OCaml image blending performance
  2008-02-07 11:01   ` Richard Jones
  2008-02-07 11:13     ` Dominique Martinet
  2008-02-07 12:29     ` Mauricio Fernandez
@ 2008-02-07 15:04     ` Jon Harrop
  2 siblings, 0 replies; 10+ messages in thread
From: Jon Harrop @ 2008-02-07 15:04 UTC (permalink / raw)
  To: caml-list

On Thursday 07 February 2008 11:01:54 Richard Jones wrote:
> On Wed, Feb 06, 2008 at 11:34:02PM +0000, Jon Harrop wrote:
> > In this case, most of the speed loss can be regained by simply
> > aggressively inlining everything, which is exactly what you (ab)used
> > C macros for in the C code.
>
> I don't understand this.  In 'blend2.ml' (which I was responsible for)
> C macros are used to inline all the OCaml functions the same as in the
> C version.  Yet it's still 70% slower than the C version.
>
> My suspicion was that it was to do with his use of a string as a byte
> array.

I suspect OCaml's 31-bit integers are responsible for the remaining slowdown. 
You might try using 32-bit integers (Int32) but I'm not sure ocamlopt will 
unbox them as aggressively as it does floats (I've never tried it).

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2008-02-07 15:08 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-06 20:29 OCaml image blending performance Ilmari Heikkinen
2008-02-06 22:12 ` [Caml-list] " Pal-Kristian Engstad
2008-02-06 23:30   ` Jon Harrop
2008-02-06 23:34 ` Jon Harrop
2008-02-07 11:01   ` Richard Jones
2008-02-07 11:13     ` Dominique Martinet
2008-02-07 12:29     ` Mauricio Fernandez
2008-02-07 15:04     ` Jon Harrop
     [not found] ` <854c25eb0802070607t4b2ed641xa3bc19e8384ce23@mail.gmail.com>
2008-02-07 14:10   ` Ilmari Heikkinen
2008-02-07 14:08     ` Jon Harrop

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).