caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* function definition
@ 2007-01-21 12:45 Vu Ngoc San
  2007-01-21 17:42 ` [Caml-list] " Jon Harrop
  0 siblings, 1 reply; 3+ messages in thread
From: Vu Ngoc San @ 2007-01-21 12:45 UTC (permalink / raw)
  To: Caml Mailing List

I'm sure this is a basic question:

what is the difference between these ways of defining a function, and 
what is the most efficient (I mean for the resulting function f = binop 
o f1 f2, which typically will be called x*1000 times)

type operator = Plus | Minus;;


let binop1 o f1 f2 =
   fun x -> match o with
	| Plus -> (f1 x) +. (f2 x)
	| Minus -> (f1 x) -. (f2 x)

let binop2 o f1 f2 =
   match o with
	| Plus -> fun x -> (f1 x) +. (f2 x)
	| Minus -> fun x -> (f1 x) -. (f2 x)

let binop3 o f1 f2 =
   let op = match o with
	| Plus ->  (+.)
	| Minus -> (-.) in
	fun x -> op (f1 x) (f2 x)

let binop4 o f1 f2 =
   fun x ->
     let op = match o with
	| Plus ->  (+.)
	| Minus -> (-.) in
       	op (f1 x) (f2 x)

Thanks for your expertise !

San



^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Caml-list] function definition
  2007-01-21 12:45 function definition Vu Ngoc San
@ 2007-01-21 17:42 ` Jon Harrop
  2007-01-23 11:44   ` Vu Ngoc San
  0 siblings, 1 reply; 3+ messages in thread
From: Jon Harrop @ 2007-01-21 17:42 UTC (permalink / raw)
  To: caml-list

On Sunday 21 January 2007 12:45, Vu Ngoc San wrote:
> I'm sure this is a basic question:
>
> what is the difference between these ways of defining a function, and
> what is the most efficient (I mean for the resulting function f = binop
> o f1 f2, which typically will be called x*1000 times)

That's a very hard question, and is probably platform specific but I'll throw 
some ideas at you off the top of my head. I'm sure someone like Brian Hurt 
will dive into the assembler and prove me wrong. ;-)

> type operator = Plus | Minus;;
>
>
> let binop1 o f1 f2 =
>    fun x -> match o with
> 	| Plus -> (f1 x) +. (f2 x)
> 	| Minus -> (f1 x) -. (f2 x)

That is equivalent to:

  let binop1 o f1 f2 x = ..

it waits for all arguments before doing anything. ocamlopt optimises currying 
for that case.

> let binop2 o f1 f2 =
>    match o with
> 	| Plus -> fun x -> (f1 x) +. (f2 x)
> 	| Minus -> fun x -> (f1 x) -. (f2 x)

After 3 args, this returns a closure waiting for the fourth arg.

> let binop3 o f1 f2 =
>    let op = match o with
> 	| Plus ->  (+.)
> 	| Minus -> (-.) in
>    fun x -> op (f1 x) (f2 x)

After 3 args, this creates a closure to do the op and another closure that 
captures the first closure. ocamlopt might inline the closure "op" but I 
doubt it.

> let binop4 o f1 f2 =
>    fun x ->
>      let op = match o with
> 	| Plus ->  (+.)
> 	| Minus -> (-.) in
>      op (f1 x) (f2 x)

This waits for all four args again (same as first case) and the closure "op" 
might be inlined.

Assuming you invoke the function will all four arguments, I would expect the 
first solution to be the fastest by a significant margin. If you factor out a 
closure of binop with its first three arguments passed and use it multiple 
times then the second solution might be faster.

I've found this with a pattern matcher written in OCaml that was faster when 
the pattern matcher evaluated when partially applied to the pattern, 
returning a closure that matched against the pattern it had been partially 
applied to. I was surprised to find that. I still don't know why that would 
be faster...

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
Objective CAML for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [Caml-list] function definition
  2007-01-21 17:42 ` [Caml-list] " Jon Harrop
@ 2007-01-23 11:44   ` Vu Ngoc San
  0 siblings, 0 replies; 3+ messages in thread
From: Vu Ngoc San @ 2007-01-23 11:44 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Thanks for the reply.

So, I have tried to do some benchmarking, and the results are surprising.

The method: a (for i=1 to ... do) loop.

I have tried first an inlined test with

ignore(binop1 Plus sin cos 1.)

(and then binop2, binop3, binop4)

Then I have defined the function

  f = binop1 Plus sin cos

and tried the loop with

  ignore(f 1.)

Here are the results
(Linux debian 2.6.17-2-686, Intel(R) Pentium(R) M processor 1300MHz)

BYTE CODE:

Inline binop1 with 100000000 iterations:58736 msec.
Inline binop2 with 100000000 iterations:59681 msec.
Inline binop3 with 100000000 iterations:67663 msec.
Inline binop4 with 100000000 iterations:68692 msec.

f (binop1) with 100000000 iterations:53514 msec.
f (binop2) with 100000000 iterations:47794 msec.
f (binop3) with 100000000 iterations:53315 msec.
f (binop4) with 100000000 iterations:59411 msec.


NATIVE CODE:

Inline binop1 with 100000000 iterations:25017 msec.
Inline binop2 with 100000000 iterations:25860 msec.
Inline binop3 with 100000000 iterations:26310 msec.
Inline binop4 with 100000000 iterations:24251 msec.

f (binop1) with 100000000 iterations:25512 msec.
f (binop2) with 100000000 iterations:25620 msec.
f (binop3) with 100000000 iterations:25554 msec.
f (binop4) with 100000000 iterations:24801 msec.


The last method (binop4) is clearly the slowest in bytcode and the 
fastest in native code !

In native code, the difference bewteen the slowest and the fastest 
reaches 8%.
This seems to me significant enough, but I don't know whether it depends 
on the system architecture or not.

Best
San



^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2007-01-23 11:44 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-01-21 12:45 function definition Vu Ngoc San
2007-01-21 17:42 ` [Caml-list] " Jon Harrop
2007-01-23 11:44   ` Vu Ngoc San

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).