Magesh Kannan wrote:
> Does the invocation (my_func_part 10 20) run any faster than
> (my_func_wrapper 5 10 20)?

It is comparable if you compile to bytecode, and much worse if you compile to 
native code. With native code, you may lose more than a factor of ten:

$ ./bench
my_func:         16.4 sec
my_func_wrapper: 20.2 sec
my_func_part:    17.1 sec
$ ocamlopt -inline 0 -o bench unix.cmxa bench.ml
$ ./bench
my_func:         0.6 sec
my_func_wrapper: 0.8 sec
my_func_part:    2.2 sec
$ ocamlopt -inline 10 -o bench unix.cmxa bench.ml
$ ./bench
my_func:         0.2 sec
my_func_wrapper: 0.2 sec
my_func_part:    2.3 sec

A full application of a function is optimized far better than a partial one.
With -inline 10 in particular, the function call is completely optimized 
away for the two fully applied versions, while the partially applied one
still goes through a closure.
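
For readers who want to try this themselves, here is a minimal sketch of what
such a benchmark might look like. The original bench.ml is not shown in this
thread, so the body of my_func, the iteration count, and the timing helper
below are all assumptions for illustration. The sketch uses Sys.time from the
standard library instead of the Unix module, so it needs no extra library on
the command line.

```ocaml
(* Hypothetical stand-in for the function under test; the real body
   from the original bench.ml is not known. *)
let my_func a b c = a + b * c

(* Full application: all three arguments are supplied at the call site,
   so the compiler sees a direct call to a known function. *)
let my_func_wrapper a b c = my_func a b c

(* Partial application: my_func_part is a closure built at startup
   that remembers a = 5 and waits for the remaining two arguments. *)
let my_func_part = my_func 5

(* Call f on 1..n, accumulating the results so the work is not dead
   code, and return the elapsed CPU time in seconds with the sum. *)
let time n f =
  let t0 = Sys.time () in
  let acc = ref 0 in
  for i = 1 to n do
    acc := !acc + f i
  done;
  (Sys.time () -. t0, !acc)

let () =
  let n = 1_000_000 in  (* assumed iteration count *)
  let report name f =
    let (dt, _) = time n f in
    Printf.printf "%-17s %.2f sec\n" (name ^ ":") dt
  in
  report "my_func" (fun i -> my_func 5 i 20);
  report "my_func_wrapper" (fun i -> my_func_wrapper 5 i 20);
  report "my_func_part" (fun i -> my_func_part i 20)
```

All three variants compute the same value, so any difference in the timings
comes from how the call is compiled. The absolute numbers will depend on the
machine and compiler version and need not match the figures above.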

Yours, Florian Hars.