Hello,

 I've spent some time scratching my head over this little puzzle.
 Consider this bog-standard Ackermann code:

let rec ack m n =
  match m, n with
  | 0,n -> n+1
  | m,0 -> ack (m-1) 1
  | m,n -> ack (m-1) (ack m (n-1))
in let _ = ack 4 1 in ()

One could also pass m and n as a tuple. Also, the call that starts the actual computation can be a toplevel let or not.
That makes 4 variants in all. Can you predict how each one will perform and what the difference (if any) in the
generated code will be?
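
To make the two axes concrete, here is a rough sketch of what the variants might look like. It is not a copy of the attached files: the names ack_tupled, ack_curried and ack_local are illustrative only, and small arguments are used so that it finishes instantly.

```ocaml
(* Tupled form: both arguments arrive as a single pair. *)
let rec ack_tupled (m, n) =
  match m, n with
  | 0, n -> n + 1
  | m, 0 -> ack_tupled (m - 1, 1)
  | m, n -> ack_tupled (m - 1, ack_tupled (m, n - 1))

(* Curried form, defined with a toplevel let. *)
let rec ack_curried m n =
  match m, n with
  | 0, n -> n + 1
  | m, 0 -> ack_curried (m - 1) 1
  | m, n -> ack_curried (m - 1) (ack_curried m (n - 1))

(* Call made from a toplevel let. *)
let _ = ack_curried 2 3

(* Same curried function, but defined and called inside one expression,
   so the function itself is not a toplevel binding of the module. *)
let ack_local_result =
  let rec ack_local m n =
    match m, n with
    | 0, n -> n + 1
    | m, 0 -> ack_local (m - 1) 1
    | m, n -> ack_local (m - 1) (ack_local m (n - 1))
  in
  ack_local 2 3
```

All three compute the same function; only the calling convention and the place where the definition and the initial call live differ.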

All the code and the Makefile are attached.

Running `make bench` here consistently gives the following (ack1, ack3: tuples; ack2, ack4: curried):

ack1.ml
0:03.85

ack2.ml
0:04.70

ack3.ml
0:04.60

ack4.ml
0:03.85

Tested with 3.12.1 and 4.00.1 (ack4 becomes slower).

Moreover, the generated assembly code for the main loop is the same, as far as I can see. The only
difference is the initialization of structure fields and the initial call to ack. Can anybody please
explain the performance difference? I understand that microbenchmarks are no basis for drawing
performance conclusions, but I cannot explain these results to myself in any meaningful way.
Please help! :)

-- 
 ygrek
 http://ygrek.org.ua