Ackermann function! That's a familiar beast, but I only thought I knew it by name (heard of it, never "saw" it). Now that I see it, it was part of a puzzle that I extracted from asm, translated into OCaml, converted to a recursive loop without global state... and ended up with almost exactly that definition. I tried optimizing it for a day... and my notes look like the Wikipedia page, now that I know what to look for. The challenge involved computing A(4,1,p), discovering 'p' which results in '6' (word-wrapped on a 15-bit virtual machine). Eventually I gave up on reducing or optimizing and memoized... churning through over 20k values of 'p' in less than an hour to find my answer.

I get slightly different, but very consistent, timing (OCaml 4.00.0). In particular, the timing of ack1 and ack3 are "toggled" compared to yours:

ack1.ml
0:04.89

ack2.ml
0:04.89

ack3.ml
0:03.98

ack4.ml
0:03.98

ack5.ml
0:05.23

I'd also expect this kind of result due to alignment.

And for fun I added ack5, which is the non-match implementation I ended up deriving from some disassembly:

let test r7 =
  let rec loop x y =
    if x = 0 then y+1
    else if y = 0 then loop (x-1) r7
    else loop (x-1) (loop x (y-1))
  in loop 4 1
let _ = test 1

I didn't compare asm outputs, but looks like this version adds a small "something".