I get slightly different, but very consistent, timings (OCaml 4.00.0). In particular, the timings of ack1 and ack3 are swapped compared to yours:
0:04.89
0:04.89
0:03.98
0:03.98
0:05.23
I'd also expect this kind of variation to be due to code alignment.
And for fun I added ack5, which is the non-match implementation I ended up deriving from some disassembly:
let test r7 =
  let rec loop x y =
    if x = 0 then y+1
    else if y = 0 then loop (x-1) r7
    else loop (x-1) (loop x (y-1))
  in loop 4 1

let _ = test 1
I didn't compare the asm output, but it looks like this version adds a small "something".
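As a sanity check that the non-match version computes the same function, here is a minimal sketch that compares it against a plain match-based Ackermann on small inputs. It assumes the other variants compute the standard two-argument Ackermann function. The names `ack` and `ack5` and the extra `x0`/`y0` parameters are mine, added so the starting arguments are not hard-coded to 4 and 1.

```ocaml
(* Standard two-argument Ackermann function, written with match.
   Assumption: this is the function the match-based variants compute. *)
let rec ack m n =
  match m, n with
  | 0, n -> n + 1
  | m, 0 -> ack (m - 1) 1
  | m, n -> ack (m - 1) (ack m (n - 1))

(* Same shape as the if/else version above, but with the starting
   arguments exposed as x0 and y0 (hypothetical parameters). *)
let ack5 r7 x0 y0 =
  let rec loop x y =
    if x = 0 then y + 1
    else if y = 0 then loop (x - 1) r7
    else loop (x - 1) (loop x (y - 1))
  in
  loop x0 y0

(* With r7 = 1 the two definitions should agree; check a small grid. *)
let () =
  for m = 0 to 3 do
    for n = 0 to 4 do
      assert (ack5 1 m n = ack m n)
    done
  done
```

The grid is kept small on purpose: the recursion is not tail-recursive, so larger arguments get slow and deep very quickly.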