I compiled and ran a simple matrix-multiplication-style benchmark, written in imperative style with Bigarrays (see below), using OCaml 4.01.0, 4.03.0, 4.03.0+flambda, 4.04.0+trunk+flambda (-version reports 4.04.0+dev9-2015-09-05), 4.04.0+beta2+flambda (4.04.0+beta2) and 4.05.0+trunk+flambda (4.05.0+dev0-2016-08-01).

Execution time increased by a factor of about 2.3 from 4.03.0+flambda (4.44s) to 4.04.0+trunk+flambda (10.45s). I tried a few of the newer optimization switches (-rounds X, -unboxed-types, -unbox-closures, etc.), but none of them made a significant difference.

Execution times, repeatable within ~5%:

4.01.0                  5.45s
4.03.0                  4.28s
4.03.0+flambda          4.44s
4.04.0+trunk+flambda   10.45s
4.04.0+beta2+flambda   10.72s
4.05.0+trunk+flambda   10.36s
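
The figures above are whole-program execution times; how they were measured is not stated here. For anyone who wants to time the loops from inside the program instead, here is a minimal sketch using Sys.time from the standard library. The helper name time_cpu and the stand-in workload are hypothetical and not part of the benchmark.

```ocaml
(* Hypothetical helper, not part of the original program: runs [f ()] and
   returns its result together with the CPU seconds it consumed. *)
let time_cpu f =
  let t0 = Sys.time () in
  let result = f () in
  let t1 = Sys.time () in
  (result, t1 -. t0)

let () =
  (* Stand-in workload; replace it with the benchmark loops. *)
  let work () =
    let s = ref 0.0 in
    for i = 1 to 1_000_000 do
      s := !s +. float_of_int i
    done;
    !s
  in
  let sum, secs = time_cpu work in
  Printf.printf "sum = %.1f, cpu time = %.3fs\n" sum secs
```

Sys.time reports processor time rather than wall-clock time, so it should be fairly insensitive to other load on the machine.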

The test program was:

open Bigarray

let _ = 
  let m, n, rep = 100, 100, 1000 in
  let cr m n = Array2.create float64 fortran_layout m n in
  let a = cr m n in
  let c = cr m m in
  let rz = ref 0.0 in
  let x = ref 0.0 in
  for r = 1 to rep do
    for i = 1 to m do
      for j = 1 to n do
        a.{i,j} <- !rz;
        rz := !rz +. 123.45;
      done
    done;
    for i = 1 to m do
      for j = 1 to m do
        x := 0.0;
        for k = 1 to n do
          x := !x +. a.{i,k} *. a.{k,i}
        done;
        c.{i,j} <- !x
      done
    done
  done
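
One diagnostic that might be worth trying (it is only an assumption that it is relevant to this slowdown): the native compiler inlines Bigarray accesses only when the element kind and layout are statically known at the access site, and otherwise falls back to a generic C call. The sketch below pins the array type with explicit annotations, so that any remaining difference between compiler versions cannot be blamed on an unspecialized access. The names mat, create_mat and row_col_sum are illustrative and do not appear in the program above.

```ocaml
open Bigarray

(* Illustrative names; not part of the original benchmark. *)
type mat = (float, float64_elt, fortran_layout) Array2.t

let create_mat m n : mat = Array2.create float64 fortran_layout m n

(* Same inner loop as the benchmark, with the array type fixed by annotation. *)
let row_col_sum (a : mat) i n =
  let x = ref 0.0 in
  for k = 1 to n do
    x := !x +. a.{i,k} *. a.{k,i}
  done;
  !x
```

If the annotated version runs at the same speed as the original on each compiler, unspecialized access is unlikely to be the cause.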

-- 
Berke Durak, VA7OBD (CN88)