caml-list - the Caml user's mailing list
* [Caml-list] Why is some code compiled with 4.04.0 or 4.05.0 running 2.3 times slower than the same code compiled with 4.03.0?
@ 2016-10-30  6:42 Berke Durak
  2016-10-30 16:49 ` David Allsopp
  0 siblings, 1 reply; 4+ messages in thread
From: Berke Durak @ 2016-10-30  6:42 UTC (permalink / raw)
  To: caml-list

I compiled and ran a simple matrix-multiplication-style benchmark program
written in imperative style with Bigarrays (see below) using versions
4.01.0, 4.03.0, 4.03.0+flambda, 4.04.0+trunk+flambda (with -version
4.04.0+dev9-2015-09-05), 4.04.0+beta2+flambda (4.04.0+beta2) and
4.05+trunk+flambda (4.05.0+dev0-2016-08-01).

Execution time increased by a factor of 2.3 from 4.03+flambda to
4.04+trunk+flambda.  I tried a few of the newer optimization switches
(-rounds X, -unboxed-types, -unbox-closures, etc.) but that didn't make a
significant difference.

Execution times, repeatable within ~5%:

4.01.0                 5.45s
4.03.0                 4.28s
4.03.0+flambda         4.44s
4.04.0+trunk+flambda  10.45s
4.04.0+beta2+flambda  10.72s
4.05.0+trunk+flambda  10.36s
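
As a quick check of the factor quoted above, here is a minimal sketch that divides each timing by a chosen baseline. The numbers are copied from the table; the names `timings` and `ratios_vs` are made up for this illustration. Against 4.03.0+flambda, the three newer flambda builds come out at roughly 2.3 to 2.4 times slower.

```ocaml
(* Timings in seconds, copied from the table above. *)
let timings = [
  "4.01.0", 5.45;
  "4.03.0", 4.28;
  "4.03.0+flambda", 4.44;
  "4.04.0+trunk+flambda", 10.45;
  "4.04.0+beta2+flambda", 10.72;
  "4.05.0+trunk+flambda", 10.36;
]

(* Hypothetical helper: ratio of every timing to the named baseline. *)
let ratios_vs baseline =
  let base = List.assoc baseline timings in
  List.map (fun (name, t) -> (name, t /. base)) timings

(* Print one line per compiler, e.g. relative to 4.03.0+flambda. *)
let () =
  List.iter
    (fun (name, r) -> Printf.printf "%-22s %.2fx\n" name r)
    (ratios_vs "4.03.0+flambda")
```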

The test program was:

open Bigarray

let _ =
  let m, n, rep = 100, 100, 1000 in
  let cr m n = Array2.create float64 fortran_layout m n in
  let a = cr m n in
  let c = cr m m in
  let rz = ref 0.0 in
  let x = ref 0.0 in
  for r = 1 to rep do
    for i = 1 to m do
      for j = 1 to n do
        a.{i,j} <- !rz;
        rz := !rz +. 123.45;
      done
    done;
    for i = 1 to m do
      for j = 1 to m do
        x := 0.0;
        for k = 1 to n do
          x := !x +. a.{i,k} *. a.{k,i}
        done;
        c.{i,j} <- !x
      done
    done
  done
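
The message does not say how the times were measured. One possible way to take such measurements from inside the program is sketched below, using `Sys.time` from the standard library (processor time, in seconds). The helper name `time_it` and the dummy workload are assumptions for illustration, not part of the original benchmark.

```ocaml
(* Hypothetical helper: run f once, return its result and the CPU seconds used. *)
let time_it f =
  let t0 = Sys.time () in
  let result = f () in
  (result, Sys.time () -. t0)

let () =
  (* Dummy workload standing in for the benchmark body. *)
  let work () =
    let acc = ref 0.0 in
    for i = 1 to 1_000_000 do
      acc := !acc +. float_of_int i
    done;
    !acc
  in
  let total, elapsed = time_it work in
  Printf.printf "result %.0f computed in %.3fs\n" total elapsed
```

Running the same binary several times and comparing the reported times would give an idea of the run-to-run spread.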

-- 
Berke Durak, VA7OBD (CN88)


* RE: [Caml-list] Why is some code compiled with 4.04.0 or 4.05.0 running 2.3 times slower than the same code compiled with 4.03.0?
  2016-10-30  6:42 [Caml-list] Why is some code compiled with 4.04.0 or 4.05.0 running 2.3 times slower than the same code compiled with 4.03.0? Berke Durak
@ 2016-10-30 16:49 ` David Allsopp
  2016-10-31  7:37   ` Mark Shinwell
  0 siblings, 1 reply; 4+ messages in thread
From: David Allsopp @ 2016-10-30 16:49 UTC (permalink / raw)
  To: Berke Durak, caml-list

Berke Durak wrote:
> I compiled and ran a simple matrix-multiplication-style benchmark
> program written in imperative style with Bigarrays (see below) 
> using versions 4.01.0, 4.03.0, 4.03.0+flambda, 4.04.0+trunk+flambda
> (with -version 4.04.0+dev9-2015-09-05), 4.04.0+beta2+flambda 
> (4.04.0+beta2) and 4.05+trunk+flambda (4.05.0+dev0-2016-08-01).
>
> Execution time increased by a factor of 2.3 from 4.03+flambda to
> 4.04+trunk+flambda.  I tried a few of the newer optimization 
> switches (-rounds X, -unboxed-types, -unbox-closures, etc.) but
> that didn't make a significant difference. 
>
> Execution times, repeatable within ~5%:
>
> 4.01.0                5.45s
> 4.03.0                4.28s
> 4.03.0+flambda        4.44s
> 4.04.0+trunk+flambda  10.45s
> 4.04.0+beta2+flambda 10.72s
> 4.05.0+trunk+flambda  10.36s

Note that this is specifically an flambda problem: neither 4.04 nor trunk exhibits the slowdown with flambda disabled.

I've bisected and identified commit f7dcb as the problem (it was added to trunk after 4.03 was branched). The specific issue is the change in https://github.com/ocaml/ocaml/blob/trunk/bytecomp/simplif.ml#L473 disabling conversion of refs to mutable variables when flambda is enabled. If you remove the `&& Config.flambda = false` and rebuild ocamlopt, you should find your benchmark speed restored. flambda is supposed to be performing that optimisation itself, so something is clearly (very) wrong!
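
To illustrate what "conversion of refs to mutable variables" refers to, here is a small sketch; it is not the compiler's code. The function names are made up, and the description of when a ref qualifies is my understanding of the general idea rather than a statement of the exact rule in simplif.ml. In `dot_ref` the ref never leaves the function and is only read with `!` and written with `:=`, so a compiler can treat it as a plain local variable. In `make_counter` the ref is captured by a closure that outlives the call, so it has to stay a real heap cell.

```ocaml
(* Non-escaping ref, same shape as the benchmark's inner loop.
   A compiler may turn x into a local mutable variable here. *)
let dot_ref (a : float array) (b : float array) =
  let x = ref 0.0 in
  for k = 0 to Array.length a - 1 do
    x := !x +. a.(k) *. b.(k)
  done;
  !x

(* Same computation with no ref at all: the accumulator is a loop argument.
   Shown only to make clear that the ref above is just a local accumulator. *)
let dot_acc (a : float array) (b : float array) =
  let n = Array.length a in
  let rec go k acc =
    if k >= n then acc else go (k + 1) (acc +. a.(k) *. b.(k))
  in
  go 0 0.0

(* Escaping ref: c is captured by the returned closure, so it must remain
   a heap-allocated cell shared across calls. *)
let make_counter () =
  let c = ref 0 in
  fun () -> incr c; !c
```

If the ref in a loop like `dot_ref` is kept as a heap cell holding a float, each update presumably goes through memory and a boxed value instead of a register, which would be consistent with the slowdown reported for the benchmark.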

Could you open a Mantis PR for this, please?


David


* Re: [Caml-list] Why is some code compiled with 4.04.0 or 4.05.0 running 2.3 times slower than the same code compiled with 4.03.0?
  2016-10-30 16:49 ` David Allsopp
@ 2016-10-31  7:37   ` Mark Shinwell
  2016-10-31 23:16     ` Berke Durak
  0 siblings, 1 reply; 4+ messages in thread
From: Mark Shinwell @ 2016-10-31  7:37 UTC (permalink / raw)
  To: David Allsopp; +Cc: Berke Durak, caml-list

Thanks for the report and bisection.  We're looking at this.  Please
file a Mantis report in any case.

Mark


* Re: [Caml-list] Why is some code compiled with 4.04.0 or 4.05.0 running 2.3 times slower than the same code compiled with 4.03.0?
  2016-10-31  7:37   ` Mark Shinwell
@ 2016-10-31 23:16     ` Berke Durak
  0 siblings, 0 replies; 4+ messages in thread
From: Berke Durak @ 2016-10-31 23:16 UTC (permalink / raw)
  To: Mark Shinwell; +Cc: David Allsopp, caml-list

Thanks Mark & David.

Issue opened here: https://caml.inria.fr/mantis/view.php?id=7396


-- 
Berke Durak, VA7OBD (CN88)
