From mboxrd@z Thu Jan  1 00:00:00 1970
Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id LAA14971; Mon, 25 Nov 2002 11:23:19 +0100 (MET)
X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id LAA15023 for <caml-list@pauillac.inria.fr>; Mon, 25 Nov 2002 11:23:18 +0100 (MET)
Received: from pauillac.inria.fr (pauillac.inria.fr [128.93.11.35])
	by concorde.inria.fr (8.11.1/8.11.1) with ESMTP id gAPANCX27904;
	Mon, 25 Nov 2002 11:23:12 +0100 (MET)
Received: (from weis@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id LAA14987; Mon, 25 Nov 2002 11:23:12 +0100 (MET)
From: Pierre Weis <pierre.weis@inria.fr>
Message-Id: <200211251023.LAA14987@pauillac.inria.fr>
Subject: Re: [Caml-list] Some clarifications to the language shootout page
In-Reply-To: <3DE1E6CD.2020906@kfunigraz.ac.at> from Siegfried Gonzi at "Nov 25, 102 10:01:01 am"
To: siegfried.gonzi@kfunigraz.ac.at (Siegfried Gonzi)
Date: Mon, 25 Nov 2002 11:23:12 +0100 (MET)
Cc: caml-list@inria.fr
X-Mailer: ELM [version 2.4ME+ PL28 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-caml-list@pauillac.inria.fr
Precedence: bulk

Hi,

I try to answer your post in a fair manner: I think the Caml and
Scheme code are not equivalent (hence the figures you observed). I
give the Caml code that is (in my mind) equivalent to the Scheme
code. I would be glad if you could run it on your machine to complete
your comparison (and may be modify your conclusions).

[...]
Introduction
------------

The language shootout page has some nice examples where Bigloo and
OCaml stacked up well when compared to C. But I think these
comparisons are only valid when algorithms used for different
languages are as much as possible IDENTICAL. For our family of
languages this requirement is even stronger: code should have been
translated from one language to the other as closely as possible.
Take the matrix-matrix multiplication (MM) for example [...]  (it) is
useful to check some facts. In particular, the fact that the Scheme
and Caml code for this program look very different.

Method
------

Scheme and Caml are close together in spirit and programming power
facilities (yes, I know, there are a lot of important differences, but
for trivial examples such as shoutout's ones, this two functional
languages are extremely close).

Hence, I translated the Bigloo code (the fast one that explicitely
uses +fx and +fl operators) as litterally as I could into Ocaml, and
ran the compiler and benchs.

Results and Discussion
-----------------------

I have NOT assumed that the posted codes there are posted by masters
of their specific language (because it is untrue). However, I assume
that a mere translation of the Scheme code allows a fair comparison
between the two languages.

[...]  The Caml resulting code can be found at the end of this
post. It is easy to copy and paste it for your own tests: memory has
not been observed.

The file mm.orig.ml is the original one you posted to the Caml list.
The file mm.scm.ml is the Caml code translated fron the original Scheme
code you posted to the Caml list.

Benchmarks where run on a Pentium III 750 Mhz 256 Mo Ram running
Linux. I used the latest version of the compiler (3.06). Here are the
figures:

ocamlopt mm.orig.ml
time a.out
23.04user 0.02system 0:23.21elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k 0inputs+0outputs (143major+1685minor)pagefaults 0swaps

ocamlopt mm.scm.ml
time a.out
16.49user 0.02system 0:16.56elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k 0inputs+0outputs (134major+1617minor)pagefaults 0swaps

I used the ``-benchmark'' option of ocamlopt, which is generally
assumed to be the combination of -unsage and -inline

   ocamlopt -unsafe -inline 9 <source file>

I then got:

ocamlopt -unsafe -inline 9 mm.orig.ml
time a.out
15.21user 0.04system 0:15.77elapsed 96%CPU (0avgtext+0avgdata
0maxresident)k 0inputs+0outputs (143major+1685minor)pagefaults 0swaps

ocamlopt -unsafe -inline 9 mm.scm.ml
time a.out
11.30user 0.04system 0:11.36elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k 0inputs+0outputs (133major+1619minor)pagefaults 0swaps

>Before you scream:
>
> a) I am sure there exists an ocaml option for further optimizations;
> "ocamlopt --help" did not unveil it.

Yes, there is no explicit -bench flag. However, those are explicite
enough (I think)

  -inline <n>  Set aggressiveness of inlining to <n>
  -unsafe  No bounds checking on array and string access

> The second Bigloo version uses ordinary "+" and "-" operators. The first
> version uses Bigloo's operators "+fl", "+fx",... It is incomprehensible
> why the second ordinary Bigloo version uses that much on memory! For a
> dimension of 1024x1024 the overall timing picture is the same, except
> that the normal Bigloo version consumes 80% of the memory as compared to
> the C version which consumes 20%.

This is not difficult to understand: if you use the generic + or - in
Scheme the compiler cannot guess the type of operands and so it is
forced to allocate your floating point numbers (boxing).

[...]
> The original language shootout page results on the MM are misleading:
> 
> a) Pure integers are hardly ever used in reality

I'm afraid you will have a bad time to demonstrate this one.

[...]

Conclusion
----------

> Nevertheless Bigloo remains still a very decent compiler in my opinion,
> and I do not regret that I have made the transition from Python (which
> is one of the most confusing programming languages out there) to Scheme
> speak Bigloo.

Sure Bigloo is a great compiler :)

> People should be more wary that consulting the language shootout page
> can result in misleading conclusions; as you know: benchmarking is black
> art.

Right. Especially if you do not carefully TRANSLATE the code from one
language to the other, reataining the spirit and the algorithm of the
original version.

Before you scream:

- yes, I haven't run all the benchs for all the languages on my
machine, so I cannot compare properly compare. Also, I don't know how
this scales to a faster processor. May be you can add this figures to
your results ?

- yes, I know, there are 2 unused functions in the Scheme version that
I translated also into Caml; and the printf call is not strictly
comparable to the Scheme print call, but I think that this difference
is irrelevant here (not proved).

Pierre Weis.

APPENDIX:
=========================

=========================
Ocaml version
obtained from the original Scheme fast version: (C) P. Weis
usage: 	ocamlopt -unsafe -inline 9 mm_scm.ml
        time ./a.out
=========================

let make_matrix rows cols =
  let mx = Array.make rows [||] in
  let count = ref 1.0 in
  for i = 0 to rows - 1 do
    let row = Array.make cols 0.0 in
    for j = 0 to cols - 1 do
      row.(j) <- !count;
      count := !count +. 1.0
    done;
    mx.(i) <- row
  done;
  mx;;

(* Don't know why there is this dead code into the Scheme version *)
let num_cols mx =
 let row = mx.(0) in
 Array.length row;;

(* Don't know why there is this dead code into the Scheme version *)
let num_rows mx =
 Array.length mx;;

let mmult rows cols m1 m2 =
  let m3 = Array.make rows [||] in
  for i = 0 to rows - 1 do
    let m1i = m1.(i) in
    let row = Array.make cols 0.0 in
    for j = 0 to cols - 1 do
     let v = ref 0.0 in
     for k = 0 to cols - 1 do
       v := !v +. m1i.(k) *. m2.(k).(j);
     done;
     row.(j) <- !v;
    done;
    m3.(i) <- row;
  done;
  m3;;

let do_main size =
 let mm = ref [||] in
 let m1 = make_matrix size size in
 let m2 = make_matrix size size in
 mm := mmult size size m1 m2;
 let r0 = !mm.(0)
 and r2 = !mm.(2)
 and r3 = !mm.(3)
 and r4 = !mm.(4) in
 Printf.printf "%f %f %f %f\n" r0.(0) r2.(3) r3.(2) r4.(4);;

do_main 512;;
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners