caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] Some clarifications to the language shootout page
@ 2002-11-25  9:01 Siegfried Gonzi
  2002-11-25 10:23 ` Pierre Weis
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Siegfried Gonzi @ 2002-11-25  9:01 UTC (permalink / raw)
  To: caml-list

Introduction
------------

The language shootout page has some nice examples where Bigloo and OCaml
stacked up well when compared to C. But I think this is only valid for
the micro benchmarks. Take the matrix-matrix multiplication (MM) for
example. Though, rarely ever will you be in dire need for a
matrix-matrix multiplication, because there exists better methods for
solving problems in linear algebra. However, the matrix-matrix
multiplication is useful to check some facts.

Method
------

I downloaded the codes for the MM from the language shootout page and
adapted the examples as follows: The overall code scaffolding has been
left as it was with the original version on the language shootout page:

a) The dimension is increased to 512x512
b) The run is just performed for one MM
c) The arrays are double precision


Results and Discussion
-----------------------

Though, the published MM codes at the language shootout page do not use
any sophisticated constructs like loop unrolling or the like, they can
nevertheless be used because they are consistent within their specific
programming language namely C, C++, Scheme (Bigloo), OCaml, Python and
Fortran 90/95. However, I have assumed that the posted codes there are
posted by masters of their specific language (not every code has been
written by the owner of the language shootout page).

The code can be found at the end of this post. It is easy to copy and
paste it for own tests: memory has been observed via "top":

compile
			real/usr/sys  		memory
--------------------------------------------------------------
bigloo -Obench bench.scm	22s/22s/0.1s		6%
bigloo -Obench bench.scm	1m30s/1m30s/0.3s	30%	
ocamlopt bench.ml		38s/38s/0.1s		3%
g++ -O6 bench.c			8s/8s/0.1s		3%
g++ -O6 bench.cpp               8.5s/8.5s/0.1s          5%
python bench.py                 4m16s/4m16s/0.1s        6%
ifc -O3 bench.f90               13s/13s/0.04s           3%

[suffixes: scm...Scheme (Bigloo); ml...OCaml; c...gnu C; cpp...gnu C++;
py...Python; f90...Fortran 90/95]
	
[Machine: laptop Celeron 1000MHz, RAM 256MB, SuSE Linux 8.0, gcc, bigloo
2.5a, ocaml 3.04, Python 2.2, Intel Fortran 90/95]

The second Bigloo version uses ordinary "+" and "-" operators. The first
version uses Bigloo's operators "+fl", "+fx",... It is incomprehensible
why the second ordinary Bigloo version uses that much on memory! For a
dimension of 1024x1024 the overall timing picture is the same, except
that the normal Bigloo version consumes 80% of the memory as compared to
the C version which consumes 20%.

Especially important is the fact that the Python version stacks up very
well; you may not forget Python is just interpreted and not compiled.

The elegant C++ version is of interest too, because it relies on the
vector container and it is possible to completely avoid the awful "*".

Some note to the Fortran version: Fortran starts at 1 (C starts at 0)
when indexing arrays; and the first index runs faster (in C the second
index runs faster). Reading from left to right is more natural (index
wise), so using C arrays is way more pleasant. Bigloo uses C's array
indexing scheme. I hate Fortran for this array indexing nightmare and
hope I didn't mix things up because I was a little bit surprised that C
is faster than Fortran 90. The Fortran 90/95 compiler stems from Intel
and is the one and only free  Fortran 90/95 compiler available for Linux (go to the
Intel site and download the 50MB if you need a Fortran 90/95 compiler).
Nowadays times are over were Fortran is that much faster than C/C++ on
ordinary home PCs. Computing in Science and Engineering had once a nice
article (Theurich, G., et al., 2001: "Making the Fortran-To-C
Transition: How Painful is it Really?", January/February 2001, p.21)
about translating  a 20,000 lines of code program (in the field of
chemistry) from Fortran 77 to C. Everybody would have excpected that the
C version runs slower than the original Fortran version, but it turned
out that the C version runs even slightly faster than the Fortran version.


Before you scream:

a) I am sure there exists an ocaml option for further optimizations;
"ocamlopt --help" did not unveil it.

b) The Bigloo (Scheme) version could surely be improved but how? /I mean
not the code scheme itslef, because this should remain consistent as
compared to the other code versions./ Fore example transposing one array
is not allowed, because this would not only reduce Bigloo's execution
time but also the time of the C version would be reduced by a factor
of 2. Surely, there exists the caveat: But mum told me, we should
not use the same algorithms for every programming language. But mum
didn't tell  that often the reality does not let you do so. And in
addition mum did not tell you that you are not a Forth programmer.

c) I post it here for soliciting comments. I could have gone to bed to
construe my own explanations and theories. But this would likely result
in a desaster, because most of the time it isn't how I would like that
it is.

The original language shootout page results on the MM are misleading:

a) Pure integers are hardly ever used in reality
b) Problems often do not fit into the cache
c) A comparsion of the FFT would be more interesting


Conclusion
----------

Nevertheless Bigloo remains still a very decent compiler in my opinion,
and I do not regret that I have made the transition from Python (which
is one of the most confusing programming languages out there) to Scheme
speak Bigloo.
People should be more wary that consulting the language shootout page
can result in misleading conclusions; as you know: benchmarking is black
art.

S. Gonzi
APPENDIX:
===============================
Normal Bigloo version
original v.: (C) M. Serrano
usage: bigloo -Obench bench.scm
       time ./a.out
===============================
(module bench-matrix)


(define (mkmatrix rows cols)
  (let ((mx (make-vector rows 0.0))
	(count 1.0))
    (do ((i 0 (+ i 1)))
	((= i rows))
      (let ((row (make-vector cols 0.0)))
	(do ((j 0 (+ j 1)))
	    ((= j cols))
	  (vector-set! row j count)
	  (set! count (+ count 1.0)))
	(vector-set! mx i row)))
    mx))

(define (num-cols mx)
  (let ((row (vector-ref mx 0)))
    (vector-length row)))

(define (num-rows mx)
  (vector-length mx))

(define (mmult rows cols m1 m2)
  (let ((m3 (make-vector rows 0.0)))
    (do ((i 0 (+ 1 i)))
	((= i rows))
      (let ((m1i (vector-ref m1 i))
	    (row (make-vector cols 0.0)))
	(do ((j 0 (+ 1 j)))
	    ((= j cols))
	  (let ((val 0.0))
	    (do ((k 0 (+ 1 k)))
		((= k cols))
	      (set! val (+ val (* (vector-ref m1i k)
				  (vector-ref (vector-ref m2 k) j)))))
	    (vector-set! row j val)))
	(vector-set! m3 i row)))
    m3))


(define (do-main size)
  (let ((mm 0.0)
	(m1 (mkmatrix size size))
	(m2 (mkmatrix size size)))
    (set! mm (mmult size size m1 m2))
    (let ((r0 (vector-ref mm 0))
	  (r2 (vector-ref mm 2))
	  (r3 (vector-ref mm 3))
	  (r4 (vector-ref mm 4)))
      (print (vector-ref r0 0) " " (vector-ref r2 3) " "
		  (vector-ref r3 2) " " (vector-ref r4 4)))))

(do-main 512)
====

===============================
Bigloo version based on +fl,...
original v.: (C) M. Serrano
usage: bigloo -Obench bench.scm
       time ./a.out
===============================
(module bench-matrix
	(option (set! *genericity* #f)))


(define (mkmatrix rows cols)
  (let ((mx (make-vector rows 0.0))
	(count 1.0))
    (do ((i 0 (+fx i 1)))
	((=fx i rows))
      (let ((row (make-vector cols 0.0)))
	(do ((j 0 (+fx j 1)))
	    ((=fx j cols))
	  (vector-set! row j count)
	  (set! count (+fl count 1.0)))
	(vector-set! mx i row)))
    mx))

(define (num-cols mx)
  (let ((row (vector-ref mx 0)))
    (vector-length row)))

(define (num-rows mx)
  (vector-length mx))

(define (mmult rows cols m1 m2)
  (let ((m3 (make-vector rows 0.0)))
    (do ((i 0 (+fx 1 i)))
	((=fx i rows))
      (let ((m1i (vector-ref m1 i))
	    (row (make-vector cols 0.0)))
	(do ((j 0 (+fx 1 j)))
	    ((=fx j cols))
	  (let ((val 0.0))
	    (do ((k 0 (+fx 1 k)))
		((=fx k cols))
	      (set! val (+fl val (*fl (vector-ref m1i k)
				  (vector-ref (vector-ref m2 k) j)))))
	    (vector-set! row j val)))
	(vector-set! m3 i row)))
    m3))


(define (do-main size)
  (let ((mm 0.0)
	(m1 (mkmatrix size size))
	(m2 (mkmatrix size size)))
    (set! mm (mmult size size m1 m2))
    (let ((r0 (vector-ref mm 0))
	  (r2 (vector-ref mm 2))
	  (r3 (vector-ref mm 3))
	  (r4 (vector-ref mm 4)))
      (print (vector-ref r0 0) " " (vector-ref r2 3) " "
		  (vector-ref r3 2) " " (vector-ref r4 4)))))

(do-main 512)
====


=========================
Ocaml version
original v.: (C) M. Mottl
usage: 	ocamlopt bench.ml
        time ./a.out
=========================
let mkmatrix rows cols =
  let count = ref 1 and last_col = cols - 1
  and m = Array.make_matrix rows cols 0.0 in
  for i = 0 to rows - 1 do
    let mi = m.(i) in
    for j = 0 to last_col do
        mi.(j) <- float_of_int(!count);
        incr count done;
  done;
  m

let rec inner_loop k v m1i m2 j =
  if k < 0 then v
  else inner_loop (k - 1) (v +. m1i.(k) *. m2.(k).(j)) m1i m2 j

let mmult rows cols m1 m2 m3 =
  let last_col = cols - 1 and last_row = rows - 1 in
  for i = 0 to last_row do
    let m1i = m1.(i) and m3i = m3.(i) in
    for j = 0 to last_col do m3i.(j) <- inner_loop last_row 0.0 m1i m2 j
done;
  done

let size = 512
let m1 = mkmatrix size size and m2 = mkmatrix size size
let m3 = Array.make_matrix size size 0.0;;
mmult size size m1 m2 m3;
Printf.printf "%f %f %f %f\n" m3.(0).(0) m3.(2).(3) m3.(3).(2) m3.(4).(4)
====

=======================
C version
usage: 	g++ -O6 bench.c
	time ./a.out
=======================
#include <iostream>
#include <stdlib.h>

using namespace std;

double **mkmatrix(int rows, int cols) {
    int i, j, count = 1;
    double **m = (double **) malloc(rows * sizeof(double *));
    for (i=0; i<rows; i++) {
	m[i] = (double *) malloc(cols * sizeof(double));
	for (j=0; j<cols; j++) {
	    m[i][j] = count++;
	}
    }
    return(m);
}

void zeromatrix(int rows, int cols, double **m) {
    int i, j;
    for (i=0; i<rows; i++)
	for (j=0; j<cols; j++)
	    m[i][j] = 0;
}

void freematrix(int rows, double **m) {
    while (--rows > -1) { free(m[rows]); }
    free(m);
}

double **mmult(int rows, int cols, double **m1, double **m2, double **m3) {
    int i, j, k;
    double val;
    for (i=0; i<rows; i++) {
	for (j=0; j<cols; j++) {
	    val = double(0.0);
	    for (k=0; k<cols; k++) {
		val += m1[i][k] * m2[k][j];
	    }
	    m3[i][j] = val;
	}
    }
    return(m3);
}

int main()
{
    int i, SIZE=512;

    double **m1 = mkmatrix(SIZE, SIZE);
    double **m2 = mkmatrix(SIZE, SIZE);
    double **mm = mkmatrix(SIZE, SIZE);

    mm = mmult(SIZE, SIZE, m1, m2, mm);

    cout << mm[0][0] << " " << mm[2][3] << " " << mm[3][2] << " " <<
mm[4][4] << endl;

    freematrix(SIZE, m1);
    freematrix(SIZE, m2);
    freematrix(SIZE, mm);
    return(0);
}
====

=========================
C++ version
(C) S. Gonzi
usage: 	g++ -O6 bench.cpp
	time ./a.out
=========================
#include <iostream>
#include <vector>

using namespace std;

vector<vector<double> > mkmatrix(int rows, int cols)
{
    double counter=double(0.0);
    vector<vector<double> > m;


    for(int i=0; i<rows; i++)
    {
	vector<double> row_vec;
	for(int j=0; j<cols; j++)
	{
	    counter = counter + 1.0;
	    row_vec.push_back(counter);
	}
	m.push_back(row_vec);
    }
	return(m);
}




vector<vector<double> > mmult(int rows, int cols,
				vector<vector<double> > m1,
			      	vector<vector<double> > m2,
			 	vector<vector<double> > m3)
{
    double val;
    for(int i=0; i<rows; i++)
    {
	for(int j=0; j<rows; j++)
	{
	    val = double(0.0);
	    for(int k=0; k<cols; k++)
	    {
		val += m1[i][k] * m2[k][j];
	    }
	    m3[i][j] = val;
	}
    }
    return(m3);
}


int main()
{
    int SIZE=512;

    vector<vector<double> > m1 = mkmatrix(SIZE,SIZE);
    vector<vector<double> > m2 = mkmatrix(SIZE,SIZE);
    vector<vector<double> > mm = mkmatrix(SIZE,SIZE);

    mm = mmult(SIZE,SIZE,m1,m2,mm);

    cout << mm[0][0] << " " << mm[2][3] << " " << mm[3][2] << " " <<
mm[4][4] << endl;

    return(0);
}
====


=============================
Python version
usage: 	time python  bench.py
=============================
def mkmatrix(rows, cols):
    count = 1.0
    mx = [ None ] * rows
    for i in xrange(rows):
        mx[i] = [0] * cols
        for j in xrange(cols):
            mx[i][j] = count
            count += 1.0
    return mx

def mmult(rows, cols, m1, m2):
    m3 = [ None ] * rows
    for i in xrange( rows ):
        m3[i] = [0] * cols
        for j in xrange( cols ):
            val = 0.0
            for k in xrange( cols ):
                val += m1[i][k] * m2[k][j]
            m3[i][j] = val
    return m3

size = 512

def main():
    m1 = mkmatrix(size, size)
    m2 = mkmatrix(size, size)
    mm = mmult(size, size, m1, m2)
    print mm[0][0], mm[2][3], mm[3][2], mm[4][4]

main()
====

=============================
Intel Fortran 90/95 version
(C) Siegfried Gonzi
usage: 	ifc -O3  bench.f90
        timer ./a.out
=============================
program main
  integer:: size=512
  double precision, POINTER,dimension(:,:):: m1, m2, mm

  call mkmatrix(size,size,m1)
  call mkmatrix(size,size,m2)
  call mkmatrix(size,size,mm)

  call mmult(size,size,m1,m2,mm)

  print *, mm(1,1)

  deallocate(m1)
  deallocate(m2)
  deallocate(mm)

contains

subroutine mkmatrix(rows, cols,m)
  integer, intent(in):: rows, cols
  double precision, POINTER, dimension(:,:):: m
  double precision:: counter
  integer:: i,j

  allocate(m(rows,cols))

  counter = 1.0
  do i=1,cols
     do j=1,rows
        counter = counter + 1.0
        m(j,i) = counter
     end do
  end do
end subroutine mkmatrix


subroutine mmult(rows, cols, m1, m2, m3)
  integer, intent(in):: rows, cols
  double precision, dimension(rows,cols), intent(in):: m1, m2
  double precision, dimension(rows,cols), intent(in out):: m3
  double precision:: val
  integer:: i,j,k

  do i=1,rows
     do j=1,rows
        val = 0.0
        do k=1,cols
           val = val + m1(k,i)*m2(j,k)
        end do
      m3(j,i) = val
      end do
   end do
end subroutine mmult
end program main
====








-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Some clarifications to the language shootout page
  2002-11-25  9:01 [Caml-list] Some clarifications to the language shootout page Siegfried Gonzi
@ 2002-11-25 10:23 ` Pierre Weis
  2002-11-25 13:03   ` Siegfried Gonzi
  2002-11-25 11:09 ` Siegfried Gonzi
  2002-11-26  9:43 ` Siegfried Gonzi
  2 siblings, 1 reply; 8+ messages in thread
From: Pierre Weis @ 2002-11-25 10:23 UTC (permalink / raw)
  To: Siegfried Gonzi; +Cc: caml-list

Hi,

I try to answer your post in a fair manner: I think the Caml and
Scheme code are not equivalent (hence the figures you observed). I
give the Caml code that is (in my mind) equivalent to the Scheme
code. I would be glad if you could run it on your machine to complete
your comparison (and may be modify your conclusions).

[...]
Introduction
------------

The language shootout page has some nice examples where Bigloo and
OCaml stacked up well when compared to C. But I think these
comparisons are only valid when algorithms used for different
languages are as much as possible IDENTICAL. For our family of
languages this requirement is even stronger: code should have been
translated from one language to the other as closely as possible.
Take the matrix-matrix multiplication (MM) for example [...]  (it) is
useful to check some facts. In particular, the fact that the Scheme
and Caml code for this program look very different.

Method
------

Scheme and Caml are close together in spirit and programming power
facilities (yes, I know, there are a lot of important differences, but
for trivial examples such as shoutout's ones, this two functional
languages are extremely close).

Hence, I translated the Bigloo code (the fast one that explicitely
uses +fx and +fl operators) as litterally as I could into Ocaml, and
ran the compiler and benchs.

Results and Discussion
-----------------------

I have NOT assumed that the posted codes there are posted by masters
of their specific language (because it is untrue). However, I assume
that a mere translation of the Scheme code allows a fair comparison
between the two languages.

[...]  The Caml resulting code can be found at the end of this
post. It is easy to copy and paste it for your own tests: memory has
not been observed.

The file mm.orig.ml is the original one you posted to the Caml list.
The file mm.scm.ml is the Caml code translated fron the original Scheme
code you posted to the Caml list.

Benchmarks where run on a Pentium III 750 Mhz 256 Mo Ram running
Linux. I used the latest version of the compiler (3.06). Here are the
figures:

ocamlopt mm.orig.ml
time a.out
23.04user 0.02system 0:23.21elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k 0inputs+0outputs (143major+1685minor)pagefaults 0swaps

ocamlopt mm.scm.ml
time a.out
16.49user 0.02system 0:16.56elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k 0inputs+0outputs (134major+1617minor)pagefaults 0swaps

I used the ``-benchmark'' option of ocamlopt, which is generally
assumed to be the combination of -unsage and -inline

   ocamlopt -unsafe -inline 9 <source file>

I then got:

ocamlopt -unsafe -inline 9 mm.orig.ml
time a.out
15.21user 0.04system 0:15.77elapsed 96%CPU (0avgtext+0avgdata
0maxresident)k 0inputs+0outputs (143major+1685minor)pagefaults 0swaps

ocamlopt -unsafe -inline 9 mm.scm.ml
time a.out
11.30user 0.04system 0:11.36elapsed 99%CPU (0avgtext+0avgdata
0maxresident)k 0inputs+0outputs (133major+1619minor)pagefaults 0swaps

>Before you scream:
>
> a) I am sure there exists an ocaml option for further optimizations;
> "ocamlopt --help" did not unveil it.

Yes, there is no explicit -bench flag. However, those are explicite
enough (I think)

  -inline <n>  Set aggressiveness of inlining to <n>
  -unsafe  No bounds checking on array and string access

> The second Bigloo version uses ordinary "+" and "-" operators. The first
> version uses Bigloo's operators "+fl", "+fx",... It is incomprehensible
> why the second ordinary Bigloo version uses that much on memory! For a
> dimension of 1024x1024 the overall timing picture is the same, except
> that the normal Bigloo version consumes 80% of the memory as compared to
> the C version which consumes 20%.

This is not difficult to understand: if you use the generic + or - in
Scheme the compiler cannot guess the type of operands and so it is
forced to allocate your floating point numbers (boxing).

[...]
> The original language shootout page results on the MM are misleading:
> 
> a) Pure integers are hardly ever used in reality

I'm afraid you will have a bad time to demonstrate this one.

[...]

Conclusion
----------

> Nevertheless Bigloo remains still a very decent compiler in my opinion,
> and I do not regret that I have made the transition from Python (which
> is one of the most confusing programming languages out there) to Scheme
> speak Bigloo.

Sure Bigloo is a great compiler :)

> People should be more wary that consulting the language shootout page
> can result in misleading conclusions; as you know: benchmarking is black
> art.

Right. Especially if you do not carefully TRANSLATE the code from one
language to the other, reataining the spirit and the algorithm of the
original version.

Before you scream:

- yes, I haven't run all the benchs for all the languages on my
machine, so I cannot compare properly compare. Also, I don't know how
this scales to a faster processor. May be you can add this figures to
your results ?

- yes, I know, there are 2 unused functions in the Scheme version that
I translated also into Caml; and the printf call is not strictly
comparable to the Scheme print call, but I think that this difference
is irrelevant here (not proved).

Pierre Weis.

APPENDIX:
=========================

=========================
Ocaml version
obtained from the original Scheme fast version: (C) P. Weis
usage: 	ocamlopt -unsafe -inline 9 mm_scm.ml
        time ./a.out
=========================

let make_matrix rows cols =
  let mx = Array.make rows [||] in
  let count = ref 1.0 in
  for i = 0 to rows - 1 do
    let row = Array.make cols 0.0 in
    for j = 0 to cols - 1 do
      row.(j) <- !count;
      count := !count +. 1.0
    done;
    mx.(i) <- row
  done;
  mx;;

(* Don't know why there is this dead code into the Scheme version *)
let num_cols mx =
 let row = mx.(0) in
 Array.length row;;

(* Don't know why there is this dead code into the Scheme version *)
let num_rows mx =
 Array.length mx;;

let mmult rows cols m1 m2 =
  let m3 = Array.make rows [||] in
  for i = 0 to rows - 1 do
    let m1i = m1.(i) in
    let row = Array.make cols 0.0 in
    for j = 0 to cols - 1 do
     let v = ref 0.0 in
     for k = 0 to cols - 1 do
       v := !v +. m1i.(k) *. m2.(k).(j);
     done;
     row.(j) <- !v;
    done;
    m3.(i) <- row;
  done;
  m3;;

let do_main size =
 let mm = ref [||] in
 let m1 = make_matrix size size in
 let m2 = make_matrix size size in
 mm := mmult size size m1 m2;
 let r0 = !mm.(0)
 and r2 = !mm.(2)
 and r3 = !mm.(3)
 and r4 = !mm.(4) in
 Printf.printf "%f %f %f %f\n" r0.(0) r2.(3) r3.(2) r4.(4);;

do_main 512;;
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Some clarifications to the language shootout page
  2002-11-25  9:01 [Caml-list] Some clarifications to the language shootout page Siegfried Gonzi
  2002-11-25 10:23 ` Pierre Weis
@ 2002-11-25 11:09 ` Siegfried Gonzi
  2002-11-26  9:43 ` Siegfried Gonzi
  2 siblings, 0 replies; 8+ messages in thread
From: Siegfried Gonzi @ 2002-11-25 11:09 UTC (permalink / raw)
  To: Siegfried Gonzi; +Cc: caml-list

For the sake of completeness: An update to the Fortran 90/95 version. 
This version comes close to C and even surpasses it a bit after adding a 
few lines (the overal matrix matrix multiplication scheme just remains 
valid, though). The author (thanks to François BÉREUX) gave the okay for 
posting it here:

==
Hello,

I reply to you privately about the F90 code of the benchmark you mention (because this
is only Fortran related and not Caml related, but you can quote my e-mail if you want):

there is a primitive in the F90 language called matmul which precisely does dense
matrix-matrix multiplication, and which is pretty well optimized.

I ran the test on a Sun Solaris Ultra SParc III 600 MHz , with the compiler suite
WorkShop 6.2, and I got the following timings :

1) your C version (compiled with the -fast option) :
2.4s/2.02s/0.03s

2) your F90 version (compiled with the -fast option) :
5.2s/4.4s/0.02s

3)a F90 version using matmul (compiled with the -fast option ):
0.8s/0.7s/0.06s

I guess a similar conclusion will hold on your platform.

Now, if you do not want to use the matmul or dot_product primitives and would like to
program in a F77 way, the following code for mmult is better suited than yours:

subroutine mmult(m, n, a, b, c)
integer, intent(in) :: m,n
double precision, dimension(:,:), intent(in) :: a,b
double precision, dimension(:,:), intent(out) :: c
integer :: i,j,k
!! initialize c by 0.
do j = 1, n
   do i = 1, m
      c(i,j) = 0.0
   enddo
enddo
!!
do k = 1, n
   do j = 1, n
      do i =1, m
         c(i,j) = c(i,j) + a(i,k)*b(k,j)
      enddo
   enddo
enddo
!!
end subroutine mmult

4)a F90 version with an improved mmult (compiled with the -fast option ):
2.8s/1.8s/0.07s
which is faster than the C version.

Best regards,

F. Béreux

PS : please find enclosed the F90 code using matmul:

==========
version using matmul
==========
program main
  integer:: size=512
  double precision, POINTER,dimension(:,:):: m1, m2, mm

  call mkmatrix(size,size,m1)
  call mkmatrix(size,size,m2)
  call mkmatrix(size,size,mm)

  mm = matmul(m1,m2)

  print *, mm(1,1)

  deallocate(m1)
  deallocate(m2)
  deallocate(mm)

contains

subroutine mkmatrix(rows, cols,m)
  integer, intent(in):: rows, cols
  double precision, POINTER, dimension(:,:):: m
  double precision:: counter
  integer:: i,j

  allocate(m(rows,cols))

  counter = 1.0
  do i=1,cols
     do j=1,rows
        counter = counter + 1.0
        m(j,i) = counter
     end do
  end do
end subroutine mkmatrix

end program main
===============================


S. Gonzi



-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Some clarifications to the language shootout page
  2002-11-25 10:23 ` Pierre Weis
@ 2002-11-25 13:03   ` Siegfried Gonzi
  0 siblings, 0 replies; 8+ messages in thread
From: Siegfried Gonzi @ 2002-11-25 13:03 UTC (permalink / raw)
  To: Pierre Weis; +Cc: caml-list

Pierre Weis wrote:

>Hi,
>
>I try to answer your post in a fair manner: I think the Caml and
>Scheme code are not equivalent (hence the figures you observed). I
>give the Caml code that is (in my mind) equivalent to the Scheme
>code. I would be glad if you could run it on your machine to complete
>your comparison (and may be modify your conclusions).
>

Pierre Weis translated the Bigloo version to:

====

=========================
Ocaml version
obtained from the original Scheme fast version: (C) P. Weis
usage: 	ocamlopt -unsafe -inline 9 mm_scm.ml
        time ./a.out
=========================

let make_matrix rows cols =
  let mx = Array.make rows [||] in
  let count = ref 1.0 in
  for i = 0 to rows - 1 do
    let row = Array.make cols 0.0 in
    for j = 0 to cols - 1 do
      row.(j) <- !count;
      count := !count +. 1.0
    done;
    mx.(i) <- row
  done;
  mx;;

(* Don't know why there is this dead code into the Scheme version *)
let num_cols mx =
 let row = mx.(0) in
 Array.length row;;

(* Don't know why there is this dead code into the Scheme version *)
let num_rows mx =
 Array.length mx;;

let mmult rows cols m1 m2 =
  let m3 = Array.make rows [||] in
  for i = 0 to rows - 1 do
    let m1i = m1.(i) in
    let row = Array.make cols 0.0 in
    for j = 0 to cols - 1 do
     let v = ref 0.0 in
     for k = 0 to cols - 1 do
       v := !v +. m1i.(k) *. m2.(k).(j);
     done;
     row.(j) <- !v;
    done;
    m3.(i) <- row;
  done;
  m3;;

let do_main size =
 let mm = ref [||] in
 let m1 = make_matrix size size in
 let m2 = make_matrix size size in
 mm := mmult size size m1 m2;
 let r0 = !mm.(0)
 and r2 = !mm.(2)
 and r3 = !mm.(3)
 and r4 = !mm.(4) in
 Printf.printf "%f %f %f %f\n" r0.(0) r2.(3) r3.(2) r4.(4);;

do_main 512;;
====

This new situation ends up in:

I did things on my stationary machine: Pentium II 450MHz, 256MB RAM, 
SuSE Linux 8.0

bigloo -Obench bench.scm : 23s/23s/0.1s
g++ -O6 bench.scm: 7s/7s/0.03s
ocamlopt -unsafe -inline 9 bench1.ml: 18s/18s/0.01
;; Weis's version:
ocamlopt unsafe -inline 9 bench2.ml : 14s/14s/0.01


Maybe the Scheme version could a little bit tweaked too. I am not sure.

One short note: The Scheme version is portable from one Scheme 
implementation to the next and one could write a small macro which 
decides whether to use normal Scheme operators or Bigloo's native ones.


S. Gonzi

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Some clarifications to the language shootout page
  2002-11-25  9:01 [Caml-list] Some clarifications to the language shootout page Siegfried Gonzi
  2002-11-25 10:23 ` Pierre Weis
  2002-11-25 11:09 ` Siegfried Gonzi
@ 2002-11-26  9:43 ` Siegfried Gonzi
  2002-11-26 13:22   ` Pierre Weis
  2 siblings, 1 reply; 8+ messages in thread
From: Siegfried Gonzi @ 2002-11-26  9:43 UTC (permalink / raw)
  To: caml-list; +Cc: Siegfried Gonzi

[-- Attachment #1: Type: text/plain, Size: 2442 bytes --]


> Hi:



The Bigloo boss Manuel  tweaked the MM a little bit (2 little changes 
but with significant consequences):

The Bigloo code can  be reduced to 17s/17/s0.1s

This  is still a little bit higher than the OCaml version but it is damn 
close to within the factor of 2 to C.

Compile it with:

bigloo -v2 -Obench bench.scm -copt "-O3 -fomit-frame-pointer"


Regards,
S. Gonzi


====
Just one more thing, I have changed the Scheme code a very little bit to

make it more similar to the Caml code. That is, I have changed the 
construction of the row vectors. The original initialization 

  (mx (make-vector rows 0.0))

was breaking the CFA because mx what not 
correctly typed vector of vector of double. I have replaced it with:

  (mx (make-vector rows (make-vector 0 0.0)))

Then, you should compile with:

  bigloo -v2 -Obench foo.scm  -copt "-O3 -fomit-frame-pointer"

On my machine with this two modifications I see an improvement of about 17%.
It is still insufficient to defeat Ocaml but we are getting a little closer  [:-)] 

-- 
Manuel

-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
(module bench-matrix
(option (set! *genericity* #f))
(main main))


(define (mkmatrix rows cols)
(let ((mx (make-vector rows (make-vector 0 0.0)))
(count 1.0))
(do ((i 0 (+fx i 1)))
((=fx i rows))
(let ((row (make-vector cols 0.0)))
(do ((j 0 (+fx j 1)))
((=fx j cols))
(vector-set! row j count)
(set! count (+fl count 1.0)))
(vector-set! mx i row)))
mx))

(define (num-cols mx)
(let ((row (vector-ref mx 0)))
(vector-length row)))

(define (num-rows mx)
(vector-length mx))

(define (mmult rows cols m1 m2)
(let ((m3 (make-vector rows (make-vector 0 0.0))))
(do ((i 0 (+fx 1 i)))
((=fx i rows))
(let ((m1i (vector-ref m1 i))
(row (make-vector cols 0.0)))
(do ((j 0 (+fx 1 j)))
((=fx j cols))
(let ((val 0.0))
(do ((k 0 (+fx 1 k)))
((=fx k cols))
(set! val
(+fl val (*fl (vector-ref m1i k)
(vector-ref (vector-ref m2 k) j)))))
(vector-set! row j val)))
(vector-set! m3 i row)))
m3))


(define (do-main size)
(let* ((m1 (mkmatrix size size))
(m2 (mkmatrix size size))
(mm (mmult size size m1 m2)))
(let ((r0 (vector-ref mm 0))
(r2 (vector-ref mm 2))
(r3 (vector-ref mm 3))
(r4 (vector-ref mm 4)))
(print (vector-ref r0 0) " " (vector-ref r2 3) " "
(vector-ref r3 2) " " (vector-ref r4 4)))))

(define (main x)
(do-main 512))
-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----
====






^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Some clarifications to the language shootout page
  2002-11-26  9:43 ` Siegfried Gonzi
@ 2002-11-26 13:22   ` Pierre Weis
  0 siblings, 0 replies; 8+ messages in thread
From: Pierre Weis @ 2002-11-26 13:22 UTC (permalink / raw)
  To: Siegfried Gonzi; +Cc: caml-list, siegfried.gonzi

Hi list,

Here is the conclusion of our mail exchange with Manuel:

> ====
> Just one more thing, I have changed the Scheme code a very little bit to
> 
> make it more similar to the Caml code. That is, I have changed the 
> construction of the row vectors. The original initialization 
> 
>   (mx (make-vector rows 0.0))
> 
> was breaking the CFA because mx what not 
> correctly typed vector of vector of double. I have replaced it with:
> 
>   (mx (make-vector rows (make-vector 0 0.0)))
> 
> Then, you should compile with:
> 
>   bigloo -v2 -Obench foo.scm  -copt "-O3 -fomit-frame-pointer"
> 
> On my machine with this two modifications I see an improvement of about 17%.
> It is still insufficient to defeat Ocaml but we are getting a little closer  [:-)] 
> 
> -- 
> Manuel

Hi Manuel,

As you should have noticed, I did not try to optimize anything in the
Caml code: I just tried a litteral translation from Scheme to Caml. In
effect, the original Caml code was not optimally clear (difficult to
read and difficult to compile) and was compared to a (relatively)
clear Scheme code. Mere translation of the Scheme code to Caml just
improves the runtime by a factor of 2 or so. As you surely know
already, I was obliged to change the initialization, since

   (mx (make-vector rows 0.0))

is ill-typed in Caml.

This is amazing that now this well-typed Caml version, once translated
back to Scheme, improves the original Scheme version!

I think that this proves, once more, that our languages are not as
different as people think they are :)

Cheers,

Pierre Weis

INRIA, Projet Cristal, Pierre.Weis@inria.fr, http://pauillac.inria.fr/~weis/


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Some clarifications to the language shootout page
  2002-11-27 20:50 isaac gouy
@ 2002-11-29  8:32 ` Siegfried Gonzi
  0 siblings, 0 replies; 8+ messages in thread
From: Siegfried Gonzi @ 2002-11-29  8:32 UTC (permalink / raw)
  To: caml-list; +Cc: isaac gouy

isaac gouy wrote:

>Although the Great Language Shootout is no longer
>updated, your new faster imlementations can be sent to
>the Great Win32 Language Shootout:
>
>http://dada.perl.it/shootout/index.html
>
I am not sure how the original language shootout page obtained their 
results, but the new code improves the situation dramatically (I did 
just test double precision because I do not see any sense in using 
integers):

The more or less original language shootout page benchmark 
(dimension=30, repetitions=300): Celeron 1000MHz, RAM 256MB, SuSE 8.0 Linux:

Bigloo: 0.16/0.14/0.01s
OCAML (Weis): 0.12/0.12/0.0s
OCAML (Mottl): 0.6/0.6/0.0s
C: 0.06/0.06/0.0s

for 600 repetitions:

Bigloo: 0.3/0.3/0.02s
OCAML (Weis): 0.23/0.23/0.0s
OCAML (Mottl): 1.1/1.1/0.01s
C: 0.11/0.11/0.01s

I think under best circumstances an OCaml code can come close to C by a 
factor of 2 and Bigloo can come close to C by a factor of 3 if you use 
it for numerically intense calculations . In reality the situation is 
often better. I have a Bigloo and equivalent C++ program which extracts 
numbers from text files. And reading in a  50MB file takes in Bigloo 
(note: ordinary Scheme types and not native Bigloo ones)  just the same 
time (30 seconds) as in C++.

I would always prefer Scheme (Bigloo) because fiddling with types is 
always required by the OCaml compiler (it can be as awful as with Clean) 
whereas in Bigloo you will have to resort to type issues when you 
strieve for speed (this is often a very small part of the code).


S. Gonzi


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Caml-list] Some clarifications to the language shootout page
@ 2002-11-27 20:50 isaac gouy
  2002-11-29  8:32 ` Siegfried Gonzi
  0 siblings, 1 reply; 8+ messages in thread
From: isaac gouy @ 2002-11-27 20:50 UTC (permalink / raw)
  To: caml-list

Although the Great Language Shootout is no longer
updated, your new faster imlementations can be sent to
the Great Win32 Language Shootout:

http://dada.perl.it/shootout/index.html

__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2002-11-29  8:33 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-11-25  9:01 [Caml-list] Some clarifications to the language shootout page Siegfried Gonzi
2002-11-25 10:23 ` Pierre Weis
2002-11-25 13:03   ` Siegfried Gonzi
2002-11-25 11:09 ` Siegfried Gonzi
2002-11-26  9:43 ` Siegfried Gonzi
2002-11-26 13:22   ` Pierre Weis
2002-11-27 20:50 isaac gouy
2002-11-29  8:32 ` Siegfried Gonzi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).