From mboxrd@z Thu Jan 1 00:00:00 1970 Received: (from majordomo@localhost) by pauillac.inria.fr (8.7.6/8.7.3) id JAA10199; Mon, 25 Nov 2002 09:10:33 +0100 (MET) X-Authentication-Warning: pauillac.inria.fr: majordomo set sender to owner-caml-list@pauillac.inria.fr using -f Received: from nez-perce.inria.fr (nez-perce.inria.fr [192.93.2.78]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id JAA10268 for ; Mon, 25 Nov 2002 09:10:32 +0100 (MET) Received: from jerry.kfunigraz.ac.at (GIGAJERRY.kfunigraz.ac.at [143.50.55.161]) by nez-perce.inria.fr (8.11.1/8.11.1) with ESMTP id gAP8AV101460 for ; Mon, 25 Nov 2002 09:10:31 +0100 (MET) Received: from tom.kfunigraz.ac.at (mc_tom [10.10.1.160]) by jerry.kfunigraz.ac.at (8.11.2/8.11.2) with ESMTP id gAP8AVm01490 for ; Mon, 25 Nov 2002 09:10:31 +0100 (MET) Received: from kfunigraz.ac.at (IGAM08AV.kfunigraz.ac.at [143.50.39.35]) by tom.kfunigraz.ac.at (8.11.2/8.11.2) with ESMTP id gAP8AVJ27651 for ; Mon, 25 Nov 2002 09:10:31 +0100 (MET) Message-ID: <3DE1E6CD.2020906@kfunigraz.ac.at> Date: Mon, 25 Nov 2002 10:01:01 +0100 From: Siegfried Gonzi User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.8) Gecko/20020204 X-Accept-Language: en-us MIME-Version: 1.0 To: caml-list@inria.fr Subject: [Caml-list] Some clarifications to the language shootout page Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-caml-list@pauillac.inria.fr Precedence: bulk Introduction ------------ The language shootout page has some nice examples where Bigloo and OCaml stacked up well when compared to C. But I think this is only valid for the micro benchmarks. Take the matrix-matrix multiplication (MM) for example. Though, rarely ever will you be in dire need for a matrix-matrix multiplication, because there exists better methods for solving problems in linear algebra. However, the matrix-matrix multiplication is useful to check some facts. Method ------ I downloaded the codes for the MM from the language shootout page and adapted the examples as follows: The overall code scaffolding has been left as it was with the original version on the language shootout page: a) The dimension is increased to 512x512 b) The run is just performed for one MM c) The arrays are double precision Results and Discussion ----------------------- Though, the published MM codes at the language shootout page do not use any sophisticated constructs like loop unrolling or the like, they can nevertheless be used because they are consistent within their specific programming language namely C, C++, Scheme (Bigloo), OCaml, Python and Fortran 90/95. However, I have assumed that the posted codes there are posted by masters of their specific language (not every code has been written by the owner of the language shootout page). The code can be found at the end of this post. It is easy to copy and paste it for own tests: memory has been observed via "top": compile real/usr/sys memory -------------------------------------------------------------- bigloo -Obench bench.scm 22s/22s/0.1s 6% bigloo -Obench bench.scm 1m30s/1m30s/0.3s 30% ocamlopt bench.ml 38s/38s/0.1s 3% g++ -O6 bench.c 8s/8s/0.1s 3% g++ -O6 bench.cpp 8.5s/8.5s/0.1s 5% python bench.py 4m16s/4m16s/0.1s 6% ifc -O3 bench.f90 13s/13s/0.04s 3% [suffixes: scm...Scheme (Bigloo); ml...OCaml; c...gnu C; cpp...gnu C++; py...Python; f90...Fortran 90/95] [Machine: laptop Celeron 1000MHz, RAM 256MB, SuSE Linux 8.0, gcc, bigloo 2.5a, ocaml 3.04, Python 2.2, Intel Fortran 90/95] The second Bigloo version uses ordinary "+" and "-" operators. The first version uses Bigloo's operators "+fl", "+fx",... It is incomprehensible why the second ordinary Bigloo version uses that much on memory! For a dimension of 1024x1024 the overall timing picture is the same, except that the normal Bigloo version consumes 80% of the memory as compared to the C version which consumes 20%. Especially important is the fact that the Python version stacks up very well; you may not forget Python is just interpreted and not compiled. The elegant C++ version is of interest too, because it relies on the vector container and it is possible to completely avoid the awful "*". Some note to the Fortran version: Fortran starts at 1 (C starts at 0) when indexing arrays; and the first index runs faster (in C the second index runs faster). Reading from left to right is more natural (index wise), so using C arrays is way more pleasant. Bigloo uses C's array indexing scheme. I hate Fortran for this array indexing nightmare and hope I didn't mix things up because I was a little bit surprised that C is faster than Fortran 90. The Fortran 90/95 compiler stems from Intel and is the one and only free Fortran 90/95 compiler available for Linux (go to the Intel site and download the 50MB if you need a Fortran 90/95 compiler). Nowadays times are over were Fortran is that much faster than C/C++ on ordinary home PCs. Computing in Science and Engineering had once a nice article (Theurich, G., et al., 2001: "Making the Fortran-To-C Transition: How Painful is it Really?", January/February 2001, p.21) about translating a 20,000 lines of code program (in the field of chemistry) from Fortran 77 to C. Everybody would have excpected that the C version runs slower than the original Fortran version, but it turned out that the C version runs even slightly faster than the Fortran version. Before you scream: a) I am sure there exists an ocaml option for further optimizations; "ocamlopt --help" did not unveil it. b) The Bigloo (Scheme) version could surely be improved but how? /I mean not the code scheme itslef, because this should remain consistent as compared to the other code versions./ Fore example transposing one array is not allowed, because this would not only reduce Bigloo's execution time but also the time of the C version would be reduced by a factor of 2. Surely, there exists the caveat: But mum told me, we should not use the same algorithms for every programming language. But mum didn't tell that often the reality does not let you do so. And in addition mum did not tell you that you are not a Forth programmer. c) I post it here for soliciting comments. I could have gone to bed to construe my own explanations and theories. But this would likely result in a desaster, because most of the time it isn't how I would like that it is. The original language shootout page results on the MM are misleading: a) Pure integers are hardly ever used in reality b) Problems often do not fit into the cache c) A comparsion of the FFT would be more interesting Conclusion ---------- Nevertheless Bigloo remains still a very decent compiler in my opinion, and I do not regret that I have made the transition from Python (which is one of the most confusing programming languages out there) to Scheme speak Bigloo. People should be more wary that consulting the language shootout page can result in misleading conclusions; as you know: benchmarking is black art. S. Gonzi APPENDIX: =============================== Normal Bigloo version original v.: (C) M. Serrano usage: bigloo -Obench bench.scm time ./a.out =============================== (module bench-matrix) (define (mkmatrix rows cols) (let ((mx (make-vector rows 0.0)) (count 1.0)) (do ((i 0 (+ i 1))) ((= i rows)) (let ((row (make-vector cols 0.0))) (do ((j 0 (+ j 1))) ((= j cols)) (vector-set! row j count) (set! count (+ count 1.0))) (vector-set! mx i row))) mx)) (define (num-cols mx) (let ((row (vector-ref mx 0))) (vector-length row))) (define (num-rows mx) (vector-length mx)) (define (mmult rows cols m1 m2) (let ((m3 (make-vector rows 0.0))) (do ((i 0 (+ 1 i))) ((= i rows)) (let ((m1i (vector-ref m1 i)) (row (make-vector cols 0.0))) (do ((j 0 (+ 1 j))) ((= j cols)) (let ((val 0.0)) (do ((k 0 (+ 1 k))) ((= k cols)) (set! val (+ val (* (vector-ref m1i k) (vector-ref (vector-ref m2 k) j))))) (vector-set! row j val))) (vector-set! m3 i row))) m3)) (define (do-main size) (let ((mm 0.0) (m1 (mkmatrix size size)) (m2 (mkmatrix size size))) (set! mm (mmult size size m1 m2)) (let ((r0 (vector-ref mm 0)) (r2 (vector-ref mm 2)) (r3 (vector-ref mm 3)) (r4 (vector-ref mm 4))) (print (vector-ref r0 0) " " (vector-ref r2 3) " " (vector-ref r3 2) " " (vector-ref r4 4))))) (do-main 512) ==== =============================== Bigloo version based on +fl,... original v.: (C) M. Serrano usage: bigloo -Obench bench.scm time ./a.out =============================== (module bench-matrix (option (set! *genericity* #f))) (define (mkmatrix rows cols) (let ((mx (make-vector rows 0.0)) (count 1.0)) (do ((i 0 (+fx i 1))) ((=fx i rows)) (let ((row (make-vector cols 0.0))) (do ((j 0 (+fx j 1))) ((=fx j cols)) (vector-set! row j count) (set! count (+fl count 1.0))) (vector-set! mx i row))) mx)) (define (num-cols mx) (let ((row (vector-ref mx 0))) (vector-length row))) (define (num-rows mx) (vector-length mx)) (define (mmult rows cols m1 m2) (let ((m3 (make-vector rows 0.0))) (do ((i 0 (+fx 1 i))) ((=fx i rows)) (let ((m1i (vector-ref m1 i)) (row (make-vector cols 0.0))) (do ((j 0 (+fx 1 j))) ((=fx j cols)) (let ((val 0.0)) (do ((k 0 (+fx 1 k))) ((=fx k cols)) (set! val (+fl val (*fl (vector-ref m1i k) (vector-ref (vector-ref m2 k) j))))) (vector-set! row j val))) (vector-set! m3 i row))) m3)) (define (do-main size) (let ((mm 0.0) (m1 (mkmatrix size size)) (m2 (mkmatrix size size))) (set! mm (mmult size size m1 m2)) (let ((r0 (vector-ref mm 0)) (r2 (vector-ref mm 2)) (r3 (vector-ref mm 3)) (r4 (vector-ref mm 4))) (print (vector-ref r0 0) " " (vector-ref r2 3) " " (vector-ref r3 2) " " (vector-ref r4 4))))) (do-main 512) ==== ========================= Ocaml version original v.: (C) M. Mottl usage: ocamlopt bench.ml time ./a.out ========================= let mkmatrix rows cols = let count = ref 1 and last_col = cols - 1 and m = Array.make_matrix rows cols 0.0 in for i = 0 to rows - 1 do let mi = m.(i) in for j = 0 to last_col do mi.(j) <- float_of_int(!count); incr count done; done; m let rec inner_loop k v m1i m2 j = if k < 0 then v else inner_loop (k - 1) (v +. m1i.(k) *. m2.(k).(j)) m1i m2 j let mmult rows cols m1 m2 m3 = let last_col = cols - 1 and last_row = rows - 1 in for i = 0 to last_row do let m1i = m1.(i) and m3i = m3.(i) in for j = 0 to last_col do m3i.(j) <- inner_loop last_row 0.0 m1i m2 j done; done let size = 512 let m1 = mkmatrix size size and m2 = mkmatrix size size let m3 = Array.make_matrix size size 0.0;; mmult size size m1 m2 m3; Printf.printf "%f %f %f %f\n" m3.(0).(0) m3.(2).(3) m3.(3).(2) m3.(4).(4) ==== ======================= C version usage: g++ -O6 bench.c time ./a.out ======================= #include #include using namespace std; double **mkmatrix(int rows, int cols) { int i, j, count = 1; double **m = (double **) malloc(rows * sizeof(double *)); for (i=0; i -1) { free(m[rows]); } free(m); } double **mmult(int rows, int cols, double **m1, double **m2, double **m3) { int i, j, k; double val; for (i=0; i #include using namespace std; vector > mkmatrix(int rows, int cols) { double counter=double(0.0); vector > m; for(int i=0; i row_vec; for(int j=0; j > mmult(int rows, int cols, vector > m1, vector > m2, vector > m3) { double val; for(int i=0; i > m1 = mkmatrix(SIZE,SIZE); vector > m2 = mkmatrix(SIZE,SIZE); vector > mm = mkmatrix(SIZE,SIZE); mm = mmult(SIZE,SIZE,m1,m2,mm); cout << mm[0][0] << " " << mm[2][3] << " " << mm[3][2] << " " << mm[4][4] << endl; return(0); } ==== ============================= Python version usage: time python bench.py ============================= def mkmatrix(rows, cols): count = 1.0 mx = [ None ] * rows for i in xrange(rows): mx[i] = [0] * cols for j in xrange(cols): mx[i][j] = count count += 1.0 return mx def mmult(rows, cols, m1, m2): m3 = [ None ] * rows for i in xrange( rows ): m3[i] = [0] * cols for j in xrange( cols ): val = 0.0 for k in xrange( cols ): val += m1[i][k] * m2[k][j] m3[i][j] = val return m3 size = 512 def main(): m1 = mkmatrix(size, size) m2 = mkmatrix(size, size) mm = mmult(size, size, m1, m2) print mm[0][0], mm[2][3], mm[3][2], mm[4][4] main() ==== ============================= Intel Fortran 90/95 version (C) Siegfried Gonzi usage: ifc -O3 bench.f90 timer ./a.out ============================= program main integer:: size=512 double precision, POINTER,dimension(:,:):: m1, m2, mm call mkmatrix(size,size,m1) call mkmatrix(size,size,m2) call mkmatrix(size,size,mm) call mmult(size,size,m1,m2,mm) print *, mm(1,1) deallocate(m1) deallocate(m2) deallocate(mm) contains subroutine mkmatrix(rows, cols,m) integer, intent(in):: rows, cols double precision, POINTER, dimension(:,:):: m double precision:: counter integer:: i,j allocate(m(rows,cols)) counter = 1.0 do i=1,cols do j=1,rows counter = counter + 1.0 m(j,i) = counter end do end do end subroutine mkmatrix subroutine mmult(rows, cols, m1, m2, m3) integer, intent(in):: rows, cols double precision, dimension(rows,cols), intent(in):: m1, m2 double precision, dimension(rows,cols), intent(in out):: m3 double precision:: val integer:: i,j,k do i=1,rows do j=1,rows val = 0.0 do k=1,cols val = val + m1(k,i)*m2(j,k) end do m3(j,i) = val end do end do end subroutine mmult end program main ==== ------------------- To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/ Beginner's list: http://groups.yahoo.com/group/ocaml_beginners