From: ls-ocaml-developer-2006@m-e-leypold.de
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Multiplication of matrix in C and OCaml
Date: Fri, 09 Feb 2007 22:47:34 +0100
In-Reply-To: <1171030953.5302.51.camel@rosella.wigram>
  (skaller@users.sourceforge.net's message of "Sat, 10 Feb 2007 01:22:33 +1100")

skaller writes:

> On Fri, 2007-02-09 at 11:32 +0100,
> ls-ocaml-developer-2006@m-e-leypold.de wrote:
>> ls-ocaml-developer-2006@m-e-leypold.de writes:
>>
>> > optimizations. I'd feel better if the code is benchmarked in a way
>> > that the result of the multiplication is output to a file and to
>> > subtract the constant contribution of that to the run time that the
>> > time is measured for various problem sizes (number of matrices).
>>
>> Correction: ... the constant contribution of writing the result to a
>> file should be subtracted from the run time by measuring with various
>> problem sizes.
>>
>> Hope that is understandable now.
>
> There is no need for that IMHO. The right way to benchmark simple
> things like matrix operations AND simultaneously validate them
> is to evaluate known laws of arithmetic, such as the associative
> law, on various samples.

Yes, that's another way. But what remains is the fact that you should
somehow use the result of your operations, to keep the compiler from
just optimizing the operation away. Since the compiler probably doesn't
know about the associative law, that would be the right way to go; but
I can imagine that if you test (A * B) * C == A * (B * C) (the
associative law), the compiler might just inline the function calls and
then discover that the comparison is always true, since it does know
the laws of float arithmetic. Perhaps I'm just being paranoid.
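For illustration, here is roughly what I mean, as a minimal OCaml
sketch (the matrix size, the names and the naive multiplication are
mine, not the OP's code): fold the result into a checksum and print
it, so the work can't be discarded as dead code:

  (* Naive matrix multiplication; stands in for whatever is
     actually being benchmarked. *)
  let mat_mul a b =
    let n = Array.length a and m = Array.length b.(0)
    and k = Array.length b in
    Array.init n (fun i ->
      Array.init m (fun j ->
        let s = ref 0.0 in
        for l = 0 to k - 1 do s := !s +. a.(i).(l) *. b.(l).(j) done;
        !s))

  (* Fold the whole result matrix into one float, cheap to print. *)
  let checksum c =
    Array.fold_left (fun acc row -> Array.fold_left (+.) acc row) 0.0 c

  let () =
    Random.self_init ();
    let n = 200 in
    let rand_mat () =
      Array.init n (fun _ -> Array.init n (fun _ -> Random.float 1.0)) in
    let a = rand_mat () and b = rand_mat () in
    let t0 = Sys.time () in
    let c = mat_mul a b in
    let t1 = Sys.time () in
    (* Printing the checksum makes the result observable, so the
       multiplication cannot be optimized away. *)
    Printf.printf "checksum = %f, time = %f s\n" (checksum c) (t1 -. t0)

Sys.time gives CPU seconds and needs no extra library; for wall-clock
time you'd use Unix.gettimeofday from the unix library instead.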
> This doesn't require any I/O, nor a 'known correct' program
> to generate the comparison data.

And as far as correctness goes: the program should not invoke
undefined behaviour, and if you have 2 programs in 2 different
languages that you want to benchmark against each other, they should
really do the same thing. I don't see how you could get any meaningful
results with an incorrect program, or by comparing 2 programs of which
one just invokes undefined behaviour in C.

> Generating the samples is harder .. I once used things like
> Fibonacci to generate input for integer calculator tests.
> You really want to run these benchmarks for at least 5 minutes
> and randomly to eliminate cache effects.
>
> If you can generate increasing sized problems based on a single
> linear integer parameter, my Python test harness can handle
> randomising the tests, run multiple processes,
> and also draw a nice plot of the results, such as these:
>
> http://felix.sourceforge.net/speed/en_flx_perf_0005.html
> http://felix.sourceforge.net/speed/en_flx_perf_0012.html
>
> full sources etc are here to read:
>
> http://felix.sourceforge.net/speed/en_flx_perf_top.html
>
> or you can grab from Felix svn archive (the test harness
> is independent of Felix). Or I can email to you (saves
> building Felix just to get the test harness ..)

I'm not the OP (with the original problem), but I'd be interested in
the test harness (if it is documented). I've been thinking about
writing something similar for testing algorithms (sometimes the
scaling behaviour tells you what you did wrong or where the problem
lies), but didn't have the time yet.

> Ocamlopt scores quite well thank you! This data is from
> two AMD64 machines running Ubuntu Linux and is the result
> of HOURS of run time.

I'd have preferred no straight lines between the points (physicists
know why) and more data points. But yes, this is the kind of benchmark
I've been suggesting. Which version of gcc did you use?

I'm still a bit wary of the OP's results: that code generated by one
gcc version just runs 3 times faster than code from another gcc
version seems to point to an optimization bug, IMHO. There is no way
that there was an optimization potential of a factor of 3 in older gcc
versions (else I suppose the Intel cc would have generated much faster
code than gcc, too, because those people presumably knew what they
were doing and weren't likely to make the same mistakes as the gcc
people; so, since I don't believe in miracles ...).

> I don't know if it is cache, random context switches, or what,
> but on my boxes variations up to 20% on tests under 5 seconds are
> normal.

Almost the same here. I assume you didn't test in single user mode (I
tried it once and the resulting data was much smoother).

> Note the measurements are real time. Other measurements
> are suspect .. you really SHOULD count VM paging for example.

:-). Why me?

Regards -- Markus (who isn't really in the benchmark business
presently, but only wanted to point to the importance of being
earnest^W observing scaling behaviour instead of looking at absolute
numbers)
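P.S. Regarding increasing problem sizes from a single integer
parameter: the kind of scaling measurement I've been suggesting is
only a few lines (again just a sketch in OCaml; 'work' is a
placeholder for the real task, and the sizes are arbitrary):

  (* Placeholder workload: a naive float sum, standing in for the
     real computation under test. *)
  let work n =
    let s = ref 0.0 in
    for i = 1 to n do s := !s +. float_of_int i done;
    !s

  (* Time one run at problem size n; return elapsed CPU seconds
     together with the result. *)
  let time_at f n =
    let t0 = Sys.time () in
    let r = f n in
    (Sys.time () -. t0, r)

  let () =
    List.iter
      (fun n ->
        let (t, r) = time_at work n in
        (* Printing r keeps the computation observable. *)
        Printf.printf "n = %d  t = %f s  (check %f)\n" n t r)
      [ 100_000; 200_000; 400_000; 800_000; 1_600_000 ]

Plotting the resulting (n, t) pairs shows the scaling behaviour
directly, instead of a single absolute number.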