From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.3 (2006-06-01) on yquem.inria.fr X-Spam-Level: X-Spam-Status: No, score=0.2 required=5.0 tests=AWL autolearn=disabled version=3.1.3 X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) by yquem.inria.fr (Postfix) with ESMTP id 3A8A9BBAF for ; Fri, 25 Sep 2009 23:28:27 +0200 (CEST) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AtYCAMnSvErUnwcki2dsb2JhbACCIJhhAQEBCgsKBxEGrBCPbIQeBQ X-IronPort-AV: E=Sophos;i="4.44,453,1249250400"; d="scan'208";a="33553572" Received: from relay.ptn-ipout02.plus.net ([212.159.7.36]) by mail2-smtp-roc.national.inria.fr with ESMTP/TLS/RC4-SHA; 25 Sep 2009 23:28:26 +0200 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AhQGAMnSvErUnw6T/2dsb2JhbACCIMUxj2yEHgU Received: from ptb-relay03.plus.net ([212.159.14.147]) by relay.ptn-ipout02.plus.net with ESMTP; 25 Sep 2009 22:28:26 +0100 Received: from [87.114.87.187] (helo=leper.local) by ptb-relay03.plus.net with esmtp (Exim) id 1MrIL4-0007tS-0Y for caml-list@yquem.inria.fr; Fri, 25 Sep 2009 22:28:26 +0100 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] OC4MC : OCaml for Multicore architectures Date: Fri, 25 Sep 2009 22:39:42 +0100 User-Agent: KMail/1.9.9 References: <200909241409.56894.jon@ffconsultancy.com> <200909242209.50565.jon@ffconsultancy.com> <20090925.130721.70227045.garrigue@math.nagoya-u.ac.jp> In-Reply-To: <20090925.130721.70227045.garrigue@math.nagoya-u.ac.jp> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200909252239.42656.jon@ffconsultancy.com> X-Plusnet-Relay: 0c072112f4de89b5177f10f7bada5f84 X-Spam: no; 0.00; ocaml:01 hash:01 hashtables:01 ocaml:01 hashing:01 ocaml's:01 hash:01 touted:01 hpc:01 hpc:01 gcc:01 compiler:01 speedup:01 computations:01 speedup:01 On Friday 25 September 2009 05:07:21 Jacques Garrigue wrote: > Your benchmark seems strange to me, as you are comparing apples with > oranges. In some sense, yes. I was interested in the performance of the defacto-standard hash table implementations and not the performance that can be obtained by reinventing the wheel. > Hashtables in Python are a basic feature of the language, > and they are of course implemented in C. In ocaml, they are > implemented in ocaml (except the hashing function, which has to be > polymorphic), using an array of association lists! > (Actually the pairs are flattened for better performance, but still) > What is impressive is that you don't need any special optimization to > get reasonably good performance. OCaml is 4x slower than F# on that benchmark for several reasons: 1. Overhead of 31-bit int arithmetic. 2. Lack of constant table sizes in the implementation and OCaml's failure to optimize mod-by-a-constant. 3. No monomorphization. You can write a far more efficient hash table implementation in F# than you can in OCaml because it addressed all of those deficiencies. > Actually the only tuning you need is to start from a reasonable table size, > which you didn't... No, the exact opposite is true: OCaml had the unfair advantage of starting from the optimal table size for the problem whereas F# started from the default size and had to resize. If you level the playing field then OCaml is 8x slower than F#. > > Even if that were not the case, the idea of cherry picking interpreted > > scripting languages to compete with because OCaml has fallen so far > > behind mainstream languages (let alone modern languages) is embarrassing. > > What's next, OCaml vs Bash for your high performance needs? > > OCaml was never touted as an HPC language! I started learning OCaml because people were running high performance OCaml code on a 256-CPU supercomputer in Cambridge. I have been touting OCaml for HPC ever since. Thousands of scientists and engineers all over the world have used OCaml for technical computing and chose it precisely because it was competitively performant. > The only claim I've seen is that it intends to stay within 2x of C for most > applications. (Which is not so easy these days, gcc getting much faster.) Yes. The infrastructure for compiler writers is improving rapidly as well though, e.g. LLVM. > Actually, I believe that Philippe's point is rather different. > Making a functional language work well on multicores is difficult. > If I tell you that you just have to modify a bit your program to get a > near linear speedup, then it looks great. But in practice it is rather > having to rethink completely your algorithm, Sure. The free lunch is over. However, the solution usually consists either of spawning independent computations or parallelizing outer loops, both of which can be made very easy by the language implementor. > to eventually get a speedup bounded by bandwidth, For some applications under certain circumstances, yes. > and starting from a point lower than the original single thread program. Yes. > There are applications for that (ray tracing is one), but this is not the > kind of needs most people have. Not the kind of needs the remaining OCaml programmers have, perhaps. Outside the OCaml world, a lot of people are now programming for multicores. > By the way, I was discussing with numerical computation people working > on BLAS the other day, and their answer was clear: if you need high > performance, better use a grid than SMP, since bandwidth is > paramount. That is a false dichotomy. Grids are inevitably composed of multicores so you will still lose out if you fail to leverage SMP when programming for a grid. > ...And you have to write in C or FORTRAN (or asm), because the timing of > instructions matter. I have written linear algebra code in F# that outperforms Intel's vendor tuned Fortran (the MKL) by a substantial margin on Intel hardware. Moreover, their code only works on certain types whereas mine is generic. OCaml is an excellent language for this kind of work but it requires an implementation with a performance profile that is very different from OCaml's. > The funniest part was that those people were working on integer > computations, but had to stick to floating point, because timing on integers > is unpredictable, making synchronization harder. Interesting. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e