From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Original-To: caml-list@yquem.inria.fr Delivered-To: caml-list@yquem.inria.fr Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105]) by yquem.inria.fr (Postfix) with ESMTP id B5F8DBBAF for ; Mon, 21 Dec 2009 04:16:14 +0100 (CET) X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Aq4AAEh3LkvUnwckk2dsb2JhbACCGYFxl0gBAQEBCQkKCRMEqH6CJowRgS9KgWNSBIFlGQ X-IronPort-AV: E=Sophos;i="4.47,429,1257116400"; d="scan'208";a="52570140" Received: from relay.ptn-ipout02.plus.net ([212.159.7.36]) by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 21 Dec 2009 04:16:14 +0100 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: ApwEALd3LkvUnw4R/2dsb2JhbACCGYFxwQqCJowRgS9KgWNSBIFlGQ Received: from pih-relay04.plus.net ([212.159.14.17]) by relay.ptn-ipout02.plus.net with ESMTP; 21 Dec 2009 03:16:13 +0000 Received: from [87.114.35.173] (helo=leper.local) by pih-relay04.plus.net with esmtp (Exim) id 1NMYkn-0003tI-0a for caml-list@yquem.inria.fr; Mon, 21 Dec 2009 03:16:13 +0000 From: Jon Harrop Organization: Flying Frog Consultancy Ltd. To: caml-list@yquem.inria.fr Subject: Re: [Caml-list] Re: OCaml is broken Date: Mon, 21 Dec 2009 04:30:17 +0000 User-Agent: KMail/1.9.9 References: <794713.82307.qm@web111510.mail.gq1.yahoo.com> <200912202114.42669.jon@ffconsultancy.com> <1261357694.6545.89.camel@flake.lan.gerd-stolpmann.de> In-Reply-To: <1261357694.6545.89.camel@flake.lan.gerd-stolpmann.de> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200912210430.17246.jon@ffconsultancy.com> X-Plusnet-Relay: 8fff81c82c7dcb52d7e7b77074f30b85 X-Spam: no; 0.00; ocaml:01 gerd:01 stolpmann:01 hash:01 algebra:01 ocaml:01 ocaml's:01 renders:01 hash:01 real-world:01 runtimes:01 runtimes:01 locality:01 allocating:01 computations:01 On Monday 21 December 2009 01:08:14 Gerd Stolpmann wrote: > > The following web page describes a commercial machine sold by Azul > > Systems that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory > > in a flat SMP configuration: > > > > http://www.azulsystems.com/products/compute_appliance.htm > > > > As you can see, a GC with shared memory can already scale across dozens > > of cores and memory access is no more heterogeneous than it was 20 years > > ago. Also, note that homogeneous memory access is a red herring in this > > context because it does not undermine the utility of a shared heap on a > > multicore. > > The benchmarks they mention can all easily be parallelized - that stuff > you can also do with multi-processing. Only if the result is small, otherwise you spend all of your time deserializing it. With a shared heap, you just return the resulting value by reference. > The interesting thing would be an > inherent parallel algorithm where the same memory region is accessed by > multiple threads. Concurrent hash tables are a big thing for Azul: "Scales well up to 768 CPUs" - http://www.youtube.com/watch?v=WYXgtXWejRM This blog entry describes performance on 750 cores: http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html > Or at least a numeric program (your examples seem to be mostly from that > area). Yes. You can look at matrix operations or linear algebra (QR decomposition) but also things like quicksort and graphics. Would be interesting to compare symbolic performance as well though. > > > - Have you considered that many Ocaml users prefer a GC that offers > > > maximum single core performance, > > > > OCaml's GC is nowhere near offering maximum single core performance. Its > > uniform data representation renders OCaml many times slower than its > > competitors for many tasks. For example, filling a 10M float->float hash > > table is over 18x slower with OCaml than with F#. FFT with a complex > > number type is 5.5x slower with OCaml than F#. Fibonacci with floats is > > 3.3x slower with OCaml than my own HLVM project (!). > > Sure, but these micro benchmarks are first seldom correct, and do not > really count for real-world programs. > > For example, an important parameter of such benchmarks is the frequency > the GC runs. Ocaml runs the GC very often - good for latencies, but bad > for micro benchmarks because other runtimes simply delay the GC until > some limits are exceeded, so these other runtimes often haven't run the > GC even once in the short period of time the benchmark runs. You're missing the point: every example I gave shouldn't be doing any GC at all and doesn't in F# but spends a lot of time in the GC in OCaml just because of unnecessary boxing. The mutator also takes longer because boxing damages locality. > It is simply a fact that the ocaml developers had some preferences. E.g. > allocating and freeing short-living values is extremely fast (often > <10ns). This is very good when you do symbolic computations, or have > lots of small strings, but ignorable for numeric stuff, or for programs > where the lifetime of allocated memory is bound to server sessions. The > minor GC is very fast, but, as you observe, the uniform representation > has costs elsewhere. Yes. That's why I think the best way forward is to develop HLVM. > > > because their application is parallelised via multiple processes > > > communicating via message passing? > > > > A circular argument based upon the self-selected group of remaining OCaml > > users. Today's OCaml users use OCaml despite its shortcomings. If you > > want to see the impact of OCaml's multicore unfriendliness, consider why > > the OCaml community has haemorrhaged 50% of its users in only 2 years. > > Don't see that. That's just speculation - maybe some win32 ocaml users > switched to F#, I wasn't a win32 user. :-) > but there are for sure also other reasons than multicore > support, e.g. GUIs and better Windows integration. Btw, where do you get > your numbers from? Traffic here: 2007: 5814 2008: 4051 2009: 3071 http://groups.google.com/group/fa.caml/about Or searches for OCaml on Google: http://www.google.com/trends?q=ocaml%2Cclojure%2Cf%23 The number of OCaml jobs has crashed as well: http://www.itjobswatch.co.uk/jobs/uk/ocaml.do And, of course, what our customers say. > There are many, many users for whom multicore is just a useless hype. In 2005, the OCaml community was composed largely of performance junkies who came here because OCaml produced excellent performance from succinct and readable code on benchmark after benchmark. More people were buying OFS than were using Coq. I don't believe for a second that many of OCaml's former users thought multicore was just useless hype. > Either the algorithms are inherently difficult to parallelize (and this > is vast majority), I have had great success parallelizing code. > or are that easy (like all client/server stuff) that multi-processing is > sufficient. There are certainly applications where multicore is not beneficial. > You can consider multicore as a marketing trick of the chip > industry to let the ordinary desktop user pay for a feature that is mostly > interesting for datacenters. Ordinary desktop users have been paying top dollar for parallel computers in the form of GPUs for some time now. The use of GPUs for more general programming has been a really hot topic for years and just became viable. Even games consoles have multicores. ARM are making quadcores for your phone and netbook! If I can get HLVM to make parallel OCaml-style programming easy, I think a lot of people would love it. -- Dr Jon Harrop, Flying Frog Consultancy Ltd. http://www.ffconsultancy.com/?e