From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <jon@ffconsultancy.com>
X-Original-To: caml-list@yquem.inria.fr
Delivered-To: caml-list@yquem.inria.fr
Received: from mail4-relais-sop.national.inria.fr (mail4-relais-sop.national.inria.fr [192.134.164.105])
	by yquem.inria.fr (Postfix) with ESMTP id B5F8DBBAF
	for <caml-list@yquem.inria.fr>; Mon, 21 Dec 2009 04:16:14 +0100 (CET)
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: Aq4AAEh3LkvUnwckk2dsb2JhbACCGYFxl0gBAQEBCQkKCRMEqH6CJowRgS9KgWNSBIFlGQ
X-IronPort-AV: E=Sophos;i="4.47,429,1257116400"; 
   d="scan'208";a="52570140"
Received: from relay.ptn-ipout02.plus.net ([212.159.7.36])
  by mail4-smtp-sop.national.inria.fr with ESMTP/TLS/RC4-SHA; 21 Dec 2009 04:16:14 +0100
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: ApwEALd3LkvUnw4R/2dsb2JhbACCGYFxwQqCJowRgS9KgWNSBIFlGQ
Received: from pih-relay04.plus.net ([212.159.14.17])
  by relay.ptn-ipout02.plus.net with ESMTP; 21 Dec 2009 03:16:13 +0000
Received: from [87.114.35.173] (helo=leper.local)
	 by pih-relay04.plus.net with esmtp (Exim) id 1NMYkn-0003tI-0a
	for caml-list@yquem.inria.fr; Mon, 21 Dec 2009 03:16:13 +0000
From: Jon Harrop <jon@ffconsultancy.com>
Organization: Flying Frog Consultancy Ltd.
To: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Re: OCaml is  broken
Date: Mon, 21 Dec 2009 04:30:17 +0000
User-Agent: KMail/1.9.9
References: <794713.82307.qm@web111510.mail.gq1.yahoo.com> <200912202114.42669.jon@ffconsultancy.com> <1261357694.6545.89.camel@flake.lan.gerd-stolpmann.de>
In-Reply-To: <1261357694.6545.89.camel@flake.lan.gerd-stolpmann.de>
MIME-Version: 1.0
Content-Type: text/plain;
  charset="utf-8"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200912210430.17246.jon@ffconsultancy.com>
X-Plusnet-Relay: 8fff81c82c7dcb52d7e7b77074f30b85
X-Spam: no; 0.00; ocaml:01 gerd:01 stolpmann:01 hash:01 algebra:01 ocaml:01 ocaml's:01 renders:01 hash:01 real-world:01 runtimes:01 runtimes:01 locality:01 allocating:01 computations:01 

On Monday 21 December 2009 01:08:14 Gerd Stolpmann wrote:
> > The following web page describes a commercial machine sold by Azul
> > Systems that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory
> > in a flat SMP configuration:
> >
> >   http://www.azulsystems.com/products/compute_appliance.htm
> >
> > As you can see, a GC with shared memory can already scale across dozens
> > of cores and memory access is no more heterogeneous than it was 20 years
> > ago. Also, note that homogeneous memory access is a red herring in this
> > context because it does not undermine the utility of a shared heap on a
> > multicore.
>
> The benchmarks they mention can all easily be parallelized - that stuff
> you can also do with multi-processing.

Only if the result is small, otherwise you spend all of your time 
deserializing it. With a shared heap, you just return the resulting value by 
reference.

> The interesting thing would be an 
> inherent parallel algorithm where the same memory region is accessed by
> multiple threads.

Concurrent hash tables are a big thing for Azul:

  "Scales well up to 768 CPUs" -
  http://www.youtube.com/watch?v=WYXgtXWejRM

This blog entry describes performance on 750 cores:

  http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html

> Or at least a numeric program (your examples seem to be mostly from that
> area). 

Yes. You can look at matrix operations or linear algebra (QR decomposition) 
but also things like quicksort and graphics.

Would be interesting to compare symbolic performance as well though.

> > > - Have you considered that many Ocaml users prefer a GC that offers
> > > maximum single core performance,
> >
> > OCaml's GC is nowhere near offering maximum single core performance. Its
> > uniform data representation renders OCaml many times slower than its
> > competitors for many tasks. For example, filling a 10M float->float hash
> > table is over 18x slower with OCaml than with F#. FFT with a complex
> > number type is 5.5x slower with OCaml than F#. Fibonacci with floats is
> > 3.3x slower with OCaml than my own HLVM project (!).
>
> Sure, but these micro benchmarks are first seldom correct, and do not
> really count for real-world programs.
>
> For example, an important parameter of such benchmarks is the frequency
> the GC runs. Ocaml runs the GC very often - good for latencies, but bad
> for micro benchmarks because other runtimes simply delay the GC until
> some limits are exceeded, so these other runtimes often haven't run the
> GC even once in the short period of time the benchmark runs.

You're missing the point: every example I gave shouldn't be doing any GC at 
all and doesn't in F# but spends a lot of time in the GC in OCaml just 
because of unnecessary boxing. The mutator also takes longer because boxing 
damages locality.

> It is simply a fact that the ocaml developers had some preferences. E.g.
> allocating and freeing short-living values is extremely fast (often
> <10ns). This is very good when you do symbolic computations, or have
> lots of small strings, but ignorable for numeric stuff, or for programs
> where the lifetime of allocated memory is bound to server sessions. The
> minor GC is very fast, but, as you observe, the uniform representation
> has costs elsewhere.

Yes. That's why I think the best way forward is to develop HLVM.

> > >   because their application is parallelised via multiple processes
> > >   communicating via message passing?
> >
> > A circular argument based upon the self-selected group of remaining OCaml
> > users. Today's OCaml users use OCaml despite its shortcomings. If you
> > want to see the impact of OCaml's multicore unfriendliness, consider why
> > the OCaml community has haemorrhaged 50% of its users in only 2 years.
>
> Don't see that. That's just speculation - maybe some win32 ocaml users
> switched to F#,

I wasn't a win32 user. :-)

> but there are for sure also other reasons than multicore 
> support, e.g. GUIs and better Windows integration. Btw, where do you get
> your numbers from?

Traffic here:

2007: 5814
2008: 4051
2009: 3071

  http://groups.google.com/group/fa.caml/about

Or searches for OCaml on Google:

  http://www.google.com/trends?q=ocaml%2Cclojure%2Cf%23

The number of OCaml jobs has crashed as well:

  http://www.itjobswatch.co.uk/jobs/uk/ocaml.do

And, of course, what our customers say.

> There are many, many users for whom multicore is just a useless hype.

In 2005, the OCaml community was composed largely of performance junkies who 
came here because OCaml produced excellent performance from succinct and 
readable code on benchmark after benchmark. More people were buying OFS than 
were using Coq. I don't believe for a second that many of OCaml's former 
users thought multicore was just useless hype.

> Either the algorithms are inherently difficult to parallelize (and this
> is vast majority),

I have had great success parallelizing code.

> or are that easy (like all client/server stuff) that multi-processing is
> sufficient.

There are certainly applications where multicore is not beneficial.

> You can consider multicore as a marketing trick of the chip 
> industry to let the ordinary desktop user pay for a feature that is mostly
> interesting for datacenters. 

Ordinary desktop users have been paying top dollar for parallel computers in 
the form of GPUs for some time now. The use of GPUs for more general 
programming has been a really hot topic for years and just became viable. 
Even games consoles have multicores. ARM are making quadcores for your phone 
and netbook!

If I can get HLVM to make parallel OCaml-style programming easy, I think a lot 
of people would love it.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e