caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* Re: OCaml is broken
@ 2009-12-19 19:38 Jeff Shaw
  2009-12-20  4:43 ` [Caml-list] " Jon Harrop
  2009-12-20 11:56 ` [***SPAM*** Score/Req: 10.1/8.0] [Caml-list] Re: OCaml is broken Erik Rigtorp
  0 siblings, 2 replies; 57+ messages in thread
From: Jeff Shaw @ 2009-12-19 19:38 UTC (permalink / raw)
  To: caml-list

My understanding is that since jocaml uses the regular ocaml runtime, it 
is also not multicore enabled.

Haskell is a functional language that has good performance that can use 
multiple processors, but the learning curve is steeper and higher.

OCaml is a close relative of Standard ML, so there might be some 
implementation of SML that you like. MLTon might allow multicore use, 
but I'm not sure how mature it is. SML/NJ has a library or language 
extension called Concurrent ML, but I think SML/NJ might not use 
multiple processors.

Note that if you're not using a lot of threads, you can use Unix.fork to 
do true multithreaded programming ocaml.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is broken
  2009-12-19 19:38 OCaml is broken Jeff Shaw
@ 2009-12-20  4:43 ` Jon Harrop
  2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
  2009-12-21 13:31   ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Goswin von Brederlow
  2009-12-20 11:56 ` [***SPAM*** Score/Req: 10.1/8.0] [Caml-list] Re: OCaml is broken Erik Rigtorp
  1 sibling, 2 replies; 57+ messages in thread
From: Jon Harrop @ 2009-12-20  4:43 UTC (permalink / raw)
  To: caml-list

On Saturday 19 December 2009 19:38:41 Jeff Shaw wrote:
> My understanding is that since jocaml uses the regular ocaml runtime, it
> is also not multicore enabled.
>
> Haskell is a functional language that has good performance

GHC and the Haskell language itself have serious performance problems.

> that can use multiple processors, but the learning curve is steeper and
> higher. 

And Haskell lacks many of the features OCaml programmers take for granted.

> OCaml is a close relative of Standard ML, so there might be some
> implementation of SML that you like. MLTon might allow multicore use,
> but I'm not sure how mature it is. SML/NJ has a library or language
> extension called Concurrent ML, but I think SML/NJ might not use
> multiple processors.

MLton and SML/NJ are both multicore incapable. The PolyML implementation of 
SML is multicore friendly but last time I looked (many years ago) it was 100x 
slower than OCaml for floating point.

As long as you're looking at OCaml's close relatives with multicore support, 
F# is your only viable option. Soon, HLVM will provide a cross-platform open 
source solution. If you look further you will also find Scala and Clojure.

> Note that if you're not using a lot of threads, you can use Unix.fork to 
> do true multithreaded programming ocaml.

We've discussed the problems with that before. Writing a parallel generic 
quicksort seems to be a good test of a decent multicore capable language 
implementation. Currently, F# is a *long* way ahead of everything open 
source.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [***SPAM*** Score/Req: 10.1/8.0] [Caml-list] Re: OCaml is broken
  2009-12-19 19:38 OCaml is broken Jeff Shaw
  2009-12-20  4:43 ` [Caml-list] " Jon Harrop
@ 2009-12-20 11:56 ` Erik Rigtorp
  1 sibling, 0 replies; 57+ messages in thread
From: Erik Rigtorp @ 2009-12-20 11:56 UTC (permalink / raw)
  Cc: caml-list

On Sat, Dec 19, 2009 at 20:38, Jeff Shaw <shawjef3@msu.edu> wrote:
> My understanding is that since jocaml uses the regular ocaml runtime, it is
> also not multicore enabled.
>
> Haskell is a functional language that has good performance that can use
> multiple processors, but the learning curve is steeper and higher.

Haskell has as far as i've understood very poor and unpredictable performance.

> Note that if you're not using a lot of threads, you can use Unix.fork to do
> true multithreaded programming ocaml.

I would get to much IPC latency with that solution.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is  broken
  2009-12-20  4:43 ` [Caml-list] " Jon Harrop
@ 2009-12-20 12:21   ` Erik Rigtorp
  2009-12-20 13:22     ` Martin Jambon
                       ` (4 more replies)
  2009-12-21 13:31   ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Goswin von Brederlow
  1 sibling, 5 replies; 57+ messages in thread
From: Erik Rigtorp @ 2009-12-20 12:21 UTC (permalink / raw)
  Cc: caml-list

On Sun, Dec 20, 2009 at 05:43, Jon Harrop <jon@ffconsultancy.com> wrote:
> As long as you're looking at OCaml's close relatives with multicore support,
> F# is your only viable option. Soon, HLVM will provide a cross-platform open
> source solution. If you look further you will also find Scala and Clojure.

F# is not viable since i'm developing for Solaris. I also believe the
.NET GC is not good enough for real-time systems. Clojure running
under real-time Java might be interesting.

It's too bad that INRIA is not interested in fixing this bug. No
matter what people say I consider this a bug. Two cores is standard by
now, I'm used to 8, next year 32 and so on. OCaml will only become
more and more irrelevant. I hate to see that happening.

I think right now only Erlang got this right and they have a great
library for developing enterprise applications too!

The first step for OCaml would be to be able to run multiple
communicating instances of the runtime bound to one core each in one
process and have them communicate via lock free queues.

Erik Rigtorp


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is broken
  2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
@ 2009-12-20 13:22     ` Martin Jambon
  2009-12-20 13:47     ` Yaron Minsky
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 57+ messages in thread
From: Martin Jambon @ 2009-12-20 13:22 UTC (permalink / raw)
  To: Erik Rigtorp; +Cc: caml-list

Erik Rigtorp wrote:

> It's too bad that INRIA is not interested in fixing this bug.

Ask Santa Claus, you'll get it by Friday.  Free shipping.


;-)

Martin


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is  broken
  2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
  2009-12-20 13:22     ` Martin Jambon
@ 2009-12-20 13:47     ` Yaron Minsky
  2009-12-20 16:01       ` Gerd Stolpmann
  2009-12-21 22:50       ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
  2009-12-20 14:27     ` Dario Teixeira
                       ` (2 subsequent siblings)
  4 siblings, 2 replies; 57+ messages in thread
From: Yaron Minsky @ 2009-12-20 13:47 UTC (permalink / raw)
  To: Erik Rigtorp; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 1543 bytes --]

On Sun, Dec 20, 2009 at 7:21 AM, Erik Rigtorp <erik@rigtorp.com> wrote:

> The first step for OCaml would be to be able to run multiple
> communicating instances of the runtime bound to one core each in one
> process and have them communicate via lock free queues.


We've done some experiments in this direction at Jane Street.  On Linux,
we've been able to get fast enough IPC channels for our purposes that
slamming things into the same memory space has not in the end been
necessary.  (There is I agree some pain associated with running multiple
runtimes in the same process.  If you're interested, contact me off-list and
I can try to get you some of the details of what we ran into.)

But have you tried using shared-memory segments for communicating between
different processes?  You say the latencies are too high, but do you have
any measurements you could share? Have you tried queues using shared memory
segments, in particular?  Inter-thread communication has latency as well,
and the performance issues depend on lots of things, OS and hardware
platform included.  It would help in understanding the tradeoffs.

As we go to higher-and-higher numbers of cores, I suspect that
message-passing solutions are likely to scale better than shared memory, so
I'm not so sure that OCaml is on the wrong path here.  I think that most of
the work that's needed is going to come in the form of libraries, with only
a little work in the compiler and the runtime.  Given that, I think this is
an issue for the community to solve, not INRIA.

y

[-- Attachment #2: Type: text/html, Size: 1916 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is  broken
  2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
  2009-12-20 13:22     ` Martin Jambon
  2009-12-20 13:47     ` Yaron Minsky
@ 2009-12-20 14:27     ` Dario Teixeira
  2009-12-20 21:14       ` Jon Harrop
  2009-12-20 19:38     ` [***SPAM*** Score/Req: 10.1/8.0] " Jon Harrop
  2009-12-21 13:07     ` [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] " Damien Doligez
  4 siblings, 1 reply; 57+ messages in thread
From: Dario Teixeira @ 2009-12-20 14:27 UTC (permalink / raw)
  To: Erik Rigtorp; +Cc: caml-list

Hi,

> It's too bad that INRIA is not interested in fixing this bug. No
> matter what people say I consider this a bug. Two cores is standard by
> now, I'm used to 8, next year 32 and so on. OCaml will only become
> more and more irrelevant. I hate to see that happening.

This is a perennial topic in this list.  Without meaning to dwell too
long on old arguments, I simply ask you to consider the following:

- Do you really think a concurrent GC with shared memory will scale neatly
  to those 32 cores?

- Will memory access remain homogeneous for all cores as soon as we get into
  the dozens of cores?

- Have you considered that many Ocaml users prefer a GC that offers maximum
  single core performance, because their application is parallelised via
  multiple processes communicating via message passing?  In this context,
  your "bug" is actually a "feature".

Best regards,
Dario Teixeira






^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is  broken
  2009-12-20 13:47     ` Yaron Minsky
@ 2009-12-20 16:01       ` Gerd Stolpmann
  2009-12-21 22:50       ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
  1 sibling, 0 replies; 57+ messages in thread
From: Gerd Stolpmann @ 2009-12-20 16:01 UTC (permalink / raw)
  To: yminsky; +Cc: Erik Rigtorp, caml-list


Am Sonntag, den 20.12.2009, 08:47 -0500 schrieb Yaron Minsky:
> On Sun, Dec 20, 2009 at 7:21 AM, Erik Rigtorp <erik@rigtorp.com>
> wrote:
>         The first step for OCaml would be to be able to run multiple
>         communicating instances of the runtime bound to one core each
>         in one
>         process and have them communicate via lock free queues.
> 
> 
> We've done some experiments in this direction at Jane Street.  On
> Linux, we've been able to get fast enough IPC channels for our
> purposes that slamming things into the same memory space has not in
> the end been necessary.  (There is I agree some pain associated with
> running multiple runtimes in the same process.  If you're interested,
> contact me off-list and I can try to get you some of the details of
> what we ran into.)
> 
> 
> But have you tried using shared-memory segments for communicating
> between different processes?  You say the latencies are too high, but
> do you have any measurements you could share? Have you tried queues
> using shared memory segments, in particular?  Inter-thread
> communication has latency as well, and the performance issues depend
> on lots of things, OS and hardware platform included.  It would help
> in understanding the tradeoffs.

I'm also experimenting now with shared memory (shm) as fast IPC
mechanism. I've extended ocamlnet with a few functions that allow to
copy an ocaml value into a shm segment which is accessible as bigarray:

https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli

Look especially for init_string. (I've also to mention Ancient here
which inspired to this work.)

Having ocaml values in shm saves us from some marshalling costs which is
right now the biggest performance penalty when using multiple processes.
However, this causes some problems, and at some point modifications of
the ocaml runtime will be necessary:

- The polymorphic equality and hash primitives do not work anymore
  for values in such shm segments (and that really hurts,
  especially string comparison is broken)
- Given that the shm segment is set to read-only after being set up, it
  is not possible to have pointers from shm to other memory regions.
  This is good, as this would be very dangerous (GC may delete or move
  values in the regular heap). However, the question arises when the
  shm segment can be deleted. We would need help by the GC to identify
  segments that are no longer referenced.

Without that, shm will be restricted to a role as low-level
inter-process buffers. 

> As we go to higher-and-higher numbers of cores, I suspect that
> message-passing solutions are likely to scale better than shared
> memory, so I'm not so sure that OCaml is on the wrong path here.  I
> think that most of the work that's needed is going to come in the form
> of libraries, with only a little work in the compiler and the
> runtime.  Given that, I think this is an issue for the community to
> solve, not INRIA.

Well, message passing and shm do not exclude each other. We should
refine the terminology here: Actually, shm is just a basic mechanism
where several execution threads (including processes) can share memory.
What's often meant is, however, the role it plays for multi-threading,
i.e. shared mutable data structures. What's typical here is that several
threads write to the same memory regions. I don't know a good name for
naming that programming style - maybe multi-threading style shm is the
best.

I'm working on a local message passing queue that can be used for long
messages, based on shm, and where the messages can contain normal ocaml
values (although it is likely that these are copied to the normal heap
by the receiver, for the above mentioned reasons, but this is an
expensive copy). The whole point will be that the data marshalling costs
are minimized. So far I can already say, we will need some changes in
the runtime to make such a mechanism fast and safe.

Gerd

-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is  broken
  2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
                       ` (2 preceding siblings ...)
  2009-12-20 14:27     ` Dario Teixeira
@ 2009-12-20 19:38     ` Jon Harrop
  2009-12-21 12:26       ` Mihamina Rakotomandimby
  2009-12-21 13:07     ` [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] " Damien Doligez
  4 siblings, 1 reply; 57+ messages in thread
From: Jon Harrop @ 2009-12-20 19:38 UTC (permalink / raw)
  To: caml-list

On Sunday 20 December 2009 12:21:44 Erik Rigtorp wrote:
> On Sun, Dec 20, 2009 at 05:43, Jon Harrop <jon@ffconsultancy.com> wrote:
> > As long as you're looking at OCaml's close relatives with multicore
> > support, F# is your only viable option. Soon, HLVM will provide a
> > cross-platform open source solution. If you look further you will also
> > find Scala and Clojure.
>
> F# is not viable since i'm developing for Solaris.

Yes. F# is Windows only for all intents and purposes.

> I also believe the .NET GC is not good enough for real-time systems.

Although heavily allocating threads will experience pauses of up to several 
seconds on .NET. However, threads that do not exceed their allocation quota 
run almost completely concurrently with the GC, so their real-time 
performance characteristics are good. This is the key to keeping UI threads 
responsive.

Note that OCaml's GC has some problems. Specifically, the stack and arrays of 
pointers in the heap are not traversed incrementally, incurring 
arbitrarily-long stalls.

> Clojure running under real-time Java might be interesting.

Sounds like you have hard RT guarantees.

> It's too bad that INRIA is not interested in fixing this bug.

They spent something like a decade trying to write a decent concurrent GC and 
pioneered the field.

> No matter what people say I consider this a bug.

A perf bug at best: it just means that OCaml is slower for many tasks.

> Two cores is standard by 
> now, I'm used to 8, next year 32 and so on. OCaml will only become
> more and more irrelevant. I hate to see that happening.

Me too. The OCaml language will continue to kick ass for some time to come but 
INRIA's implementation is no longer competitively performant for many tasks. 
However, open source offerings are all quite dire, particularly stand-alone 
ones.

> I think right now only Erlang got this right and they have a great
> library for developing enterprise applications too!

I couldn't disagree more. The *only* reason to work on parallelism is 
performance and Erlang's performance sucks. I know Erlang scales better, but 
it scales from poor absolute performance on 1 core to poor absolute 
performance on n cores. Hence Erlang is hardly the defacto standard for HPC 
on shared-memory supercomputers.

> The first step for OCaml would be to be able to run multiple
> communicating instances of the runtime bound to one core each in one
> process and have them communicate via lock free queues.

I think the first step is simply to replace OCaml's GC with a stop-the-world 
parallel one like the one I wrote for HLVM. The problem is that OCaml's data 
representation gives absolutely dire performance and kills scalability if you 
do that. So you either need to optimize the GC for this or rewrite everything 
from the ground up. OC4MC is doing the former. My HLVM project is doing the 
latter. Suffice to say, there is no easy solution (although I prefer 
mine ;-).

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is  broken
  2009-12-20 14:27     ` Dario Teixeira
@ 2009-12-20 21:14       ` Jon Harrop
  2009-12-21  1:08         ` Gerd Stolpmann
  0 siblings, 1 reply; 57+ messages in thread
From: Jon Harrop @ 2009-12-20 21:14 UTC (permalink / raw)
  To: caml-list

On Sunday 20 December 2009 14:27:00 Dario Teixeira wrote:
> Hi,
>
> > It's too bad that INRIA is not interested in fixing this bug. No
> > matter what people say I consider this a bug. Two cores is standard by
> > now, I'm used to 8, next year 32 and so on. OCaml will only become
> > more and more irrelevant. I hate to see that happening.
>
> This is a perennial topic in this list.  Without meaning to dwell too
> long on old arguments, I simply ask you to consider the following:
>
> - Do you really think a concurrent GC with shared memory will scale neatly
> to those 32 cores?
>
> - Will memory access remain homogeneous for all cores as soon as we get
> into the dozens of cores?

The following web page describes a commercial machine sold by Azul Systems 
that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory in a flat 
SMP configuration:

  http://www.azulsystems.com/products/compute_appliance.htm

As you can see, a GC with shared memory can already scale across dozens of 
cores and memory access is no more heterogeneous than it was 20 years ago. 
Also, note that homogeneous memory access is a red herring in this context 
because it does not undermine the utility of a shared heap on a multicore.

> - Have you considered that many Ocaml users prefer a GC that offers maximum
> single core performance, 

OCaml's GC is nowhere near offering maximum single core performance. Its 
uniform data representation renders OCaml many times slower than its 
competitors for many tasks. For example, filling a 10M float->float hash 
table is over 18x slower with OCaml than with F#. FFT with a complex number 
type is 5.5x slower with OCaml than F#. Fibonacci with floats is 3.3x slower 
with OCaml than my own HLVM project (!).

>   because their application is parallelised via multiple processes
>   communicating via message passing? 

A circular argument based upon the self-selected group of remaining OCaml 
users. Today's OCaml users use OCaml despite its shortcomings. If you want to 
see the impact of OCaml's multicore unfriendliness, consider why the OCaml 
community has haemorrhaged 50% of its users in only 2 years.

> In this context, your "bug" is actually a "feature".

I'm not even sure you can substantiate that in the very specific context of 
distributed parallel theorem provers because other languages are so much more 
efficient at handling common abstractions like parametric polymorphism. Got 
any benchmarks?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is  broken
  2009-12-20 21:14       ` Jon Harrop
@ 2009-12-21  1:08         ` Gerd Stolpmann
  2009-12-21  4:30           ` Jon Harrop
  2009-12-26 17:08           ` orbitz
  0 siblings, 2 replies; 57+ messages in thread
From: Gerd Stolpmann @ 2009-12-21  1:08 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

> The following web page describes a commercial machine sold by Azul Systems 
> that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory in a flat 
> SMP configuration:
> 
>   http://www.azulsystems.com/products/compute_appliance.htm
> 
> As you can see, a GC with shared memory can already scale across dozens of 
> cores and memory access is no more heterogeneous than it was 20 years ago. 
> Also, note that homogeneous memory access is a red herring in this context 
> because it does not undermine the utility of a shared heap on a multicore.

The benchmarks they mention can all easily be parallelized - that stuff
you can also do with multi-processing. The interesting thing would be an
inherent parallel algorithm where the same memory region is accessed by
multiple threads. Or at least a numeric program (your examples seem to
be mostly from that area).

> > - Have you considered that many Ocaml users prefer a GC that offers maximum
> > single core performance, 
> 
> OCaml's GC is nowhere near offering maximum single core performance. Its 
> uniform data representation renders OCaml many times slower than its 
> competitors for many tasks. For example, filling a 10M float->float hash 
> table is over 18x slower with OCaml than with F#. FFT with a complex number 
> type is 5.5x slower with OCaml than F#. Fibonacci with floats is 3.3x slower 
> with OCaml than my own HLVM project (!).

Sure, but these micro benchmarks are first seldom correct, and do not
really count for real-world programs.

For example, an important parameter of such benchmarks is the frequency
the GC runs. Ocaml runs the GC very often - good for latencies, but bad
for micro benchmarks because other runtimes simply delay the GC until
some limits are exceeded, so these other runtimes often haven't run the
GC even once in the short period of time the benchmark runs.

It is simply a fact that the ocaml developers had some preferences. E.g.
allocating and freeing short-living values is extremely fast (often
<10ns). This is very good when you do symbolic computations, or have
lots of small strings, but ignorable for numeric stuff, or for programs
where the lifetime of allocated memory is bound to server sessions. The
minor GC is very fast, but, as you observe, the uniform representation
has costs elsewhere.

> >   because their application is parallelised via multiple processes
> >   communicating via message passing? 
> 
> A circular argument based upon the self-selected group of remaining OCaml 
> users. Today's OCaml users use OCaml despite its shortcomings. If you want to 
> see the impact of OCaml's multicore unfriendliness, consider why the OCaml 
> community has haemorrhaged 50% of its users in only 2 years.

Don't see that. That's just speculation - maybe some win32 ocaml users
switched to F#, but there are for sure also other reasons than multicore
support, e.g. GUIs and better Windows integration. Btw, where do you get
your numbers from?

There are many, many users for whom multicore is just a useless hype.
Either the algorithms are inherently difficult to parallelize (and this
is vast majority), or are that easy (like all client/server stuff) that
multi-processing is sufficient. You can consider multicore as a
marketing trick of the chip industry to let the ordinary desktop user
pay for a feature that is mostly interesting for datacenters.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is broken
  2009-12-21  4:30           ` Jon Harrop
@ 2009-12-21  3:58             ` Yaron Minsky
  2009-12-21  5:32             ` Markus Mottl
  1 sibling, 0 replies; 57+ messages in thread
From: Yaron Minsky @ 2009-12-21  3:58 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

[-- Attachment #1: Type: text/plain, Size: 2389 bytes --]

I find the ponderings on the popularity of OCaml to be of limited utility
--- those who pick OCaml based on its popularity are making a terrible
mistake.  OCaml was a deeply unpopular language in 2005 and remains so
today, the variations notwithstanding.  There are other good reasons to use
the language nonetheless.

On Sun, Dec 20, 2009 at 11:30 PM, Jon Harrop <jon@ffconsultancy.com> wrote:

> Or searches for OCaml on Google:
>
>  http://www.google.com/trends?q=ocaml%2Cclojure%2Cf%23


I'm not sure if OCaml is becoming more or less popular, but I find the
evidence for a decline less than convincing.  It is true that there is less
traffic on this list, but it's hard to know how to interpret this.  I
haven't gotten the sense that Python is in decline, but traffic on
comp.lang.python has also been declining since 2005.

Google Trends is also a confusing metric.  For example, it suggests that
Java, Python and C++ have been declining for years:

http://www.google.com/trends?q=java&ctab=0&geo=all&date=all&sort=0
http://www.google.com/trends?q=C%2B%2B&ctab=0&geo=all&date=all&sort=0
http://www.google.com/trends?q=Python&ctab=0&geo=all&date=all&sort=0

My suspicion is that Google Trends gives numbers normalized to the overall
search world, and so things that aren't growing fast look smaller as search
volume in general grows.  Obviously an up-and-coming language like clojure
still shows an upswing, as one would expect from an up-and-coming language.

The number of OCaml jobs has crashed as well:
>
>  http://www.itjobswatch.co.uk/jobs/uk/ocaml.do


I thought this was a silly metric when it spiked up, and continue to think
it's a silly metric today.  There are a tiny number of legitimate ocaml jobs
(and the same is true for Haskell, Clojure, Scala, SML, etc.) and the
ups-and-down in this tiny sample are not statistically significant.  Again:
don't pick OCaml because of the large number of OCaml jobs out there.  There
are very very few, both now and in '05.

Reliable metrics on a community like this are hard to come by, but things
seem quite vibrant to me.  There are always new OCaml startups popping into
existence, new libraries being written, and new things coming out of INRIA
(for example, the arrival of modules as first-class values, which is
expected in OCaml 3.12).  From my point of view, there is still no platform
out there I would rather be using.

y

[-- Attachment #2: Type: text/html, Size: 3572 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is  broken
  2009-12-21  1:08         ` Gerd Stolpmann
@ 2009-12-21  4:30           ` Jon Harrop
  2009-12-21  3:58             ` Yaron Minsky
  2009-12-21  5:32             ` Markus Mottl
  2009-12-26 17:08           ` orbitz
  1 sibling, 2 replies; 57+ messages in thread
From: Jon Harrop @ 2009-12-21  4:30 UTC (permalink / raw)
  To: caml-list

On Monday 21 December 2009 01:08:14 Gerd Stolpmann wrote:
> > The following web page describes a commercial machine sold by Azul
> > Systems that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory
> > in a flat SMP configuration:
> >
> >   http://www.azulsystems.com/products/compute_appliance.htm
> >
> > As you can see, a GC with shared memory can already scale across dozens
> > of cores and memory access is no more heterogeneous than it was 20 years
> > ago. Also, note that homogeneous memory access is a red herring in this
> > context because it does not undermine the utility of a shared heap on a
> > multicore.
>
> The benchmarks they mention can all easily be parallelized - that stuff
> you can also do with multi-processing.

Only if the result is small, otherwise you spend all of your time 
deserializing it. With a shared heap, you just return the resulting value by 
reference.

> The interesting thing would be an 
> inherent parallel algorithm where the same memory region is accessed by
> multiple threads.

Concurrent hash tables are a big thing for Azul:

  "Scales well up to 768 CPUs" -
  http://www.youtube.com/watch?v=WYXgtXWejRM

This blog entry describes performance on 750 cores:

  http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html

> Or at least a numeric program (your examples seem to be mostly from that
> area). 

Yes. You can look at matrix operations or linear algebra (QR decomposition) 
but also things like quicksort and graphics.

Would be interesting to compare symbolic performance as well though.

> > > - Have you considered that many Ocaml users prefer a GC that offers
> > > maximum single core performance,
> >
> > OCaml's GC is nowhere near offering maximum single core performance. Its
> > uniform data representation renders OCaml many times slower than its
> > competitors for many tasks. For example, filling a 10M float->float hash
> > table is over 18x slower with OCaml than with F#. FFT with a complex
> > number type is 5.5x slower with OCaml than F#. Fibonacci with floats is
> > 3.3x slower with OCaml than my own HLVM project (!).
>
> Sure, but these micro benchmarks are first seldom correct, and do not
> really count for real-world programs.
>
> For example, an important parameter of such benchmarks is the frequency
> the GC runs. Ocaml runs the GC very often - good for latencies, but bad
> for micro benchmarks because other runtimes simply delay the GC until
> some limits are exceeded, so these other runtimes often haven't run the
> GC even once in the short period of time the benchmark runs.

You're missing the point: every example I gave shouldn't be doing any GC at 
all and doesn't in F# but spends a lot of time in the GC in OCaml just 
because of unnecessary boxing. The mutator also takes longer because boxing 
damages locality.

> It is simply a fact that the ocaml developers had some preferences. E.g.
> allocating and freeing short-living values is extremely fast (often
> <10ns). This is very good when you do symbolic computations, or have
> lots of small strings, but ignorable for numeric stuff, or for programs
> where the lifetime of allocated memory is bound to server sessions. The
> minor GC is very fast, but, as you observe, the uniform representation
> has costs elsewhere.

Yes. That's why I think the best way forward is to develop HLVM.

> > >   because their application is parallelised via multiple processes
> > >   communicating via message passing?
> >
> > A circular argument based upon the self-selected group of remaining OCaml
> > users. Today's OCaml users use OCaml despite its shortcomings. If you
> > want to see the impact of OCaml's multicore unfriendliness, consider why
> > the OCaml community has haemorrhaged 50% of its users in only 2 years.
>
> Don't see that. That's just speculation - maybe some win32 ocaml users
> switched to F#,

I wasn't a win32 user. :-)

> but there are for sure also other reasons than multicore 
> support, e.g. GUIs and better Windows integration. Btw, where do you get
> your numbers from?

Traffic here:

2007: 5814
2008: 4051
2009: 3071

  http://groups.google.com/group/fa.caml/about

Or searches for OCaml on Google:

  http://www.google.com/trends?q=ocaml%2Cclojure%2Cf%23

The number of OCaml jobs has crashed as well:

  http://www.itjobswatch.co.uk/jobs/uk/ocaml.do

And, of course, what our customers say.

> There are many, many users for whom multicore is just a useless hype.

In 2005, the OCaml community was composed largely of performance junkies who 
came here because OCaml produced excellent performance from succinct and 
readable code on benchmark after benchmark. More people were buying OFS than 
were using Coq. I don't believe for a second that many of OCaml's former 
users thought multicore was just useless hype.

> Either the algorithms are inherently difficult to parallelize (and this
> is vast majority),

I have had great success parallelizing code.

> or are that easy (like all client/server stuff) that multi-processing is
> sufficient.

There are certainly applications where multicore is not beneficial.

> You can consider multicore as a marketing trick of the chip 
> industry to let the ordinary desktop user pay for a feature that is mostly
> interesting for datacenters. 

Ordinary desktop users have been paying top dollar for parallel computers in 
the form of GPUs for some time now. The use of GPUs for more general 
programming has been a really hot topic for years and just became viable. 
Even games consoles have multicores. ARM are making quadcores for your phone 
and netbook!

If I can get HLVM to make parallel OCaml-style programming easy, I think a lot 
of people would love it.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is broken
  2009-12-21  4:30           ` Jon Harrop
  2009-12-21  3:58             ` Yaron Minsky
@ 2009-12-21  5:32             ` Markus Mottl
  2009-12-21 13:29               ` Jon Harrop
  1 sibling, 1 reply; 57+ messages in thread
From: Markus Mottl @ 2009-12-21  5:32 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

On Sun, Dec 20, 2009 at 23:30, Jon Harrop <jon@ffconsultancy.com> wrote:
> Traffic here:
>
> 2007: 5814
> 2008: 4051
> 2009: 3071

That's because I don't have much time to post here nowaydays.  I'm
sure if Jon followed my example, we would have a parallel GC for OCaml
by the end of the year.

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: OCaml is broken
  2009-12-20 19:38     ` [***SPAM*** Score/Req: 10.1/8.0] " Jon Harrop
@ 2009-12-21 12:26       ` Mihamina Rakotomandimby
  2009-12-21 14:19         ` general question, was " Keyan
  0 siblings, 1 reply; 57+ messages in thread
From: Mihamina Rakotomandimby @ 2009-12-21 12:26 UTC (permalink / raw)
  To: caml-list

> Jon Harrop <jon@ffconsultancy.com> :
> > Two cores is standard by 
> > now, I'm used to 8, next year 32 and so on. OCaml will only become
> > more and more irrelevant. I hate to see that happening.  
> 
> Me too. The OCaml language will continue to kick ass for some time to
> come but INRIA's implementation is no longer competitively performant
> for many tasks. However, open source offerings are all quite dire,
> particularly stand-alone ones.

That seems dark...

-- 
       Architecte Informatique chez Blueline/Gulfsat:
    Administration Systeme, Recherche & Developpement
                +261 34 29 155 34 / +261 33 11 207 36


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is  broken
  2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
                       ` (3 preceding siblings ...)
  2009-12-20 19:38     ` [***SPAM*** Score/Req: 10.1/8.0] " Jon Harrop
@ 2009-12-21 13:07     ` Damien Doligez
  4 siblings, 0 replies; 57+ messages in thread
From: Damien Doligez @ 2009-12-21 13:07 UTC (permalink / raw)
  To: caml-list


On 2009-12-20, at 13:21, Erik Rigtorp wrote:

> The first step for OCaml would be to be able to run multiple
> communicating instances of the runtime bound to one core each in one
> process and have them communicate via lock free queues.


Does anyone know how to do lock-free queues in a weakly-consistent
memory model?

-- Damien


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is broken
  2009-12-21  5:32             ` Markus Mottl
@ 2009-12-21 13:29               ` Jon Harrop
  0 siblings, 0 replies; 57+ messages in thread
From: Jon Harrop @ 2009-12-21 13:29 UTC (permalink / raw)
  To: caml-list

On Monday 21 December 2009 05:32:38 Markus Mottl wrote:
> On Sun, Dec 20, 2009 at 23:30, Jon Harrop <jon@ffconsultancy.com> wrote:
> > Traffic here:
> >
> > 2007: 5814
> > 2008: 4051
> > 2009: 3071
>
> That's because I don't have much time to post here nowaydays.  I'm
> sure if Jon followed my example, we would have a parallel GC for OCaml
> by the end of the year.

HLVM already has a parallel GC. :-)

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* multicore wish [Was: Re: [Caml-list] Re: OCaml is broken]
  2009-12-20  4:43 ` [Caml-list] " Jon Harrop
  2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
@ 2009-12-21 13:31   ` Goswin von Brederlow
  2009-12-21 14:19     ` multicore wish Mihamina Rakotomandimby
  2009-12-21 19:53     ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Jon Harrop
  1 sibling, 2 replies; 57+ messages in thread
From: Goswin von Brederlow @ 2009-12-21 13:31 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop <jon@ffconsultancy.com> writes:

> We've discussed the problems with that before. Writing a parallel generic 
> quicksort seems to be a good test of a decent multicore capable language 
> implementation. Currently, F# is a *long* way ahead of everything open 
> source.

How do you implement it?

1) divide at the top and then let each core sort its smaller array

Say you have 2^n cores then the first n splits in merge sort would
first sort the values into the 2 regions and then start a thread for
each region (start a new one, do the other part in this thread). After
n splits you would switch to a uni-core quicksort.

For this you need to split well so each core ends up with a roughly
equal sized chunk.

2) factory/worker approach

Each core runs a factory/worker thread waiting on a job queue. You
start by dumping the full array into the job queue. Some random core
picks it up, splits it into 2 regions and dumps one into the job
queue. If the job gets too small (PAGE_SIZE? cache line size? total
size / cores^2?) the factory/worker switches to normal uni-core
quicksort and sorts the whole chunk.

The job queue should probably be priority based so larger chunks are
sorted before smaller.

Here the quality of each split is not so important. If a chunk is
smaller, and therefore faster, the core just picks up the next job in
the queue. But you need more syncronization between the cores for the
job queue. On the other hand you aren't limited to 2^n cores. Any
number will do.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 57+ messages in thread

* general question, was Re: OCaml is broken
  2009-12-21 12:26       ` Mihamina Rakotomandimby
@ 2009-12-21 14:19         ` Keyan
  2009-12-21 14:40           ` [Caml-list] " rixed
                             ` (4 more replies)
  0 siblings, 5 replies; 57+ messages in thread
From: Keyan @ 2009-12-21 14:19 UTC (permalink / raw)
  To: caml-list

Hi,

i have a large project written in C++, for which i am planing to write add-ons and tools in ocaml, e.g. different tools to analyse my code (dependency stuff), an interpreter for a script-language i plan to include, etc, etc. form my time at the uni i remembered that ocaml allows to compile libraries which can be included in c/c++ program, and i know people who use it extensively in other projects. therefore, i decided to give ocaml a try. i like functional programming, and my first steps with ocaml are very promising.

following this discussion, i am not so sure anymore, if ocaml is a good decision. may be i got this discussion wrong, but if ocaml is dying out, i might have to look for another functional programming language to use with my project.

please dont take this email as an offence. i am just curious. at this point, i can still easily look for an alternative to ocaml, so its best to ask now.

regards,
keyan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: multicore wish
  2009-12-21 13:31   ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Goswin von Brederlow
@ 2009-12-21 14:19     ` Mihamina Rakotomandimby
  2009-12-21 16:15       ` [Caml-list] " Fischbacher T.
                         ` (2 more replies)
  2009-12-21 19:53     ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Jon Harrop
  1 sibling, 3 replies; 57+ messages in thread
From: Mihamina Rakotomandimby @ 2009-12-21 14:19 UTC (permalink / raw)
  To: caml-list, ocsigen

Ok, so for the beginner I am (must I ask on the beginners ML?): is
multicore support just useless or not?

I am beginning using Ocsigen, for a growing web project:
Is multicore support useless for scaling on Ocsigen?

X-post to Ocsigen ML.
-- 
       Architecte Informatique chez Blueline/Gulfsat:
    Administration Systeme, Recherche & Developpement
                +261 34 29 155 34 / +261 33 11 207 36


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] general question, was Re: OCaml is broken
  2009-12-21 14:19         ` general question, was " Keyan
@ 2009-12-21 14:40           ` rixed
  2009-12-21 14:42           ` Gerd Stolpmann
                             ` (3 subsequent siblings)
  4 siblings, 0 replies; 57+ messages in thread
From: rixed @ 2009-12-21 14:40 UTC (permalink / raw)
  To: caml-list

> following this discussion, i am not so sure anymore, if ocaml is a good decision. may be i got this discussion wrong, but if ocaml is dying out, i might have to look for another functional programming language to use with my project.

Every programming language suffers its trolls and flamewars.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] general question, was Re: OCaml is broken
  2009-12-21 14:19         ` general question, was " Keyan
  2009-12-21 14:40           ` [Caml-list] " rixed
@ 2009-12-21 14:42           ` Gerd Stolpmann
  2009-12-21 15:25             ` Eray Ozkural
  2009-12-21 14:50           ` Philip
                             ` (2 subsequent siblings)
  4 siblings, 1 reply; 57+ messages in thread
From: Gerd Stolpmann @ 2009-12-21 14:42 UTC (permalink / raw)
  To: Keyan; +Cc: caml-list


Am Montag, den 21.12.2009, 15:19 +0100 schrieb Keyan:
> Hi,
> 
> i have a large project written in C++, for which i am planing to write add-ons and tools in ocaml, e.g. different tools to analyse my code (dependency stuff), an interpreter for a script-language i plan to include, etc, etc. form my time at the uni i remembered that ocaml allows to compile libraries which can be included in c/c++ program, and i know people who use it extensively in other projects. therefore, i decided to give ocaml a try. i like functional programming, and my first steps with ocaml are very promising.
> 
> following this discussion, i am not so sure anymore, if ocaml is a good decision. may be i got this discussion wrong, but if ocaml is dying out, i might have to look for another functional programming language to use with my project.
> 
> please dont take this email as an offence. i am just curious. at this point, i can still easily look for an alternative to ocaml, so its best to ask now.

Please don't believe Jon's propaganda. He has just very specific needs
(high performance computing on desktops), and generalizes them in the
way "it's not perfect for me anymore, so it's bad anyway". He has been
doing that for years now, not seeing that he really harms the way ocaml
is seen by newcomers.

The examples you mention are good matches for using ocaml - symbolic
programming with lots of terms and trees. That's the stuff ocaml was
originally developed for, and it delivers excellence performance.

Also, ocaml is still backed by INRIA, and there is still a large
community, including a growing number of industrial users.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] general question, was Re: OCaml is broken
  2009-12-21 14:19         ` general question, was " Keyan
  2009-12-21 14:40           ` [Caml-list] " rixed
  2009-12-21 14:42           ` Gerd Stolpmann
@ 2009-12-21 14:50           ` Philip
  2009-12-21 15:01             ` Keyan
  2009-12-21 18:50             ` Jon Harrop
  2009-12-21 18:48           ` Jon Harrop
  2010-01-03 10:49           ` Sylvain Le Gall
  4 siblings, 2 replies; 57+ messages in thread
From: Philip @ 2009-12-21 14:50 UTC (permalink / raw)
  To: caml-list

On Mon, 2009-12-21 at 15:19 +0100, Keyan wrote:
> Hi,
> 
> i have a large project written in C++, for which i am planing to write add-ons and tools in ocaml, e.g. different tools to analyse my code (dependency stuff), an interpreter for a script-language i plan to include, etc, etc. form my time at the uni i remembered that ocaml allows to compile libraries which can be included in c/c++ program, and i know people who use it extensively in other projects. therefore, i decided to give ocaml a try. i like functional programming, and my first steps with ocaml are very promising.
> 
> following this discussion, i am not so sure anymore, if ocaml is a good decision. may be i got this discussion wrong, but if ocaml is dying out, i might have to look for another functional programming language to use with my project.
> 

Functional programming languages will never become mainstream. There is
a thread on haskell-mailing-list from time to time. 
What language you chose should depend always on your (your team) skills,
tools and tasks.

-Philip
 
> please dont take this email as an offence. i am just curious. at this point, i can still easily look for an alternative to ocaml, so its best to ask now.
> 
> regards,
> keyan
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs




^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] general question, was Re: OCaml is broken
  2009-12-21 14:50           ` Philip
@ 2009-12-21 15:01             ` Keyan
  2009-12-21 15:13               ` Stefano Zacchiroli
  2009-12-21 15:27               ` Dario Teixeira
  2009-12-21 18:50             ` Jon Harrop
  1 sibling, 2 replies; 57+ messages in thread
From: Keyan @ 2009-12-21 15:01 UTC (permalink / raw)
  To: Philip; +Cc: caml-list

Hi,

first of all thanks for the replies, and for not misunderstanding my email.

> What language you chose should depend always on your (your team) skills,
> tools and tasks.

i dont want to go into a which-programming-language-is-best-for-what discussion (as this will never end), but at this point i wanted to know if ocaml is still alive, i.e. if you can still easily download and install it on a variety of OS, and if it will be supported in the future. i am working on an open-source project, so beside your arguments (which of course are valid), it is also important if people who decide not to download the binaries, but rather download the source, will be able to compile it. for me personally, i just like functional programming, and i like to learn new stuff from time to time. and to be honest, i am very happy to have found applications in the project which i can use a reason to include a functional programming language in the code :)

so, thanks for your replies. i am confident again, and will continue to use ocaml.

regards,
keyan

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] general question, was Re: OCaml is broken
  2009-12-21 15:01             ` Keyan
@ 2009-12-21 15:13               ` Stefano Zacchiroli
  2009-12-21 15:27               ` Dario Teixeira
  1 sibling, 0 replies; 57+ messages in thread
From: Stefano Zacchiroli @ 2009-12-21 15:13 UTC (permalink / raw)
  To: caml-list

On Mon, Dec 21, 2009 at 04:01:31PM +0100, Keyan wrote:
> but at this point i wanted to know if ocaml is still alive, i.e. if
> you can still easily download and install it on a variety of OS, and

Quite alive.

http://packages.debian.org/changelogs/pool/main/o/ocaml/current/changelog

(Look for "new upstream release" to check how often new upstream have
 been released and made part of Debian-based distributions. Fedora
 people have similar stories to tell.)

Cheers.

-- 
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
zack@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] general question, was Re: OCaml is broken
  2009-12-21 14:42           ` Gerd Stolpmann
@ 2009-12-21 15:25             ` Eray Ozkural
  0 siblings, 0 replies; 57+ messages in thread
From: Eray Ozkural @ 2009-12-21 15:25 UTC (permalink / raw)
  To: caml-list

On Mon, Dec 21, 2009 at 4:42 PM, Gerd Stolpmann <gerd@gerd-stolpmann.de> wrote:
>
> Please don't believe Jon's propaganda. He has just very specific needs
> (high performance computing on desktops), and generalizes them in the
> way "it's not perfect for me anymore, so it's bad anyway". He has been
> doing that for years now, not seeing that he really harms the way ocaml
> is seen by newcomers.

I've seen some interesting parallel programming projects and language
extensions using ocaml. I suppose ocaml could benefit from a
parallelizing compiler & standardized explicit parallelism constructs,
and be a serious contender for the multicore "market". I personally
started out with Haskell with regards to contemporary high-level
languages, and then switched to ocaml because of performance and
sanity. I think I also love the higher-order modules =) I want to
rewrite my stock prediction program in ocaml nowadays. In Haskell, it
was a pain to work on large files. Good thing I lost the code in a hard
drive crash. The way I see it, ocaml has adequate performance, and is
excellent for algorithmic work. I have this half-finished project that features
ocaml implementation of some algorithms. You should see them, they
are almost identical to pseudo-code. I should move that project to ocamlforge.

Cheers,


-- 
Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University, Ankara
http://groups.yahoo.com/group/ai-philosophy
http://myspace.com/arizanesil http://myspace.com/malfunct


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] general question, was Re: OCaml is broken
  2009-12-21 15:01             ` Keyan
  2009-12-21 15:13               ` Stefano Zacchiroli
@ 2009-12-21 15:27               ` Dario Teixeira
  2009-12-21 15:46                 ` Jacques Carette
  1 sibling, 1 reply; 57+ messages in thread
From: Dario Teixeira @ 2009-12-21 15:27 UTC (permalink / raw)
  To: Philip, Keyan; +Cc: caml-list

Hi,

> i dont want to go into a which-programming-language-is-best-for-what
> discussion (as this will never end), but at this point i wanted to know
> if ocaml is still alive, i.e. if you can still easily download and
> install it on a variety of OS, and if it will be supported in the future.

The fact that the compiler's source code is (a) available, and (b)
straightforward enough for mere mortals to understand should give you
some assurances that Ocaml can never die by fiat.  Moreover, there's
a vibrant community around it, both in industry and in the open-source
world.  (Ocaml support in Debian and Fedora is top-notch, for example).
Last but not least, Ocaml plays a central role in multiple INRIA
projects, which means its creators have all the reason to continue
maintaining it and improving it for the foreseeable future (and there's
some interesting goodies in the upcoming 3.12 release, for example).

Though I am grateful and acknowledge Jon Harrop's help in the beginner's
list, you should take his prognostications with a grain of salt.  Every
now and again he proclaims that "Ocaml is doomed! We're all gonna die!".
It has almost become a comedy catchphrase of sorts in this list...

So yes, do choose Ocaml for your project.  You won't regret it.

Best regards,
Dario Teixeira






^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] general question, was Re: OCaml is broken
  2009-12-21 15:27               ` Dario Teixeira
@ 2009-12-21 15:46                 ` Jacques Carette
  0 siblings, 0 replies; 57+ messages in thread
From: Jacques Carette @ 2009-12-21 15:46 UTC (permalink / raw)
  To: Dario Teixeira; +Cc: Philip, Keyan, caml-list

I agree with most of what Dario Teixeira wrote, except for one small 
quibble:

Dario Teixeira wrote:
> Last but not least, Ocaml plays a central role in multiple INRIA
> projects, which means its creators have all the reason to continue
> maintaining it and improving it for the foreseeable future (and there's
> some interesting goodies in the upcoming 3.12 release, for example).
>   
Actually, this gives these projects an incentive to insure that Ocaml 
survives, which gives an incentive for some 'maintenance engineers' to 
be kept on-staff to insure that Ocaml does not bit-rot.  This gives only 
quite partial incentive to a team of researchers (the creators of Ocaml) 
to do maintenance (as that is usually not research, thus not the kind of 
work of interest to researchers).  And entropy is a real problem -- 
Ocaml is now quite mature, which means that radical changes are well 
nigh impossible; this is a serious disincentive for researchers.  End of 
quibble.

Personally, I would really really want to see a 4.00 release which 
really warrants that name.  The 3.XX line can be maintained for a few 
more years while people switch, in the same way gcc did this. 

In any case, I have nevertheless voted with my time and effort: I have 1 
large project being implemented in Ocaml, 3 medium ones in metaocaml, 
although I must admit that I have some 'research' code in Haskell (and 
in Maple, but that's another story).

Jacques


^ permalink raw reply	[flat|nested] 57+ messages in thread

* RE: [Caml-list] Re: multicore wish
  2009-12-21 14:19     ` multicore wish Mihamina Rakotomandimby
@ 2009-12-21 16:15       ` Fischbacher T.
  2009-12-21 17:42       ` Dario Teixeira
  2009-12-21 18:43       ` Jon Harrop
  2 siblings, 0 replies; 57+ messages in thread
From: Fischbacher T. @ 2009-12-21 16:15 UTC (permalink / raw)
  To: Mihamina Rakotomandimby, caml-list, ocsigen


Mihamina,

> Ok, so for the beginner I am (must I ask on the beginners ML?): is
> multicore support just useless or not?

That *entirely* depends on what you want to do. If, for example, you 
have to do a large calculation that is limited by memory and not by CPU,
or, if you have an application that is trivially parallelized anyway, 
multicore support won't make much of a difference. There are 
(many) other applications, however, where it does matter quite a lot.

Actually, the biggest effect of multicore architectures I see is to shift
the emphasis from raw CPU power to memory bandwidth.

-- 
Thomas


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-21 14:19     ` multicore wish Mihamina Rakotomandimby
  2009-12-21 16:15       ` [Caml-list] " Fischbacher T.
@ 2009-12-21 17:42       ` Dario Teixeira
  2009-12-21 18:43       ` Jon Harrop
  2 siblings, 0 replies; 57+ messages in thread
From: Dario Teixeira @ 2009-12-21 17:42 UTC (permalink / raw)
  To: caml-list, ocsigen, Mihamina Rakotomandimby

Hi,

> I am beginning using Ocsigen, for a growing web project:
> Is multicore support useless for scaling on Ocsigen?

Categorically, yes.  In fact, I would say that the model used by Ocsigen
is close to being optimal performance-wise as far as web applications are
concerned.  The Ocsigen server and Eliom applications use Lwt for concurrency,
ensuring that the CPU is always busy and will not idle waiting for I/O.
Moreover, the green threads offered by Lwt are much lighter than system
threads, and avoid the context switching penalty incurred by the latter.

And what about those multiple cores?  Simple, if you have n cores, then
simply fire up n instances of the Ocsigen server, and put a dispatching
server like HAProxy or Ocsigen itself as frontend (there are some simple
tricks to ensure that the same client is always directed to same server).
This solution takes advantage of the fact that serving web requests from
multiple clients is embarrasingly parallel from the web application
standpoint. (Sure, you'll have contention on the database side, but
most DBMSs handle that reasonably well).

(And yes, I am aware that Lwt's performance could be improved further
by using syscalls like epoll/kqeueue/etc instead of select.  That is
however an implementation issue, not an architectural flaw).

Best regards,
Dario Teixeira






^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-21 14:19     ` multicore wish Mihamina Rakotomandimby
  2009-12-21 16:15       ` [Caml-list] " Fischbacher T.
  2009-12-21 17:42       ` Dario Teixeira
@ 2009-12-21 18:43       ` Jon Harrop
  2 siblings, 0 replies; 57+ messages in thread
From: Jon Harrop @ 2009-12-21 18:43 UTC (permalink / raw)
  To: caml-list

On Monday 21 December 2009 14:19:36 Mihamina Rakotomandimby wrote:
> Ok, so for the beginner I am (must I ask on the beginners ML?): is
> multicore support just useless or not?

I have found a great many uses for multicores but you need a decent foundation 
to make effective use of it.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] general question, was Re: OCaml is broken
  2009-12-21 14:19         ` general question, was " Keyan
                             ` (2 preceding siblings ...)
  2009-12-21 14:50           ` Philip
@ 2009-12-21 18:48           ` Jon Harrop
  2010-01-03 10:49           ` Sylvain Le Gall
  4 siblings, 0 replies; 57+ messages in thread
From: Jon Harrop @ 2009-12-21 18:48 UTC (permalink / raw)
  To: caml-list

On Monday 21 December 2009 14:19:05 Keyan wrote:
> Hi,
>
> i have a large project written in C++, for which i am planing to write
> add-ons and tools in ocaml, e.g. different tools to analyse my code
> (dependency stuff), an interpreter for a script-language i plan to include,
> etc, etc. form my time at the uni i remembered that ocaml allows to compile
> libraries which can be included in c/c++ program, and i know people who use
> it extensively in other projects. therefore, i decided to give ocaml a try.
> i like functional programming, and my first steps with ocaml are very
> promising.
>
> following this discussion, i am not so sure anymore, if ocaml is a good
> decision. may be i got this discussion wrong, but if ocaml is dying out, i
> might have to look for another functional programming language to use with
> my project.

OCaml isn't dying out and it is still ideal for the applications you listed 
because they are not performance critical and do rely heavily upon an 
efficient GC. This discussion was essentially about the exact opposite end of 
the spectrum of applications, where performance is critical but GC is largely 
irrelevant.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] general question, was Re: OCaml is broken
  2009-12-21 14:50           ` Philip
  2009-12-21 15:01             ` Keyan
@ 2009-12-21 18:50             ` Jon Harrop
  1 sibling, 0 replies; 57+ messages in thread
From: Jon Harrop @ 2009-12-21 18:50 UTC (permalink / raw)
  To: caml-list

On Monday 21 December 2009 14:50:15 Philip wrote:
> Functional programming languages will never become mainstream.

C# 3 and Javascript are mainstream functional languages. F# is next.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: multicore wish [Was: Re: [Caml-list] Re: OCaml is broken]
  2009-12-21 13:31   ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Goswin von Brederlow
  2009-12-21 14:19     ` multicore wish Mihamina Rakotomandimby
@ 2009-12-21 19:53     ` Jon Harrop
  2009-12-22 13:09       ` multicore wish Goswin von Brederlow
  1 sibling, 1 reply; 57+ messages in thread
From: Jon Harrop @ 2009-12-21 19:53 UTC (permalink / raw)
  To: caml-list

On Monday 21 December 2009 13:31:10 Goswin von Brederlow wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
> > We've discussed the problems with that before. Writing a parallel generic
> > quicksort seems to be a good test of a decent multicore capable language
> > implementation. Currently, F# is a *long* way ahead of everything open
> > source.
>
> How do you implement it?
>
> 1) divide at the top and then let each core sort its smaller array
>
> Say you have 2^n cores then the first n splits in merge sort would
> first sort the values into the 2 regions and then start a thread for
> each region (start a new one, do the other part in this thread). After
> n splits you would switch to a uni-core quicksort.
>
> For this you need to split well so each core ends up with a roughly
> equal sized chunk.

That is basically what is now being called "nested data parallelism". You 
precompute a parallel strategy, partition the work upfront and then execute 
embarassingly parallel tasks. This worked well decades ago for sparse linear 
algebra on supercomputers but it sucks on multicores because load is variable 
and there is no load balancing. So one of those "equal-sized" partitions is 
likely to take 10x longer than the others due to competing processes, 
scheduling or cache issues and all cores end up twiddling their thumbs 
waiting for that one to complete.

> 2) factory/worker approach
>
> Each core runs a factory/worker thread waiting on a job queue. You
> start by dumping the full array into the job queue. Some random core
> picks it up, splits it into 2 regions and dumps one into the job
> queue. If the job gets too small (PAGE_SIZE? cache line size? total
> size / cores^2?) the factory/worker switches to normal uni-core
> quicksort and sorts the whole chunk.
>
> The job queue should probably be priority based so larger chunks are
> sorted before smaller.
>
> Here the quality of each split is not so important. If a chunk is
> smaller, and therefore faster, the core just picks up the next job in
> the queue. But you need more syncronization between the cores for the
> job queue. On the other hand you aren't limited to 2^n cores. Any
> number will do.

This is a work sharing queue which is better because it is dynamically load 
balanced but still bad because you have global contention for a shared 
resource: the shared queue.

Cilk pioneered wait-free work-stealing task deques and Microsoft's Task 
Parallel Library (which will be part of .NET 4 in March 2010) copied the 
idea. You have a separate deque of tasks for each core. A core tries to pop a 
task off its deque. If there are no tasks on its deque then it tries to pop 
off a randomly-chosen other deque. If there is a task then the core begins 
executing that task and any child tasks are pushed back onto the local deque 
(and might get stolen before they are executed).

An important characteristic of this design is that child tasks are likely to 
be executed on the same core as their parent and, therefore, locality between 
related tasks is valuable. Consequently, many algorithms that recursively 
subdivide a large dataset (like quicksort) attain superlinear speedups 
because they make effective use of the sum of all L2 caches rather than just 
a single L2 cache. So this works extremely well with the multi-level 
hierarchical caches with interconnects that are the hallmark of multicore 
architecture.

You can then implement quicksort (and many other parallel divide and conquer 
algorithms) something like this:

  let rec serial_qsort cmp a i0 i3  =
    if i3 - i0 > 1 then
      let i1, i2 = partition cmp a i0 i3 in
      serial_qsort i0 i1;
      serial_qsort i2 i3

  let rec parallel_qsort cmp a i0 i3  =
    if i3 - i0 > 1 then
      if i3 - i0 < threshold then
        serial_qsort cmp a i0 i3
      else
        let i1, i2 = partition cmp a i0 i3 in
        let future = invoke (parallel_qsort cmp i0) i1 in
        parallel_qsort i2 i3
        future()

where the "invoke" combinator pushes a task onto the local task deque and 
returns a closure that, when applied, waits for that task to complete and 
returns its (unit) result. The "threshold" value protects against the 
function spending more time spawning tasks that doing actual work.

You can implement this in OCaml and F# but there are some important 
differences:

1. The array "a" is just an ordinary array of any type of values on the shared 
heap in F# but, for generality in OCaml, this must be both the underlying 
ordinary data and a manually-managed shared big array of indices into the 
ordinary data where the indices get sorted while the original data remain in 
place until they are permuted at the end.

2. The sorted subarrays are contiguous in memory and, at some subdivision, 
will fit into L2 cache. So F# offers optimal locality. In contrast, there is 
no locality whatsoever in the OCaml code and most accesses into the unsorted 
original array will incur cache misses right up to main memory. So the OCaml 
approach does not scale as well and will never see superlinear speedup 
because it cannot be cache friendly.

3. Child tasks are likely to be executed on the same core as their parent and 
use a subset of their parent's data in F#, so they offer the best possible 
locality. In contrast, child processes are likely to be executed on another 
core in OCaml and offer the worst possible locality.

4. Task deques can handle an arbitrary number of tasks limited only by memory 
whereas processes are a scarce resource and forking is likely to fail, 
whereupon the "invoke" combinator will simply execute sequentially. So it is 
much easier to write reliable and performant code in F# than OCaml.

5. OCaml's fork-based "invoke" combinator is many orders of magnitude slower 
than pushing a closure onto a concurrent task deque in F#.

6. The "threshold" value is a magic number derived from measurements on a 
given machine in my OCaml code but is dynamically adjusted in a 
machine-independent way by the "invoke" combinator in my F# code using atomic 
operations and real time profiling of the proportion of time spent spawning 
tasks vs doing actual work.

The same basic principles apply to many algorithms. Although it can sometimes 
be tricky to figure out how best to use this technology to parallelize a 
given algorithm (e.g. successive over relaxation), I have found that a great 
many algorithms can be parallelized effectively using this approach when you 
have a suitable foundation in place (like the TPL). Moreover, the ability to 
use ordinary constructs in F# instead of hacks like type-specific shared 
memory big arrays in OCaml makes it a lot easier to parallelize programs. My 
parallel Burrows-Wheeler Transform (BWT), for example, took 30 minutes to 
develop in F# and 2 days in OCaml.

Hopefully, HLVM will bring these kinds of benefits to OCaml, albeit in the 
form of a DSL for high performance parallel programming.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req:  10.1/8.0] Re: [Caml-list] Re: OCaml is broken
  2009-12-20 13:47     ` Yaron Minsky
  2009-12-20 16:01       ` Gerd Stolpmann
@ 2009-12-21 22:50       ` Erik Rigtorp
  2009-12-22 12:04         ` Erik Rigtorp
  2009-12-29 12:07         ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Richard Jones
  1 sibling, 2 replies; 57+ messages in thread
From: Erik Rigtorp @ 2009-12-21 22:50 UTC (permalink / raw)
  To: yminsky; +Cc: caml-list

On Sun, Dec 20, 2009 at 14:47, Yaron Minsky <yminsky@gmail.com> wrote:
> On Sun, Dec 20, 2009 at 7:21 AM, Erik Rigtorp <erik@rigtorp.com> wrote:
>>
>> The first step for OCaml would be to be able to run multiple
>> communicating instances of the runtime bound to one core each in one
>> process and have them communicate via lock free queues.
>
> We've done some experiments in this direction at Jane Street.  On Linux,
> we've been able to get fast enough IPC channels for our purposes that
> slamming things into the same memory space has not in the end been
> necessary.  (There is I agree some pain associated with running multiple
> runtimes in the same process.  If you're interested, contact me off-list and
> I can try to get you some of the details of what we ran into.)
> But have you tried using shared-memory segments for communicating between
> different processes?  You say the latencies are too high, but do you have
> any measurements you could share? Have you tried queues using shared memory
> segments, in particular?  Inter-thread communication has latency as well,
> and the performance issues depend on lots of things, OS and hardware
> platform included.  It would help in understanding the tradeoffs.

Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The
benchmarks are running on a cpuset with 1 core. I measure the time
from sending in one process until the other process receives the
message. So a context switch and the message passing is included in
the measurements.

Max/Min/Avg
* Pipes: 28205/5973/6259
* Unix domain sockets: 44256/7748/8153
* SYSv message queues: 19197/5895/6173
* Posix message queues: 37399/10965/11303
* TCP on loopback: 29017/7471/7885

So the latency is roughly 10µs for all these solutions. That latency
is pretty high and would be several times the processing time of the
message itself.

I haven't tried using shm for IPC yet. I'll try see if I can do
something with this lock-free queue:
http://www.liblfds.org/.

> As we go to higher-and-higher numbers of cores, I suspect that
> message-passing solutions are likely to scale better than shared memory, so
> I'm not so sure that OCaml is on the wrong path here.  I think that most of

I agree, but the message passing must be cheap and the creation and
scheduling of lightweight threads must be fast. In order to do that
the runtime needs to handle the communication via shared memory
between OS threads or processes, binding OS processes and threads to
cores and know about different cores different memory latencies and so
on.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req:  10.1/8.0] Re: [Caml-list] Re: OCaml is broken
  2009-12-21 22:50       ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
@ 2009-12-22 12:04         ` Erik Rigtorp
  2009-12-22 12:27           ` Mihamina Rakotomandimby
  2009-12-22 13:27           ` Gerd Stolpmann
  2009-12-29 12:07         ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Richard Jones
  1 sibling, 2 replies; 57+ messages in thread
From: Erik Rigtorp @ 2009-12-22 12:04 UTC (permalink / raw)
  To: yminsky; +Cc: caml-list

On Mon, Dec 21, 2009 at 23:50, Erik Rigtorp <erik@rigtorp.com> wrote:
> Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The
> benchmarks are running on a cpuset with 1 core. I measure the time
> from sending in one process until the other process receives the
> message. So a context switch and the message passing is included in
> the measurements.
>
> Max/Min/Avg
> * Pipes: 28205/5973/6259
> * Unix domain sockets: 44256/7748/8153
> * SYSv message queues: 19197/5895/6173
> * Posix message queues: 37399/10965/11303
> * TCP on loopback: 29017/7471/7885
>
> So the latency is roughly 10µs for all these solutions. That latency
> is pretty high and would be several times the processing time of the
> message itself.

Some more benchmarks:

Max/Min/Avg
* Spinlocking shm: 50897/403/761  (This one utilizes multiple cores,
since one core is just burning while waiting for data)
* Pthreads mutex shm: 27582/5246/6577

Forgot to say that all measurements are in nanoseconds.

Erik


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req:  10.1/8.0] Re: [Caml-list] Re: OCaml is broken
  2009-12-22 12:04         ` Erik Rigtorp
@ 2009-12-22 12:27           ` Mihamina Rakotomandimby
  2009-12-22 13:27           ` Gerd Stolpmann
  1 sibling, 0 replies; 57+ messages in thread
From: Mihamina Rakotomandimby @ 2009-12-22 12:27 UTC (permalink / raw)
  To: caml-list

> Erik Rigtorp <erik@rigtorp.com> :
> > Max/Min/Avg
> > * Pipes: 28205/5973/6259
> > * Unix domain sockets: 44256/7748/8153
> > * SYSv message queues: 19197/5895/6173
> > * Posix message queues: 37399/10965/11303
> > * TCP on loopback: 29017/7471/7885
> Some more benchmarks:
> Max/Min/Avg
> * Spinlocking shm: 50897/403/761  (This one utilizes multiple cores,
> since one core is just burning while waiting for data)
> * Pthreads mutex shm: 27582/5246/6577

And, then... is OCaml "broken"?  :-)


-- 
       Architecte Informatique chez Blueline/Gulfsat:
    Administration Systeme, Recherche & Developpement
                +261 34 29 155 34 / +261 33 11 207 36


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: multicore wish
  2009-12-21 19:53     ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Jon Harrop
@ 2009-12-22 13:09       ` Goswin von Brederlow
  2009-12-22 19:12         ` [Caml-list] " Jon Harrop
  0 siblings, 1 reply; 57+ messages in thread
From: Goswin von Brederlow @ 2009-12-22 13:09 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop <jon@ffconsultancy.com> writes:

> Cilk pioneered wait-free work-stealing task deques and Microsoft's Task 
> Parallel Library (which will be part of .NET 4 in March 2010) copied the 
> idea. You have a separate deque of tasks for each core. A core tries to pop a 
> task off its deque. If there are no tasks on its deque then it tries to pop 
> off a randomly-chosen other deque. If there is a task then the core begins 
> executing that task and any child tasks are pushed back onto the local deque 
> (and might get stolen before they are executed).
>
> An important characteristic of this design is that child tasks are likely to 
> be executed on the same core as their parent and, therefore, locality between 
> related tasks is valuable. Consequently, many algorithms that recursively 
> subdivide a large dataset (like quicksort) attain superlinear speedups 
> because they make effective use of the sum of all L2 caches rather than just 
> a single L2 cache. So this works extremely well with the multi-level 
> hierarchical caches with interconnects that are the hallmark of multicore 
> architecture.
>
> You can then implement quicksort (and many other parallel divide and conquer 
> algorithms) something like this:
>
>   let rec serial_qsort cmp a i0 i3  =
>     if i3 - i0 > 1 then
>       let i1, i2 = partition cmp a i0 i3 in
>       serial_qsort i0 i1;
>       serial_qsort i2 i3
>
>   let rec parallel_qsort cmp a i0 i3  =
>     if i3 - i0 > 1 then
>       if i3 - i0 < threshold then
>         serial_qsort cmp a i0 i3
>       else
>         let i1, i2 = partition cmp a i0 i3 in
>         let future = invoke (parallel_qsort cmp i0) i1 in
>         parallel_qsort i2 i3
>         future()
>
> where the "invoke" combinator pushes a task onto the local task deque and 
> returns a closure that, when applied, waits for that task to complete and 
> returns its (unit) result. The "threshold" value protects against the 
> function spending more time spawning tasks that doing actual work.
>
> You can implement this in OCaml and F# but there are some important 
> differences:
>
> 1. The array "a" is just an ordinary array of any type of values on the shared 
> heap in F# but, for generality in OCaml, this must be both the underlying 
> ordinary data and a manually-managed shared big array of indices into the 
> ordinary data where the indices get sorted while the original data remain in 
> place until they are permuted at the end.

Unless you have a primitive type that isn't a pointer.

The advantage with ocaml though is that you never have pointers into a
structure. Makes thinks a lot simpler for the GC and avoids large
overheads in memory.

> 2. The sorted subarrays are contiguous in memory and, at some subdivision, 
> will fit into L2 cache. So F# offers optimal locality. In contrast, there is 
> no locality whatsoever in the OCaml code and most accesses into the unsorted 
> original array will incur cache misses right up to main memory. So the OCaml 
> approach does not scale as well and will never see superlinear speedup 
> because it cannot be cache friendly.

On the other hand swapping two elements in the array has a constant
cost no matter what size they have. At some size there will be a break
even point where copying the data costs more than the cache misses and
with increasing size the cache won't help F# so much either.

> 3. Child tasks are likely to be executed on the same core as their parent and 
> use a subset of their parent's data in F#, so they offer the best possible 
> locality. In contrast, child processes are likely to be executed on another 
> core in OCaml and offer the worst possible locality.

But, if I understood you right, you first fork one process per
core. Those should then also each pin themself to one core. Each
process then has a work queue which is works through. So they will
always use the local data. Only when a queue runs dry they steal from
another process and ruin locality.

So I don't see where your argument fits. You are not creating childs
on the fly. Only at the start and they run till all the work is done.
At least in this example.

> 4. Task deques can handle an arbitrary number of tasks limited only by memory 
> whereas processes are a scarce resource and forking is likely to fail, 
> whereupon the "invoke" combinator will simply execute sequentially. So it is 
> much easier to write reliable and performant code in F# than OCaml.

Why would you fork in invoke? That only adds a closure to the task
queue for the core.

> 5. OCaml's fork-based "invoke" combinator is many orders of magnitude slower 
> than pushing a closure onto a concurrent task deque in F#.
>
> 6. The "threshold" value is a magic number derived from measurements on a 
> given machine in my OCaml code but is dynamically adjusted in a 
> machine-independent way by the "invoke" combinator in my F# code using atomic 
> operations and real time profiling of the proportion of time spent spawning 
> tasks vs doing actual work.

5+6 seem to be an implementation detail of some specific
implementation you are talking about. I don't see anything in the
theory that would require that.

> The same basic principles apply to many algorithms. Although it can sometimes 
> be tricky to figure out how best to use this technology to parallelize a 
> given algorithm (e.g. successive over relaxation), I have found that a great 
> many algorithms can be parallelized effectively using this approach when you 
> have a suitable foundation in place (like the TPL). Moreover, the ability to 
> use ordinary constructs in F# instead of hacks like type-specific shared 
> memory big arrays in OCaml makes it a lot easier to parallelize programs. My 
> parallel Burrows-Wheeler Transform (BWT), for example, took 30 minutes to 
> develop in F# and 2 days in OCaml.

It might be true that ocaml lacks some primitives for multi-core
operations. But I would say that is mostly because so far it hasn't
supported multi-core. If F# has great support for this then that might
be a good place to steal some ideas from. But so far have heart
nothing that would make F# fundamentally more capable than ocaml could
become.

> Hopefully, HLVM will bring these kinds of benefits to OCaml, albeit in the 
> form of a DSL for high performance parallel programming.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is broken
  2009-12-22 12:04         ` Erik Rigtorp
  2009-12-22 12:27           ` Mihamina Rakotomandimby
@ 2009-12-22 13:27           ` Gerd Stolpmann
  2009-12-23 11:25             ` Erik Rigtorp
  1 sibling, 1 reply; 57+ messages in thread
From: Gerd Stolpmann @ 2009-12-22 13:27 UTC (permalink / raw)
  To: Erik Rigtorp; +Cc: yminsky, caml-list


Am Dienstag, den 22.12.2009, 13:04 +0100 schrieb Erik Rigtorp:
> On Mon, Dec 21, 2009 at 23:50, Erik Rigtorp <erik@rigtorp.com> wrote:
> > Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The
> > benchmarks are running on a cpuset with 1 core. I measure the time
> > from sending in one process until the other process receives the
> > message. So a context switch and the message passing is included in
> > the measurements.
> >
> > Max/Min/Avg
> > * Pipes: 28205/5973/6259
> > * Unix domain sockets: 44256/7748/8153
> > * SYSv message queues: 19197/5895/6173
> > * Posix message queues: 37399/10965/11303
> > * TCP on loopback: 29017/7471/7885
> >
> > So the latency is roughly 10µs for all these solutions. That latency
> > is pretty high and would be several times the processing time of the
> > message itself.
> 
> Some more benchmarks:
> 
> Max/Min/Avg
> * Spinlocking shm: 50897/403/761  (This one utilizes multiple cores,
> since one core is just burning while waiting for data)
> * Pthreads mutex shm: 27582/5246/6577
> 
> Forgot to say that all measurements are in nanoseconds.

That's for communication between processes, right? How would the picture
be different (especially comparing the latter two) if you do message
passing between threads? If I remember correctly, threads are more
light-weight in Solaris than processes. That could also affect context
switching times, and scheduler decisions.

Do you have source code? I could also run in on Linux, for comparison.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-22 19:12         ` [Caml-list] " Jon Harrop
@ 2009-12-22 18:02           ` Edgar Friendly
  2009-12-22 19:20             ` Jon Harrop
  2009-12-24 13:19           ` Goswin von Brederlow
  1 sibling, 1 reply; 57+ messages in thread
From: Edgar Friendly @ 2009-12-22 18:02 UTC (permalink / raw)
  To: caml-list

On 12/22/2009 01:12 PM, Jon Harrop wrote:
> On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote:
>    
>> The advantage with ocaml though is that you never have pointers into a
>> structure. Makes thinks a lot simpler for the GC and avoids large
>> overheads in memory.
>>      
> I don't understand what you mean by OCaml "never has pointers into a
> structure". Half the problem with OCaml is that OCaml almost always uses
> pointers and the programmer has no choice, e.g. for complex numbers.
>
>    
I think he means that ocaml structs (records, tuples) will only ever 
have pointers pointing to their beginning - you can't have a pointer to 
somewhere in the middle of a structure.

E


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-22 13:09       ` multicore wish Goswin von Brederlow
@ 2009-12-22 19:12         ` Jon Harrop
  2009-12-22 18:02           ` Edgar Friendly
  2009-12-24 13:19           ` Goswin von Brederlow
  0 siblings, 2 replies; 57+ messages in thread
From: Jon Harrop @ 2009-12-22 19:12 UTC (permalink / raw)
  To: caml-list

On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
> > 1. The array "a" is just an ordinary array of any type of values on the
> > shared heap in F# but, for generality in OCaml, this must be both the
> > underlying ordinary data and a manually-managed shared big array of
> > indices into the ordinary data where the indices get sorted while the
> > original data remain in place until they are permuted at the end.
>
> Unless you have a primitive type that isn't a pointer.

In OCaml, you would need to write a custom quicksort optimized for that 
particular type. In F#, the generic version just works and works efficiently.

> The advantage with ocaml though is that you never have pointers into a
> structure. Makes thinks a lot simpler for the GC and avoids large
> overheads in memory.

I don't understand what you mean by OCaml "never has pointers into a 
structure". Half the problem with OCaml is that OCaml almost always uses 
pointers and the programmer has no choice, e.g. for complex numbers.

> > 2. The sorted subarrays are contiguous in memory and, at some
> > subdivision, will fit into L2 cache. So F# offers optimal locality. In
> > contrast, there is no locality whatsoever in the OCaml code and most
> > accesses into the unsorted original array will incur cache misses right
> > up to main memory. So the OCaml approach does not scale as well and will
> > never see superlinear speedup because it cannot be cache friendly.
>
> On the other hand swapping two elements in the array has a constant
> cost no matter what size they have. At some size there will be a break
> even point where copying the data costs more than the cache misses and
> with increasing size the cache won't help F# so much either.

In theory, yes. In practice, that threshold is far larger than any value type 
that a real program would use so it is of no practical concern. Moreover, F# 
gives the programmer control over whether data are unboxed (value types) or 
boxed (reference types) anyway. In contrast, OCaml is tied to a few value 
types that are hard-coded into the GC.

> > 3. Child tasks are likely to be executed on the same core as their parent
> > and use a subset of their parent's data in F#, so they offer the best
> > possible locality. In contrast, child processes are likely to be executed
> > on another core in OCaml and offer the worst possible locality.
>
> But, if I understood you right, you first fork one process per
> core.

No, in OCaml I fork every child. That is the only transparent way to give the 
child a coherent view of the heap but it is extremely slow (~1ms):

  "F# can do 60MFLOPS of computation in the time it takes OCaml 
to fork a single process" -
http://caml.inria.fr/pub/ml-archives/caml-list/2009/06/542b8bed77022b4a4824de2da5b7f714.en.html

> Those should then also each pin themself to one core.

You have no idea which core a forked child process will run on.

> Each process then has a work queue which is works through. So they will 
> always use the local data. Only when a queue runs dry they steal from
> another process and ruin locality.

You are correctly describing the efficient solution based upon work-stealing 
queues that F# uses but OCaml cannot express it.

> So I don't see where your argument fits. You are not creating childs
> on the fly. Only at the start and they run till all the work is done.
> At least in this example.

No, every recursive invocation of the parallel quicksort spawns another child 
on-the-fly. That's precisely why it parallelizes so efficiently when you have 
wait-free work-stealing task deques and a shared heap. In general, you 
rewrite algorithms into this recursive divide and conquer form and 
parallelize when possible. You can parallelize a *lot* of problems 
efficiently that way.

> > 4. Task deques can handle an arbitrary number of tasks limited only by
> > memory whereas processes are a scarce resource and forking is likely to
> > fail, whereupon the "invoke" combinator will simply execute sequentially.
> > So it is much easier to write reliable and performant code in F# than
> > OCaml.
>
> Why would you fork in invoke?

Fork is currently the only transparent way to implement "invoke" but it is 
extremely slow and unreliable.

> > 5. OCaml's fork-based "invoke" combinator is many orders of magnitude
> > slower than pushing a closure onto a concurrent task deque in F#.
> >
> > 6. The "threshold" value is a magic number derived from measurements on a
> > given machine in my OCaml code but is dynamically adjusted in a
> > machine-independent way by the "invoke" combinator in my F# code using
> > atomic operations and real time profiling of the proportion of time spent
> > spawning tasks vs doing actual work.
>
> 5+6 seem to be an implementation detail of some specific
> implementation you are talking about.

Yes. I'm talking about today's OCaml and F# implementations.

> I don't see anything in the theory that would require that.

If by "in theory" you mean that a new performant concurrent GC for OCaml would 
solve these problems then yes. But I doubt OCaml is ever going to get one.

> > The same basic principles apply to many algorithms. Although it can
> > sometimes be tricky to figure out how best to use this technology to
> > parallelize a given algorithm (e.g. successive over relaxation), I have
> > found that a great many algorithms can be parallelized effectively using
> > this approach when you have a suitable foundation in place (like the
> > TPL). Moreover, the ability to use ordinary constructs in F# instead of
> > hacks like type-specific shared memory big arrays in OCaml makes it a lot
> > easier to parallelize programs. My parallel Burrows-Wheeler Transform
> > (BWT), for example, took 30 minutes to develop in F# and 2 days in OCaml.
>
> It might be true that ocaml lacks some primitives for multi-core
> operations. But I would say that is mostly because so far it hasn't
> supported multi-core.

Well, yes. :-)

> If F# has great support for this then that might be a good place to steal
> some ideas from. 

The infrastructure for this kind of shared memory parallel programming is 
really very simple. You just need a GC that can handle a shared heap (which 
HLVM already has) and work-stealing task deques. Then you can easily write 
parallel programs that leverage multicores and run a *lot* faster than 
anything that can be written in OCaml. You can even make this accessible to 
OCaml programmers as a DSL with automatic interop to make high performance 
parallel programming as easy as possible from OCaml.

> But so far have heart nothing that would make F# fundamentally more capable
> than ocaml could become.

In theory, OCaml could catch up with F#. In practice, the core of OCaml's 
implementation is so heavily optimized in another direction (and that will 
not change because it is OCaml's raison d'être) that it is worth starting 
from scratch. Modern libraries like LLVM make this comparatively painless and 
the improvements are vast. And besides, it is great fun! :-)

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-22 18:02           ` Edgar Friendly
@ 2009-12-22 19:20             ` Jon Harrop
  2009-12-24 12:58               ` Goswin von Brederlow
  0 siblings, 1 reply; 57+ messages in thread
From: Jon Harrop @ 2009-12-22 19:20 UTC (permalink / raw)
  To: caml-list

On Tuesday 22 December 2009 18:02:32 Edgar Friendly wrote:
> On 12/22/2009 01:12 PM, Jon Harrop wrote:
> > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote:
> >> The advantage with ocaml though is that you never have pointers into a
> >> structure. Makes thinks a lot simpler for the GC and avoids large
> >> overheads in memory.
> >
> > I don't understand what you mean by OCaml "never has pointers into a
> > structure". Half the problem with OCaml is that OCaml almost always uses
> > pointers and the programmer has no choice, e.g. for complex numbers.
>
> I think he means that ocaml structs (records, tuples) will only ever
> have pointers pointing to their beginning - you can't have a pointer to
> somewhere in the middle of a structure.

If so then I do not understand the relevance. You cannot have pointers into a 
structure in F# or HLVM either...

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is broken
  2009-12-22 13:27           ` Gerd Stolpmann
@ 2009-12-23 11:25             ` Erik Rigtorp
  0 siblings, 0 replies; 57+ messages in thread
From: Erik Rigtorp @ 2009-12-23 11:25 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: yminsky, caml-list

On Tue, Dec 22, 2009 at 14:27, Gerd Stolpmann <gerd@gerd-stolpmann.de> wrote:
>
> Am Dienstag, den 22.12.2009, 13:04 +0100 schrieb Erik Rigtorp:
>> On Mon, Dec 21, 2009 at 23:50, Erik Rigtorp <erik@rigtorp.com> wrote:
>> > Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The
>> > benchmarks are running on a cpuset with 1 core. I measure the time
>> > from sending in one process until the other process receives the
>> > message. So a context switch and the message passing is included in
>> > the measurements.
>> >
>> > Max/Min/Avg
>> > * Pipes: 28205/5973/6259
>> > * Unix domain sockets: 44256/7748/8153
>> > * SYSv message queues: 19197/5895/6173
>> > * Posix message queues: 37399/10965/11303
>> > * TCP on loopback: 29017/7471/7885
>> >
>> > So the latency is roughly 10µs for all these solutions. That latency
>> > is pretty high and would be several times the processing time of the
>> > message itself.
>>
>> Some more benchmarks:
>>
>> Max/Min/Avg
>> * Spinlocking shm: 50897/403/761  (This one utilizes multiple cores,
>> since one core is just burning while waiting for data)
>> * Pthreads mutex shm: 27582/5246/6577
>>
>> Forgot to say that all measurements are in nanoseconds.
>
> That's for communication between processes, right? How would the picture
> be different (especially comparing the latter two) if you do message
> passing between threads? If I remember correctly, threads are more
> light-weight in Solaris than processes. That could also affect context
> switching times, and scheduler decisions.

With a system supporting green threads/tasklets/erlang processes over
multiple cores you can have 1µs message passing latencies without busy
waiting. I'll checkout the thread message passing too, but probably
not until after new years.

> Do you have source code? I could also run in on Linux, for comparison.

I'll have that approved by my company first. It would actually be
interesting to create a open source multiplatform IPC message passing
benchmark.

Erik


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-22 19:20             ` Jon Harrop
@ 2009-12-24 12:58               ` Goswin von Brederlow
  2009-12-24 16:51                 ` Jon Harrop
  0 siblings, 1 reply; 57+ messages in thread
From: Goswin von Brederlow @ 2009-12-24 12:58 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop <jon@ffconsultancy.com> writes:

> On Tuesday 22 December 2009 18:02:32 Edgar Friendly wrote:
>> On 12/22/2009 01:12 PM, Jon Harrop wrote:
>> > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote:
>> >> The advantage with ocaml though is that you never have pointers into a
>> >> structure. Makes thinks a lot simpler for the GC and avoids large
>> >> overheads in memory.
>> >
>> > I don't understand what you mean by OCaml "never has pointers into a
>> > structure". Half the problem with OCaml is that OCaml almost always uses
>> > pointers and the programmer has no choice, e.g. for complex numbers.
>>
>> I think he means that ocaml structs (records, tuples) will only ever
>> have pointers pointing to their beginning - you can't have a pointer to
>> somewhere in the middle of a structure.

Exactly.

> If so then I do not understand the relevance. You cannot have pointers into a 
> structure in F# or HLVM either...

If you have an array a of (int * int) then in ocaml a is an array of
pointers to tuples. a.(5) is a pointer to the 6th tuple.

You said that in F# the array will be really an array of tuples. Then
a.(5) will be a pointer into the array at the position where the 6th
tuple is. It does not point to the begining of an allocated block but
to the middle of one.

That means as long as a.(5) is reachable the full array has to remain
allocated or the GC has to recognise that only one (a few) items of an
array are reachable and copy them before freeing the array. The GC
also needs a way to find the begining of an allocated block from a
pointer into the block. Which means extra overhead in both memory and
time.


Another think is that tuples are immutable but arrays are mutable. In
ocaml you get this nice behaviour:

# let a = Array.init 5 (fun x -> (x,x));;
val a : (int * int) array = [|(0, 0); (1, 1); (2, 2); (3, 3); (4, 4)|]
# let x = a.(1);;
val x : int * int = (1, 1)
# a.(1) <- (2,2);;
- : unit = ()
# x;;
- : int * int = (1, 1)

In F# you would get (2,2) for your fast sortable array. A different
semantic.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-22 19:12         ` [Caml-list] " Jon Harrop
  2009-12-22 18:02           ` Edgar Friendly
@ 2009-12-24 13:19           ` Goswin von Brederlow
  2009-12-24 17:06             ` Jon Harrop
  1 sibling, 1 reply; 57+ messages in thread
From: Goswin von Brederlow @ 2009-12-24 13:19 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop <jon@ffconsultancy.com> writes:

> On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote:
>> Jon Harrop <jon@ffconsultancy.com> writes:
>> > 1. The array "a" is just an ordinary array of any type of values on the
>> > shared heap in F# but, for generality in OCaml, this must be both the
>> > underlying ordinary data and a manually-managed shared big array of
>> > indices into the ordinary data where the indices get sorted while the
>> > original data remain in place until they are permuted at the end.
>>
>> Unless you have a primitive type that isn't a pointer.
>
> In OCaml, you would need to write a custom quicksort optimized for that 
> particular type. In F#, the generic version just works and works efficiently.
>
>> The advantage with ocaml though is that you never have pointers into a
>> structure. Makes thinks a lot simpler for the GC and avoids large
>> overheads in memory.
>
> I don't understand what you mean by OCaml "never has pointers into a 
> structure". Half the problem with OCaml is that OCaml almost always uses 
> pointers and the programmer has no choice, e.g. for complex numbers.
>
>> > 2. The sorted subarrays are contiguous in memory and, at some
>> > subdivision, will fit into L2 cache. So F# offers optimal locality. In
>> > contrast, there is no locality whatsoever in the OCaml code and most
>> > accesses into the unsorted original array will incur cache misses right
>> > up to main memory. So the OCaml approach does not scale as well and will
>> > never see superlinear speedup because it cannot be cache friendly.
>>
>> On the other hand swapping two elements in the array has a constant
>> cost no matter what size they have. At some size there will be a break
>> even point where copying the data costs more than the cache misses and
>> with increasing size the cache won't help F# so much either.
>
> In theory, yes. In practice, that threshold is far larger than any value type 
> that a real program would use so it is of no practical concern. Moreover, F# 
> gives the programmer control over whether data are unboxed (value types) or 
> boxed (reference types) anyway. In contrast, OCaml is tied to a few value 
> types that are hard-coded into the GC.
>
>> > 3. Child tasks are likely to be executed on the same core as their parent
>> > and use a subset of their parent's data in F#, so they offer the best
>> > possible locality. In contrast, child processes are likely to be executed
>> > on another core in OCaml and offer the worst possible locality.
>>
>> But, if I understood you right, you first fork one process per
>> core.
>
> No, in OCaml I fork every child. That is the only transparent way to give the 
> child a coherent view of the heap but it is extremely slow (~1ms):

So if you add a (sleep 60) to the ocaml code then ocaml gets even
more worse than F#? You are comparing an implementation that is
specifically optimized to use things F# is good at and ocaml is
bad. To give a fair comparison you need to optimize the implementation
to the language.

>   "F# can do 60MFLOPS of computation in the time it takes OCaml 
> to fork a single process" -
> http://caml.inria.fr/pub/ml-archives/caml-list/2009/06/542b8bed77022b4a4824de2da5b7f714.en.html
>
>> Those should then also each pin themself to one core.
>
> You have no idea which core a forked child process will run on.

I can pin each child to one core and the scheduler will move it there.
I would expect a multi-core ocaml to actualy do this internaly already
so there is on GC threads per core ocaml runs on. But that depends on
the GC implementation that will be used.

>> Each process then has a work queue which is works through. So they will 
>> always use the local data. Only when a queue runs dry they steal from
>> another process and ruin locality.
>
> You are correctly describing the efficient solution based upon work-stealing 
> queues that F# uses but OCaml cannot express it.

You mean that you didn't implement that way. I can easily express that
with closures and pre-forked worker threads.

>> So I don't see where your argument fits. You are not creating childs
>> on the fly. Only at the start and they run till all the work is done.
>> At least in this example.
>
> No, every recursive invocation of the parallel quicksort spawns another child 
> on-the-fly. That's precisely why it parallelizes so efficiently when you have 
> wait-free work-stealing task deques and a shared heap. In general, you 
> rewrite algorithms into this recursive divide and conquer form and 
> parallelize when possible. You can parallelize a *lot* of problems 
> efficiently that way.

That seems awfully ineficient. Then you have a million children
running on maybe 4 cores and each job queue holds no job.

I think we mean different things by children. By children I mean what
the kernel sees as children. Newly forked threads. I think you mean
jobs that get put into the queues. There is no reason to fork a new
system thread every time the parallel quicksort splits an array in
two.

>> > 4. Task deques can handle an arbitrary number of tasks limited only by
>> > memory whereas processes are a scarce resource and forking is likely to
>> > fail, whereupon the "invoke" combinator will simply execute sequentially.
>> > So it is much easier to write reliable and performant code in F# than
>> > OCaml.
>>
>> Why would you fork in invoke?
>
> Fork is currently the only transparent way to implement "invoke" but it is 
> extremely slow and unreliable.

No, it is the insane way. Use closures.

>> > 5. OCaml's fork-based "invoke" combinator is many orders of magnitude
>> > slower than pushing a closure onto a concurrent task deque in F#.
>> >
>> > 6. The "threshold" value is a magic number derived from measurements on a
>> > given machine in my OCaml code but is dynamically adjusted in a
>> > machine-independent way by the "invoke" combinator in my F# code using
>> > atomic operations and real time profiling of the proportion of time spent
>> > spawning tasks vs doing actual work.
>>
>> 5+6 seem to be an implementation detail of some specific
>> implementation you are talking about.
>
> Yes. I'm talking about today's OCaml and F# implementations.
>
>> I don't see anything in the theory that would require that.
>
> If by "in theory" you mean that a new performant concurrent GC for OCaml would 
> solve these problems then yes. But I doubt OCaml is ever going to get one.
>
>> > The same basic principles apply to many algorithms. Although it can
>> > sometimes be tricky to figure out how best to use this technology to
>> > parallelize a given algorithm (e.g. successive over relaxation), I have
>> > found that a great many algorithms can be parallelized effectively using
>> > this approach when you have a suitable foundation in place (like the
>> > TPL). Moreover, the ability to use ordinary constructs in F# instead of
>> > hacks like type-specific shared memory big arrays in OCaml makes it a lot
>> > easier to parallelize programs. My parallel Burrows-Wheeler Transform
>> > (BWT), for example, took 30 minutes to develop in F# and 2 days in OCaml.
>>
>> It might be true that ocaml lacks some primitives for multi-core
>> operations. But I would say that is mostly because so far it hasn't
>> supported multi-core.
>
> Well, yes. :-)
>
>> If F# has great support for this then that might be a good place to steal
>> some ideas from. 
>
> The infrastructure for this kind of shared memory parallel programming is 
> really very simple. You just need a GC that can handle a shared heap (which 

Which there is one example so far for ocaml. Maybe not a verry good
one. Last simple benchmarking I remember it was like 3 times slower
than mono-core.

> HLVM already has) and work-stealing task deques. Then you can easily write 

Task deques you can implement given a multi-core GC, worker threads
and using closures.

> parallel programs that leverage multicores and run a *lot* faster than 
> anything that can be written in OCaml. You can even make this accessible to 
> OCaml programmers as a DSL with automatic interop to make high performance 
> parallel programming as easy as possible from OCaml.

Or just implement it properly in ocaml. The problem part is the
GC. The rest is easy.

>> But so far have heart nothing that would make F# fundamentally more capable
>> than ocaml could become.
>
> In theory, OCaml could catch up with F#. In practice, the core of OCaml's 
> implementation is so heavily optimized in another direction (and that will 
> not change because it is OCaml's raison d'être) that it is worth starting 
> from scratch. Modern libraries like LLVM make this comparatively painless and 
> the improvements are vast. And besides, it is great fun! :-)

I think there is an ocaml frontend for LLVM. So someone already did.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-24 12:58               ` Goswin von Brederlow
@ 2009-12-24 16:51                 ` Jon Harrop
  0 siblings, 0 replies; 57+ messages in thread
From: Jon Harrop @ 2009-12-24 16:51 UTC (permalink / raw)
  To: caml-list

On Thursday 24 December 2009 12:58:18 Goswin von Brederlow wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
> > On Tuesday 22 December 2009 18:02:32 Edgar Friendly wrote:
> >> On 12/22/2009 01:12 PM, Jon Harrop wrote:
> >> > On Tuesday 22 December 2009 13:09:27 Goswin von Brederlow wrote:
> >> >> The advantage with ocaml though is that you never have pointers into
> >> >> a structure. Makes thinks a lot simpler for the GC and avoids large
> >> >> overheads in memory.
> >> >
> >> > I don't understand what you mean by OCaml "never has pointers into a
> >> > structure". Half the problem with OCaml is that OCaml almost always
> >> > uses pointers and the programmer has no choice, e.g. for complex
> >> > numbers.
> >>
> >> I think he means that ocaml structs (records, tuples) will only ever
> >> have pointers pointing to their beginning - you can't have a pointer to
> >> somewhere in the middle of a structure.
>
> Exactly.
>
> > If so then I do not understand the relevance. You cannot have pointers
> > into a structure in F# or HLVM either...
>
> If you have an array a of (int * int) then in ocaml a is an array of
> pointers to tuples. a.(5) is a pointer to the 6th tuple.

Yes.

> You said that in F# the array will be really an array of tuples. Then
> a.(5) will be a pointer into the array at the position where the 6th
> tuple is.

No. The expression a.(5) loads all fields of the value type. For example, if 
it were an array of complex number then a.(5) would load the real and 
imaginary parts.

> That means as long as a.(5) is reachable the full array has to remain
> allocated or the GC has to recognise that only one (a few) items of an
> array are reachable and copy them before freeing the array. The GC
> also needs a way to find the begining of an allocated block from a
> pointer into the block. Which means extra overhead in both memory and
> time.
>
> Another think is that tuples are immutable but arrays are mutable. In
> ocaml you get this nice behaviour:
>
> # let a = Array.init 5 (fun x -> (x,x));;
> val a : (int * int) array = [|(0, 0); (1, 1); (2, 2); (3, 3); (4, 4)|]
> # let x = a.(1);;
> val x : int * int = (1, 1)
> # a.(1) <- (2,2);;
> - : unit = ()
> # x;;
> - : int * int = (1, 1)
>
> In F# you would get (2,2) for your fast sortable array. A different
> semantic.

No. Even if F# represented tuples as structs (which it does not, 
unfortunately) you would still get the same result as you do in OCaml.

HLVM already implements tuples as structs and your example works exactly as 
the OCaml does:

# let xs = create(6, (1, 1)) in let x = xs.(1) in xs.(1) <- (2, 2); x;;
- : `Struct[`Int; `Int] = (1, 1)

The only difference is that HLVM unboxes the pairs in the array so the heap 
contains only a single pointer and accessing a pair requires only one 
indirection.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-24 13:19           ` Goswin von Brederlow
@ 2009-12-24 17:06             ` Jon Harrop
  2009-12-27 12:45               ` Goswin von Brederlow
  0 siblings, 1 reply; 57+ messages in thread
From: Jon Harrop @ 2009-12-24 17:06 UTC (permalink / raw)
  To: caml-list

On Thursday 24 December 2009 13:19:52 Goswin von Brederlow wrote:
> Jon Harrop <jon@ffconsultancy.com> writes:
> > No, in OCaml I fork every child. That is the only transparent way to give
> > the child a coherent view of the heap but it is extremely slow (~1ms):
>
> So if you add a (sleep 60) to the ocaml code then ocaml gets even
> more worse than F#? You are comparing an implementation that is
> specifically optimized to use things F# is good at and ocaml is
> bad. To give a fair comparison you need to optimize the implementation 
> to the language.

Post a better OCaml solution if you can.

> >> Each process then has a work queue which is works through. So they will
> >> always use the local data. Only when a queue runs dry they steal from
> >> another process and ruin locality.
> >
> > You are correctly describing the efficient solution based upon
> > work-stealing queues that F# uses but OCaml cannot express it.
>
> You mean that you didn't implement that way.

No, I mean OCaml cannot express it.

> I can easily express that with closures and pre-forked worker threads.

OCaml's threads do not run in parallel so that will not work.

> >> So I don't see where your argument fits. You are not creating childs
> >> on the fly. Only at the start and they run till all the work is done.
> >> At least in this example.
> >
> > No, every recursive invocation of the parallel quicksort spawns another
> > child on-the-fly. That's precisely why it parallelizes so efficiently
> > when you have wait-free work-stealing task deques and a shared heap. In
> > general, you rewrite algorithms into this recursive divide and conquer
> > form and parallelize when possible. You can parallelize a *lot* of
> > problems efficiently that way.
>
> That seems awfully ineficient. Then you have a million children
> running on maybe 4 cores and each job queue holds no job.
>
> I think we mean different things by children. By children I mean what
> the kernel sees as children. Newly forked threads. I think you mean 
> jobs that get put into the queues. There is no reason to fork a new
> system thread every time the parallel quicksort splits an array in
> two.

The children are lightweight tasks, not threads or processes.

> >> Why would you fork in invoke?
> >
> > Fork is currently the only transparent way to implement "invoke" but it
> > is extremely slow and unreliable.
>
> No, it is the insane way. Use closures.

Please demonstrate.

> > parallel programs that leverage multicores and run a *lot* faster than
> > anything that can be written in OCaml. You can even make this accessible
> > to OCaml programmers as a DSL with automatic interop to make high
> > performance parallel programming as easy as possible from OCaml.
>
> Or just implement it properly in ocaml. The problem part is the
> GC. The rest is easy.

No kidding.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: OCaml is  broken
  2009-12-21  1:08         ` Gerd Stolpmann
  2009-12-21  4:30           ` Jon Harrop
@ 2009-12-26 17:08           ` orbitz
  1 sibling, 0 replies; 57+ messages in thread
From: orbitz @ 2009-12-26 17:08 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: Jon Harrop, caml-list


On Dec 20, 2009, at 8:08 PM, Gerd Stolpmann wrote:

>> The following web page describes a commercial machine sold by Azul  
>> Systems
>> that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory in  
>> a flat
>> SMP configuration:
>>
>>  http://www.azulsystems.com/products/compute_appliance.htm
>>
>> As you can see, a GC with shared memory can already scale across  
>> dozens of
>> cores and memory access is no more heterogeneous than it was 20  
>> years ago.
>> Also, note that homogeneous memory access is a red herring in this  
>> context
>> because it does not undermine the utility of a shared heap on a  
>> multicore.
>
> The benchmarks they mention can all easily be parallelized - that  
> stuff
> you can also do with multi-processing. The interesting thing would  
> be an
> inherent parallel algorithm where the same memory region is accessed  
> by
> multiple threads. Or at least a numeric program (your examples seem to
> be mostly from that area).

I'm not sure if it is relevant here, but it should be noted that a lot  
of the performance gains Azul gets is because they have built their  
own chips that do a lot of tricks for you under the hood.  Last I used  
an Azul Appliance, they perform quite poorly if you are hitting the  
same memory often from multiple threads (the machine I used was about  
4x slower than an equivalent Intel machine for a single core).  If the  
Azul tricks make it into desktop processors, that would likely be  
pretty great.

Also, for what it's worth, lots of cores have actually been less  
performant in the type of computing I currently do.  We want less  
cores and more physical boxes, making multiple processes running  
single threads a better solution for us.  We tend to become memory IO  
bound by multiple cores (the bus cannot keep up with us).  We are  
processing lots of biological data.  For the record we are not using  
Ocaml for our project, just an observation of what model works well  
for us.


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-24 17:06             ` Jon Harrop
@ 2009-12-27 12:45               ` Goswin von Brederlow
  2009-12-27 16:37                 ` Jon Harrop
  2009-12-28 12:28                 ` Gerd Stolpmann
  0 siblings, 2 replies; 57+ messages in thread
From: Goswin von Brederlow @ 2009-12-27 12:45 UTC (permalink / raw)
  To: Jon Harrop; +Cc: caml-list

Jon Harrop <jon@ffconsultancy.com> writes:

> On Thursday 24 December 2009 13:19:52 Goswin von Brederlow wrote:
>> Jon Harrop <jon@ffconsultancy.com> writes:
>> > No, in OCaml I fork every child. That is the only transparent way to give
>> > the child a coherent view of the heap but it is extremely slow (~1ms):
>>
>> So if you add a (sleep 60) to the ocaml code then ocaml gets even
>> more worse than F#? You are comparing an implementation that is
>> specifically optimized to use things F# is good at and ocaml is
>> bad. To give a fair comparison you need to optimize the implementation 
>> to the language.
>
> Post a better OCaml solution if you can.
>
>> >> Each process then has a work queue which is works through. So they will
>> >> always use the local data. Only when a queue runs dry they steal from
>> >> another process and ruin locality.
>> >
>> > You are correctly describing the efficient solution based upon
>> > work-stealing queues that F# uses but OCaml cannot express it.
>>
>> You mean that you didn't implement that way.
>
> No, I mean OCaml cannot express it.
>
>> I can easily express that with closures and pre-forked worker threads.
>
> OCaml's threads do not run in parallel so that will not work.

>> Or just implement it properly in ocaml. The problem part is the
>> GC. The rest is easy.
>
> No kidding.

There is one implementation: http://www.algo-prog.info/ocmc/web/
But as said maybe not a verry good one.

I tried implementing parallel threads under the original GC by forking
multiple instances of the same program and using a Bigarray to mmap
/dev/null for shared memory between the instances. That works for
sharing primitive types, flat records (records of primitive types) and
even fixed (or bound) sized structures but it gets more and more
complex to share each and needs some Obj magic, marshaling or C stubs
(except for primitive types). It works but is a real hack.

I hope someone will pick up the pices and update OCaml4Multicore to
the latest ocaml or maybe for ocaml to add it directly. If not then I
fear ocaml will be left behind soon.

MfG
        Goswin


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-27 12:45               ` Goswin von Brederlow
@ 2009-12-27 16:37                 ` Jon Harrop
  2009-12-28 12:28                 ` Gerd Stolpmann
  1 sibling, 0 replies; 57+ messages in thread
From: Jon Harrop @ 2009-12-27 16:37 UTC (permalink / raw)
  To: caml-list

On Sunday 27 December 2009 12:45:53 Goswin von Brederlow wrote:
> There is one implementation: http://www.algo-prog.info/ocmc/web/
> But as said maybe not a very good one.
>
> I tried implementing parallel threads under the original GC by forking
> multiple instances of the same program and using a Bigarray to mmap
> /dev/null for shared memory between the instances. That works for
> sharing primitive types, flat records (records of primitive types) and
> even fixed (or bound) sized structures but it gets more and more
> complex to share each and needs some Obj magic, marshaling or C stubs
> (except for primitive types). It works but is a real hack.

Once you've conceded to manual memory management (mmap of shared bigarrays) 
and low-level programming (no polymorphism, Obj.magic) you've lost the main 
advantages of OCaml and you still cannot get great performance.

> I hope someone will pick up the pices and update OCaml4Multicore to
> the latest ocaml or maybe for ocaml to add it directly. If not then I
> fear ocaml will be left behind soon.

Building upon OCaml rather than starting from scratch makes it vastly more 
difficult to implement a useful parallel GC not just because the GC and code 
gen must work in harmony together but because OCaml has been so heavily 
optimized in the wrong direction for this (e.g. a bit twiddled uniform 
representation that burdens the GC with everything from complex numbers to 
pairs).

That's why I think the best solution is to start from scratch and build a 
completely separate VM bred for shared-memory parallelism. Indeed, HLVM 
already outperforms OC4MC even though I have put a fraction of the effort 
into it and built a lot more surrounding infrastructure. For example, my 
latest GC only took 5 days to write.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-27 12:45               ` Goswin von Brederlow
  2009-12-27 16:37                 ` Jon Harrop
@ 2009-12-28 12:28                 ` Gerd Stolpmann
  2009-12-28 15:07                   ` Anil Madhavapeddy
  2009-12-28 18:05                   ` Xavier Leroy
  1 sibling, 2 replies; 57+ messages in thread
From: Gerd Stolpmann @ 2009-12-28 12:28 UTC (permalink / raw)
  To: Goswin von Brederlow; +Cc: Jon Harrop, caml-list


Am Sonntag, den 27.12.2009, 13:45 +0100 schrieb Goswin von Brederlow:
> Jon Harrop <jon@ffconsultancy.com> writes:
> 
> > On Thursday 24 December 2009 13:19:52 Goswin von Brederlow wrote:
> >> Jon Harrop <jon@ffconsultancy.com> writes:
> >> > No, in OCaml I fork every child. That is the only transparent way to give
> >> > the child a coherent view of the heap but it is extremely slow (~1ms):
> >>
> >> So if you add a (sleep 60) to the ocaml code then ocaml gets even
> >> more worse than F#? You are comparing an implementation that is
> >> specifically optimized to use things F# is good at and ocaml is
> >> bad. To give a fair comparison you need to optimize the implementation 
> >> to the language.
> >
> > Post a better OCaml solution if you can.
> >
> >> >> Each process then has a work queue which is works through. So they will
> >> >> always use the local data. Only when a queue runs dry they steal from
> >> >> another process and ruin locality.
> >> >
> >> > You are correctly describing the efficient solution based upon
> >> > work-stealing queues that F# uses but OCaml cannot express it.
> >>
> >> You mean that you didn't implement that way.
> >
> > No, I mean OCaml cannot express it.
> >
> >> I can easily express that with closures and pre-forked worker threads.
> >
> > OCaml's threads do not run in parallel so that will not work.
> 
> >> Or just implement it properly in ocaml. The problem part is the
> >> GC. The rest is easy.
> >
> > No kidding.
> 
> There is one implementation: http://www.algo-prog.info/ocmc/web/
> But as said maybe not a verry good one.
> 
> I tried implementing parallel threads under the original GC by forking
> multiple instances of the same program and using a Bigarray to mmap
> /dev/null for shared memory between the instances. That works for
> sharing primitive types, flat records (records of primitive types) and
> even fixed (or bound) sized structures but it gets more and more
> complex to share each and needs some Obj magic, marshaling or C stubs
> (except for primitive types). It works but is a real hack.

It works with all types:

https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli

look for init_value. It's non-released code yet.

However, there are some problems: Values outside the heap do not support
the polymorphic comparison and hash functions. That's a hard limitation,
e.g. you cannot even compare two strings, or build a hash table with
strings as keys. That limits the usefulness of shared memory.

Also, as we are striving for safe programs, there is the question when
shared memory can be safely released. The GC could help here, but there
are no ways to hook in, e.g. to get notified when there are no pointers
anymore to a value living in shared memory.

Gerd

> 
> I hope someone will pick up the pices and update OCaml4Multicore to
> the latest ocaml or maybe for ocaml to add it directly. If not then I
> fear ocaml will be left behind soon.
> 
> MfG
>         Goswin
> 
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
> 
-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-28 12:28                 ` Gerd Stolpmann
@ 2009-12-28 15:07                   ` Anil Madhavapeddy
  2009-12-28 18:05                   ` Xavier Leroy
  1 sibling, 0 replies; 57+ messages in thread
From: Anil Madhavapeddy @ 2009-12-28 15:07 UTC (permalink / raw)
  To: Gerd Stolpmann, Thomas Gazagnaire
  Cc: Goswin von Brederlow, Jon Harrop, caml-list

On 28 Dec 2009, at 12:28, Gerd Stolpmann wrote:

> However, there are some problems: Values outside the heap do not support
> the polymorphic comparison and hash functions. That's a hard limitation,
> e.g. you cannot even compare two strings, or build a hash table with
> strings as keys. That limits the usefulness of shared memory.

Camlp4 may help here; Thomas Gazagnaire and I have been working on a language-integrated ORM, and it has a reliable type-conv hash generator library [1].

It works with mutable records as well as immutable ones (the main limitation of the built-in polymorphic hash function) but it could be used to generate explicit comparison functions for shared memory as well.  We need to split out the various support libraries in the ORM separately at some point anyway, so if the hash generator is of any use I'll cut a release.

[1] http://github.com/avsm/ocaml-orm-sqlite/tree/master/hash/

-anill

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-28 12:28                 ` Gerd Stolpmann
  2009-12-28 15:07                   ` Anil Madhavapeddy
@ 2009-12-28 18:05                   ` Xavier Leroy
  2009-12-29 16:44                     ` Gerd Stolpmann
  1 sibling, 1 reply; 57+ messages in thread
From: Xavier Leroy @ 2009-12-28 18:05 UTC (permalink / raw)
  To: Gerd Stolpmann; +Cc: caml-list

Gerd Stolpmann wrote:

> It works with all types:
> 
> https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli
> 
> look for init_value. It's non-released code yet.
> 
> However, there are some problems: Values outside the heap do not support
> the polymorphic comparison and hash functions. That's a hard limitation,
> e.g. you cannot even compare two strings, or build a hash table with
> strings as keys. That limits the usefulness of shared memory.

In OCaml 3.11 and later, you can call

   caml_page_table_add(In_static_area, start, end)

to inform the run-time system that the address range [start, end)
contains well-formed Caml data that polymorphic primitives can safely
work on.  This should solve your problem.

- Xavier Leroy


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] Re: OCaml is broken
  2009-12-21 22:50       ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
  2009-12-22 12:04         ` Erik Rigtorp
@ 2009-12-29 12:07         ` Richard Jones
  1 sibling, 0 replies; 57+ messages in thread
From: Richard Jones @ 2009-12-29 12:07 UTC (permalink / raw)
  To: Erik Rigtorp; +Cc: yminsky, caml-list

On Mon, Dec 21, 2009 at 11:50:36PM +0100, Erik Rigtorp wrote:
> Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The
> benchmarks are running on a cpuset with 1 core. I measure the time
> from sending in one process until the other process receives the
> message. So a context switch and the message passing is included in
> the measurements.
> 
> Max/Min/Avg
> * Pipes: 28205/5973/6259
> * Unix domain sockets: 44256/7748/8153
> * SYSv message queues: 19197/5895/6173
> * Posix message queues: 37399/10965/11303
> * TCP on loopback: 29017/7471/7885
> 
> So the latency is roughly 10µs for all these solutions. That latency
> is pretty high and would be several times the processing time of the
> message itself.

Are you pinning processes?  Without pinning and understanding the
corresponding physical architecture of the machine, such tests are
pretty much useless.

Also - Solaris 10 ...?  That boat left a long time ago.  You should
really be thinking about migrating to modern operating systems run by
a company with a future.

Rich.

-- 
Richard Jones
Red Hat


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: multicore wish
  2009-12-28 18:05                   ` Xavier Leroy
@ 2009-12-29 16:44                     ` Gerd Stolpmann
  0 siblings, 0 replies; 57+ messages in thread
From: Gerd Stolpmann @ 2009-12-29 16:44 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list


Am Montag, den 28.12.2009, 19:05 +0100 schrieb Xavier Leroy:
> Gerd Stolpmann wrote:
> 
> > It works with all types:
> > 
> > https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli
> > 
> > look for init_value. It's non-released code yet.
> > 
> > However, there are some problems: Values outside the heap do not support
> > the polymorphic comparison and hash functions. That's a hard limitation,
> > e.g. you cannot even compare two strings, or build a hash table with
> > strings as keys. That limits the usefulness of shared memory.
> 
> In OCaml 3.11 and later, you can call
> 
>    caml_page_table_add(In_static_area, start, end)
> 
> to inform the run-time system that the address range [start, end)
> contains well-formed Caml data that polymorphic primitives can safely
> work on.  This should solve your problem.

Yes, in deed! Very nice.

As there are still unused bits in the page table... I'm thinking about
the following. One bit could be used by the GC to indicate that values
somewhere in the page are still referenced from heap memory. At the
beginning of the mark phase the bit is cleared. Whenever a pointer is
seen that goes to a page marked as In_static_area, the bit for the area
is set. As before, the mark phase ignores these pointers otherwise (they
are not followed). After the mark phase, one could check which of the
pages are still used, and this would make it possible to call finalisers
for whole pages. As we only add an "else case" to the mark loop, the
costs for this are negligible.

In the shared memory scenario, each process mapping share memory could
define such a finaliser, and only when it is called the memory block is
unmapped. (And when a global counter is decreased to 0, the shared
memory object as such can be deleted. But that's outside the ocaml
runtime.)

With that help, shared memory would be relatively safe to use. The only
remaining trap would be value pointers that go from shared memory to
heap memory - but ocaml has already lots of ways to control mutability
of values, so I think it is ok to let the programmer do this.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: general question, was Re: OCaml is broken
  2009-12-21 14:19         ` general question, was " Keyan
                             ` (3 preceding siblings ...)
  2009-12-21 18:48           ` Jon Harrop
@ 2010-01-03 10:49           ` Sylvain Le Gall
  2010-01-03 20:06             ` [Caml-list] " Jon Harrop
  4 siblings, 1 reply; 57+ messages in thread
From: Sylvain Le Gall @ 2010-01-03 10:49 UTC (permalink / raw)
  To: caml-list

On 21-12-2009, Keyan <ml@pulsschlag.net> wrote:
> Hi,
>
> i have a large project written in C++, for which i am planing to write
> add-ons and tools in ocaml, e.g. different tools to analyse my code
> (dependency stuff), an interpreter for a script-language i plan to
> include, etc, etc. form my time at the uni i remembered that ocaml
> allows to compile libraries which can be included in c/c++ program,
> and i know people who use it extensively in other projects. therefore,
> i decided to give ocaml a try. i like functional programming, and my
> first steps with ocaml are very promising.
>
> following this discussion, i am not so sure anymore, if ocaml is a
> good decision. may be i got this discussion wrong, but if ocaml is
> dying out, i might have to look for another functional programming
> language to use with my project.
>

OCaml is not dying out at all (v3.11.2 is being prepared, v3.12.0 is
coming soon). 

Of course, the core OCaml distribution (the one shipped by INRIA), is
missing some features. You can use libraries/alternative compiler to
have these features back (cothreads, jocaml, camlp3l). 

The only point of the whole discussion -- which is a recurring point by
some of those who participate -- is the lack of shared-memory
parallelism in the core language. Other languages like C or C++ are also
lacking this support in their core definition... I.e. there is way to
do it but you need to use pthread or Win32 thread.

All in all, you can go a long way writing your tools with OCaml without
encounting these problems. For the topic you describe, OCaml is a good
choice.

Regards,
Sylvain Le Gall


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [Caml-list] Re: general question, was Re: OCaml is broken
  2010-01-03 10:49           ` Sylvain Le Gall
@ 2010-01-03 20:06             ` Jon Harrop
  0 siblings, 0 replies; 57+ messages in thread
From: Jon Harrop @ 2010-01-03 20:06 UTC (permalink / raw)
  To: caml-list; +Cc: Sylvain Le Gall

On Sunday 03 January 2010 10:49:38 Sylvain Le Gall wrote:
> The only point of the whole discussion -- which is a recurring point by
> some of those who participate -- is the lack of shared-memory
> parallelism in the core language.

I solved the problem: the latest version of HLVM now facilitates 
high-performance shared-memory parallelism.

The remaining challenges to making this more user friendly are:

1. High-level constructs for parallelism in HLVM (task queues).

2. OCaml<->HLVM interop, probably by destructuring values passed from the 
OCaml world so that HLVM programs can use them directly and return results by 
mutating values on the OCaml side.

3. Camlp4 macros so users can write their HLVM code in an OCaml-like DSL.

I believe this is basically an optimal solution for OCaml's multicore problem 
given the practical constraints.

The future's looking bright again. :-)

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e


^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2010-01-03 18:52 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-12-19 19:38 OCaml is broken Jeff Shaw
2009-12-20  4:43 ` [Caml-list] " Jon Harrop
2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
2009-12-20 13:22     ` Martin Jambon
2009-12-20 13:47     ` Yaron Minsky
2009-12-20 16:01       ` Gerd Stolpmann
2009-12-21 22:50       ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
2009-12-22 12:04         ` Erik Rigtorp
2009-12-22 12:27           ` Mihamina Rakotomandimby
2009-12-22 13:27           ` Gerd Stolpmann
2009-12-23 11:25             ` Erik Rigtorp
2009-12-29 12:07         ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Richard Jones
2009-12-20 14:27     ` Dario Teixeira
2009-12-20 21:14       ` Jon Harrop
2009-12-21  1:08         ` Gerd Stolpmann
2009-12-21  4:30           ` Jon Harrop
2009-12-21  3:58             ` Yaron Minsky
2009-12-21  5:32             ` Markus Mottl
2009-12-21 13:29               ` Jon Harrop
2009-12-26 17:08           ` orbitz
2009-12-20 19:38     ` [***SPAM*** Score/Req: 10.1/8.0] " Jon Harrop
2009-12-21 12:26       ` Mihamina Rakotomandimby
2009-12-21 14:19         ` general question, was " Keyan
2009-12-21 14:40           ` [Caml-list] " rixed
2009-12-21 14:42           ` Gerd Stolpmann
2009-12-21 15:25             ` Eray Ozkural
2009-12-21 14:50           ` Philip
2009-12-21 15:01             ` Keyan
2009-12-21 15:13               ` Stefano Zacchiroli
2009-12-21 15:27               ` Dario Teixeira
2009-12-21 15:46                 ` Jacques Carette
2009-12-21 18:50             ` Jon Harrop
2009-12-21 18:48           ` Jon Harrop
2010-01-03 10:49           ` Sylvain Le Gall
2010-01-03 20:06             ` [Caml-list] " Jon Harrop
2009-12-21 13:07     ` [***SPAM*** Score/Req: 10.1/8.0] Re: [Caml-list] " Damien Doligez
2009-12-21 13:31   ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Goswin von Brederlow
2009-12-21 14:19     ` multicore wish Mihamina Rakotomandimby
2009-12-21 16:15       ` [Caml-list] " Fischbacher T.
2009-12-21 17:42       ` Dario Teixeira
2009-12-21 18:43       ` Jon Harrop
2009-12-21 19:53     ` multicore wish [Was: Re: [Caml-list] Re: OCaml is broken] Jon Harrop
2009-12-22 13:09       ` multicore wish Goswin von Brederlow
2009-12-22 19:12         ` [Caml-list] " Jon Harrop
2009-12-22 18:02           ` Edgar Friendly
2009-12-22 19:20             ` Jon Harrop
2009-12-24 12:58               ` Goswin von Brederlow
2009-12-24 16:51                 ` Jon Harrop
2009-12-24 13:19           ` Goswin von Brederlow
2009-12-24 17:06             ` Jon Harrop
2009-12-27 12:45               ` Goswin von Brederlow
2009-12-27 16:37                 ` Jon Harrop
2009-12-28 12:28                 ` Gerd Stolpmann
2009-12-28 15:07                   ` Anil Madhavapeddy
2009-12-28 18:05                   ` Xavier Leroy
2009-12-29 16:44                     ` Gerd Stolpmann
2009-12-20 11:56 ` [***SPAM*** Score/Req: 10.1/8.0] [Caml-list] Re: OCaml is broken Erik Rigtorp

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).