caml-list - the Caml user's mailing list
* OCaml is broken
@ 2009-12-19  9:30 Erik Rigtorp
  2009-12-19  9:42 ` [Caml-list] " Stéphane Glondu
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Erik Rigtorp @ 2009-12-19  9:30 UTC
  To: caml-list

Hi!

I've been using Erlang and C++ to build a soft real-time system. As
the project has evolved we've needed to write more and more of the
code in C++ in order to achieve our latency requirements. But C++ is
not as performant as you might think until you start to write your own
allocators, cache-aligning mallocs, and data structures. I've never
liked C++ so I decided to try OCaml and built a simple 100 line
program to build order books for Nasdaq. Turns out OCaml has really
competitive performance while being a really nice language.

However OCaml is broken! It does not provide any support for multicore
architectures, which by now is considered a bug! It doesn't even allow
me to load multiple runtimes into one C program.

Please fix OCaml! The first step would be to support multiple runtimes
running in the same process communicating using message queues.

Erik Rigtorp



* Re: [Caml-list] OCaml is broken
  2009-12-19  9:30 OCaml is broken Erik Rigtorp
@ 2009-12-19  9:42 ` Stéphane Glondu
  2009-12-19 10:38 ` Sylvain Le Gall
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Stéphane Glondu @ 2009-12-19  9:42 UTC
  To: Erik Rigtorp; +Cc: caml-list

Erik Rigtorp wrote:
> However OCaml is broken! It does not provide any support for multicore
> architectures, which by now is considered a bug! [...]

You might be interested in OCaml4Multicore:

  http://www.algo-prog.info/ocmc/web/

It's still experimental, but its authors would love to have feedback.


Best regards,

-- 
Stéphane



* Re: OCaml is broken
  2009-12-19  9:30 OCaml is broken Erik Rigtorp
  2009-12-19  9:42 ` [Caml-list] " Stéphane Glondu
@ 2009-12-19 10:38 ` Sylvain Le Gall
  2009-12-19 18:22 ` [Caml-list] " Thomas Fischbacher
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: Sylvain Le Gall @ 2009-12-19 10:38 UTC
  To: caml-list

On 19-12-2009, Erik Rigtorp <erik@rigtorp.com> wrote:
>
> Please fix OCaml! The first step would be to support multiple runtimes
> running in the same process communicating using message queues.
>

You should take a look at:
http://jocaml.inria.fr/

Regards,
Sylvain Le Gall



* Re: [Caml-list] OCaml is broken
  2009-12-19  9:30 OCaml is broken Erik Rigtorp
  2009-12-19  9:42 ` [Caml-list] " Stéphane Glondu
  2009-12-19 10:38 ` Sylvain Le Gall
@ 2009-12-19 18:22 ` Thomas Fischbacher
  2009-12-20 16:18 ` Gerd Stolpmann
  2010-01-01 16:25 ` [Caml-list] " Florian Weimer
  4 siblings, 0 replies; 21+ messages in thread
From: Thomas Fischbacher @ 2009-12-19 18:22 UTC
  To: Erik Rigtorp; +Cc: caml-list


Erik Rigtorp wrote:

> However OCaml is broken! It does not provide any support for multicore
> architectures, which by now is considered a bug! It doesn't even allow
> me to load multiple runtimes into one C program.

My washing machine is broken. I cannot bake Pizza with it.

-- 
best regards,
Thomas Fischbacher
t.fischbacher@soton.ac.uk



* Re: [Caml-list] OCaml is broken
  2009-12-19  9:30 OCaml is broken Erik Rigtorp
                   ` (2 preceding siblings ...)
  2009-12-19 18:22 ` [Caml-list] " Thomas Fischbacher
@ 2009-12-20 16:18 ` Gerd Stolpmann
  2009-12-21 19:55   ` Erik Rigtorp
  2010-01-01 16:25 ` [Caml-list] " Florian Weimer
  4 siblings, 1 reply; 21+ messages in thread
From: Gerd Stolpmann @ 2009-12-20 16:18 UTC
  To: Erik Rigtorp; +Cc: caml-list


On Saturday, 19.12.2009, at 10:30 +0100, Erik Rigtorp wrote:
> Hi!
> 
> I've been using Erlang and C++ to build a soft real-time system. As
> the project has evolved we've needed to write more and more of the
> code in C++ in order to achieve our latency requirements. But C++ is
> not as performant as you might think until you start to write your own
> allocators and cache aligning mallocs and datastructures. I've never
> liked C++ so I decided to try OCaml and built a simple 100 line
> program to build order books for Nasdaq. Turns out OCaml has really
> competitive performance while being a really nice language.
> 
> However OCaml is broken! It does not provide any support for multicore
> architectures, which by now is considered a bug! It doesn't even allow
> me to load multiple runtimes into one C program.
> 
> Please fix OCaml! The first step would be to support multiple runtimes
> running in the same process communicating using message queues.

As you mention order books and soft real-time, I guess your main concern
is minimizing latency. In that case you need a style of parallelism that
focuses on a certain processing path for a single data item, and where
the latency is minimized by using several cores. I think OCaml is
unsuited to this type of task, but please don't call OCaml "broken"
because of this. Other types of parallelism can be well supported,
especially when you can accept multi-processing, and when you focus on
larger processing paths and partitioned data sets.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------



* Re: [Caml-list] OCaml is broken
  2009-12-20 16:18 ` Gerd Stolpmann
@ 2009-12-21 19:55   ` Erik Rigtorp
  2009-12-21 21:21     ` Sylvain Le Gall
  0 siblings, 1 reply; 21+ messages in thread
From: Erik Rigtorp @ 2009-12-21 19:55 UTC
  To: Gerd Stolpmann; +Cc: caml-list

On Sun, Dec 20, 2009 at 17:18, Gerd Stolpmann <gerd@gerd-stolpmann.de> wrote:
> As you mention order books and soft-realtime, I guess your main concern
> are minimized latencies. Well, you need then a style of parallelism that
> focuses on a certain processing path for a single data item, and where
> the latency is minimized by using several cores. I think ocaml is
> unsuited for this type of task, but please don't call ocaml "broken"

It is actually perfectly suitable for this, as long as you can run
multiple runtimes and share data via low latency message passing. This
is kind of how Erlang does it, with a separate heap for each
lightweight thread (called process in Erlang).

> because of this. Other types of parallelism can be well supported,
> especially when you can accept multi-processing, and when you focus on
> larger processing paths and partitioned data sets.

I agree, but fork() and pipe() have problems.

1. Latency is high: ~10µs when tuned on Solaris, and substantially
higher (>100µs) on Linux.
2. Runtimes don't share memory so data has to be copied. This is fine
for small datasets or streaming data, like passing order book updates
around.

Even if I want to process a dataset by partitioning it and sending the
work to multiple processes, there is no framework in OCaml for me to
use.
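
For reference, the partition-and-distribute pattern discussed here can
be sketched with nothing but the standard Unix and Marshal modules. This
is only an illustration under stated assumptions, not a framework from
the thread: the names parallel_map and chunks_of are made up for this
example, it assumes OCaml 4.12 or later (for Unix._exit), it has no
error handling, and results are copied through a pipe, so it pays
exactly the copying and latency costs listed above.

```ocaml
(* Build with the unix library, e.g.
   ocamlfind ocamlopt -package unix -linkpkg par.ml *)

(* Apply [f] to every chunk in its own forked process and collect the
   results in order. Results travel back through a pipe via Marshal. *)
let parallel_map (f : 'a -> 'b) (chunks : 'a list) : 'b list =
  let spawn chunk =
    let rd, wr = Unix.pipe () in
    match Unix.fork () with
    | 0 ->
        (* Child: compute, marshal the result to the parent, and leave
           without running the parent's at_exit handlers. *)
        Unix.close rd;
        let oc = Unix.out_channel_of_descr wr in
        Marshal.to_channel oc (f chunk) [];
        close_out oc;
        Unix._exit 0
    | pid ->
        (* Parent: keep only the read end. *)
        Unix.close wr;
        (pid, Unix.in_channel_of_descr rd)
  in
  let collect (pid, ic) : 'b =
    let (result : 'b) = Marshal.from_channel ic in
    close_in ic;
    ignore (Unix.waitpid [] pid);
    result
  in
  (* Start all workers first, then read their results one by one. *)
  List.map collect (List.map spawn chunks)

(* Split [arr] into [n] contiguous chunks of nearly equal size. *)
let chunks_of n arr =
  let len = Array.length arr in
  let size = (len + n - 1) / n in
  List.init n (fun i ->
    let start = min len (i * size) in
    Array.sub arr start (min size (len - start)))

let () =
  let data = Array.init 1000 (fun i -> i + 1) in
  let partial = parallel_map (Array.fold_left (+) 0) (chunks_of 4 data) in
  Printf.printf "total = %d\n" (List.fold_left (+) 0 partial)
```

Marshal is not type-safe, so the annotation on the result is a promise
rather than a check, and closures cannot be sent back this way without
extra flags.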

Erik



* Re: OCaml is broken
  2009-12-21 19:55   ` Erik Rigtorp
@ 2009-12-21 21:21     ` Sylvain Le Gall
  2009-12-29 12:00       ` [Caml-list] " Richard Jones
  0 siblings, 1 reply; 21+ messages in thread
From: Sylvain Le Gall @ 2009-12-21 21:21 UTC
  To: caml-list

On 21-12-2009, Erik Rigtorp <erik@rigtorp.com> wrote:
> On Sun, Dec 20, 2009 at 17:18, Gerd Stolpmann <gerd@gerd-stolpmann.de> wrote:
>
> Even if I want to process a dataset and partition it and sends the
> work to multiple processes there is no framework in OCaml for me to
> use.
>

There are many frameworks at hand, just search for them:
- ocamlp3l
- jocaml
- RPC with ocamlnet
- cothreads
- Ancient
- OCamlMPI

They may not look exactly like what you want, but they are close
enough to do what you want.

FYI, I have created a commercial application for sorting/processing big
files using OCaml. It runs using multi-processes as fast as other
commercial programs that do the same thing. In particular, it runs
faster than another well-known program written in C, using threads on
Windows and on Linux. 

Regards,
Sylvain Le Gall



* Re: [Caml-list] Re: OCaml is broken
  2009-12-21 21:21     ` Sylvain Le Gall
@ 2009-12-29 12:00       ` Richard Jones
  0 siblings, 0 replies; 21+ messages in thread
From: Richard Jones @ 2009-12-29 12:00 UTC
  To: Sylvain Le Gall; +Cc: caml-list

On Mon, Dec 21, 2009 at 09:21:09PM +0000, Sylvain Le Gall wrote:
> On 21-12-2009, Erik Rigtorp <erik@rigtorp.com> wrote:
> > On Sun, Dec 20, 2009 at 17:18, Gerd Stolpmann <gerd@gerd-stolpmann.de> wrote:
> >
> > Even if I want to process a dataset and partition it and sends the
> > work to multiple processes there is no framework in OCaml for me to
> > use.
> >
> 
> There are many frameworks at hand, just search for it:
> - ocamlp3l
> - jocaml
> - RPC with ocamlnet
> - cothreads
> - Ancient
> - OCamlMPI

Since the OP is interested in latencies, he may also want to look at
the tools lower down the stack for pinning processes and interrupts to
physical CPUs (eg. Tuna http://userweb.kernel.org/~acme/tuna/), and
also at RT kernels.

Red Hat is funding a large amount of research in this area under the
general brand name of MRG (Messaging, Real time and Grid computing):

http://www.redhat.com/mrg/

and we're doing this in partnership with some very large banks.

None of that is really specific to OCaml.  In fact the banks tend to
use Java(!)

Rich.

-- 
Richard Jones
Red Hat



* Re: [Caml-list] OCaml is broken
  2009-12-19  9:30 OCaml is broken Erik Rigtorp
                   ` (3 preceding siblings ...)
  2009-12-20 16:18 ` Gerd Stolpmann
@ 2010-01-01 16:25 ` Florian Weimer
  4 siblings, 0 replies; 21+ messages in thread
From: Florian Weimer @ 2010-01-01 16:25 UTC
  To: Erik Rigtorp; +Cc: caml-list

* Erik Rigtorp:

> However OCaml is broken! It does not provide any support for multicore
> architectures, which by now is considered a bug!

The run-time library is sufficiently small so that you can run
multiple processes in parallel.  They will even share the code and
constant data.

> It doesn't even allow me to load multiple runtimes into one C
> program.

That should perhaps be fixed, yes.



* Re: [Caml-list] Re: OCaml is  broken
  2009-12-21  1:08         ` Gerd Stolpmann
  2009-12-21  4:30           ` Jon Harrop
@ 2009-12-26 17:08           ` orbitz
  1 sibling, 0 replies; 21+ messages in thread
From: orbitz @ 2009-12-26 17:08 UTC
  To: Gerd Stolpmann; +Cc: Jon Harrop, caml-list


On Dec 20, 2009, at 8:08 PM, Gerd Stolpmann wrote:

>> The following web page describes a commercial machine sold by Azul
>> Systems that has up to 16 54-core CPUs (=864 cores) and 768 GB of
>> memory in a flat SMP configuration:
>>
>>  http://www.azulsystems.com/products/compute_appliance.htm
>>
>> As you can see, a GC with shared memory can already scale across
>> dozens of cores and memory access is no more heterogeneous than it
>> was 20 years ago. Also, note that homogeneous memory access is a red
>> herring in this context because it does not undermine the utility of
>> a shared heap on a multicore.
>
> The benchmarks they mention can all easily be parallelized - that
> stuff you can also do with multi-processing. The interesting thing
> would be an inherent parallel algorithm where the same memory region
> is accessed by multiple threads. Or at least a numeric program (your
> examples seem to be mostly from that area).

I'm not sure if it is relevant here, but it should be noted that a lot
of the performance gains Azul gets come from having built their own
chips, which do a lot of tricks for you under the hood.  The last time
I used an Azul appliance, it performed quite poorly when the same
memory was hit often from multiple threads (the machine I used was
about 4x slower than an equivalent Intel machine for a single core).
If the Azul tricks make it into desktop processors, that would likely
be pretty great.

Also, for what it's worth, lots of cores have actually been less
performant in the type of computing I currently do.  We want fewer
cores and more physical boxes, making multiple processes running
single threads a better solution for us.  With multiple cores we tend
to become memory-I/O bound (the bus cannot keep up with us).  We are
processing lots of biological data.  For the record, we are not using
OCaml for our project; this is just an observation of which model works
well for us.



* Re: [Caml-list] Re: OCaml is broken
  2009-12-22 13:27           ` Gerd Stolpmann
@ 2009-12-23 11:25             ` Erik Rigtorp
  0 siblings, 0 replies; 21+ messages in thread
From: Erik Rigtorp @ 2009-12-23 11:25 UTC
  To: Gerd Stolpmann; +Cc: yminsky, caml-list

On Tue, Dec 22, 2009 at 14:27, Gerd Stolpmann <gerd@gerd-stolpmann.de> wrote:
>
> On Tuesday, 22.12.2009, at 13:04 +0100, Erik Rigtorp wrote:
>> On Mon, Dec 21, 2009 at 23:50, Erik Rigtorp <erik@rigtorp.com> wrote:
>> > Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The
>> > benchmarks are running on a cpuset with 1 core. I measure the time
>> > from sending in one process until the other process receives the
>> > message. So a context switch and the message passing is included in
>> > the measurements.
>> >
>> > Max/Min/Avg
>> > * Pipes: 28205/5973/6259
>> > * Unix domain sockets: 44256/7748/8153
>> > * SYSv message queues: 19197/5895/6173
>> > * Posix message queues: 37399/10965/11303
>> > * TCP on loopback: 29017/7471/7885
>> >
>> > So the latency is roughly 10µs for all these solutions. That latency
>> > is pretty high and would be several times the processing time of the
>> > message itself.
>>
>> Some more benchmarks:
>>
>> Max/Min/Avg
>> * Spinlocking shm: 50897/403/761  (This one utilizes multiple cores,
>> since one core is just burning while waiting for data)
>> * Pthreads mutex shm: 27582/5246/6577
>>
>> Forgot to say that all measurements are in nanoseconds.
>
> That's for communication between processes, right? How would the picture
> be different (especially comparing the latter two) if you do message
> passing between threads? If I remember correctly, threads are more
> light-weight in Solaris than processes. That could also affect context
> switching times, and scheduler decisions.

With a system supporting green threads/tasklets/Erlang processes over
multiple cores you can have 1µs message passing latencies without busy
waiting. I'll check out the thread message passing too, but probably
not until after New Year.

> Do you have source code? I could also run it on Linux, for comparison.

I'll have to get that approved by my company first. It would actually
be interesting to create an open-source, multi-platform IPC message
passing benchmark.
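
As a rough idea of what one case of such a benchmark could look like,
here is a minimal pipe ping-pong sketch in OCaml. It is not the
benchmark behind the numbers quoted above. It measures round trips
rather than one-way delivery, the name measure is invented for this
example, it assumes OCaml 4.12 or later (for Unix._exit), and it uses
Unix.gettimeofday, whose resolution is about a microsecond, so the
individual min/max samples are coarse and only the average is really
meaningful.

```ocaml
(* Build with the unix library, e.g.
   ocamlfind ocamlopt -package unix -linkpkg pingpong.ml *)

(* Bounce one byte between parent and child over two pipes [rounds]
   times. Returns (max, min, avg) round-trip time in seconds. *)
let measure rounds =
  let p2c_r, p2c_w = Unix.pipe () in   (* parent -> child *)
  let c2p_r, c2p_w = Unix.pipe () in   (* child -> parent *)
  let buf = Bytes.make 1 'x' in
  match Unix.fork () with
  | 0 ->
      Unix.close p2c_w;
      Unix.close c2p_r;
      (* Child: echo every byte until the parent closes its write end. *)
      while Unix.read p2c_r buf 0 1 = 1 do
        ignore (Unix.write c2p_w buf 0 1)
      done;
      Unix._exit 0
  | pid ->
      Unix.close p2c_r;
      Unix.close c2p_w;
      let lo = ref infinity and hi = ref 0.0 and sum = ref 0.0 in
      for _i = 1 to rounds do
        let t0 = Unix.gettimeofday () in
        ignore (Unix.write p2c_w buf 0 1);
        ignore (Unix.read c2p_r buf 0 1);
        let dt = Unix.gettimeofday () -. t0 in
        lo := min !lo dt;
        hi := max !hi dt;
        sum := !sum +. dt
      done;
      (* Closing the write end makes the child's read return 0. *)
      Unix.close p2c_w;
      Unix.close c2p_r;
      ignore (Unix.waitpid [] pid);
      (!hi, !lo, !sum /. float_of_int rounds)

let () =
  let hi, lo, avg = measure 10_000 in
  let ns x = x *. 1e9 in
  Printf.printf "pipe round trip, ns (max/min/avg): %.0f/%.0f/%.0f\n"
    (ns hi) (ns lo) (ns avg)
```

Halving the round-trip average gives only an estimate of the one-way
latency, and pinning both processes to chosen cores (as in the cpuset
runs above) is left to the operating system's tools.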

Erik



* Re: [Caml-list] Re: OCaml is broken
  2009-12-22 12:04         ` Erik Rigtorp
@ 2009-12-22 13:27           ` Gerd Stolpmann
  2009-12-23 11:25             ` Erik Rigtorp
  0 siblings, 1 reply; 21+ messages in thread
From: Gerd Stolpmann @ 2009-12-22 13:27 UTC
  To: Erik Rigtorp; +Cc: yminsky, caml-list


On Tuesday, 22.12.2009, at 13:04 +0100, Erik Rigtorp wrote:
> On Mon, Dec 21, 2009 at 23:50, Erik Rigtorp <erik@rigtorp.com> wrote:
> > Some IPC Benchmarks, Solaris 10 on a quad core Intel Core2 Duo. The
> > benchmarks are running on a cpuset with 1 core. I measure the time
> > from sending in one process until the other process receives the
> > message. So a context switch and the message passing is included in
> > the measurements.
> >
> > Max/Min/Avg
> > * Pipes: 28205/5973/6259
> > * Unix domain sockets: 44256/7748/8153
> > * SYSv message queues: 19197/5895/6173
> > * Posix message queues: 37399/10965/11303
> > * TCP on loopback: 29017/7471/7885
> >
> > So the latency is roughly 10µs for all these solutions. That latency
> > is pretty high and would be several times the processing time of the
> > message itself.
> 
> Some more benchmarks:
> 
> Max/Min/Avg
> * Spinlocking shm: 50897/403/761  (This one utilizes multiple cores,
> since one core is just burning while waiting for data)
> * Pthreads mutex shm: 27582/5246/6577
> 
> Forgot to say that all measurements are in nanoseconds.

That's for communication between processes, right? How would the picture
be different (especially comparing the latter two) if you do message
passing between threads? If I remember correctly, threads are more
light-weight in Solaris than processes. That could also affect context
switching times, and scheduler decisions.

Do you have source code? I could also run it on Linux, for comparison.

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------



* Re: [Caml-list] Re: OCaml is broken
  2009-12-21  5:32             ` Markus Mottl
@ 2009-12-21 13:29               ` Jon Harrop
  0 siblings, 0 replies; 21+ messages in thread
From: Jon Harrop @ 2009-12-21 13:29 UTC
  To: caml-list

On Monday 21 December 2009 05:32:38 Markus Mottl wrote:
> On Sun, Dec 20, 2009 at 23:30, Jon Harrop <jon@ffconsultancy.com> wrote:
> > Traffic here:
> >
> > 2007: 5814
> > 2008: 4051
> > 2009: 3071
>
> That's because I don't have much time to post here nowadays.  I'm
> sure if Jon followed my example, we would have a parallel GC for OCaml
> by the end of the year.

HLVM already has a parallel GC. :-)

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] Re: OCaml is broken
  2009-12-21  4:30           ` Jon Harrop
  2009-12-21  3:58             ` Yaron Minsky
@ 2009-12-21  5:32             ` Markus Mottl
  2009-12-21 13:29               ` Jon Harrop
  1 sibling, 1 reply; 21+ messages in thread
From: Markus Mottl @ 2009-12-21  5:32 UTC
  To: Jon Harrop; +Cc: caml-list

On Sun, Dec 20, 2009 at 23:30, Jon Harrop <jon@ffconsultancy.com> wrote:
> Traffic here:
>
> 2007: 5814
> 2008: 4051
> 2009: 3071

That's because I don't have much time to post here nowadays.  I'm
sure if Jon followed my example, we would have a parallel GC for OCaml
by the end of the year.

Regards,
Markus

-- 
Markus Mottl        http://www.ocaml.info        markus.mottl@gmail.com



* Re: [Caml-list] Re: OCaml is  broken
  2009-12-21  1:08         ` Gerd Stolpmann
@ 2009-12-21  4:30           ` Jon Harrop
  2009-12-21  3:58             ` Yaron Minsky
  2009-12-21  5:32             ` Markus Mottl
  2009-12-26 17:08           ` orbitz
  1 sibling, 2 replies; 21+ messages in thread
From: Jon Harrop @ 2009-12-21  4:30 UTC
  To: caml-list

On Monday 21 December 2009 01:08:14 Gerd Stolpmann wrote:
> > The following web page describes a commercial machine sold by Azul
> > Systems that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory
> > in a flat SMP configuration:
> >
> >   http://www.azulsystems.com/products/compute_appliance.htm
> >
> > As you can see, a GC with shared memory can already scale across dozens
> > of cores and memory access is no more heterogeneous than it was 20 years
> > ago. Also, note that homogeneous memory access is a red herring in this
> > context because it does not undermine the utility of a shared heap on a
> > multicore.
>
> The benchmarks they mention can all easily be parallelized - that stuff
> you can also do with multi-processing.

Only if the result is small, otherwise you spend all of your time 
deserializing it. With a shared heap, you just return the resulting value by 
reference.

> The interesting thing would be an 
> inherent parallel algorithm where the same memory region is accessed by
> multiple threads.

Concurrent hash tables are a big thing for Azul:

  "Scales well up to 768 CPUs" -
  http://www.youtube.com/watch?v=WYXgtXWejRM

This blog entry describes performance on 750 cores:

  http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html

> Or at least a numeric program (your examples seem to be mostly from that
> area). 

Yes. You can look at matrix operations or linear algebra (QR decomposition) 
but also things like quicksort and graphics.

Would be interesting to compare symbolic performance as well though.

> > > - Have you considered that many Ocaml users prefer a GC that offers
> > > maximum single core performance,
> >
> > OCaml's GC is nowhere near offering maximum single core performance. Its
> > uniform data representation renders OCaml many times slower than its
> > competitors for many tasks. For example, filling a 10M float->float hash
> > table is over 18x slower with OCaml than with F#. FFT with a complex
> > number type is 5.5x slower with OCaml than F#. Fibonacci with floats is
> > 3.3x slower with OCaml than my own HLVM project (!).
>
> Sure, but these micro benchmarks are first seldom correct, and do not
> really count for real-world programs.
>
> For example, an important parameter of such benchmarks is the frequency
> the GC runs. Ocaml runs the GC very often - good for latencies, but bad
> for micro benchmarks because other runtimes simply delay the GC until
> some limits are exceeded, so these other runtimes often haven't run the
> GC even once in the short period of time the benchmark runs.

You're missing the point: every example I gave shouldn't be doing any GC at 
all and doesn't in F# but spends a lot of time in the GC in OCaml just 
because of unnecessary boxing. The mutator also takes longer because boxing 
damages locality.
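
To make the boxing point concrete, here is a small sketch (not one of
the benchmarks mentioned above) that counts minor-heap allocation with
Gc.minor_words. The names words_allocated and compare_allocation are
invented for this example. With the native-code compiler the generic
hash table has to box every float key and value and allocate a bucket
cell, whereas a float array stores the numbers in place, so the second
count should stay close to zero; the bytecode compiler boxes floats in
both loops.

```ocaml
(* Minor-heap words allocated while running [f]. *)
let words_allocated f =
  let before = Gc.minor_words () in
  f ();
  Gc.minor_words () -. before

(* Returns (words allocated filling a float->float hash table,
            words allocated filling a float array) for [n] elements. *)
let compare_allocation n =
  let tbl : (float, float) Hashtbl.t = Hashtbl.create n in
  let arr = Array.make n 0.0 in
  let boxed =
    words_allocated (fun () ->
      for i = 0 to n - 1 do
        let x = float_of_int i in
        (* Key and value are each boxed, plus one bucket cell. *)
        Hashtbl.replace tbl x (x *. 2.0)
      done)
  in
  let flat =
    words_allocated (fun () ->
      for i = 0 to n - 1 do
        (* A float array holds the raw doubles directly. *)
        arr.(i) <- float_of_int i *. 2.0
      done)
  in
  (boxed, flat)

let () =
  let boxed, flat = compare_allocation 100_000 in
  Printf.printf "hash table: %.0f words, float array: %.0f words\n"
    boxed flat
```

The exact counts depend on the compiler version and word size, so no
figures are given here; the gap between the two is what matters.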

> It is simply a fact that the ocaml developers had some preferences. E.g.
> allocating and freeing short-living values is extremely fast (often
> <10ns). This is very good when you do symbolic computations, or have
> lots of small strings, but ignorable for numeric stuff, or for programs
> where the lifetime of allocated memory is bound to server sessions. The
> minor GC is very fast, but, as you observe, the uniform representation
> has costs elsewhere.

Yes. That's why I think the best way forward is to develop HLVM.

> > >   because their application is parallelised via multiple processes
> > >   communicating via message passing?
> >
> > A circular argument based upon the self-selected group of remaining OCaml
> > users. Today's OCaml users use OCaml despite its shortcomings. If you
> > want to see the impact of OCaml's multicore unfriendliness, consider why
> > the OCaml community has haemorrhaged 50% of its users in only 2 years.
>
> Don't see that. That's just speculation - maybe some win32 ocaml users
> switched to F#,

I wasn't a win32 user. :-)

> but there are for sure also other reasons than multicore 
> support, e.g. GUIs and better Windows integration. Btw, where do you get
> your numbers from?

Traffic here:

2007: 5814
2008: 4051
2009: 3071

  http://groups.google.com/group/fa.caml/about

Or searches for OCaml on Google:

  http://www.google.com/trends?q=ocaml%2Cclojure%2Cf%23

The number of OCaml jobs has crashed as well:

  http://www.itjobswatch.co.uk/jobs/uk/ocaml.do

And, of course, what our customers say.

> There are many, many users for whom multicore is just a useless hype.

In 2005, the OCaml community was composed largely of performance junkies who 
came here because OCaml produced excellent performance from succinct and 
readable code on benchmark after benchmark. More people were buying OFS than 
were using Coq. I don't believe for a second that many of OCaml's former 
users thought multicore was just useless hype.

> Either the algorithms are inherently difficult to parallelize (and this
> is vast majority),

I have had great success parallelizing code.

> or are that easy (like all client/server stuff) that multi-processing is
> sufficient.

There are certainly applications where multicore is not beneficial.

> You can consider multicore as a marketing trick of the chip 
> industry to let the ordinary desktop user pay for a feature that is mostly
> interesting for datacenters. 

Ordinary desktop users have been paying top dollar for parallel computers in 
the form of GPUs for some time now. The use of GPUs for more general 
programming has been a really hot topic for years and just became viable. 
Even games consoles have multicores. ARM are making quadcores for your phone 
and netbook!

If I can get HLVM to make parallel OCaml-style programming easy, I think a lot 
of people would love it.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] Re: OCaml is broken
  2009-12-21  4:30           ` Jon Harrop
@ 2009-12-21  3:58             ` Yaron Minsky
  2009-12-21  5:32             ` Markus Mottl
  1 sibling, 0 replies; 21+ messages in thread
From: Yaron Minsky @ 2009-12-21  3:58 UTC
  To: Jon Harrop; +Cc: caml-list


I find the ponderings on the popularity of OCaml to be of limited utility
--- those who pick OCaml based on its popularity are making a terrible
mistake.  OCaml was a deeply unpopular language in 2005 and remains so
today, the variations notwithstanding.  There are other good reasons to use
the language nonetheless.

On Sun, Dec 20, 2009 at 11:30 PM, Jon Harrop <jon@ffconsultancy.com> wrote:

> Or searches for OCaml on Google:
>
>  http://www.google.com/trends?q=ocaml%2Cclojure%2Cf%23


I'm not sure if OCaml is becoming more or less popular, but I find the
evidence for a decline less than convincing.  It is true that there is less
traffic on this list, but it's hard to know how to interpret this.  I
haven't gotten the sense that Python is in decline, but traffic on
comp.lang.python has also been declining since 2005.

Google Trends is also a confusing metric.  For example, it suggests that
Java, Python and C++ have been declining for years:

http://www.google.com/trends?q=java&ctab=0&geo=all&date=all&sort=0
http://www.google.com/trends?q=C%2B%2B&ctab=0&geo=all&date=all&sort=0
http://www.google.com/trends?q=Python&ctab=0&geo=all&date=all&sort=0

My suspicion is that Google Trends gives numbers normalized to the overall
search world, and so things that aren't growing fast look smaller as search
volume in general grows.  Obviously an up-and-coming language like clojure
still shows an upswing, as one would expect from an up-and-coming language.

> The number of OCaml jobs has crashed as well:
>
>  http://www.itjobswatch.co.uk/jobs/uk/ocaml.do


I thought this was a silly metric when it spiked up, and continue to think
it's a silly metric today.  There are a tiny number of legitimate ocaml jobs
(and the same is true for Haskell, Clojure, Scala, SML, etc.) and the
ups-and-down in this tiny sample are not statistically significant.  Again:
don't pick OCaml because of the large number of OCaml jobs out there.  There
are very very few, both now and in '05.

Reliable metrics on a community like this are hard to come by, but things
seem quite vibrant to me.  There are always new OCaml startups popping into
existence, new libraries being written, and new things coming out of INRIA
(for example, the arrival of modules as first-class values, which is
expected in OCaml 3.12).  From my point of view, there is still no platform
out there I would rather be using.

y



* Re: [Caml-list] Re: OCaml is  broken
  2009-12-20 21:14       ` Jon Harrop
@ 2009-12-21  1:08         ` Gerd Stolpmann
  2009-12-21  4:30           ` Jon Harrop
  2009-12-26 17:08           ` orbitz
  0 siblings, 2 replies; 21+ messages in thread
From: Gerd Stolpmann @ 2009-12-21  1:08 UTC
  To: Jon Harrop; +Cc: caml-list

> The following web page describes a commercial machine sold by Azul Systems 
> that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory in a flat 
> SMP configuration:
> 
>   http://www.azulsystems.com/products/compute_appliance.htm
> 
> As you can see, a GC with shared memory can already scale across dozens of 
> cores and memory access is no more heterogeneous than it was 20 years ago. 
> Also, note that homogeneous memory access is a red herring in this context 
> because it does not undermine the utility of a shared heap on a multicore.

The benchmarks they mention can all easily be parallelized - that stuff
you can also do with multi-processing. The interesting thing would be an
inherently parallel algorithm where the same memory region is accessed by
multiple threads. Or at least a numeric program (your examples seem to
be mostly from that area).

> > - Have you considered that many Ocaml users prefer a GC that offers maximum
> > single core performance, 
> 
> OCaml's GC is nowhere near offering maximum single core performance. Its 
> uniform data representation renders OCaml many times slower than its 
> competitors for many tasks. For example, filling a 10M float->float hash 
> table is over 18x slower with OCaml than with F#. FFT with a complex number 
> type is 5.5x slower with OCaml than F#. Fibonacci with floats is 3.3x slower 
> with OCaml than my own HLVM project (!).

Sure, but such micro benchmarks are, first, seldom correct, and second,
they do not really count for real-world programs.

For example, an important parameter of such benchmarks is how often the
GC runs. OCaml runs the GC very often - good for latencies, but bad
for micro benchmarks, because other runtimes simply delay the GC until
some limits are exceeded, so these other runtimes often haven't run the
GC even once in the short period of time the benchmark runs.

It is simply a fact that the OCaml developers had some preferences. E.g.
allocating and freeing short-lived values is extremely fast (often
<10ns). This is very good when you do symbolic computations, or have
lots of small strings, but irrelevant for numeric stuff, or for programs
where the lifetime of allocated memory is bound to server sessions. The
minor GC is very fast, but, as you observe, the uniform representation
has costs elsewhere.

> >   because their application is parallelised via multiple processes
> >   communicating via message passing? 
> 
> A circular argument based upon the self-selected group of remaining OCaml 
> users. Today's OCaml users use OCaml despite its shortcomings. If you want to 
> see the impact of OCaml's multicore unfriendliness, consider why the OCaml 
> community has haemorrhaged 50% of its users in only 2 years.

Don't see that. That's just speculation - maybe some win32 ocaml users
switched to F#, but there are for sure also other reasons than multicore
support, e.g. GUIs and better Windows integration. Btw, where do you get
your numbers from?

There are many, many users for whom multicore is just useless hype.
Either the algorithms are inherently difficult to parallelize (and this
is the vast majority), or they are so easy (like all client/server stuff)
that multi-processing is sufficient. You can consider multicore as a
marketing trick of the chip industry to let the ordinary desktop user
pay for a feature that is mostly interesting for datacenters.
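
For the easy case, a minimal sketch of multi-processing with Unix.fork
and a pipe could look as follows. It assumes a POSIX system, the unix
library, and OCaml >= 4.12 (for Unix._exit); spawn and parallel_map are
hypothetical helper names, and there is no error handling if the child
fails.

```ocaml
(* Run [f x] in a forked child process. Returns a function that waits
   for the child and yields its result, sent back over a pipe. *)
let spawn (f : 'a -> 'b) (x : 'a) : unit -> 'b =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
    (* Child: compute, marshal the result into the pipe, then leave
       without running the parent's at_exit handlers. *)
    Unix.close rd;
    let oc = Unix.out_channel_of_descr wr in
    Marshal.to_channel oc (f x) [];
    close_out oc;
    Unix._exit 0
  | pid ->
    (* Parent: keep only the read end and defer the blocking read. *)
    Unix.close wr;
    fun () ->
      let ic = Unix.in_channel_of_descr rd in
      let result : 'b = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      result

(* Start one process per element, then collect the results in order. *)
let parallel_map f xs =
  let pending = List.map (spawn f) xs in
  List.map (fun wait -> wait ()) pending

let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

let results = parallel_map fib [20; 21; 22]
```

Each child has its own heap and its own GC, so the processes never
contend on a shared runtime; the price is the marshalling of results.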

Gerd
-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------



* Re: [Caml-list] Re: OCaml is  broken
  2009-12-20 14:27     ` Dario Teixeira
@ 2009-12-20 21:14       ` Jon Harrop
  2009-12-21  1:08         ` Gerd Stolpmann
  0 siblings, 1 reply; 21+ messages in thread
From: Jon Harrop @ 2009-12-20 21:14 UTC (permalink / raw)
  To: caml-list

On Sunday 20 December 2009 14:27:00 Dario Teixeira wrote:
> Hi,
>
> > It's too bad that INRIA is not interested in fixing this bug. No
> > matter what people say I consider this a bug. Two cores is standard by
> > now, I'm used to 8, next year 32 and so on. OCaml will only become
> > more and more irrelevant. I hate to see that happening.
>
> This is a perennial topic in this list.  Without meaning to dwell too
> long on old arguments, I simply ask you to consider the following:
>
> - Do you really think a concurrent GC with shared memory will scale neatly
> to those 32 cores?
>
> - Will memory access remain homogeneous for all cores as soon as we get
> into the dozens of cores?

The following web page describes a commercial machine sold by Azul Systems 
that has up to 16 54-core CPUs (=864 cores) and 768 GB of memory in a flat 
SMP configuration:

  http://www.azulsystems.com/products/compute_appliance.htm

As you can see, a GC with shared memory can already scale across dozens of 
cores and memory access is no more heterogeneous than it was 20 years ago. 
Also, note that homogeneous memory access is a red herring in this context 
because it does not undermine the utility of a shared heap on a multicore.

> - Have you considered that many Ocaml users prefer a GC that offers maximum
> single core performance, 

OCaml's GC is nowhere near offering maximum single core performance. Its 
uniform data representation renders OCaml many times slower than its 
competitors for many tasks. For example, filling a 10M float->float hash 
table is over 18x slower with OCaml than with F#. FFT with a complex number 
type is 5.5x slower with OCaml than F#. Fibonacci with floats is 3.3x slower 
with OCaml than my own HLVM project (!).

>   because their application is parallelised via multiple processes
>   communicating via message passing? 

A circular argument based upon the self-selected group of remaining OCaml 
users. Today's OCaml users use OCaml despite its shortcomings. If you want to 
see the impact of OCaml's multicore unfriendliness, consider why the OCaml 
community has haemorrhaged 50% of its users in only 2 years.

> In this context, your "bug" is actually a "feature".

I'm not even sure you can substantiate that in the very specific context of 
distributed parallel theorem provers because other languages are so much more 
efficient at handling common abstractions like parametric polymorphism. Got 
any benchmarks?

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



* Re: [Caml-list] Re: OCaml is  broken
  2009-12-20 13:47     ` Yaron Minsky
@ 2009-12-20 16:01       ` Gerd Stolpmann
  2009-12-21 22:50       ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
  1 sibling, 0 replies; 21+ messages in thread
From: Gerd Stolpmann @ 2009-12-20 16:01 UTC (permalink / raw)
  To: yminsky; +Cc: Erik Rigtorp, caml-list


Am Sonntag, den 20.12.2009, 08:47 -0500 schrieb Yaron Minsky:
> On Sun, Dec 20, 2009 at 7:21 AM, Erik Rigtorp <erik@rigtorp.com>
> wrote:
>         The first step for OCaml would be to be able to run multiple
>         communicating instances of the runtime bound to one core each
>         in one
>         process and have them communicate via lock free queues.
> 
> 
> We've done some experiments in this direction at Jane Street.  On
> Linux, we've been able to get fast enough IPC channels for our
> purposes that slamming things into the same memory space has not in
> the end been necessary.  (There is I agree some pain associated with
> running multiple runtimes in the same process.  If you're interested,
> contact me off-list and I can try to get you some of the details of
> what we ran into.)
> 
> 
> But have you tried using shared-memory segments for communicating
> between different processes?  You say the latencies are too high, but
> do you have any measurements you could share? Have you tried queues
> using shared memory segments, in particular?  Inter-thread
> communication has latency as well, and the performance issues depend
> on lots of things, OS and hardware platform included.  It would help
> in understanding the tradeoffs.

I'm also experimenting now with shared memory (shm) as a fast IPC
mechanism. I've extended ocamlnet with a few functions that allow
copying an OCaml value into a shm segment which is accessible as a
bigarray:

https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/src/netsys/netsys_mem.mli

Look especially for init_string. (I should also mention Ancient here,
which inspired this work.)

Having OCaml values in shm saves us some marshalling costs, which are
right now the biggest performance penalty when using multiple processes.
However, this causes some problems, and at some point modifications of
the OCaml runtime will be necessary:

- The polymorphic equality and hash primitives do not work anymore
  for values in such shm segments (and that really hurts;
  string comparison in particular is broken)
- Given that the shm segment is set to read-only after being set up, it
  is not possible to have pointers from shm to other memory regions.
  This is good, as this would be very dangerous (GC may delete or move
  values in the regular heap). However, the question arises when the
  shm segment can be deleted. We would need help from the GC to identify
  segments that are no longer referenced.

Without that, shm will be restricted to a role as low-level
inter-process buffers. 
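
To show what the "low-level inter-process buffer" role looks like, here
is a minimal sketch of a bigarray shared between a parent and a forked
child. It deliberately uses only Unix.map_file from the standard unix
library (OCaml >= 4.06, POSIX assumed), not the netsys_mem functions
mentioned above; shared_floats is a hypothetical helper name.

```ocaml
(* Map a temporary file as a shared array of [n] floats. The file is
   unlinked right away; on POSIX the mapping stays valid afterwards. *)
let shared_floats n
  : (float, Bigarray.float64_elt, Bigarray.c_layout) Bigarray.Array1.t =
  let path = Filename.temp_file "shm_sketch" ".bin" in
  let fd = Unix.openfile path [Unix.O_RDWR] 0o600 in
  Unix.unlink path;
  (* shared = true: writes are visible to every process that has the
     mapping; the file is grown to the requested size automatically. *)
  let ga = Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [| n |] in
  Unix.close fd;
  Bigarray.array1_of_genarray ga

let buf = shared_floats 4

(* The child writes into the segment; the parent only waits. *)
let () =
  match Unix.fork () with
  | 0 ->
    for i = 0 to 3 do buf.{i} <- float_of_int (i * i) done;
    Unix._exit 0
  | pid -> ignore (Unix.waitpid [] pid)

(* The parent reads what the child wrote, without any marshalling. *)
let seen_by_parent = List.init 4 (fun i -> buf.{i})
```

This only moves raw numbers. Structured OCaml values with pointers are
exactly where the problems listed above begin.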

> As we go to higher-and-higher numbers of cores, I suspect that
> message-passing solutions are likely to scale better than shared
> memory, so I'm not so sure that OCaml is on the wrong path here.  I
> think that most of the work that's needed is going to come in the form
> of libraries, with only a little work in the compiler and the
> runtime.  Given that, I think this is an issue for the community to
> solve, not INRIA.

Well, message passing and shm do not exclude each other. We should
refine the terminology here: Actually, shm is just a basic mechanism
by which several execution threads (including processes) can share
memory. What's often meant, however, is the role it plays for
multi-threading, i.e. shared mutable data structures. What's typical
here is that several threads write to the same memory regions. I don't
know a good name for that programming style - maybe "multi-threading
style shm" is the best.

I'm working on a local message passing queue that can be used for long
messages, based on shm, and where the messages can contain normal ocaml
values (although it is likely that these are copied to the normal heap
by the receiver, for the above mentioned reasons, but this is an
expensive copy). The whole point will be that the data marshalling costs
are minimized. So far I can already say, we will need some changes in
the runtime to make such a mechanism fast and safe.
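
The marshalling cost in question comes from the fact that a message is
serialized and then rebuilt as a fresh copy on the receiving side. A
small sketch with the standard Marshal module (the msg type and its
contents are made up for illustration):

```ocaml
(* A made-up message type. *)
type msg = { id : int; payload : string list }

let original = { id = 1; payload = ["bid"; "ask"] }

(* Sender side: serialize the whole value into a byte buffer. *)
let wire : bytes = Marshal.to_bytes original []

(* Receiver side: rebuild a fresh copy on the normal heap. *)
let received : msg = Marshal.from_bytes wire 0

(* Same contents, but a different block in memory. *)
let same_contents = (received = original)
let same_block = (received == original)

let () =
  Printf.printf "message size on the wire: %d bytes\n" (Bytes.length wire)
```

Both the serialization and the rebuild traverse the entire value, which
is why long messages are expensive to pass between processes this way.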

Gerd

-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------



* Re: [Caml-list] Re: OCaml is  broken
  2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
  2009-12-20 13:47     ` Yaron Minsky
@ 2009-12-20 14:27     ` Dario Teixeira
  2009-12-20 21:14       ` Jon Harrop
  1 sibling, 1 reply; 21+ messages in thread
From: Dario Teixeira @ 2009-12-20 14:27 UTC (permalink / raw)
  To: Erik Rigtorp; +Cc: caml-list

Hi,

> It's too bad that INRIA is not interested in fixing this bug. No
> matter what people say I consider this a bug. Two cores is standard by
> now, I'm used to 8, next year 32 and so on. OCaml will only become
> more and more irrelevant. I hate to see that happening.

This is a perennial topic in this list.  Without meaning to dwell too
long on old arguments, I simply ask you to consider the following:

- Do you really think a concurrent GC with shared memory will scale neatly
  to those 32 cores?

- Will memory access remain homogeneous for all cores as soon as we get into
  the dozens of cores?

- Have you considered that many Ocaml users prefer a GC that offers maximum
  single core performance, because their application is parallelised via
  multiple processes communicating via message passing?  In this context,
  your "bug" is actually a "feature".

Best regards,
Dario Teixeira







* Re: [Caml-list] Re: OCaml is broken
  2009-12-19 19:38 Jeff Shaw
@ 2009-12-20  4:43 ` Jon Harrop
  2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
  0 siblings, 1 reply; 21+ messages in thread
From: Jon Harrop @ 2009-12-20  4:43 UTC (permalink / raw)
  To: caml-list

On Saturday 19 December 2009 19:38:41 Jeff Shaw wrote:
> My understanding is that since jocaml uses the regular ocaml runtime, it
> is also not multicore enabled.
>
> Haskell is a functional language that has good performance

GHC and the Haskell language itself have serious performance problems.

> that can use multiple processors, but the learning curve is steeper and
> higher. 

And Haskell lacks many of the features OCaml programmers take for granted.

> OCaml is a close relative of Standard ML, so there might be some
> implementation of SML that you like. MLTon might allow multicore use,
> but I'm not sure how mature it is. SML/NJ has a library or language
> extension called Concurrent ML, but I think SML/NJ might not use
> multiple processors.

MLton and SML/NJ are both multicore incapable. The PolyML implementation of 
SML is multicore friendly but last time I looked (many years ago) it was 100x 
slower than OCaml for floating point.

As long as you're looking at OCaml's close relatives with multicore support, 
F# is your only viable option. Soon, HLVM will provide a cross-platform open 
source solution. If you look further you will also find Scala and Clojure.

> Note that if you're not using a lot of threads, you can use Unix.fork to 
> do true multithreaded programming ocaml.

We've discussed the problems with that before. Writing a parallel generic 
quicksort seems to be a good test of a decent multicore capable language 
implementation. Currently, F# is a *long* way ahead of everything open 
source.

-- 
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e



end of thread, other threads:[~2010-01-01 16:25 UTC | newest]

Thread overview: 21+ messages
2009-12-19  9:30 OCaml is broken Erik Rigtorp
2009-12-19  9:42 ` [Caml-list] " Stéphane Glondu
2009-12-19 10:38 ` Sylvain Le Gall
2009-12-19 18:22 ` [Caml-list] " Thomas Fischbacher
2009-12-20 16:18 ` Gerd Stolpmann
2009-12-21 19:55   ` Erik Rigtorp
2009-12-21 21:21     ` Sylvain Le Gall
2009-12-29 12:00       ` [Caml-list] " Richard Jones
2010-01-01 16:25 ` [Caml-list] " Florian Weimer
2009-12-19 19:38 Jeff Shaw
2009-12-20  4:43 ` [Caml-list] " Jon Harrop
2009-12-20 12:21   ` [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
2009-12-20 13:47     ` Yaron Minsky
2009-12-20 16:01       ` Gerd Stolpmann
2009-12-21 22:50       ` [***SPAM*** Score/Req: 10.1/8.0] Re: [***SPAM*** Score/Req: 10.1/8.0] " Erik Rigtorp
2009-12-22 12:04         ` Erik Rigtorp
2009-12-22 13:27           ` Gerd Stolpmann
2009-12-23 11:25             ` Erik Rigtorp
2009-12-20 14:27     ` Dario Teixeira
2009-12-20 21:14       ` Jon Harrop
2009-12-21  1:08         ` Gerd Stolpmann
2009-12-21  4:30           ` Jon Harrop
2009-12-21  3:58             ` Yaron Minsky
2009-12-21  5:32             ` Markus Mottl
2009-12-21 13:29               ` Jon Harrop
2009-12-26 17:08           ` orbitz
