Date: Fri, 9 May 2008 23:13:26 +0200
From: Berke Durak
To: Till Varoquaux
Subject: Re: [Caml-list] Re: Why OCaml rocks
Cc: caml-list@yquem.inria.fr

On Fri, May 9, 2008 at 11:00 PM, Till Varoquaux <till.varoquaux@gmail.com> wrote:
> First of all, let's try to stop the squabbling and have some actual
> discussions with actual content (trolling is very tempting and I am
> the first to fall for it). OCaml is extremely nice but not perfect.
> Other languages have other tradeoffs, and the INRIA is not here to
> fulfill all our desires.


On Fri, May 9, 2008 at 9:40 PM, Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
>
> Am Freitag, den 09.05.2008, 19:10 +0100 schrieb Jon Harrop:
>> On Friday 09 May 2008 12:12:00 Gerd Stolpmann wrote:
>> > I think the parallelism capabilities are already excellent. We have been
>> > able to implement the application backend of Wink's people search in
>> > O'Caml, and it is of course a highly parallel system of programs. This
>> > is not the same class raytracers or desktop parallelism fall into - this
>> > is highly professional supercomputing. I'm talking about a cluster of
>> > ~20 computers with something like 60 CPUs.
>> >
>> > Of course, we did not use multithreading very much. We are relying on
>> > multi-processing (both "fork"ed style and separately started programs),
>> > and multiplexing (i.e. application-driven micro-threading). I especially
>> > like the latter: Doing multiplexing in O'Caml is fun, and a substitute
>> > for most applications of multithreading. For example, you want to query
>> > multiple remote servers in parallel: Very easy with multiplexing,
>> > whereas the multithreaded counterpart would quickly run into scalability
>> > problems (threads are heavy-weight, and need a lot of resources).
>>
>> If OCaml is good for concurrency on distributed systems that is great but it
>> is completely different to CPU-bound parallelism on multicores.
>
> You sound like somebody who tries to sell hardware :-)
>
> Well, our algorithms are quite easy to parallelize. I don't see a
> difference in whether they are CPU-bound or disk-bound - we also have
> lots of CPU-bound stuff, and the parallelization strategies are the
> same.
>
> The important thing is whether the algorithm can be formulated in a way
> so that state mutations are rare, or can at least be done in a
> "cache-friendly" way. Such algorithms exist for a lot of problems. I
> don't know which problems you want to solve, but it sounds as if they
> were special problems. Like for most industries, most of our problems
> are simply "do the same for N objects" where N is very large, and
> sometimes "sort data", also for large N.
>
>> > In our case, the mutable data structures that count are on disk.
>> > Everything else is only temporary state.
>>
>> Exactly. That is a completely different kettle of fish to writing high
>> performance numerical codes for scientific computing.
>
> I don't understand. Relying on disk for sharing state is a big problem
> for us, but unavoidable. Disk is slow memory with a very special timing.
> Experience shows that even accessing state over the network is cheaper
> than over disk. Often, we end up designing our algorithms around the
> disk access characteristics. Compared to that the access to RAM-backed
> state over network is fast and easy.
>
> shm_open shares memory through file descriptors and, under
> linux/glibc, this is done using /dev/shm. You can mmap this as a bigarray
> and, voila, shared memory. This is quite nice for numerical
> computation, plus you get closures etc. in your forks. Oh, and COW on
> modern OSes makes this very cheap.

Yes, that's the kind of approach I like.
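A minimal sketch of that approach, using Unix.map_file from the modern standard library (the 2008-era equivalent was Bigarray.Array1.map_file); the file name, sizes, and helper name are illustrative, and on Linux a file under /dev/shm would avoid touching the disk:

```ocaml
(* Map an n-element float64 array from a file so that parent and
   forked children share the same pages (a temp file is used here
   for portability). *)
let with_shared_array n f =
  let path = Filename.temp_file "shm" ".dat" in
  let fd = Unix.openfile path [ Unix.O_RDWR ] 0o600 in
  let a =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [| n |])
  in
  Unix.close fd;           (* the mapping outlives the descriptor *)
  Sys.remove path;         (* ... and the unlink, on POSIX systems *)
  f a

let () =
  with_shared_array 4 (fun a ->
    match Unix.fork () with
    | 0 ->
        (* child: fill the shared array, then exit *)
        for i = 0 to 3 do a.{i} <- 2.0 *. float_of_int i done;
        exit 0
    | pid ->
        ignore (Unix.waitpid [] pid);
        (* the child's writes are visible because the mapping is shared *)
        assert (a.{3} = 6.0))
```

Compile against the unix library, e.g. `ocamlfind ocamlopt -package unix -linkpkg`.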

- Do not forget to do a Gc.compact before forking, to avoid collecting the same unreachable data in each fork.

- For sharing complex data, you can marshal into a shared Bigarray.

If the speed of Marshal becomes a bottleneck, a specialized marshaller that skips most of the checking and the byte-oriented, compact serialization that extern.c currently does could speed
things up.

- A means for inter-process synchronization/communication is still needed.  A userland solution using a shared memory consensus algorithm (which would probably require some
C or assembly for atomic operations) could be cheap.
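The marshal-into-a-Bigarray idea above can be sketched as follows; the helper names are mine, and a real version would write into an mmap'ed shared region rather than a freshly allocated one:

```ocaml
(* Serialize an arbitrary OCaml value into a char Bigarray and back.
   In a shared-memory setting, Array1.create below would be replaced
   by a region obtained from shm_open/mmap. *)
let marshal_to_bigarray v =
  let b = Marshal.to_bytes v [] in
  let n = Bytes.length b in
  let a = Bigarray.Array1.create Bigarray.char Bigarray.c_layout n in
  for i = 0 to n - 1 do a.{i} <- Bytes.get b i done;
  a

let unmarshal_from_bigarray a =
  let n = Bigarray.Array1.dim a in
  let b = Bytes.create n in
  for i = 0 to n - 1 do Bytes.set b i a.{i} done;
  Marshal.from_bytes b 0

let () =
  let v = [ "shared"; "state" ] in
  assert (unmarshal_from_bigarray (marshal_to_bigarray v) = v)
```

The byte-by-byte copy is exactly the kind of overhead a specialized marshaller writing directly into the shared region would avoid.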
--
Berke
