Date: Fri, 9 May 2008 23:13:26 +0200
From: Berke Durak
To: Till Varoquaux
Subject: Re: [Caml-list] Re: Why OCaml rocks
Cc: caml-list@yquem.inria.fr

On Fri, May 9, 2008 at 11:00 PM, Till Varoquaux <till.varoquaux@gmail.com> wrote:
> First of all, let's try to stop the squabbling and have some actual
> discussions with actual content (trolling is very tempting and I am
> the first to fall for it). OCaml is extremely nice but not perfect.
> Other languages have other tradeoffs, and the INRIA is not here to
> fulfill all our desires.


On Fri, May 9, 2008 at 9:40 PM, Gerd Stolpmann <info@gerd-stolpmann.de> wrote:
>
> Am Freitag, den 09.05.2008, 19:10 +0100 schrieb Jon Harrop:
>> On Friday 09 May 2008 12:12:00 Gerd Stolpmann wrote:
>> > I think the parallelism capabilities are already excellent. We have been
>> > able to implement the application backend of Wink's people search in
>> > O'Caml, and it is of course a highly parallel system of programs. This
>> > is not the same class raytracers or desktop parallelism fall into - this
>> > is highly professional supercomputing. I'm talking about a cluster of
>> > ~20 computers with something like 60 CPUs.
>> >
>> > Of course, we did not use multithreading very much. We are relying on
>> > multi-processing (both "fork"ed style and separately started programs),
>> > and multiplexing (i.e. application-driven micro-threading). I especially
>> > like the latter: Doing multiplexing in O'Caml is fun, and a substitute
>> > for most applications of multithreading. For example, you want to query
>> > multiple remote servers in parallel: Very easy with multiplexing,
>> > whereas the multithreaded counterpart would quickly run into scalability
>> > problems (threads are heavy-weight, and need a lot of resources).
>>
>> If OCaml is good for concurrency on distributed systems that is great but it
>> is completely different to CPU-bound parallelism on multicores.
>
> You sound like somebody who tries to sell hardware :-)
>
> Well, our algorithms are quite easy to parallelize. I don't see a
> difference in whether they are CPU-bound or disk-bound - we also have
> lots of CPU-bound stuff, and the parallelization strategies are the
> same.
>
> The important thing is whether the algorithm can be formulated in a way
> so that state mutations are rare, or can at least be done in a
> "cache-friendly" way. Such algorithms exist for a lot of problems. I
> don't know which problems you want to solve, but it sounds as if they
> were special problems. Like for most industries, most of our problems
> are simply "do the same for N objects" where N is very large, and
> sometimes "sort data", also for large N.
>
>> > In our case, the mutable data structures that count are on disk.
>> > Everything else is only temporary state.
>>
>> Exactly. That is a completely different kettle of fish to writing high
>> performance numerical codes for scientific computing.
>
> I don't understand. Relying on disk for sharing state is a big problem
> for us, but unavoidable. Disk is slow memory with a very special timing.
> Experience shows that even accessing state over the network is cheaper
> than over disk. Often, we end up designing our algorithms around the
> disk access characteristics. Compared to that the access to RAM-backed
> state over network is fast and easy.
>
> shm_open shares memory through file descriptors and, under
> linux/glibc, this is done using /dev/shm. You can mmap this as a bigarray
> and, voila, shared memory. This is quite nice for numerical
> computation, plus you get closures etc. in your forks. Oh, and COW on
> modern OSes makes this very cheap.

Yes, that's the kind of approach I like.
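A minimal sketch of that approach, using Unix.map_file from the modern standard library (the 2008-era equivalent was Bigarray.Array1.map_file); the file name, sizes, and helper name are illustrative, and on Linux a file under /dev/shm would avoid touching the disk:

```ocaml
(* Map an n-element float64 array from a file so that parent and
   forked children share the same pages (a temp file is used here
   for portability). *)
let with_shared_array n f =
  let path = Filename.temp_file "shm" ".dat" in
  let fd = Unix.openfile path [ Unix.O_RDWR ] 0o600 in
  let a =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [| n |])
  in
  Unix.close fd;           (* the mapping outlives the descriptor *)
  Sys.remove path;         (* ... and the unlink, on POSIX systems *)
  f a

let () =
  with_shared_array 4 (fun a ->
    match Unix.fork () with
    | 0 ->
        (* child: fill the shared array, then exit *)
        for i = 0 to 3 do a.{i} <- 2.0 *. float_of_int i done;
        exit 0
    | pid ->
        ignore (Unix.waitpid [] pid);
        (* the child's writes are visible because the mapping is shared *)
        assert (a.{3} = 6.0))
```

Compile against the unix library, e.g. `ocamlfind ocamlopt -package unix -linkpkg`.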

- Do not forget to do a Gc.compact before forking, to avoid collecting the same unreachable data in each fork.

- For sharing complex data, you can marshal into a shared Bigarray.

If the speed of Marshal becomes a bottleneck, a specialized marshaller that skips most of the checking and the byte-oriented, compact serialization that extern.c currently does could speed
things up.

- A means for inter-process synchronization/communication is still needed.  A userland solution using a shared memory consensus algorithm (which would probably require some
C or assembly for atomic operations) could be cheap.
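The marshal-into-a-Bigarray idea above can be sketched as follows; the helper names are mine, and a real version would write into an mmap'ed shared region rather than a freshly allocated one:

```ocaml
(* Serialize an arbitrary OCaml value into a char Bigarray and back.
   In a shared-memory setting, Array1.create below would be replaced
   by a region obtained from shm_open/mmap. *)
let marshal_to_bigarray v =
  let b = Marshal.to_bytes v [] in
  let n = Bytes.length b in
  let a = Bigarray.Array1.create Bigarray.char Bigarray.c_layout n in
  for i = 0 to n - 1 do a.{i} <- Bytes.get b i done;
  a

let unmarshal_from_bigarray a =
  let n = Bigarray.Array1.dim a in
  let b = Bytes.create n in
  for i = 0 to n - 1 do Bytes.set b i a.{i} done;
  Marshal.from_bytes b 0

let () =
  let v = [ "shared"; "state" ] in
  assert (unmarshal_from_bigarray (marshal_to_bigarray v) = v)
```

The byte-by-byte copy is exactly the kind of overhead a specialized marshaller writing directly into the shared region would avoid.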
--
Berke
