From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mail3-relais-sop.national.inria.fr (mail3-relais-sop.national.inria.fr [192.134.164.104])
	by walapai.inria.fr (8.13.6/8.13.6) with ESMTP id p3K80JRt009493
	for <caml-list@sympa-roc.inria.fr>; Wed, 20 Apr 2011 10:00:19 +0200
X-IronPort-Anti-Spam-Filtered: true
X-IronPort-Anti-Spam-Result: AlYBANyRrk3VhjEVmWdsb2JhbACEUJJwjXwUAQEBAQEICwsHFCWIb60QkSqBKYNOegSSIQc
X-IronPort-AV: E=Sophos;i="4.64,245,1301868000"; 
   d="scan'208";a="81398016"
Received: from ihsmtp01cons.lis.interhost.com ([213.134.49.21])
  by mail3-smtp-sop.national.inria.fr with ESMTP; 20 Apr 2011 10:00:13 +0200
Received: from [192.168.1.64] ([178.166.9.223]) by ihsmtp01cons.lis.interhost.com with Microsoft SMTPSVC(6.0.3790.3959);
	 Wed, 20 Apr 2011 09:00:03 +0100
Message-ID: <4DAE9278.4050701@inescporto.pt>
Date: Wed, 20 Apr 2011 08:59:52 +0100
From: Hugo Ferreira <hmf@inescporto.pt>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.14) Gecko/20110223 Thunderbird/3.1.8
MIME-Version: 1.0
To: Gerd Stolpmann <info@gerd-stolpmann.de>
CC: Eray Ozkural <examachine@gmail.com>,
        Martin Jambon <martin.jambon@ens-lyon.org>, caml-list@inria.fr
References: <2054357367.219171.1300974318806.JavaMail.root@zmbs4.inria.fr>	 <4D8BD02D.1010505@inria.fr> <4D8C73C8.6020801@inescporto.pt>	 <1301055903.8429.314.camel@thinkpad>	 <341494683.237537.1301057887481.JavaMail.root@zmbs4.inria.fr>	 <4D8C944A.9060601@inria.fr> <4D8CB859.9040709@inescporto.pt>	 <4D8CDDCC.4010000@ens-lyon.org> <4D8CEAA4.2030403@inescporto.pt>	 <BANLkTikKSVLZcffTn6Ku8eZUyDdDT8cykA@mail.gmail.com> <1303244809.8429.1272.camel@thinkpad>
In-Reply-To: <1303244809.8429.1272.camel@thinkpad>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 20 Apr 2011 08:00:03.0678 (UTC) FILETIME=[F48FF7E0:01CBFF30]
Subject: Re: [Caml-list] Efficient OCaml multicore -- roadmap?

On 04/19/2011 09:26 PM, Gerd Stolpmann wrote:
> Am Dienstag, den 19.04.2011, 12:57 +0300 schrieb Eray Ozkural:
>>
>>
>> On Fri, Mar 25, 2011 at 9:19 PM, Hugo Ferreira<hmf@inescporto.pt>
>> wrote:
>>          On 03/25/2011 06:24 PM, Martin Jambon wrote:
>>                          On 03/25/2011 01:10 PM, Fabrice Le Fessant
>>                          wrote:
>>                                    Of course, sharing structured
>>                                  mutable data between threads will not
>>                                  be
>>                                  possible, but actually, it is a good
>>                                  thing if you want to write correct
>>                                  programs ;-)
>>
>>                  On 03/25/11 08:44, Hugo Ferreira replied:
>>                          I'll stick to my guns here. It simply makes
>>                          solving certain problem
>>                          unfeasible. Point in case: I work on machine
>>                          learning algorithms. I
>>                          use large data-structures that must be
>>                          processed (altered)
>>                          in order to learn. Because these
>>                          data-structures are large it become
>>                          impractical to copy this to a process every
>>                          time I start off a new
>>                          "thread".
>>
>>                  The solution would be to use get/set via a
>>                  message-passing interface.
>>
>>
>>
>>          Cannot see how this works. Say I want to share a balanced
>>          binary tree.
>>          Several processes/threads each take this tree and alter it by
>>          adding and
>>          deleting elements. Each (new) tree is then further processed
>>          by other
>>          processes/threads.
>>
>>          How can get/set be used in this scenario?
>>
>>
>>
>>
>> I think it won't have good performance and it won't scale, and it will
>> fail for truly delicate shared memory architectures of the future with
>> thousands of cores....
>>
>>
>> And neither will it support on-chip message passing facilities of
>> those future processors.
>>
>>
>> The shared memory message passing never worked too well, anyway, too
>> many redundant copies. Not fitting for high performance computing.
>>
>>
>> No need at all except for embarrassingly parallel applications. I
>> suppose that's the target, right?
>
> I did experiment a bit in the meantime with Netmulticore [1], my
> implementation of multi-processing with shared memory. Netmulticore
> provides both alterable data structures and a quite efficient message
> passing interface. The experience so far: Message passing wins if you do
> it the right way. Avoid ping-pong games like get/set - shared memory is
> far better for this kind of operation. The better topology for message
> passing are pipelines where data flows only into one direction.
>
> I don't know how this translates to Hugo's machine learning problem. I
> could imagine a shared data structure is good here for providing the
> starting point for learning. If you run several learning steps in
> parallel, you want to avoid that these steps lock out each other, i.e.
> try to ensure that they affect distinct parts of the matrix. These
> updates would be sent (using message passing) to an update manager,
> which would apply the learning results to the matrix, and compute the
> next version, for which a number of new learning steps would be started.
> I'm just guessing here how it could be done. In my imagination a clever
> combination of both (alterable) shared memory and message passing is the
> way to go.
>

Agreed. Currently I am using messaging via sexplib to send data-sets to 
slaves for processing and returns results to a master process the same
way. But my objective is to share data structures and return partial
results to a master process. Returning results requires messaging.

> Hugo, I'd like to do some more experiments into this direction. Is there
> a simple version of machine learning algorithm I could try to
> parallelize?
>

Unfortunately not. The "system" currently distributes the experiments
to 8 machines with 8 CPU's each connected in a cluster. The algorithms
themselves, used in each experiment, do _not_ use parallel processing. I
am now working on this (deal with issue of noisy data). My last
attempt tried to use the Ancient module but failed because I needed to
alter complex data-structures.

Once I get the basic algorithm down, I will try to use parallel 
processing again. Lot of work until then 8-(.

Note: a quick look at Netmulticore seems to indicate that I will still 
have to jump some hoops.

Hugo

> Gerd
>
> [1] Netmulticore: now available in ocamlnet-3.3.0test1:
> http://blog.camlcity.org/blog/multicore2.html
>
>>
>>
>> Best,
>>
>> --
>> Eray Ozkural, PhD candidate.  Comp. Sci. Dept., Bilkent University,
>> Ankara
>> http://groups.yahoo.com/group/ai-philosophy
>> http://myspace.com/arizanesil http://myspace.com/malfunct
>>
>
>