caml-list - the Caml user's mailing list
From: Gerd Stolpmann <gerd@gerd-stolpmann.de>
To: Hugo Ferreira <hmf@inescporto.pt>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] Shared memory parallel application: kernel threads
Date: Fri, 12 Mar 2010 13:34:28 +0100
Message-ID: <1268397268.17070.242.camel@thinkpad>
In-Reply-To: <4B9A2BCB.3040607@inescporto.pt>

On Fri, 2010-03-12 at 11:55 +0000, Hugo Ferreira wrote:
> Hello,
> 
> I need to implement (meta) heuristic algorithms that
> use parallelism in order to (attempt to) solve a (hard)
> machine learning problem that is inherently exponential.
> The aim is to take maximum advantage of the multi-core
> processors I have access to.
> 
> To that effect I have revisited many of the lively
> discussions in threads related to concurrency, parallelism
> and shared memory in this mailing list. I however still
> have many doubts, some of which are very basic.
> 
> My initial objective is to make a very simple test that
> launches k jobs. Each of these jobs must access
> a common data set that is read-only. Each of the k threads
> in turn generates its own data. The data generated by the k
> jobs are then placed in a queue for further processing.
> 
> The process continues by launching (or reusing) k/2 jobs.
> Each job consumes two elements from the queue that were
> previously generated (the common data set must still be
> available). The process repeats itself until k=1. Note
> that the queued data is not small nor can I determine
> a fixed maximum size for it.
> 
> I have opted to use "kernel-level threads" that allow use
> of the (multi-core) processors but still allow "easy"
> access to shared memory.
> 
> I have done a cursory look at:
> - Ocaml.Threads
> - Ocaml.Unix (LinuxThreads)
> - coThreads
> - Ocamlnet2/3 (netshm, netcamlbox)
> (An eThreads library exists in the forge but I did not examine this)
> 
> My first concern is to take advantage of the multi-cores so:
> 
> 1. The thread library is not the answer
>     Chapter 24 - "The threads library is implemented by time-sharing on
>     a single processor. It will not take advantage of multi-processor
>     machines." [1]
> 
> 2. LinuxThreads seems to be what I need
>     "The main strength of this approach is that it can take full
>      advantage of multiprocessors." [2]

I think you are mixing up several things here. LinuxThreads has nothing
to do with Ocaml. It is an implementation of kernel threads for Linux at
the C level. It is considered outdated today and has usually been
replaced by a better implementation (NPTL) that conforms more strictly
to the POSIX standard.

For its multi-threading implementation, Ocaml uses the multi-threading
API that the OS provides. This might be LinuxThreads, NPTL, or something
else. So, in the lower half of the implementation, the threads are
kernel threads and multi-core-enabled. However, Ocaml prevents more than
one of these kernel threads from running inside its runtime at any time.
So Ocaml code will always run on only one core (but you can call C code,
which can then take full advantage of multiple cores).
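
To make this concrete, here is a minimal sketch of launching jobs with
the Thread module. The worker function and the results array are made
up for illustration. The program is correct and each thread is backed
by a kernel thread, but because of the runtime restriction described
above, the Ocaml parts of the workers take turns rather than running
on several cores at once.

```ocaml
(* Build with the threads library, for example:
   ocamlfind ocamlopt -package threads.posix -linkpkg threads_demo.ml *)

(* Hypothetical result slots, one per worker. *)
let results = Array.make 4 0

(* Hypothetical job: worker i computes (i + 1) * (1 + 2 + ... + 1000). *)
let worker i =
  let acc = ref 0 in
  for n = 1 to 1000 do
    acc := !acc + n
  done;
  results.(i) <- (i + 1) * !acc

let () =
  (* Thread.create : ('a -> 'b) -> 'a -> Thread.t *)
  let threads = List.init 4 (fun i -> Thread.create worker i) in
  (* Wait for every worker to finish before reading the results. *)
  List.iter Thread.join threads;
  Array.iteri (Printf.printf "worker %d -> %d\n") results
```

Each worker writes only to its own slot, so no mutex is needed in this
particular sketch.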

This is the primary reason I am going with multi-processing in my
projects, and why Ocamlnet focuses on it.

The Netcamlbox module of Ocamlnet 3 might be interesting for you. Here
is an example program that mass-multiplies matrices on several cores:

https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/examples/camlbox/manymult.ml

Netcamlbox can move complex values to shared memory, so you are not
restricted to bigarrays. The matrix example uses float array array as
representation. Recursive variants should also be fine.

To provide shared data to all workers, you can simply load it into the
master process before the child processes are forked off. Another
option (especially when there is a lot of data and you cannot afford to
have n copies) is to create another camlbox in the master process and
to copy the shared data into it before forking. This avoids having the
data copied at fork time.
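
The first option (load the data in the master, then fork) can be
sketched with the plain Unix library alone. This is not Netcamlbox; it
swaps in a pipe plus Marshal to send each result back, and the data
set, the job, and the helper names spawn, collect and run_round are
all assumptions made for the example.

```ocaml
(* Build with the unix library, for example:
   ocamlfind ocamlopt -package unix -linkpkg fork_demo.ml *)

(* Hypothetical read-only data set, loaded once in the master
   before any child is forked. *)
let shared_data : float array = Array.init 1_000 float_of_int

(* Hypothetical job: worker i of k sums elements i, i+k, i+2k, ... *)
let job ~k i =
  let acc = ref 0.0 in
  let j = ref i in
  while !j < Array.length shared_data do
    acc := !acc +. shared_data.(!j);
    j := !j + k
  done;
  !acc

(* Fork a child that runs [f ()] and marshals the result into a pipe.
   Returns the child's pid and the read end of the pipe. *)
let spawn (f : unit -> 'a) : int * Unix.file_descr =
  flush_all ();  (* avoid duplicating buffered output in the child *)
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
    Unix.close rd;
    let oc = Unix.out_channel_of_descr wr in
    Marshal.to_channel oc (f ()) [];
    close_out oc;
    exit 0
  | pid ->
    Unix.close wr;
    (pid, rd)

(* Read one float result from a child, then reap the child. *)
let collect (pid, rd) : float =
  let ic = Unix.in_channel_of_descr rd in
  let v : float = Marshal.from_channel ic in
  close_in ic;
  ignore (Unix.waitpid [] pid);
  v

(* Launch k children and gather their partial results in order. *)
let run_round k =
  let children = List.init k (fun i -> spawn (fun () -> job ~k i)) in
  List.map collect children

let () =
  let total = List.fold_left ( +. ) 0.0 (run_round 4) in
  Printf.printf "total = %.1f\n" total
```

Marshal copies each result through the pipe, so for large results
this costs more than passing them through shared memory.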

One drawback of Netcamlbox is that it is unsafe, and violating the
programming rules is punished with crashes. (But this also applies, to
some extent, to multi-threading, although the rules are different.)

Gerd

> 
> Issue 1
> 
> In the manual [3] I see only references to functions for the creation
> and use of processes. I see no calls that allow me to simply generate
> and assign a function (job) to a thread (such as val create : ('a -> 'b)
>   -> 'a -> t in the Thread module). The unix library where LinuxThreads
> is now integrated shows the same API. Am I missing something or
> is there no way to launch "threaded functions" from the Unix module?
> Naturally I assume that threads and processes are not the same thing.
> 
> Issue 2
> 
> If I cannot launch kernel-threads to allow for easy memory sharing, what
> other options do I have besides netshm? The data I must share is defined
> by a recursive variant and is not simple numerical data.
> 
> I would appreciate any comments.
> 
> TIA,
> Hugo F.
> 
> 
> [1] http://caml.inria.fr/pub/docs/manual-ocaml/manual038.html
> [2] http://pauillac.inria.fr/~xleroy/linuxthreads/
> [3] http://caml.inria.fr/pub/docs/manual-ocaml/libref/ThreadUnix.html
> [4] http://caml.inria.fr/pub/docs/manual-ocaml/manual035.html


-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------

