caml-list - the Caml user's mailing list
From: Goswin von Brederlow <goswin-v-b@web.de>
To: caml-list@inria.fr
Subject: Re: [Caml-list] Parallelism in newer versions of ocaml
Date: Thu, 16 Jul 2015 11:42:53 +0200
Message-ID: <20150716094252.GA32592@frosties>
In-Reply-To: <CAK7rcp-+-tE+4fB9qe46Kw2aa=ujtDhJwTyodBMScW8mFyD_jQ@mail.gmail.com>

On Thu, Jun 18, 2015 at 03:53:46PM -0400, Kenneth Adam Miller wrote:
> It's my personal gathering that the development of ocaml has always been
> driven with the conviction that things should be done with a mathematical
> foundation that supports doing things well over just getting them done
> sooner. Quality over quantity kind of thing.
> 
> I was wondering what kinds of typing rules or new language constructs or
> otherwise judicious restrictions might be put in place for the facilitation
> of doing concurrency very well in ocaml, or if there are even any. Possibly
> it is just the code that the compiler produces that just now will actually
> resolve a thread creation function call? My thinking is the GC wasn't
> written with concurrent accesses in mind, but at the same time, the pi
> calculus would seem close to home with respect to the ocaml philosophy.
> 
> Can anybody comment on any of this?

I think it comes down to 2 major things:

1) race conditions accessing data structures

Luckily anything immutable is mostly safe from this, so OCaml is in a
better position there than other languages. OCaml would have to insert
memory barriers before modifying any mutable value or passing objects to
other threads. Still, there are a number of data structures in the
standard library that would need thread-safe versions, or the
user has to wrap them with locks (error prone).
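
To make the "wrap them with locks" option concrete, here is a minimal
sketch of a Hashtbl guarded by a mutex. It assumes OCaml 5 or later,
where Domain and Mutex are part of the standard library; the names
locked_table, with_lock, incr_count, find_count and run_demo are made up
for this illustration and are not an existing API.

```ocaml
(* Assumption: OCaml >= 5.0 (Domain and Mutex live in the stdlib there).
   All names below are hypothetical, chosen for this sketch only. *)
type ('k, 'v) locked_table = {
  lock : Mutex.t;
  table : ('k, 'v) Hashtbl.t;
}

let create n = { lock = Mutex.create (); table = Hashtbl.create n }

(* Run [f] on the table while holding the lock. The lock is released
   even if [f] raises. *)
let with_lock t f =
  Mutex.lock t.lock;
  Fun.protect ~finally:(fun () -> Mutex.unlock t.lock) (fun () -> f t.table)

(* The read-modify-write happens under a single lock acquisition.
   Locking the read and the write separately would still lose updates,
   which is one reason hand-written locking is error prone. *)
let incr_count t key =
  with_lock t (fun tbl ->
    let n = Option.value (Hashtbl.find_opt tbl key) ~default:0 in
    Hashtbl.replace tbl key (n + 1))

let find_count t key =
  with_lock t (fun tbl -> Option.value (Hashtbl.find_opt tbl key) ~default:0)

(* Four domains each bump the same key 1000 times. *)
let run_demo () =
  let t = create 16 in
  let worker () = for _i = 1 to 1000 do incr_count t "wheels" done in
  let domains = List.init 4 (fun _ -> Domain.spawn worker) in
  List.iter Domain.join domains;
  find_count t "wheels"
```

Every caller has to remember to go through with_lock; nothing in the
type system stops code from touching the table field directly.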


2) Garbage collection in a shared world

With all threads having access to globals, or when sharing objects
between threads, it becomes much harder to determine what is alive in a
way that doesn't cost more CPU time than you gain from having multiple
cores. I believe that for it to work well, something like the
generational heaps needs to be implemented / inferred per thread.
Temporary data should reside on a thread/core-local heap and be
handled by the local GC alone. Data that gets shared between threads
then needs to go to a shared heap, either by being allocated there
from the start (hard for the compiler to do) or by getting moved there
(costs time to check for it and to move as needed).


It also depends heavily on what kind of multithreading/multiprocessing
you use. In recent years I have switched over more and more to an event
/ message passing approach to multiprocessing. Don't have shared data
at all (if you can avoid it). Instead have the data flow from one
thread to another, where you always pass ownership to the new thread.

Confused? Imagine you want to build a car from lots of parts with lots
of workers. Traditional multithreading would have one workplace where
the car is to be built. Each worker acquires the key to the workspace,
locks himself in and adds a wheel or door or whatever. Then he returns
the key for the next worker. As you can imagine, there will be a lot of
waiting for the key. And the key might get lost. There are lots of
pitfalls in a lock-based approach.

Now think about how a modern car factory works. Instead of moving
the workers around and guarding the car with a key, you move the car
from station to station. Each station implicitly has sole ownership of
the car, adds a tire or door or whatever, and then passes the car on
to the next station. There is no key (no locking), so the key can't get
lost. And it's less likely that you lose a whole car.
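
The factory picture can be sketched in code roughly like this. It is
only an illustration under assumptions: it needs OCaml 5 or later for
Domain, and the chan type with send/recv, the car record, station and
build_cars are all names invented here. The small channel does use a
lock internally, but the cars themselves are never shared: a station
receives a car, builds a new value, and hands it on.

```ocaml
(* Assumption: OCaml >= 5.0. All names are hypothetical. *)

(* A tiny unbounded blocking channel built from Queue, Mutex, Condition. *)
type 'a chan = {
  m : Mutex.t;
  nonempty : Condition.t;
  q : 'a Queue.t;
}

let make_chan () =
  { m = Mutex.create (); nonempty = Condition.create (); q = Queue.create () }

let send c x =
  Mutex.lock c.m;
  Queue.push x c.q;
  Condition.signal c.nonempty;
  Mutex.unlock c.m

(* Block until a message is available, then take it. *)
let recv c =
  Mutex.lock c.m;
  while Queue.is_empty c.q do
    Condition.wait c.nonempty c.m
  done;
  let x = Queue.pop c.q in
  Mutex.unlock c.m;
  x

(* The car is immutable, so handing it on is safe by construction. *)
type car = { id : int; parts : string list }

(* A station adds its part to every car it receives and forwards it.
   None is the end-of-shift marker and is forwarded as well. *)
let station part inbox outbox () =
  let rec loop () =
    match recv inbox with
    | None -> send outbox None
    | Some car ->
      send outbox (Some { car with parts = part :: car.parts });
      loop ()
  in
  loop ()

(* Two stations in a row: wheels first, then doors. *)
let build_cars n =
  let a = make_chan () and b = make_chan () and c = make_chan () in
  let wheels = Domain.spawn (station "wheels" a b) in
  let doors = Domain.spawn (station "doors" b c) in
  for id = 1 to n do send a (Some { id; parts = [] }) done;
  send a None;
  let rec collect acc =
    match recv c with
    | None -> List.rev acc
    | Some car -> collect (car :: acc)
  in
  let cars = collect [] in
  Domain.join wheels;
  Domain.join doors;
  cars
```

Calling build_cars 3 pushes three cars through both stations. Since each
channel has one sender and one receiver, the cars come out in the order
they went in.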

The extra benefit of the event/message passing approach is that you
can easily go from multithreading to multiprocessing to a distributed
system. And hey, multiprocessing would work great for OCaml: with each
process running its own OCaml runtime and its own GC, all the cores of
a CPU can be used. You just need a problem where the communication
between processes isn't the deciding factor. :(

MfG
	Goswin
