caml-list - the Caml user's mailing list
From: Gerd Stolpmann <info@gerd-stolpmann.de>
To: Edgar Friendly <thelema314@gmail.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] SMP multithreading
Date: Tue, 16 Nov 2010 18:04:02 +0100	[thread overview]
Message-ID: <1289927042.16005.176.camel@thinkpad> (raw)
In-Reply-To: <4CE228CA.3030503@gmail.com>

On Monday, 15.11.2010, at 22:46 -0800, Edgar Friendly wrote:
> On 11/15/2010 09:27 AM, Wolfgang Draxinger wrote:
> > Hi,
> >
> > I've just read
> > http://caml.inria.fr/pub/ml-archives/caml-list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html
> > in particular this paragraph:
> > | What about hyperthreading?  Well, I believe it's the last convulsive
> > | movement of SMP's corpse :-)  We'll see how it goes market-wise.  At
> > | any rate, the speedups announced for hyperthreading in the Pentium 4
> > | are below a factor of 1.5; probably not enough to offset the overhead
> > | of making the OCaml runtime system thread-safe.
> >
> > This reads just like "640k ought to be enough for everyone". Multicore
> > systems are the standard today. Even the cheapest consumer machines
> > come with at least two cores. One can easily get 6-core machines today.
> >
> > Still thinking SMP was a niche and was dying?
> >
> > So, what're the developments regarding SMP multithreading OCaml?
> >
> >
> > Cheers
> >
> > Wolfgang
> >
> At the risk of feeding a (possibly unintentional) troll, I'd like to 
> share some possibly new thoughts on this ever-living topic.
> 
> It looks like high-performance computing of the near future will be 
> built out of many machines (message passing), each with many cores 
> (SMP).  One could use message passing for all communication in such a 
> system, but a hybrid approach might be best for this architecture, with 
> use of shared memory within each box and message passing between.  Of 
> course the best choice depends strongly on the particular task.
> 
> In the long run, it'll likely be a combination of a few large, powerful 
> cores (Intel-CPU style w/ the capability to run a single thread as fast 
> as possible) with many many smaller compute engines (GPGPUs or the like, 
> optimized for power and area, closely coupled with memory) that provides 
> the highest performance density.
> 
> The question of how to program such an architecture seems as if it's 
> being answered without the functional community's input. What can we 
> contribute?

Yes, that's generally the right question. Current hardware is a kind of
experiment: vendors have taken the multicore path mainly because it is
currently the easiest way to increase the performance potential,
although it is questionable whether non-server applications can really
benefit from it (server applications are left aside here because
parallelization is relatively easy to achieve for them). Future
hardware will probably differ even more, but it is still unclear which
design paths will be taken. It could be manycores (many CPUs with
non-uniform memory), or it could be specialized compute units. Maybe
we'll again see a separation of the consumer and datacenter markets,
with the former optimizing for numeric simulation workloads (i.e.
games) and the latter for high-throughput data paths and parallel CPU
power. The problem is that all of this is speculation.

There are some things we can do to improve the situation (and some
ideas that are not realistic):

      * A probably not-so-difficult improvement would be better message
        passing between independent but local processes. I've started
        an experiment with such a mechanism, Netcamlbox
        (http://projects.camlcity.org/projects/dl/ocamlnet-3.0.3/doc/html-main/Netcamlbox.html),
        which exploits the fact that GC-managed memory has a well-known
        structure. With more help from the GC this could be made even
        better (safer, fewer corner cases). (A small process-based
        message-passing sketch follows after this list.)
      * We need more frameworks for parallel programming. I'm currently
        developing Plasma, a Map/Reduce framework. Using a framework
        has the big advantage that the whole program is structured so
        that it profits from parallelization, and that developers who
        have no background in parallelization can be trained to use it.
        There are probably more algorithmic schemes for which this is
        possible. (A toy illustration of the Map/Reduce programming
        model follows after this list.)
      * I have a lot of doubts whether FP languages will ever run well
        on SMP machines with a larger number of cores. The problem is
        the relatively high memory allocation rate: the GC has to work
        a lot harder than in imperative languages. The OC4MC project
        uses thread-local minor heaps for exactly this reason. Probably
        this is not enough, and one even needs thread-local major heaps
        (plus a third generation for values accessed by several
        threads). All in all, you could get the same effect by
        instantiating the OCaml runtime several times (if that were
        possible), letting each runtime run in its own thread, and
        providing some extra functionality for passing values between
        threads and for sharing values. This would not be exactly the
        SMP model, but it would allow a number of parallelization
        techniques, and it is probably future-proof since it encourages
        message passing over sharing. This is certainly worth
        experimenting with.
      * One can also tackle the problem from the multi-processing side:
        provide better mechanisms for message passing (see above) and
        for value sharing. (That's probably the path I'll follow for my
        own experiments: no modifications of the runtime, just tricks
        with the OS.)
      * As somebody mentioned "implicit parallelization": don't expect
        anything from this. Even if a good compiler finds ways to
        parallelize 20% of the code (which would already be a lot), the
        runtime effect would be marginal: the other 80% still runs at
        normal speed and dominates the overall runtime. This is
        essentially Amdahl's law (see the small calculation after this
        list). The point is that such compiler-driven improvements are
        only local optimizations. To get good parallelization results
        you need to restructure the design of the program - well, maybe
        compiler 2.0 can do this some day, but that is not in sight.
      * Looking for more "automatic" speedups: I would rather look at
        parallelizing parts of the GC (e.g. a parallel sweep phase),
        but this will probably run into the memory bandwidth limit
        quickly. Maybe using 2 cores for the GC would give an
        improvement, while more cores would gain nothing extra. At
        least it's worth an experiment.
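
To make the message-passing points above more concrete, here is a
minimal sketch of handing an OCaml value from a worker process back to
its parent. It deliberately does not use Netcamlbox (which places
values in shared memory); instead it copies the value through a pipe
with the standard Unix and Marshal modules, and the helper name
run_in_child is mine, purely for illustration:

  (* Minimal sketch: a forked child computes a value and sends it to
     the parent over a pipe via Marshal. The value is copied; a
     shared-memory mechanism like Netcamlbox avoids this copy, which
     this sketch does not attempt. Link with unix.cma. *)
  let run_in_child (f : unit -> 'a) : 'a =
    let (rd, wr) = Unix.pipe () in
    match Unix.fork () with
    | 0 ->
        (* child: compute, marshal the result, and exit *)
        Unix.close rd;
        let oc = Unix.out_channel_of_descr wr in
        Marshal.to_channel oc (f ()) [];
        close_out oc;
        exit 0
    | pid ->
        (* parent: read the marshalled result, then reap the child *)
        Unix.close wr;
        let ic = Unix.in_channel_of_descr rd in
        let result = Marshal.from_channel ic in
        close_in ic;
        ignore (Unix.waitpid [] pid);
        result

  let () =
    let squares =
      run_in_child (fun () -> List.map (fun x -> x * x) [1; 2; 3])
    in
    List.iter (Printf.printf "%d ") squares;
    print_newline ()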
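
For the framework point, the shape of a Map/Reduce program can be shown
in a few lines. This is only a sequential toy that illustrates the
programming model an application has to adopt; it says nothing about
how Plasma itself distributes the work, and all function names below
are made up for the example:

  (* Toy, sequential Map/Reduce: the application only supplies the
     map and reduce functions; distributing the pieces over processes
     or machines is the framework's job. *)
  module StrMap = Map.Make (String)

  (* map: turn one input record into (key, value) pairs *)
  let map_word_count (words : string list) : (string * int) list =
    List.map (fun w -> (w, 1)) words

  (* reduce: combine all values that share the same key *)
  let reduce_word_count (_word : string) (counts : int list) : int =
    List.fold_left ( + ) 0 counts

  let map_reduce map reduce input =
    let add_pair acc (k, v) =
      let old = try StrMap.find k acc with Not_found -> [] in
      StrMap.add k (v :: old) acc
    in
    let grouped =
      List.fold_left
        (fun acc record -> List.fold_left add_pair acc (map record))
        StrMap.empty input
    in
    StrMap.mapi reduce grouped

  let () =
    let result =
      map_reduce map_word_count reduce_word_count
        [ ["to"; "be"; "or"; "not"; "to"; "be"]; ["to"; "do"; "is"] ]
    in
    StrMap.iter (Printf.printf "%s: %d\n") result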
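
And to back up the remark about implicit parallelization: Amdahl's law
says that if a fraction p of the runtime is parallelized over n cores,
the overall speedup is at most 1 / ((1 - p) + p / n). For p = 0.2 this
is about 1.18 on 4 cores and can never exceed 1.25, however many cores
are used (amdahl_speedup is just a name I chose for the calculation):

  (* Upper bound on the speedup when a fraction p of the runtime is
     parallelized over n cores (Amdahl's law). *)
  let amdahl_speedup ~p ~n = 1.0 /. ((1.0 -. p) +. p /. float_of_int n)

  let () =
    List.iter
      (fun n ->
         Printf.printf "%4d cores: %.2fx\n" n (amdahl_speedup ~p:0.2 ~n))
      [2; 4; 8; 1000]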

Gerd


> 
> E.
> 


-- 
------------------------------------------------------------
Gerd Stolpmann, Bad Nauheimer Str.3, 64289 Darmstadt,Germany 
gerd@gerd-stolpmann.de          http://www.gerd-stolpmann.de
Phone: +49-6151-153855                  Fax: +49-6151-997714
------------------------------------------------------------

