Re: [Caml-list] JIT & HLVM, LLVM - Mikkel Fahnøe Jørgensen

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

From: "Mikkel Fahnøe Jørgensen" <mikkel@dvide.com>
To: Jon Harrop <jon@ffconsultancy.com>
Cc: caml-list@yquem.inria.fr
Subject: Re: [Caml-list] JIT & HLVM, LLVM
Date: Wed, 30 Sep 2009 21:35:34 +0200	[thread overview]
Message-ID: <caee5ad80909301235x7317beb4x424df798d48c2a19@mail.gmail.com> (raw)
In-Reply-To: <200909301622.57394.jon@ffconsultancy.com>

> Using one work stealing deque per core is much more efficient than the work
> sharing queue you described for two reasons:
>
> 1. Less global syncronization.
>
> 2. Subtasks are likely to be executed on the core that spawned them, which
> improves cache efficiency.

You could also create larger task chunks or schedule multiple tasks on
a serial queue to obtain some benefits of cache coherency. I'm not
saying that this is a better way than the one suggest.

Another issue is latency vs. throughput.
If you have a lot of short transactions like in a trading system, you
want things done now, not more efficiently in a while, then you'd
rather add more cores to the system and wasting some.

> The parallel "for" loop you described is similar to the TPL's and leaves a lot
> to be desired. Specifically, you want to execute clusters of consecutive
> tasks on the same core for two reasons:

umm yes - it just a convenience for getting things done I suppose, but
again, you could split the loop into nested loops to increase
granularity. But in relation to my other recent post, there are also
map-reduce for addressing these issues which work well across network
nodes.

> The former is easily solved with the Cilk/TPL design by using recursive
> subdivision instead of the index sharing scheme you described because
> subtasks are likely to be executed on the same core.

I also wondered why GCD insisted on starting indices in order and even
waste time syncing on the index counter since there is no guarantee of
execution order, other than than start time. I guess it would be easy
to modify GCD with a different scheduler - it is about 10 lines of
code in the library, and you could set a property on a queue to
suggest a preferred scheduler. However, I think the benefit of the
current approach is exactly to ensure that as many early indices as
possible run in parallel - which you might want for low latency.

Another issue that concerns me more is how serial queues when more
work is added to the queue. If the tasks keeps eating on the serial
queue, it might starve other more important work, unless the serial
queue takes a break - of course the thread is preempted, but it will
not give priority to older concurrent tasks still waiting to be pulled
from the global concurrent queue. Again I think this is easily fixed,
and I might have not understood this fully.

For high performance computing of many similar computations like pixel
processing, I think one should also have a look at OpenCL, and
possibly some map reduce top layer that can distribute work to
multiple nodes - such a layer could be built with GCD without relying
heavily on GCD for anything other than coordination.

Again, here latency is important - the sooner you can ship off work to
OpenCL (which feeds work to graphic coprocessors) the more work gets
done rather than having work lining up on the same core to ensure GCD
maximizes its own cpu cache.

This also applies to network communication: if you can send data
sooner, the other end can start computing earlier, and especially if
risk delays.

So there is an argument both ways - I think GCD is more intended for
systems programming than number crunching (or raytracing :-)

I think one should also remember that GCD is a programming
abstraction, and that there are many ways to implement it, and as time
progress some changes could be made.

next prev parent reply	other threads:[~2009-09-30 19:56 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-09-27 17:18 David McClain
2009-09-27 17:22 ` [Caml-list] " Vincent Aravantinos
2009-09-27 17:35   ` Edgar Friendly
2009-09-27 18:51     ` Jon Harrop
2009-09-27 19:07       ` Edgar Friendly
2009-09-27 19:23         ` kcheung
2009-09-27 19:33           ` Jon Harrop
2009-09-27 21:10           ` Jon Harrop
2009-09-27 21:45             ` Vincent Aravantinos
2009-09-30  1:08             ` Mikkel Fahnøe Jørgensen
2009-09-30 11:57               ` Stéphane Glondu
2009-09-30 12:54                 ` Mikkel Fahnøe Jørgensen
2009-09-30 13:42                   ` Stéphane Glondu
2009-09-30 19:11                     ` Mikkel Fahnøe Jørgensen
2009-09-30 15:22               ` Jon Harrop
2009-09-30 19:35                 ` Mikkel Fahnøe Jørgensen [this message]
2009-09-28 12:25     ` Gerd Stolpmann
     [not found] <20090927214558.D40C5BC5C@yquem.inria.fr>
2009-09-27 21:59 ` CUOQ Pascal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=caee5ad80909301235x7317beb4x424df798d48c2a19@mail.gmail.com \
    --to=mikkel@dvide.com \
    --cc=caml-list@yquem.inria.fr \
    --cc=jon@ffconsultancy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).