On Thu, Jul 31, 2008 at 7:01 AM, erik quanstrom <quanstro@quanstro.net> wrote:
>> I've been writing a lot of Erlang code lately, and I keep thinking about,
>> but not having too much time to do much about, wanting to have a runtime for
>> the libthread "threads" that could auto-schedule them to libthread "procs",
>> in much the same way Haskell "sparks" may end up real threads, or Erlang
>> processes, might run in parallel.

> the model is that there may be any number of procs sharing memory,
> channels, etc.  each proc has at least one thread.  threads have their
> own stacks and run one at a time.
>
> since threads run one at a time and have a few, well-known calls that
> implicitly schedule, one often needs no locking.  this is like the big
> kernel lock in linux.  and so in general converting threadcreate to
> proccreate will break programs which rely on the implicit mutex
> between threads to keep memory accesses from overlapping.

Channels are implicitly locked, though, right?  So they can be used as a safe communication mechanism between proccreate'd contexts and threadcreate'd contexts.

Therefore, if you're not sharing memory between threads/procs, the model holds up just fine for the CSP-style concurrency it was intended to implement.  Basically "don't share" and "just do message passing".

Obviously it also depends on the C calls involved: some are re-entrant and some are not, and that makes a rather large difference too.  You don't want to call strtok from two different threads (unless you have thread-local storage available, in which case it might be safe).

> my personal and uninformed opinion is that it's better to be explicit
> about resource sharing and just lock critical sections—or better yet,
> don't overlap data use.  use channels to transfer data.  if there's no
> overlapping data access then proccreate and threadcreate may usually
> be interchangeable.

This is exactly what I'm referring to.

> another personal opinion is that parallelism can be a
> "performance hack".  however, when the speed differences
> between various resources (e.g. disk drive seek vs anything else)
> are great enough, the difference between working and broken
> can be parallelism.

I think if you take the definition of concurrent processes (any set of processes with no defined sequence or ordering to their execution), have them cooperate to produce a solution, AND allow them to run at the same time, you're talking about a form of parallelism.  For many people, designing code that's concurrent is enough to express the solution in a sane way, without tracking mutexes and locks and shared resources, and parallelism is just "icing".

For some, "pure parallelism" is the end goal.  And there's often a very different approach to getting that (OpenMP; Parallel Haskell vs. Concurrent Haskell; Erlang running with SMP enabled).


Dave

> - erik