caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
From: Xavier Leroy <xavier.leroy@inria.fr>
To: Lauri Alanko <la@iki.fi>
Cc: caml-list@inria.fr
Subject: Re: [Caml-list] Why systhreads?
Date: Mon, 25 Nov 2002 11:01:33 +0100	[thread overview]
Message-ID: <20021125110133.A12077@pauillac.inria.fr> (raw)
In-Reply-To: <20021123090806.GA633@la.iki.fi>; from la@iki.fi on Sat, Nov 23, 2002 at 11:08:06AM +0200

It seems that the annual discussion on threads started again.  Allow
me to deliver again my standard lecture on this topic.

Threads have at least three different purposes:

1- Parallelism on shared-memory multiprocessors.
2- Overlapping I/O and computation (while a thread is blocked on a network
   read, other threads may proceed).
3- Supporting the "coroutine" programming style
   (e.g. if a program has a GUI but performs long computations,
    using threads is a nicer way to structure the program than
    trying to wrap the long computation around the GUI event loop).

The goals of OCaml threads are (2) and (3) but not (1) (for reasons
that I'll get into later), with historical emphasis on (2) due to the
MMM (Web browser) and V6 (HTTP proxy) applications.

Pure user-level scheduling, or equivalently control operators (call/cc),
provide (3) but not (2).

To achieve (2) with a user-level scheduler such as OCaml's bytecode
thread library requires all sorts of hacks, such as non-blocking I/O
and select() under Unix, plus wrapping of all I/O operations so that
they call the user-level scheduler in cases where they are about to
block.  (Otherwise, the whole process would block, and not just the
calling thread.)

Not only this is ugly (read the sources of the bytecode thread library
to get an idea) and inefficient, but it interacts very poorly with
external libraries written in C.  For instance, deep inside the C
implementation of gethostbyname(), there are network reads that can
block; there is no way to wrap these with scheduler calls, short of
rewriting gethostbyname() entirely.

To make things worse, non-blocking I/O is done completely differently
under Unix and under Win32.  I'm not even sure Win32 provides enough
support for async I/O to write a real user-level scheduler.

Another issue with user-level threads, at least in native code, is the
handling of the thread stacks, especially if we wish to have thread
stacks that start small and grow on demand.  It can be done, but is
highly processor- and OS-dependent.  (For instance, stack handling on
the IA64 is, ah, peculiar: there are actually two stacks that grow in
opposite directions within the same memory area...)

One aspect of wisdom is to know when not to do something oneself, but
leave it to others.  Scheduling I/O and computation concurrently, and
managing process stacks, is the job of the operating system.  Trying
to do it entirely in a user-mode program is just not reasonable.
(For another reference point, see Java's move away from "green
threads" and towards system threads.)

What about parallelism on SMP machines?  The main issue here is that
the runtime system, and in particular the garbage collector and memory
manager, must be MP-safe.  This means minimizing global state, and
introducing locking around accesses to shared resources.  If done
naively (e.g. locking at each heap allocation), this can be extremely
costly; it also complicates the runtime system a lot.  Finally,
garbage collection can become a limiting factor if it is done in the
"stop the world" fashion (all threads stop during GC); a concurrent GC
avoids this problem, but adds tremendous complexity.

(Of course, all this SMP support stuff slows down the runtime system
even if there is only one processor, which is the case for almost all
our users...)

All this has been done before in the context of Caml: that was
Damien Doligez's Concurrent Caml Light system, in the early 90s.
Indeed, the incremental major GC that we have in OCaml is a
simplification of Damien's concurrent GC.  If you're interested, have
a look at Damien's publications.

Why was Concurrent Caml Light abandoned?  Too complex; too hard to debug
(despite the existence of a machine-checked proof of correctness);
and dubious practical interest.  Shared-memory multiprocessors have
never really "taken off", at least in the general public.  For large
parallel computations, clusters (distributed-memory systems) are the
norm.  For desktop use, monoprocessors are plenty fast.  Even if you
have a 4-processor SMP machine, it isn't clear whether you should
write your program using shared memory or using message passing -- the
latter is slightly more expensive, but scales to clusters...

What about hyperthreading?  Well, I believe it's the last convulsive
movement of SMP's corpse :-)  We'll see how it goes market-wise.  At
any rate, the speedups announced for hyperthreading in the Pentium 4
are below a factor of 1.5; probably not enough to offset the overhead
of making the OCaml runtime system thread-safe.  

In summary: there is no SMP support in OCaml, and it is very very
unlikely that there will ever be.  If you're into parallelism, better
investigate message-passing interfaces.

- Xavier Leroy
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


  parent reply	other threads:[~2002-11-25 10:01 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2002-11-23  9:08 Lauri Alanko
2002-11-24  7:36 ` Sven Luther
2002-11-24 17:41   ` Chris Hecker
2002-11-24 18:12     ` Basile STARYNKEVITCH
2002-11-24 21:10       ` Christopher Quinn
2002-11-24 17:14 ` Vitaly Lugovsky
2002-11-24 17:18   ` Lauri Alanko
2002-11-24 18:27   ` Dmitry Bely
2002-11-24 23:14     ` Vitaly Lugovsky
2002-11-27 14:33       ` Tim Freeman
2002-11-29 13:25         ` Vitaly Lugovsky
2002-11-25 10:01 ` Xavier Leroy [this message]
2002-11-25 14:20   ` Markus Mottl
2002-11-25 19:01   ` Blair Zajac
2002-11-25 21:06     ` james woodyatt
2002-11-25 22:20       ` Chris Hecker
2002-11-26  6:49         ` Sven Luther
2002-11-27 13:12         ` Damien Doligez
2002-11-27 18:04           ` Chris Hecker
2002-11-27 21:04             ` Gerd Stolpmann
2002-11-27 21:45               ` [Caml-list] Calling ocaml from external threads Quetzalcoatl Bradley
2002-11-26  9:02     ` [Caml-list] Why systhreads? Xavier Leroy
2002-11-26  9:29       ` Sven Luther
2002-11-26  9:34         ` Xavier Leroy
2002-11-26  9:39           ` Sven Luther
2002-11-26 18:42       ` Chris Hecker
2002-11-26 19:04   ` Dave Berry
2002-11-27  0:07   ` Lauri Alanko
2002-11-26 19:23 Gregory Morrisett

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20021125110133.A12077@pauillac.inria.fr \
    --to=xavier.leroy@inria.fr \
    --cc=caml-list@inria.fr \
    --cc=la@iki.fi \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).