From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: simonpj@microsoft.com
Received: from concorde.inria.fr (concorde.inria.fr [192.93.2.39]) by pauillac.inria.fr (8.7.6/8.7.3) with ESMTP id JAA13542 for <caml-redistribution@pauillac.inria.fr>; Tue, 7 Mar 2000 09:31:56 +0100 (MET)
Received: from mail4.microsoft.com (mail4.microsoft.com [131.107.3.122])
	by concorde.inria.fr (8.8.7/8.8.7) with SMTP id JAA12073
	for <caml-redistribution@pauillac.inria.fr>; Tue, 7 Mar 2000 09:31:49 +0100 (MET)
Received: from 157.54.9.103 by mail4.microsoft.com (InterScan E-Mail VirusWall NT); Tue, 07 Mar 2000 00:30:38 -0800 (Pacific Standard Time)
Received: by INET-IMC-04 with Internet Mail Service (5.5.2651.58)
	id <GF5YM9X2>; Tue, 7 Mar 2000 00:30:38 -0800
Message-ID: <C58E83D08FE21041A2E65B0F4051B9DB5B7C25@RED-MSG-19.redmond.corp.microsoft.com>
From: Simon Peyton-Jones <simonpj@microsoft.com>
To: "'Xavier Leroy'" <Xavier.Leroy@inria.fr>,
        caml-redistribution@pauillac.inria.fr
Cc: "Norman Ramsey (E-mail)" <nr@deas.harvard.edu>
Subject: RE: Interpreter vs hardware threads
Date: Tue, 7 Mar 2000 00:30:29 -0800 
X-Mailer: Internet Mail Service (5.5.2651.58)


| The sad fact is that there doesn't exist any implementation technique
| for threads that satisfy both viewpoints at once.  Very lightweight
| threads do exist, see e.g. the call/cc-based threads of SML/NJ, but
| entail significant performance penalties not only on I/O, but also on
| the actual running speed of the sequential code.  "Heavyweight"
| threads such as LinuxThreads or Win32 threads are very efficient
| w.r.t. I/O, but are expensive to start and to context-switch.

Is that really so, Xavier?  What percentage performance penalty do
you think is involved?  1%?  10%?  Factor of 2?

Very-lightweight threads typically have to do
a stack-overflow check at the start of each function, but apart from that
I don't see that they are necessarily less efficient than heavyweight
threads.
For potentially-blocking I/O operations it is true that there is
some extra work to do, much like a context switch.  The pointers need
to be saved in a safe place in case GC strikes, and a global lock should
be released so that a new heavyweight thread can take over the business
of running the lightweight threads if the I/O blocks.  But none of
this seems really expensive in terms of % of computation time, does it?
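
To make the bookkeeping around a potentially-blocking call concrete, here is
a minimal Python sketch. It is not taken from any of the systems discussed;
scheduler_lock, saved_roots and blocking_call are hypothetical names, and OS
threads from the threading module stand in for the heavyweight threads that
take turns running the lightweight ones.

```python
import threading

# Hypothetical global lock: whoever holds it may run the lightweight threads.
scheduler_lock = threading.Lock()
# Hypothetical "safe place" where a thread's live pointers are recorded.
saved_roots = {}

def blocking_call(thread_id, live_values, io_operation):
    """Run io_operation outside the scheduler lock, then re-enter."""
    saved_roots[thread_id] = live_values  # a GC could find the pointers here
    scheduler_lock.release()              # another OS thread may take over
    try:
        return io_operation()
    finally:
        scheduler_lock.acquire()          # wait for our turn to run again
        del saved_roots[thread_id]

def demo():
    events = []
    entered_io = threading.Event()
    finish_io = threading.Event()

    def slow_io():
        entered_io.set()            # signal that we are now "blocked"
        finish_io.wait(timeout=5)   # stand-in for a blocking read
        return "data"

    def worker():
        with scheduler_lock:
            events.append(blocking_call("t1", ["ptr"], slow_io))

    t = threading.Thread(target=worker)
    t.start()
    entered_io.wait(timeout=5)
    # While the worker is blocked in I/O, this OS thread can take the lock.
    got_lock = scheduler_lock.acquire(timeout=1)
    if got_lock:
        events.append("other thread ran")
        scheduler_lock.release()
    finish_io.set()
    t.join()
    return got_lock, events
```

The per-call cost in this sketch is one dictionary store, one lock release
and one lock acquire, which is the kind of overhead the paragraph above has
in mind.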

Concurrent Haskell takes the very-lightweight approach.  Stacks are
allocated in the heap and can grow arbitrarily.  A thread that is
blocked on a channel that is not reachable is garbage collected
(something that came up earlier in this conversation).
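
The collection of unreachable blocked threads can be imitated in Python,
purely as an analogy and not as a description of how Concurrent Haskell is
implemented. In the sketch below a generator plays the role of a lightweight
thread, and Channel, receiver and blocked_thread_is_collected are
hypothetical names. The check relies on CPython's cycle collector.

```python
import gc
import weakref

class Channel:
    """Hypothetical channel: just a queue of threads blocked on receive."""
    def __init__(self):
        self.waiters = []

def receiver(chan):
    # Ask the scheduler to block us on chan; we resume only if someone sends.
    value = yield ("recv", chan)
    return value

def blocked_thread_is_collected():
    chan = Channel()
    thread = receiver(chan)
    request, target = next(thread)   # runs until the thread asks to block
    target.waiters.append(thread)    # the scheduler parks it on the channel
    probe = weakref.ref(thread)      # lets us observe the collection
    del chan, thread, target         # nobody can reach the channel any more
    gc.collect()                     # channel and parked thread form a cycle
    return probe() is None
```

Because the parked thread is referenced only from the channel it waits on,
dropping the last outside reference to the channel makes both unreachable,
and the collector reclaims them together.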

But I agree that there are compromises.  Notably, a call to a
C procedure that can neither block nor cause GC can be done more
efficiently, so there's some incentive to have two kinds of
call, which is really a pain.  But the biggest compromise is that
it's hard to do preemption on very lightweight threads, especially if
you want to support accurate GC too.

Beyond Concurrent Haskell, I'm working on a specification for
very-lightweight-style concurrency in C--.  The idea is this: what is
the primitive support required from the C-- runtime that lets you
implement lightweight concurrency?  You provide the scheduler,
synchronisation, etc.; C-- only provides the stuff that lets you run
and suspend a computation.
If anyone is interested, a working draft is at
	http://research.microsoft.com/~simonpj/tmp/c--concurrency.ps.gz
I'd be delighted to get feedback.
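
As an illustration of that division of labour (not C--, and not taken from
the draft), here is a minimal Python sketch in which generators supply the
"run and suspend" primitive and the scheduler is ordinary user code. The
names worker and run_round_robin are hypothetical.

```python
from collections import deque

def worker(name, steps, log):
    """A lightweight thread: do one step of work, then suspend."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # suspend; control returns to whoever resumed us

def run_round_robin(threads):
    """User-provided scheduler built only on resume (next) and suspend."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # run the thread until it next suspends
        except StopIteration:
            continue              # thread finished; drop it
        ready.append(thread)      # still alive; back of the queue

log = []
run_round_robin([worker("a", 2, log), worker("b", 3, log)])
# log is now ["a:0", "b:0", "a:1", "b:1", "b:2"]
```

Swapping the deque for a priority queue, or adding channels and locks, needs
no change to the run/suspend primitive itself.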

Simon