caml-list - the Caml user's mailing list
* mixing Ocaml with another GC-ed language
@ 2000-02-25 13:37 STARYNKEVITCH Basile
  2000-02-29 10:24 ` Xavier Leroy
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: STARYNKEVITCH Basile @ 2000-02-25 13:37 UTC (permalink / raw)
  To: caml-list


Hello All,

In a few weeks I will perhaps code in Ocaml inside a static code
analyzer tool written in C++ (don't ask me why... it is a painful
subject [see the thread "convincing management to code in Ocaml" on
this same mailing list, which I started last year]).

It is not really plain C++ because I did code a (precise, mostly
copying, generational) garbage collector which is used in the
project. So Ocaml code will be called from (and will probably upcall
to) C++ code using my GC. So I do know my GC quite well (and studied
Ocaml's GC a bit also). My GC also supports finalized objects, which it
does not move (so their addresses remain fixed).

Does anyone have concrete experience mixing Ocaml with another
GC-ed language (e.g. Java or Common Lisp) inside the same program?

I do have my precise ideas on the problem (essentially, avoid mixing
pointers from both worlds, either by copying data or by using my
finalized C++ GCed objects which are not moved by my GC). But I will
be happy to hear from other people's experience. Any pitfalls to avoid
would be good to hear about.

The custom tag object (introduced in Ocaml3, see the Ocaml CVS
webserver) might also be helpful.

Regards

N.B. Any opinions expressed here are only mine, and not of my organization.
N.B. Les opinions exprimées ici me sont personnelles et n'engagent pas le CEA.

---------------------------------------------------------------------
Basile STARYNKEVITCH   ----  Commissariat à l'Energie Atomique 
DTA/LETI/DEIN/SLA * CEA/Saclay b.528 (p111f) * 91191 GIF/YVETTE CEDEX * France
phone: 1,69.08.60.55; fax: 1.69.08.83.95 home: 1,46.65.45.53
email: Basile point Starynkevitch at cea point fr 



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: mixing Ocaml with another GC-ed language
  2000-02-25 13:37 mixing Ocaml with another GC-ed language STARYNKEVITCH Basile
@ 2000-02-29 10:24 ` Xavier Leroy
  2000-03-01  3:48 ` Nice feature Max Skaller
  2000-03-01  3:52 ` Interpreter vs hardware threads Max Skaller
  2 siblings, 0 replies; 23+ messages in thread
From: Xavier Leroy @ 2000-02-29 10:24 UTC (permalink / raw)
  To: STARYNKEVITCH Basile, caml-list

> It is not really plain C++ because I did code a (precise, mostly
> copying, generational) garbage collector which is used in the
> project. So Ocaml code will be called from (and will probably upcall
> to) C++ code using my GC. So I do know my GC quite well (and studied
> Ocaml's GC a bit also). My GC also supports finalized objects, which it
> does not move (so their addresses remain fixed).
> 
> Does anyone have concrete experience mixing Ocaml with another
> GC-ed language (e.g. Java or Common Lisp) inside the same program?

I can't say I have concrete experience, but I believe the following
should work.

So, we have two garbage-collected languages A and B, each managing its
own heap.  Assume both A and B support (1) finalized objects and
(2) explicit registration of GC roots.

Then, to make an A object available to B, register the A pointer as a
GC root for A (so that A doesn't reclaim it), allocate in B a proxy
block containing the A pointer, and put a finalization function on the
proxy block that un-registers the A pointer with A's GC when the proxy
block is finalized.

In this approach, A objects are viewed from B as an abstract type:
B can't do anything with them except call A functions to operate on
them.  Allowing B to operate directly on A objects (e.g. read and
write an array) is very language-dependent and generally hard; better
go through foreign function calls.

> I do have my precise ideas on the problem (essentially, avoid mixing
> pointers from both worlds, either by copying data or by using my
> finalized C++ GCed objects which are not moved by my GC).

Copying is another option (that's what stubs generated by CamlIDL do,
for instance).  You get the benefit of having a concrete view on the
data structure in both languages.  But copying can be expensive on
large structures, and also loses sharing for mutable objects.

> The custom tag object (introduced in Ocaml3, see the Ocaml CVS
> webserver) might also be helpful.

Right.  It's a generalization of OCaml's finalized objects, allowing
you to attach to a Caml memory block not only a finalization function,
but also an equality function, a hashing function, and serialization /
deserialization functions (called by output_value and input_value).

- Xavier Leroy





* Nice feature
  2000-02-25 13:37 mixing Ocaml with another GC-ed language STARYNKEVITCH Basile
  2000-02-29 10:24 ` Xavier Leroy
@ 2000-03-01  3:48 ` Max Skaller
  2000-03-01  3:52 ` Interpreter vs hardware threads Max Skaller
  2 siblings, 0 replies; 23+ messages in thread
From: Max Skaller @ 2000-03-01  3:48 UTC (permalink / raw)
  Cc: caml-list

Just thought I'd say -- the new error diagnostic produced for missing
cases from matches -- listing the cases that are not tested for -- is
superb.  This is saving me a lot of time. Thanks!

BTW: I'm about to find out about 'convincing management' to use ocaml.
I'm working in a C++ shop (C++ is mandatory for much of the production
code for reasons of both efficiency and binary compatibility),
with the task of writing a code generator -- which will (probably)
generate C++, but will (hopefully) be written in ocaml.

[I tend to push people to breaking point though -- I'm also writing the 
code using my literate programming tool :-]

-- 
John (Max) Skaller at OTT [Open Telecommunications Ltd]
mailto:maxs@in.ot.com.au      -- at work
mailto:skaller@maxtal.com.au  -- at home





* Interpreter vs hardware threads
  2000-02-25 13:37 mixing Ocaml with another GC-ed language STARYNKEVITCH Basile
  2000-02-29 10:24 ` Xavier Leroy
  2000-03-01  3:48 ` Nice feature Max Skaller
@ 2000-03-01  3:52 ` Max Skaller
  2000-03-01 18:55   ` Michael Hicks
  2000-03-01 20:02   ` Stefan Monnier
  2 siblings, 2 replies; 23+ messages in thread
From: Max Skaller @ 2000-03-01  3:52 UTC (permalink / raw)
  Cc: caml-list

Does anyone know how fast the ocaml bytecode interpreter
'switches' between threads (that is, what is the scheduling overhead?)
Assume most of the threads are waiting (say, on a condition variable).
Assume LOTS of threads (thousands).

[Hardware threads are not fast enough]

-- 
John (Max) Skaller at OTT [Open Telecommunications Ltd]
mailto:maxs@in.ot.com.au      -- at work
mailto:skaller@maxtal.com.au  -- at home





* Re: Interpreter vs hardware threads
  2000-03-01  3:52 ` Interpreter vs hardware threads Max Skaller
@ 2000-03-01 18:55   ` Michael Hicks
  2000-03-01 20:02   ` Stefan Monnier
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Hicks @ 2000-03-01 18:55 UTC (permalink / raw)
  To: Max Skaller; +Cc: caml-list


> Does anyone know how fast the ocaml bytecode interpreter
> 'switches' between threads (that is, what is the scheduling overhead?)
> Assume most of the threads are waiting (say, on a condition variable).
> Assume LOTS of threads (thousands).
> 
> [Hardware threads are not fast enough]

Remembering back to the thread scheduler in Ocaml 2.00 (I assume it's the
same one), I recall that the context-switch time directly relates to how many
threads are waiting.  This is because at each context-switch point, every
thread is checked to see if its timeout has expired or it has some I/O ready.  As a
result, performance degrades sharply as you add more threads.  With a small
number of threads (around 10) on a Pentium II 300 MHz machine, we measured
the context switch time to be about 100 microsecs.  See our INFOCOM 99 paper
for more info on the system we were measuring this in (our active network,
PLANet): http://www.cis.upenn.edu/~switchware/papers/infocom99-plan.ps.

Mike

-- 
Michael Hicks
Ph.D. Candidate, the University of Pennsylvania
http://www.cis.upenn.edu/~mwh            mailto://mwh@dsl.cis.upenn.edu
"The pessimist sees difficulty in every opportunity.
The optimist sees the opportunity in every difficulty." -- Winston Churchill





* Re: Interpreter vs hardware threads
  2000-03-01  3:52 ` Interpreter vs hardware threads Max Skaller
  2000-03-01 18:55   ` Michael Hicks
@ 2000-03-01 20:02   ` Stefan Monnier
  2000-03-02 18:18     ` William Chesters
  1 sibling, 1 reply; 23+ messages in thread
From: Stefan Monnier @ 2000-03-01 20:02 UTC (permalink / raw)
  To: caml-list

>>>>> "Max" == Max Skaller <maxs@in.ot.com.au> writes:
> Assume LOTS of threads (thousands).

(insert-file "many-threads-bad-design")

> [Hardware threads are not fast enough]

[ I assume you mean OS threads, since hardware-supported threads
  are not very common. ]

I guess it depends on the OS and the pthreads implementation, but
it seems that they are usually pretty fast (i.e. fast enough that
people more or less stopped trying to make them faster).


        Stefan




* Re: Interpreter vs hardware threads
  2000-03-01 20:02   ` Stefan Monnier
@ 2000-03-02 18:18     ` William Chesters
  2000-03-03 16:59       ` John Max Skaller
  2000-03-06 17:35       ` Xavier Leroy
  0 siblings, 2 replies; 23+ messages in thread
From: William Chesters @ 2000-03-02 18:18 UTC (permalink / raw)
  To: caml-list

Stefan Monnier writes:
 > >>>>> "Max" == Max Skaller <maxs@in.ot.com.au> writes:
 > > Assume LOTS of threads (thousands).
 > 
 > (insert-file "many-threads-bad-design")
 > 
 > > [Hardware threads are not fast enough]
 > 
 > [ I assume you mean OS threads, since hardware-supported threads
 >   are not very common. ]
 > 
 > I guess it depends on the OS and the pthreads implementation, but
 > it seems that they are usually pretty fast (i.e. fast enough that
 > people more or less stopped trying to make them faster).

I think you are being patronising from a position of innocent
ignorance.  Judging by Max's .sig he's doing embedded telecoms boxes,
something like a GSM HLR which juggles many thousands of concurrent
transactions in some ghastly protocol or other.

   Typically the logic of each transaction is encoded in an
unstructured FSM---the same way as the wretched committees "specify"
them---rendered literally in C/C++. That's very fiddly and leads to
bugs, hence loss of revenue on a telephone-number scale, and a
well-founded reluctance to add features.  The convenience of using
threads and a more sensible language would pay for some loss of
performance, but obviously a massive hit for using lots of threads is
unacceptable.

   IIRC, Linux native pthreads, for one, aren't at the moment meant to
be used in huge numbers---the flood-test performance of those Linux
Java VMs which map Java threads onto them supposedly suffers compared
with those that don't.  But Xavier would be able to tell us more about
that :).

   (Of course, one could always ask the ocaml team that won the ICFP
competition in such spectacular fashion to knock off an
ocaml-to-event-driven-FSM compiler ...)

   Max---if you end up using ocaml for embedded work I'd be very
interested to hear about it.




* Re: Interpreter vs hardware threads
  2000-03-02 18:18     ` William Chesters
@ 2000-03-03 16:59       ` John Max Skaller
  2000-03-06 17:35       ` Xavier Leroy
  1 sibling, 0 replies; 23+ messages in thread
From: John Max Skaller @ 2000-03-03 16:59 UTC (permalink / raw)
  To: William Chesters; +Cc: caml-list

William Chesters wrote:
 
>    Max---if you end up using ocaml for embedded work I'd be very
> interested to hear about it.

	I am using it at the moment to build 'code generation' tools  --
for that ocaml is definitely suitable.


-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net




* Re: Interpreter vs hardware threads
  2000-03-02 18:18     ` William Chesters
  2000-03-03 16:59       ` John Max Skaller
@ 2000-03-06 17:35       ` Xavier Leroy
  2000-03-06 23:11         ` John Max Skaller
  2000-03-07 13:19         ` Julian Assange
  1 sibling, 2 replies; 23+ messages in thread
From: Xavier Leroy @ 2000-03-06 17:35 UTC (permalink / raw)
  To: William Chesters, caml-list

>    IIRC, Linux native pthreads, for one, aren't at the moment meant to
> be used in huge numbers---the flood-test performance of those Linux
> Java VMs which map Java threads onto them supposedly suffers compared
> with those that don't.  But Xavier would be able to tell us more about
> that :).

My pleasure :-)  It is true that threads in LinuxThreads are not as
lightweight as some would like, mostly because they map 1-to-1 to
kernel processes, and the Linux kernel limits the number of processes
to 512 (for the super-user) or 256 (for normal users).  Also, kernel
scheduling takes time linear in the number of processes (just like the
OCaml bytecode-thread scheduler, which was strongly inspired by that
of the Linux kernel), so context switching times increase as more
threads are run.

More generally, there are two very different viewpoints on threads
that call for radically different implementations.

The first viewpoint is that threads are a convenient abstraction to
exploit fully some hardware features, such as multiple processors and
overlapping of I/O and computation.  You can exploit multiple
processors and overlap I/O without threads using e.g. communicating
Unix processes and asynchronous I/O.  But it's a pain to program;
threads make the programming of such applications much easier.

In this viewpoint, you never need huge numbers of threads.  For heavy
computation on a multiprocessor, N + 1 or N + 2 threads, where N is
the number of processors, delivers optimal throughput; you're not
going to run any faster with more threads.  For overlapped I/O on
PC-class hardware, 20 threads or so are plenty: if you overlap more
than 20 I/O requests at the same time, the disks and network cards
won't follow, and throughput will not be increased either.

Both LinuxThreads and to a lesser extent Caml threads were designed
with that kind of application in mind.  Caml threads can't exploit
multiprocessors because the runtime system isn't safe
w.r.t. concurrent execution, but they have proven quite effective for
overlapped I/O, e.g. in the V6 intelligent Web proxy.

The other viewpoint is that threads are a code structuring facility,
making it easier to write programs that conceptually perform a lot of
things concurrently, e.g. in response to external stimuli.  In this
viewpoint, threads should be as lightweight as possible (starting a
thread shouldn't be much more expensive than calling a function), and
limited in number only by available memory.

The sad fact is that there doesn't exist any implementation technique
for threads that satisfies both viewpoints at once.  Very lightweight
threads do exist, see e.g. the call/cc-based threads of SML/NJ, but
entail significant performance penalties not only on I/O, but also on
the actual running speed of the sequential code.  "Heavyweight"
threads such as LinuxThreads or Win32 threads are very efficient
w.r.t. I/O, but are expensive to start and to context-switch.
Two-level thread libraries are a compromise that is appealing on
paper, but not that much "lighter" than pure kernel-based threads in
reality.

Add to this the fact that the primitives provided by the underlying OS
affect thread libraries quite a lot.  For instance, some Unixes
provide async I/O notification via signals, and that could be used to
speed up the Caml bytecode-thread scheduler, but not all Unixes
provide them.  Also, if we were certain that the underlying OS
provides native threads, we could build a two-level scheduler for
bytecode threads that would probably be more efficient.  But of course
we have to shoot for the lowest common denominator.

So, don't expect miracles from Caml threads, either bytecode or
system.  As Michael Hicks said, the current bytecode thread scheduler
could be improved to run in time proportional to the number of threads
waiting on I/O, rather than to the total number of threads.  That would help
if most of your threads are blocked on mutexes, conditions or event
channels, but not if most of your threads perform I/O.

>    (Of course, one could always ask the ocaml team that won the ICFP
> competition in such spectacular fashion to knock off an
> ocaml-to-event-driven-FSM compiler ...)

I'm not sure that OCaml is the best input language for generating
FSMs, but, yes, generating FSMs from a high-level concurrent language
is an excellent approach.  You get both ultimate execution speed and a
readable, maintainable description of the program.  Have a look for
instance at Esterel (another glorious INRIA product :-))

- Xavier Leroy




* Re: Interpreter vs hardware threads
  2000-03-06 17:35       ` Xavier Leroy
@ 2000-03-06 23:11         ` John Max Skaller
  2000-03-07 13:19         ` Julian Assange
  1 sibling, 0 replies; 23+ messages in thread
From: John Max Skaller @ 2000-03-06 23:11 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: William Chesters, caml-list

Xavier Leroy wrote:

> I'm not sure that OCaml is the best input language for generating
> FSMs, but, yes, generating FSMs from a high-level concurrent language
> is an excellent approach.  You get both ultimate execution speed and a
> readable, maintainable description of the program.  Have a look for
> instance at Esterel (another glorious INRIA product :-))

	OK. Thanks for the explanation. 'Threads' are typically
delayed waiting for 'any' event, and occasionally, a specific
event (response to a request).

-- 
John (Max) Skaller, mailto:skaller@maxtal.com.au
10/1 Toxteth Rd Glebe NSW 2037 Australia voice: 61-2-9660-0850
checkout Vyper http://Vyper.sourceforge.net
download Interscript http://Interscript.sourceforge.net




* Re: Interpreter vs hardware threads
  2000-03-06 17:35       ` Xavier Leroy
  2000-03-06 23:11         ` John Max Skaller
@ 2000-03-07 13:19         ` Julian Assange
  2000-03-08 20:12           ` Xavier Leroy
  2000-03-08 23:30           ` Max Skaller
  1 sibling, 2 replies; 23+ messages in thread
From: Julian Assange @ 2000-03-07 13:19 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: William Chesters, caml-list, proff

Xavier Leroy <Xavier.Leroy@inria.fr> writes:

> >    IIRC, Linux native pthreads, for one, aren't at the moment meant to
> > be used in huge numbers---the flood-test performance of those Linux
> > Java VMs which map Java threads onto them supposedly suffers compared
> > with those that don't.  But Xavier would be able to tell us more about
> > that :).
> 
> My pleasure :-)  It is true that threads in LinuxThreads are not as
> lightweight as some would like, mostly because they map 1-to-1 to
> kernel processes, and the Linux kernel limits the number of processes
> to 512 (for the super-user) or 256 (for normal users).  Also, kernel
> scheduling takes time linear in the number of processes (just like the
> OCaml bytecode-thread scheduler, which was strongly inspired by that
> of the Linux kernel), so context switching times increase as more
> threads are run.

I recently worked on a project in C that used a FSM to simulate
several thousand concurrent IO threads, as part of a concurrent
version of dig (able to recursively zone transfer all of *.au in under
three minutes). Due to the complicated, but inherently forward-moving
nature of the operations involved (i.e. do this; if there is no error
or timeout, do the next step, and so on for about 30 steps) together
with inter-state resources (i.e. various queues, buffers and so on) this
ended up as an extremely large and unintuitive piece of code. A lot of
the added complexity was a result of having to explicitly deallocate
data and maintain reference counts, and manually follow allocation
dependencies, which in the circumstances of multiple interdependent
event queues and states became extremely tedious.

What is the source of the linearity of thread context switches in
ocaml?  Are there ways to eliminate it? If so I'd be happy to have a
look at doing so.


Cheers,
Julian.

--
Stefan Kahrs in [Kah96] discusses the
   notion of completeness--programs which never go wrong can be
   type-checked--which complements Milner's notion of
   soundness--type-checked programs never go wrong [Mil78].




* Re: Interpreter vs hardware threads
  2000-03-07 13:19         ` Julian Assange
@ 2000-03-08 20:12           ` Xavier Leroy
  2000-03-19  6:10             ` Julian Assange
  2000-03-08 23:30           ` Max Skaller
  1 sibling, 1 reply; 23+ messages in thread
From: Xavier Leroy @ 2000-03-08 20:12 UTC (permalink / raw)
  To: Julian Assange; +Cc: William Chesters, caml-list

> What is the source of the linearity of thread context switches in
> ocaml?

Well, the list of all threads is scanned at each context switch to
find runnable threads, determine whether we need to select(), etc.

> Are there ways to eliminate it? If so I'd be happy to have a
> look at doing so.

It's been on my "to do" list for a while, so feel free to give it a try...

The standard trick is to split the collection of threads into several
queues: a queue of runnable threads and a queue of threads waiting for
I/O or timer operations, for instance.  Keep the latter sorted by
timer expiration date to easily find the timers that have expired.
Also, suspended threads (threads waiting on internal events such as
mutexes and conditions) are not in those two queues, but added back to
the runnable queue when woken up.

This should get the complexity of context switches down to O(N_io),
where N_io is the number of threads waiting on I/O.  You could do
better by using heavyweight threads or signal-based async I/O instead
of select(), but it becomes less portable.
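The two-queue structure described above might look roughly like this; to be clear, all of the names are invented for illustration and this is not the actual OCaml thread scheduler:

```ocaml
(* Sketch of a two-queue scheduler: a queue of runnable threads plus a
   timer list kept sorted by expiration date. *)
type 'a scheduler = {
  runnable : 'a Queue.t;                (* threads ready to run *)
  mutable timers : (float * 'a) list;   (* sorted by wake-up time *)
}

let create () = { runnable = Queue.create (); timers = [] }

(* Insert a sleeping thread, keeping the timer list sorted by deadline. *)
let add_timer s deadline t =
  let rec insert = function
    | (d, _) as hd :: tl when d <= deadline -> hd :: insert tl
    | rest -> (deadline, t) :: rest
  in
  s.timers <- insert s.timers

(* Move every expired timer back to the runnable queue.  Because the list
   is sorted, this costs O(number of expired timers), not O(all threads). *)
let wake_expired s now =
  let rec split = function
    | (d, t) :: tl when d <= now -> Queue.push t s.runnable; split tl
    | rest -> rest
  in
  s.timers <- split s.timers
```

Threads blocked on mutexes or conditions would live in neither structure until woken, exactly as the message describes.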

- Xavier Leroy




* Re: Interpreter vs hardware threads
  2000-03-07 13:19         ` Julian Assange
  2000-03-08 20:12           ` Xavier Leroy
@ 2000-03-08 23:30           ` Max Skaller
  1 sibling, 0 replies; 23+ messages in thread
From: Max Skaller @ 2000-03-08 23:30 UTC (permalink / raw)
  To: Julian Assange; +Cc: Xavier Leroy, William Chesters, caml-list

Julian Assange wrote:

> What is the source of the linearity of thread context switches in
> ocaml?  Are there ways to eliminate it? If so I'd be happy to have a
> look at doing so.

One problem with this is that it only affects the bytecode interpreter.
It would be nice to have lightweight threads available for compiled
ocaml too.

The system I need to implement uses a (heavyweight) event dispatcher
thread and a few (heavyweight) worker threads. The events are
'executed'; in other words, the client code is built as an object with
callbacks.  This is very fast: there's no need to 'check' for a blocked
IO operation and restart the 'thread'. The thread is restarted directly
by the event.

This all works fine, except that it is very hard to code
the logic of a 'logical' thread of control. So I need to
'control invert' a client program, which is written in procedural style,
to hide the ugly event driven implementation.

-- 
John (Max) Skaller at OTT [Open Telecommunications Ltd]
mailto:maxs@in.ot.com.au      -- at work
mailto:skaller@maxtal.com.au  -- at home




* Re: Interpreter vs hardware threads
  2000-03-08 20:12           ` Xavier Leroy
@ 2000-03-19  6:10             ` Julian Assange
  0 siblings, 0 replies; 23+ messages in thread
From: Julian Assange @ 2000-03-19  6:10 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: William Chesters, caml-list, proff


Here <http://www.kegel.com/c10k.html> is an extremely interesting
article/resource list about io/thread scaling implementation.

I'm involved in a linguistic project that needs to make several
thousand concurrent connections to mine the internet for corpora. At the
moment neither ocaml lwt nor heavier pthreads are capable of scaling
well enough to do this. If this problem were solved ocaml would make
a wonderful language for all kinds of massively concurrent io
interaction.

Cheers,
Julian.




* Re: Interpreter vs hardware threads
  2000-03-08 20:02 ` Xavier Leroy
@ 2000-03-12  1:45   ` Julian Assange
  0 siblings, 0 replies; 23+ messages in thread
From: Julian Assange @ 2000-03-12  1:45 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: Simon Peyton-Jones, caml-list, Norman Ramsey (E-mail), proff


A few more questions about threads:

        1) is there any way to wait on the termination of a single thread or of multiple threads?
           There doesn't seem to be one, and this seems a strange oversight.

        2) What happens to Thread.t variables once the thread they reference
           has disappeared?

        3) related to the above, what happens to the
           file descriptor backing of {in,out}_channel variables once they
           are no longer referenced (i.e. are those files automatically closed?)

Cheers,
Julian.




* RE: Interpreter vs hardware threads
@ 2000-03-10 18:01 Manuel Fahndrich
  0 siblings, 0 replies; 23+ messages in thread
From: Manuel Fahndrich @ 2000-03-10 18:01 UTC (permalink / raw)
  To: 'Xavier Leroy', 'caml-list@inria.fr'
  Cc: Norman Ramsey (E-mail)


The advantage of heap closures in SML/NJ for efficient thread creation is
only present when the threads are super lightweight, that is completely
handled by the SML/NJ system via callcc/throw. If we want the threads to be
scheduled by the OS, because e.g. we are running on multi-processor
machines, or we don't want to block on IO, then each thread again needs some
C stack for the OS interaction, which destroys the heap closure advantage.

What we really need is an OS w/o a stack.

-Manuel


-----Original Message-----
From: Xavier Leroy [mailto:Xavier.Leroy@inria.fr]
Sent: Friday, March 10, 2000 12:00 AM
To: caml-redistribution@pauillac.inria.fr
Cc: Norman Ramsey (E-mail)
Subject: Re: Interpreter vs hardware threads


> | Very lightweight
> | threads do exist, see e.g. the call/cc-based threads of SML/NJ, but
> | entail significant performance penalties not only on I/O, but also on
> | the actual running speed of the sequential code.
> Is that really so, Xavier?  What percentage performance penalty do
> you think is involved?  1%?  10%?  Factor of 2?

For SML/NJ, I think their stackless execution model (all activation
records being heap-allocated) entails a significant speed penalty for
sequential code -- at least 20%, I'd say.  But it sure gives you
blazingly fast thread creation.

The Concurrent Haskell approach that you describe is somewhat less
lightweight than the SML/NJ approach.  In particular, thread creation
is going to be more expensive because of the cost of allocating and
setting up a new stack.  Also, the memory consumption of each thread
is going to be higher.  I guess the SML/NJ approach could accommodate
one million threads, while Concurrent Haskell sounds more in the 100 000s.
Still, I agree the solution you outline is pretty lightweight.

The expression "lightweight threads" is getting too vague.  I guess we
need several degrees of lightweightness, from SML/NJ's "ultra-lightweight
threads" to POSIX's "not so lightweight threads", with Haskell's
"pretty lightweight" threads and Caml's "lightweight, but no more"
threads...

> For potentially-blocking I/O operations it is true that there is
> some extra work to do, much like a context switch.  The pointers need
> to be saved in a safe place in case GC strikes, and a global lock should
> be released so that a new heavyweight thread can take over the business
> of running the lightweight threads if the I/O blocks.  But none of
> this seems really expensive in terms of % of computation time, does it?

You have to add to this the overhead on the I/O operation itself.  If
your threads build upon heavyweight threads for I/O, that should be
negligible.  (Say, one or two extra kernel context switches.)  But if
you have to perform I/O polling via select() (like OCaml bytecode
threads do for maximal portability), the overhead becomes significant.
(select() is one of the most expensive Unix system calls, in
particular because the kernel has to scan a huge set of file
descriptors just to determine which ones to wait upon; it's so bad
that Unix 98 introduced an alternate form, poll(), that does exactly
the same thing but with a more compact description of the f.d. sets.)

- Xavier Leroy




* Re: Interpreter vs hardware threads
  2000-03-07  8:30 Simon Peyton-Jones
@ 2000-03-08 20:02 ` Xavier Leroy
  2000-03-12  1:45   ` Julian Assange
  0 siblings, 1 reply; 23+ messages in thread
From: Xavier Leroy @ 2000-03-08 20:02 UTC (permalink / raw)
  To: Simon Peyton-Jones, caml-list; +Cc: Norman Ramsey (E-mail)

> | Very lightweight
> | threads do exist, see e.g. the call/cc-based threads of SML/NJ, but
> | entail significant performance penalties not only on I/O, but also on
> | the actual running speed of the sequential code.
> Is that really so, Xavier?  What percentage performance penalty do
> you think is involved?  1%?  10%?  Factor of 2?

For SML/NJ, I think their stackless execution model (all activation
records being heap-allocated) entails a significant speed penalty for
sequential code -- at least 20%, I'd say.  But it sure gives you
blazingly fast thread creation.

The Concurrent Haskell approach that you describe is somewhat less
lightweight than the SML/NJ approach.  In particular, thread creation
is going to be more expensive because of the cost of allocating and
setting up a new stack.  Also, the memory consumption of each thread
is going to be higher.  I guess the SML/NJ approach could accommodate
one million threads, while Concurrent Haskell sounds more in the 100 000s.
Still, I agree the solution you outline is pretty lightweight.

The expression "lightweight threads" is getting too vague.  I guess we
need several degrees of lightweightness, from SML/NJ's "ultra-lightweight
threads" to POSIX's "not so lightweight threads", with Haskell's
"pretty lightweight" threads and Caml's "lightweight, but no more"
threads...

> For potentially-blocking I/O operations it is true that there is
> some extra work to do, much like a context switch.  The pointers need
> to be saved in a safe place in case GC strikes, and a global lock should
> be released so that a new heavyweight thread can take over the business
> of running the lightweight threads if the I/O blocks.  But none of
> this seems really expensive in terms of % of computation time, does it?

You have to add to this the overhead on the I/O operation itself.  If
your threads build upon heavyweight threads for I/O, that should be
negligible.  (Say, one or two extra kernel context switches.)  But if
you have to perform I/O polling via select() (like OCaml bytecode
threads do for maximal portability), the overhead becomes significant.
(select() is one of the most expensive Unix system calls, in
particular because the kernel has to scan a huge set of file
descriptors just to determine which ones to wait upon; it's so bad
that Unix 98 introduced an alternate form, poll(), that does exactly
the same thing but with a more compact description of the f.d. sets.)

- Xavier Leroy




^ permalink raw reply	[flat|nested] 23+ messages in thread

* RE: Interpreter vs hardware threads
@ 2000-03-07  9:06 Simon Peyton-Jones
  0 siblings, 0 replies; 23+ messages in thread
From: Simon Peyton-Jones @ 2000-03-07  9:06 UTC (permalink / raw)
  To: 'STARYNKEVITCH Basile'; +Cc: caml-redistribution

|     Simon> If anyone is interested, a working draft is at
|     Simon> http://research.microsoft.com/~simonpj/tmp/c--concurrency.ps.gz
|     Simon> I'd be delighted to get feedback.
| 
| I'm getting 
| 
| Error - File Not Found

That's strange.  I got that for a while too, but it seems to work now.
This one works too:

	http://research.microsoft.com/~simonpj/tmp/c--conc.ps.gz

Simon




* RE: Interpreter vs hardware threads
@ 2000-03-07  8:30 Simon Peyton-Jones
  2000-03-08 20:02 ` Xavier Leroy
  0 siblings, 1 reply; 23+ messages in thread
From: Simon Peyton-Jones @ 2000-03-07  8:30 UTC (permalink / raw)
  To: 'Xavier Leroy', caml-redistribution; +Cc: Norman Ramsey (E-mail)


| The sad fact is that there doesn't exist any implementation technique
| for threads that satisfy both viewpoints at once.  Very lightweight
| threads do exist, see e.g. the call/cc-based threads of SML/NJ, but
| entail significant performance penalties not only on I/O, but also on
| the actual running speed of the sequential code.  "Heavyweight"
| threads such as LinuxThreads or Win32 threads are very efficient
| w.r.t. I/O, but are expensive to start and to context-switch.

Is that really so, Xavier?  What percentage performance penalty do
you think is involved?  1%?  10%?  Factor of 2?

Very-lightweight threads typically have to do a stack-overflow check
at the start of each function, but apart from that I don't see that
they are necessarily less efficient than heavyweight threads.
For potentially-blocking I/O operations it is true that there is
some extra work to do, much like a context switch.  The pointers need
to be saved in a safe place in case GC strikes, and a global lock should
be released so that a new heavyweight thread can take over the business
of running the lightweight threads if the I/O blocks.  But none of
this seems really expensive in terms of % of computation time, does it?

Concurrent Haskell takes the very-lightweight approach.  Stacks are
allocated in the heap and can grow arbitrarily.  A thread that is
blocked on a channel that is not reachable is garbage collected
(something that came up earlier in this conversation).

But I agree that there are compromises.  Notably, a call to a
C procedure that cannot block nor cause GC can be done more 
efficiently, so there's some incentive to have two kinds of
call, which is really a pain.  But the biggest compromise is that
it's hard to do preemption on very lightweight threads, esp if you
want to support accurate GC too.

Beyond Concurrent Haskell, I'm working on a specification for
very-lightweight-style concurrency in C--.  The idea is this: what is
the primitive support required from the C-- runtime that lets you
implement lightweight concurrency?  You provide the scheduler,
synchronisation, etc etc; C-- only provides the stuff that lets you
run and suspend a computation.
If anyone is interested, a working draft is at
	http://research.microsoft.com/~simonpj/tmp/c--concurrency.ps.gz
I'd be delighted to get feedback.

Simon





* RE: Interpreter vs hardware threads
  2000-03-03  9:10 Edwin Young
@ 2000-03-06 13:42 ` William Chesters
  0 siblings, 0 replies; 23+ messages in thread
From: William Chesters @ 2000-03-06 13:42 UTC (permalink / raw)
  To: 'caml-list@pauillac.inria.fr'

Edwin Young writes:
 > From: William Chesters [mailto:williamc@paneris.org]
 > > Judging by Max's .sig he's doing embedded telecoms boxes,
 > > something like a GSM HLR which juggles many thousands of concurrent
 > > transactions in some ghastly protocol or other.
 > 
 > If that's the case, perhaps he should investigate Erlang. Since it was
 > designed by Ericsson specifically for embedded telephony apps, it would seem
 > an ideal fit. It does indeed support thousands of concurrent threads
 > (internal to the interpreter rather than OS-based).

Absolutely, and Erlang looked great to me, except (ironically) I
couldn't help feeling that the lack of imperative data structures was
going to be a bit of a pain.  However they were apparently going to
fix that.




* Re: Interpreter vs hardware threads
@ 2000-03-03 12:12 Juergen Pfitzenmaier
  0 siblings, 0 replies; 23+ messages in thread
From: Juergen Pfitzenmaier @ 2000-03-03 12:12 UTC (permalink / raw)
  To: caml-list

If William Chesters is right in his guess and Max Skaller wants
to use threads in OCaml to handle thousands of concurrent phone calls,
then I would really go for hardware threads *with* OCaml. I would think
about how to translate (a subset of) OCaml into VHDL and put my program
into an FPGA. I thought about such things some time ago and consider it
possible within a reasonable time.

pfitzen




* RE: Interpreter vs hardware threads
@ 2000-03-03  9:10 Edwin Young
  2000-03-06 13:42 ` William Chesters
  0 siblings, 1 reply; 23+ messages in thread
From: Edwin Young @ 2000-03-03  9:10 UTC (permalink / raw)
  To: 'caml-list@pauillac.inria.fr'

From: William Chesters [mailto:williamc@paneris.org]
> Judging by Max's .sig he's doing embedded telecoms boxes,
> something like a GSM HLR which juggles many thousands of concurrent
> transactions in some ghastly protocol or other.

If that's the case, perhaps he should investigate Erlang. Since it was
designed by Ericsson specifically for embedded telephony apps, it would seem
an ideal fit. It does indeed support thousands of concurrent threads
(internal to the interpreter rather than OS-based).

Regards,

--
Edwin Young




* Re: Interpreter vs hardware threads
@ 2000-03-02 17:25 David Gurr
  0 siblings, 0 replies; 23+ messages in thread
From: David Gurr @ 2000-03-02 17:25 UTC (permalink / raw)
  To: williamc, caml-list, Xavier.Leroy

> >    (Of course, one could always ask the ocaml team that won the ICFP
> > competition in such spectacular fashion to knock off an
> > ocaml-to-event-driven-FSM compiler ...)
> 
> I'm not sure that OCaml is the best input language for generating
> FSMs, but, yes, generating FSMs from a high-level concurrent language
> is an excellent approach.  You get both ultimate execution speed and a
> readable, maintainable description of the program.  Have a look for
> instance at Esterel (another glorious INRIA product :-))
> 
> - Xavier Leroy
> 
And Lucid Synchrone?  Any thoughts on JoCaml with Cilk-like threads?




end of thread, other threads:[~2000-03-24 15:45 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2000-02-25 13:37 mixing Ocaml with another GC-ed language STARYNKEVITCH Basile
2000-02-29 10:24 ` Xavier Leroy
2000-03-01  3:48 ` Nice feature Max Skaller
2000-03-01  3:52 ` Interpreter vs hardware threads Max Skaller
2000-03-01 18:55   ` Michael Hicks
2000-03-01 20:02   ` Stefan Monnier
2000-03-02 18:18     ` William Chesters
2000-03-03 16:59       ` John Max Skaller
2000-03-06 17:35       ` Xavier Leroy
2000-03-06 23:11         ` John Max Skaller
2000-03-07 13:19         ` Julian Assange
2000-03-08 20:12           ` Xavier Leroy
2000-03-19  6:10             ` Julian Assange
2000-03-08 23:30           ` Max Skaller
2000-03-02 17:25 David Gurr
2000-03-03  9:10 Edwin Young
2000-03-06 13:42 ` William Chesters
2000-03-03 12:12 Juergen Pfitzenmaier
2000-03-07  8:30 Simon Peyton-Jones
2000-03-08 20:02 ` Xavier Leroy
2000-03-12  1:45   ` Julian Assange
2000-03-07  9:06 Simon Peyton-Jones
2000-03-10 18:01 Manuel Fahndrich
