caml-list - the Caml user's mailing list
* [Caml-list] Threading: Using and Building
From: John Goerzen @ 2004-04-12 15:19 UTC
  To: caml-list

Hello,

I am looking at using multi-threaded programs in OCaml, but have some
questions:

*** Regarding the thread support itself

Chapter 24 of the OCaml documentation says that "The threads library is
implemented by time-sharing on a single processor.  It will not take
advantage of multi-processor machines."  That's bad.

But then later on I notice that there are two threading options: system
threads and VM-level threads.  The introductory paragraph does not seem
to apply to system threads which, in other languages at least, do not
behave that way.  So I am rather puzzled about what the actual level of
thread support is here.

*** Regarding building programs and libraries with threading support

My next concern is building programs and libraries to support threading.
Chapter 24 also mentions that programs must be linked with -thread, and
all object files compiled with -thread, if the final result is to
support threads.  Alternatively, -vmthread could be substituted.

So my questions are:

 1. Few libraries out there build themselves with -thread or -vmthread.
    Is that to be considered a bug?  Is there a workaround short
    of recompiling them?

 2. Can a library or object file built with -thread or -vmthread be used
    in a non-threaded program (one that does not use -thread or
    -vmthread)?  Can it be used in a threaded program that uses the
    *other* option?  (-thread vs. -vmthread)  Does the answer to this
    question vary depending on whether C code is used, or are there
    other things that can be done in the code to increase compatibility?

 3. I am assuming that -thread and -vmthread are not universally
    supported across OCaml platforms.  Would it be correct to assume,
    then, that one should check for the presence of -thread or -vmthread
    at build time?  Do there exist platforms for which neither is
    supported?

 4. If I am developing an application...  what can I do if it is
    multi-threaded but depends on libraries that are not built in a
    multi-threaded fashion on the user's system?  What if the libraries are
    built with the wrong type of threading (-thread vs. -vmthread)?  What
    if some libraries are built with one type and some with another?

 5. If I am developing a library... what must I do to make it maximally
    compatible with non-threaded applications and both types of
    threaded applications the user may be developing?

 6. What considerations must one take into account when developing C
    interfaces that will be used in multithreaded OCaml programs?

 7. Do any of the standard build systems (OCamlMake, configure.in, etc)
    take into account the above answers in a useful way for an
    application or library developer?

 8. How do I know which, if any, standard or third-party libraries
    installed on my system are threadsafe, and which threading model
    they support?

Thanks,

John

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners



* Re: [Caml-list] Threading: Using and Building
From: Xavier Leroy @ 2004-04-12 17:12 UTC
  To: John Goerzen; +Cc: caml-list

> *** Regarding the thread support itself
> 
> Chapter 24 of the OCaml documentation says that "The threads library is
> implemented by time-sharing on a single processor.  It will not take
> advantage of multi-processor machines."  That's bad.
> 
> But then later on I notice that there are two threading options: system
> threads and VM-level threads.  The introductory paragraph does not seem
> to apply to system threads which, in other languages at least, do not
> behave that way.

Even with system threads, the OCaml runtime system and GC aren't
thread-safe, so only one Caml thread can execute at any given time
(this is achieved via a "master mutex" on the runtime system).  So,
the introductory sentence holds (mostly) for system threads as well:
you will not get parallel execution of Caml threads.  On a
multiprocessor with system threads, however, it is possible to have
one Caml thread running in parallel with one or several C threads.

> *** Regarding building programs and libraries with threading support
> 
> My next concern is building programs and libraries to support threading.
> Chapter 24 also mentions that programs must be linked with -thread, and
> all object files compiled with -thread, if the final result is to
> support threads.  Alternatively, -vmthread could be substituted.

These requirements are a bit more stringent than what is actually needed.
The real requirements are:
- The -thread or -vmthread option must be given at link-time
  (so that different versions of the standard library and Unix library
   can be picked up)
- The -thread or -vmthread option must be given when compiling a
  source file that uses modules from the thread library (e.g. Thread, Mutex,
  Event, etc).

Actually, if you do not meet one of these requirements, you'll get
"file not found" errors when compiling or linking.  So, don't worry
about forgetting -thread or -vmthread, the compiler will remind you.
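
To make the two requirements concrete, here is a minimal sketch of a
program that uses the Thread module.  The file name hello_threads.ml and
the worker function are hypothetical, and the command lines in the
comment are the usual invocation for system threads as far as I know;
they may need adjusting for a given installation or compiler version.

```ocaml
(* hello_threads.ml (hypothetical file name).
   Assumed build commands for system threads:
     ocamlc   -thread unix.cma  threads.cma  hello_threads.ml -o hello_threads
     ocamlopt -thread unix.cmxa threads.cmxa hello_threads.ml -o hello_threads
   Per the message above, omitting -thread should give a compile-time or
   link-time error rather than a silently broken program. *)

(* Each worker stores the square of its index in its own slot,
   so no two threads ever write to the same cell. *)
let results = Array.make 4 0

let worker i = results.(i) <- i * i

let () =
  (* Start one thread per slot, then wait for all of them to finish. *)
  let threads = Array.init 4 (fun i -> Thread.create worker i) in
  Array.iter Thread.join threads;
  Array.iter (fun r -> Printf.printf "%d\n" r) results
```

Only this file refers to Thread, so only this file (and the final link
step) needs -thread; a library it calls that never mentions the thread
modules can be compiled without the option.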

> So my questions are:
> 
>  1. Few libraries out there build themselves with -thread or -vmthread.
>     Is that to be considered a bug?  Is there a workaround short
>     of recompiling them?

As a consequence of the requirements above, you can use a library not
compiled with -thread in a program that uses threads.  So, it's not a bug
and you shouldn't worry about the library not being compiled with -thread.
(But beware of non-thread-safe libraries, see below.)

>  2. Can a library or object file built with -thread or -vmthread be used
>     in a non-threaded program (one that does not use -thread or
>     -vmthread?)  Can it be used in a threaded program that uses the
>     *other* option?  (-thread vs. -vmthread)  Does the answer to this
>     question vary depending on whether C code is used, or are there
>     other things that can be done in the code to increase compatibility?

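Yes, it can be used in a non-threaded program.  Yes, it can be used in a
program built with the other option.  No, the answer does not depend on
whether C code is used.  No, there is nothing further to do in the code.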

>  3. I am assuming that -thread and -vmthread are not universally
>     supported across OCaml platforms.  Would it be correct to assume,
>     then, that one should check for the presence of -thread or -vmthread
>     at build time?  Do there exist platforms for which neither are
>     supported?

My experience is that -vmthread works on basically all variants of Unix,
-thread works on Windows (with the MS and MinGW ports) and most of the
recent Unix derivatives.

I'm not sure about Windows with the Cygwin port.  It could be that
neither -thread nor -vmthread works on this platform due to
insufficient emulation of Unix syscalls and POSIX threads, but maybe not.

Oh, yes, Mac OS 9 and earlier used to support neither -thread nor
-vmthread, but they now rest in peace.

>  4. If I am developing an application...  what can I do if it is
>     multi-threaded but depends on libraries that are not built in a
>     multi-threaded fashion on the user's system?  What if the libraries are
>     built with the wrong type of threading (-thread vs. -vmthread)?  What
>     if some libraries are built with one type and some with another?

See question 2.  There should be no problems.

>  5. If I am developing a library... what must I do to make it maximally
>     compatible with non-threaded applications and both types of
>     threaded applications the user may be developing?

Give it a thread-safe (reentrant) API.  That is, avoid global storage.
Developers of threaded applications will bless you.
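
As a small illustration of what "avoid global storage" can mean in
practice, the sketch below contrasts an ID generator that hides its state
in a global with one whose state is an explicit value owned by the
caller.  All names here (next_id_global, gen, make_gen, next_id) are
hypothetical and only serve the example.

```ocaml
(* Non-reentrant style: one hidden global shared by every caller. *)
let global_count = ref 0
let next_id_global () = incr global_count; !global_count

(* Reentrant style: the state is an explicit value owned by the caller. *)
type gen = { mutable last : int }

let make_gen () = { last = 0 }

let next_id g =
  g.last <- g.last + 1;
  g.last

(* Two independent generators do not interfere with each other. *)
let a = make_gen ()
let b = make_gen ()
let a1 = next_id a
let a2 = next_id a
let b1 = next_id b

let () = Printf.printf "%d %d %d\n" a1 a2 b1   (* prints "1 2 1" *)
```

With the second style, each thread can simply create its own generator.
Threads that deliberately share one generator still need their own
locking around next_id; the point is that the library no longer forces
the sharing on them.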

>  6. What considerations must one take into account when developing C
>     interfaces that will be used in multithreaded OCaml programs?

By default, the "master mutex" that ensures single-threaded execution
of Caml code also ensures single-threaded execution of the C code
called from Caml.  That makes it mostly safe to use non-thread-aware C
code in threaded Caml applications.  

It is possible for C/Caml interface code to explicitly relinquish the
master mutex using the enter_blocking_section() and leave_blocking_section()
functions.  (Google for these names, you should find earlier posts of
mine describing how to use them.)  This enables another Caml thread to
execute concurrently.

>  7. Do any of the standard build systems (OCamlMake, configure.in, etc)
>     take into account the above answers in a useful way for an
>     application or library developer?

No idea.

>  8. How do I know which, if any, standard or third-party libraries
>     installed on my system are threadsafe, and which threading model
>     they support?

As in most other languages: their documentation should say so clearly,
but in general one has to read their source code to figure this out.

- Xavier Leroy


* Re: [Caml-list] Threading: Using and Building
From: Brian Hurt @ 2004-04-13  0:37 UTC
  To: John Goerzen; +Cc: caml-list

On Mon, 12 Apr 2004, John Goerzen wrote:

> Hello,
> 
> I am looking at using multi-threaded programs in OCaml, but have some
> questions:
> 
> *** Regarding the thread support itself
> 
> Chapter 24 of the OCaml documentation says that "The threads library is
> implemented by time-sharing on a single processor.  It will not take
> advantage of multi-processor machines."  That's bad.
> 
> But then later on I notice that there are two threading options: system
> threads and VM-level threads.  The introductory paragraph does not seem
> to apply to system threads which, in other languages at least, do not
> behave that way.  So I am rather puzzled about what the actual level of
> thread support is here.

The threading is all user-space threading.  It doesn't take advantage of 
multiple CPUs, doesn't use system threads, if one thread blocks they all 
block, etc.

There are two problems with multithreading.  First, it makes the GC more
difficult and more costly.  Currently, the GC runs in the same system
thread as everything else, and thus it doesn't have synchronization
issues.  In a multi-threaded environment, you have synchronization issues
which slow things down.  Second, most people don't know how to write
safe, correct, efficient multithreaded programs.  It's harder than it
looks.  I think something like MPI between separate processes would be a
better way to take advantage of multi-processors.

On Unix, you don't lose much doing multi-process (forks) instead of
threads.  Switching between processes in Unix isn't any slower than
switching between threads within a process.  But I've seen benchmarks
which show that task switching between separate processes was
significantly slower than switching between threads within a process on
NT4 (and that switching between threads on NT4 was about as expensive as
switching between processes on Unix).  They may have fixed this, or they
may not have.

-- 
"Usenet is like a herd of performing elephants with diarrhea -- massive,
difficult to redirect, awe-inspiring, entertaining, and a source of
mind-boggling amounts of excrement when you least expect it."
                                - Gene Spafford 
Brian


* Re: [Caml-list] Threading: Using and Building
From: John Goerzen @ 2004-04-13  2:15 UTC
  To: Brian Hurt; +Cc: caml-list

On Mon, Apr 12, 2004 at 07:37:09PM -0500, Brian Hurt wrote:
> The threading is all user-space threading.  It doesn't take advantage of 
> multiple CPUs, doesn't use system threads, if one thread blocks they all 
> block, etc.

Which explains the existence of the separate Unix library for
threading; I'm assuming that this library uses non-blocking fd's...

> There are two problems with multithreading.  First, it makes the GC more
> difficult and more costly.  Currently, the GC runs in the same system
> thread as everything else, and thus it doesn't have synchronization
> issues.  In a multi-threaded environment, you have synchronization issues
> which slow things down.  Second, most people don't know how to write
> safe, correct, efficient multithreaded programs.  It's harder than it
> looks.  I think something like MPI between separate processes would be a
> better way to take advantage of multi-processors.

I know little about the GC issue, save that two other garbage-collected
languages I have used (Java and Python) are able to work around it.

Python does have a global interpreter lock that protects sections of
Python code in certain situations, I believe.  However, Python uses true
threads, so it is fine to call blocking system calls in threads (it will
not block the entire program).

You are correct that there are gotchas with multithreaded programming.
However, there are some significant advantages.  One is that there is no
need to establish lines of communication between one process and
another -- that saves a significant amount of work right there.  It's
also pretty easy in most systems to fire up a thread to run a function
and check for (or be notified about) its result later.
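
For what it is worth, the "fire up a thread and collect the result
later" pattern can be sketched with the Thread and Event modules of the
threads library.  The helper names spawn and await are hypothetical, not
part of any library, and fib is only a stand-in for real work.

```ocaml
(* Needs the threads library (build with -thread). *)

(* Run [f x] in a new thread; return a channel that will carry the result. *)
let spawn f x =
  let ch = Event.new_channel () in
  let _ = Thread.create (fun () -> Event.sync (Event.send ch (f x))) () in
  ch

(* Block until the result sent by [spawn] is available. *)
let await ch = Event.sync (Event.receive ch)

let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

let pending = spawn fib 20          (* starts computing in the background *)
let other = String.length "hello"   (* the main thread keeps working *)
let answer = await pending          (* collect the result when needed *)

let () = Printf.printf "%d %d\n" other answer
```

Event channels are synchronous, so the worker thread stays around until
the result has actually been received.  As discussed elsewhere in this
thread, the two Caml threads are interleaved rather than run in parallel.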

OCaml's design, minimizing the need to update variables, seems to lend
itself nicely towards threadsafe programming.

I think that, overall, the fact that "some people can't do x correctly"
is not a reason to make x unsupported in a language.  After all, we've
seen programs all over that have buggy calls to open(2) -- especially
those writing in /tmp.  That doesn't mean we have to turn off open(2) on
our systems :-)

The alternatives are not necessarily easier anyway.  Forking and
communicating over pipes involves the development of a streaming
protocol and a way to send data across the pipe and decode it at the
other end; existing objects cannot just be reused.  Synchronization can
still be an issue, especially regarding open file descriptors, user
interfaces, and disk files.  Deadlock can occur on pipes due to any
number of reasons as well.
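
To give an idea of the amount of plumbing involved, here is a rough
sketch of the fork-and-pipe approach, assuming the data to send back is
plain data that the standard Marshal module can serialize.  The type
job_result and the function run_in_child are hypothetical names; error
handling is left out.

```ocaml
(* Needs the unix library. *)
type job_result = { name : string; squares : int list }

(* Run [f] in a forked child and send its result back over a pipe. *)
let run_in_child (f : unit -> job_result) : job_result =
  flush_all ();  (* avoid duplicating buffered output in the child *)
  let (rd, wr) = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Child: compute, marshal the value into the pipe, and exit. *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      Marshal.to_channel oc (f ()) [];
      close_out oc;
      exit 0
  | pid ->
      (* Parent: read the marshalled value back, then reap the child. *)
      Unix.close wr;
      let ic = Unix.in_channel_of_descr rd in
      let (v : job_result) = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      v

let result =
  run_in_child (fun () ->
    { name = "squares"; squares = List.map (fun x -> x * x) [1; 2; 3] })

let () = List.iter (Printf.printf "%d ") result.squares
```

The parent receives a copy of the value, not the value itself, and
Marshal.from_channel is not type-checked, so both sides have to agree on
the type by convention.  Nothing is shared between the two processes
afterwards.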

> On Unix, you don't lose much doing multi-process (forks) instead of
> threads.  Switching between processes in Unix isn't any slower than
> switching between threads within a process.  But I've seen benchmarks

My primary concern here lies not with the performance difference between
forking and threads in the general case, but rather the design
differences.  For certain tasks, threading is just easier.  And it looks
like OCaml has some serious limitations on threading.

-- John


* Re: [Caml-list] Threading: Using and Building
From: Brian Hurt @ 2004-04-13  4:54 UTC
  To: John Goerzen; +Cc: caml-list

On Mon, 12 Apr 2004, John Goerzen wrote:

> I know little about the GC issue, save that two other garbage-collected
> languages I have used (Java and Python) are able to work around it.

You can work around it, it just costs.  You end up implementing barriers:
either read barriers or write barriers.  Write barriers would probably
work better for OCaml.  But basically, every modification of
heap-allocated objects becomes a synchronization point with the GC.
This will slow you down.

See:
http://www.amazon.com/exec/obidos/tg/detail/-/0471941484/qid=1081826531/sr=1-1/ref=sr_1_1_xs_stripbooks_i1_xgl14/104-8572163-9037569?v=glance&s=books

Or:
http://www.iecc.com/gclist/GC-faq.html

for more detail.

> You are correct that there are gotchas with multithreaded programming.
> However, there are some significant advantages.  One is that there is no
> need to establish lines of communication between one process and
> another -- that saves a significant amount of work right there.  It's
> also pretty easy in most systems to fire up a thread to run a function
> and check for (or be notified about) its result later.

The ability to have easy communication between threads is exactly the
problem: you can have *accidental* and uncontrolled communication between
threads, aka race conditions.  Badly applied synchronization can lead to
deadlocks and livelocks.  Finding and fixing these can be a royal pain.
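
A minimal sketch of the controlled alternative, using the Mutex module of
the threads library: every update of the shared counter happens while
holding a lock.  The names counter, lock and bump are hypothetical.

```ocaml
(* Needs the threads library (build with -thread). *)
let counter = ref 0
let lock = Mutex.create ()

(* Add one to the shared counter [n] times, holding the lock for each
   read-modify-write so that no update can be lost. *)
let bump n =
  for _i = 1 to n do
    Mutex.lock lock;
    counter := !counter + 1;
    Mutex.unlock lock
  done

let () =
  (* Four threads, 1000 increments each. *)
  let threads = Array.init 4 (fun _ -> Thread.create bump 1000) in
  Array.iter Thread.join threads

let total = !counter

let () = Printf.printf "%d\n" total   (* prints 4000 *)
```

Nothing in the type system stops another piece of code from touching
counter without taking the lock, which is the kind of accidental sharing
described above.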

I'd love to see Jocaml picked up again.  Enlist the type system to prevent 
these problems.

> 
> OCaml's design, minimizing the need to update variables, seems to lend
> itself nicely towards threadsafe programming.

Almost, but not quite.

> 
> I think that, overall, the fact that "some people can't do x correctly"
> is not a reason to make x unsupported in a language.  After all, we've
> seen programs all over that have buggy calls to open(2) -- especially
> those writing in /tmp.  That doesn't mean we have to turn off open(2) on
> our systems :-)

No, but we do provide mktemp() and mkstemp() functions to try and fix 
these problems.  And a number of features we do just drop outright- 
pointer arithmetic, for example.

There are better ways of doing things and there are worse ways.  If there
is a better way that solves the problem, one that doesn't have the failure
modes, then I'm all for it.  It's a trade-off: significantly fewer bugs
for slightly less flexibility/ease.

> 
> The alternatives are not necessarily easier anyway.  Forking and
> communicating over pipes involves the development of a streaming
> protocol and a way to send data across the pipe and decode it at the
> other end; existing objects cannot just be reused.  Synchronization can
> still be an issue, especially regarding open file descriptors, user
> interfaces, and disk files.  Deadlock can occur on pipes due to any
> number of reasons as well.

MPI gives you the wrappers around pipes.  Deadlocking can occur with
mutexes and synchronized blocks as well.  What can't happen is accidental
sharing, aka race conditions.  Jocaml solves the deadlocking/livelocking
problems, but a) isn't maintained currently, and b) doesn't distribute
across a network like MPI (no beowulf clusters).

> 
> > On Unix, you don't lose much doing multi-process (forks) instead of
> > threads.  Switching between processes in Unix isn't any slower than
> > switching between threads within a process.  But I've seen benchmarks
> 
> My primary concern here lies not with the performance difference between
> forking and threads in the general case, but rather the design
> differences.  For certain tasks, threading is just easier.  And it looks
> like OCaml has some serious limitations on threading.
> 

Having done both, I'd say MPI is easier to make *correct*, and Jocaml 
better still.

Brian


* Re: [Caml-list] Threading: Using and Building
From: Benjamin Geer @ 2004-04-13  7:44 UTC
  To: John Goerzen; +Cc: Brian Hurt, caml-list

John Goerzen wrote:
> On Mon, Apr 12, 2004 at 07:37:09PM -0500, Brian Hurt wrote:
> 
>>The threading is all user-space threading.  It doesn't take advantage of 
>>multiple CPUs, doesn't use system threads, if one thread blocks they all 
>>block, etc.

That's not true; see Xavier's explanation:

http://caml.inria.fr/archives/200211/msg00274.html

Specifically, 'while a thread is blocked on a network read, other 
threads may proceed'.
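
A small sketch that exercises this claim: one thread blocks in Unix.read
on a pipe while the main thread carries on and only later writes to the
pipe.  If a blocked thread stalled the whole program, the write would
never happen and the program would hang.  The names reader, work and
received are hypothetical, and the 0.05 second delay is an arbitrary
value chosen to let the reader block first.

```ocaml
(* Needs the unix and threads libraries (build with -thread). *)
let (rd, wr) = Unix.pipe ()
let received = ref ""

(* This thread blocks in Unix.read until something is written to the pipe. *)
let reader =
  Thread.create
    (fun () ->
      let buf = Bytes.create 16 in
      let n = Unix.read rd buf 0 16 in
      received := Bytes.sub_string buf 0 n)
    ()

(* Meanwhile the main thread keeps running and does some work. *)
let work =
  begin
    Thread.delay 0.05;                 (* give the reader time to block *)
    List.fold_left (+) 0 [1; 2; 3; 4]
  end

let () =
  (* Only now is the reader unblocked. *)
  ignore (Unix.write_substring wr "ping" 0 4);
  Thread.join reader;
  Printf.printf "%d %s\n" work !received
```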

> However, Python uses true
> threads, so it is fine to call blocking system calls in threads (it will
> not block the entire program).

This is also the case in Caml; see above.

Ben


* Re: [Caml-list] Threading: Using and Building
From: David Brown @ 2004-04-13 19:47 UTC
  To: Benjamin Geer; +Cc: John Goerzen, Brian Hurt, caml-list

On Tue, Apr 13, 2004 at 08:44:47AM +0100, Benjamin Geer wrote:

> >However, Python uses true
> >threads, so it is fine to call blocking system calls in threads (it will
> >not block the entire program).
> 
> This is also the case in Caml; see above.

It is important for the wrapper stub to be written correctly, though.

Dave

