caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed
* [Caml-list] gc question: thread stacks, fibers, etc.
@ 2002-10-04 10:25 Chris Hecker
  2002-10-06 18:13 ` [Caml-list] Coroutines Jerome Vouillon
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Chris Hecker @ 2002-10-04 10:25 UTC (permalink / raw)
  To: caml-list; +Cc: buzzard


I'm looking at implementing fibers/coroutines/cooperative-threads 
(described below for reference) in ocaml, and I've run into a small issue, 
a question, and a couple confusions.

First, the issue:  Because they're cooperative, there's no way a fiber can 
run unless somebody tells it to do so using its handle/Fiber.t.  This means 
that if a fiber's handle goes out of scope, the entire fiber, including its 
stack, is garbage and should be able to be collected (unlike a thread, 
where even if the Thread.t is gone, the thread is still 
running/scheduled).  However, I can't figure out if it's possible to 
implement fibers so they work like this.

In more detail: a fiber is going to need a stack, as mentioned 
below.  Following the systhreads code (win32.c & posix.c), I'll allocate 
the stack outside the caml heap.  This stack will get scanned by the gc 
using the scan_roots_hook from C, just like in the systhreads code.  The 
problem is, this design treats the stack as a root, so the stack itself 
can't be gc'd, even if it's orphaned.

Is there a way to make the stack just another heap object?  I could easily 
put the pointer to the stack in an Abstract_tag block, but then the stack 
won't get scanned by the gc at all, which is bad.  It seems like what I 
want is another kind of block or another function in the custom_operations 
for "let me manually scan the opaque data in this block, which itself 
points to data that contains values, and if it's all garbage, finalize me".

Does that make sense?  The behavior I want is to not have to keep track of 
the fibers if I don't care if they get gc'd, just like other objects.

My next question is a clarification of the way caml works:  the stack can 
contain values, but never blocks, right?  In C terms, it can contain 
pointers but not actual objects that other people can point to?  All actual 
boxed data is on the heap, right?  So, I can just delete a fiber's stack 
and it'll work?  Put it yet another explicit way, the reason you have to 
register roots when you work with values on the stack in C is because you 
don't want the gc to free things you're pointing to, not because of any 
funkiness of others pointing into your stack.

Finally, a couple confusions:

- final_fun in caml_thread_handle in win32.c is not used?

- HANDLE in caml_thread_handle in win32 seems odd, since it's scanned by 
the gc, yet it's a windows handle.  who's to say that it won't randomly end 
up as a value that looks like a pointer into one of the caml heaps?

Chris


Fibers/Coroutines/Cooperative-Threads
You can read a bunch about these on the net.  Basically, they're 
lightweight nonpreemtive threads.  They allow you to yield back to the 
creator (or a general scheduler) anywhere in the fiber with an explicit 
yield(), and when the fiber is resumed it returns from the yield and 
continues on.  They're very useful for breaking up long computations where 
real threading is too complex to be worthwhile, but where breaking up the 
computation into chunks is a mess (and generates a heinous FSM).  Game AIs 
are a good example, network protocol negotiations are another.  In C, you 
implement them by mallocing a stack and with a little snippet of asm, or 
mucking with the jmpbuf fields.  They're pretty trivial to get working 
fairly robustly (how you grow the stack is an issue).  I guess they're sort 
of related to continuations (or at least, you can implement coroutines if 
your language supports continuations), but they don't seem to be as heavy-duty.

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Caml-list] Coroutines
  2002-10-04 10:25 [Caml-list] gc question: thread stacks, fibers, etc Chris Hecker
@ 2002-10-06 18:13 ` Jerome Vouillon
  2002-10-06 19:46   ` Chris Hecker
  2002-10-12 15:58   ` John Max Skaller
  2002-10-12 16:33 ` [Caml-list] gc question: thread stacks, fibers, etc John Max Skaller
  2002-10-13  8:32 ` Xavier Leroy
  2 siblings, 2 replies; 8+ messages in thread
From: Jerome Vouillon @ 2002-10-06 18:13 UTC (permalink / raw)
  To: Chris Hecker; +Cc: caml-list

On Fri, Oct 04, 2002 at 03:25:44AM -0700, Chris Hecker wrote:
> I'm looking at implementing fibers/coroutines/cooperative-threads 
> (described below for reference) in ocaml, and I've run into a small issue, 
> a question, and a couple confusions.

I just spent a few hours implementing a small coroutine library.  It
is fully written in Ocaml.  Below is a quick description.  Would it
satisfy your needs ? I can send you a copy, or make it available on the
Web if you like.

-- Jérôme

A coroutine returning values of type 'a has type 'a output.

    type 'a output

You can spawn it.  You then get an handle of type 'a input

    type 'a input
    val spawn : 'a output -> 'a input

The caller can get a value from the coroutine.

    val receive : 'a input -> 'a

A coroutine can return a value of type 'a.

    val send : 'a -> 'a output

It can also exit.  If a caller try to get more values from the
coroutine afterwards, an exception Exited will be raised.

    exception Exited
    val exit : 'a output

Coroutines can be combined sequentially: "e >> fun  () -> e'" is a
routine that behaves first as "e", then as "e'".

    val (>>) : 'a output -> (unit -> 'a output) -> 'a output

It is sometimes useful to have a coroutine "nop" that does nothing.
Note that the expression "nop >> fun () -> e" behaves as "e", while
the expression "exit >> (fun ()-> e)" behaves as "exit".

    val nop : 'a output

So for instance, the coroutine f below returns the integers 1, 2 and
3, then exits.

    # let f =
        spawn
          (send 1 >> fun () ->
           send 2 >> fun () ->
           send 3);;
    val f : int Coroutines.input = <abstr>
    # receive f;;
    - : int = 1
    # receive f;;
    - : int = 2
    # receive f;;
    - : int = 3
    # receive f;;
    Exception: Coroutines.Exited.

You can define a coroutine "indexed ~len:n f" which returns
successively "f 0", "f 1", ... "f (n - 1)".

    let rec indexed_rec f n l =
      if n = l then exit else
      send (f n) >> fun () ->
      indexed_rec f (n + 1) l

    let indexed ?(len = -1) f = spawn (indexed_rec f 0 len)

You can use it to define an array iterator.

    let array_iterator a = indexed ~len:(Array.length a) (fun i -> a.(i))

So, "array_iterator [|1;2;3|]" will behave just the same as "f" above.

The coroutine below returns all integers starting from "first".

    let integers first = indexed (fun i -> i + first)

As a more complex example, here is the Erathostenes sieve implemented
using coroutines.

    let rec filter_rec p st =
      let v = receive st in
      (if p v then send v else nop) >> fun () ->
      filter_rec p st

    let filter p st = spawn (filter_rec p st)

    let filter_multiples n st = filter (fun m -> not (m mod n = 0)) st

    let rec primes_rec st =
      let n = receive st in
      send n >> fun () ->
      primes_rec (filter_multiples n st)

    let primes = spawn (primes_rec (integers 2))
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Coroutines
  2002-10-06 18:13 ` [Caml-list] Coroutines Jerome Vouillon
@ 2002-10-06 19:46   ` Chris Hecker
  2002-10-12 15:58   ` John Max Skaller
  1 sibling, 0 replies; 8+ messages in thread
From: Chris Hecker @ 2002-10-06 19:46 UTC (permalink / raw)
  To: Jerome Vouillon; +Cc: caml-list


>I just spent a few hours implementing a small coroutine library.  It
>is fully written in Ocaml.  Below is a quick description.  Would it
>satisfy your needs ? I can send you a copy, or make it available on the
>Web if you like.

Ah, it looks like your library is done with cps.  The problem is, how would 
you yield in the middle of a for-loop or something?  I assume the answer is 
"don't do that" :), which is valid, but a bit frustrating if you just want 
to turn any bit of code into a fiber and be able to yield anywhere it's 
convenient.  On the upside, you can do yours in vanilla caml.  Definitely 
post your library on the web, I'd be interested in seeing it.  Thanks!

I realized after I sent my post about the stack that I can easily prototype 
my Fiber library with Threads.  I'll call what I want Fibers to 
differentiate them from Coroutines where there's value passing a la Knuth, 
and to imply that Fibers are imperative in nature.  I can make a thread per 
fiber, and then have a mutex per thread, and have my yield/switch calls do 
the right mutex silliness to make the threads behave cooperatively rather 
than preemptively.

This is a bit heavyweight for what I want in the systhreads case (I'd like 
Fibers to be very lightweight and cheap, since they don't need any os 
machinery, just a quick context load), but it will allow a fully functional 
(er, operational :) prototype so I can see if this is what I really 
want.  I could ease a bit of the efficiency concerns by adding a thread api 
for turning off the tick thread in the case where no "real" threads are 
created, since mine will never be able to be preempted anyway so setting 
young_limit and the signal is just a waste of time.

Chris


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] Coroutines
  2002-10-06 18:13 ` [Caml-list] Coroutines Jerome Vouillon
  2002-10-06 19:46   ` Chris Hecker
@ 2002-10-12 15:58   ` John Max Skaller
  1 sibling, 0 replies; 8+ messages in thread
From: John Max Skaller @ 2002-10-12 15:58 UTC (permalink / raw)
  To: caml-list

Jerome Vouillon wrote:

> On Fri, Oct 04, 2002 at 03:25:44AM -0700, Chris Hecker wrote:
> 
>>I'm looking at implementing fibers/coroutines/cooperative-threads 
>>(described below for reference) in ocaml, and I've run into a small issue, 
>>a question, and a couple confusions.
>>
> 
> I just spent a few hours implementing a small coroutine library.  It
> is fully written in Ocaml.  Below is a quick description.  Would it
> satisfy your needs ? I can send you a copy, or make it available on the
> Web if you like.
> 
> -- Jérôme
> 
> A coroutine returning values of type 'a has type 'a output.
> 
>     type 'a output
> 
> You can spawn it.  You then get an handle of type 'a input
> 
>     type 'a input
>     val spawn : 'a output -> 'a input
> 
> The caller can get a value from the coroutine.
> 
>     val receive : 'a input -> 'a
> 
> A coroutine can return a value of type 'a.
> 
>     val send : 'a -> 'a output
> 


These aren't general coroutines, but a subclass I'd call
'filters' corresponding to unix processes with standard input
and output.

[Felix coroutines are even more restricted .. they can read input,
but there is no provision for output.]


Despite the apparent lack of generality,
a very wide class of coroutines can be implemented by careful
choice of the data type 'a, and 'hand written' switches which
provide further coroutine emulation within each library coroutine.

So I think this library has a good interface, indeed,
I'd consider it for the standard distribution. It is simple
and lightweight.

BTW: I note the interface is 'demand driven' .. you make it
work by reading data from the last stage of a filter chain,
and control ripples back to the first input stage.

Felix, on the other hand, is event driven .. you make it
works by sending data to coroutines ..

'Real' coroutines can read and write on multiple channels,
exactly like threads with blocking I/O  except there
is no preemptive switching, obviating the need for
thread safe coding.

Two well known (and very good) implementations:
Windows 3.1 and early Mac OS.

Finally .. you might want to consider JoCaml ..
it is very beautiful (though the implementation using
Posix threads is likely to be slow compared to
native exchange of control primitives)


-- 
John Max Skaller, mailto:skaller@ozemail.com.au
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] gc question: thread stacks, fibers, etc.
  2002-10-04 10:25 [Caml-list] gc question: thread stacks, fibers, etc Chris Hecker
  2002-10-06 18:13 ` [Caml-list] Coroutines Jerome Vouillon
@ 2002-10-12 16:33 ` John Max Skaller
  2002-10-12 18:54   ` Chris Hecker
  2002-10-13  8:32 ` Xavier Leroy
  2 siblings, 1 reply; 8+ messages in thread
From: John Max Skaller @ 2002-10-12 16:33 UTC (permalink / raw)
  To: caml-list

Chris Hecker wrote:

> 
> Fibers/Coroutines/Cooperative-Threads

> In C, you implement them by mallocing a stack and with a little snippet 
> of asm, or mucking with the jmpbuf fields.  They're pretty trivial to 
> get working fairly robustly


.. and fairly useless in most demanding applications due to the
impossibility of dynamically extending/shrinking the stack.
If that isn't a problem .. you might as well just use
real threads ..

Felix solves this problem by using heap allocated stack
frames .. there is a cost in that many procedure calls
must be done indirectly via a driver routine which
maintains the top of stack pointer for each thread.
It also preserves high efficiency by disallowing
blocking I/O (yields) in functional code, which use
the machine stack as usual.

BTW: I consider the technique a case of continuations,
but it is probably more correct to consider it
as a system of resumptions.



-- 
John Max Skaller, mailto:skaller@ozemail.com.au
snail:10/1 Toxteth Rd, Glebe, NSW 2037, Australia.
voice:61-2-9660-0850


-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] gc question: thread stacks, fibers, etc.
  2002-10-12 16:33 ` [Caml-list] gc question: thread stacks, fibers, etc John Max Skaller
@ 2002-10-12 18:54   ` Chris Hecker
  0 siblings, 0 replies; 8+ messages in thread
From: Chris Hecker @ 2002-10-12 18:54 UTC (permalink / raw)
  To: John Max Skaller, caml-list


>>In C, you implement them by mallocing a stack and with a little snippet 
>>of asm, or mucking with the jmpbuf fields.  They're pretty trivial to get 
>>working fairly robustly
>.. and fairly useless in most demanding applications due to the
>impossibility of dynamically extending/shrinking the stack.
>If that isn't a problem .. you might as well just use
>real threads ..

Hmm, not sure I understand the last sentence there.

If you care about not having a fixed stack, you can handle it in the same 
way a thread handles its stack if you're willing to write os-dependent code 
(set guard pages, catch exceptions, etc.).  In ocaml, you can probably 
realloc in a non-os-dependent way fairly effectively, because you've got 
all the stack frames around in the frame tables and there are no pointers 
into the stack (well, except if you've called C functions in the middle...hmm).

Ignoring that, one reason you want fibers instead of threads is that you 
don't want preemption for some problems because it complicates your code to 
handle the asynchrony.  If you're saying you should just use non-preemtive 
threads (or make them non-preemptive with mutexes), well that's what I'm 
calling fibers.

Chris

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] gc question: thread stacks, fibers, etc.
  2002-10-04 10:25 [Caml-list] gc question: thread stacks, fibers, etc Chris Hecker
  2002-10-06 18:13 ` [Caml-list] Coroutines Jerome Vouillon
  2002-10-12 16:33 ` [Caml-list] gc question: thread stacks, fibers, etc John Max Skaller
@ 2002-10-13  8:32 ` Xavier Leroy
  2002-10-14  7:18   ` Chris Hecker
  2 siblings, 1 reply; 8+ messages in thread
From: Xavier Leroy @ 2002-10-13  8:32 UTC (permalink / raw)
  To: Chris Hecker; +Cc: caml-list, buzzard

> Is there a way to make the stack just another heap object?  I could easily 
> put the pointer to the stack in an Abstract_tag block, but then the stack 
> won't get scanned by the gc at all, which is bad.

I'm not sure whether you're talking about bytecode stacks or
native-code stacks.  For bytecode stacks, all stack slots contain
valid Caml values, hence you could conceptually store the stack in a
zero-tagged block and let the GC scan it.  There are several
"gotchas", though:
- The GC will scan the whole memory block, not stopping at the stack
pointer.  Hence, some stack slots that have been popped already will
be treated as live pointers, delaying the collection of otherwise dead
blocks.
- Exception handler blocks in the stack are chained using direct
pointers.  Hence, relocating the stack (like the minor copying
collector and the heap compactor do) will screw this chaining.

Native-code stacks are much more complex and absolutely require a
special scanning function that interprets the stack frame maps
generated by ocamlopt.

> It seems like what I 
> want is another kind of block or another function in the custom_operations 
> for "let me manually scan the opaque data in this block, which itself 
> points to data that contains values, and if it's all garbage, finalize me".

This is one way to go about your problem, but such custom scanning
functions are not currently supported.

An alternative is to allocate the stack outside the heap, and scan it
via scanning hooks like the thread library does, but manipulate it
from the Caml side through a proxy, heap-allocated custom block.  The
custom block has a finalization function that the GC will call when no
reference to the proxy (i.e. to the fiber) remain.  The finalization
function can then decide to kill the fiber (free its stack) if it so
pleases.

> My next question is a clarification of the way caml works:  the stack can 
> contain values, but never blocks, right?  In C terms, it can contain 
> pointers but not actual objects that other people can point to?  All actual 
> boxed data is on the heap, right?  So, I can just delete a fiber's stack 
> and it'll work?  

Correct.  OCaml never allocates GCed memory blocks in the stack, hence
there are no pointers from the heap into the stack.

> Finally, a couple confusions:
> - final_fun in caml_thread_handle in win32.c is not used?

It is just a placeholder where alloc_final will store a pointer to a
finalization function (caml_thread_finalize).  This should be
rewritten to use the new "custom blocks" API.

> - HANDLE in caml_thread_handle in win32 seems odd, since it's scanned by 
> the gc, yet it's a windows handle.

No, it's not scanned by the GC because it resides in a block with tag
Custom_tag.

Hope this answers your questions,

- Xavier Leroy
-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Caml-list] gc question: thread stacks, fibers, etc.
  2002-10-13  8:32 ` Xavier Leroy
@ 2002-10-14  7:18   ` Chris Hecker
  0 siblings, 0 replies; 8+ messages in thread
From: Chris Hecker @ 2002-10-14  7:18 UTC (permalink / raw)
  To: Xavier Leroy; +Cc: caml-list


>An alternative is to allocate the stack outside the heap, and scan it
>via scanning hooks like the thread library does, but manipulate it
>from the Caml side through a proxy, heap-allocated custom block.  The
>custom block has a finalization function that the GC will call when no
>reference to the proxy (i.e. to the fiber) remain.  The finalization
>function can then decide to kill the fiber (free its stack) if it so
>pleases.

But this doesn't solve the original problem of the fiber containing a 
reference to itself on its stack (or any fiber points to any other fiber), 
right?  The scanning hook functions assume they're scanning roots, so they 
can't check for circularity and release the whole mess.  The function I 
need is something that says "the values I'm sending to the scanner are not 
stored in roots, so check for circularities that includes them".  Or 
something like that.  Make sense?  Perhaps there's a way to hook that in 
the gc?  Maybe I'm missing something.

Anyway, ignoring the circularity problem for the moment, deleting the stack 
from under a fiber works because the stack itself can never have a pointer 
from the heap into it, so it's always okay to delete (according to your 
answer to my other question).  I guess you'd have to be careful that no C 
functions that were still on the stack of that fiber had ever passed a 
pointer into their stack to anyone who keeps it elsewhere (which in 
ordinary cases is okay, since when writing the C code you assume you'll get 
returned-to in an orderly fashion).  Maybe that's an okay constraint to 
place on code, but a bug of that nature would certainly be hard to find, so 
maybe it's not okay (this is why people shouldn't kill threads, nb. the 
other thread  :).

Hmm, now that I think about it, I can just use an exception for this, 
exactly like the other caml-list discussion was wishing it could for 
threads.  In other words, I'm only ever going to consider toasting a fiber 
when it's yielded, so when the fiber gets resumed it'll be in the yield 
code.  That code can check whether the gc has requested finalization of the 
fiber, and throw an exception back up through the fiber.  That will solve 
all the stack issues, because the exception will unwind it all, and the 
fiber can catch it if it wants and do any cleanup.

Yes, that's much cleaner than just killing it anyway.  That just leaves the 
circularity issue above.

Thanks!
Chris

-------------------
To unsubscribe, mail caml-list-request@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2002-10-14  7:18 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-10-04 10:25 [Caml-list] gc question: thread stacks, fibers, etc Chris Hecker
2002-10-06 18:13 ` [Caml-list] Coroutines Jerome Vouillon
2002-10-06 19:46   ` Chris Hecker
2002-10-12 15:58   ` John Max Skaller
2002-10-12 16:33 ` [Caml-list] gc question: thread stacks, fibers, etc John Max Skaller
2002-10-12 18:54   ` Chris Hecker
2002-10-13  8:32 ` Xavier Leroy
2002-10-14  7:18   ` Chris Hecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).