Faking concurrency using Unix forks and pipes (ray tracing results)

caml-list - the Caml user's mailing list
 help / color / mirror / Atom feed

* Faking concurrency using Unix forks and pipes (ray tracing results)
@ 2007-06-04 10:56 Jon Harrop
  2007-06-04 15:33 ` [Caml-list] " Jonathan Bryant
  0 siblings, 1 reply; 6+ messages in thread
From: Jon Harrop @ 2007-06-04 10:56 UTC (permalink / raw)
  To: caml-list

I just got a first working version of .NET style asynchronous invocation 
working in OCaml using process forking.

The following OCaml function forks a new process and computes "f x" in that 
process, returning a function that blocks and returns the result using 
marshalling.

let invoke (f : 'a -> 'b) x : unit -> 'b =
  let input, output = Unix.pipe() in
  match Unix.fork() with
  | 0 ->
    Unix.close input;
    let output = Unix.out_channel_of_descr output in
    Marshal.to_channel output (try `Res(f x) with e -> `Exn e) [];
    exit 0
  | _ ->
      Unix.close output;
      let input = Unix.in_channel_of_descr input in
      fun () ->
	match Marshal.from_channel input with
	| `Res x -> x
	| `Exn e -> raise e

This function tries to account for reraising exceptions on the parent process 
but that is untested.

You can write a higher-order "map" function in terms of invoke like this:

let ( |> ) x f = f x

let map (f : 'a -> 'b) a : 'b array =
  Array.map (invoke f) a |>
      Array.map (fun f -> f())

When you apply this map to an array, a new process is forked for each element. 
As forking is time consuming, you should only apply this to short arrays.

The performance characteristics of this approach are very interesting. 
Firstly, I can observe doubled performance on my dual core by invoking two 
simple but CPU-intensive operations concurrently:

  map fib [|43; 43|]

However, performance is easily degraded using this approach, partly because 
forking is expensive but also because of other effects that I do not yet 
understand. My original benchmark summed the elements of an array using 
fold_left. For some reason, this is extremely inefficient, as if the entire 
array is copied.

Anyway, this function is so simple that it took no time to work it into my ray 
tracer benchmark. The benefits of concurrency on my dual-core system reduce 
the time taken by OCaml from 4s to 3s.

I'll try a concurrent F# version and see how it compares...

-- 
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?e

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Faking concurrency using Unix forks and pipes (ray tracing results)
  2007-06-04 10:56 Faking concurrency using Unix forks and pipes (ray tracing results) Jon Harrop
@ 2007-06-04 15:33 ` Jonathan Bryant
  2007-06-04 15:39   ` Jonathan Bryant
       [not found]   ` <9b415f950706040850v586a285ax1448d23c0c78a375@mail.gmail.com>
  0 siblings, 2 replies; 6+ messages in thread
From: Jonathan Bryant @ 2007-06-04 15:33 UTC (permalink / raw)
  To: caml-list


On Jun 4, 2007, at 6:56 AM, Jon Harrop wrote:

>
> When you apply this map to an array, a new process is forked for  
> each element.
> As forking is time consuming, you should only apply this to short  
> arrays.

It might be worth your while to implement a process pool.  I said  
before that I fork of one base process from which to fork the others,  
and it's not possible to make that process just a controller for a  
pool of waiting processes.

--Jonathan


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Faking concurrency using Unix forks and pipes (ray tracing results)
  2007-06-04 15:33 ` [Caml-list] " Jonathan Bryant
@ 2007-06-04 15:39   ` Jonathan Bryant
       [not found]   ` <9b415f950706040850v586a285ax1448d23c0c78a375@mail.gmail.com>
  1 sibling, 0 replies; 6+ messages in thread
From: Jonathan Bryant @ 2007-06-04 15:39 UTC (permalink / raw)
  To: caml-list


On Jun 4, 2007, at 11:33 AM, Jonathan Bryant wrote:

> process from which to fork the others, and it's not possible to  
> make that process just a controller for a

I don't know why I put the "not" in there...

--Jonathan


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Faking concurrency using Unix forks and pipes (ray tracing results)
       [not found]   ` <9b415f950706040850v586a285ax1448d23c0c78a375@mail.gmail.com>
@ 2007-06-04 16:13     ` Jonathan Bryant
       [not found]       ` <9b415f950706040933r22e1560fhc088368ccb8444fa@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Bryant @ 2007-06-04 16:13 UTC (permalink / raw)
  To: Benedikt Grundmann; +Cc: caml-list

On Jun 4, 2007, at 11:50 AM, Benedikt Grundmann wrote:

> Hi Jonathan,
>
> 2007/6/4, Jonathan Bryant <jtbryant@valdosta.edu>:
>>
>> On Jun 4, 2007, at 6:56 AM, Jon Harrop wrote:
>>
>> >
>> > When you apply this map to an array, a new process is forked for
>> > each element.
>> > As forking is time consuming, you should only apply this to short
>> > arrays.
>>
>> It might be worth your while to implement a process pool.  I said
>> before that I fork of one base process from which to fork the others,
>> and it's not possible to make that process just a controller for a
>> pool of waiting processes.
>>
>> --Jonathan
>>
>
> Do you also do that in your work for the OSP?  I'm asking cause you
> mentioned previously that you want to stay compatible to the ocaml
> threads library... and I just can not figure out how you want to
> implement Thread.create using that base process...

I'm staying with the feel of the threads library, but I'm going to  
have to make it incompatible.  My API uses functors to build  
everything up and then resembles the existing API.  I know there has  
been some talk on this list about performance issues with that, but I  
prefer a clean design over speed (at least at the moment).  Mainly  
this is because Marshal is not trustworthy, so every message type  
needs to have to_ and of_ string functions:

module type Message =
	sig
		type 'a t
		val to_string : 'a t -> string
		val of_string : string -> 'a t
	end

 From there it builds up.  I'm actually trying to design it in such a  
way that channels aren't limited to  UNIX/TCP sockets, and that the  
on-wire protocol can be changed as well.  Hopefully that will go as  
planned.

   As for threads, there are actually 3 levels: concurrent (the  
existing threads), parallel (forked processes), and remote (forked  
process on another machine).  The level of concurrency is specified  
at thread creation either by using a special version of Thread.create  
(such as Thread.concurrent or Thread.parallel) or by an optional  
parameter to Thread.create.  This allows the programmer to use the  
mechanism needed (threads/processes) in the same manner.  One  
additional change is that threads can either be attached (return a  
value), or detached (like they are now and return unit), allowing for  
a future-esque usage of threads.  Also, since it's internally just  
HOFs filled in by the runtime, it allows programs written to run on a  
cluster (using remote threads) to run in a single process for testing  
(using just concurrent threads), and then moved to a multi-core /  
multi-machine environment without any recoding or recompilation.

> BTW: Maybe we should subscribe each others project's mailing list? So
> tha twe can help and learn from each other?
>

I'm already on yours :).

Mine's http://groups.google.com/group/osp2007-northpole

 From what you've said, I think we might be covering a lot of the  
same ground.

>
> Here is the link to mine:
>
> http://groups.google.com/group/osp2007-econcurrency
>
> Cheers,
>
> Bene
>
>
> -- 
> Calvin: I try to make everyone's day a little more
> surreal.
>
> (From Calvin & Hobbes)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Faking concurrency using Unix forks and pipes (ray tracing results)
       [not found]       ` <9b415f950706040933r22e1560fhc088368ccb8444fa@mail.gmail.com>
@ 2007-06-04 16:53         ` Jonathan Bryant
  2007-06-04 18:00           ` Brian Hurt
  0 siblings, 1 reply; 6+ messages in thread
From: Jonathan Bryant @ 2007-06-04 16:53 UTC (permalink / raw)
  To: Benedikt Grundmann; +Cc: caml-list

On Jun 4, 2007, at 12:33 PM, Benedikt Grundmann wrote:

>
> It looks a bit more complex, but that's just to avoid big strings in
> case of big messages
> (e.g. with the "simple" interface you end up with the "same" content
> in memory twice).

I think big strings are unavoidable in this case.  They can be broken  
up at the protocol level for sending, but a large data structure is  
going to be marshaled into a big string.  As far as same content in  
memory twice, that should be the case for pure values even in a  
regular OCaml program.  As for mutable values, they shouldn't be sent  
over a channel to begin with.  Once channels are available though,  
creating a synchronous mutable cell is only a few lines of code.   
(Check out Reppy's book/papers).

> I'm trying to do something similar, but I'm not 100% sure right now
> that it will all work out exactly as I initially thought.. :-)    You
> can have a look at my current "interface", I already have a prototype
> commited...  src/mailbox.mli is the "main" interface of the library.
> The comments are not totally in sync with the interface right now as
> I'm restructuring a lot these days.

I need to do a commit.  I'm using modules which return a record of  
functions, and you can compose them to get channels that work over  
multiple mediums.

> Okay I'm still a little bit confused how does your Thread.create looks
> like (e.g what is
> its signature)?

Something like:

module Thread =
	struct
		type con_type =
		| Concurrent
		| Parallel
		| Remote

		let create ?(con = Concurrent) f x = (* Create thread *)
		let concurrent f x = create ~con:Concurrent f x
		let parallel f x = create ~con:Parallel f x
		let remote f x = create ~con:Remote f x

		let create_attached ?(con = Concurrent) f x = (* Create thread and  
return event *)
		let concurrent_attached f x = create ~con:Concurrent f x
		let parallel_attached f x = create ~con:Parallel f x
		let remote_attached f x = create ~con:Remote f x

		(* Other thread functions *)
	end

I've implemented a good chunk of channels/events so far, but haven't  
got to the threads yet, so don't quote me on that exact signature.

--Jonathan

>
> Cheers,
>
> Bene
>
> -- 
> Calvin: I try to make everyone's day a little more
> surreal.
>
> (From Calvin & Hobbes)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Caml-list] Faking concurrency using Unix forks and pipes (ray tracing results)
  2007-06-04 16:53         ` Jonathan Bryant
@ 2007-06-04 18:00           ` Brian Hurt
  0 siblings, 0 replies; 6+ messages in thread
From: Brian Hurt @ 2007-06-04 18:00 UTC (permalink / raw)
  To: Jonathan Bryant; +Cc: Benedikt Grundmann, caml-list

Jonathan Bryant wrote:

>
> On Jun 4, 2007, at 12:33 PM, Benedikt Grundmann wrote:
>
>>
>> It looks a bit more complex, but that's just to avoid big strings in
>> case of big messages
>> (e.g. with the "simple" interface you end up with the "same" content
>> in memory twice).
>
>
> I think big strings are unavoidable in this case.  They can be broken  
> up at the protocol level for sending, but a large data structure is  
> going to be marshaled into a big string.  As far as same content in  
> memory twice, that should be the case for pure values even in a  
> regular OCaml program.  As for mutable values, they shouldn't be sent  
> over a channel to begin with.  Once channels are available though,  
> creating a synchronous mutable cell is only a few lines of code.   
> (Check out Reppy's book/papers).


I'm wondering if inversion of control isn't an answer here.  The idea is 
to have a function of type buf_t -> string -> unit.  Instead of building 
up a giant string, the of_* functions would call this function on small 
strings.  Not unlike Buffers. But instead of building up in memory, 
these function would fill a buffer and then send it off.

Brian


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2007-06-04 18:01 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-04 10:56 Faking concurrency using Unix forks and pipes (ray tracing results) Jon Harrop
2007-06-04 15:33 ` [Caml-list] " Jonathan Bryant
2007-06-04 15:39   ` Jonathan Bryant
     [not found]   ` <9b415f950706040850v586a285ax1448d23c0c78a375@mail.gmail.com>
2007-06-04 16:13     ` Jonathan Bryant
     [not found]       ` <9b415f950706040933r22e1560fhc088368ccb8444fa@mail.gmail.com>
2007-06-04 16:53         ` Jonathan Bryant
2007-06-04 18:00           ` Brian Hurt

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).