Slightly OT question: I often want to parallelize algorithms in computational biology in which (a) the parallel computations take a long time (seconds/minutes) to complete, (b) they use a very large heap (gigs) of immutable data, and (c) they don't really need to synchronize at intermediate points in the computation. This seems best accomplished with fork(). What would be your favorite way to collect the results from the child processes?

Right now I have a hacked-up thing that marshals them through pipes. The parent process reads the values from the pipes serially, in a fixed order, which is obviously suboptimal because it can block on a slow child while other children already have results waiting, but I was too lazy to write a select() loop. Is this what you would do, or can you think of a better way?
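For concreteness, here is a rough sketch of the fork-plus-pipes approach with the select() loop filled in. It is written in Python purely for illustration, and it assumes a Unix system, one child per work item, and results small enough to buffer in the parent. The name `parallel_map` is made up for this example, and `pickle` stands in for whatever marshalling you already use.

```python
import os
import pickle
import selectors


def parallel_map(func, items):
    """Fork one child per item and collect results in whatever order they finish.

    Hypothetical helper for illustration; Unix-only because it relies on os.fork().
    """
    sel = selectors.DefaultSelector()
    buffers = {}   # item index -> bytes received so far from that child
    pids = []

    for index, item in enumerate(items):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: sees the parent's heap copy-on-write, so the large
            # immutable data is not copied unless something writes to it.
            status = 1
            try:
                os.close(read_fd)
                with os.fdopen(write_fd, "wb") as out:
                    pickle.dump(func(item), out)
                status = 0
            finally:
                # Skip the parent's cleanup handlers and buffered output.
                os._exit(status)
        # Parent: close the write end right away so that EOF on the read
        # end means "this child is done" (or died).
        os.close(write_fd)
        sel.register(read_fd, selectors.EVENT_READ, index)
        buffers[index] = bytearray()
        pids.append(pid)

    results = [None] * len(pids)
    while sel.get_map():                     # loop until every pipe hit EOF
        for key, _ in sel.select():
            chunk = os.read(key.fd, 65536)
            if chunk:
                buffers[key.data].extend(chunk)
            else:
                sel.unregister(key.fd)
                os.close(key.fd)
                # Raises EOFError if the child exited without writing a result.
                results[key.data] = pickle.loads(bytes(buffers.pop(key.data)))

    for pid in pids:
        os.waitpid(pid, 0)                   # reap children, avoid zombies
    return results
```

Because the parent drains whichever pipe is readable, a slow child no longer holds up reading from the others, and a child that writes more than the pipe buffer holds does not stall waiting for its turn. Results still come back in input order, since each pipe is tagged with its item's index.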

For my purposes (embarrassingly parallel computational biology), a convenient and type-safe little library for doing this would satisfy 80% of my SMP needs.