On Fri, Apr 12, 2019 at 8:51 AM Noel Chiappa wrote:

> > From: Richard Salz
>
> > Any view on this?
>
> > https://www.microsoft.com/en-us/research/publication/a-fork-in-the-road/
>
> Having read this, and seen the subsequent discussion, I think both sides have
> good points.
>
> What I perceive to be happening is something I've described previously, but
> never named, which is that as a system scales up, it can be necessary to take
> one subsystem which did two things, and split it up so there's a custom
> subsystem for each.
>
> I've seen this a lot in networking; I've been trying to remember some of the
> examples I've seen, and here's the best one I can come up with at the moment:
> having the routing track 'unique-ID network interface names' (i.e. interface
> 'addresses') - think 48-bit IEEE interface IDs' - directly. In a small
> network, this works fine for routing traffic, and as a side-benefit, gives you
> mobility. Doesn't scale, though - you have to build an 'interface ID to
> location name mapping system', and use 'location names' (i.e. 'addresses') in
> the routing.
>
> So classic Unix 'fork' does two things: i) creates a new process, and ii)
> replicates the environment/etc of an existing process. (In early Unix, the
> latter was pretty simple, but as the paper points out, it has now become a)
> complex and b) expensive.)

Signals, fds, address space, copy vs. share, COW vs. copy-now, etc. are all
things that now have to be decided. Also, I'd split hairs on (i): you need some
way to create a new thread of execution within a process, which is where much
of the criticism of fork has focused in the past.

> I think the answer has to include decomposing the functionality of old fork()
> into several separate sub-primitives (albeit not all necessarily directly
> accessible to the user): a new-process primitive, which can be bundled with a
> number of different alternatives (e.g. i) exec(), ii) environment
> replication, iii) address-space replication, etc) - perhaps more than one at
> once.
>
> So that shell would want a form of fork() which bundled in i) and ii), but
> large applications might want something else. And there might be several
> variants of ii), e.g. one might replicate only environment variables, another
> might add I/O channels, etc.
>
> In a larger system, there's just no 'one size fits all' answer, I think.

Agreed. We've already seen that happening; some examples are quite old. We had
vfork() (dating back to 3BSD), which tried to optimize the duplication stuff.
More recently, rfork() (Plan 9 and later BSD) and clone() (Linux) [*] have been
used to specify which parts of the process are copied and/or shared. Among
other things, this allows lightweight threads to be one of the possible
answers, lets the fork happen asynchronously, etc. Linux has a bunch of other
variants as well.

fork as a bogeyman is a well-known trope, honestly. Criticism of it, and
solutions for its all-or-nothing approach, have been proffered for a long time.
These solutions range from having a helper child process spawn the other things
a more complex process wants, to specialized ways to create threads (which are
process-like things that share an address space and benefit from special
handling in the kernel), to things like rfork or clone that pick and choose
which aspects of process duplication are needed. There's a reason that the
clone man page is maybe 10x longer than the classic fork man page.

Warner

[*] This doesn't even begin to look at things like what Solaris, IRIX, or a
dozen other Unix derivatives did to create threads and/or optimize different
use cases of fork.