The Unix Heritage Society mailing list
From: Warner Losh <imp@bsdimp.com>
To: Noel Chiappa <jnc@mercury.lcs.mit.edu>
Cc: The Eunuchs Hysterical Society <tuhs@tuhs.org>
Subject: Re: [TUHS] "Fork considered harmful"
Date: Fri, 12 Apr 2019 09:33:35 -0600
Message-ID: <CANCZdfqQXenzx3uN-EL1js6NSbn_9Pz6T5DKMrcL38UB4AAWfQ@mail.gmail.com>
In-Reply-To: <20190412145102.4876F18C0A9@mercury.lcs.mit.edu>

On Fri, Apr 12, 2019 at 8:51 AM Noel Chiappa <jnc@mercury.lcs.mit.edu> wrote:

>     > From: Richard Salz
>
>     > Any view on this?
>     >
> https://www.microsoft.com/en-us/research/publication/a-fork-in-the-road/
>
> Having read this, and seen the subsequent discussion, I think both sides
> have good points.
>
> What I perceive to be happening is something I've described previously, but
> never named, which is that as a system scales up, it can be necessary to
> take one subsystem which did two things, and split it up so there's a
> custom subsystem for each.
>
> I've seen this a lot in networking; I've been trying to remember some of
> the examples I've seen, and here's the best one I can come up with at the
> moment: having the routing track 'unique-ID network interface names' (i.e.
> interface 'addresses') - think 48-bit IEEE interface IDs' - directly. In a
> small network, this works fine for routing traffic, and as a side-benefit,
> gives you mobility. Doesn't scale, though - you have to build an 'interface
> ID to location name mapping system', and use 'location names' (i.e.
> 'addresses') in the routing.
>
> So classic Unix 'fork' does two things: i) creates a new process, and ii)
> replicates the environment/etc of an existing process. (In early Unix, the
> latter was pretty simple, but as the paper points out, it has now become a)
> complex and b) expensive.)
>

Signals, fds, address space, copy vs share, COW vs copy now, etc. are all
part of that. Also I'd split hairs on (i): you also need some way to create
a new thread of execution within a process, which is where a lot of the
criticism of fork has focused in the past.
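
To illustrate the copy vs share distinction, here is a minimal sketch in
Python (chosen only for brevity; the function name is made up for this
example). After os.fork() the child gets its own copy of memory, so its
change to a variable is invisible to the parent, but it shares the open file
description, so its read moves the parent's file offset too.

```python
import os
import tempfile


def fork_copy_vs_share():
    """Return (parent_counter, parent_file_offset) after a child touches both."""
    counter = 0
    fd, path = tempfile.mkstemp()
    os.unlink(path)  # the file lives on only through the descriptor
    try:
        os.write(fd, b"0123456789")
        os.lseek(fd, 0, os.SEEK_SET)

        pid = os.fork()
        if pid == 0:
            # Child: memory is a private copy, the file offset is shared.
            try:
                counter += 1      # changes only the child's copy
                os.read(fd, 4)    # advances the shared offset to 4
            finally:
                os._exit(0)

        os.waitpid(pid, 0)
        offset = os.lseek(fd, 0, os.SEEK_CUR)
        return counter, offset
    finally:
        os.close(fd)
```

The parent still sees counter == 0, yet its file offset has moved to 4.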


> I think the answer has to include decomposing the functionality of old
> fork() into several separate sub-primitives (albeit not all necessarily
> directly accessible to the user): a new-process primitive, which can be
> bundled with a number of different alternatives (e.g. i) exec(), ii)
> environment replication, iii) address-space replication, etc) - perhaps
> more than one at once.
>
> So that shell would want a form of fork() which bundled in i) and ii), but
> large applications might want something else. And there might be several
> variants of ii), e.g. one might replicate only environment variables,
> another might add I/O channels, etc.
>
> In a larger system, there's just no 'one size fits all' answer, I think.
>

Agreed. We've already seen that happening, and some of the examples are
quite old. We had vfork() (dating back to 3BSD), which tried to optimize the
duplication work. More recently, rfork() (Plan 9 and later BSD) and clone()
(Linux) [*] have been used to specify which parts of a process are copied
and/or shared. That allows, among other things, lightweight threads to be
one of the possible answers, the fork to happen asynchronously, etc. Linux
has a bunch of other variants as well.
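
As a rough illustration of a "new process bundled with exec" primitive, the
sketch below runs the same child two ways in Python: classic fork() followed
by exec, and posix_spawn(). Note that posix_spawn() is not one of the calls
named above; it is used here only because it is a widely available example
of the bundled style. The function names and the child command are
hypothetical.

```python
import os
import sys

# Hypothetical child command: a Python interpreter that exits with status 7.
CHILD_ARGV = [sys.executable, "-c", "import sys; sys.exit(7)"]


def run_with_fork_exec():
    """Duplicate the whole process, then immediately replace the copy."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execv(CHILD_ARGV[0], CHILD_ARGV)
        finally:
            os._exit(127)  # reached only if exec failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)


def run_with_posix_spawn():
    """Ask for a new process running the program in a single call."""
    pid = os.posix_spawn(CHILD_ARGV[0], CHILD_ARGV, os.environ)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Both functions report the child's exit status; the difference is that the
second never asks for a duplicate of the caller in the first place.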

fork as a bogeyman is a well-known trope, honestly. Criticism of it, and
solutions for its all-or-nothing approach, have been proffered for a long
time. These solutions range from having a helper child process spawn the
other things a more complex process wants, to specialized ways to create
threads (which are process-like things that share an address space and
benefit from special handling in the kernel), to things like rfork or clone
that try to pick and choose which aspects of process duplication are needed.
There's a reason that the clone man page is maybe 10x longer than the
classic fork man page.
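
The helper-child approach can be sketched roughly as follows, again in
Python. This is only one way such a helper might look, not a description of
any particular system: the process forks a helper early, while it is still
small, and later asks that helper over a pipe to spawn commands. All names
here (start_helper, run_via_helper, demo) and the NUL-separated,
newline-terminated request format are assumptions made for the example.

```python
import os
import sys


def start_helper():
    """Fork a helper; return (helper_pid, request_writer, reply_reader)."""
    req_r, req_w = os.pipe()
    rep_r, rep_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Helper: read one request per line, run it, reply with exit status.
        os.close(req_w)
        os.close(rep_r)
        try:
            with os.fdopen(req_r, "r") as requests, \
                    os.fdopen(rep_w, "w") as replies:
                for line in requests:
                    argv = line.rstrip("\n").split("\0")
                    child = os.fork()  # cheap: the helper is small
                    if child == 0:
                        try:
                            os.execvp(argv[0], argv)
                        finally:
                            os._exit(127)  # reached only if exec failed
                    _, status = os.waitpid(child, 0)
                    replies.write(f"{os.WEXITSTATUS(status)}\n")
                    replies.flush()
        finally:
            os._exit(0)
    os.close(req_r)
    os.close(rep_w)
    return pid, os.fdopen(req_w, "w"), os.fdopen(rep_r, "r")


def run_via_helper(to_helper, from_helper, argv):
    """Ask the helper to run argv and return the command's exit status."""
    to_helper.write("\0".join(argv) + "\n")
    to_helper.flush()
    return int(from_helper.readline())


def demo():
    """Run one hypothetical command through the helper, then shut it down."""
    pid, to_helper, from_helper = start_helper()
    try:
        cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
        return run_via_helper(to_helper, from_helper, cmd)
    finally:
        to_helper.close()    # EOF tells the helper to exit
        from_helper.close()
        os.waitpid(pid, 0)
```

The large process never forks again after startup; only the small helper
pays the duplication cost. A real helper would also need to handle arguments
containing newlines, signals, and I/O redirection, which this sketch ignores.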

Warner

[*] This doesn't even begin to look at things like what Solaris, Irix, or a
dozen other Unix derivatives did to create threads and/or optimize
different use cases of fork.


Thread overview: 10+ messages
2019-04-12 14:51 Noel Chiappa
2019-04-12 15:33 ` Warner Losh [this message]
2019-04-12 19:55 ` Dan Cross
  -- strict thread matches above, loose matches on Subject: below --
2019-04-10 23:06 Richard Salz
2019-04-10 23:24 ` Bakul Shah
2019-04-10 23:37   ` George Michaelson
2019-04-11 11:38     ` Tony Finch
2019-04-11 23:37 ` Chris Hanson
2019-04-12  0:12   ` Derek Fawcus
2019-04-12 16:11 ` Jim Capp
