From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on inbox.vuxu.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,HTML_MESSAGE,MAILING_LIST_MULTI, RCVD_IN_DNSWL_NONE autolearn=ham autolearn_force=no version=3.4.2 Received: from minnie.tuhs.org (minnie.tuhs.org [45.79.103.53]) by inbox.vuxu.org (OpenSMTPD) with ESMTP id 438e126a for ; Fri, 12 Apr 2019 19:57:23 +0000 (UTC) Received: by minnie.tuhs.org (Postfix, from userid 112) id C31F695084; Sat, 13 Apr 2019 05:57:21 +1000 (AEST) Received: from minnie.tuhs.org (localhost [127.0.0.1]) by minnie.tuhs.org (Postfix) with ESMTP id F396395078; Sat, 13 Apr 2019 05:56:35 +1000 (AEST) Authentication-Results: minnie.tuhs.org; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="LfAaakI2"; dkim-atps=neutral Received: by minnie.tuhs.org (Postfix, from userid 112) id 47D1295075; Sat, 13 Apr 2019 05:56:32 +1000 (AEST) Received: from mail-qt1-f181.google.com (mail-qt1-f181.google.com [209.85.160.181]) by minnie.tuhs.org (Postfix) with ESMTPS id F324895074 for ; Sat, 13 Apr 2019 05:56:30 +1000 (AEST) Received: by mail-qt1-f181.google.com with SMTP id h39so12690524qte.2 for ; Fri, 12 Apr 2019 12:56:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=0n3sQDEM52UBLdQbzx9yHc879G271+wbzCMLGQBQ/Jg=; b=LfAaakI2wRHbqvUzxgybKOD9RdwDP/xrtStMwpgnd0hvjbXP48p5Qjn0nCmdFuRTn0 2TOiAHZ9w17QoffFv7oJl171H50LeQGC7mwO4RzbjJv5pGqctrPWokEGxWzpfGsYOFs2 lucRlMdzTUus7JDYnpTgUlKhBwDHPced4Yd5HfZGHY2Fd4ZRuh3kocdsP7+Iz4hCM9fJ iYa5OlvyU00v7NGOn7p2c1mRoPORRAoq9/71ob4qTnmfh9Qp9pYSeXmg1IsmTG1exNNW WoQEHtTERORCf2/K3iuK1eILnA/8AQdjVW3jj+4SUkRVU5rEEEPGtqjrAx+S3ypysGve ogxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=0n3sQDEM52UBLdQbzx9yHc879G271+wbzCMLGQBQ/Jg=; b=Be7FhNwoMxGywUVwRLE2ln4KfDgorGltmU4EV4EEQJ0IdEuXNNTF2rLGiN4D2wcX/J AG6B27J2KHUJto/K3Nj5eFM9Q34+3mb+eev/qFBoNuQg+PgMC/BFuc+MaEyGknEoXmPF TQtbbdS4j71XRwQQVib9D++J+bDp/BRtdXD9hrDVf5HmkJF6h8fviAy+P4gedkMJb5hB 8drW3CZnjUxYJXIuCRj52oTyPmGentcrT9SaCxDGalf/rDE/ylpCa2cbw+d3RS27dJCF LT232X5kOQ/FJMJj36ID2CPB/f0TNRIqfZu3xFD5r8FrVMrsfIeL2kfb1aBzlwFMl870 hwBw== X-Gm-Message-State: APjAAAXJ8fY6Hm7BGYszUtvaanZRX4obHDEqPMyIqY67tajOfK4fNVDo mvdm4mh2wxrZHWVV0FOiul1hOonjhmg80J0Cn80RVvjE X-Google-Smtp-Source: APXvYqwfCPeM2OD6K7Kops6o57ixDnHz9ynZoGzUQTTysFVRUiYtrYAbQgM5DT6lPUxwqo6Iip1MqFZVMW0xztbUfhk= X-Received: by 2002:a0c:924a:: with SMTP id 10mr47972958qvz.168.1555098989804; Fri, 12 Apr 2019 12:56:29 -0700 (PDT) MIME-Version: 1.0 References: <20190412145102.4876F18C0A9@mercury.lcs.mit.edu> In-Reply-To: <20190412145102.4876F18C0A9@mercury.lcs.mit.edu> From: Dan Cross Date: Fri, 12 Apr 2019 15:55:54 -0400 Message-ID: To: The Eunuchs Hysterical Society Content-Type: multipart/alternative; boundary="000000000000e0c5fb05865ab205" Subject: Re: [TUHS] "Fork considered harmful" X-BeenThere: tuhs@minnie.tuhs.org X-Mailman-Version: 2.1.26 Precedence: list List-Id: The Unix Heritage Society mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Noel Chiappa Errors-To: tuhs-bounces@minnie.tuhs.org Sender: "TUHS" --000000000000e0c5fb05865ab205 Content-Type: text/plain; charset="UTF-8" From: Richard Salz > Any view on this? > https://www.microsoft.com/en-us/research/publication/a-fork-in-the-road/ Yes. First, I dislike the presentation of the paper. From the pithy title to snarky section headings to the overly informal register of the writing, I think the authors did themselves a disservice writing in a style that is far too colloquial. The argument is presented with too much editorial comment, as if it's trying to be "edgy" or something. I find that off-putting and annoying at best. But stylistic issues aside, the substance of the paper is largely on point. The fact is that, for better or worse, fork() has not aged gracefully into the modern age of multi-threaded programs, big graphical applications, and the like, and the problems the authors point out are very, very real. An argument can be made along the lines of, "well, just don't write programs that way..." but the universe of interesting and useful userspace programs is now large enough that I'm afraid we're well inside the event horizon of increasing entropy. It's interesting that they make an oblique reference to Austin's dissertation, or at least one of the papers that came out of his work (Clements et al) but they don't go the full way in exploring the implications. On Fri, Apr 12, 2019 at 10:51 AM Noel Chiappa wrote: > Having read this, and seen the subsequent discussion, I think both sides > have > good points. > > What I perceive to be happening is something I've described previously, but > never named, which is that as a system scales up, it can be necessary to > take > one subsystem which did two things, and split it up so there's a custom > subsystem for each. > > I've seen this a lot in networking; I've been trying to remember some of > the > examples I've seen, and here's the best one I can come up with at the > moment: > having the routing track 'unique-ID network interface names' (i.e. > interface > 'addresses') - think 48-bit IEEE interface IDs' - directly. In a small > network, this works fine for routing traffic, and as a side-benefit, gives > you > mobility. Doesn't scale, though - you have to build an 'interface ID to > location name mapping system', and use 'location names' (i.e. 'addresses') > in > the routing. > > So classic Unix 'fork' does two things: i) creates a new process, and ii) > replicates > the environment/etc of an existing process. (In early Unix, the latter was > pretty > simple, but as the paper points out, it has now become a) complex and b) > expensive.) > > I think the answer has to include decomposing the functionality of old > fork() > into several separate sub-primitives (albeit not all necessarily directly > accessible to the user): a new-process primitive, which can be bundled > with a > number of different alternatives (e.g. i) exec(), ii) environment > replication, > iii) address-space replication, etc) - perhaps more than one at once. > This is approximately what was done in Akaros. The observation, in the MSR paper and elsewhere, that fork() is a poor abstraction because it tries to do too much and doesn't interact well with threads (which Akaros was all about) is essentially correct and created undue burdens on the system's implementors. The solution there was to split process creation into two steps: creation proper, and marking a process runnable for the first time. This gave a parent an opportunity to create a child process and then populate its file descriptors, environment variables, and so forth before setting it loose on the system: one got much of the elegance of the fork() model, but without the bad behavior. The critical observation is that a hard distinction between fork()/exec() on one side and spawn() on the other as the only two possible designs is unnecessary; an intermediate step in the spawn() model preserving the two-step nature of the fork()/exec() model yields many of the benefits of both, without many of the problems of one or the other. So that shell would want a form of fork() which bundled in i) and ii), but > large applications might want something else. And there might be several > variants of ii), e.g. one might replicate only environment variables, > another > might add I/O channels, etc. > > In a larger system, there's just no 'one size fits all' answer, I think. > This is, I think, the crux of the argument: Unix fork, as introduced on the PDP-7, wasn't designed for "large" systems. I'd be curious to know how much intention was behind the consequences of fork()'s behavior were known in advance. As Dennis Ritchie's document on early Unix history (well known to most of the denizens of this list) pointed out, the implementation was an expedient, and the construct was loosely based on prior art (Genie etc). Was the idea of the implicit dup()'ing of open file descriptors something that people thought about consciously at the time? Certainly, once it was there, people found out that they could exploit it in clever ways for e.g. IO redirection, pipes (which came later, of course), etc, but I wonder to what extent that was discovered as opposed to being an explicit design objective. Put another way, the thought process may have been along the lines of, "look at the neat properties that fall out of this thing we've already got..." as opposed to, "we designed this thing to have all these neat properties..." Much of the current literature and extant course material seems to take the latter tack, but it's not at all clear (to me, anyway) that that's an accurate reflection of the historical reality. I briefly mentioned Clements's dissertation above. In essence, the scalability commutativity rule says that interfaces that commute scale better than those that do not because they do not _require_ serialization, so they can be parallelized etc. Many of the algorithms in early Unix do not commute: the behavior of returning the lowest available number when allocating a file descriptor is an example (consider IO redirection here and the familiar sequence, "close(1), open("output", O_WRONLY)": this doesn't work if one does the close after the open, etc). But fork() and exec() might be the ultimate example. Obviously if I exec() before I fork() my semantics are very, very different, but more generally the commutativity must be considered in some context: operations on file descriptors don't commute with one another, but they do commute with, say, memory writes. On the other hand, fork() doesn't commute with much of anything at all. The early Unix algorithms are so incredibly simple (one can easily imagine the loop over a process's file descriptor table, for example), but one can't help but wonder if there was any sense of what the consequences of the details of those algorithms leaking out to user programs would be 50 years later. Surely not? - Dan C. --000000000000e0c5fb05865ab205 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
From: Ric= hard Salz
Any view o= n this?
https://www.microsoft.com/en-us/rese= arch/publication/a-fork-in-the-road/

Ye= s.

First, I dislike the presentation of the paper.= From the pithy title to snarky section headings to the overly informal reg= ister of the writing, I think the authors did themselves a disservice writi= ng in a style that is far too colloquial. The argument is presented with to= o much editorial comment, as if it's trying to be "edgy" or s= omething. I find that off-putting and annoying at best.

But stylistic issues aside, the substance of the paper is largely on = point. The fact is that, for better or worse, fork() has not aged gracefull= y into the modern age of multi-threaded programs, big graphical application= s, and the like, and the problems the authors point out are very, very real= . An argument can be made along the lines of, "well, just don't wr= ite programs that way..." but the universe of interesting and useful u= serspace programs is now large enough that I'm afraid we're well in= side the event horizon of increasing entropy.

It&#= 39;s interesting that they make an oblique reference to Austin's disser= tation, or at least one of the papers that came out of his work (Clements e= t al) but they don't go the full way in exploring the implications.

On Fri, Apr 12, 2019 at 10:51 AM Noel Chiappa <jnc@mercury.lcs.m= it.edu> wrote:
So that shell would want a form of fork() which bundled in i) and ii), but<= br> large applications might want something else. And there might be several variants of ii), e.g. one might replicate only environment variables, anoth= er
might add I/O channels, etc.

In a larger system, there's just no 'one size fits all' answer,= I think.

This is, I think, the crux of= the argument: Unix fork, as introduced on the PDP-7, wasn't designed f= or "large" systems. I'd be curious to know how much intention= was behind the consequences of fork()'s behavior were known in advance= . As Dennis Ritchie's document on early Unix history (well known to mos= t of the denizens of this list) pointed out, the implementation was an expe= dient, and the construct was loosely based on prior art (Genie etc). Was th= e idea of the implicit dup()'ing of open file descriptors something tha= t people thought about consciously at the time? Certainly, once it was ther= e, people found out that they could exploit it in clever ways for e.g. IO r= edirection, pipes (which came later, of course), etc, but I wonder to what = extent that was discovered as opposed to being an explicit design objective= .

Put another way, the thought process may have be= en along the lines of, "look at the neat properties that fall out of t= his thing we've already got..." as opposed to, "we designed t= his thing to have all these neat properties..." Much of the current li= terature and extant course material seems to take the latter tack, but it&#= 39;s not at all clear (to me, anyway) that that's an accurate reflectio= n of the historical reality.

I briefly mentioned C= lements's dissertation above. In essence, the scalability commutativity= rule says that interfaces that commute scale better than those that do not= because they do not _require_ serialization, so they can be parallelized e= tc. Many of the algorithms in early Unix do not commute: the behavior of re= turning the lowest available number when allocating a file descriptor is an= example (consider IO redirection here and the familiar sequence, "clo= se(1), open("output", O_WRONLY)": this doesn't work if o= ne does the close after the open, etc). But fork() and exec() might be the = ultimate example. Obviously if I exec() before I fork() my semantics are ve= ry, very different, but more generally the commutativity must be considered= in some context: operations on file descriptors don't commute with one= another, but they do commute with, say, memory writes. On the other hand, = fork() doesn't commute with much of anything at all.

The early Unix algorithms are so incredibly simple (one can easily i= magine the loop over a process's file descriptor table, for example), b= ut one can't help but wonder if there was any sense of what the consequ= ences of the details of those algorithms leaking out to user programs would= be 50 years later. Surely not?

=C2=A0 =C2=A0 =C2= =A0 =C2=A0 - Dan C.

--000000000000e0c5fb05865ab205--